CS 6501 Natural Language Processing (Spring 2024)

Logistics

Instructor: Yu Meng (yumeng5[at]virginia[dot]edu)
Teaching Assistants: Afsara Benazir (hys4qm[at]virginia[dot]edu), Zhepei Wei (tqf5qb[at]virginia[dot]edu)
Time: Mondays & Wednesdays 3:30pm - 4:45pm
Location: Mechanical & Aerospace Engineering Building 339

Course Overview

This advanced graduate-level course offers a comprehensive exploration of cutting-edge developments in natural language processing (NLP). With Large Language Models (LLMs) serving as the foundation of state-of-the-art NLP systems, we will cover a range of topics aimed at building a better understanding of LLMs’ design, capabilities, limitations, and future prospects. Key areas include model architecture and design, training methodologies (e.g., pretraining, instruction tuning, RLHF), emergent capabilities (e.g., in-context learning, reasoning), parametric knowledge and retrieval-augmented generation, efficiency (e.g., parameter-efficient training, sparse methods), language agents, and ethics. The course is highly research-driven, with a substantial focus on presenting and discussing important papers and conducting research projects.

Grading

Paper Presentation (30%) (Signup Sheet): Starting from the 3rd lecture, in each lecture a group of 2 or 3 students will present the 4 papers selected for the topic they signed up for. The primary objective is to impart knowledge to the rest of the class. Presentations are strictly limited to 60 minutes, followed by a 10-minute question-and-answer session with the audience. Presentations will be assessed on the following criteria (all members of a presentation group receive the same score):
- Clarity: Whether the presentation effectively communicates the core concepts and insights contained in the papers.
- Completeness: Whether the presentation adequately covers the essential messages in the 4 assigned papers within the allocated timeframe.
- Teamwork: Whether the presentation is prepared and delivered by all team members.
- Question answering: Whether the presenters effectively answer the questions raised by the audience.
Tips for presentation preparation: It is not necessary to cover every detail of the papers; instead, focus on conveying the general ideas and insights. For theoretical papers, you don’t need to go over each proof in detail, but you should explain the major conclusions and insights of the theoretical analysis. For empirical papers, you don’t need to present every experimental result, but you should articulate how the empirical findings support the major claims. A good presentation should highlight:
- The major contributions of the paper
- Why these contributions are deemed important (e.g., did they reveal any previously unknown facts or change people’s opinions on a widely acknowledged phenomenon?)
- The most important technical details (e.g., the motivation behind a new training objective/model architecture design and the corresponding implementation)
- The limitations of the work and how they might be addressed in the future
Deadline for slide submission: Email your slides to the instructor and TAs at least 48 hours before your presentation (e.g., if presenting on Monday, submit your slides by Saturday 3:30 pm). The instructor will give feedback to help improve your slides and, if necessary, may schedule a meeting with your team to go over them. Late submissions will result in a 50% deduction from the presentation grade.
Participation (20%): For each lecture starting from the 3rd lecture, everyone is required to complete the following two mini-assignments (regardless of whether you are presenting in that lecture):
- Pre-lecture question: Read the 4 papers to be presented in the lecture and submit a question that arose while reading them. The question must not be trivial (e.g., “Does the proposed method work?” / “What is the aim of the paper?”). You are also welcome to raise your question during the 10-minute question-and-answer session in class. The deadline is one day before the lecture (e.g., for Monday lectures, submit your question by Sunday 11:59 pm).
- Post-lecture feedback: After attending the lecture, provide feedback to the presenters, commenting on clarity, completeness, depth, etc. Your feedback doesn’t have to be long, but it should be specific and constructive. Feedback is due each Friday (feedback for both the Monday and Wednesday lectures is due Friday 11:59 pm).
Project (50%): By the end of this course, you are required to complete a research project, present your results, and submit a project report. You are required to work in a team of 2 or 3 (any deviation from this team size requires prior approval from the instructor). There are two acceptable project types:
- A comprehensive survey report: The survey should carefully examine and summarize existing literature on a topic covered in this course, and provide detailed and insightful discussions on the unresolved issues, challenges, and potential future opportunities within the chosen topic.
- A hands-on project: The project is not restricted to the course topics but must be centered on NLP. It does not have to involve large language models either; for example, you may choose to train or analyze smaller-scale language models for specific tasks. You are eligible for extra credit if the final project reaches a publishable state.
The project grading breakdown (50%) is as follows:
- Project proposal (Guideline): 5% (Deadline: 2/5)
- Mid-term report (Guideline): 10% (Deadline: 3/13)
- Final presentation (Deadline: 4/24; Guideline) and final report (Deadline: 5/8; Guideline): 35%

Schedule

Date	Topic	Papers	Slides	Supplemental Reading
Introduction to Language Models
1/17	Course Overview	-	overview	-
1/22	Language Model Architecture and Pretraining	* Distributed Representations of Words and Phrases and their Compositionality (word2vec) * Attention Is All You Need (Transformer) * Language Models are Unsupervised Multitask Learners (GPT-2) * BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding * RoBERTa: A Robustly Optimized BERT Pretraining Approach * ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators * BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension * Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)	lm_basics	* (Blog) The Illustrated Transformer * (Blog) Transformer Inference Arithmetic
1/24	Large Language Models and In-Context Learning	* Language Models are Few-Shot Learners (GPT-3) * Llama 2: Open Foundation and Fine-Tuned Chat Models * An Explanation of In-context Learning as Implicit Bayesian Inference * Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?	llm_icl	* (Blog) Llama 2: an incredible open LLM * (Tech Report) GPT-4 Technical Report
1/29	Model Calibration	* How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering * Surface Form Competition: Why the Highest Probability Answer Isn't Always Right * Teaching Models to Express Their Uncertainty in Words * Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation	calibration	* (Blog) Calibrating LLMs * (Paper) Calibrate Before Use: Improving Few-Shot Performance of Language Models
1/31	Scaling and Emergent Ability	* Training Compute-Optimal Large Language Models * Scaling Data-Constrained Language Models * Emergent Abilities of Large Language Models * Are Emergent Abilities of Large Language Models a Mirage?	scaling	* (Blog) Scaling Laws and Emergent Properties * (Blog) Are the emergent abilities of LLMs like GPT-4 a mirage?
Reasoning with Language Models
2/5	Chain-of-Thought Generation	* Chain-of-Thought Prompting Elicits Reasoning in Large Language Models * Least-to-Most Prompting Enables Complex Reasoning in Large Language Models * Self-Consistency Improves Chain of Thought Reasoning in Language Models * Large Language Models Can Self-Improve	cot	* (Blog) Comprehensive Guide to Chain-of-Thought Prompting * (Paper) Large Language Models are Zero-Shot Reasoners
2/7	Advanced Reasoning	* PAL: Program-aided Language Models * Tree of Thoughts: Deliberate Problem Solving with Large Language Models * Solving Quantitative Reasoning Problems with Language Models * Let's Verify Step by Step	adv_reasoning	* (Blog) Tree of Thoughts (ToT) * (Blog) Minerva: Solving Quantitative Reasoning Problems with Language Models
Knowledge and Factuality
2/12	Parametric Knowledge in Language Models	* Language Models as Knowledge Bases? * How Much Knowledge Can You Pack Into the Parameters of a Language Model? * Transformer Feed-Forward Layers Are Key-Value Memories * Locating and Editing Factual Associations in GPT	knowledge	* (Paper) Editing Factual Knowledge in Language Models * (Paper) Fast Model Editing at Scale
2/14	Retrieval-Augmented Language Generation (RAG)	* Generalization through Memorization: Nearest Neighbor Language Models * Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks * Dense Passage Retrieval for Open-Domain Question Answering * Improving language models by retrieving from trillions of tokens	rag	* (Paper) REPLUG: Retrieval-Augmented Black-Box Language Models * (Paper) Lost in the Middle: How Language Models Use Long Contexts
Language Model Alignment
2/19	Multi-Task Instruction Tuning	* Finetuned Language Models Are Zero-Shot Learners * Multitask Prompted Training Enables Zero-Shot Task Generalization * Cross-Task Generalization via Natural Language Crowdsourcing Instructions * Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks	multitask	* (Blog) A Stage Review of Instruction Tuning
2/21 [Guest Lecture] Shunyu Yao (Princeton): Language Agents: From Next Token Prediction to Digital Automation
2/26	Chat-Style Instruction Tuning	* Self-Instruct: Aligning Language Models with Self-Generated Instructions * LIMA: Less Is More for Alignment * AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback * Self-Alignment with Instruction Backtranslation	chat_instruction	* (Blog) Teach Llamas to Talk: Recent Progress in Instruction Tuning * (Paper) How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
2/28	Reinforcement Learning from Human Feedback (RLHF)	* Training language models to follow instructions with human feedback * Direct Preference Optimization: Your Language Model is Secretly a Reward Model * Fine-Grained Human Feedback Gives Better Rewards for Language Model Training * Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback	rlhf	* (Blog) Illustrating Reinforcement Learning from Human Feedback (RLHF) * (Blog) Preference Tuning LLMs with Direct Preference Optimization Methods
3/2 - 3/10 (Spring Recess, No Class)
Language Model Agents
3/11	Task Execution via Reasoning, Tools and Conversations	* ReAct: Synergizing Reasoning and Acting in Language Models * Toolformer: Language Models Can Teach Themselves to Use Tools * AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation * Reflexion: Language Agents with Verbal Reinforcement Learning	agent	* (Blog) ReAct: Synergizing Reasoning and Acting in Language Models * (Blog) Breaking Down Toolformer * (Blog) Superpower LLMs with Conversational Agents
3/13	Language Models for Code	* InCoder: A Generative Model for Code Infilling and Synthesis * Code Llama: Open Foundation Models for Code * Teaching Large Language Models to Self-Debug * LEVER: Learning to Verify Language-to-Code Generation with Execution	code_lm	* (Blog) Large Language Models for Code Generation – Part 1 * (Blog) Cracking the Code LLMs
3/18	Multimodal Language Models	* Flamingo: a Visual Language Model for Few-Shot Learning * VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks * Visual Instruction Tuning (LLaVA) * NExT-GPT: Any-to-Any Multimodal LLM	multimodal	* (Blog) Fuyu-8B: A Multimodal Architecture for AI Agents * (Blog) Understanding LLaVA: Large Language and Vision Assistant
3/20 [Guest Lecture] Zhaofeng Wu (MIT): Generalization in the LLM Era
Efficient Language Modeling
3/25	Sparse Models	* Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity * Longformer: The Long-Document Transformer * Efficient Streaming Language Models with Attention Sinks * SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot	sparse	* (Tech Report) Mixtral of Experts * (Blog) Mixture of Experts Explained
Ethical Considerations and Evaluations of Language Models
3/27	Privacy and Legal Issues	* Extracting Training Data from Large Language Models * Large Language Models Can Be Strong Differentially Private Learners * Quantifying Memorization Across Neural Language Models * SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore	privacy	* (Blog) Privacy in the age of generative AI * (Blog) Extracting Training Data from ChatGPT
4/1	Security and Jailbreaking	* DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models * Universal and Transferable Adversarial Attacks on Aligned Language Models * Poisoning Language Models During Instruction Tuning * GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher	security	* (Blog) Jailbreaking Large Language Models: Techniques, Examples, Prevention Methods * (Blog) Adversarial Attacks on LLMs
4/3	Bias and Mitigation	* RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models * Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP * Red Teaming Language Models with Language Models * Whose Opinions Do Language Models Reflect?	bias	* (Blog) Understanding and Mitigating Bias in Large Language Models (LLMs) * (Blog) Navigating The Biases In LLM Generative AI: A Guide To Responsible Implementation (LLMs)
4/8, 4/10 (No Class)
4/15 [Guest Lecture] Caleb Ziems (Stanford): Can Large Language Models Transform Computational Social Science?
4/17 [Guest Lecture] Tianyu Gao (Princeton): Long-Context Language Modeling with Parallel Context Encoding
4/22 [Guest Lecture] Chenyan Xiong (CMU): Parallel Pretraining for Large Language Models [Slides]
4/24, 4/29 Project Presentations

Useful Materials

For NLP background: Speech and Language Processing
For deep learning background: Deep Learning

Yu Meng (孟瑜)