CS 6501 Natural Language Processing (Spring 2024)


Course Overview

This advanced graduate-level course offers a comprehensive exploration of cutting-edge developments in natural language processing (NLP). With Large Language Models (LLMs) serving as the foundation for state-of-the-art NLP systems, we will cover a range of topics aimed at building a deeper understanding of LLMs’ design, capabilities, limitations, and future prospects. Key areas include model architecture and design, training methodologies (e.g., pretraining, instruction tuning, RLHF), emergent capabilities (e.g., in-context learning, reasoning), parametric knowledge and retrieval-augmented generation, efficiency (e.g., parameter-efficient training, sparse methods), language agents, and ethics. The course is highly research-driven, with a substantial focus on presenting and discussing important papers and on conducting research projects.



Date | Topic | Papers | Slides | Supplemental Reading
Introduction to Language Models
1/17 Course Overview (Slides: overview)
1/22 Language Model Architecture and Pretraining * Distributed Representations of Words and Phrases and their Compositionality (word2vec)
* Attention Is All You Need (Transformer)
* Language Models are Unsupervised Multitask Learners (GPT-2)
* BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
* RoBERTa: A Robustly Optimized BERT Pretraining Approach
* ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
* BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
* Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)
* (Blog) The Illustrated Transformer
* (Blog) Transformer Inference Arithmetic
1/24 Large Language Models and In-Context Learning * Language Models are Few-Shot Learners (GPT-3)
* Llama 2: Open Foundation and Fine-Tuned Chat Models
* An Explanation of In-context Learning as Implicit Bayesian Inference
* Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
* (Blog) Llama 2: an incredible open LLM
* (Tech Report) GPT-4 Technical Report
1/29 Model Calibration * How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering
* Surface Form Competition: Why the Highest Probability Answer Isn't Always Right
* Teaching Models to Express Their Uncertainty in Words
* Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation
* (Blog) Calibrating LLMs
* (Paper) Calibrate Before Use: Improving Few-Shot Performance of Language Models
1/31 Scaling and Emergent Ability * Training Compute-Optimal Large Language Models
* Scaling Data-Constrained Language Models
* Emergent Abilities of Large Language Models
* Are Emergent Abilities of Large Language Models a Mirage?
* (Blog) Scaling Laws and Emergent Properties
* (Blog) Are the emergent abilities of LLMs like GPT-4 a mirage?
Reasoning with Language Models
2/5 Chain-of-Thought Generation * Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
* Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
* Self-Consistency Improves Chain of Thought Reasoning in Language Models
* Large Language Models Can Self-Improve
* (Blog) Comprehensive Guide to Chain-of-Thought Prompting
* (Paper) Large Language Models are Zero-Shot Reasoners
2/7 Advanced Reasoning * PAL: Program-aided Language Models
* Tree of Thoughts: Deliberate Problem Solving with Large Language Models
* Solving Quantitative Reasoning Problems with Language Models
* Let's Verify Step by Step
* (Blog) Tree of Thoughts (ToT)
* (Blog) Minerva: Solving Quantitative Reasoning Problems with Language Models
Knowledge and Factuality
2/12 Parametric Knowledge in Language Models * Language Models as Knowledge Bases?
* How Much Knowledge Can You Pack Into the Parameters of a Language Model?
* Transformer Feed-Forward Layers Are Key-Value Memories
* Locating and Editing Factual Associations in GPT
* (Paper) Editing Factual Knowledge in Language Models
* (Paper) Fast Model Editing at Scale
2/14 Retrieval-Augmented Language Generation (RAG) * Generalization through Memorization: Nearest Neighbor Language Models
* Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
* Dense Passage Retrieval for Open-Domain Question Answering
* Improving language models by retrieving from trillions of tokens
* (Paper) REPLUG: Retrieval-Augmented Black-Box Language Models
* (Paper) Lost in the Middle: How Language Models Use Long Contexts
Language Model Alignment
2/19 Multi-Task Instruction Tuning * Finetuned Language Models Are Zero-Shot Learners
* Multitask Prompted Training Enables Zero-Shot Task Generalization
* Cross-Task Generalization via Natural Language Crowdsourcing Instructions
* Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
* (Blog) A Stage Review of Instruction Tuning
2/21 [Guest Lecture] Shunyu Yao (Princeton): Language Agents: From Next Token Prediction to Digital Automation
2/26 Chat-Style Instruction Tuning * Self-Instruct: Aligning Language Models with Self-Generated Instructions
* LIMA: Less Is More for Alignment
* AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
* Self-Alignment with Instruction Backtranslation
* (Blog) Teach Llamas to Talk: Recent Progress in Instruction Tuning
* (Paper) How Far Can Camels Go? Exploring the State of Instruction Tuning on Open Resources
2/28 Reinforcement Learning from Human Feedback (RLHF) * Training language models to follow instructions with human feedback
* Direct Preference Optimization: Your Language Model is Secretly a Reward Model
* Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
* Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
* (Blog) Illustrating Reinforcement Learning from Human Feedback (RLHF)
* (Blog) Preference Tuning LLMs with Direct Preference Optimization Methods
3/2 - 3/10 (Spring Recess, No Class)
Language Model Agents
3/11 Task Execution via Reasoning, Tools and Conversations * ReAct: Synergizing Reasoning and Acting in Language Models
* Toolformer: Language Models Can Teach Themselves to Use Tools
* AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation
* Reflexion: Language Agents with Verbal Reinforcement Learning
* (Blog) ReAct: Synergizing Reasoning and Acting in Language Models
* (Blog) Breaking Down Toolformer
* (Blog) Superpower LLMs with Conversational Agents
3/13 Language Models for Code * InCoder: A Generative Model for Code Infilling and Synthesis
* Code Llama: Open Foundation Models for Code
* Teaching Large Language Models to Self-Debug
* LEVER: Learning to Verify Language-to-Code Generation with Execution
* (Blog) Large Language Models for Code Generation – Part 1
* (Blog) Cracking the Code LLMs
3/18 Multimodal Language Models * Flamingo: a Visual Language Model for Few-Shot Learning
* VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
* Visual Instruction Tuning (LLaVA)
* NExT-GPT: Any-to-Any Multimodal LLM
* (Blog) Fuyu-8B: A Multimodal Architecture for AI Agents
* (Blog) Understanding LLaVA: Large Language and Vision Assistant
3/20 [Guest Lecture] Zhaofeng Wu (MIT): Generalization in the LLM Era
Efficient Language Modeling
3/25 Sparse Models * Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
* Longformer: The Long-Document Transformer
* Efficient Streaming Language Models with Attention Sinks
* SparseGPT: Massive Language Models Can Be Accurately Pruned in One-Shot
* (Tech Report) Mixtral of Experts
* (Blog) Mixture of Experts Explained
Ethical Considerations and Evaluations of Language Models
3/27 Privacy and Legal Issues * Extracting Training Data from Large Language Models
* Large Language Models Can Be Strong Differentially Private Learners
* Quantifying Memorization Across Neural Language Models
* SILO Language Models: Isolating Legal Risk In a Nonparametric Datastore
* (Blog) Privacy in the age of generative AI
* (Blog) Extracting Training Data from ChatGPT
4/1 Security and Jailbreaking * DecodingTrust: A Comprehensive Assessment of Trustworthiness in GPT Models
* Universal and Transferable Adversarial Attacks on Aligned Language Models
* Poisoning Language Models During Instruction Tuning
* GPT-4 Is Too Smart To Be Safe: Stealthy Chat with LLMs via Cipher
* (Blog) Jailbreaking Large Language Models: Techniques, Examples, Prevention Methods
* (Blog) Adversarial Attacks on LLMs
4/3 Bias and Mitigation * RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
* Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
* Red Teaming Language Models with Language Models
* Whose Opinions Do Language Models Reflect?
* (Blog) Understanding and Mitigating Bias in Large Language Models (LLMs)
* (Blog) Navigating The Biases In LLM Generative AI: A Guide To Responsible Implementation
4/8, 4/10 (No Class)
4/15 [Guest Lecture] Caleb Ziems (Stanford): Can Large Language Models Transform Computational Social Science?
4/17 [Guest Lecture] Tianyu Gao (Princeton): Long-Context Language Modeling with Parallel Context Encoding
4/22 [Guest Lecture] Chenyan Xiong (CMU)
4/24, 4/29 Project Presentations

Useful Materials