CS 4501 Natural Language Processing (Fall 2024)
Logistics
- Instructor: Yu Meng (yumeng5[at]virginia[dot]edu); Office Hour: After class every Monday and Wednesday
- Teaching Assistants:
- Xu Ouyang (ftp8nr[at]virginia[dot]edu); Office Hour: 11:00am - 12:00pm every Wednesday (Zoom link)
- Zhepei Wei (tqf5qb[at]virginia[dot]edu); Office Hour: 8:00am - 9:00am every Monday (Zoom link)
- Wenqian Ye (pvc7hs[at]virginia[dot]edu); Office Hour: 4:00pm - 5:00pm every Friday (Zoom link)
- Lecture Time: Mondays, Wednesdays & Fridays 2:00pm - 2:50pm
- Location: Thornton Hall E303
Course Overview
In this course, we will explore the foundational concepts and the latest advancements in Natural Language Processing (NLP). The course aims to provide a thorough understanding of both traditional and modern approaches to language modeling, from N-gram language models to large language models (LLMs). We’ll start by covering the fundamentals of language representation, including classic methods such as word embeddings and vector space models. As we progress, we will introduce the Transformer architecture, which underpins today’s most powerful LLMs, and explore how techniques like pretraining and fine-tuning have transformed the field. In the latter part of the course, we will examine advanced topics like reasoning, factual knowledge integration, and Reinforcement Learning from Human Feedback (RLHF), which are essential for building and deploying NLP systems in practical, real-world scenarios.
Grading
- Assignments (60%): There will be around 5 assignments (with different weights) for the entire semester.
- All assignments must be completed individually
- Assignments will be a combination of concept questions + coding questions
- Submission via Canvas (as LaTeX report; handwritten submissions not accepted)
- DO NOT procrastinate on assignments! The coding questions (especially in the latter part of this course) take time to implement and run!
- Project (35%): By the end of this course, you are required to complete a project related to NLP, present your results, and submit a project report. You are required to work in a team of 2 or 3 (any deviation from this team size requires prior approval from the instructor). The rule of thumb is to demonstrate that you are able to apply the knowledge learned from this course. The workload for the project is expected to be more extensive than individual assignments. Some example project choices (non-exhaustive):
- Use word embeddings to analyze sentence semantics (e.g., sentiment analysis)
- Fine-tune BERT and evaluate its performance for any task you like (e.g., named entity recognition, relation extraction)
- Benchmark LLMs (either open-weights or proprietary) for challenging tasks/applications
- Use LLM APIs to create agents for an interesting application (e.g., coding/personal assistants)
- …
The project grading breakdown (35%) is as follows:
- Project proposal: 2% (Deadline: 09/20 11:59pm)
- Mid-term report: 8% (Deadline: 10/18 11:59pm)
- Final presentation (slides due: 12/01 11:59pm) and final report (Deadline: 12/13 11:59pm): 25%
- Participation (5%+; points earned beyond 5% will become extra credit): We encourage everyone to actively participate in the class. There are several ways of earning the participation credit:
- Guest lecture attendance on Zoom (6%):
- We will have 3 guest lectures for this semester
- Each guest lecture can give you up to 2% participation credit (1% attendance + 1% asking questions – details shared before guest lectures)
- End-of-semester teaching feedback (2%):
- At the end of the semester, anyone who completes the teaching feedback survey will get 2%
- Answering technical questions raised by classmates (5%):
- We encourage and appreciate help from students to answer questions posted by classmates
- Every helpful answer to technical questions will earn 1% (Slido and Piazza both count)
- If you answer anonymously, we won’t be able to track your contributions!
- The maximum credit you can get in this category is 5%
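The participation arithmetic above can be sketched as a small calculation. This is only an illustration of the stated caps, not an official grading tool: the function name, its parameters, and the assumption that question credit is counted only for guest lectures you actually attended are all hypothetical.

```python
GUEST_LECTURES = 3       # guest lectures this semester
BASE_PARTICIPATION = 5   # percentage points counted toward the regular grade
FEEDBACK_CREDIT = 2      # credit for completing the teaching feedback survey
MAX_ANSWER_CREDIT = 5    # cap on credit for answering classmates' questions


def participation_credit(lectures_attended, lectures_with_question,
                         gave_feedback, helpful_answers):
    """Return (base_pct, extra_credit_pct) in whole percentage points."""
    attended = min(lectures_attended, GUEST_LECTURES)
    # Assumption: question credit only counts for lectures that were attended.
    asked = min(lectures_with_question, attended)
    guest = attended + asked  # 1% attendance + 1% question per guest lecture
    feedback = FEEDBACK_CREDIT if gave_feedback else 0
    answers = min(helpful_answers, MAX_ANSWER_CREDIT)  # 1% per helpful answer
    total = guest + feedback + answers
    # The first 5% is regular credit; anything beyond it is extra credit.
    return min(total, BASE_PARTICIPATION), max(0, total - BASE_PARTICIPATION)


# Attended all 3 guest lectures, asked a question in 1, completed the
# feedback survey, and gave 2 helpful answers: 3 + 1 + 2 + 2 = 8%.
print(participation_credit(3, 1, True, 2))  # (5, 3)
```

Under these assumptions, the most a student could earn is 6% + 2% + 5% = 13%, of which 8% would be extra credit.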
Late Day Policy (Only Applied to Assignments, Not Projects!):
- You have 7 free late days in total across all assignments
- After you’ve used up your 7 late days, a penalty of 20% will be deducted from the assignment grade for each additional day it is late
- You cannot use more than 3 late days (72 hours) on any single assignment unless given permission in advance
- Late days cannot be applied to project deadlines! You will get a 0 for any project checkpoint that you miss
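As a rough illustration of how the late-day rules for assignments combine, here is a minimal sketch. It rests on assumptions the policy does not spell out: the 20% penalty is taken as 20% of the assignment's earned score per uncovered day, free late days are used before any penalty applies, and the 3-day cap is applied to total lateness. The function and parameter names are hypothetical; the course staff's actual calculation may differ.

```python
FREE_LATE_DAYS = 7           # free late days across all assignments
MAX_DAYS_PER_ASSIGNMENT = 3  # cap per assignment without advance permission
PENALTY_PCT_PER_DAY = 20     # deducted per late day not covered by a free day


def apply_late_policy(raw_score, days_late, free_days_left, has_permission=False):
    """Return (final_score, free_days_left) for one assignment."""
    if days_late < 0 or free_days_left < 0:
        raise ValueError("days_late and free_days_left must be non-negative")
    if days_late > MAX_DAYS_PER_ASSIGNMENT and not has_permission:
        raise ValueError("more than 3 late days requires advance permission")
    # Assumption: free late days are consumed first; the rest are penalized.
    free_used = min(days_late, free_days_left)
    penalized_days = days_late - free_used
    remaining_pct = max(0, 100 - PENALTY_PCT_PER_DAY * penalized_days)
    return raw_score * remaining_pct / 100, free_days_left - free_used


free = FREE_LATE_DAYS
score, free = apply_late_policy(90, 3, free)  # 3 free days used, 4 left
score, free = apply_late_policy(90, 3, free)  # 3 more used, 1 left
score, free = apply_late_policy(90, 2, free)  # 1 free + 1 penalized day
print(score, free)  # 72.0 0
```

In the last call only one free day remains, so the second late day costs 20% of the score of 90, leaving 72.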
Policy on Using LLMs/GenAI:
Collaborative coding with LLMs is allowed, but if you directly copy the answers generated by LLMs (for either conceptual or coding questions), you’ll get a 0 for that entire assignment!
Schedule (Subject to Changes!)
| Week | Date | Topic | Slides | Slido Link | Deadline |
|---|---|---|---|---|---|
| 1 | 08/28 | Course Logistics & Overview | overview_0828 | 08/28 | |
| | 08/30 | Course Overview (Continued) | overview_0830 | 08/30 | |
| 2 | 09/02 | Intro to Language Modeling & N-gram Language Models | lm_intro_0902 | 09/02 | Assignment 1 out: LaTeX script |
| | 09/04 | N-gram Language Models (Continued) | ngram_lm_0904 | 09/04 | |
| | 09/06 | Smoothing & Evaluation of N-gram Language Models | smooth_eval_0906 | 09/06 | |
| 3 | 09/09 | Intro to Word Senses & Semantics | semantics_intro_0909 | 09/09 | Assignment 2 out: LaTeX script |
| | 09/11 | Classic Word Representations | classic_rep_0911 | 09/11 | Assignment 1 due: 09/11 11:59pm |
| | 09/13 | Vector Space Models | vector_0913 | 09/13 | |
| 4 | 09/16 | Word Embeddings | word_emb_0916 | 09/16 | |
| | 09/18 | Word Embeddings: Word2Vec | word2vec_0918 | 09/18 | |
| | 09/20 | Word Embeddings (Continued) | word_emb_0920 | 09/20 | Project proposal due: 09/20 11:59pm (Guideline) |
| 5 | 09/23 | Intro to Sequence Modeling & Neural Language Models | seq_model_0923 | 09/23 | |
| | 09/25 | Recurrent Neural Networks (RNNs) | rnn_0925 | 09/25 | Assignment 2 due: 09/25 11:59pm |
| | 09/27 | RNNs (Continued) | rnn_0927 | 09/27 | Assignment 3 out: LaTeX script |
| 6 | 09/30 | Intro to Transformers | transformer_0930 | 09/30 | |
| | 10/02 | (No Class) | | | |
| | 10/04 | Self-attention | self_attn_1004 | 10/04 | |
| 7 | 10/07 | Transformer Language Model | transformer_lm_1007 | 10/07 | |
| | 10/09 | Language Model Pretraining & Fine-tuning | | | |
| | 10/11 | Large Language Models (LLMs) & In-context Learning | | | Assignment 3 due: 10/11 11:59pm |
| 8 | 10/14 | Fall Reading Days (No Class) | | | |
| | 10/16 | Advanced In-context Learning | | | |
| | 10/18 | Scaling Laws & Emergent Abilities of LLMs | | | Project mid-term report due: 10/18 11:59pm (Guideline) |
| 9 | 10/21 | Reasoning with LLMs | | | |
| | 10/23 | Factual Knowledge in LLMs | | | |
| | 10/25 | Hallucinations and Retrieval-Augmented Generation (RAG) | | | |
| 10 | 10/28 | RAG (Continued) | | | |
| | 10/30 | Long-context LLMs | | | |
| | 11/01 | Instruction Tuning | | | |
| 11 | 11/04 | Instruction Tuning (Continued) | | | |
| | 11/06 | Reinforcement Learning from Human Feedback (RLHF) | | | |
| | 11/08 | Guest Lecture: Wei Xiong (UIUC) | | | |
| 12 | 11/11 | Multi-modal LLMs | | | |
| | 11/13 | LLMs for Coding | | | |
| | 11/15 | Guest Lecture: Jiaming Shen (Google DeepMind) | | | |
| 13 | 11/18 | Detection of LLM Generation & Watermarking | | | |
| | 11/20 | Efficient Methods for LLMs | | | |
| | 11/22 | Guest Lecture: Yizhong Wang (University of Washington) | | | |
| 14 | 11/25 | Recap + Future of NLP | | | |
| | 11/27 | Thanksgiving Recess (No Class) | | | |
| | 11/29 | Thanksgiving Recess (No Class) | | | Project presentation slides due: 12/01 11:59pm |
| 15 | 12/02 | Project Presentation | | | |
| | 12/04 | Project Presentation | | | |
| | 12/06 | Project Presentation | | | Project report due: 12/13 11:59pm |
Useful Materials
- For NLP background: Speech and Language Processing
- For deep learning background: Deep Learning
Students with Disabilities or Learning Needs
It is my goal to create a learning experience that is as accessible as possible. If you anticipate any issues related to the format, materials, or requirements of this course, please meet with me outside of class so we can explore potential options. Students with disabilities may also wish to work with the Student Disability Access Center (SDAC) to discuss a range of options for removing barriers in this course, including official accommodations. You may email an SDAC advisor at cmacmasters@virginia.edu to schedule an appointment. For general questions, please visit the SDAC website: sdac.studenthealth.virginia.edu. If you have already been approved for accommodations through SDAC, please send me your accommodation letter and meet with me so we can develop an implementation plan together.
Religious Accommodations
It is the University’s long-standing policy and practice to reasonably accommodate students so that they do not experience an adverse academic consequence when sincerely held religious beliefs or observances conflict with academic requirements.
Students who wish to request academic accommodation for a religious observance should submit their request to me by email as far in advance as possible. Students who have questions or concerns about academic accommodations for religious observance or religious beliefs may contact the University’s Office for Equal Opportunity and Civil Rights (EOCR) at UVAEOCR@virginia.edu or 434-924-3200.
Harassment, Discrimination, and Interpersonal Violence
The University of Virginia is dedicated to providing a safe and equitable learning environment for all students. If you or someone you know has been affected by power-based personal violence, the UVA Sexual Violence website (www.virginia.edu/sexualviolence) describes the reporting options and resources available.
The same resources and options for individuals who experience sexual misconduct are available for discrimination, harassment, and retaliation. UVA prohibits discrimination and harassment based on age, color, disability, family medical or genetic information, gender identity or expression, marital status, military status, national or ethnic origin, political affiliation, pregnancy (including childbirth and related conditions), race, religion, sex, sexual orientation, or veteran status. UVA policy also prohibits retaliation for reporting such behavior.
If you witness or are aware of someone who has experienced prohibited conduct, you are encouraged to submit a report to Just Report It (justreportit.virginia.edu) or contact EOCR, the Office for Equal Opportunity and Civil Rights.
If you would prefer to disclose such conduct to a confidential resource where what you share is not reported to the University, you can turn to Counseling & Psychological Services (“CAPS”) and Women’s Center Counseling Staff and Confidential Advocates (for students of all genders).
As your professor and as a person, know that I care about you and your well-being and stand ready to provide support and resources as I can. As a faculty member, I am a responsible employee, which means that I am required by University policy and by federal law to report certain kinds of conduct that you report to me to the University’s Title IX Coordinator. The Title IX Coordinator’s job is to ensure that the reporting student receives the resources and support that they need, while also determining whether further action is necessary to ensure survivor safety and the safety of the University community.
Student Support Team
You have many resources available to you when you experience academic or personal stresses. In addition to the course staff, the School of Engineering and Applied Science has staff members located in Thornton Hall whom you can contact to help manage academic or personal challenges. Please do not wait until the end of the semester to ask for help!
Community and Identity
The Center for Diversity in Engineering (CDE) is a student space dedicated to advocating for underrepresented groups in STEM. It exists to connect students with the academic, financial, health, and community resources they need to thrive both at UVA and in the world. The CDE includes an open study area, event space, and staff members on site. Through this space, we affirm and empower equitable participation toward intercultural fluency and provide the resources necessary for students to be successful during their academic journey and future careers.