KDD'23 Tutorial: Pretrained Language Representations for Text Understanding: A Weakly-Supervised Perspective

Yu Meng, Jiaxin Huang, Yu Zhang, Yunyi Zhang, Jiawei Han
Computer Science Department, University of Illinois at Urbana-Champaign
Time: Aug 9, 2023 10:00 AM - 1:00 PM (PDT)

Abstract

Language representations pretrained on general-domain corpora and adapted to downstream task data have achieved enormous success in building natural language understanding (NLU) systems. While standard supervised fine-tuning of pretrained language models (PLMs) has proven effective for achieving superior NLU performance, it often requires a large quantity of costly human-annotated training data. For example, the remarkable success of ChatGPT and GPT-4 can be largely credited to supervised fine-tuning on massive manually-labeled prompt-response pairs. Unfortunately, obtaining large-scale human annotations is generally infeasible for most practitioners. To broaden the applicability of PLMs to various tasks and settings, weakly-supervised learning offers a promising direction for minimizing the annotation requirements of PLM adaptation.

In this tutorial, we cover recent advancements in pretraining language models and adaptation methods for a wide range of NLU tasks, with a particular focus on weakly-supervised approaches that do not require massive human annotations. We will introduce the following topics:

  1. pretraining language representation models that serve as the fundamentals for various NLU tasks;
  2. extracting entities and hierarchical relations from unlabeled texts;
  3. discovering topical structures from massive text corpora for text organization;
  4. understanding documents and sentences with weakly-supervised techniques.

Slides

  • Introduction [Slides]
  • Part I: Language Foundation Models [Slides]
  • Part II: Embedding-Driven Topic Discovery [Slides]
  • Part III: Weakly-Supervised Text Classification [Slides]
  • Part IV: Language Models for Knowledge Base Construction [Slides]
  • Part V: Advanced Text Mining Applications [Slides]

Presenters

Yu Meng, Ph.D. student, Computer Science, UIUC. His research focuses on mining structured knowledge from massive text corpora with minimal human supervision. He received the Google PhD Fellowship (2021) in Structured Data and Database Management.

Jiaxin Huang, Ph.D. student, Computer Science, UIUC. Her research focuses on mining structured knowledge from massive text corpora. She received the Microsoft Research PhD Fellowship (2021) and the Chirag Foundation Graduate Fellowship (2018) in Computer Science, UIUC.

Yu Zhang, Ph.D. student, Computer Science, UIUC. His research focuses on weakly supervised text mining with structural information. He received the Dissertation Completion Fellowship (2023), the Yunni and Maxine Pao Memorial Fellowship (2022), and the WWW Best Poster Award Honorable Mention (2018).

Yunyi Zhang, Ph.D. student, Computer Science, UIUC. His research focuses on weakly supervised text mining, text classification, and taxonomy construction.

Jiawei Han, Michael Aiken Chair Professor, Computer Science, UIUC. His research areas encompass data mining, text mining, data warehousing, and information network analysis, with over 800 research publications. He is a Fellow of ACM and IEEE and has received numerous prominent awards, including the ACM SIGKDD Innovation Award (2004) and the IEEE Computer Society W. Wallace McDowell Award (2009). He has delivered 50+ conference tutorials and keynote speeches.