WP1 LSS: Large Language Models - Seminar (WS 2023-2024)

Summary

Large Language Models (such as GPT-2, GPT-3, GPT-4, RoBERTa, and T5) and Intelligent Chatbots (such as ChatGPT, Bard, and Claude) are a very timely topic.

Contents:

N-gram language models, neural language modeling, word2vec, RNNs, Transformers, BERT, RLHF, ChatGPT, multilingual alignment, prompting, transfer learning, domain adaptation, linguistic knowledge in large language models

Learning objectives:

Participants will first learn the basics of n-gram language models, neural language modeling, RNNs, and Transformers. In the second half of the seminar, participants will present an application of a modern large language model, intelligent chatbot, or similar system. This class involves a large amount of reading on both basic and advanced topics.
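To give a flavor of the first session's topic, here is a minimal Python sketch of a bigram language model with add-one (Laplace) smoothing, in the spirit of Jurafsky and Martin, Chapter 3. It is an illustration only, not course material; the toy corpus and variable names are made up.

    from collections import Counter

    # Toy corpus with sentence boundary markers <s> and </s> (made up for illustration).
    corpus = [
        "<s> language models assign probabilities to word sequences </s>".split(),
        "<s> language models predict the next word </s>".split(),
    ]

    # Collect unigram and bigram counts from the corpus.
    unigram_counts = Counter(tok for sent in corpus for tok in sent)
    bigram_counts = Counter()
    for sent in corpus:
        bigram_counts.update(zip(sent, sent[1:]))
    vocab_size = len(unigram_counts)

    def bigram_prob(prev, word):
        # Add-one smoothed estimate: P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)
        return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

    def sentence_prob(tokens):
        # Probability of a token sequence as a product of bigram probabilities.
        p = 1.0
        for prev, word in zip(tokens, tokens[1:]):
            p *= bigram_prob(prev, word)
        return p

    print(sentence_prob("<s> language models predict probabilities </s>".split()))

Real n-gram models additionally handle unknown words and use better smoothing (e.g. Kneser-Ney), which the Jurafsky and Martin chapter covers.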

Instructor

Alexander Fraser

Email Address: Put My Last Name Here @cis.uni-muenchen.de

CIS, LMU Munich



Schedule

Tuesdays: 16:00 c.t., Oettingenstr. 67 / 165


For a LaTeX template for the Hausarbeit (term paper), click here.


If this web page does not seem to be up to date, use the refresh button in your browser.
Date | Topic | Materials
October 31st | Dan Jurafsky and James H. Martin (2023). Speech and Language Processing (3rd ed. draft), Chapter 3: N-gram Language Models | pdf
November 7th | Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research 3, 1137-1155 | pdf
November 14th | Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. ICLR | paper
November 21st | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (2017). Attention Is All You Need. NIPS | paper (an illustrative attention sketch follows below this table)
November 28th | Lena Voita. NLP Course: Sequence to Sequence (seq2seq) and Attention. Web tutorial | webpage
Also November 28th | Referat Topics | Presentation and Writeup
    Kathy Hämmerl, Dr. Marion Di Marco, Dr. Viktor Hangya, Faeze Ghorbanpour
December 5th | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT | paper
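As a companion to the November 21st reading, here is a minimal NumPy sketch of single-head scaled dot-product attention, the core operation of the Transformer: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. It is illustrative only; the shapes and random inputs are assumptions, not settings from the paper.

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (len_q, len_k) similarity scores
        weights = softmax(scores, axis=-1)               # one attention distribution per query
        return weights @ V, weights

    # Hypothetical shapes for a short sequence; in the paper d_k is d_model divided by the number of heads.
    rng = np.random.default_rng(0)
    seq_len, d_k = 4, 8
    Q = rng.normal(size=(seq_len, d_k))
    K = rng.normal(size=(seq_len, d_k))
    V = rng.normal(size=(seq_len, d_k))
    out, attn = scaled_dot_product_attention(Q, K, V)
    print(out.shape, attn.shape)  # (4, 8) (4, 4)

The full model in the paper adds multiple heads, learned projections of Q, K, and V, masking, and positional encodings.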



Presentation topics (Referatsthemen)


Date | Topic | Reference Materials | Presenter | Hausarbeit Received
December 12th | InstructGPT (AF) | Long Ouyang, Jeff Wu, et al. (2022). Training language models to follow instructions with human feedback. arXiv. paper | AF |
(same as above) | Factuality in LLMs (KH) | Tianyu Gao et al. (2023). Enabling Large Language Models to Generate Text with Citations. EMNLP. paper | Jana Grimm | yes
January 9th, 2024 | Decoding Strategies (KH) | Gian Wiher, Clara Meister, and Ryan Cotterell (2022). On Decoding Strategies for Neural Text Generators. Transactions of the Association for Computational Linguistics, 10:997–1012. paper | Oliver Kraus | yes
(same as above) | Position of Relevant Information (VH) | Liu et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics. paper | Huixin Chen | yes
January 16th, 2024 | Importance of Data Understanding (FG) | Elazar et al. (2023). What’s In My Big Data? arXiv preprint arXiv:2310.20707. paper | Lea Hirlimann | yes
(same as above) | Emergent Capabilities of LLMs (VH) | Wei et al. (2022). Emergent Abilities of Large Language Models. Transactions on Machine Learning Research. paper | Shuo Xu | yes
January 23rd, 2024 | Inequality Between Languages (KH) | Orevaoghene Ahia et al. (2023). Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models. arXiv abs/2305.13707. paper | Zhijun Ying | yes
(same as above) | Data Pruning for LLM Training (FG) | Marion et al. (2023). When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale. arXiv preprint arXiv:2309.04564. paper | Kristina Kuznetsova | yes
January 30th, 2024 | Subword Segmentation (MDM) | Valentin Hofmann, Janet Pierrehumbert, Hinrich Schuetze (2021). Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words. ACL-IJCNLP. paper | Pingjun Hong | yes