WP1 LSS: Large Language Models - Seminar (WS 2023-2024)

Summary

Large Language Models (such as GPT-2, GPT-3, GPT-4, RoBERTa, and T5) and Intelligent Chatbots (such as ChatGPT, Bard, and Claude) are a very timely topic.

Contents:

N-gram language models, neural language modeling, word2vec, RNNs, Transformers, BERT, RLHF, ChatGPT, multilingual alignment, prompting, transfer learning, domain adaptation, linguistic knowledge in large language models

Learning objectives:

Participants will first learn the basics of n-gram language models, neural language modeling, RNNs, and Transformers. In the second half of the seminar, participants will present an application of a modern large language model, intelligent chatbot, or similar system. This class involves a large amount of reading on both basic and advanced topics.
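To give a flavor of the first session's topic, here is a minimal Python sketch of a bigram language model with add-one (Laplace) smoothing, in the spirit of Jurafsky and Martin, Chapter 3. It is an illustration only, not course material; the toy corpus and variable names are made up.

    from collections import Counter

    # Toy corpus with sentence boundary markers <s> and </s> (made up for illustration).
    corpus = [
        "<s> language models assign probabilities to word sequences </s>".split(),
        "<s> language models predict the next word </s>".split(),
    ]

    # Collect unigram and bigram counts from the corpus.
    unigram_counts = Counter(tok for sent in corpus for tok in sent)
    bigram_counts = Counter()
    for sent in corpus:
        bigram_counts.update(zip(sent, sent[1:]))
    vocab_size = len(unigram_counts)

    def bigram_prob(prev, word):
        # Add-one smoothed estimate: P(word | prev) = (c(prev, word) + 1) / (c(prev) + |V|)
        return (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)

    def sentence_prob(tokens):
        # Probability of a token sequence as a product of bigram probabilities.
        p = 1.0
        for prev, word in zip(tokens, tokens[1:]):
            p *= bigram_prob(prev, word)
        return p

    print(sentence_prob("<s> language models predict probabilities </s>".split()))

Real n-gram models additionally handle unknown words and use better smoothing (e.g. Kneser-Ney), which the Jurafsky and Martin chapter covers.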

Instructor

Alexander Fraser

Email Address: Put My Last Name Here @cis.uni-muenchen.de

CIS, LMU Munich



Schedule

Tuesdays: 16:00 c.t., Oettingenstr. 67 / 165


For a LaTeX template for the Hausarbeit (term paper), click here.


If this web page does not seem to be up to date, use the refresh button in your browser.
Date | Topic | Materials
October 31st | Dan Jurafsky and James H. Martin (2023). Speech and Language Processing (3rd ed. draft), Chapter 3: N-gram Language Models | pdf
November 7th | Yoshua Bengio, Réjean Ducharme, Pascal Vincent, Christian Jauvin (2003). A Neural Probabilistic Language Model. Journal of Machine Learning Research 3, 1137-1155 | pdf
November 14th | Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean (2013). Efficient Estimation of Word Representations in Vector Space. ICLR | paper
November 21st | Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin (2017). Attention Is All You Need. NIPS | paper (an illustrative attention sketch follows below this table)
November 28th | Lena Voita. NLP Course: Sequence to Sequence (seq2seq) and Attention. Web tutorial | webpage
Also November 28th | Referat Topics | Presentation and Writeup
    Kathy Hämmerl, Dr. Marion Di Marco, Dr. Viktor Hangya, Faeze Ghorbanpour
December 5th | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. NAACL-HLT | paper
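As a companion to the November 21st reading, here is a minimal NumPy sketch of single-head scaled dot-product attention, the core operation of the Transformer: Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V. It is illustrative only; the shapes and random inputs are assumptions, not settings from the paper.

    import numpy as np

    def softmax(x, axis=-1):
        # Numerically stable softmax.
        x = x - x.max(axis=axis, keepdims=True)
        e = np.exp(x)
        return e / e.sum(axis=axis, keepdims=True)

    def scaled_dot_product_attention(Q, K, V):
        # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
        d_k = Q.shape[-1]
        scores = Q @ K.swapaxes(-2, -1) / np.sqrt(d_k)   # (len_q, len_k) similarity scores
        weights = softmax(scores, axis=-1)               # one attention distribution per query
        return weights @ V, weights

    # Hypothetical shapes for a short sequence; in the paper d_k is d_model divided by the number of heads.
    rng = np.random.default_rng(0)
    seq_len, d_k = 4, 8
    Q = rng.normal(size=(seq_len, d_k))
    K = rng.normal(size=(seq_len, d_k))
    V = rng.normal(size=(seq_len, d_k))
    out, attn = scaled_dot_product_attention(Q, K, V)
    print(out.shape, attn.shape)  # (4, 8) (4, 4)

The full model in the paper adds multiple heads, learned projections of Q, K, and V, masking, and positional encodings.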



Presentation topics (Referatsthemen)


Date | Topic | Reference Materials | Presenter | Hausarbeit Received
December 12th | InstructGPT (AF) | Long Ouyang, Jeff Wu, et al. (2022). Training language models to follow instructions with human feedback. arXiv. paper | AF |
(same as above) | Factuality in LLMs (KH) | Tianyu Gao et al. (2023). Enabling Large Language Models to Generate Text with Citations. EMNLP. paper | Jana Grimm | yes
January 9th, 2024 | Decoding Strategies (KH) | Gian Wiher, Clara Meister, and Ryan Cotterell (2022). On Decoding Strategies for Neural Text Generators. Transactions of the Association for Computational Linguistics, 10:997–1012. paper | Oliver Kraus | yes
(same as above) | Position of Relevant Information (VH) | Liu et al. (2023). Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics. paper | Huixin Chen | yes
January 16th, 2024 | Importance of Data Understanding (FG) | Elazar et al. (2023). What’s In My Big Data? arXiv preprint arXiv:2310.20707. paper | Lea Hirlimann | yes
(same as above) | Emergent Capabilities of LLMs (VH) | Wei et al. (2022). Emergent Abilities of Large Language Models. Transactions on Machine Learning Research. paper | Shuo Xu | yes
January 23rd, 2024 | Inequality Between Languages (KH) | Orevaoghene Ahia et al. (2023). Do All Languages Cost the Same? Tokenization in the Era of Commercial Language Models. arXiv abs/2305.13707. paper | Zhijun Ying | yes
(same as above) | Data Pruning for LLM Training (FG) | Marion et al. (2023). When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale. arXiv preprint arXiv:2309.04564. paper | Kristina Kuznetsova | yes
January 30th, 2024 | Subword Segmentation (MDM) | Valentin Hofmann, Janet Pierrehumbert, Hinrich Schuetze (2021). Superbizarre Is Not Superb: Derivational Morphology Improves BERT’s Interpretation of Complex Words. ACL-IJCNLP. paper | Pingjun Hong | yes