Leonie Weissweiler

/ˈleːoni ˈvaɪ̯svaɪ̯lɐ/

I am a fourth-year PhD student at the Center for Information and Language Processing at LMU Munich. My supervisor is Hinrich Schütze.

Previously, I completed my B.Sc. and M.Sc. degrees in Computational Linguistics and Computer Science at LMU, with scholarships from the German Academic Scholarship Foundation and the Max Weber Program. My M.Sc. thesis, supervised by Hinrich Schütze, was on the application of Complementary Learning Systems Theory to NLP. I spent the final year of my bachelor's degree as a visiting student at Homerton College, University of Cambridge, where I wrote my B.Sc. thesis on Character-Level RNNs under the supervision of Anna Korhonen.

I'm no longer actively at CIS. Please contact Yihong Liu about my former classes, and Chunlan Ma about Erasmus and TUM Anwendungsfach.

Current Research Interests
  • Construction Grammar and NLP
  • Emergent Structure in Language
  • Interactions between Cognitive Linguistics and NLP
  • Computational Approaches to Morphosyntax
  • Automated Crosslingual Linguistic Analysis



Leonie Weissweiler, Nina Böbel, Kirian Guiller, Santiago Herrera, Wesley Scivetti, Arthur Lorenzi, Nurit Melnik, Archna Bhatia, Hinrich Schütze, Lori Levin, Amir Zeldes, Joakim Nivre, William Croft, Nathan Schneider (2024). UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. (LREC-COLING)
The Universal Dependencies (UD) project has created an invaluable collection of treebanks with contributions in over 140 languages. However, the UD annotations do not tell the full story. Grammatical constructions that convey meaning through a particular combination of several morphosyntactic elements -- for example, interrogative sentences with special markers and/or word orders -- are not labeled holistically. We argue for (i) augmenting UD annotations with a 'UCxn' annotation layer for such meaning-bearing grammatical constructions, and (ii) approaching this in a typologically informed way so that morphosyntactic strategies can be compared across languages. As a case study, we consider five construction families in ten languages, identifying instances of each construction in UD treebanks through the use of morphosyntactic patterns. In addition to findings regarding these particular constructions, our study yields important insights on methodology for describing and identifying constructions in language-general and language-particular ways, and lays the foundation for future constructional enrichment of UD treebanks.

Shijia Zhou, Leonie Weissweiler, Taiqi He, Hinrich Schütze, David Mortensen, Lori Levin (2024). Constructions Are So Difficult That Even Large Language Models Get Them Right for the Wrong Reasons. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. (LREC-COLING)
In this paper, we make a contribution that can be understood from two perspectives: from an NLP perspective, we introduce a small challenge dataset for NLI with large lexical overlap, which minimises the possibility of models discerning entailment solely based on token distinctions, and show that GPT-4 and Llama 2 fail it with strong bias. We then create further challenging sub-tasks in an effort to explain this failure. From a Computational Linguistics perspective, we identify a group of constructions with three classes of adjectives which cannot be distinguished by surface features. This enables us to probe for LLMs' understanding of these constructions in various ways, and we find that they fail in a variety of ways to distinguish between them, suggesting that they don't adequately represent their meaning or capture the lexical properties of phrasal heads.

David Mortensen, Valentina Izrailevitch, Yunze Xiao, Hinrich Schütze, Leonie Weissweiler (2024). Verbing Weirds Language (Models): Evaluation of English Zero-Derivation in Five LLMs. Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation. (LREC-COLING)
Lexical-syntactic flexibility, in the form of conversion (or zero-derivation), is a hallmark of English morphology. In conversion, a word with one part of speech is placed in a non-prototypical context, where it is coerced to behave as if it had a different part of speech. However, while this process affects a large part of the English lexicon, little work has been done to establish the degree to which language models capture this type of generalization. This paper reports the first study on the behavior of large language models with reference to conversion. We design a task for testing lexical-syntactic flexibility, that is, the degree to which models can generalize over words in a construction with a non-prototypical part of speech. This task is situated within a natural language inference paradigm. We test the abilities of five language models: two proprietary models (GPT-3.5 and GPT-4) and three open-source models (Mistral 7B, Falcon 40B, and Llama 2 70B). We find that GPT-4 performs best on the task, followed by GPT-3.5, but that the open-source language models are also able to perform it and that the 7B parameter Mistral displays as little difference between its baseline performance on the natural language inference task and the non-prototypical syntactic category task as the massive GPT-4.

Leonie Weissweiler*, Valentin Hofmann*, Anjali Kantharuban, Anna Cai, Ritam Dutt, Amey Hengle, Anubha Kabra, Atharva Kulkarni, Abhishek Vijayakumar, Haofei Yu, Hinrich Schütze, Kemal Oflazer, David Mortensen (2023). Counting the Bugs in ChatGPT's Wugs: A Multilingual Investigation into the Morphological Capabilities of a Large Language Model. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. (EMNLP)
Large language models (LLMs) have recently reached an impressive level of linguistic capability, prompting comparisons with human language skills. However, there have been relatively few systematic inquiries into the linguistic capabilities of the latest generation of LLMs, and those studies that do exist (i) ignore the remarkable ability of humans to generalize, (ii) focus only on English, and (iii) investigate syntax or semantics and overlook other capabilities that lie at the heart of human language, like morphology. Here, we close these gaps by conducting the first rigorous analysis of the morphological capabilities of ChatGPT in four typologically varied languages (specifically, English, German, Tamil, and Turkish). We apply a version of Berko's (1958) wug test to ChatGPT, using novel, uncontaminated datasets for the four examined languages. We find that ChatGPT massively underperforms purpose-built systems, particularly in English. Overall, our results -- through the lens of morphology -- cast a new light on the linguistic capabilities of ChatGPT, suggesting that claims of human-like language skills are premature and misleading.

Yihong Liu, Haotian Ye, Leonie Weissweiler, Hinrich Schütze (2023). Crosslingual Transfer Learning for Low-Resource Languages Based on Multilingual Colexification Graphs. Findings of the 2023 Conference on Empirical Methods in Natural Language Processing. (EMNLP)
Colexification in comparative linguistics refers to the phenomenon of a lexical form conveying two or more distinct meanings. In this paper, we propose simple and effective methods to build multilingual graphs from colexification patterns: ColexNet and ColexNet+. ColexNet's nodes are concepts and its edges are colexifications. In ColexNet+, concept nodes are in addition linked through intermediate nodes, each representing an ngram in one of 1,334 languages. We use ColexNet+ to train high-quality multilingual embeddings ColexNet+ that are well-suited for transfer learning scenarios. Existing work on colexification patterns relies on annotated word lists. This limits scalability and usefulness in NLP. In contrast, we identify colexification patterns of more than 2,000 concepts across 1,335 languages directly from an unannotated parallel corpus. In our experiments, we first show that ColexNet has a high recall on CLICS, a dataset of crosslingual colexifications. We then evaluate ColexNet+ on roundtrip translation, verse retrieval and verse classification and show that our embeddings surpass several baselines in a transfer learning setting. This demonstrates the benefits of colexification for multilingual NLP.

Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank (2023). How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. (ACL)
Recently, various intermediate layer distillation (ILD) objectives have been shown to improve compression of BERT models via Knowledge Distillation (KD). However, a comprehensive evaluation of the objectives in both task-specific and task-agnostic settings is lacking. To the best of our knowledge, this is the first work comprehensively evaluating distillation objectives in both settings. We show that attention transfer gives the best performance overall. We also study the impact of layer choice when initializing the student from the teacher layers, finding a significant impact on the performance in task-specific distillation. For vanilla KD and hidden states transfer, initialisation with lower layers of the teacher gives a considerable improvement over higher layers, especially on the task of QNLI (up to an absolute percentage change of 17.8 in accuracy). Attention transfer behaves consistently under different initialisation settings. We release our code as an efficient transformer-based model distillation framework for further studies.

Yihong Liu, Haotian Ye, Leonie Weissweiler, Philipp Wicke, Renhao Pei, Robert Zangenfeind, Hinrich Schütze (2023). A Crosslingual Investigation of Conceptualization in 1335 Languages. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics. (ACL)
Languages differ in how they divide up the world into concepts and words; e.g., in contrast to English, Swahili has a single concept for `belly' and `womb'. We investigate these differences in conceptualization across 1,335 languages by aligning concepts in a parallel corpus. To this end, we propose Conceptualizer, a method that creates a bipartite directed alignment graph between source language concepts and sets of target language strings. In a detailed linguistic analysis across all languages for one concept (`bird') and an evaluation on gold standard data for 32 Swadesh concepts, we show that Conceptualizer has good alignment accuracy. We demonstrate the potential of research on conceptualization in NLP with two experiments. (1) We define crosslingual stability of a concept as the degree to which it has 1-1 correspondences across languages, and show that concreteness predicts stability. (2) We represent each language by its conceptualization pattern for 83 concepts, and define a similarity measure on these representations. The resulting measure for the conceptual similarity of two languages is complementary to standard genealogical, typological, and surface similarity measures. For four out of six language families, we can assign languages to their correct family based on conceptual similarity with accuracy between 54% and 87%.

Leonie Weissweiler, Taiqi He, Naoki Otani, David R. Mortensen, Lori Levin, Hinrich Schütze (2023). Construction Grammar Provides Unique Insight into Neural Language Models. Proceedings of the First International Workshop on Construction Grammars and NLP (CxGs+NLP, GURT/SyntaxFest 2023), pages 85–95, Washington, D.C. Association for Computational Linguistics.
Construction Grammar (CxG) has recently been used as the basis for probing studies that have investigated the performance of large pretrained language models (PLMs) with respect to the structure and meaning of constructions. In this position paper, we make suggestions for the continuation and augmentation of this line of research. We look at probing methodology that was not designed with CxG in mind, as well as probing methodology that was designed for specific constructions. We analyse selected previous work in detail, and provide our view of the most important challenges and research questions that this promising new field faces.

Leonie Weissweiler, Valentin Hofmann, Abdullatif Köksal, Hinrich Schütze (2022). The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative. Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 10859–10882, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics. (EMNLP)
PDF Source code on Github
Construction Grammar (CxG) is a paradigm from cognitive linguistics emphasising the connection between syntax and semantics. Rather than rules that operate on lexical items, it posits constructions as the central building blocks of language, i.e., linguistic units of different granularity that combine syntax and semantics. As a first step towards assessing the compatibility of CxG with the syntactic and semantic knowledge demonstrated by state-of-the-art pretrained language models (PLMs), we present an investigation of their capability to classify and understand one of the most commonly studied constructions, the English comparative correlative (CC). We conduct experiments examining the classification accuracy of a syntactic probe on the one hand and the models' behaviour in a semantic application task on the other, with BERT, RoBERTa, and DeBERTa as the example PLMs. Our results show that all three investigated PLMs are able to recognise the structure of the CC but fail to use its meaning. While human-like performance of PLMs on many NLP tasks has been alleged, this indicates that PLMs still suffer from substantial shortcomings in central domains of linguistic knowledge.

Leonie Weissweiler, Valentin Hofmann, Masoud Jalili Sabet, Hinrich Schütze (2022). CaMEL: Case Marker Extraction without Labels. Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 5506–5516, Dublin, Ireland. Association for Computational Linguistics. (ACL)
PDF BibTeX Source code on Github
We introduce CaMEL (Case Marker Extraction without Labels), a novel and challenging task in computational morphology that is especially relevant for low-resource languages. We propose a first model for CaMEL that uses a massively multilingual corpus to extract case markers in 83 languages based only on a noun phrase chunker and an alignment system. To evaluate CaMEL, we automatically construct a silver standard from UniMorph. The case markers extracted by our model can be used to detect and visualise similarities and differences between the case systems of different languages as well as to annotate fine-grained deep cases in languages in which they are not overtly marked.

Leonie Weissweiler, Alexander Fraser (2017). Developing a Stemmer for German Based on a Comparative Analysis of Publicly Available Stemmers. Proceedings of the German Society for Computational Linguistics and Language Technology (GSCL)
PDF BibTeX Source code on Github
Stemmers, which reduce words to their stems, are important components of many natural language processing systems. In this paper, we conduct a systematic evaluation of several stemmers for German using two gold standards we have created and will release to the community. We then present our own stemmer, which achieves state-of-the-art results, is easy to understand and extend, and will be made publicly available both for use by programmers and as a benchmark for further stemmer development.


Testing the Limits of LLMs with Construction Grammar, talk given at Bielefeld University.

Everything is a Construction: New Goals for Syntactic and Semantic Probing, talk given at ETH Zürich, University of Vienna, and Bar Ilan University.

Construction Grammar Provides Unique Insight into Neural Language Models, talk given at the CxGs+NLP Workshop at GURT 2023

The Better Your Syntax, the Better Your Semantics? Probing Pretrained Language Models for the English Comparative Correlative, video recorded for EMNLP 2022

The Past, Present, and Future of NLP from a Linguistic Perspective, talk given at the 8th International ScaDS Summer School 2022 and MunichNLP

CaMEL: Case Marker Extraction without Labels, talk given at ACL 2022

Erlebe Computerlinguistik, short video explaining Computational Linguistics to high school graduates, recorded for LMU and Aelius Förderwerk e.V.


Thesis supervision

WS 2023/24

Übung zu Profilierungsmodul II

SS 2023

Kolloquium Computerlinguistisches Arbeiten
Blockseminar Evaluation of Large Language Models

WS 2022/23

Übung zu Profilierungsmodul II

SS 2022

Kolloquium Computerlinguistisches Arbeiten

WS 2021/22

Übung zu Vertiefung der Grundlagen der Computerlinguistik
Übung zu Profilierungsmodul II

SS 2021

Kolloquium Computerlinguistisches Arbeiten

WS 2020/21

Übung zu Vertiefung der Grundlagen der Computerlinguistik
Übung zu Profilierungsmodul II

SS 2017

Kolloquium Computerlinguistisches Arbeiten
Übung zu Mathematische Grundlagen der Computerlinguistik

WS 2016/17

Übung zur Einführung in die Programmierung