This research project received funding through the 2024-2025 Institute for Diversity Science Seed Grant Program
Principal Investigator: Shamya Karumbaiah, Department of Educational Psychology, University of Wisconsin-Madison
![Shamya Karumbaiah](https://ids.wisc.edu/wp-content/uploads/sites/1882/2023/07/Shamya_Karumbaiah_sqaure_2-scaled-e1688662971291-300x300.jpeg)
Abstract:
Multilingual students often express their ideas using more than one language at a time, a practice known as “translanguaging” (e.g., Spanglish). Recognizing and engaging with these ideas is important for supporting multilingual students’ learning and identity development. Yet teachers often feel unprepared to support students who bring a diverse linguistic repertoire to the classroom. Multilingual Large Language Models (MLLMs) present new opportunities to address these linguistic barriers. MLLMs are artificial intelligence models that have been “trained” on vast amounts of data from multiple languages to perform linguistic tasks such as text generation, prediction, and summarization.
Our preliminary investigations reveal that MLLMs perform worse at detecting ideas expressed in Spanglish than ideas expressed in English or Spanish. This is a form of algorithmic bias. In educational AI interventions, such biases are detrimental to student learning because they may underestimate student knowledge, leading to missed learning opportunities and loss of engagement.
Current approaches to testing for algorithmic bias in MLLMs fall short, largely because of the lack of annotated data that can be used to audit for bias in the ways the models are actually used (e.g., educational assessment). Unlike predictive models built for a particular purpose using, say, human-labeled data (e.g., teacher assessments), generative AI models such as MLLMs are trained to be general-purpose and are often only prompt-engineered to perform predictions (with little to no human-labeled data).
To overcome this challenge for bias audits, we propose a new conceptualization of bias that treats linguistic variation as a potential source of bias. We then measure bias as the difference in how well MLLMs detect ideas in scenarios with and without linguistic variation (e.g., the difference in detection accuracy when the same idea is expressed in English or Spanish versus Spanglish); a minimal sketch of this comparison appears below. We identify Spanglish linguistic variations by examining: 1) the literature on translanguaging pedagogy practices (e.g., cognates, word walls), 2) the code-switching literature (e.g., noun transfers, morphosyntactic variations), and 3) authentic data from a bilingual classroom.
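The sketch below illustrates how the accuracy-gap measure could be computed in Python, assuming a hypothetical `detect_idea` wrapper around an MLLM prompt and a small dataset that pairs the same idea expressed with and without linguistic variation. The function names, data fields, and toy example are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative sketch of the accuracy-gap bias measure (not the project's code).
# Assumption: detect_idea(text, idea) wraps an MLLM prompt and returns True
# if the model detects the target idea in the text.

from typing import Callable, Dict, List


def detection_accuracy(
    items: List[Dict[str, str]],
    detect_idea: Callable[[str, str], bool],
    text_key: str,
) -> float:
    """Fraction of items in which the MLLM detects the target idea."""
    hits = sum(detect_idea(item[text_key], item["idea"]) for item in items)
    return hits / len(items)


def linguistic_variation_bias(
    items: List[Dict[str, str]],
    detect_idea: Callable[[str, str], bool],
) -> float:
    """Bias = detection accuracy without linguistic variation (English/Spanish)
    minus detection accuracy with linguistic variation (Spanglish), on the
    same set of ideas."""
    baseline = detection_accuracy(items, detect_idea, text_key="monolingual_text")
    variation = detection_accuracy(items, detect_idea, text_key="spanglish_text")
    return baseline - variation


# Example usage with a toy paired item (hypothetical field names):
# items = [{"idea": "equivalent fractions",
#           "monolingual_text": "Half of the pizza is the same as two fourths.",
#           "spanglish_text": "La mitad of the pizza es igual a two fourths."}]
# gap = linguistic_variation_bias(items, detect_idea=my_mllm_detector)
```

Because the comparison is between paired expressions of the same idea, a positive gap points to the linguistic variation itself as the source of the performance difference, without requiring human-labeled ground truth for each student response.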
Our project contributes to the identification and mitigation of biases when MLLMs are used with multilingual learners by: 1) overcoming the need for human annotations (by comparing idea detection with and without linguistic variation), 2) overcoming the need for student-level demographic data (by using linguistic variations pertaining to the population), and 3) creating a benchmark of relevant variations for future development of MLLMs.