AIBAT: AI Behavior Analysis Tool for Teacher-Driven Contextual Evaluation of Language Models in Education

Artificial Intelligence in Education (2025, Conference Paper)*

Authors: Shamya Karumbaiah, Yaxuan Yin, Aayush Bharadwaj

*Won Best Paper Award at the International Conference on Artificial Intelligence in Education

Abstract: With AIED's increasing reliance on opaque, black-box scaffolds such as large language models to support student learning, there is growing concern about their limitations when used in diverse pedagogical contexts. This opacity often undermines educators' trust and shapes their perceptions, contributing to resistance toward the adoption of AI scaffolds in schools. To address these challenges, we developed AIBAT, a workflow and system designed to support educators in auditing and critically evaluating the potential benefits and harms of AI systems within their specific pedagogical contexts (e.g., subject matter, grade level, English proficiency). With AIBAT, teachers can specify expected behaviors (i.e., what they anticipate the AI scaffold should do) and test the system against those expectations. We conducted an exploratory user study with 14 teachers to examine how AIBAT facilitates the identification of AI-related risks and sensemaking about them, while enabling educators to use evidence to calibrate their trust in AI scaffolds. Our findings reveal that teachers valued the ability to engage with AI decisions rather than passively accepting them, describing the process as a "conversation" that enhanced transparency, trust, and a sense of control. We identify key opportunities to foster meaningful user engagement in AI auditing processes and discuss broader implications for promoting responsible and effective teacher participation in the evaluation and deployment of AI systems in educational settings.