AI Safety and Model Evaluation

The work connects AI safety to questions long central to the humanities: ambiguity, interpretation, persuasion, value conflict, narrative framing, and the instability of meaning under small changes in wording. Across ethics-based audits, FATE evaluation, CAISI/NIST standards work, and syntactic framing studies, the research treats language not as a surface feature of AI behavior but as a core safety domain.

NIST AI Safety Institute Consortium / CAISI

Co-leading the MLA-sponsored team, 2024–present

Elkins co-leads the team representing the Modern Language Association at the U.S. AI Safety Institute Consortium, now housed within CAISI. The MLA-sponsored team brings expertise in language, writing, interpretation, ethics, and humanistic inquiry to evaluations of model behavior in complex linguistic and ethical scenarios.

The team’s ethics-based audit results were presented on its behalf during the opening keynote at the consortium’s first plenary at the University of Maryland. This placed humanistic model-evaluation work inside one of the central U.S. standards conversations about trustworthy AI.

How Well Can GenAI Predict Human Behavior?

Notre Dame–IBM Technology Ethics Lab, 2024

Supported by the Notre Dame–IBM Technology Ethics Lab’s 2024 Call for Proposals on “The Ethics of Large-Scale Models,” this project examines how LLMs reason when making high-stakes decisions about people. The project, “How Well Can GenAI Predict Human Behavior? Auditing State-of-the-Art Large Language Models for Fairness, Accuracy, Transparency, and Explainability (FATE),” was awarded to Jon Chun and Katherine Elkins, with Yong Suk Lee of Notre Dame’s Keough School of Global Affairs as faculty collaborator.

The study uses recidivism prediction as a test case because such systems can influence sentencing, bail, and early release decisions. It compares human experts, statistical machine-learning models, and LLMs while auditing performance through the FATE framework: Fairness, Accuracy, Transparency, and Explainability. Lee’s expertise in technology, labor economics, AI ethics, AI governance, and the social effects of automation strengthens the project’s focus on how predictive systems operate in human institutions.
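To make the fairness and accuracy parts of such an audit concrete, here is a minimal sketch of one common check: comparing accuracy and false positive rates across groups. It is an illustration only, not the project's actual code or metric set. The function name `group_rates`, the record layout, and the toy data are all hypothetical.

```python
from collections import defaultdict

def group_rates(records):
    """Per-group accuracy and false positive rate.

    records: iterable of (group, predicted, actual) tuples, where
    predicted/actual are booleans meaning "reoffends". This layout is
    an assumption made for the sketch, not the study's data format.
    """
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "fp": 0, "neg": 0})
    for group, predicted, actual in records:
        c = counts[group]
        c["n"] += 1
        c["correct"] += int(predicted == actual)
        if not actual:               # person did not reoffend
            c["neg"] += 1
            c["fp"] += int(predicted)  # but was flagged as high risk
    return {
        group: {
            "accuracy": c["correct"] / c["n"],
            "false_positive_rate": c["fp"] / c["neg"] if c["neg"] else None,
        }
        for group, c in counts.items()
    }

# Toy, invented records: (group, predicted, actual)
toy = [
    ("A", True, True), ("A", True, False), ("A", False, False), ("A", False, False),
    ("B", True, True), ("B", True, False), ("B", True, False), ("B", False, False),
]
rates = group_rates(toy)
for group in sorted(rates):
    print(group, rates[group])
```

A gap in false positive rates between groups is one signal an auditor might examine. The same table could be computed separately for human experts, statistical models, and LLMs to compare them on equal terms.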

Informed AI Regulation

Chun and Elkins, arXiv 2402.01651, 2024

In June 2023, when much academic debate still treated large language models as unable to reason, Chun and Elkins conducted the first ethics-based audit of leading commercial and open-source LLM chatbots for moral reasoning and normative values. The study applies the Mökander and Floridi ethics-based auditing framework to eight frontier models, including GPT-4, and compares how models respond to morally charged scenarios.

The audit finds that more sophisticated ethical reasoning emerges with scale, appearing in GPT-4 but not in smaller models, while also documenting cultural bias and authoritarian tendencies in some systems. Its regulatory claim is empirical: deployed systems encode value commitments, and informed regulation should begin from observed model behavior rather than abstract assumptions about what LLMs can or cannot do.

The paper has since become a reference point for evaluating the moral behavior of language models. It anchors the corpus surveyed in Beyond Verdicts (AAAI 2026), supplies the confidence-scoring method used by Liu, Liu, and Yu across 1,613 decision scenarios (COLING 2025), and is cited in work on norm inconsistency (Jain, Calacci, and Wilson, AIES 2024), the LLM Ethics Whitepaper (Ungless et al., 2024), and research using LLMs as ethics reviewers (Hey, Walsh, and Mustafaraj, AIES 2025). Later work extends the audit’s methods into trustworthy medical and educational AI, responsible chatbot design, and graduate research across four countries.

Syntactic Framing Fragility

Chun and Elkins, 2025

This work documents systematic failures in how frontier language models respond to syntactic framing changes in safety-critical moral prompts. It develops the Syntactic Framing Fragility metric for deployment-critical audits and shows that surface-syntactic perturbations can alter model judgments in safety-relevant scenarios.

The project extends ethics-based auditing from value comparison to linguistic robustness, showing how small changes in negation and framing can expose vulnerabilities in model reasoning and alignment.
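The general idea of measuring sensitivity to framing can be sketched as a simple flip rate: the share of reworded prompts on which a model's underlying judgment differs from its judgment on the baseline wording. This is a hypothetical illustration, not the Syntactic Framing Fragility metric as the authors define it, which this page does not specify. The `Response` type, its fields, and the toy answers are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Response:
    framing: str             # label for the syntactic variant (hypothetical)
    inverted_polarity: bool  # True if the prompt asks "is it wrong?" rather than "is it acceptable?"
    answer_yes: bool         # the model's raw yes/no answer

def endorses(r: Response) -> bool:
    """Normalize the raw answer to 'does the model endorse the action?'.

    A "yes" to an inverted prompt means the model does NOT endorse it.
    """
    return r.answer_yes != r.inverted_polarity

def flip_rate(baseline: Response, variants: list[Response]) -> float:
    """Fraction of variants whose normalized judgment differs from the baseline."""
    if not variants:
        raise ValueError("need at least one variant")
    base = endorses(baseline)
    flips = sum(1 for v in variants if endorses(v) != base)
    return flips / len(variants)

# Toy, invented answers for one moral scenario.
baseline = Response("acceptable?", inverted_polarity=False, answer_yes=False)
variants = [
    Response("passive voice", False, False),   # consistent with baseline
    Response("wrong?", True, True),            # consistent once normalized
    Response("should-not framing", True, False),  # flipped judgment
    Response("conditional clause", False, True),  # flipped judgment
]
fragility = flip_rate(baseline, variants)
print(f"flip rate: {fragility:.2f}")
```

Normalizing polarity before comparing matters: without it, a model that answers consistently would look fragile simply because the question was asked in the opposite direction.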
