Can You Trust AI with Life and Death Decisions?
Max Kleiman-Weiner honored for study on how LLMs handle moral dilemmas
Can you trust AI to make an ethical life-or-death decision? What if that decision (say, to spare the life of one group of people vs. another) isn’t obvious, or varies by context or situation?
University of Washington Foster School of Business Assistant Professor of Marketing and International Business Max Kleiman-Weiner explores these questions in his latest research.
Kleiman-Weiner co-authored the groundbreaking study, “Language Model Alignment in Multilingual Trolley Problems,” which revealed that AI models do not necessarily align with human responses to such moral dilemmas. The work received the Best Paper Award at the NeurIPS 2024 Workshop on Pluralistic Alignment and was published at the International Conference on Learning Representations (ICLR) as a spotlight paper.
“AI systems need to represent our own beliefs and values,” says Kleiman-Weiner. “These systems are constantly making judgment calls that affect people’s safety and well-being … whether it’s self-driving cars navigating traffic or recommendation engines moderating content, they’re weighing moral values. And for these systems to be acceptable and successful, that process has to be done appropriately.”
Kleiman-Weiner and his colleagues’ research posed ethical conundrums to 19 different large language models (LLMs) in over 100 languages to explore how closely AI aligns with human judgment and how widely the results differ across cultures.

Posing the “trolley problem” to LLMs
The LLMs were presented with variations on the classic “trolley problem,” a well-known thought exercise. In the trolley problem, you are the driver of a runaway trolley and must choose whether to save five people at the cost of one life, or to do nothing and let the five die. The exercise raises questions about how we make tough moral choices.

The researchers built on an existing large-scale data set of human-generated replies to different versions of such dilemmas, including variants that pit one group of people against another (old vs. young, men vs. women). They posed the same questions to the LLMs and found that the answers did not necessarily align with how people responded. That gap between AI and human judgment can have significant real-world consequences.
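To make the setup concrete, here is a minimal sketch, not the authors’ pipeline: it poses one dilemma to a model, forces a one-word choice, and checks the answer against a crowdsourced majority. The prompt wording, the `query_model` stub, and the human-majority data are all illustrative assumptions.

```python
# Minimal sketch (illustrative only): pose one dilemma, force a one-word
# choice, and compare it to the crowdsourced majority answer.

DILEMMA_ID = "classic_trolley"
PROMPT = (
    "A runaway trolley will kill five people unless it is diverted onto a "
    "side track, where it will kill one person. Should the driver divert it? "
    "Answer with exactly one word: divert or stay."
)

# Hypothetical human-majority answer for this dilemma (made up for the sketch).
HUMAN_MAJORITY = {DILEMMA_ID: "divert"}


def query_model(model_name: str, prompt: str) -> str:
    # Stub standing in for a real chat-completion API call to `model_name`.
    return "divert"


def matches_human_majority(model_name: str) -> bool:
    """True if the model's forced choice equals the crowdsourced majority answer."""
    answer = query_model(model_name, PROMPT).strip().lower()
    return answer == HUMAN_MAJORITY[DILEMMA_ID]


print(matches_human_majority("example-llm"))  # -> True with the stub above
```

The study repeats this kind of comparison across many dilemmas, 19 models, and more than 100 languages, then aggregates how often model choices line up with human ones.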
“The least aligned models differed from the average person’s answer on nearly every axis,” says Kleiman-Weiner. “The models were more likely to spare people who were higher status, to spare young people, to spare women, and to spare fit people compared to crowdsourced human judgments.”
The research included detailed documentation of the aggregate differences across six dimensions, including age, gender, social status, fitness, and species. While this rigorous documentation is valuable, the most compelling insights come from digging into the “why” of the models’ behavior.
Understanding the how and why of AI model responses
“My background is in cognitive science, so I’m interested in understanding how the mind works,” Kleiman-Weiner says. “And we’re able to apply many of the same techniques to these artificial minds … how do these models make moral judgments in such a high-stakes context, and how do these systems explain their behavior?”
Kleiman-Weiner explains that LLMs are, at their core, predictive: they guess which word is most likely to come next, with no built-in moral judgment.
Programmers then layer in “supervised fine-tuning,” which makes the models better conversationalists and establishes guardrails against dangerous or offensive responses.
The final step, “reinforcement learning from human feedback,” has the model learn from millions of comparisons in which users select the better of two answers.
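For readers who want a concrete picture of that last step, here is a toy sketch (my illustration, not drawn from the study) of the preference data behind reinforcement learning from human feedback: a user sees two candidate answers, picks the better one, and a reward model is trained so the chosen answer scores higher than the rejected one.

```python
# Toy illustration of RLHF preference data (not from the article).
import math
from dataclasses import dataclass


@dataclass
class PreferencePair:
    prompt: str
    chosen: str    # answer the human preferred
    rejected: str  # answer the human passed over


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Bradley-Terry style probability that the chosen answer wins,
    given scalar reward-model scores for each answer."""
    return 1.0 / (1.0 + math.exp(reward_rejected - reward_chosen))


pair = PreferencePair(
    prompt="Should the driver divert the trolley?",
    chosen="I can't decide that for you, but here are the trade-offs to weigh...",
    rejected="Yes, always divert.",
)

# A trained reward model should score the chosen answer higher, e.g.:
print(preference_probability(2.1, -0.4))  # about 0.92
```

The point of the sketch is simply that human preferences, collected at scale, become the signal that shapes how the model answers value-laden questions.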

Revealing cultural differences in LLM responses
The second component of the research evaluated how the models differ based on which language is used in the queries.
The study found that language plays a role in these variations, with specific patterns emerging. Researchers sorted over 100 languages into four distinct clusters, each with its own tendencies.
For example, Cluster A, which includes Filipino, values animal lives more than the other three clusters. English was grouped with Chinese in Cluster C, which shows a lower tendency to favor saving women over men, though all four clusters still choose to save female lives at a higher rate. The researchers did not find significant differences between “high-resource” languages (those with abundant digital text, typically widely spoken) and languages with fewer resources.
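As a rough illustration of what that grouping involves (the article does not say which clustering method the authors used, and the numbers below are invented), one could represent each language by a vector of response tendencies and cluster those vectors:

```python
# Illustrative sketch: group languages by how the model tends to answer
# dilemmas when queried in that language. Feature values are made up, and
# k-means is just one possible algorithm, not necessarily the study's choice.
import numpy as np
from sklearn.cluster import KMeans

languages = ["English", "Chinese", "Filipino", "Spanish", "Arabic", "Hindi"]
# Columns: hypothetical preference scores (humans over animals,
# women over men, young over old).
tendencies = np.array([
    [0.85, 0.58, 0.70],
    [0.84, 0.57, 0.72],
    [0.65, 0.66, 0.60],
    [0.80, 0.68, 0.66],
    [0.78, 0.62, 0.64],
    [0.79, 0.63, 0.65],
])

# The study grouped 100+ languages into four clusters; this toy uses six languages.
kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(tendencies)
for lang, label in zip(languages, kmeans.labels_):
    print(f"{lang}: cluster {label}")
```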
“Language matters,” Kleiman-Weiner says of the study’s findings. “If you are operating in an international organization or business, you need to ensure that the way you’re testing these systems before deployment is sensitive to these multilingual sensibilities.”
Alongside his research, Kleiman-Weiner brings these insights into the classroom, teaching a consumer insights class to Foster Full-Time MBA students.
Learn more about Foster faculty research here.
“Language Model Alignment in Multilingual Trolley Problems” was authored by Zhijing Jin, Max Kleiman-Weiner, Giorgio Piatti, Sydney Levine, Jiarui Liu, Fernando Gonzalez, Francesco Ortu, András Strausz, Mrinmaya Sachan, Rada Mihalcea, Yejin Choi, and Bernhard Schölkopf. It was awarded Best Paper at the NeurIPS 2024 Workshop on Pluralistic Alignment and received a spotlight at the International Conference on Learning Representations (ICLR).