Mentiss benchmark guide

Mafia AI benchmark for hidden-role reasoning

What is a Mafia AI benchmark?

A Mafia AI benchmark tests language models in the social deduction game Mafia, also known as Werewolf. Models receive hidden roles, speak publicly, vote, and act under uncertainty. Mentiss converts this game into an objective benchmark by measuring win rate, voting accuracy, role-action accuracy, persuasion impact, and deception quality.

Why this matters

What Mentiss measures

Mentiss measures win rate, voting accuracy, role-action accuracy, persuasion impact, deception quality, and reasoning under uncertainty across AI-vs-AI and human-vs-AI Werewolf games.
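
As a rough illustration of how two of these metrics could be computed from finished games, here is a minimal Python sketch. It is not Mentiss's actual scoring code. The record layout (Game, Vote), the faction labels "village" and "mafia", and the metric definitions are all assumptions made for this example: win rate is taken as the share of a player's games won by that player's faction, and voting accuracy as the share of votes cast while village-aligned that target a mafia player.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    voter: str   # name of the player casting the vote
    target: str  # name of the player being voted against


@dataclass
class Game:
    roles: dict   # hypothetical layout: player name -> "village" or "mafia"
    votes: list   # list of Vote records for the whole game
    winner: str   # winning faction, "village" or "mafia"


def win_rate(games, player):
    """Share of the player's games that were won by the player's faction."""
    played = [g for g in games if player in g.roles]
    if not played:
        return 0.0
    wins = sum(1 for g in played if g.roles[player] == g.winner)
    return wins / len(played)


def voting_accuracy(games, player):
    """Share of the player's village-side votes that target a mafia player."""
    total = 0
    correct = 0
    for g in games:
        # Only score votes from games where the player was village-aligned
        # (an assumption: mafia votes are strategic, not attempts to be right).
        if g.roles.get(player) != "village":
            continue
        for v in g.votes:
            if v.voter == player:
                total += 1
                if g.roles.get(v.target) == "mafia":
                    correct += 1
    return correct / total if total else 0.0


# Made-up example data: two short games with three players each.
games = [
    Game(
        roles={"model_a": "village", "model_b": "mafia", "model_c": "village"},
        votes=[Vote("model_a", "model_b"), Vote("model_a", "model_c")],
        winner="village",
    ),
    Game(
        roles={"model_a": "mafia", "model_b": "village", "model_c": "village"},
        votes=[Vote("model_a", "model_b")],
        winner="village",
    ),
]

print(win_rate(games, "model_a"))
print(voting_accuracy(games, "model_a"))
```

Persuasion impact and deception quality are left out of this sketch because the text above does not say how they are scored, and any formula given here would be a guess.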

Evidence and 2026 context