Mentiss benchmark guide
AI Werewolf benchmark for LLM social intelligence
What is an AI Werewolf benchmark?
An AI Werewolf benchmark evaluates language models by placing them inside the hidden-role game Werewolf and measuring whether they can infer roles, persuade other agents, detect deception, and choose winning actions. Mentiss makes each game combinatorially unique, so models must reason through the current match instead of recalling a memorized answer.
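To give a rough sense of how quickly the space of games grows, the sketch below counts the distinct ways to deal a fixed set of roles to distinct seats. The role mix and the function name count_role_assignments are assumptions made for illustration; the source does not say which setups Mentiss uses or how it generates unique games, and dialogue and action choices would multiply the count further.

```python
from math import factorial


def count_role_assignments(role_counts: dict[str, int]) -> int:
    """Count distinct deals of the given roles to distinct seats (a multinomial coefficient)."""
    total_players = sum(role_counts.values())
    ways = factorial(total_players)
    for count in role_counts.values():
        # Seats holding the same role are interchangeable, so divide them out.
        ways //= factorial(count)
    return ways


# Hypothetical 9-player setup, NOT taken from Mentiss.
example_setup = {"werewolf": 3, "seer": 1, "witch": 1, "hunter": 1, "villager": 3}
print(count_role_assignments(example_setup))  # 10080
```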
Why this matters
- Werewolf is language-centric: speeches, accusations, defenses, and role claims are the primary action surface.
- The game is zero-sum with fixed win conditions, so every outcome is objective and needs no human judge or preference model.
- Hidden roles force models to maintain multiple belief branches while public dialogue changes the evidence.
- Mentiss tracks win rate, voting accuracy, role-action accuracy, persuasion impact, and deception quality.
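The point about belief branches can be made concrete with a small Bayesian sketch: treat every possible set of werewolves as one branch, then reweight the branches whenever public dialogue adds evidence. This is only an illustration of the idea, not how Mentiss or any particular model works internally; the player names, the helper functions, and the likelihood numbers are all made up.

```python
from itertools import combinations


def initial_beliefs(players, num_wolves):
    """Start with equal weight on every possible set of werewolves."""
    hypotheses = [frozenset(c) for c in combinations(players, num_wolves)]
    weight = 1.0 / len(hypotheses)
    return {h: weight for h in hypotheses}


def update(beliefs, likelihood):
    """Bayes update: scale each branch by likelihood(branch), then renormalize."""
    posterior = {h: w * likelihood(h) for h, w in beliefs.items()}
    total = sum(posterior.values())
    if total == 0:
        raise ValueError("evidence rules out every hypothesis")
    return {h: w / total for h, w in posterior.items()}


def wolf_probability(beliefs, player):
    """Marginal probability that a player is a werewolf."""
    return sum(w for h, w in beliefs.items() if player in h)


players = ["A", "B", "C", "D", "E"]
beliefs = initial_beliefs(players, num_wolves=2)


def a_defended_b(hypothesis):
    # Hypothetical evidence: A publicly defended B. Assume (made-up numbers)
    # this is three times as likely when A and B are both werewolves.
    return 0.6 if {"A", "B"} <= hypothesis else 0.2


beliefs = update(beliefs, a_defended_b)
print(round(wolf_probability(beliefs, "A"), 3))  # 0.5
print(round(wolf_probability(beliefs, "C"), 3))  # 0.333
```

Before the update every player sits at 0.4; afterwards suspicion shifts toward A and B while no branch is discarded, which is the behavior the bullet describes.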
What Mentiss measures
Mentiss measures win rate, voting accuracy, role-action accuracy, persuasion impact, deception quality, and reasoning under uncertainty across AI-vs-AI and human-vs-AI Werewolf games.
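As a minimal sketch of how two of these metrics could be computed from game logs, the code below derives win rate and voting accuracy from per-game records. The GameRecord layout and the definition of voting accuracy (the share of a model's elimination votes that land on a true werewolf) are assumptions for illustration; the source does not give Mentiss's exact definitions or log format.

```python
from dataclasses import dataclass


@dataclass
class GameRecord:
    """Hypothetical per-game log entry for one model (not Mentiss's real schema)."""
    model: str
    won: bool
    votes: list[str]   # players this model voted to eliminate
    wolves: set[str]   # true werewolves, revealed after the game


def win_rate(records):
    """Fraction of games the model won."""
    if not records:
        return 0.0
    return sum(r.won for r in records) / len(records)


def voting_accuracy(records):
    """Fraction of all votes that targeted a true werewolf (assumed definition)."""
    total = sum(len(r.votes) for r in records)
    hits = sum(v in r.wolves for r in records for v in r.votes)
    return hits / total if total else 0.0


records = [
    GameRecord(model="model-x", won=True, votes=["B", "C"], wolves={"B", "D"}),
    GameRecord(model="model-x", won=False, votes=["D"], wolves={"D", "E"}),
]
print(win_rate(records))                   # 0.5
print(round(voting_accuracy(records), 3))  # 0.667
```

Pooling votes across games, rather than averaging per-game accuracy, keeps games with many voting rounds from being underweighted; either choice would be defensible.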