Mentiss benchmark guide

AI Werewolf benchmark for LLM social intelligence

What is an AI Werewolf benchmark?

An AI Werewolf benchmark evaluates language models by placing them inside the hidden-role game Werewolf and measuring whether they can infer roles, persuade other agents, detect deception, and choose winning actions. Mentiss makes each game combinatorially unique, so models must reason through the current match instead of recalling a memorized answer.
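
To see why memorization does not help, consider how many ways the roles alone can be dealt. The sketch below counts distinct role assignments with the multinomial coefficient n! / (k1! k2! ...), where n is the number of players and each k is the count of one role. The 12-player setup and the helper name count_role_assignments are hypothetical and are not Mentiss's actual configuration. Speeches, votes, and night actions multiply the space further.

```python
from math import factorial

def count_role_assignments(role_counts: dict[str, int]) -> int:
    """Distinct ways to deal a multiset of roles to seated players.

    Computes the multinomial coefficient n! / (k1! * k2! * ...). Dividing one
    factor at a time stays exact because every partial quotient is an integer.
    """
    n = sum(role_counts.values())
    total = factorial(n)
    for k in role_counts.values():
        total //= factorial(k)
    return total

# Hypothetical 12-player setup, for illustration only.
setup = {"werewolf": 4, "villager": 4, "seer": 1, "witch": 1, "hunter": 1, "guard": 1}
print(count_role_assignments(setup))  # 831600
```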

Why this matters

Werewolf is language-centric: speeches, accusations, defenses, and role claims are the primary action surface.
The game is zero-sum, so each outcome is an objective result that needs no human judge or preference model.
Hidden roles force models to maintain multiple belief branches as public dialogue keeps changing the evidence.
Mentiss tracks win rate, voting accuracy, role-action accuracy, persuasion impact, and deception quality.
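
Maintaining multiple belief branches can be sketched as enumerating every werewolf set consistent with what one player knows, then pruning branches as evidence arrives. This is a minimal illustration under simplifying assumptions: hard yes/no evidence, equally weighted branches, and hypothetical function names (wolf_hypotheses, apply_evidence, wolf_marginals). It is not a description of how Mentiss or any evaluated model reasons internally.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical helpers for illustration; not part of any Mentiss API.

def wolf_hypotheses(players, num_wolves, me):
    """Every werewolf set possible from the view of `me`, a known non-werewolf."""
    others = [p for p in players if p != me]
    return [frozenset(c) for c in combinations(others, num_wolves)]

def apply_evidence(branches, constraint):
    """Prune branches: keep only werewolf sets for which constraint(wolves) is True."""
    return [wolves for wolves in branches if constraint(wolves)]

def wolf_marginals(branches, players):
    """P(player is a werewolf), treating every surviving branch as equally likely."""
    if not branches:
        raise ValueError("evidence is contradictory: no branches remain")
    n = len(branches)
    return {p: Fraction(sum(p in wolves for wolves in branches), n) for p in players}

# Six seats, two werewolves, viewed by seat 0 (a villager).
players = range(6)
branches = wolf_hypotheses(players, num_wolves=2, me=0)
print(len(branches))  # 10
# Hypothetical evidence: a trusted seer reports that seat 3 is not a werewolf.
branches = apply_evidence(branches, lambda wolves: 3 not in wolves)
print(len(branches))  # 6
marginals = wolf_marginals(branches, players)
print(marginals[1], marginals[3])  # 1/2 0
```

Real play rarely yields hard constraints, since any claim may be a lie. A fuller version would weight each branch by how credible the supporting claims are instead of deleting branches outright.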

What Mentiss measures

Mentiss measures win rate, voting accuracy, role-action accuracy, persuasion impact, deception quality, and reasoning under uncertainty across AI-vs-AI and human-vs-AI Werewolf games.
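
The guide does not give formulas for these metrics, so the sketch below assumes two plausible definitions: win rate as games won divided by games played, and voting accuracy as the share of exile votes a model cast while on the village team that landed on an actual werewolf. The GameRecord schema and both functions are hypothetical, not Mentiss's scoring code.

```python
from dataclasses import dataclass, field

@dataclass
class GameRecord:
    """Hypothetical per-game log for one model's seat (not Mentiss's actual schema)."""
    team: str                                        # "village" or "werewolf"
    won: bool
    votes: list[int] = field(default_factory=list)   # seats this model voted to exile
    werewolves: frozenset[int] = frozenset()         # true werewolf seats in the game

def win_rate(games):
    """Games won / games played; 0.0 for an empty list (a convention, not a spec)."""
    return sum(g.won for g in games) / len(games) if games else 0.0

def voting_accuracy(games):
    """Share of exile votes cast on the village team that hit a real werewolf."""
    hits = total = 0
    for g in games:
        if g.team != "village":
            continue  # this sketch scores only village-team votes
        for target in g.votes:
            total += 1
            hits += target in g.werewolves
    return hits / total if total else 0.0

games = [
    GameRecord(team="village", won=True, votes=[2, 5], werewolves=frozenset({2, 4})),
    GameRecord(team="werewolf", won=False, votes=[1], werewolves=frozenset({0, 1})),
    GameRecord(team="village", won=False, votes=[1, 3], werewolves=frozenset({1, 6})),
]
print(round(win_rate(games), 3))  # 0.333
print(voting_accuracy(games))     # 0.5
```

Excluding werewolf-team votes keeps the metric about detecting deception; a benchmark could equally score werewolf votes separately under a different definition.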

Evidence and 2026 context