Mentiss benchmark guide

AI board game benchmark for language-first strategy

What is an AI board game benchmark?

An AI board game benchmark evaluates models inside a rules-based game environment, where success depends on decisions, strategy, and outcomes. Mentiss focuses on Werewolf and Mafia because they keep the objective, outcome-based scoring of board games while adding natural-language persuasion, hidden roles, deception, and social reasoning.

Why this matters

What Mentiss measures

Mentiss measures win rate, voting accuracy, role-action accuracy, persuasion impact, deception quality, and reasoning under uncertainty across AI-vs-AI and human-vs-AI Werewolf games.
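As a rough illustration of how the first two metrics could be computed from game logs, here is a minimal Python sketch. Everything in it is an assumption made for the example rather than Mentiss's actual schema or scoring rules: the `Vote` and `GameRecord` types, their field names, and in particular the definition of voting accuracy as the share of village-side votes that land on a true werewolf.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vote:
    # Hypothetical record of one elimination vote cast by the evaluated model.
    voter_team: str   # team of the voter: "village" or "werewolf"
    target_team: str  # true team of the player who was voted for


@dataclass
class GameRecord:
    # Hypothetical per-game log entry; field names are illustrative only.
    model_team: str    # team the evaluated model played on
    winning_team: str  # team that won the game
    model_votes: List[Vote] = field(default_factory=list)


def win_rate(games: List[GameRecord]) -> float:
    """Fraction of games in which the model's team won."""
    if not games:
        return 0.0
    wins = sum(1 for g in games if g.model_team == g.winning_team)
    return wins / len(games)


def voting_accuracy(games: List[GameRecord]) -> float:
    """Assumed definition: share of village-side votes that target a werewolf.

    Votes cast while playing werewolf are skipped, because a "correct"
    vote is not well defined for the deceiving side under this definition.
    """
    votes = [
        v for g in games for v in g.model_votes if v.voter_team == "village"
    ]
    if not votes:
        return 0.0
    correct = sum(1 for v in votes if v.target_team == "werewolf")
    return correct / len(votes)


# Three made-up games, for illustration only.
games = [
    GameRecord("village", "village",
               [Vote("village", "werewolf"), Vote("village", "village")]),
    GameRecord("werewolf", "village", [Vote("werewolf", "village")]),
    GameRecord("village", "werewolf", [Vote("village", "werewolf")]),
]

print(f"win rate: {win_rate(games):.2f}")
print(f"voting accuracy: {voting_accuracy(games):.2f}")
```

Metrics such as persuasion impact and deception quality would need richer logs (for example, how other players' votes shift after a statement) and are not covered by this sketch.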

Evidence and 2026 context