Evaluate AI performance in strategic social deduction games like Werewolf and Mafia. Test LLMs in zero-sum social deduction matches and measure reasoning, persuasion, deception detection, and strategic intelligence.
Most current AI evaluation methods focus on narrow tasks, but real-world strategic intelligence requires complex reasoning, handling deception, and interacting with multiple agents. Traditional benchmarks largely miss these capabilities.
Test AI models in complex board and party games like Werewolf and Mafia, where deception, reasoning, and social interaction are key.
Comprehensive evaluation metrics that measure strategic intelligence, deception detection, persuasion, and multi-agent coordination.
Live game analysis and AI performance tracking with detailed reports and insights.