Which AI model fits your Swiss use case?

10 models. 6 dimensions. 3 languages. 395 scenarios. Updated quarterly.

Last updated: Q1 2026 · Swiss-Bench v2.0

Overall Model Rankings

Swiss-Bench AI Model Rankings, Q1 2026 (10 models)
Rank | Model | Type | HAAS | Status | Best At | Updated
1 | Gemini 2.5 Flash | Closed Source | 60.1 | Ready | Documentation | Q1 2026
2 | Qwen 3.5 Plus | Open Source | 59.4 | Ready | Safety | Q1 2026
3 | Claude Sonnet 4 | Closed Source | 58.3 | Ready | Compliance | Q1 2026
4 | GLM 5 | Open Source | 55.5 | Evaluate | Documentation | Q1 2026
5 | MiniMax M2.5 | Open Source | 50.2 | Evaluate | Swiss Languages | Q1 2026
6 | GPT-oss 120B | Open Source | 49.6 | Evaluate | Compliance | Q1 2026
7 | MiMo-V2-Flash | Open Source | 48.7 | Evaluate | Performance | Q1 2026
8 | DeepSeek V3 | Open Source | 48.4 | Gap | Compliance | Q1 2026
9 | GPT-4o | Closed Source | 48.2 | Gap | Robustness | Q1 2026
10 | Mistral Large 3 | Open Source | 47.4 | Gap | Swiss Languages | Q1 2026

HAAS Dimensions: D1 Performance (25%) · D2 Robustness (20%) · D3 Safety (15%) · D4 Compliance (20%) · D5 Swiss Language (10%) · D6 Documentation (10%)

Each model is ranked by HAAS composite score and classified using percentile ranking: top 30% = Ready, middle 40% = Evaluate, bottom 30% = Gap.
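The 30/40/30 percentile split can be sketched in a few lines of Python. The scores below are the published Q1 2026 HAAS composites; the exact rounding of the percentile cutoffs and the handling of ties are our assumptions, not documented Swiss-Bench behavior:

```python
# Percentile-based classification: top 30% -> Ready,
# middle 40% -> Evaluate, bottom 30% -> Gap.
# HAAS composite scores from the Q1 2026 rankings table.
scores = {
    "Gemini 2.5 Flash": 60.1, "Qwen 3.5 Plus": 59.4, "Claude Sonnet 4": 58.3,
    "GLM 5": 55.5, "MiniMax M2.5": 50.2, "GPT-oss 120B": 49.6,
    "MiMo-V2-Flash": 48.7, "DeepSeek V3": 48.4, "GPT-4o": 48.2,
    "Mistral Large 3": 47.4,
}

def classify(scores: dict[str, float]) -> dict[str, str]:
    """Rank models by HAAS score, then apply the 30/40/30 split by rank."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = len(ranked)
    ready_cut = round(0.3 * n)   # top 30% of ranks -> Ready
    eval_cut = round(0.7 * n)    # next 40% -> Evaluate; remainder -> Gap
    return {
        model: "Ready" if i < ready_cut else "Evaluate" if i < eval_cut else "Gap"
        for i, model in enumerate(ranked)
    }

labels = classify(scores)
# With 10 models this yields 3x Ready, 4x Evaluate, 3x Gap,
# matching the Status column in the rankings table.
```

Note that the classification is relative: a model's status can change between quarters even if its own score does not, because the cutoffs move with the field.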

Swiss-Bench v2.0: 395 scenarios across Swiss Legal, FINMA Regulatory, and SFAO Audit domains. HAAS = Helvetic AI Assurance Score (6 dimensions, 0–100). Percentile-based classification across 10 models. Methodology →

Q1 2026 Highlights

Most Ready
Gemini 2.5 Flash
Highest HAAS composite score (60.1) of all 10 models. Strongest dimension: Documentation.
Best Open Source
Qwen 3.5 Plus
Top open-weight model (HAAS 59.4). Viable for on-premises deployment with full data sovereignty.
Strongest Compliance
Claude Sonnet 4
Highest D4 Compliance dimension score (80.1). Best fit for regulated environments requiring audit trail compliance.

Based on Swiss-Bench v2.0 (Q1 2026). 395 scenarios, 3-judge panel, structured scoring. Updated quarterly.

Dimension, Language & Domain Breakdowns

HAAS Dimension Breakdown

Model | D1 Perf. | D2 Robust. | D3 Safety | D4 Compl. | D5 Lang. | D6 Doc.
Gemini 2.5 Flash | 53.3 | 72.1 | 20.6 | 70.8 | 100.0 | 51.5
Qwen 3.5 Plus | 51.5 | 77.1 | 33.3 | 55.0 | 100.0 | 51.1
Claude Sonnet 4 | 41.2 | 88.4 | 9.5 | 80.1 | 93.6 | 35.2
GLM 5 | 44.2 | 76.5 | 13.5 | 68.1 | 92.2 | 42.5
MiniMax M2.5 | 37.4 | 71.7 | 6.3 | 67.9 | 94.4 | 25.4
GPT-oss 120B | 31.5 | 78.9 | 2.4 | 72.8 | 93.1 | 16.8
MiMo-V2-Flash | 38.8 | 67.8 | 3.2 | 68.8 | 89.3 | 22.3
DeepSeek V3 | 35.9 | 67.8 | 2.4 | 69.4 | 89.0 | 27.5
GPT-4o | 19.2 | 91.9 | 11.1 | 63.8 | 74.9 | 31.3
Mistral Large 3 | 17.9 | 77.3 | 7.9 | 70.1 | 100.0 | 22.3
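The composite score can be reproduced from this table using the published dimension weights (D1 25%, D2 20%, D3 15%, D4 20%, D5 10%, D6 10%). A minimal sketch; rounding to one decimal is our assumption:

```python
# HAAS composite = weighted sum of the six dimension scores (0-100 scale),
# using the published Swiss-Bench weights.
WEIGHTS = [0.25, 0.20, 0.15, 0.20, 0.10, 0.10]  # D1..D6

def haas(dims: list[float]) -> float:
    """Weighted composite of the six dimension scores, rounded to one decimal."""
    assert len(dims) == len(WEIGHTS)
    return round(sum(w * d for w, d in zip(WEIGHTS, dims)), 1)

# Gemini 2.5 Flash row (D1..D6) from the breakdown table:
print(haas([53.3, 72.1, 20.6, 70.8, 100.0, 51.5]))  # -> 60.1
```

The same calculation reproduces the other rankings, e.g. Qwen 3.5 Plus's dimension row yields 59.4, matching its published composite.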

Visual Comparison

[Interactive chart: per-model bars for dimensions D1–D6 across all 10 models; values as listed in the HAAS Dimension Breakdown table above.]

Per-Language Comparison

Model | German (DE) | French (FR) | Italian (IT)
Gemini 2.5 Flash | 39.7% | 41.9% | 52.6%
Qwen 3.5 Plus | 45.3% | 41.6% | 51.5%
Claude Sonnet 4 | 27.3% | 33.4% | 42.8%
GLM 5 | 34.3% | 33.1% | 42.8%
MiniMax M2.5 | 26.0% | 24.7% | 34.5%
GPT-oss 120B | 16.0% | 19.6% | 28.9%
MiMo-V2-Flash | 20.0% | 24.7% | 29.4%
DeepSeek V3 | 18.0% | 25.7% | 39.2%
GPT-4o | 16.0% | 25.0% | 33.5%
Mistral Large 3 | 14.7% | 19.6% | 27.3%

Per-Domain Comparison

Model | Swiss Legal | FINMA Regulatory | SFAO Audit
Gemini 2.5 Flash | 71.0% | 24.2% | 19.8%
Qwen 3.5 Plus | 70.7% | 29.2% | 16.7%
Claude Sonnet 4 | 60.4% | 12.9% | 14.6%
GLM 5 | 62.1% | 16.9% | 14.6%
MiniMax M2.5 | 50.6% | 9.0% | 15.6%
GPT-oss 120B | 42.6% | 4.8% | 1.0%
MiMo-V2-Flash | 48.2% | 6.2% | 5.2%
DeepSeek V3 | 50.0% | 9.0% | 5.2%
GPT-4o | 44.7% | 8.7% | 5.2%
Mistral Large 3 | 34.6% | 9.3% | 5.2%

Get the full Swiss-Bench breakdown

HAAS dimension scores, per-language and per-domain comparisons with traffic-light classifications for all 10 models.

Swiss-Bench methodology and scoring criteria are documented on our Methodology page →

Our methodology, expert-verified ground truth, and statistical framework are described in our published arXiv paper (Uenal, 2026).

Need scores for YOUR domain? Our AI Model Evaluation runs Swiss-Bench against your specific use case. 5-model comparison, domain-specific scenarios, actionable recommendation.

Ready for an independent evaluation?

Start with an AI Model Evaluation or a full SOTA Model Sweep. Within two weeks you'll know which model works best for your Swiss use case.

Evaluation from CHF 8,000 · SOTA Sweep from CHF 20,000
contact@ai-helvetic.ch