
AI LLM Leaderboard

Compare the latest Large Language Models across multiple benchmarks and performance metrics

Total Models: 71 (latest models included)

Benchmarks: 8 (comprehensive tests)

Top Score: 99.0% (Kimi K2.5, HumanEval)

Fastest Model: 250 tokens per second (TPS), GPT-4o mini Realtime

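Model Rankings (71 models)

Cost/1K is the listed price in US dollars per 1,000 tokens, TPS is throughput in tokens per second, Context is the maximum context window in tokens, BBHard is BIG-Bench Hard, and N/A means no score is listed.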

| # | Model | Date | Organization | Category | Overall | MMLU | GPQA | MMMU | HellaSwag | HumanEval | BBHard | GSM8K | MATH | Cost/1K | TPS | Context | Badges |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Gemini 3.1 Pro | 2026-02 | Google | multimodal | 92.2% | 93.1% | 94.3% | 85.2% | N/A | N/A | N/A | N/A | 96.1% | $0.002 | 60 | 2M | New, Top GPQA, Best ARC-AGI-2, Best Value Frontier |
| 2 | GPT-5.4 | 2026-03 | OpenAI | reasoning | 91.6% | 93% | 92.8% | 83.5% | N/A | N/A | N/A | N/A | 97% | $0.0025 | 70 | 1M | New, Best Computer Use, Top OSWorld |
| 3 | Claude Opus 4.6 | 2026-02 | Anthropic | coding | 91.3% | 92.4% | 91.3% | 85.1% | N/A | N/A | N/A | N/A | 95.2% | $0.005 | 45 | 1M | New, Best SWE-Bench, Top Coding, 128K Output |
| 4 | Grok 4 | 2026-01 | xAI | reasoning | 90.2% | 92.7% | 84.6% | N/A | N/A | N/A | N/A | N/A | 93.3% | $0.002 | 75 | 128K | New, Best HLE, Multi-Agent, Real-time X Data |
| 5 | Kimi K2.5 | 2026-01 | Moonshot AI | coding | 89.5% | 92% | 87.6% | N/A | N/A | 99% | N/A | N/A | 98% | $0.0015 | 85 | 262K | New, Open Source, Best HumanEval Open, Top SWE-Bench Open |
| 6 | Claude Sonnet 4.6 | 2026-02 | Anthropic | reasoning | 88.8% | 91% | 88.5% | 82% | N/A | N/A | N/A | N/A | 93.5% | $0.003 | 80 | 1M | New, Best GDPval-AA, Near-Opus Performance |
| 7 | o1 | 2024-09 | OpenAI | reasoning | 88.4% | 92.3% | 78% | N/A | N/A | N/A | N/A | N/A | 94.8% | $0.06 | 15 | 128K | Top MMLU, Premium |
| 8 | GLM-5 | 2026-01 | Zhipu AI | coding | 88% | 92% | 86% | N/A | N/A | 96.5% | N/A | N/A | 94.5% | $0.001 | 80 | 200K | New, Open Source, MIT License, Best Chatbot Arena Open |
| 9 | Qwen 3.5 397B | 2026-02 | Alibaba | reasoning | 87.7% | 91.8% | 88.4% | N/A | N/A | 94.2% | N/A | N/A | 96.5% | $0.001 | 90 | 256K | New, Open Source, Apache 2.0, Top Open GPQA |
| 10 | MiniMax M2.5 | 2026-01 | MiniMax | coding | 87.5% | 90.8% | 85.2% | N/A | N/A | 94.5% | N/A | N/A | 96% | $0.0012 | 75 | 128K | New, Open Source, Best SWE-Bench Open |
| 11 | DeepSeek R1 | 2025-01 | DeepSeek | reasoning | 86.5% | 90.8% | 71.5% | N/A | N/A | N/A | N/A | N/A | 97.3% | $0.008 | 65 | 64K | New, Best MATH, Best Reasoning |
| 12 | o3-mini | 2024-12 | OpenAI | coding | 86% | 86% | 75% | N/A | N/A | 97% | N/A | N/A | N/A | $0.02 | 189 | 128K | New, Fastest TPS, Best HumanEval |
| 13 | Claude 3.7 Sonnet | 2024-11 | Anthropic | reasoning | 85.5% | 86.1% | 84.8% | 75% | N/A | N/A | N/A | N/A | 96.2% | $0.02 | 65 | 200K | New, Best MATH |
| 14 | o4-mini | 2024-12 | OpenAI | reasoning | 85.2% | N/A | 81.4% | 81.6% | N/A | N/A | N/A | N/A | 92.7% | $0.02 | 85 | 128K | New, Best MATH |
| 15 | Gemini 2.5 Pro | 2024-12 | Google | multimodal | 85.2% | 89.8% | 84% | 81.7% | N/A | N/A | N/A | N/A | N/A | $0.02 | 55 | 1M | New, Latest, High GPQA |
| 16 | o3 | 2024-12 | OpenAI | reasoning | 85% | N/A | 83.3% | 82.9% | N/A | N/A | N/A | N/A | 88.9% | $0.06 | 18 | 128K | New, Top MATH |
| 17 | o1-preview | 2024-09 | OpenAI | reasoning | 84.9% | 90.8% | 78.3% | N/A | N/A | N/A | N/A | N/A | 85.5% | $0.045 | 20 | 128K | Preview |
| 18 | DeepSeek V3.2 | 2025-12 | DeepSeek | coding | 84.2% | 90.5% | 72.1% | N/A | N/A | 91.5% | N/A | N/A | 95% | $0.00014 | 95 | 64K | New, Open Source, MIT License, Best Budget |
| 19 | DeepSeek V3 (0324) | 2025-03 | DeepSeek | coding | 84.2% | 89% | 68.4% | N/A | N/A | 87.5% | N/A | N/A | 92% | $0.004 | 95 | 64K | New, Open Source, Updated, MIT License |
| 20 | DeepSeek R1 Zero | 2025-01 | DeepSeek | reasoning | 83.8% | 88.4% | 67% | N/A | N/A | N/A | N/A | N/A | 95.9% | $0.005 | 60 | 64K | New, Open Source, RL-Only Training, MIT License |
| 21 | Llama 4 Behemoth | 2025-07 | Meta | reasoning | 83.4% | 91.5% | 74.2% | 78.3% | N/A | N/A | N/A | N/A | 89.5% | $0.01 | 25 | 256K | New, Open Source, Largest Llama, Top MMLU Open |
| 22 | Claude 3.5 Sonnet | 2024-06 | Anthropic | reasoning | 82.3% | 88.7% | 59.4% | 68.3% | 89% | 92% | 93.1% | 96.4% | 71.1% | $0.015 | 170 | 200K | User's Choice, Best GSM8K |
| 23 | GPT-4o | 2024-05 | OpenAI | multimodal | 82.2% | 88.7% | 53.6% | 69.1% | 94.2% | 90.2% | 91.3% | 89.8% | 76.6% | $0.015 | 85 | 128K | Least Latency, Multimodal |
| 24 | o1-mini | 2024-09 | OpenAI | coding | 81.9% | 85.2% | 60% | N/A | N/A | 92.4% | N/A | N/A | 90% | $0.025 | 45 | 128K | Good Coding |
| 25 | Gemini 2.0 Flash | 2024-12 | Google | multimodal | 81.8% | 87% | 59% | N/A | N/A | 91% | N/A | N/A | 90% | $0.01 | 110 | 1M | New, Fast, Good Performance |
| 26 | Claude Opus 4 | 2024-12 | Anthropic | reasoning | 81% | 88.8% | 83.3% | 76.5% | N/A | N/A | N/A | N/A | 75.5% | $0.045 | 45 | 200K | New, Latest, Premium |
| 27 | Gemini 2.0 Flash Thinking | 2025-02 | Google | reasoning | 80.7% | 85% | 70.3% | 73.8% | N/A | N/A | N/A | N/A | 93.5% | $0.0035 | 50 | 1M | New, Extended Thinking, Best Budget Reasoning |
| 28 | Grok 3 Mini | 2025-02 | xAI | reasoning | 80.7% | 83% | 69.7% | N/A | N/A | N/A | N/A | N/A | 89.5% | $0.003 | 100 | 128K | New, Extended Thinking, Cost-Effective |
| 29 | DeepSeek V3 | 2024-12 | DeepSeek | coding | 80.1% | 88.5% | 59.1% | N/A | N/A | 82.6% | N/A | N/A | 90.2% | $0.004 | 95 | 64K | New, Open Source, Good MATH |
| 30 | Claude 3.5 Sonnet v2 | 2025-02 | Anthropic | reasoning | 79.3% | 88.7% | 65% | 70.7% | N/A | 93.7% | N/A | N/A | 78.3% | $0.015 | 75 | 200K | New, Computer Use, Top SWE-Bench, Upgraded |
| 31 | Llama 3.1 405B | 2024-07 | Meta | reasoning | 78.9% | 88.6% | 51.1% | 64.5% | 87% | 89% | 81.3% | 96.8% | 73.8% | $0.015 | 35 | 128K | Open Source, Largest Open |
| 32 | Claude Sonnet 4 | 2024-12 | Anthropic | reasoning | 78.8% | 86.5% | 83.8% | 74.4% | N/A | N/A | N/A | N/A | 70.5% | $0.025 | 60 | 200K | New, Latest, High GPQA |
| 33 | Qwen 2.5-Max | 2025-02 | Alibaba | reasoning | 78.1% | 87% | 52.5% | N/A | N/A | 88% | N/A | N/A | 85% | $0.0016 | 95 | 128K | New, MoE Architecture, Top Chinese Open |
| 34 | GPT-4 Turbo | 2024-04 | OpenAI | reasoning | 77.6% | 86.5% | 48% | 63.1% | 94.2% | 90.2% | 87.6% | 91% | 72.2% | $0.03 | 45 | 128K | Highly Preferred, Balanced |
| 35 | Llama 3.1 Nemotron 70B | 2025-01 | NVIDIA | reasoning | 77.5% | 85% | 55.8% | N/A | N/A | 90% | N/A | N/A | 79% | $0.0035 | 70 | 128K | New, Open Source, RLHF Tuned, Top Arena |
| 36 | GPT-4o (2025) | 2025-05 | OpenAI | multimodal | 77.2% | 89.5% | 55% | 70.2% | N/A | 91.5% | N/A | N/A | 80% | $0.0125 | 90 | 128K | New, Updated, Best Voice, Image Gen |
| 37 | Claude 3 Opus | 2024-03 | Anthropic | reasoning | 77.2% | 86.8% | 50.4% | 59.4% | 95.4% | 84.9% | 86.8% | 95% | 60.1% | $0.045 | 45 | 200K | Premium, Complete Benchmarks |
| 38 | GPT-4.1 | 2024-11 | OpenAI | reasoning | 77.1% | 90.2% | 66.3% | 74.8% | N/A | N/A | N/A | N/A | N/A | $0.04 | 50 | 128K | New, Latest GPT |
| 39 | Gemini 2.0 Pro Experimental | 2024-12 | Google | multimodal | 77.1% | 79.1% | 64.7% | 72.7% | N/A | N/A | N/A | N/A | 91.8% | $0.015 | 60 | 1M | New, Experimental, Good MATH |
| 40 | Qwen 2.5 72B | 2025-01 | Alibaba | coding | 76.4% | 86.1% | 49% | N/A | N/A | 87.2% | N/A | N/A | 83.1% | $0.0009 | 88 | 128K | New, Open Source, Apache 2.0, Best Open Coding |
| 41 | Claude 3.7 Sonnet (Normal) | 2024-11 | Anthropic | reasoning | 76.3% | 83.2% | 68% | 71.8% | N/A | N/A | N/A | N/A | 82.2% | $0.015 | 85 | 200K | New, Balanced |
| 42 | Phi-4 | 2025-01 | Microsoft | reasoning | 75.9% | 84.8% | 56.1% | N/A | N/A | 82.6% | N/A | N/A | 80.4% | $0.0007 | 120 | 16K | New, Open Source, Best-in-Class 14B, STEM Strong |
| 43 | Llama 4 Maverick | 2024-12 | Meta | reasoning | 75.9% | 84.6% | 69.8% | 73.4% | N/A | N/A | N/A | N/A | N/A | $0.005 | 85 | 128K | New, Open Source |
| 44 | Llama 3.3 70B | 2024-10 | Meta | coding | 75.5% | 86% | 50.5% | N/A | N/A | 88.4% | N/A | N/A | 77% | $0.006 | 90 | 128K | Open Source, Good Coding |
| 45 | Mistral Large 2 | 2025-01 | Mistral AI | reasoning | 75.4% | 84% | 49.6% | N/A | N/A | 92% | N/A | N/A | 76% | $0.006 | 80 | 128K | New, Open Weights, Multilingual, Function Calling |
| 46 | Mistral Medium 3 | 2025-05 | Mistral AI | reasoning | 75.2% | 83.5% | 51% | N/A | N/A | 90% | N/A | N/A | 76.5% | $0.004 | 95 | 128K | New, Enterprise, Multilingual |
| 47 | GPT-4.1 mini | 2024-11 | OpenAI | reasoning | 75.1% | 87.5% | 65% | 72.7% | N/A | N/A | N/A | N/A | N/A | $0.015 | 95 | 128K | New, Cost-Effective |
| 48 | Grok-2 | 2024-08 | xAI | coding | 74.8% | 87.5% | 56% | 66.1% | N/A | 88.4% | N/A | N/A | 76.1% | $0.01 | 75 | 128K | Good Coding |
| 49 | Grok 3 | 2024-12 | xAI | reasoning | 74.3% | N/A | 75.4% | 73.2% | N/A | N/A | N/A | N/A | N/A | $0.012 | 70 | 128K | New, Latest |
| 50 | Gemini 1.5 Pro | 2024-02 | Google | multimodal | 73.6% | 81.9% | 46.2% | 62.2% | 92.5% | 71.9% | 84% | 91.7% | 58.5% | $0.0125 | 38 | 2M | Largest Context, Complete Benchmarks |
| 51 | Gemini 2.5 Flash Lite | 2024-12 | Google | multimodal | 71.8% | 84.5% | 66.7% | 72.9% | N/A | N/A | N/A | N/A | 63.1% | $0.01 | 75 | 1M | New, Latest |
| 52 | GPT-4 | 2023-03 | OpenAI | reasoning | 71.4% | 86.4% | 35.7% | 56.8% | 95.3% | 67% | 83.1% | 92% | 52.9% | $0.18 | 25 | 8K | Most Expensive, Classic |
| 53 | Claude 3.5 Haiku (2025) | 2025-04 | Anthropic | conversation | 70.3% | 73.5% | 43.2% | N/A | N/A | 90.5% | N/A | N/A | 74% | $0.004 | 150 | 200K | New, Updated, Fastest Claude, Computer Use |
| 54 | Llama 3.2 90B | 2024-09 | Meta | reasoning | 69.6% | 86% | 46.7% | 60.3% | N/A | N/A | N/A | 86.9% | 68% | $0.008 | 80 | 128K | Open Source |
| 55 | Command R+ (2025) | 2025-03 | Cohere | reasoning | 69.4% | 82.3% | 46% | N/A | N/A | 80.5% | N/A | N/A | 68.9% | $0.0025 | 85 | 128K | New, RAG Optimized, Tool Use, Enterprise |
| 56 | Claude 3 Sonnet | 2024-03 | Anthropic | reasoning | 69.1% | 79% | 40.4% | 53.1% | 89% | 73% | 82.9% | 92.3% | 43.1% | $0.012 | 90 | 200K | Balanced, Complete Benchmarks |
| 57 | Gemini 1.5 Flash | 2024-05 | Google | multimodal | 68.6% | 78.9% | 39.5% | 56.1% | 81.3% | 67.5% | 89.2% | 68.8% | 67.7% | $0.008 | 95 | 1M | Fast, Complete Benchmarks |
| 58 | Mistral Small 3 | 2025-03 | Mistral AI | conversation | 68.1% | 81.5% | 42% | N/A | N/A | 83% | N/A | N/A | 66% | $0.001 | 130 | 32K | New, Open Weights, Ultra Efficient, Apache 2.0 |
| 59 | Gemma 3 27B | 2025-03 | Google | multimodal | 67.9% | 78.4% | 42% | 68.5% | N/A | 79% | N/A | N/A | 71.5% | $0.0003 | 110 | 128K | New, Open Source, Multimodal, Apache 2.0 |
| 60 | GPT-4o mini | 2024-07 | OpenAI | conversation | 67.8% | 82% | 40.2% | 59.4% | N/A | 87.2% | N/A | N/A | 70.2% | $0.007 | 120 | 128K | Cost-Effective |
| 61 | Llama 4 Scout | 2024-12 | Meta | conversation | 67% | 74.3% | 57.2% | 69.4% | N/A | N/A | N/A | N/A | N/A | $0.0003 | 120 | 128K | New, Least Expensive, Open Source |
| 62 | Amazon Nova Pro | 2025-01 | Amazon | multimodal | 66.9% | 80% | 44% | 63.5% | N/A | 79% | N/A | N/A | 68% | $0.0008 | 100 | 300K | New, AWS Native, Multimodal, Cost-Effective |
| 63 | Claude 3.5 Haiku | 2024-11 | Anthropic | conversation | 66% | 65% | 41.6% | N/A | N/A | 88.1% | N/A | N/A | 69.2% | $0.005 | 140 | 200K | New, Fast, Cost-Effective |
| 64 | Claude 3 Haiku | 2024-03 | Anthropic | conversation | 65.3% | 75.2% | 33.3% | 50.2% | 85.9% | 75.9% | 73.7% | 88.9% | 38.9% | $0.004 | 160 | 200K | Fast, Complete Benchmarks |
| 65 | Phi-4 Mini | 2025-04 | Microsoft | conversation | 63.4% | 75.6% | 37.3% | N/A | N/A | 73% | N/A | N/A | 67.5% | $0.0001 | 200 | 16K | New, Open Source, Edge Deployable, 3.8B Params |
| 66 | GPT-4.1 nano | 2024-11 | OpenAI | conversation | 61.9% | 80.1% | 50.3% | 55.4% | N/A | N/A | N/A | N/A | N/A | $0.005 | 150 | 32K | New, Ultra Fast |
| 67 | Gemma 3 12B | 2025-03 | Google | conversation | 60.2% | 74.2% | 37% | 59.8% | N/A | 72% | N/A | N/A | 58% | $0.0001 | 160 | 128K | New, Open Source, Lightweight, On-Device |
| 68 | Amazon Nova Lite | 2025-01 | Amazon | conversation | 56.6% | 73% | 33% | 55% | N/A | 68% | N/A | N/A | 54% | $0.00006 | 180 | 300K | New, AWS Native, Ultra Fast, Cheapest Multimodal |
| 69 | o3-pro | 2025-01 | OpenAI | reasoning | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.08 | 20 | 128K | New, Upcoming |
| 70 | GPT-4o Realtime | 2024-10 | OpenAI | conversation | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.02 | 200 | 128K | Realtime, Voice |
| 71 | GPT-4o mini Realtime | 2024-10 | OpenAI | conversation | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | N/A | $0.008 | 250 | 128K | Realtime, Voice, Cost-Effective |

© 2026 WHISPERCHAT AI