YesAllInAI
Leaderboard

AI Leaderboard

Browse model rankings by use case, benchmark family, and operational constraints.

Model analytics

Visual signals across quality, cost, latency, context, and provider coverage.

Capability profile
Price vs speed
Context window
Provider mix
Value index

Benchmark heatmap

Category-normalized benchmark proxy scores by model.

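| Model | GPQA | MMLU | MMLU-Pro | AIME | MATH | HumanEval | MMMU | LiveCodeBench | SWE-Bench Verified |
|---|---|---|---|---|---|---|---|---|---|
| Claude 3.5 Sonnet | 92 | 92 | 92 | 88 | 88 | 94 | 86 | 94 | 94 |
| GPT-4o | 90 | 91 | 91 | 89 | 89 | 88 | 92 | 88 | 88 |
| Gemini 1.5 Pro | 88 | 89 | 89 | 86 | 86 | 84 | 91 | 84 | 84 |
| Llama 3.1 405B | 88 | 88 | 88 | 86 | 86 | 85 | 61 | 85 | 85 |
| Mistral Large 2 | 87 | 87 | 87 | 84 | 84 | 89 | 0 | 89 | 89 |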
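
| Rank | Model | Tags | Provider | Score | Code | Math | Context | Price | Speed |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet | coding, long-context, writing | Anthropic | 92.1 | 94 | 88 | 200K | $3/$15 | 79 tok/s |
| 2 | GPT-4o | multimodal, fast, agent-ready | OpenAI | 91.4 | 88 | 89 | 128K | $2.5/$10 | 104 tok/s |
| 3 | Gemini 1.5 Pro | long-context, multimodal, google | Google | 89.2 | 84 | 86 | 2M | $3.5/$10.5 | 62 tok/s |
| 4 | Llama 3.1 405B | open-weight, self-hostable, reasoning | Meta | 87.8 | 85 | 86 | 128K | $2.7/$2.7 | 41 tok/s |
| 5 | Mistral Large 2 | multilingual, coding, agents | Mistral AI | 86.9 | 89 | 84 | 128K | $2/$6 | 86 tok/s |
| 6 | DeepSeek-Coder V2 | coding, open-weight, cost-efficient | DeepSeek | 86.5 | 93 | 87 | 128K | $0.14/$0.28 | 72 tok/s |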
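
Browsing the ranking by use case and operational constraints can be sketched in a few lines of Python. This is only an illustration built on the six rows listed above, not the site's own API: the `ModelRow` type, its field names, and the `browse` function are hypothetical, and the sketch assumes that "K" and "M" in the Context column mean thousands and millions of tokens and that the two Price figures are simply a first and a second price as shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    """One ranking row. Field names are hypothetical, chosen for this sketch."""
    name: str
    provider: str
    score: float
    context_tokens: int   # assumes 200K = 200_000 and 2M = 2_000_000
    price_first: float    # first figure in the Price column
    price_second: float   # second figure in the Price column
    speed_tok_s: int
    tags: tuple[str, ...]


# Values copied from the ranking rows above.
ROWS = [
    ModelRow("Claude 3.5 Sonnet", "Anthropic", 92.1, 200_000, 3.0, 15.0, 79,
             ("coding", "long-context", "writing")),
    ModelRow("GPT-4o", "OpenAI", 91.4, 128_000, 2.5, 10.0, 104,
             ("multimodal", "fast", "agent-ready")),
    ModelRow("Gemini 1.5 Pro", "Google", 89.2, 2_000_000, 3.5, 10.5, 62,
             ("long-context", "multimodal", "google")),
    ModelRow("Llama 3.1 405B", "Meta", 87.8, 128_000, 2.7, 2.7, 41,
             ("open-weight", "self-hostable", "reasoning")),
    ModelRow("Mistral Large 2", "Mistral AI", 86.9, 128_000, 2.0, 6.0, 86,
             ("multilingual", "coding", "agents")),
    ModelRow("DeepSeek-Coder V2", "DeepSeek", 86.5, 128_000, 0.14, 0.28, 72,
             ("coding", "open-weight", "cost-efficient")),
]


def browse(rows, tag=None, min_context=0, max_price_second=None, min_speed=0):
    """Keep rows that satisfy every given constraint, best score first."""
    matches = [
        r for r in rows
        if (tag is None or tag in r.tags)
        and r.context_tokens >= min_context
        and (max_price_second is None or r.price_second <= max_price_second)
        and r.speed_tok_s >= min_speed
    ]
    return sorted(matches, key=lambda r: r.score, reverse=True)


if __name__ == "__main__":
    # Use case: models tagged "coding", ordered by overall score.
    for row in browse(ROWS, tag="coding"):
        print(row.name, row.score)
```

Each constraint is optional, so the same function covers a pure use-case query (`tag="coding"`), a pure operational one (`min_context=1_000_000`), or a combination of both.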
YesAllInAI - LLM rankings, benchmarks, and model intelligence