AI Leaderboards
Browse model rankings by use case, evaluation family, and operating constraints.
AI Leaderboards
Overall model intelligence across tracked signals.
Score: score
LLM Leaderboard
General-purpose language models ranked by composite score.
Score: score
Open LLM Leaderboard
Open-weight and self-hostable models.
Score: score
Best AI for Coding
Repository, coding challenge, and agentic software tasks.
Score: coding
Best AI for Math
Formal and competition-style math performance.
Score: math
Best AI for Image Generation
Vision and multimodal proxy coverage for image workflows.
Score: vision
Best AI for Writing
Long-form writing, editing, and style control.
Score: writing
Model analytics
Visual signals across quality, cost, latency, context, and provider coverage.
Capability profile
Price vs speed
Context window
Provider mix
Value index
Benchmark heatmap
Category-normalized benchmark proxy scores by model.
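| Model | GPQA | MMLU | MMLU-Pro | AIME | MATH | HumanEval | MMMU | LiveCodeBench | SWE-Bench Verified |
|---|---|---|---|---|---|---|---|---|---|
| Claude 3.5 Sonnet | 92 | 92 | 92 | 88 | 88 | 94 | 86 | 94 | 94 |
| GPT-4o | 90 | 91 | 91 | 89 | 89 | 88 | 92 | 88 | 88 |
| Gemini 1.5 Pro | 88 | 89 | 89 | 86 | 86 | 84 | 91 | 84 | 84 |
| Llama 3.1 405B | 88 | 88 | 88 | 86 | 86 | 85 | 61 | 85 | 85 |
| Mistral Large 2 | 87 | 87 | 87 | 84 | 84 | 89 | 0 | 89 | 89 |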
| Rank | Model (tags) | Provider | Score | Coding | Math | Context | Price (input/output, per 1M tokens) | Speed |
|---|---|---|---|---|---|---|---|---|
| 1 | Claude 3.5 Sonnet (coding, long-context, writing) | Anthropic | 92.1 | 94 | 88 | 200K | $3/$15 | 79 tok/s |
| 2 | GPT-4o (multimodal, fast, agent-ready) | OpenAI | 91.4 | 88 | 89 | 128K | $2.5/$10 | 104 tok/s |
| 3 | Gemini 1.5 Pro (long-context, multimodal, google) | Google | 89.2 | 84 | 86 | 2M | $3.5/$10.5 | 62 tok/s |
| 4 | Llama 3.1 405B (open-weight, self-hostable, reasoning) | Meta | 87.8 | 85 | 86 | 128K | $2.7/$2.7 | 41 tok/s |
| 5 | Mistral Large 2 (multilingual, coding, agents) | Mistral AI | 86.9 | 89 | 84 | 128K | $2/$6 | 86 tok/s |
| 6 | DeepSeek-Coder V2 (coding, open-weight, cost-efficient) | DeepSeek | 86.5 | 93 | 87 | 128K | $0.14/$0.28 | 72 tok/s |
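
The page lists a "Value index" view but does not say how it is computed. As a rough sketch only, the snippet below assumes a value index means composite score divided by a blended price, with the blend weighting output tokens at 25%. Both the formula and the `output_share` parameter are hypothetical and are not the site's definition; the scores and prices are copied from the ranking table as listed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    model: str
    score: float      # composite score from the ranking table
    price_in: float   # input price as listed in the table
    price_out: float  # output price as listed in the table


ROWS = [
    Row("Claude 3.5 Sonnet", 92.1, 3.0, 15.0),
    Row("GPT-4o", 91.4, 2.5, 10.0),
    Row("Gemini 1.5 Pro", 89.2, 3.5, 10.5),
    Row("Llama 3.1 405B", 87.8, 2.7, 2.7),
    Row("Mistral Large 2", 86.9, 2.0, 6.0),
    Row("DeepSeek-Coder V2", 86.5, 0.14, 0.28),
]


def blended_price(row: Row, output_share: float = 0.25) -> float:
    """Hypothetical blend: weighted mean of input and output price."""
    return (1 - output_share) * row.price_in + output_share * row.price_out


def value_index(row: Row) -> float:
    """Hypothetical value index: composite score per blended dollar."""
    return row.score / blended_price(row)


# Highest value first under the assumed formula.
ranked = sorted(ROWS, key=value_index, reverse=True)

for r in ranked:
    print(f"{r.model:20s} {value_index(r):8.1f}")
```

Under these assumptions the ordering is driven almost entirely by price, since the composite scores span only a few points while the listed prices differ by more than an order of magnitude. A different `output_share` or a score-weighted formula would change the ranking.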