Leaderboards

AI Leaderboards

Browse model rankings by use case, benchmark family, and operating constraints.

AI Leaderboards

Overall model intelligence across tracked signals.

LLM Leaderboard

General-purpose language models ranked by composite score.

Open LLM Leaderboard

Open-weight and self-hostable models.

Best AI for Coding

Repository, coding challenge, and agentic software tasks.

Best AI for Math

Formal and competition-style math performance.

Best AI for Image Generation

Vision and multimodal proxy coverage for image workflows.

Best AI for Writing

Long-form writing, editing, and style control.

Score: writing

Model analytics

Visual signals across quality, cost, latency, context, and provider coverage.

Charts: Capability profile, Price vs speed, Context window, Provider mix, Value index.

Benchmark heatmap

Category-normalized benchmark proxy scores by model.

Model	GPQA	MMLU	MMLU-Pro	AIME	MATH	HumanEval	MMMU	LiveCodeBench	SWE-Bench Verified
Claude 3.5 Sonnet	92	92	92	88	88	94	86	94	94
GPT-4o	90	91	91	89	89	88	92	88	88
Gemini 1.5 Pro	88	89	89	86	86	84	91	84	84
Llama 3.1 405B	88	88	88	86	86	85	61	85	85
Mistral Large 2	87	87	87	84	84	89	0	89	89

Rank	Model	Tags	Provider	Score	Coding	Math	Context	Price	Speed
1	Claude 3.5 Sonnet	coding, long-context, writing	Anthropic	92.1	94	88	200K	$3/$15	79 tok/s
2	GPT-4o	multimodal, fast, agent-ready	OpenAI	91.4	88	89	128K	$2.5/$10	104 tok/s
3	Gemini 1.5 Pro	long-context, multimodal, google	Google	89.2	84	86	2M	$3.5/$10.5	62 tok/s
4	Llama 3.1 405B	open-weight, self-hostable, reasoning	Meta	87.8	85	86	128K	$2.7/$2.7	41 tok/s
5	Mistral Large 2	multilingual, coding, agents	Mistral AI	86.9	89	84	128K	$2/$6	86 tok/s
6	DeepSeek-Coder V2	coding, open-weight, cost-efficient	DeepSeek	86.5	93	87	128K	$0.14/$0.28	72 tok/s
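
The page lists a "Value index" chart but does not say how it is computed. As a rough illustration only, the sketch below re-ranks the models in the table above by a hypothetical value index: the composite score divided by the mean of the two figures in the price column. Both the formula and the reading of the price column as an input/output pair are assumptions, not the site's method; `parse_price`, `value_index`, and `rank_by_value` are illustrative names.

```python
# Rows copied from the ranking table: (model, composite score, price, speed in tok/s).
ROWS = [
    ("Claude 3.5 Sonnet", 92.1, "$3/$15", 79),
    ("GPT-4o", 91.4, "$2.5/$10", 104),
    ("Gemini 1.5 Pro", 89.2, "$3.5/$10.5", 62),
    ("Llama 3.1 405B", 87.8, "$2.7/$2.7", 41),
    ("Mistral Large 2", 86.9, "$2/$6", 86),
    ("DeepSeek-Coder V2", 86.5, "$0.14/$0.28", 72),
]


def parse_price(price: str) -> tuple[float, float]:
    """Split a '$a/$b' string into two floats (assumed to be input and output price)."""
    first, second = price.split("/")
    return float(first.lstrip("$")), float(second.lstrip("$"))


def value_index(score: float, price: str) -> float:
    """Hypothetical metric: composite score divided by the mean of the two prices."""
    first, second = parse_price(price)
    return score / ((first + second) / 2)


def rank_by_value(rows):
    """Return model names ordered from highest to lowest hypothetical value index."""
    ordered = sorted(rows, key=lambda row: value_index(row[1], row[2]), reverse=True)
    return [name for name, _score, _price, _speed in ordered]


if __name__ == "__main__":
    for name in rank_by_value(ROWS):
        print(name)
```

Under this assumed formula the cheapest model moves to the top and the top-scoring model moves to the bottom, which shows how strongly a cost-weighted ranking can differ from the score-only ranking in the table.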

YesAllInAI - LLM rankings, benchmarks, and model intelligence