YesAllInAI
Independent AI model intelligence

The data and trust layer for global AI models.

YesAllInAI tracks model quality, price, context, speed, benchmark movement, community evidence, and ecosystem routing signals for teams choosing what to build with next.

Tracked models: 6
Benchmarks: 9
Open models: 1
Data sources: 7
Live model board
Composite score: quality, cost, context, speed, arena
Rank | Model | Provider · Category | Score | Context | Price (USD per 1M input/output tokens) | Signal
#1 | Claude 3.5 Sonnet | Anthropic · Coding | 92.1 | 200K | $3/$15 | verified
#2 | GPT-4o | OpenAI · Frontier | 91.4 | 128K | $2.5/$10 | verified
#3 | Gemini 1.5 Pro | Google · Frontier | 89.2 | 2M | $3.5/$10.5 | verified
#4 | Llama 3.1 405B | Meta · Open | 87.8 | 128K | $2.7/$2.7 | verified
#5 | Mistral Large 2 | Mistral AI · Open | 86.9 | 128K | $2/$6 | verified
Coding leader: Claude 3.5 Sonnet (coding score 94)
Best value: DeepSeek-Coder V2 (highest score per blended dollar)
Lowest-cost watchlist: DeepSeek-Coder V2, Llama 3.1 405B, Mistral Large 2, GPT-4o (low-cost routing candidates)
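The "best value" and "lowest-cost" picks can be reproduced from the leaderboard prices. Below is a minimal Python sketch. It assumes a blended price that weights input and output tokens 3:1 (a common convention, not one the board states) and uses the composite score as the numerator. The function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    score: float         # composite score from the leaderboard
    input_price: float   # USD per 1M input tokens
    output_price: float  # USD per 1M output tokens


# Figures copied from the leaderboard on this page.
MODELS = [
    Model("Claude 3.5 Sonnet", 92.1, 3.0, 15.0),
    Model("GPT-4o", 91.4, 2.5, 10.0),
    Model("Gemini 1.5 Pro", 89.2, 3.5, 10.5),
    Model("Llama 3.1 405B", 87.8, 2.7, 2.7),
    Model("Mistral Large 2", 86.9, 2.0, 6.0),
    Model("DeepSeek-Coder V2", 86.5, 0.14, 0.28),
]


def blended_price(m: Model, input_weight: float = 3.0, output_weight: float = 1.0) -> float:
    """Weighted average of input and output prices (assumed 3:1 input:output mix)."""
    total = input_weight * m.input_price + output_weight * m.output_price
    return total / (input_weight + output_weight)


def score_per_dollar(m: Model) -> float:
    """Composite score divided by blended price per 1M tokens."""
    return m.score / blended_price(m)


best_value = max(MODELS, key=score_per_dollar)
cheapest = sorted(MODELS, key=blended_price)[:4]

print(best_value.name)
print([m.name for m in cheapest])
```

With an even 1:1 weighting the same four models still come out cheapest, so the watchlist does not hinge on the 3:1 assumption.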

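Model visual analytics

Charts covering quality, cost, latency, context, and provider coverage.

Capability profile
Price / speed
Context window
Provider distribution
Price-performance index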

LLM intelligence leaderboard

Rankings built for selection, routing, procurement, and community verification.

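All models
Rank | Model | Tags | Provider | Score | Code | Math | Context | Price (USD per 1M input/output tokens) | Speed
1 | Claude 3.5 Sonnet | coding, long-context, writing | Anthropic | 92.1 | 94 | 88 | 200K | $3/$15 | 79 tok/s
2 | GPT-4o | multimodal, fast, agent-ready | OpenAI | 91.4 | 88 | 89 | 128K | $2.5/$10 | 104 tok/s
3 | Gemini 1.5 Pro | long-context, multimodal, google | Google | 89.2 | 84 | 86 | 2M | $3.5/$10.5 | 62 tok/s
4 | Llama 3.1 405B | open-weight, self-hostable, reasoning | Meta | 87.8 | 85 | 86 | 128K | $2.7/$2.7 | 41 tok/s
5 | Mistral Large 2 | multilingual, coding, agents | Mistral AI | 86.9 | 89 | 84 | 128K | $2/$6 | 86 tok/s
6 | DeepSeek-Coder V2 | coding, open-weight, cost-efficient | DeepSeek | 86.5 | 93 | 87 | 128K | $0.14/$0.28 | 72 tok/s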

Benchmark coverage

Every score should move toward source-linked, reviewable, versioned evidence.

Benchmark | Description | Category | Leading model | Best score | Updated
GPQA | Graduate-level science reasoning benchmark. | Reasoning | Claude 3.5 Sonnet | 67.2% | 2026-05-01
MMLU | Massive multitask language understanding. | Knowledge | GPT-4o | 88.7% | 2026-05-01
MMLU-Pro | Harder MMLU variant with ten answer options instead of four. | Knowledge | Claude 3.5 Sonnet | 78.1% | 2026-05-01
AIME | Competition math reasoning. | Math | GPT-4o | 76.4% | 2026-05-01
MATH | Multi-level mathematical problem solving. | Math | Llama 3.1 405B | 73.8% | 2026-05-01
HumanEval | Python function synthesis tasks. | Coding | DeepSeek-Coder V2 | 90.2% | 2026-05-01

Trust layer for AI model data

YesAllInAI is designed to combine third-party benchmarks, crawler evidence, reviewer approval, and community submissions before data reaches public rankings.

Verified evals

Official sources, benchmark links, reviewer approval, and versioned scoring.

Crawler + curator

Automated source adapters feed a human review queue before public release.

Community proof

Users submit corrections, model launches, eval notes, and source evidence.

Incentive ready

Contribution reputation lays the groundwork for future Web3 rewards.
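The flow described above, in which crawler output and community submissions wait in a review queue and only reviewer-approved, source-linked entries reach public rankings, can be sketched as a small state machine. This is a hypothetical Python model, not the YesAllInAI implementation. The `Submission` and `ReviewQueue` names, the status values, and the rule that approval requires a source link are all assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class Submission:
    model: str
    benchmark: str
    value: float
    source_url: Optional[str]  # link to the evidence (assumed required for approval)
    origin: str                # "crawler" or "community"
    status: Status = Status.PENDING


@dataclass
class ReviewQueue:
    items: List[Submission] = field(default_factory=list)

    def submit(self, s: Submission) -> None:
        # Crawler output and community submissions enter the same queue as PENDING.
        self.items.append(s)

    def review(self, s: Submission, approve: bool) -> None:
        # Approval only sticks when the entry carries a source link;
        # otherwise the entry is rejected.
        s.status = Status.APPROVED if approve and s.source_url else Status.REJECTED

    def public(self) -> List[Submission]:
        # Only approved entries are released to the public rankings.
        return [s for s in self.items if s.status is Status.APPROVED]


# Usage: one sourced crawler entry, one community entry without a source link.
q = ReviewQueue()
a = Submission("GPT-4o", "MMLU", 88.7, "https://example.com/eval", "crawler")
b = Submission("GPT-4o", "AIME", 76.4, None, "community")
q.submit(a)
q.submit(b)
q.review(a, approve=True)
q.review(b, approve=True)
print([s.benchmark for s in q.public()])
```

Keeping the review decision separate from submission means new source adapters can feed the queue without any path to publish directly.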

Benchmark heatmap

Category-normalized benchmark proxy scores by model.

Model | GPQA | MMLU | MMLU-Pro | AIME | MATH | HumanEval | MMMU | LiveCodeBench | SWE-Bench Verified
Claude 3.5 Sonnet | 92 | 92 | 92 | 88 | 88 | 94 | 86 | 94 | 94
GPT-4o | 90 | 91 | 91 | 89 | 89 | 88 | 92 | 88 | 88
Gemini 1.5 Pro | 88 | 89 | 89 | 86 | 86 | 84 | 91 | 84 | 84
Llama 3.1 405B | 88 | 88 | 88 | 86 | 86 | 85 | 61 | 85 | 85
Mistral Large 2 | 87 | 87 | 87 | 84 | 84 | 89 | 0 | 89 | 89
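In the heatmap rows, the AIME and MATH cells match each model's Math score on the leaderboard, and the HumanEval, LiveCodeBench, and SWE-Bench Verified cells match its Code score. The Python sketch below reproduces those cells by looking up one category score per benchmark. It is inferred from the numbers only, not a documented YesAllInAI method; the mapping and function name are hypothetical. GPQA, MMLU, MMLU-Pro, and MMMU are left out because they do not track a single leaderboard column.

```python
from typing import Dict

# Hypothetical mapping from benchmark to the leaderboard column it appears to mirror.
BENCHMARK_CATEGORY = {
    "AIME": "math",
    "MATH": "math",
    "HumanEval": "code",
    "LiveCodeBench": "code",
    "SWE-Bench Verified": "code",
}

# Code and Math scores copied from the leaderboard.
CATEGORY_SCORES = {
    "Claude 3.5 Sonnet": {"code": 94, "math": 88},
    "GPT-4o": {"code": 88, "math": 89},
    "Gemini 1.5 Pro": {"code": 84, "math": 86},
    "Llama 3.1 405B": {"code": 85, "math": 86},
    "Mistral Large 2": {"code": 89, "math": 84},
}


def proxy_row(model: str) -> Dict[str, int]:
    """Fill each benchmark cell with the model's score for that benchmark's category."""
    scores = CATEGORY_SCORES[model]
    return {bench: scores[cat] for bench, cat in BENCHMARK_CATEGORY.items()}


for name in CATEGORY_SCORES:
    print(name, proxy_row(name))
```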
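Yes Square (Yes广场)

AI model discussions from the community

See model opinions, routing strategies, and evaluation notes shared by nearby users, creators, and AI teams.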

YesAllInAI - LLM rankings, benchmarks, and model intelligence