4Mathematics -- Search, Discover, Learn Mathematics

News

BenchLM
benchlm.ai > compare > gpt-5-4-vs-gpt-oss-120b

GPT-5.4 vs GPT-OSS 120B: Benchmarks, Pricing, Speed (July 2026)

13+ hour, 36+ min ago (340+ words) Head-to-head evidence from 20 shared benchmark results across 6 categories. Overall scores shown here use BenchLM's provisional ranking lane. Evidence parity. GPT-5.4 and GPT-OSS 120B share 20 comparable benchmark results. 0 of 8 categories are comparable. 34 results are unique to GPT-5.4; 8 to GPT-OSS 120B. Benchmark…


BenchLM
benchlm.ai > benchmarks > frontier Math V2 Tiers13

FrontierMath v2 (Tiers 1-3) Leaderboard & Scores - July 2026

2+ day, 17+ hour ago (292+ words) As of July 9, 2026, GPT-5.6 Sol leads the FrontierMath v2 (Tiers 1-3) leaderboard with 89.000%, followed by GPT-5.6 Terra (84.900%) and GPT-5.6 Luna (78.600%). According to BenchLM.ai, GPT-5.6 Sol leads the FrontierMath v2 (Tiers 1-3) benchmark with a score of 89.000%, followed by GPT-5.6 Terra (84.900%) and…


BenchLM
benchlm.ai > benchmarks > frontier Math

FrontierMath Benchmark 2026: 4 LLM scores

2+ mon, 17+ hour ago (239+ words) As of May 11, 2026, GPT-5.5 Pro leads the FrontierMath leaderboard with 52.4%, followed by GPT-5.5 (51.7%) and GPT-5.4 Pro (50%). According to BenchLM.ai, GPT-5.5 Pro leads the FrontierMath benchmark with a score of 52.4%, followed by GPT-5.5 (51.7%) and GPT-5.4 Pro (50%). The top…


BenchLM
benchlm.ai > multimodal-grounded

Multimodal & Grounded Benchmarks 2026: MMMU, OfficeQA, CharXiv

2+ mon, 4+ day ago (558+ words) Bottom line: Multimodal is one of the fastest-evolving categories. Models that can read screenshots, charts, and documents are essential for enterprise copilots. BenchLM summaries for multimodal & grounded plus the practical tradeoffs users check next: open weights, price, speed, latency,…


BenchLM
benchlm.ai > benchmarks > swe Verified

SWE-bench Verified Benchmark 2026: 44 LLM scores

3+ mon, 4+ day ago (267+ words) As of May 1, 2026, Claude Mythos Preview leads the SWE-bench Verified leaderboard with 93.9%, followed by Claude Opus 4.7 (Adaptive) (87.6%) and GPT-5.3 Codex (85%). According to BenchLM.ai, Claude Mythos Preview leads the SWE-bench Verified benchmark with a score of 93.9%, followed…
