Real A/B Tests
AI Guessing vs Researched Facts
Same questions. Same models. Different knowledge sources.
See the proof.
2 detailed tests
110 questions tested
3 models tested
50× faster than web search
20× fewer tokens used
100% correct on Bun benchmark
We're not claiming perfection—no one in AI can. But when every answer is researched from official sources, your AI stops guessing and starts knowing.
Detailed Test Results
Click through to see full methodology and every question tested
Bun 1.3 Runtime
December 2025
Advanced Bun questions covering Redis, S3, WebSocket, SQLite, Workers, and more. Tests both training data and web search capabilities.
AgentsKB: 50/50
With Web Search: 47-49/50
Training Only: 6-21/50
All three models got the same question wrong with web search - they misread the Redis URL priority order from the docs.
Models tested:
Opus 4.5, Sonnet 4.5, Haiku 4.5 | 50 questions
PostgreSQL: Basics + PG 17/18
December 2025
50 questions on PostgreSQL basics (defaults, limits, internals) + 10 questions on PostgreSQL 17/18 features released after model training cutoffs.
AgentsKB: 60/60
With Web Search: 58-59/60
Training Only: 44-53/60
On basics: 84-98% accurate. On post-cutoff PG 17/18 features: 20-40%. All models hallucinated the same nonexistent utility (pg_overwritecontrolfile).
Models tested:
Opus 4.5, Sonnet 4.5, Haiku 4.5 | 60 questions
The Core Insight
Why pre-researched answers beat real-time search
Web Search (Mid-Task)
- Model is distracted: it's solving your problem, not researching deeply
- Grabs the first reasonable result and moves on
- Can misread docs (like the Bun Redis URL priority order)
- ~10 seconds, 10-15K tokens per search
AgentsKB (Pre-Researched)
- Research done with full focus on each question
- Multiple sources consulted and synthesized
- Interpretation done once, correctly
- <1s, ~500 tokens per answer
It's not a tradeoff: pre-researched answers are more accurate AND faster AND cheaper.
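A minimal sketch of the two paths, for illustration only: the Map-based knowledge base, the placeholder search endpoint, and all function names below are assumptions, not the real AgentsKB API.

```ts
// Hypothetical comparison of the two lookup paths described above.

type Answer = { text: string; source: string };

// Pre-researched path: answers built ahead of time, keyed by question.
// One cheap local lookup instead of a mid-task search.
const kb = new Map<string, Answer>([
  [
    "bun redis url priority",
    {
      text: "Researched once from the official docs and stored with its source.",
      source: "https://bun.sh/docs", // placeholder source URL
    },
  ],
]);

function lookupPreResearched(question: string): Answer | undefined {
  // Naive normalization; a real KB would use proper retrieval.
  return kb.get(question.toLowerCase().trim());
}

// Mid-task path: pause the task, fire a search, and interpret whatever
// page comes back (slower, token-heavy, easy to misread).
async function lookupViaWebSearch(question: string): Promise<Answer> {
  const res = await fetch(
    "https://example.com/search?q=" + encodeURIComponent(question), // placeholder endpoint
  );
  const page = await res.text();
  return { text: page.slice(0, 500), source: "first search result" };
}

// Prefer the pre-researched answer; fall back to live search only on a miss.
async function answer(question: string): Promise<Answer> {
  return lookupPreResearched(question) ?? (await lookupViaWebSearch(question));
}

answer("bun redis url priority").then((a) => console.log(a.text, "|", a.source));
```

The design point the sketch is meant to show: interpretation happens once, when the knowledge base is built, so the agent's hot path is a lookup rather than a read-and-interpret step.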