Is Sakana AI's Fugu multi-LLM orchestrator a genuine frontier-matching breakthrough, or does it contain false claims and underperform in practice?
Experts disagree — here's where each side lands:
Yes — legitimate frontier-matching system
Fugu routes requests across swappable model pools, which supporters say lets it match frontier models while sidestepping export-control risks. They see this as a real engineering advance.
No — contains verifiable false claims
Critics say Sakana AI's Fugu Ultra announcement contains a verifiably false claim, and they compare it to the company's earlier 'AI Scientist' papers, which they describe as fraudulent.
No — underperforms frontier models in real use
Fugu's ensemble router reportedly takes 30+ minutes per coding test and underperforms frontier models such as Claude in real-world use.
What to do: Independently verify Fugu's benchmark claims before adopting it for production workloads; the latency and credibility concerns are concrete, while the capability claims remain contested.
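One way to check both the capability and latency claims yourself is a small harness that runs the same tasks through each candidate system and records pass rate and wall-clock time. The sketch below is a minimal illustration, not Sakana AI's evaluation method: `evaluate`, `summarize`, `Result`, and the stub solver are hypothetical names, and `solve` stands in for whatever client call you would actually make to Fugu or to a frontier model.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Result:
    task_id: str
    passed: bool
    seconds: float


def evaluate(
    solve: Callable[[str], str],
    tasks: Iterable[tuple[str, str, Callable[[str], bool]]],
) -> list[Result]:
    """Run each (task_id, prompt, check) through `solve`, timing only the solve call."""
    results = []
    for task_id, prompt, check in tasks:
        start = time.perf_counter()
        try:
            answer = solve(prompt)  # hypothetical: replace with a real model call
            error = False
        except Exception:
            answer, error = "", True
        elapsed = time.perf_counter() - start
        # A failed call counts as a failed task rather than aborting the run.
        passed = (not error) and bool(check(answer))
        results.append(Result(task_id, passed, elapsed))
    return results


def summarize(results: list[Result]) -> dict:
    """Aggregate pass rate and worst-case latency over a run."""
    if not results:
        return {"tasks": 0, "pass_rate": 0.0, "max_seconds": 0.0}
    return {
        "tasks": len(results),
        "pass_rate": sum(r.passed for r in results) / len(results),
        "max_seconds": max(r.seconds for r in results),
    }


# Stub solver and toy tasks, used only to show the harness end to end.
def stub_solver(prompt: str) -> str:
    return "4" if prompt == "2+2" else "unknown"


demo_tasks = [
    ("add", "2+2", lambda a: a.strip() == "4"),
    ("mul", "3*3", lambda a: a.strip() == "9"),
]

if __name__ == "__main__":
    print(summarize(evaluate(stub_solver, demo_tasks)))
```

Running the same task list against each system, with your own workload rather than a published benchmark, gives a like-for-like comparison of pass rate and per-task latency.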