Meta’s Llama 4 is mindblowing… but did it cheat?

Updated: April 26, 2025

Fireship


Summary

Meta recently released the Llama 4 family of large language models, with a context window of up to 10 million tokens, and it quickly rose to the top of the LM Arena leaderboard. Controversy followed when it emerged that Meta had gamed the rankings by submitting a fine-tuned model, which sparked skepticism in the community. The Llama 4 submission was also found to be out of line with the policies expected of model providers, causing a stir in the AI industry. Separately, an internal memo from the CEO of Shopify addressed skepticism toward AI and stressed the need for continuous learning in this rapidly evolving field. The video also compares open models like Scout with closed models like Gemini on benchmarks and real-world applications, and introduces Augment, presented as the first AI agent for large-scale code tasks, which aims to improve code quality and productivity.


Meta's Unleashed Model

Meta unleashed the Llama 4 family of large language models, with a context window of up to 10 million tokens, and it dominated the LM Arena leaderboard.

Cheesing the LM Arena

Meta found a way to manipulate the LM Arena rankings with a fine-tuned model, leading to controversy.

Llama 4 Incident

Llama 4's LM Arena submission was not in line with the policies expected of model providers, causing a stir in the community.

AI Doubting Boomers

An internal memo from the CEO of Shopify addresses the skepticism towards AI and the need for continuous learning.

Open Models vs. Closed Models

Open models like Scout are compared with closed models like Gemini on benchmarks and in real-world applications.

Augment Code Engine

An introduction to Augment, presented as the first AI agent for large-scale code tasks, which aims to enhance code quality and productivity.


FAQ

Q: What is the significance of the Llama 4 family of large language models and its 10-million-token context window?

A: The Llama 4 family offers a context window of up to 10 million tokens and dominated the LM Arena leaderboard.

Q: What controversy arose related to Meta's manipulation of the LM Arena rankings?

A: Meta found a way to manipulate the LM Arena rankings with a fine-tuned model, leading to controversy.

Q: What incident caused a stir in the community regarding Llama 4?

A: Llama 4's LM Arena submission was not in line with the policies expected of model providers, which caused a stir in the community.

Q: What does the internal memo from the CEO of Shopify address?

A: The internal memo from the CEO of Shopify addresses the skepticism towards AI and the need for continuous learning.

Q: How do open models like Scout compare with closed models like Gemini?

A: The comparison covers their performance on benchmarks and in real-world applications.

Q: What is Augment, and what role does it play in large-scale code tasks?

A: Augment is presented as the first AI agent for large-scale code tasks, aiming to enhance code quality and productivity.
