
Gemini 3 Deep Think

Google unveils Gemini 3 Deep Think, an upgraded AI reasoning mode with record-setting benchmark scores, including 84.6% on ARC-AGI-2 and gold medal-level performance in the 2025 International Math, Physics, and Chemistry Olympiads. The release sparked intense Hacker News discussion about the accelerating pace of AI development and the shifting goalposts for AGI. Amid the excitement over its raw capability, however, many users lament Gemini's inconsistent real-world usability and product experience.

Score: 729
Comments: 457
Highest rank: #2
Time on front page: 29h
First seen: Feb 12, 5:00 PM
Last seen: Feb 13, 10:00 PM
[Chart: rank over time]

The Lowdown

Google has announced Gemini 3 Deep Think, a major upgrade to its specialized reasoning mode, designed to tackle complex challenges in science, research, and engineering. The model, developed in collaboration with scientists, aims to move beyond abstract theory to practical applications by handling messy, incomplete data and problems without clear guardrails.

  • Unprecedented Benchmarks: Deep Think sets new records across rigorous academic benchmarks, achieving 84.6% on ARC-AGI-2 (ARC Prize Foundation verified) and 48.4% on Humanity's Last Exam (without tools). It also reached an Elo of 3455 on Codeforces and gold medal-level performance in the 2025 International Math, Physics, and Chemistry Olympiads.
  • Real-World Applications: Early testers have used Deep Think to identify a logical flaw in a mathematics paper, optimize crystal growth methods for semiconductor materials, and speed up the design of physical components, taking them from sketches to 3D-printable models.
  • Accessibility: The updated Deep Think is available now to Google AI Ultra subscribers in the Gemini app. Researchers, engineers, and enterprises can also register interest in an early access program to use Deep Think through the Gemini API.

This release continues Google's push to advance AI in domains that demand deep reasoning and problem-solving, such as science, research, and engineering, with access starting with Ultra subscribers and an API early access program.

The Gossip

Benchmark Breakthroughs & AGI Buzz

This theme captures the excitement and skepticism surrounding Gemini 3 Deep Think's benchmark scores, particularly its 84.6% on ARC-AGI-2. Commenters debate whether these results signal a genuine leap toward Artificial General Intelligence or merely reflect "benchmarkmaxxing," while others accuse skeptics of continually "moving the goalposts." The rapid pace of AI model releases also fuels discussion about an impending "singularity" and what "general intelligence" really means.

Google's Gemini: Power vs. Practicality

Many users acknowledge Gemini's impressive underlying model capabilities and benchmark achievements but express frustration with Google's product execution. Critics point to poor user experience, inconsistent API behavior, weak context retention, and trouble with tool calling and instruction following. The common sentiment is that while Google's AI models are powerful in theory, in practice they often fall short of competitors such as ChatGPT or Claude.

The AI Labor Landscape: Jobs & Jeopardy

The rapid advancement of AI models like Gemini 3 Deep Think prompts significant discussion and concern about the future of human labor. Commenters ponder job displacement, particularly in skilled fields like software engineering, and the potential for lower wages. Many fear that efficiency gains will flow disproportionately to capital owners and lead to social instability, and many are skeptical of utopian visions of universal basic income.