DeepMind’s latest achievement is an AI system named AlphaProof, which can construct formal mathematical proofs. Working alongside a companion system, AlphaGeometry 2, it reached a level of performance comparable to a silver medalist at the 2024 International Mathematical Olympiad (IMO). The result is significant because it shows an AI system solving competition problems that demand multi-step mathematical reasoning, something earlier computer systems had not achieved at this level.
The challenge is that computers excel at calculation but struggle with the logical reasoning and structural understanding that advanced mathematics requires. Humans, by contrast, can construct semi-formal or fully formal proofs, drawing on a deep understanding of mathematical structures. DeepMind’s goal was to build an AI that could approach this level of understanding.
To achieve this, DeepMind used Lean, a proof assistant and programming language that lets mathematicians write precise definitions and proofs in a formal language a computer can check. The team trained a large language model to translate mathematical statements from natural language into Lean, producing approximately 80 million formalized statements. This gave the AI a structured, machine-checkable environment to train in.
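To make the translation step concrete, here is a minimal Lean 4 sketch of what a formalized statement can look like. The theorem name and the statement are hypothetical illustrations, not taken from DeepMind’s dataset, and the proof assumes a recent Lean 4 toolchain in which the `omega` arithmetic tactic is built in. The `theorem` line is the kind of formal statement a translation model would produce; the term after `:=` is the proof a prover such as AlphaProof would have to find.

```lean
-- Informal statement: "For every natural number n, n + n is even."
-- Hypothetical formalization: there is some k with n + n = 2 * k.
theorem double_is_even (n : Nat) : ∃ k, n + n = 2 * k :=
  -- Witness k := n; `omega` closes the linear-arithmetic goal n + n = 2 * n.
  ⟨n, by omega⟩
```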
AlphaProof builds on the architecture of DeepMind’s AlphaZero, the system that mastered chess, Go, and shogi. The AI was trained to construct proofs in Lean, treating proof-writing as a game to be mastered through trial and error. During training, AlphaProof attempted to prove or disprove the statements in its database and learned from the outcomes, including its failures.
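In Lean, disproving a statement means proving its negation, so “prove or disprove” gives the search two possible goals for the same formal statement. The hypothetical example below, which again assumes a recent Lean 4 core library, disproves the false claim that every natural number is even by exhibiting the counterexample 1.

```lean
-- False claim: "Every natural number is even."
-- Disproof: prove the negation by specializing the claim to n = 1.
theorem not_every_nat_even : ¬ ∀ n : Nat, n % 2 = 0 := by
  intro h
  -- h 1 : 1 % 2 = 0, which `decide` refutes by evaluating 1 % 2 = 1.
  exact absurd (h 1) (by decide)
```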
The system combines two main components: a large neural network with billions of parameters and a tree search algorithm. The network learned to work within the Lean environment; it was rewarded for each statement it proved and penalized for each reasoning step it took, which encourages short, concise proofs. The tree search explored possible next steps in a proof, concentrating computational resources on the most promising branches rather than exhaustively trying every option.
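The interplay of rewards and search can be illustrated with a toy model. The sketch below is a heavily simplified stand-in, not DeepMind’s implementation: proof states are hypothetical integers, a “proof” is a sequence of two made-up tactics (`dec` and `halve`) that drives the state to 0, and a hand-written heuristic replaces the neural network’s estimate of how far a state is from a finished proof. It uses best-first (A*-style) search, a simpler relative of the Monte Carlo tree search that AlphaZero uses, and scores a finished proof as +1 minus an assumed per-step penalty, so shorter proofs earn more.

```python
import heapq
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical toy "proof state": a non-negative integer; reaching 0 closes the proof.
# Hypothetical "tactics": each returns the next state, or None if it does not apply.
TACTICS: Dict[str, Callable[[int], Optional[int]]] = {
    "dec": lambda n: n - 1 if n > 0 else None,
    "halve": lambda n: n // 2 if n > 0 and n % 2 == 0 else None,
}

STEP_PENALTY = 0.01  # assumed cost per reasoning step


def reward(proof: List[str]) -> float:
    """+1 for a finished proof, minus a penalty for every step it used."""
    return 1.0 - STEP_PENALTY * len(proof)


def estimated_steps_left(state: int) -> int:
    """Stand-in for a learned value estimate. Each tactic removes at most one
    binary digit, so this never overestimates the remaining steps."""
    return state.bit_length()


def search(start: int, budget: int = 10_000) -> Optional[List[str]]:
    """Best-first search: always expand the state whose (steps so far +
    estimated steps left) is smallest. Returns the tactic sequence or None."""
    # Frontier entries: (priority, tie-breaker, state, tactics applied so far).
    frontier: List[Tuple[int, int, int, List[str]]] = [
        (estimated_steps_left(start), 0, start, [])
    ]
    best_depth: Dict[int, int] = {start: 0}
    counter = 1
    while frontier and budget > 0:
        _, _, state, proof = heapq.heappop(frontier)
        budget -= 1
        if state == 0:
            return proof  # goal closed
        for name, tactic in TACTICS.items():
            nxt = tactic(state)
            if nxt is None:
                continue
            depth = len(proof) + 1
            # Skip states already reached by an equally short or shorter path.
            if depth >= best_depth.get(nxt, depth + 1):
                continue
            best_depth[nxt] = depth
            heapq.heappush(
                frontier,
                (depth + estimated_steps_left(nxt), counter, nxt, proof + [name]),
            )
            counter += 1
    return None  # budget exhausted or no proof found


if __name__ == "__main__":
    proof = search(10)
    print(proof, reward(proof) if proof is not None else None)
```

Calling `search(10)` returns a shortest five-step proof such as `['halve', 'dec', 'halve', 'halve', 'dec']`, which earns a reward of 0.95. The `budget` parameter caps how many states are expanded, a rough analogue of the finite compute a real prover can spend on each problem.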
However, AlphaProof has notable limitations. Humans had to translate the competition problems into Lean before the system could work on them, and it struggled with geometry problems, which were instead handled by a specialized AI called AlphaGeometry 2. AlphaProof’s computational requirements were also high, demanding hundreds of TPU-days per problem, which puts it out of reach for most research groups and aspiring mathematicians.
Despite these challenges, DeepMind is optimistic about AlphaProof’s potential. The team aims to make the system less resource-intensive and to open it to the research community, starting with an AlphaProof tool offered through a small trusted-testers program to assess how useful it is to mathematicians.
AlphaProof’s future looks promising, but it also raises questions about how to balance computational efficiency against depth of mathematical reasoning. As DeepMind continues to refine the system, the research community will be watching for its impact on advanced mathematics.