Mathematical reasoning
Current LLMs are not capable of mathematical reasoning at a college level or above. In particular, they are very bad at proving theorems.

The best-performing model combines an LLM with a rule-bound symbolic deduction engine:
  • AlphaGeometry
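
To make the idea of pairing a language model with a rule-bound deduction engine concrete, here is a minimal sketch. It is not AlphaGeometry's actual implementation: the rule format, the names deduce, prove and stub_proposer, and the toy facts are all assumptions made for illustration. The symbolic part forward-chains over fixed rules until nothing new follows; when it stalls, a proposer (standing in for the LLM) suggests one auxiliary fact and deduction resumes.

```python
from typing import Callable, FrozenSet, List, Optional, Set, Tuple

# A rule is (premises, conclusion): if all premises are known, add the conclusion.
# This representation is a simplifying assumption, not a real system's format.
Rule = Tuple[FrozenSet[str], str]


def deduce(facts: Set[str], rules: List[Rule]) -> Set[str]:
    """Rule-bound step: forward-chain until no rule adds anything new."""
    closure = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= closure and conclusion not in closure:
                closure.add(conclusion)
                changed = True
    return closure


def prove(
    facts: Set[str],
    rules: List[Rule],
    goal: str,
    propose: Callable[[Set[str]], Optional[str]],
    max_rounds: int = 5,
) -> bool:
    """Alternate symbolic deduction with proposals from a (hypothetical) model."""
    known = deduce(facts, rules)
    for _ in range(max_rounds):
        if goal in known:
            return True
        hint = propose(known)  # stand-in for an LLM suggesting an auxiliary step
        if hint is None or hint in known:
            return False  # the proposer has nothing new to offer
        known = deduce(known | {hint}, rules)
    return goal in known


# Toy problem: the goal is only reachable once the auxiliary fact "aux" is added.
RULES: List[Rule] = [
    (frozenset({"A", "B"}), "C"),
    (frozenset({"C", "aux"}), "goal"),
]
FACTS: Set[str] = {"A", "B"}


def stub_proposer(known: Set[str]) -> Optional[str]:
    """Hypothetical proposer: always suggests "aux" once, then gives up."""
    return "aux" if "aux" not in known else None


if __name__ == "__main__":
    print(prove(FACTS, RULES, "goal", stub_proposer))       # with proposals
    print(prove(FACTS, RULES, "goal", lambda known: None))  # deduction alone
```

In this toy setup the deduction engine alone cannot reach the goal, while the combined loop can. The division of labor is the point of the sketch: the engine only ever applies fixed rules, so every derived fact is justified, and the proposer is used only to suggest where to look next.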


The limitation in mathematical reasoning is considered so important that a 10 million dollar prize is attached to building a model that succeeds at Math Olympiad problems:

  • AI|MO Prize