Math Reasoning

Evaluating the Mathematical Reasoning Capabilities of Large Language Models: Limitations and Challenges

LLMs have made remarkable progress in fields such as natural language processing, question answering, and creative tasks, even demonstrating the ability to solve mathematical problems. Recently, OpenAI’s o1 model, which uses Chain-of-Thought (CoT) reasoning, has shown strong reasoning capabilities. However, the commonly used GSM8K dataset has long had a fixed set of questions …
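Chain-of-Thought prompting elicits intermediate reasoning steps by placing worked examples before the question in the prompt. A minimal sketch of this prompt format (the demonstration, question, and wording below are illustrative, not drawn from GSM8K or from o1's actual prompting):

```python
# Sketch of a few-shot Chain-of-Thought prompt for a math word problem.
# The demonstration and question are made up for illustration.

demonstration = (
    "Q: Tom has 4 boxes with 6 pens each. He gives away 5 pens. "
    "How many pens does he have left?\n"
    "A: Tom starts with 4 * 6 = 24 pens. "
    "After giving away 5, he has 24 - 5 = 19 pens. The answer is 19.\n"
)

question = (
    "Q: A shelf holds 7 rows of 9 books. 12 books are removed. "
    "How many books remain?\nA:"
)

# Full prompt: worked example(s) first, then the new question.
# A model continues the text after "A:", producing its own reasoning steps
# before the final answer, rather than emitting a bare number.
prompt = demonstration + "\n" + question
print(prompt)
```

The key design choice is that the demonstration shows the *reasoning trace*, not just the answer, which biases the model toward writing out intermediate arithmetic for the new question.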



Grade-School Math and the Hidden Reasoning Process

Models such as OpenAI’s GPT, Anthropic’s Claude, and Meta AI’s LLaMA currently achieve over 90% accuracy on the GSM8K dataset. But how do they accomplish this? Through memorization of the data and problems, or by genuinely understanding the content of the questions? GSM8K, short for “Grade School Math 8K,” comprises 8,000 math problems …
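GSM8K problems typically require a short chain of arithmetic steps rather than a single operation. A minimal illustration of that structure (the word problem below is made up, not taken from the dataset):

```python
# A GSM8K-style word problem solved with explicit intermediate steps,
# mirroring the step-by-step solutions the dataset provides.
# The problem and its numbers are illustrative, not from GSM8K itself.

problem = (
    "A baker makes 3 batches of 12 muffins each and sells 20 of them. "
    "How many muffins are left?"
)

# Step 1: total muffins baked
total = 3 * 12          # 36
# Step 2: subtract the muffins sold
remaining = total - 20  # 16

print(remaining)  # prints 16
```

Evaluating whether a model reproduces such intermediate steps correctly, rather than only checking the final number, is one way to probe memorization versus genuine reasoning.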

