"AI tutoring"? Good Future's MathGPT scores 90 on an elementary school math test!

Author: Sohu Technology

Produced | Sohu Technology

Author | Pan Qiuxuan

Operations Editor | Liu Yujia

On August 24, during the live broadcast marking Good Future's 20th anniversary, CTO Tian Mi announced that MathGPT, the company's self-developed large model for mathematics at the 100-billion-parameter scale, had officially launched and opened for public testing.

According to Good Future, MathGPT is a large model for the vertical field of mathematics, aimed at math enthusiasts and research institutions worldwide. It is built around problem-solving and problem-explanation algorithms, and it is also the first large model in China built specifically for mathematics.

Does MathGPT mean that "AI tutoring" is now possible? In hands-on testing, Sohu Technology found that MathGPT can complete a primary school exam paper and give clear answers. In a side-by-side comparison with GPT-4 and Wenxin Yiyan on math word problems, MathGPT did better than Wenxin Yiyan but not as well as GPT-4.

As a vertical large model for mathematics within education, MathGPT breaks down the problem statement in more detail than general-purpose large models and flags the difficult points, which should also help students absorb the knowledge points.

Tian Mi said, "Good Future has 20 years of accumulation in mathematical data and business, a large body of educational data, and the ability to keep producing educational data, so we chose to do this difficult but right thing. Good Future hopes to use its years of accumulation in mathematics and AI to lay a solid mathematical foundation for the era of large AI models."

It can score 90 on a primary school math test

Previously, the large models from tech giants in China and abroad were nicknamed "liberal arts students": they performed well at translation, summarization, text comprehension, and dialogue generation, but produced nonsense when faced with logic and calculation problems.

The Good Future team pointed out that this "lopsidedness" of large models is determined by the characteristics of LLMs. As they put it earlier, "Large language models are trained on massive amounts of linguistic text, so they are best at language processing."

For this reason, Good Future's MathGPT combines a large language model with a calculation engine: the large language model is responsible for understanding the problem, analyzing it step by step, and calling the calculation engine at the appropriate steps, which improves the accuracy of its answers.
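The division of labor described above can be sketched roughly as follows. This is a minimal illustration, not MathGPT's actual implementation: the names `calc` and `solve_with_plan`, the plan format, and the notebook problem are all hypothetical. The hard-coded plan stands in for the language model's step-by-step analysis, and `calc` stands in for the calculation engine.

```python
import ast
import operator

# Arithmetic operators the toy "calculation engine" is allowed to evaluate.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calc(expression):
    """Evaluate a plain arithmetic expression without resorting to eval()."""

    def walk(node):
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
            return _BIN_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        raise ValueError("unsupported expression")

    return walk(ast.parse(expression, mode="eval").body)


def solve_with_plan(plan):
    """Walk through the plan, handing every 'calc' step to the calculation engine."""
    values, explanation = {}, []
    for kind, name, text in plan:
        if kind == "calc":
            # Fill in earlier results, then let the engine do the arithmetic.
            values[name] = calc(text.format(**values))
            explanation.append(f"{name} = {values[name]}")
        else:
            explanation.append(text)
    return values, explanation


# Hypothetical plan a language model might produce for the made-up problem:
# "A notebook costs 3.5 yuan. What do 12 notebooks cost after a 2 yuan discount?"
plan = [
    ("explain", None, "Multiply the unit price by the quantity."),
    ("calc", "subtotal", "3.5 * 12"),
    ("explain", None, "Then subtract the discount."),
    ("calc", "total", "{subtotal} - 2"),
]
values, explanation = solve_with_plan(plan)
print(values["total"])
```

The point of the split is that the language model only has to decide which expression to compute at each step; the arithmetic itself is done deterministically.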

The MathGPT homepage offers math problems at three difficulty levels: elementary school, junior high school, and high school. Problem types include finding patterns, logical analysis, calculation and solving, random phenomena, expression discussion, and more.

Users can pick different types of math problems for targeted practice, or ask MathGPT directly through the dialog box at the bottom. They can submit problems as text or images and get conversational feedback on the solution, or use the "random question" button to have the system generate a problem at random and answer it.

Sohu Technology took the first 10 questions of a randomly chosen primary school math exam paper, covering simple geometry, pattern finding, and arithmetic. MathGPT answered 9 of them correctly and laid out the logic and the points being tested for each question.

Function problems solved, equations not

According to the official MathGPT website, its mathematical ability covers elementary school, junior high school, and high school problems, and the question types include calculation problems, word problems, algebra problems, and many others.

Sohu Technology picked a set of high school math problems at random and submitted them to MathGPT as images.

Of the five questions, only the geometric figure in the third could not be converted directly into text. For the other four, MathGPT accurately converted the formulas and mathematical symbols into text.

In terms of accuracy, the answers to all four questions were correct and the solutions were complete, including an analysis of the problem statement, a detailed solution, and the knowledge points being tested, though for two of the questions the analysis was written in English.

However, when it came to solving equations, Sohu Technology entered two simple linear equations in one variable, asking for the value of x or y. For one of them MathGPT's answer was incorrect, and its solution method was also wrong and confusing.

For the second equation, MathGPT gave the correct result, but the reasoning in its solution process was wrong.

Word-problem ability: behind GPT-4, ahead of Wenxin Yiyan

According to the MathGPT technical report, across six public math benchmarks (CEval-Math, AGIEval-Math, APE5K, CMMLU-Math, Gaokao Mathematics, and Math401), Good Future's MathGPT achieved the highest score on several of them. MathGPT also performed well on the middle school and high school portions of the general C-Eval test set.

With that in mind, Sohu Technology selected three word problems of varying difficulty and tested MathGPT, GPT-4, and Wenxin Yiyan side by side.

Sohu Technology put the chickens-and-rabbits problem, which once stumped the "liberal arts student" large models, to all three models: "Chickens and rabbits are in the same cage, with 20 heads and 62 legs in total. How many chickens and how many rabbits are there?" All three answered correctly. Of the three, GPT-4's solution logic was more concise and easier to follow than MathGPT's.

(Wenxin Yiyan's answer to the chickens-and-rabbits problem)

(GPT-4's answer to the chickens-and-rabbits problem)

(MathGPT's answer to the chickens-and-rabbits problem)

At the same time, MathGPT's advantage is that it breaks the problem down through three modules, [Analysis], [Detailed Explanation], and [Finishing Touch], analyzing the problem statement and summarizing the knowledge points. It also offers a second method, the "Equation Solution".
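For reference, two common ways of solving the chickens-and-rabbits problem fit in a few lines. The sketch below is illustrative only and does not reproduce any model's output; it assumes the 62 in the problem statement counts legs, and the function names are made up for this example.

```python
def by_assumption(heads, legs):
    """Assumption method: pretend every animal is a chicken, then fix the shortfall."""
    missing_legs = legs - 2 * heads   # legs unaccounted for if all were chickens
    rabbits = missing_legs // 2       # each rabbit has 2 more legs than a chicken
    return heads - rabbits, rabbits


def by_equation(heads, legs):
    """Equation method: solve c + r = heads and 2c + 4r = legs by elimination."""
    chickens = (4 * heads - legs) // 2
    rabbits = (legs - 2 * heads) // 2
    # Reject inputs that have no whole-number, non-negative solution.
    if chickens < 0 or rabbits < 0 or 2 * chickens + 4 * rabbits != legs:
        raise ValueError("no valid solution")
    return chickens, rabbits


print(by_assumption(20, 62))  # (9, 11): 9 chickens and 11 rabbits
print(by_equation(20, 62))    # (9, 11)
```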

The second question is a word problem solved by setting an unknown x and writing a linear equation in one variable. The problem reads: "Jason throws a stone at the surface of the pond. The stone bounced three times on the surface of the pond. If the second bounce is half the distance of the first bounce, the third bounce is a quarter of the distance of the second bounce, and the total distance between bounces is 65 inches, how many inches did the stone move on the first bounce?"

The results show that GPT-4's answer and solution process were correct, while Wenxin Yiyan's solution process and answer were both wrong. MathGPT set up the unknown and the equation correctly but made an error while solving the equation.

(Wenxin Yiyan's answer)

(GPT-4's answer)

(MathGPT answer)
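The stone-bounce problem reduces to one linear equation, x + x/2 + x/8 = 65, where x is the first bounce. The sketch below works it through with exact fractions; it is an independent check written for this article's example, not a reproduction of any model's answer, and the variable names are chosen for illustration.

```python
from fractions import Fraction

total = 65
second_ratio = Fraction(1, 2)   # second bounce = 1/2 of the first
third_ratio = Fraction(1, 4)    # third bounce = 1/4 of the second

# x + x/2 + x/8 = 65  ->  (13/8) x = 65
coefficient = 1 + second_ratio + second_ratio * third_ratio
first = total / coefficient
second = first * second_ratio
third = second * third_ratio

print(first, second, third)  # 40 20 5
```

The three bounces of 40, 20, and 5 inches add up to the stated 65 inches.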

The third question can be solved by equivalent substitution. The problem reads: "1 apple = 2 pears, 3 pears = 4 oranges, 6 oranges = 7 bananas. How many apples are 56 bananas equal to?"

The results show that GPT-4's process and answer were correct, while Wenxin Yiyan's solution logic was correct but its answer was wrong. MathGPT correctly identified equivalent substitution as the knowledge point being tested, but its solution approach and analysis were wrong.

(Wenxin Yiyan's answer)

(GPT-4's answer)

(MathGPT answer)
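The equivalent-substitution chain in the fruit problem can be checked by converting step by step from bananas back to apples. The following is a small sketch with exact fractions, using only the exchange rates given in the problem; the variable names are illustrative.

```python
from fractions import Fraction

# Exchange rates taken from the problem statement.
pears_per_apple = Fraction(2, 1)      # 1 apple = 2 pears
oranges_per_pear = Fraction(4, 3)     # 3 pears = 4 oranges
bananas_per_orange = Fraction(7, 6)   # 6 oranges = 7 bananas

bananas = 56
oranges = bananas / bananas_per_orange   # 56 bananas -> 48 oranges
pears = oranges / oranges_per_pear       # 48 oranges -> 36 pears
apples = pears / pears_per_apple         # 36 pears   -> 18 apples

print(apples)  # 18
```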

Tian Mi believes that large models are, in essence, a more efficient way of learning from data and applying what is learned. With AI capabilities behind it, a new learning model of "student self-study + AI Q&A" becomes possible. As a vertical large model, MathGPT goes beyond the answer itself: it summarizes the solution approach and the way of thinking, and points out in more detail what the question tests, where it is difficult, and what is key. However, before it can truly become an "AI tutor" that helps students study on their own, its accuracy still has considerable room for improvement.

According to Tian Mi, as the public test progresses, MathGPT's problem-solving ability will keep improving, and product-level applications based on MathGPT are being developed at an accelerated pace and will be released in the near future.