Why is China's DeepSeek superior to the rest?

You have surely already used an AI model, whether an LLM such as ChatGPT or Gemini, or an image generator such as DALL-E or OpenArt. And if you have not only used them but also compared them, with DeepSeek (Chinese, open, free and…) among the candidates, you will have noticed that it is at least as capable as the offerings of giants such as Google (Gemini) and OpenAI (ChatGPT). It is probably superior. And now we know why.

The team behind the Chinese reasoning model DeepSeek-R1 has revealed the science underpinning its training, and it has done so in style: in a study published in Nature. The authors, led by Wenfeng Liang, show how they used rewards to train their R1 model to solve problems, which allowed them to overcome some of the costly computation and scaling barriers that make it hard to teach AI models to reason like humans.

“Here we demonstrate that the reasoning abilities of LLMs can be incentivized through pure reinforcement learning (RL), eliminating the need for human-labeled reasoning trajectories,” the study explains. “The proposed RL framework facilitates the development of advanced reasoning patterns, such as self-reflection, verification, and dynamic strategy adaptation. Consequently, the trained model achieves superior performance on verifiable tasks such as mathematics, programming, and STEM fields, surpassing its counterparts trained through conventional supervised learning.”

Reasoning, the logical process of using existing knowledge and new information to draw conclusions, is a cornerstone of human cognition. LLMs rely on something similar. The difference is that while humans draw on their own experience, AI draws on ours.

“This success depends largely on extensive human-annotated demonstrations, and the models' capabilities are still insufficient for more complex problems,” the study adds. “This limits scalability and can introduce human biases into model training. It could also limit the exploration of superior reasoning pathways beyond the capacities of the human mind.”

To overcome this barrier, DeepSeek's team used reinforcement learning to let their LLM develop reasoning skills through self-evolution. Reinforcement learning is a process in which an agent, here the model, learns by interacting with its environment through trial and error, receiving rewards or penalties for its actions. Applied to a language model such as DeepSeek, this means that once the model generates an output in response to a prompt, it receives feedback in the form of reward signals that let it improve its answers. Almost like a “digital treat”.
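To make that loop concrete, here is a minimal toy sketch, not DeepSeek's actual training code: a tiny policy over four candidate answers to one math question is updated with a basic policy-gradient (REINFORCE) rule, earning a reward only when its sampled answer verifies as correct. Every name and number in it is an illustrative assumption.

```python
import numpy as np

# Toy sketch of reinforcement learning with a verifiable reward.
rng = np.random.default_rng(0)

candidates = ["40", "41", "42", "43"]   # answers the toy "model" can emit
correct = "42"                          # ground truth, checkable automatically

logits = np.zeros(len(candidates))      # the policy's trainable parameters
lr = 0.5

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(200):
    probs = softmax(logits)
    i = rng.choice(len(candidates), p=probs)           # sample an output
    reward = 1.0 if candidates[i] == correct else 0.0  # verifiable reward

    # REINFORCE update: raise the log-probability of rewarded outputs.
    grad = -probs
    grad[i] += 1.0                  # gradient of log p(i) w.r.t. the logits
    logits += lr * reward * grad

print({c: round(p, 3) for c, p in zip(candidates, softmax(logits))})
```

The point of the toy: the policy is never shown the right answer, only rewarded when its own sample checks out, which is the essence of learning from verifiable rewards.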

“Instead of explicitly teaching the model how to solve a problem, we simply provide it with adequate incentives and it autonomously develops advanced problem-solving strategies,” the authors explain.

This allowed them to skip a supervised fine-tuning stage in the model's initial training, in which a dataset of ideal prompts and human-written responses is used to adjust the model.
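For contrast, the following is a hedged sketch of the kind of objective that skipped stage typically uses, with random tensors standing in for real tokenized text (shapes and values are assumptions, not the paper's setup): the model is pushed to imitate a human-written reference answer token by token.

```python
import torch
import torch.nn.functional as F

# Sketch of a supervised fine-tuning step (the stage R1-Zero skips).
# Random tensors stand in for a model's outputs and a tokenized
# human-written reference answer; shapes and values are assumptions.
vocab_size, seq_len = 100, 8
logits = torch.randn(seq_len, vocab_size, requires_grad=True)  # model outputs
reference = torch.randint(0, vocab_size, (seq_len,))           # human demo

# Cross-entropy pulls the model toward reproducing the human answer exactly,
# token by token, which is how human biases can seep into training.
loss = F.cross_entropy(logits, reference)
loss.backward()
print(loss.item())
```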

The resulting DeepSeek-R1 model, whose training still requires some human supervision, achieved superior performance in mathematics and programming competitions, surpassing its conventionally trained counterparts.

“This design choice stems from our hypothesis that human-defined patterns can limit the model's exploration, whereas reinforcement learning can better encourage the emergence of new reasoning capabilities in LLMs,” the study confirms.

The authors began by applying a reinforcement learning process to their DeepSeek-V3 base model, which allowed the resulting model, DeepSeek-R1-Zero, to naturally develop “diverse and sophisticated reasoning behaviors.”

Thanks to this, DeepSeek-R1-Zero's average pass@1 score on the AIME 2024 mathematics benchmark climbed from 15.6% to 77.9% over the course of training, surpassing the average accuracy of human participants.
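For reference, pass@1, and its generalization pass@k, is commonly computed with the unbiased estimator of Chen et al. (2021): generate n attempts per problem, count the c that pass verification, and average over problems. A minimal version follows, on the assumption that this is the metric behind the article's percentages.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n attempts of which c are correct, passes."""
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)

# pass@1 reduces to the plain success rate c / n:
print(pass_at_k(16, 3, 1))   # 0.1875
```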

Despite this, DeepSeek-R1-Zero still faced challenges such as language mixing, because it was trained on several languages, including Chinese and English. This prompted the team to carry out additional training to develop the DeepSeek-R1 model, which inherited its predecessor's reasoning capabilities while aligning its behavior more closely with human preferences. This model reached an accuracy of 79.8% and showed improvements on other programming and mathematics benchmarks.
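One naive way to quantify that language mixing, offered purely as an illustrative assumption and not as anything from the paper's code, is to score what share of a response stays in the expected script:

```python
# Illustrative sketch only: the fraction of Latin-script letters in a
# response, a crude proxy for how language-consistent an answer is.
def language_consistency(text: str) -> float:
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch.isascii() for ch in letters) / len(letters)

print(language_consistency("The answer is 42"))   # 1.0 -> fully consistent
print(language_consistency("The answer 是 42"))    # < 1.0 -> mixed languages
```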

“This serves as a reminder of the potential of reinforcement learning to achieve higher levels of capability in LLMs, paving the way for more autonomous and adaptive models in the future,” the authors conclude.

Of course, they also note that the model still has limitations they hope to address in the future, including its inability to use external tools such as calculators to improve its performance, and its scores on software engineering benchmarks.