
Stanford University's newly opened large-scale model class is worth seeing

Author: Silicon Star Man

As the cradle of Silicon Valley, Stanford University has computer science courses whose quality is beyond dispute. The computer vision course CS231n: Deep Learning for Computer Vision, led by Fei-Fei Li, is now in its ninth year and was still fully enrolled when it opened this year.

With the rise of LLMs and generative AI, the Transformer has become the new standard in natural language processing and is expanding into areas such as computer vision and audio processing. Another Stanford computer science course, CS25: Transformers United V4, has accordingly become one of the hottest and most cutting-edge offerings.


The course covers not only the theoretical foundations of the Transformer but also its application to real-world problems. Every year, CS25 invites heavyweight guests from the field of artificial intelligence to lecture on academic breakthroughs and the latest trends, including AI godfather Geoffrey Hinton, OpenAI founding member Andrej Karpathy, and researchers from Google, NVIDIA, and other companies. CS25 is also open to the public for free and draws attention both inside and outside Stanford; its course videos have millions of online views.

Among this year's CS25 guests, in addition to top researchers from OpenAI, the Allen Institute for AI, and other institutions, Dr. Ding Ming of Zhipu AI also received an invitation from Stanford, making him the only guest from China.

In fact, Zhipu AI's research results in the field of large models have not only been widely recognized in academia; its innovations have also been applied and validated in industry. For example, the open-source visual language model CogVLM, co-developed by Dr. Ding Ming, has been adopted by the well-known Stable Diffusion project, thanks to its excellent performance, to improve the accuracy and efficiency of image captioning.

A model of symbiosis between academia and business, Zhipu AI leads the innovation of large model technology

The application of large models in scientific research is only beginning to be understood, and continued innovation and progress will require joint contributions from academia and industry. In China, universities, research institutions, and companies are all actively engaged in large-model research and development, which makes cooperation between academia and industry especially important. Zhipu AI, which has deep roots in Tsinghua University, has carried a scientific-research gene since the day it was founded, and it is a model of this kind of symbiosis between academia and business. Since launching its new-generation foundation model GLM-4, Zhipu AI has released a steady stream of research results spanning LLMs, multimodality, long text, alignment, evaluation, inference acceleration, agents, and other aspects of the large-model field.

A new perspective on evaluating the emergent capabilities of large models


A key question in the research and development of large language models is how to understand and improve "emergence": new capabilities that appear suddenly as models scale up. Conventional wisdom held that model size and the amount of training data are the decisive factors. The paper "Understanding Emergent Abilities of Language Models from the Loss Perspective" proposes a new perspective: the key to emergence is pre-training loss, not the number of model parameters.

By analyzing the performance of language models of different sizes and training-data volumes on multiple English and Chinese datasets, Zhipu AI found that pre-training loss is strongly correlated with downstream performance: models with lower pre-training loss perform better on practical tasks, regardless of parameter count. This finding challenges the earlier consensus and points to a new direction for future model optimization, namely eliciting and improving emergent abilities by driving down pre-training loss. It also gives AI researchers and developers a theoretical basis for introducing new evaluation metrics and methods in model design and evaluation.
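A toy numerical illustration of this framing (not the paper's actual methodology) is the following: if downstream accuracy sits at chance level until pre-training loss falls below some threshold and then climbs as loss keeps falling, then loss, not parameter count, is the quantity that predicts the "emergent" jump. The threshold and curve shape here are invented for illustration.

```python
import math

def toy_task_accuracy(pretrain_loss, chance=0.25, loss_threshold=2.2):
    """Toy model of 'emergence from the loss perspective': accuracy stays
    at chance until pre-training loss drops below a threshold, then rises
    smoothly as loss keeps falling. Threshold and shape are illustrative."""
    if pretrain_loss >= loss_threshold:
        return chance
    # Accuracy grows toward 1.0 as loss falls further below the threshold.
    gain = 1.0 - math.exp(-(loss_threshold - pretrain_loss) * 2.0)
    return chance + (1.0 - chance) * gain

# Two hypothetical models of very different sizes but equal loss make the
# same prediction here: loss, not parameter count, is the only input.
for loss in (3.0, 2.5, 2.2, 2.0, 1.5):
    print(f"loss={loss:.1f} -> accuracy={toy_task_accuracy(loss):.3f}")
```

Plotted over many checkpoints, a curve like this looks "emergent" against parameter count but smooth against loss, which is the paper's core observation.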

The RLHF technology of GLM-4 is exposed


Large language model alignment is an important issue for AI control and AI safety: only by ensuring that a model's behavior and output are consistent with human values and intentions can AI systems serve society safely, responsibly, and effectively.

In response, Zhipu AI developed a technology called ChatGLM-RLHF, which trains language models on human preferences so that they produce responses people actually prefer. Specifically, the pipeline first collects preference data by having annotators compare different model responses, then uses that data to train a reward model that predicts human preferences, and finally applies reinforcement learning to optimize the language model so that it generates more accurate and human-aligned responses.
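ChatGLM-RLHF's training code is not reproduced in the article, but the reward-model step it describes is standard in RLHF pipelines: a pairwise (Bradley-Terry) loss pushes the reward model to score the human-preferred response above the rejected one. A minimal sketch, with illustrative scalar scores standing in for reward-model outputs:

```python
import math

def pairwise_reward_loss(score_chosen, score_rejected):
    """Bradley-Terry style pairwise loss used in standard RLHF reward-model
    training: -log(sigmoid(r_chosen - r_rejected)). The loss is small when
    the reward model scores the human-preferred response higher."""
    margin = score_chosen - score_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Illustrative scores from a hypothetical reward model:
print(pairwise_reward_loss(2.0, -1.0))   # small loss: preference respected
print(pairwise_reward_loss(-1.0, 2.0))   # large loss: preference violated
```

Minimizing this loss over many comparison pairs gives a scalar reward signal that the final reinforcement-learning stage (typically PPO) can then optimize against.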

Strengthen the mathematical ability of large models

Effectively solving mathematical problems remains a challenge for large language models. Conventional methods such as reinforcement learning from human feedback (RLHF) optimize text-generation quality but may neglect the accuracy and logical coherence that mathematical problem solving requires, while supervised fine-tuning (SFT) on math data may sacrifice the model's general language ability.

Zhipu AI's paper "ChatGLM-Math: Improving Math Problem-Solving in Large Language Models with a Self-Critique Pipeline" introduces an iterative training method called Self-Critique, which significantly improves an LLM's ability to solve mathematical problems through a self-feedback mechanism while preserving its language-processing strengths. The team also developed the MathUserEval benchmark to evaluate LLMs on open-ended math problems drawn from real-world application scenarios, and the test results demonstrate the effectiveness of the method.
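The paper's full Self-Critique pipeline is more involved than the article conveys, but the generate-critique-refine loop at its heart can be sketched schematically, with stand-in functions replacing the actual LLM calls:

```python
def self_critique_loop(problem, generate, critique, max_rounds=3):
    """Schematic of an iterative self-feedback loop: the model answers,
    critiques its own answer, and regenerates until the critique passes.
    `generate` and `critique` stand in for real LLM calls."""
    answer = generate(problem, feedback=None)
    for _ in range(max_rounds):
        ok, feedback = critique(problem, answer)
        if ok:
            break
        answer = generate(problem, feedback=feedback)
    return answer

# Stub model calls for demonstration: the "model" fixes an arithmetic
# slip once its own critic points it out.
def generate(problem, feedback):
    return "4" if feedback else "5"

def critique(problem, answer):
    correct = str(eval(problem)) == answer   # toy checker, not a real critic
    return correct, None if correct else "recompute the arithmetic"

print(self_critique_loop("2 + 2", generate, critique))  # -> 4
```

In the actual method the critic is itself a trained model, and the corrected trajectories feed back into training rather than only into inference.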

AutoWebGLM: A "smarter" intelligent navigation agent

With the rapid growth of Internet content and services, automated web-navigation agents that help users efficiently access information and complete tasks are becoming critical. Handling dynamic, complex web pages requires an agent to adapt to the diversity of user operations and the complexity of HTML content, and this is the key problem to be solved. Zhipu AI's paper "AutoWebGLM: Bootstrap And Reinforce A Large Language Model-based Web Navigating Agent" proposes AutoWebGLM, an automated web-navigation agent based on the large language model ChatGLM3-6B. By using an HTML simplification algorithm, a hybrid human-AI training method, and a combination of reinforcement learning and rejection sampling, the project significantly improves the agent's ability to understand and operate web pages.
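AutoWebGLM's actual simplification algorithm is not reproduced here, but the general idea of stripping non-actionable clutter from HTML before handing it to an LLM can be sketched with Python's standard-library parser. The choice of which tags to keep is an assumption made for illustration:

```python
from html.parser import HTMLParser

class HTMLSimplifier(HTMLParser):
    """Sketch of HTML simplification for an LLM web agent: drop scripts,
    styles, and layout noise; keep visible text plus the interactive
    elements an agent can act on. Tag lists are illustrative only."""
    KEEP = {"a", "button", "input", "select", "textarea"}
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.out = []
        self._skip_depth = 0          # >0 while inside a skipped subtree

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.KEEP:
            self.out.append(f"<{tag}>")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.KEEP:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.out.append(data.strip())

def simplify(html):
    parser = HTMLSimplifier()
    parser.feed(html)
    return " ".join(parser.out)

html = "<div><script>var x=1;</script><p>News</p><a href='/go'>Read</a></div>"
print(simplify(html))  # -> News <a> Read </a>
```

A production agent would also preserve attributes such as `href` and element IDs so actions can be grounded back to the page; this sketch keeps only the structural idea.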


Beyond advancing automated web-navigation technology itself, the AutoWebGLM project also introduces AutoWebBench, a bilingual web-navigation benchmark that provides a tool for testing and refining AI web agents, opening up new possibilities for deploying intelligent agents in the real world.

COG Series Models: Let the Large Model "See" and "Understand Better"

The integration of vision and language has become a key area in the study of large models, involving how to allow machines to better understand and generate image content while seamlessly integrating with natural language. This technological advancement will not only improve the interactivity of AI systems, but also enhance their application in areas such as automated visual tasks, content creation, and decision-making.


The Cog family of models represents the latest advances in visual language models (VLMs). CogVLM provides accurate image-captioning capability and has been applied in popular image-generation systems such as Stable Diffusion 3, significantly improving the quality of image understanding and description. CogAgent, a visual agent distinguished by its image recognition and processing, was selected as a CVPR 2024 Highlight. CogCoM introduces a Chain of Manipulations mechanism that enables complex multi-round visual reasoning, enhancing the model's applicability and flexibility. CogView3 sets a new performance standard in text-to-image generation with its cascaded diffusion framework, significantly outperforming existing approaches while reducing inference time. Together, these models push the boundaries of multimodal AI and provide powerful new tools for practical applications.


These technologies have also been applied to the generative AI assistant Zhipu Qingyan, which provides more accurate visual content recognition and more expressive image creation services, helping users get a more natural and rich experience in multimodal interactions.

Standing at the intersection of the market and academia, Zhipu AI

As John Hennessy, the 10th president of Stanford University, put it: "The true power of symbiosis between academia and business lies in the ability to combine innovative ideas with business practices."

From basic research and applied research through to products and services that users can actually perceive, Zhipu AI has closed the loop from scientific results to industrialization, a path very similar to that of Google, which itself originated as a Stanford research project.

Google grew out of a research project initiated by Larry Page and Sergey Brin at Stanford University. With the university's early support, the PageRank algorithm that became the core of Google's search engine was developed in an academic setting, and it ultimately changed the way the modern Internet operates.

As a commercial company, Zhipu AI's active participation in academic exchange and its publication of research results are inseparable from its close ties to the Department of Computer Science at Tsinghua University. From its founding, Zhipu AI has been an enterprise that places great weight on scientific research and innovation.

Born in academia, it now gives back to academia. Only by supporting academic research, advancing the field, and deepening the exploration of basic theory can the industry gain more efficient and reliable methods for model design and training; only then can large models, and the AI industry as a whole, move to the next stage. Otherwise we will keep facing an AI black box, stuck in the awkward position of explaining every unpredictable phenomenon with "emergence".

For a large-model company, a concentration of research talent, a strong research culture, and long-term, substantial investment in academic and technological R&D are the cornerstones of success. Only when the technology takes root and a solid underlying architecture is in place can more commercial applications grow on top of it.

On the other hand, the development of large models combines science and engineering: it requires deep theoretical research into the nature of data and algorithms as well as refined engineering to design, optimize, and apply the models, and it demands allocating talent, data, and other resources where they are most productive. As a commercial company with academic genes, Zhipu AI has been exploring large-model applications in vertical domains, with a product line that benchmarks fully against OpenAI's. By turning large-model innovations into products that face the commercial market directly, it not only advances AI applications but also feeds real-world requirements, funding, and technical feedback back to the academic community.

The combination of academic research and business practice can greatly accelerate technological progress and the development of new products. Standing at the crossroads of the market and academia, Zhipu AI is where this synergy is playing out.

Building out an AI ecosystem widens the moat

Businesses that build ecosystems are adaptable and self-iterating. Great entrepreneurs know how to channel that force so a company gains resources and influence beyond its own organizational boundaries, building a strong moat.

Whether by maintaining close cooperation and communication with academia, adhering to an open-source-first philosophy, or investing in more than a dozen large-model AI startups, Zhipu AI's series of moves is essentially about forming a community that pools the strength of all parties to advance China's large-model technology and industry, build the Zhipu AI ecosystem, and ultimately realize AGI.

As the saying goes, it is better to walk together than to walk alone, and that takes both enough "muscle" and enough strategic resolve.


In the SuperBench comprehensive capability evaluation developed by Tsinghua University's Foundation Model Research Center and the Zhongguancun Laboratory, GLM-4 ranked second only to Claude-3 in large-model semantic understanding, placing second overall and first among Chinese models.

All of this is gradually showing through in Zhipu AI's technological breakthroughs and its commercialization.
