
At ICLR 2024, see the strength of China's large models

Author: AI Tech Review

Domestic large models are no longer merely chasing OpenAI.

Author | Lai Wenxin

Editor | Chan Choi Han

Recently (May 7-11), the 12th International Conference on Learning Representations (ICLR) was held at the Exhibition and Convention Center in Vienna, Austria. According to the statistics released when final decisions came out in January, ICLR 2024 received a total of 7,262 submissions, a 46.1% increase over the previous year's 4,966. After a rigorous review process, 2,260 papers were accepted, for an overall acceptance rate of 31%, essentially level with last year's 31.8%; of these, 367 (5%) were selected as Spotlights and 86 (1.2%) as Orals.

Beyond the surge in submissions, large language models (LLMs) also became one of the hottest keywords at this year's ICLR. The number of papers submitted on LLM topics soared, and research teams from around the world covered many sub-directions, drawing participation from Microsoft, Google, OpenAI, Anthropic, and Meta, as well as many Chinese technology teams such as Zhipu AI, Baidu, and ModelBest. It is fair to say that this year's edition of ICLR, one of the top conferences in artificial intelligence, was not only a traditional academic gathering but also a microcosm of the head-to-head competition among large model teams across the global industry.

The submission deadline for ICLR 2024 was September 28, 2023, yet LLMs have kept surging through most of the six months since. What is more noteworthy is that, judging from the papers and talks at this year's ICLR, after a year of work, research on large models has not stayed at the stage of "studying OpenAI" and "catching up with OpenAI". Chinese research teams, in particular, are no longer simply imitating OpenAI.

Instead, LLM research teams offered their own thinking on AGI.

1

LLMs were the absolute protagonists of ICLR. The conference was founded by Yoshua Bengio and Yann LeCun, pioneers of deep learning and two of the three Turing Award laureates, and its first edition was held in Scottsdale, Arizona, USA, in 2013. Although ICLR is still young compared with NeurIPS (the Conference on Neural Information Processing Systems) and ICML (the International Conference on Machine Learning), its academic influence and recognition have grown steadily; it is now regarded, alongside those two, as one of the top three conferences in machine learning, and both attendance and submissions have risen sharply year after year.


ICLR historical statistics: https://papercopilot.com/statistics/iclr-statistics/

The day before the conference, the ICLR 2024 official website announced this year's award-winning papers: 5 Outstanding Papers and 11 Honorable Mentions. The five outstanding papers cover image diffusion models, simulated human-computer interaction, pre-training and fine-tuning, modeling of discrete protein sequence data, and Vision Transformers, with the pre-training and fine-tuning work directly related to large models. According to the accepted-paper statistics published by ICLR, the ten most frequently mentioned keywords are: large language model (LLM), reinforcement learning, graph neural network, diffusion model, deep learning, representation learning, generative model, federated learning, language model, and interpretability. Among these, LLM ranked first with 318 mentions, more than 50% ahead of second-place reinforcement learning (201), making it without question the absolute protagonist of ICLR.


The accepted LLM papers also cover a wide range of specific directions, including agents, reinforcement learning, other generative models, 3D reconstruction, NLP applications, multimodal applications, carbon footprint modeling, and more. Among them are many striking research results and products from the past few months.

One example is MetaGPT, an open-source multi-agent development framework from Chinese teams including DeepWisdom. MetaGPT simulates an entire virtual software team, with roles such as product manager and engineer, and uses carefully designed standard operating procedures to automate programming tasks, tackling problems in large model applications and outputting design documents, architecture, and code. The paper received a high score of 8.0 at ICLR 2024.

SWE-bench, an LLM evaluation framework jointly published by Princeton University and the University of Chicago, was selected as an Oral paper. It consists of 2,294 real-world software engineering problems drawn from GitHub issues and pull requests across 12 popular Python repositories, and measures whether an LLM, given a codebase and a description of the problem to be solved, can edit the codebase to resolve the issue. Solving SWE-bench problems often requires understanding and coordinating changes across multiple functions or even files at once, interacting with an execution environment, handling extremely long contexts, and performing complex reasoning far beyond traditional code generation tasks. The arrival of this benchmark has made performance comparisons among large models on the market much more direct.

There is also LongLoRA, an efficient fine-tuning method for ultra-long-context LLMs proposed by MIT, The Chinese University of Hong Kong, and NVIDIA. By fine-tuning with sparse local attention, LongLoRA extends the context window while saving computation, with performance close to fine-tuning with standard full attention (see the sketch below).

ICLR 2024 also saw a novel combination of LLMs and carbon footprints. Researchers from Indiana University and Jackson State University found that mlco2, a tool that predicts the carbon footprint of a neural network before training, has clear limitations: it cannot estimate the footprint of dense or mixture-of-experts (MoE) LLMs, ignores key architectural parameters, considers only GPUs, and cannot model embodied carbon footprints. To address these limitations, they built an end-to-end carbon footprint prediction model designed for dense and MoE LLMs, significantly improving the accuracy of LLM carbon footprint estimates.

On the combination of LLMs and 3D reconstruction, LRM, proposed by the Australian National University and Adobe Research, can predict a 3D model of an object from a single input image in as little as 5 seconds. Unlike earlier methods trained on small-scale datasets, LRM uses a highly scalable, transformer-based architecture with 500 million learnable parameters that predicts a neural radiance field (NeRF) directly from the input image. The team trained LRM end-to-end on massive multi-view data containing roughly 1 million objects, including synthetic renderings from Objaverse and real captures from MVImgNet.
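Returning to LongLoRA mentioned above: the following is a minimal, illustrative sketch (not the authors' code) of the shifted local-attention idea behind its efficient fine-tuning. Tokens attend only within fixed-size groups, and half of the heads shift their groups by half a window so information can still flow across group boundaries. All names and sizes here are made up for illustration.

```python
import torch
import torch.nn.functional as F

def shifted_local_attention(q, k, v, group_size: int):
    """q, k, v: (batch, heads, seq_len, dim); seq_len must be divisible by group_size."""
    b, h, n, d = q.shape
    assert n % group_size == 0
    half = h // 2

    def grouped(x, shift):
        if shift:
            # shift tokens by half a group so groups straddle the original boundaries
            x = torch.roll(x, shifts=-group_size // 2, dims=2)
        # reshape so attention is computed inside each group only
        return x.reshape(b, -1, n // group_size, group_size, d)

    out = []
    for head_slice, shift in ((slice(0, half), False), (slice(half, h), True)):
        qs, ks, vs = (grouped(t[:, head_slice], shift) for t in (q, k, v))
        attn = F.softmax(qs @ ks.transpose(-2, -1) / d ** 0.5, dim=-1)
        o = (attn @ vs).reshape(b, -1, n, d)
        if shift:
            o = torch.roll(o, shifts=group_size // 2, dims=2)  # undo the shift
        out.append(o)
    return torch.cat(out, dim=1)

# toy usage: 2 heads, 8 tokens, groups of 4
q = k = v = torch.randn(1, 2, 8, 16)
print(shifted_local_attention(q, k, v, group_size=4).shape)  # torch.Size([1, 2, 8, 16])
```

Because each group attends only to its own tokens, the attention cost grows with the group size rather than the full sequence length, which is what makes long-context fine-tuning affordable.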
Whether it is MetaGPT or LongLoRA, developers of domestic large models were involved, and Chinese authors abound among the accepted papers. At ICLR 2024, Chinese large model startups such as Zhipu AI and Internet technology giants such as ByteDance, Baidu, Meituan, Huawei, and Ant Group were visible throughout the exhibition, accounting for 6 of the 32 participating companies. In the keynote talks, Zhipu and other large model companies from China also gave in-depth presentations, drawing wide attention from LLM researchers at home and abroad.


It is not hard to see that Chinese teams have become a force that cannot be ignored in the wave of large model research.

2

Since ChatGPT ignited the large model boom in 2023, AGI has been the focus of attention at ICLR. How to get to AGI has become a question that technology-driven, product-driven, and business-driven teams alike are racing to answer.

From GPT-3 to GPT-3.5, from ChatGPT to GPT-4 and GPT-4V, OpenAI's next step, "GPT-X", was once the hottest topic in the industry and was fervently regarded as "the next step for LLMs". But as more researchers joined in, China's large model researchers began to think critically about the "OpenAI model" and the "GPT route". From AI Technology Review's conversations with a number of Chinese large model teams, they are increasingly convinced that blindly chasing OpenAI means "at best we become OpenAI, but we will never surpass OpenAI".

For example, one large model team pointed out that large models may not truly "emerge" intelligence; blindly pursuing model intelligence by scaling up model size is an extremely risky route, and large models ultimately need to deliver value through concrete products and services. The Stanford team's 2023 paper "Are Emergent Abilities of Large Language Models a Mirage?", which argues that the apparent emergent abilities of large models may be an illusion, was named an outstanding paper at NeurIPS.

OpenAI's unidirectional route and heavy reliance on long sequences have also prompted the industry to rethink. Take long text as an example: if the goal of large models is AGI, then working backwards from that goal, there are capabilities AGI should have that OpenAI's existing large model architecture does not solve well. By analogy with human abilities, people become more proficient by doing something many times, and once a skill (such as riding a bicycle) is mastered it is not forgotten; current large models, however, lack this kind of human-like "experiential memory", and long texts and long sequences have so far not shown the potential to express it.

Rather than imitating OpenAI, China's large model entrepreneurs have begun to reason from the first principles of AGI and to think through a distinctive technical route that also fits the Chinese market and its services. Even Zhipu AI, which outsiders regard as benchmarking itself comprehensively against OpenAI from models to products, thinks differently from OpenAI about how to reach AGI. That difference was visible in the keynote the Zhipu team delivered at ICLR 2024. As the only Chinese LLM team invited to give a keynote, Zhipu shared its own thinking on "ChatGLM's Road to AGI". Although its model matrix resembles OpenAI's, Zhipu's conception of AGI and its path there differ considerably.


Since 2019, Zhipu's large model research has centered on "cognition", borrowing from human thinking to divide a model's capabilities into "System 1", responsible for fast intuition, and "System 2", responsible for slow logic, a framing that Yoshua Bengio helped bring into deep learning. Zhipu's thinking is that System 1, with an LLM at its core, responds quickly to simple problems, while System 2, built around a knowledge graph, handles complex reasoning tasks, maintains short-term and long-term memory, and also supports functions such as unconscious learning and self-management. The aim is for a computer program, like a human using both sides of the brain, to answer simple questions quickly and complex questions through reasoning.

In addition, Zhipu's GLM models take a bidirectional autoregressive route, whereas OpenAI's GPT series is unidirectionally autoregressive. In GLM's scheme, when generating the tokens of a masked span the model sees only the context on one side; when processing the known tokens, it can attend to the context on both sides at once. Controlled by randomly masked spans, this combines unidirectional and bidirectional attention in a single model, effectively merging BERT-style blank filling with GPT-style generation to do "cloze" in an autoregressive way (illustrated in the sketch at the end of this section). As a result, GLM-130B outperforms GPT-3 on some tasks.

Zhipu's large model team also believes that, just as the human brain combines multimodal perception and comprehension with short- and long-term memory and reasoning, visual language models (VLMs) are an indispensable link on the road to AGI. Hence CogVLM, an open-source image understanding model that aims to bridge the gap between LLMs and visual encoders. By combining text with visual encodings and training the combined module, CogVLM achieves accurate alignment between text and images, greatly improving the model's ability to understand and describe visual content; it was also used for image captioning in Stable Diffusion 3.

The team further developed an innovative cascade framework, CogView3, the first model to apply cascaded diffusion to text-to-image generation. In human evaluation, CogView3 outperforms SDXL, the strongest open-source text-to-image diffusion model, with only about half the inference time, and its distilled variant matches SDXL's performance with roughly 1/10 of its inference time.

With CogVLM on board, GLM-4V has also gone into use, giving substantive responses whether it is shown a picture that requires world knowledge or a diagram that requires reasoning to understand. And to give GLM models general agent capabilities, such as adding long-text patterns to store long-term memory or learning continuously from feedback, the GLM team developed AgentTuning. Previous large model training relied on fitting the input data and fine-tuning continuously, which does not generalize well to broader situations; AgentTuning, by contrast, needs only a small number of cases and limited labeled data to generalize the trained agent abilities to unseen tasks.
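As a purely illustrative aid (not Zhipu's implementation), the sketch below builds the kind of attention mask that GLM's autoregressive blank infilling describes: tokens in the known context attend to each other bidirectionally, while tokens being generated for a masked span see the whole context plus only the span tokens generated so far. The sizes are made up.

```python
import numpy as np

def glm_attention_mask(context_len: int, span_len: int) -> np.ndarray:
    """Return a (T, T) mask where entry [i, j] = 1 means token i may attend to token j."""
    T = context_len + span_len
    mask = np.zeros((T, T), dtype=int)
    # known context: fully bidirectional attention
    mask[:context_len, :context_len] = 1
    # generated span: every token sees the whole known context ...
    mask[context_len:, :context_len] = 1
    # ... plus a causal (lower-triangular) view of the span generated so far
    mask[context_len:, context_len:] = np.tril(np.ones((span_len, span_len), dtype=int))
    return mask

print(glm_attention_mask(context_len=4, span_len=3))
```

The upper-left block behaves like BERT-style encoding of the known text, while the lower-right triangular block behaves like GPT-style generation, which is the "cloze in an autoregressive way" described above.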
At the same time, the "emergence" of large models is a problem the Zhipu technical team has kept exploring. Over the years of LLM development, the Scaling Law has been treated as an iron rule, and many believed that it is the growth of model size and training data that makes models "intelligent". In 2022, OpenAI scientist Jason Wei and colleagues published a paper in the machine learning journal TMLR arguing that some abilities of LLMs are emergent: they appear only in large models and are absent in small ones, so they cannot be predicted simply by extrapolating the performance of smaller models, and they surface as the model scale grows. Zhipu's recent research, however, proposes a new understanding: the key to emergence is the training loss, not the parameter count. Plotting training loss on the X-axis against model performance on the Y-axis, the researchers found that once the pre-training loss falls below a threshold of about 2.2, model performance begins to climb. In other words, a model's "emergence" is not tied directly to model size or training data volume, but may instead be governed by the training loss.


Paper: https://arxiv.org/pdf/2403.15796.pdf
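To make the loss-threshold view concrete, here is a toy sketch of the analysis described above: plot task accuracy against pre-training loss and mark the ~2.2 threshold the paper reports. The checkpoint data below is fabricated purely for illustration and does not come from the paper.

```python
import matplotlib.pyplot as plt

# hypothetical (pre-training loss, downstream accuracy) pairs for model checkpoints
checkpoints = [(3.0, 0.25), (2.6, 0.26), (2.4, 0.27), (2.2, 0.33), (2.0, 0.52), (1.8, 0.68)]
loss, acc = zip(*checkpoints)

plt.plot(loss, acc, marker="o")
plt.axvline(2.2, linestyle="--", label="reported ~2.2 loss threshold")
plt.gca().invert_xaxis()          # training proceeds from high loss to low loss
plt.xlabel("pre-training loss")
plt.ylabel("downstream accuracy")
plt.legend()
plt.show()
```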

GLM-4.5 and its successors will integrate superintelligence and superalignment technologies to build a comprehensive multimodal model on a foundation of stronger model safety. These iterative results grow out of the team's own innovative thinking. In its ICLR keynote, Zhipu laid out its view of AGI: first, mix multiple modalities such as image, video, and audio on top of text, the most critical form of intelligence, and apply LLMs to scenarios such as chat and OCR; next, develop virtual agents that help users complete various tasks; then build agents that can interact with the real world and receive feedback from it, and perhaps eventually robots, whose interaction with the physical world yields real feedback that pushes further toward AGI.

The Zhipu team also proposed an interesting concept: GLM-OS. In their vision, this is a general-purpose computing system with a large model at its core, which uses the existing All-Tools capability together with memory and self-feedback mechanisms to emulate the human Plan-Do-Check-Act (PDCA) cycle for self-optimization (a toy sketch follows below). The idea drew enthusiastic attention at the conference and showed the forward-looking thinking of the Chinese large model team.

Finally, the team shared GLM-zero, a line of work under way since 2019 that explores an unconscious learning mechanism akin to what humans do during sleep, involving self-instruction, reflection, and criticism; it aims to deepen understanding of consciousness, knowledge, and learning behavior, and represents an important step toward AGI.

It is also worth noting that prices on Zhipu's MaaS open platform (bigmodel.cn), which exposes the above technologies through APIs, have been cut substantially: the call price of the most cost-effective base model, GLM-3-Turbo, has dropped by 80%, from 200,000 tokens per yuan to 1 million tokens per yuan, and the token grant for newly registered users has been raised from 5 million to 25 million tokens (20 million entry-level plus 5 million enterprise-level).
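For readers who want a feel for the GLM-OS idea mentioned above, here is a purely conceptual sketch of a Plan-Do-Check-Act loop with memory and self-feedback. Every function below (plan, act, check) is a hypothetical placeholder standing in for LLM prompts and tool calls, not a real Zhipu API.

```python
from dataclasses import dataclass, field

@dataclass
class PDCAAgent:
    goal: str
    memory: list = field(default_factory=list)    # self-feedback carried between rounds

    def plan(self) -> str:
        # in a real system this would prompt the LLM with the goal plus past critiques
        return f"plan for '{self.goal}' given {len(self.memory)} past critiques"

    def act(self, plan: str) -> str:
        return f"result of executing: {plan}"      # stand-in for All-Tools style tool calls

    def check(self, result: str) -> tuple[bool, str]:
        done = len(self.memory) >= 2               # toy stopping rule
        return done, f"critique of: {result}"

    def run(self, max_rounds: int = 5) -> None:
        for _ in range(max_rounds):
            result = self.act(self.plan())         # Plan -> Do
            done, critique = self.check(result)    # Check
            self.memory.append(critique)           # Act: adjust via self-feedback
            if done:
                break

PDCAAgent(goal="summarize a report").run()
```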


3

As a closing note: today, Sam Altman teased that OpenAI will release something new on May 13, and that it is neither the much-anticipated GPT-5 nor the widely rumored ChatGPT search engine. While large models at home and abroad are still catching up with GPT-4, OpenAI is moving on to new territory.

"Catch up with OpenAI, become OpenAI, surpass OpenAI" seems to have become a curse on domestic large models. Yet over the past year, domestic models such as Zhipu's GLM-4, Alibaba's Qwen-Max, and Baidu's Wenxin Yiyan 4.0 have performed well on various evaluation leaderboards and stepped onto the international stage. The LLM results at ICLR show that in 2024 "catching up with OpenAI" is no longer the core pursuit of China's large model companies; "surpassing OpenAI" and commercialization are the goals of domestic teams.

Compared with the decade of deep learning from 2012 to 2022, it is not hard to see that the AI development cycle in the era of large models is accelerating. In this accelerated cycle, the distance from technology and R&D to commercialization has shrunk dramatically, placing new demands on innovators. "There is no second OpenAI", but there is "the first ChatGLM", the first Wenxin Yiyan, the first Tongyi Qianwen... Perhaps domestic industry observers lacked confidence in the past, but after ICLR 2024, seeing domestic large models go abroad and compete with internationally renowned LLM companies will only boost that confidence further.

The author of this article, anna042023, will continue to follow developments in people, companies, commercial applications, and the industry around AI large models.

