
Google Releases an Open-Source Model; the Only One Not "Open" Is OpenAI


Produced by Tiger Sniff Technology Group

Author: Qi Jian

Editor|Wang Yipeng

Header Image|DALL-E 3

The world is amazed by Sora, while Google is still silently releasing language models.

On February 21, local time, Google launched "Gemma", a new family of open models built from the same research and technology as Gemini. Compared with Gemini, Gemma is lighter and more efficient, while still providing a full set of model weights for free and explicitly permitting commercial use.

This release includes two sizes, Gemma 2B and Gemma 7B, with roughly 2 billion and 7 billion parameters respectively. Each size ships both a pretrained model and an instruction-tuned variant. Users can easily access these models through Kaggle, Google's Colab notebooks, or the Google Cloud platform.


Google's technical report says Gemma outperforms mainstream open-source models on a range of key benchmarks, including the 7B and 13B versions of LLaMA-2 as well as the Mistral 7B model. In particular, Gemma performs well in instruction following, creative writing, coding tasks, and basic safety-protocol tests.

In addition, Google has released a series of tools and guidelines designed to encourage the development community to collaborate and use these models responsibly to drive the healthy development of AI technology.

After Google released the open-source Gemma, OpenAI became the only major AI company in this wave of AI fever that has not released an open model. Under the release post by Demis Hassabis, co-founder and CEO of Google DeepMind, users tagged Sam Altman to ask when OpenAI would be open.


How is Gemma different?

The Gemma release provides pretrained checkpoints as well as checkpoints fine-tuned for dialogue, instruction following, helpfulness, and safety. Among them, the 7-billion-parameter model is optimized for efficient deployment and development on GPUs and TPUs, while the 2-billion-parameter model is better suited to running on CPUs, covering different compute constraints, applications, and developer needs.


Gemma compared with LLaMA 2-7B, LLaMA 2-13B, and Mistral-7B

The Gemma architecture is based on a Transformer decoder with optimized core parameters, and the context length during training is 8,192 tokens.

In addition, Google has made several key improvements to the original Transformer design, optimizing the model's processing efficiency, size, performance, and training stability.

Multi-query attention: compared with traditional multi-head attention, the 2-billion-parameter model uses multi-query attention, which improves processing efficiency and model performance; at small parameter scales in particular, it captures and processes information more effectively.
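As a rough illustration of where the savings come from (the dimensions below are hypothetical, not taken from the Gemma report), multi-query attention shares a single key/value head across all query heads, shrinking the K/V projections by a factor of the head count:

```python
def attn_proj_params(d_model: int, n_heads: int, multi_query: bool) -> int:
    """Count Q/K/V/output projection parameters of one attention layer (biases ignored)."""
    head_dim = d_model // n_heads
    kv_heads = 1 if multi_query else n_heads   # MQA keeps a single shared K/V head
    q_params = d_model * n_heads * head_dim    # query projection
    kv_params = 2 * d_model * kv_heads * head_dim  # key and value projections
    out_params = n_heads * head_dim * d_model  # output projection
    return q_params + kv_params + out_params

# Hypothetical dimensions for a small model:
mha = attn_proj_params(d_model=2048, n_heads=8, multi_query=False)
mqa = attn_proj_params(d_model=2048, n_heads=8, multi_query=True)
print(mha, mqa)  # MQA needs noticeably fewer projection parameters per layer
```

Beyond the parameter count, the shared key/value head also shrinks the KV cache at inference time, which is what matters most on memory-constrained devices.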

Rotary Position Embedding (RoPE): using RoPE instead of traditional absolute position embeddings, together with sharing embeddings between inputs and outputs, effectively reduces model size while maintaining or improving performance, especially position sensitivity when processing sequence data.
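A minimal pure-Python sketch of the idea (the vector size and base frequency here are illustrative, not Gemma's actual values): each pair of dimensions is rotated by an angle proportional to the token position, so dot products between rotated queries and keys depend only on their relative offset.

```python
import math

def rope(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    """Rotate consecutive dimension pairs of x by position-dependent angles."""
    d = len(x)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)        # rotation frequency falls with dim index
        c, s = math.cos(theta), math.sin(theta)
        out.extend([x[i] * c - x[i + 1] * s,  # standard 2-D rotation of the pair
                    x[i] * s + x[i + 1] * c])
    return out

# Rotations preserve the vector norm, and RoPE adds no learned parameters.
q = [1.0, 0.0, 0.5, -0.5]
q3 = rope(q, pos=3)
print(round(sum(v * v for v in q3), 6) == round(sum(v * v for v in q), 6))  # True
```

The relative-position property is easy to check: the dot product of `rope(q, 5)` with `rope(k, 7)` matches that of `rope(q, 0)` with `rope(k, 2)`, since only the offset of 2 matters.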

GeGLU activation function: replacing the traditional ReLU activation, GeGLU provides stronger nonlinear capacity, which helps the model capture complex patterns and relationships and squeeze more performance out of small models.
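The gated structure can be sketched in a few lines (shapes and weights here are toy values, not Gemma's feed-forward dimensions): a GELU-activated branch gates a plain linear branch elementwise, GeGLU(x) = GELU(xW) * (xV).

```python
import math

def gelu(x: float) -> float:
    """Exact GELU via the Gaussian CDF."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def geglu(x: list[float], w: list[list[float]], v: list[list[float]]) -> list[float]:
    """Apply GeGLU to one input vector with weight matrices w and v."""
    gate = [gelu(sum(xi * wij for xi, wij in zip(x, col))) for col in zip(*w)]
    lin = [sum(xi * vij for xi, vij in zip(x, col)) for col in zip(*v)]
    return [g * l for g, l in zip(gate, lin)]  # elementwise gating

# Toy 2 -> 2 example with identity weight matrices:
out = geglu([1.0, 2.0], w=[[1.0, 0.0], [0.0, 1.0]], v=[[1.0, 0.0], [0.0, 1.0]])
```

Unlike ReLU, the gate varies smoothly with the input, which is one reason gated GELU variants tend to train well in Transformer feed-forward blocks.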

Normalizer placement: by applying normalization (RMSNorm) at both the input and the output of each Transformer sublayer, Gemma improves the stability and effectiveness of training. This placement provides a more effective way to train deep networks, which helps improve generalization and reduce the risk of overfitting.
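RMSNorm itself is simple; a minimal sketch (dimensions illustrative): unlike LayerNorm it skips mean-centering and rescales the vector by its root mean square, then applies a learned per-dimension gain.

```python
import math

def rms_norm(x: list[float], gain: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize x by its root mean square, then scale by a learned gain."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)  # eps guards against zero vectors
    return [g * v / rms for g, v in zip(gain, x)]

# With unit gain, the output's own root mean square is ~1 regardless of input scale.
h = rms_norm([3.0, -4.0], gain=[1.0, 1.0])
```

Dropping the mean subtraction makes RMSNorm slightly cheaper than LayerNorm while keeping activations at a stable scale, which is why it is a common choice in recent decoder-only models.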

Another noteworthy feature of Gemma is the emphasis on security.

A comprehensive safety assessment of the Gemma model includes in-depth analysis and testing of the model's behavior to ensure that it operates safely and reliably across different application scenarios. At the same time, responsible AI practices are integrated into Gemma's development process, including ensuring that the models are fair, transparent, and explainable. This helps reduce the bias and unfairness that AI systems can introduce, and increases user trust in the model's output.

Accompanying the Gemma model is a detailed set of safety guidelines on how to use the Gemma model safely and effectively. This includes recommended use cases, warnings of potential risks, and strategies for how to mitigate those risks.

As an open-source model, the Gemma project also encourages community collaboration and feedback, allowing researchers and developers to contribute their own insights and improvements through open source. This open cooperation model helps to find and remediate security vulnerabilities in a timely manner, improving the overall security of the model.

In fact, in today's rapidly iterating LLM development environment, the security performance of a lightweight open-source model is an important prerequisite for the model to be exposed to more application scenarios.

AI that lands on phones, PCs, and cars

On Gemma's description page, Google calls for "democratized access" to advanced AI models, and specifically emphasizes that Gemma can be deployed in resource-constrained environments, such as laptops, desktops, or the user's own cloud infrastructure.

Industry attention to lightweight AI models is heating up rapidly.

In June 2023, Microsoft released the lightweight Phi model with 1.3 billion parameters; the later Phi-2 expanded to 2.7 billion. In China, two companies have launched lightweight LLMs below 7B: ModelBest (面壁智能) with MiniCPM-2B, and Alibaba's Qwen1.5 with 0.5B, 1.8B, and 4B versions.

ModelBest's MiniCPM-2B is aimed directly at mobile phones, and its real-world performance has been tested on a number of common handsets.

How MiniCPM works on mobile phones

Although both are around 2 billion parameters, Gemma-2B's weight files are noticeably larger than those of MiniCPM-2B, which can run on phones with 4 GB of memory. Ordinary phones may struggle to run Gemma-2B, and Gemma's current technical report does not mention output speed on personal devices.
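Why storage size matters so much here is simple arithmetic (my own back-of-the-envelope estimate, not a figure from either model's report): weight storage is roughly parameter count times bytes per parameter, so precision and quantization decide whether a 2B-class model fits in phone memory at all.

```python
def weight_size_gb(n_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

# Illustrative 2.5e9 parameters (roughly the 2B-model class) at common precisions:
for bits in (16, 8, 4):  # fp16, int8, int4
    print(f"2.5B params @ {bits}-bit: {weight_size_gb(2.5e9, bits):.2f} GB")
```

At 16-bit precision such a model needs about 5 GB for weights alone, which already exceeds a 4 GB phone; 4-bit quantization brings it down to roughly 1.25 GB, which is what makes on-device deployment plausible.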


Gemma's download page on Hugging Face

Fast inference, low cost, and low dependence on high-end hardware make lightweight models significantly easier to commercialize; the most typical examples are AI deployed on phones, PCs, and in-car systems.

At present, mainstream and non-mainstream consumer electronics and automotive companies are actively deploying AI.

In China, phone makers OPPO and Meizu have just updated their AI strategies; Meizu even plans to abandon traditional phones and build only AI phones. Lenovo, Dell, HP, Asus, and others have announced their own AI PC strategies, and Nvidia recently launched Chat with RTX, which runs locally, requires a graphics card with 7 GB of video memory, and mainly calls the Mistral 7B model. In the car, Mercedes-Benz, BMW, Volkswagen, and other companies have launched in-car systems that integrate large AI models, and China's BYD recently unveiled its new vehicle intelligence architecture "Xuanji" along with the accompanying "Xuanji AI" large model.

The launch of open-source lightweight models such as Gemma, MiniCPM, and Qwen1.5 gives these device makers a path to deploy on-device AI without having to develop their own large models.

In fact, in the face of complex algorithm research and high training costs, most enterprises do not have the ability to develop large models from scratch.

Retraining or fine-tuning on open-source large models such as LLaMA has become a more practical and cost-effective option. With methods such as continued pretraining and fine-tuning, developers can build on existing models and customize them to fit specific application needs. This approach not only reduces development costs but also accelerates model innovation, allowing even teams with limited resources to compete in large-model development.

The mainstream form of large-model entrepreneurship in China is retraining or fine-tuning open-source models such as LLaMA. Although developing a large model from scratch is technically attractive, it requires extremely high costs and deep expertise, and the process is complex and error-prone. Utilizing and contributing to the open-source large-model community is therefore not only an effective way to iterate and innovate quickly, but also an important means of promoting technology sharing and industry progress.

For a long time, the mainstream ecosystem of large-model development and modification was dominated by LLaMA, and it was not until the emergence of Mistral that this changed slightly. Google's release of the open-source Gemma gives developers more choice and flexibility, and is bound to play a significant role in stimulating the open-source ecosystem and driving the development and application of open large-model technology.
