
Apple releases OpenELM: a small, open-source AI model designed to run on the device

Source | InfoQ

Author | Tina, Nuka-Cola

Today, Apple made some uncharacteristically big news.

Apple open-sourced OpenELM, an AI model that runs on the device, and also disclosed the code, weights, datasets, and training process.

Just as Google, Samsung, and Microsoft are pushing generative AI models onto PCs and mobile devices, Apple has joined in with a new family of open-source large language models (LLMs) that can run on a single device without a cloud server.

OpenELM has been released on the AI code community Hugging Face and consists of multiple small models designed to perform text-generation tasks efficiently.


Apple has jumped into the open-source AI battle with four new models on Hugging Face!

The OpenELM family has eight members: four pre-trained models and four instruction-fine-tuned models, with parameter counts ranging from 270 million to 3 billion. (Parameters are the connections between a model's artificial neurons; more parameters usually, though not always, mean better performance and more capability.) By comparison, Microsoft's Phi-3 Mini has 3.8 billion parameters.

Pre-training is how large models learn to generate continuous, usable text, while instruction fine-tuning teaches the model to respond to specific user requests with more relevant output. A pre-trained model completes a request simply by appending likely text to the prompt: given "teach me how to bake bread", it may not produce step-by-step instructions but instead reply flatly with something like "in a home oven". Instruction fine-tuning solves exactly this problem.
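The behavioral difference largely comes down to how the prompt is framed during training: a base model only ever sees raw text to continue, while an instruct model sees requests wrapped in a template that signals "now answer this." A minimal sketch of that framing (the template below is a generic illustration, not Apple's actual chat format):

```python
# Sketch: a pre-trained model is prompted with raw text to continue,
# while an instruction-tuned model expects the request wrapped in a
# template. The "### Instruction" format here is a common generic
# convention, used purely for illustration.

def base_prompt(user_text: str) -> str:
    # Base model: just hand it the text and let it continue.
    return user_text

def instruct_prompt(user_text: str) -> str:
    # Instruct model: wrap the request so the model knows to answer it.
    return f"### Instruction:\n{user_text}\n\n### Response:\n"

print(base_prompt("Teach me how to bake bread"))
print(instruct_prompt("Teach me how to bake bread"))
```

Fine-tuning on many (instruction, response) pairs in this shape is what turns "continue the text" behavior into "answer the request" behavior.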

OpenELM improves on the standard Transformer language model by adopting a layer-wise scaling strategy and by fine-tuning after pre-training on public datasets. As a result, OpenELM's transformer layers do not all share one configuration; each layer gets its own parameter allocation. This strategy significantly improves accuracy: with a budget of about one billion parameters, OpenELM is 2.36% more accurate than OLMo while requiring only half as many pre-training tokens.
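The idea behind layer-wise scaling can be sketched in a few lines: rather than giving every layer the same width, the number of attention heads and the feed-forward (FFN) multiplier grow linearly from the first layer to the last, so the parameter budget is spent unevenly across depth. The constants below are illustrative, not Apple's published configuration:

```python
# Sketch of layer-wise scaling: attention heads and the FFN width
# multiplier are interpolated linearly across layers instead of being
# uniform. Scaling ranges (alpha, beta) are illustrative assumptions.

def layerwise_config(num_layers, base_heads, alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Return a (heads, ffn_multiplier) pair for each layer."""
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)  # 0.0 at the first layer, 1.0 at the last
        heads = max(1, round(base_heads * (alpha[0] + t * (alpha[1] - alpha[0]))))
        ffn_mult = round(beta[0] + t * (beta[1] - beta[0]), 2)
        configs.append((heads, ffn_mult))
    return configs

for layer, (heads, mult) in enumerate(layerwise_config(4, base_heads=16)):
    print(f"layer {layer}: {heads} heads, FFN multiplier {mult}")
```

Early layers end up narrow and cheap, later layers wide and expressive; this is how OpenELM squeezes more accuracy out of a fixed parameter budget than a uniformly configured model.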

Apple publishes the weights of the OpenELM models under what it calls a "sample code license," along with training checkpoints, model performance statistics, and instructions for pre-training, evaluation, instruction fine-tuning, and parameter-efficient tuning. As one commenter put it: "This is very developer-friendly; after all, a large part of the difficulty of deep networks lies in parameter tuning."


Apple's Sample Code License does not prohibit commercial use or modification, and only requires that "if you redistribute Apple Software in its entirety and without modification, you must retain this notice and the following text with the disclaimer in all such distributions."

The license is not recognized as an open source license, and while Apple is not overly restrictive, it does make it clear that Apple reserves the right to file a patent claim if any derivative work based on OpenELM is deemed to infringe its rights.

Apple further emphasized that these models "do not offer any security guarantees. As a result, the model may generate inaccurate, harmful, biased, or offensive output" in response to prompts.

OpenELM is just the latest in a surprising series of open-source AI models released by Apple. In October last year, Apple quietly released Ferret, an open-source language model with multimodal capabilities, which quickly attracted attention from all walks of life.

At present, the field of large models is mainly divided into two camps: open source and closed source. Representative companies in the closed-source camp include OpenAI, Anthropic, Google, Midjourney, Udio, Baidu, iFLYTEK, Mobvoi, and Moonshot AI. Representative companies in the open-source camp include Meta, Microsoft, Google, Baichuan Intelligence, Alibaba, and 01.AI. These companies are committed to opening up the technology and code of large models, encouraging developers and researchers to participate in developing and improving them.

Apple has long been known for being secretive and "closed" to the outside world, but this time it has, unusually, joined the open-source large-model camp. Previously, Apple did not publicly announce or discuss its explorations in AI beyond posting models and papers online.

What do we know about OpenELM?

Although OpenELM (short for Open-source Efficient Language Models) has only just been released and has not yet been publicly tested, Apple notes on Hugging Face that the goal is to run these models on-device. That clearly follows in the footsteps of rivals Google, Samsung, and Microsoft; Microsoft just released the Phi-3 Mini model this week, which can run purely on a smartphone.

In a paper describing the model published on arXiv.org, Apple said that the development of OpenELM was "led by Sachin Mehta, with additional contributions from Mohammad Rastegari and Peter Zatloukal," and that the model family "aims to enhance and empower the open research community for the future of research."

Apple's OpenELM models are available at four scales, with 270 million, 450 million, 1.1 billion, and 3 billion parameters, each smaller than existing high-performance models (typically 7 billion parameters) and available in both pre-trained and instruction-tuned versions.

The pre-training of these models uses a public dataset totaling 1.8 trillion tokens from Reddit, Wikipedia, arXiv.org, and others.


The OpenELM model is suitable for running on business laptops and even some smartphones. In the paper, Apple states that they ran benchmarks "on an NVIDIA RTX 4090 GPU with an Intel i9-13900KF CPU, 64 GB DDR5-4000 DRAM, and 24 GB VRAM on a workstation running Ubuntu 22.04" and "on an Apple MacBook Pro with an M2 Max SoC with 64 GiB of RAM running macOS 14.4.1."


Users test and run the OpenELM model

Interestingly, all models in the new family employ a layer-wise scaling strategy to allocate parameters across the layers of the Transformer model.

According to Apple, this approach provides more accurate results while improving computational efficiency. The company also pre-trained the model using the new CoreNet library.

"Our pre-training dataset contains RefinedWeb, deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling approximately 1.8 trillion tokens," the company noted on Hugging Face.

Promising, but the performance isn't top-notch

In terms of performance, Apple's published results show that the OpenELM model is quite good, especially with its 450 million parameter version.


In addition, the 1.1 billion parameter version of OpenELM "improves performance by 2.36% over the 1.2 billion parameter OLMo model while requiring only half as many pre-training tokens." OLMo is a "truly open-source and state-of-the-art large language model" recently released by the Allen Institute for AI (AI2).

The pre-trained version of OpenELM-3B achieved 42.24% accuracy on the ARC-C benchmark, which tests knowledge and reasoning, and scored 26.76% on MMLU and 73.28% on HellaSwag.

One user who ran the series of tests noted that Apple's model appears "stable and consistent": its responses are conservative rather than creative, and it is unlikely to venture into content that is "not safe for work."

Rival Microsoft's recently launched Phi-3 Mini, with 3.8 billion parameters and a 4k context window, still leads on performance.

According to the latest released statistics, the Phi-3 Mini scored 84.9% on the 10-shot ARC-C benchmark, 68.8% on the 5-shot MMLU, and 76.7% on the 5-shot Hellaswag.
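Putting the numbers quoted above side by side (scores transcribed from this article; the shot settings differ between the two evaluations, so treat this as a rough comparison only):

```python
# Benchmark scores as reported in this article. Note: the Phi-3 Mini
# numbers are few-shot (10-shot/5-shot) while the OpenELM-3B setup may
# differ, so this is not a strictly controlled comparison.
scores = {
    "ARC-C":     {"OpenELM-3B": 42.24, "Phi-3 Mini": 84.9},
    "MMLU":      {"OpenELM-3B": 26.76, "Phi-3 Mini": 68.8},
    "HellaSwag": {"OpenELM-3B": 73.28, "Phi-3 Mini": 76.7},
}

for bench, results in scores.items():
    leader = max(results, key=results.get)
    print(f"{bench}: {leader} leads at {results[leader]}%")
```

Phi-3 Mini leads on all three benchmarks, though it also has a larger parameter budget (3.8B vs 3B), which is consistent with the article's framing.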

But in the long run, OpenELM will surely continue to improve. For now, the open-source model community is excited about Apple joining and looks forward to seeing how this once "closed-source" giant brings its results into real applications.

Large models are the future of smartphones

Mobile phone manufacturers are optimistic about the future of AI on mobile phones.

Companies such as Qualcomm and MediaTek have introduced smartphone chipsets that can meet the processing demands of AI applications. Previously, AI applications on many devices were actually processed partly in the cloud, with results then delivered to the phone. But cloud models have drawbacks, chiefly the high cost of inference: for some AI startups, generating a single image can cost on the order of one yuan. Advanced chips and on-device models will push more AI applications to run on the phone itself, cutting costs and giving users better real-time performance, in turn giving rise to new business models.

It has been only about a year since ChatGPT took off, and mobile phone manufacturers have already built large-model AI into their phones.

Samsung's newly released Galaxy S24 series is equipped with on-device Galaxy AI that can process voice, text, and images. Google's Pixel 8 series ships with its own on-device model, Gemini Nano. Brian Rakowski, vice president of product management for Google's Pixel division, said that Google's most advanced large models will arrive directly on smartphones next year: "We've made quite a few breakthroughs in compressing these models."

Leading Chinese phone makers are also jockeying for position. Xiaomi released HyperOS and applications backed by its self-developed large model last October; vivo announced its BlueLM large model last year and open-sourced BlueLM-7B, a device-cloud model for phones; and OPPO released AndesGPT last November, designed around "device-cloud collaboration" and offered in multiple model sizes with different parameter scales.

One of the highlights of this year's MWC was large models running locally on the device itself. "That's the biggest game changer," said Ben Wood, principal analyst at CCS Insight. The show also featured futuristic AI concept phones, such as the T Phone from Deutsche Telekom and Brain.ai, which abandons apps entirely in favor of an AI-driven interface. Some now predict that as AI takes over our smartphones, the app era may be nearing its end, bringing a whole new ecosystem and competitive landscape.

In the battle over on-device large models, Apple had been the lone holdout. Now it has finally arrived, open-source model in hand.

