
Apple is still ahead of the curve on small models

Author: Not bald programmer

In the field of AI, small-model technology is becoming increasingly popular because these models can run directly on personal devices instead of relying on large cloud data centers. Apple recently launched the OpenELM project, a series of small AI language models compact enough to run directly on a smartphone. For now the models serve mainly as proof-of-concept and research vehicles, but they could become the basis for AI products on Apple devices in the future.

Apple's new AI models, collectively named OpenELM (Open-source Efficient Language Models), are available on the Hugging Face platform under the Apple Sample Code License. That license is restrictive and may not meet the usual definition of "open source," but the OpenELM source code is publicly accessible.

Link: https://huggingface.co/apple/OpenELM

OpenELM's goal is similar to that of Microsoft's Phi-3 models: effective language understanding and processing in small AI models that can run on local devices. Phi-3-mini has 3.8 billion parameters, while Apple's OpenELM models are even smaller, with eight variants ranging from 270 million to 3 billion parameters.

By comparison, the largest model in Meta's Llama 3 series has 70 billion parameters, and OpenAI's GPT-3 had 175 billion parameters when it launched in 2020. Parameter count is one rough measure of an AI model's capacity and capability, and the research trend of recent years has been to push small models toward the capability level that only much larger models reached a few years ago.
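To put those sizes in perspective, here is a rough back-of-the-envelope sketch (not from Apple's paper) of how much memory the raw weights alone would occupy at 16-bit precision. It illustrates why a 270M to 3B parameter model can plausibly fit on a phone while a 70B or 175B parameter model cannot:

```python
# Rough memory estimate for model weights alone (illustrative, not from Apple's paper).
# Assumes 16-bit (2-byte) weights and ignores activations, KV cache, and runtime overhead.
BYTES_PER_PARAM = 2  # fp16 / bf16

def weight_memory_gb(num_params: float) -> float:
    return num_params * BYTES_PER_PARAM / 1e9

for name, params in [
    ("OpenELM-270M", 270e6),
    ("OpenELM-3B", 3e9),
    ("Llama 3 70B", 70e9),
    ("GPT-3 175B", 175e9),
]:
    print(f"{name:>14}: ~{weight_memory_gb(params):6.1f} GB of weights")
```

On this estimate the smallest OpenELM model needs roughly half a gigabyte for its weights and the largest about 6 GB, while the 70B and 175B models need on the order of 140 GB and 350 GB respectively, far beyond what a phone can hold in memory.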

OpenELM's eight models fall into two categories: four pretrained models (base models that simply predict the next token) and four instruction-tuned models (optimized for instruction following, which makes them better suited to building AI assistants and chatbots); a minimal loading sketch follows the list:

  • OpenELM-270M:https://huggingface.co/apple/OpenELM-270M
  • OpenELM-450M:https://huggingface.co/apple/OpenELM-450M
  • OpenELM-1_1B:https://huggingface.co/apple/OpenELM-1_1B
  • OpenELM-3B:https://huggingface.co/apple/OpenELM-3B
  • OpenELM-270M-Instruct:https://huggingface.co/apple/OpenELM-270M-Instruct
  • OpenELM-450M-Instruct:https://huggingface.co/apple/OpenELM-450M-Instruct
  • OpenELM-1_1B-Instruct:https://huggingface.co/apple/OpenELM-1_1B-Instruct
  • OpenELM-3B-Instruct:https://huggingface.co/apple/OpenELM-3B-Instruct
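Below is a minimal sketch of how one of these checkpoints might be loaded and queried with the Hugging Face transformers library. The exact procedure (tokenizer repository, trust_remote_code, license acceptance for any gated weights) is documented on Apple's model cards and may differ from this; the prompt and generation settings are purely illustrative.

```python
# Minimal sketch: loading an OpenELM instruct checkpoint with Hugging Face transformers.
# Assumptions: transformers and torch are installed, the checkpoint loads via
# AutoModelForCausalLM with remote code enabled, and you have access to a
# LLaMA-compatible tokenizer (Apple's model card points to one; the repo named
# below is gated and used here only as an example).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-270M-Instruct"
tokenizer_id = "meta-llama/Llama-2-7b-hf"  # example tokenizer; check Apple's model card

tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Explain what a language model is in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```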

The maximum context window for these models is 2,048 tokens. They were trained on several publicly available datasets, including the RefinedWeb dataset, a subset of RedPajama, and a subset of Dolma v1.6, which together contain roughly 1.8 trillion tokens, according to Apple. A token is a small chunk of text (a word, part of a word, or punctuation) that a language model reads and generates as its basic unit.
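To make the idea of tokens and a fixed context window concrete, here is a small illustration that uses GPT-2's tokenizer as a stand-in (OpenELM's own tokenizer is different); it counts the tokens in a prompt and truncates them to a 2,048-token window:

```python
# Illustration of tokens and a fixed context window, using GPT-2's tokenizer as a
# stand-in (OpenELM uses a different, LLaMA-style tokenizer; this is only for intuition).
from transformers import AutoTokenizer

MAX_CONTEXT = 2048  # OpenELM's maximum processing window, per Apple

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Small language models can run directly on a phone. " * 300

token_ids = tokenizer.encode(text)
print(f"Prompt length: {len(token_ids)} tokens")

# Anything beyond the context window must be truncated (or otherwise reduced)
# before the model can process it.
truncated = token_ids[:MAX_CONTEXT]
print(f"After truncation: {len(truncated)} tokens")
```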

Apple adopted a strategy called "layer-wise scaling," which distributes parameters unevenly across the model's layers rather than giving every layer the same width, saving compute while improving quality for a given budget. According to Apple's white paper, this strategy lets OpenELM achieve 2.36% better accuracy than Allen AI's OLMo 1B model while requiring only half as many pre-training tokens.
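A conceptual sketch of layer-wise scaling follows, with made-up layer counts and scaling constants: instead of a fixed number of attention heads and a fixed feed-forward width in every layer, both are interpolated linearly from the first layer to the last. The real formula, constants, and implementation live in Apple's paper and the CoreNet code; this only approximates the idea:

```python
# Conceptual sketch of layer-wise scaling (illustrative only; Apple's paper and the
# CoreNet code define the real formula and constants). Rather than giving every
# transformer layer identical width, the number of attention heads and the
# feed-forward (FFN) width grow gradually with depth.

def layerwise_dims(num_layers, d_model, head_dim,
                   alpha_min=0.5, alpha_max=1.0,   # scales attention width (made-up values)
                   beta_min=2.0, beta_max=4.0):    # scales FFN width (made-up values)
    dims = []
    for i in range(num_layers):
        t = i / (num_layers - 1)                   # 0.0 at the first layer, 1.0 at the last
        alpha = alpha_min + (alpha_max - alpha_min) * t
        beta = beta_min + (beta_max - beta_min) * t
        n_heads = max(1, round(alpha * d_model / head_dim))
        ffn_dim = round(beta * d_model)
        dims.append((n_heads, ffn_dim))
    return dims

# Example: a toy 8-layer model with d_model=1024 and 64-dimensional heads.
for layer, (heads, ffn) in enumerate(layerwise_dims(8, 1024, 64)):
    print(f"layer {layer}: {heads:2d} heads, FFN width {ffn}")
```

The effect is that early layers stay narrow and cheap while later layers get more capacity, so a fixed parameter budget is spent where it helps most.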

A comparison table of OpenELM with other similar small AI language models, taken from Apple's OpenELM research paper

Apple has also released the code for CoreNet, the library used to train OpenELM, along with training recipes that allow the model weights to be reproduced, which is a rarity among large tech companies. As Apple notes in the paper's abstract, the reproducibility and transparency of large language models are critical to advancing open research, ensuring the trustworthiness of results, and enabling investigation into issues such as data and model bias.

By publishing source code, model weights, and training materials, Apple hopes to "empower and enrich the open research community." It also cautions that because the models were trained on publicly available datasets, there is a risk that they may produce inaccurate, harmful, biased, or objectionable output in response to user prompts.

Although Apple has yet to integrate these new AI language models into its consumer devices, the iOS 18 update expected to be announced at WWDC in June is rumored to include new AI features that rely on on-device processing to protect user privacy. In addition, Apple may partner with Google or OpenAI to handle more complex AI tasks that require off-device processing, with the aim of greatly improving Siri's capabilities.

Original link: https://arstechnica.com/information-technology/2024/04/apple-releases-eight-small-ai-language-models-aimed-at-on-device-use/
