
Apple is still ahead of the curve on small models

Author: Not bald programmer

In the field of AI, small-model technology is becoming increasingly popular because these models can run directly on personal devices instead of relying on large cloud data centers. Apple recently launched the OpenELM project, a series of small AI language models compact enough to run directly on a smartphone. For now the models serve mainly as proof-of-concept and research vehicles, but they could become the basis for AI products on Apple devices in the future.

Apple's new AI models, collectively named OpenELM (Open-source Efficient Language Models), are available on the Hugging Face platform under the Apple Sample Code License. That license is restrictive and may not meet the usual definition of "open source," but the OpenELM source code is publicly accessible.

Link: https://huggingface.co/apple/OpenELM

OpenELM's goal is similar to that of Microsoft's Phi-3 models: effective language understanding and processing in small AI models that can run on local devices. Phi-3-mini has 3.8 billion parameters, while Apple's OpenELM models are even smaller, with eight variants ranging from 270 million to 3 billion parameters.

By comparison, the largest model in Meta's Llama 3 series has 70 billion parameters, and OpenAI's GPT-3 had 175 billion parameters when it launched in 2020. Parameter count is one rough measure of an AI model's capacity and capability, and the research trend of recent years has been to push small models toward the capability level that only much larger models reached a few years ago.
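To put those sizes in perspective, here is a rough back-of-the-envelope sketch (not from Apple's paper) of how much memory the raw weights alone would occupy at 16-bit precision. It illustrates why a 270M to 3B parameter model can plausibly fit on a phone while a 70B or 175B parameter model cannot:

```python
# Rough memory estimate for model weights alone (illustrative, not from Apple's paper).
# Assumes 16-bit (2-byte) weights and ignores activations, KV cache, and runtime overhead.
BYTES_PER_PARAM = 2  # fp16 / bf16

def weight_memory_gb(num_params: float) -> float:
    return num_params * BYTES_PER_PARAM / 1e9

for name, params in [
    ("OpenELM-270M", 270e6),
    ("OpenELM-3B", 3e9),
    ("Llama 3 70B", 70e9),
    ("GPT-3 175B", 175e9),
]:
    print(f"{name:>14}: ~{weight_memory_gb(params):6.1f} GB of weights")
```

On this estimate the smallest OpenELM model needs roughly half a gigabyte for its weights and the largest about 6 GB, while the 70B and 175B models need on the order of 140 GB and 350 GB respectively, far beyond what a phone can hold in memory.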

OpenELM's eight models fall into two categories: four pretrained models (base models that simply predict the next token) and four instruction-tuned models (optimized for instruction following, which makes them better suited to building AI assistants and chatbots); a minimal loading sketch follows the list:

  • OpenELM-270M:https://huggingface.co/apple/OpenELM-270M
  • OpenELM-450M:https://huggingface.co/apple/OpenELM-450M
  • OpenELM-1_1B:https://huggingface.co/apple/OpenELM-1_1B
  • OpenELM-3B:https://huggingface.co/apple/OpenELM-3B
  • OpenELM-270M-Instruct:https://huggingface.co/apple/OpenELM-270M-Instruct
  • OpenELM-450M-Instruct:https://huggingface.co/apple/OpenELM-450M-Instruct
  • OpenELM-1_1B-Instruct:https://huggingface.co/apple/OpenELM-1_1B-Instruct
  • OpenELM-3B-Instruct:https://huggingface.co/apple/OpenELM-3B-Instruct
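Below is a minimal sketch of how one of these checkpoints might be loaded and queried with the Hugging Face transformers library. The exact procedure (tokenizer repository, trust_remote_code, license acceptance for any gated weights) is documented on Apple's model cards and may differ from this; the prompt and generation settings are purely illustrative.

```python
# Minimal sketch: loading an OpenELM instruct checkpoint with Hugging Face transformers.
# Assumptions: transformers and torch are installed, the checkpoint loads via
# AutoModelForCausalLM with remote code enabled, and you have access to a
# LLaMA-compatible tokenizer (Apple's model card points to one; the repo named
# below is gated and used here only as an example).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-270M-Instruct"
tokenizer_id = "meta-llama/Llama-2-7b-hf"  # example tokenizer; check Apple's model card

tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

inputs = tokenizer("Explain what a language model is in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```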

The maximum context window for these models is 2,048 tokens. They were trained on several publicly available datasets, including the RefinedWeb dataset, a subset of RedPajama, and a subset of Dolma v1.6, which together contain roughly 1.8 trillion tokens, according to Apple. A token is a small chunk of text (a word, part of a word, or punctuation) that a language model reads and generates as its basic unit.
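To make the idea of tokens and a fixed context window concrete, here is a small illustration that uses GPT-2's tokenizer as a stand-in (OpenELM's own tokenizer is different); it counts the tokens in a prompt and truncates them to a 2,048-token window:

```python
# Illustration of tokens and a fixed context window, using GPT-2's tokenizer as a
# stand-in (OpenELM uses a different, LLaMA-style tokenizer; this is only for intuition).
from transformers import AutoTokenizer

MAX_CONTEXT = 2048  # OpenELM's maximum processing window, per Apple

tokenizer = AutoTokenizer.from_pretrained("gpt2")
text = "Small language models can run directly on a phone. " * 300

token_ids = tokenizer.encode(text)
print(f"Prompt length: {len(token_ids)} tokens")

# Anything beyond the context window must be truncated (or otherwise reduced)
# before the model can process it.
truncated = token_ids[:MAX_CONTEXT]
print(f"After truncation: {len(truncated)} tokens")
```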

Apple adopted a strategy called "layer-wise scaling," which distributes parameters unevenly across the model's layers rather than giving every layer the same width, saving compute while improving quality for a given budget. According to Apple's white paper, this strategy lets OpenELM achieve 2.36% better accuracy than Allen AI's OLMo 1B model while requiring only half as many pre-training tokens.
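A conceptual sketch of layer-wise scaling follows, with made-up layer counts and scaling constants: instead of a fixed number of attention heads and a fixed feed-forward width in every layer, both are interpolated linearly from the first layer to the last. The real formula, constants, and implementation live in Apple's paper and the CoreNet code; this only approximates the idea:

```python
# Conceptual sketch of layer-wise scaling (illustrative only; Apple's paper and the
# CoreNet code define the real formula and constants). Rather than giving every
# transformer layer identical width, the number of attention heads and the
# feed-forward (FFN) width grow gradually with depth.

def layerwise_dims(num_layers, d_model, head_dim,
                   alpha_min=0.5, alpha_max=1.0,   # scales attention width (made-up values)
                   beta_min=2.0, beta_max=4.0):    # scales FFN width (made-up values)
    dims = []
    for i in range(num_layers):
        t = i / (num_layers - 1)                   # 0.0 at the first layer, 1.0 at the last
        alpha = alpha_min + (alpha_max - alpha_min) * t
        beta = beta_min + (beta_max - beta_min) * t
        n_heads = max(1, round(alpha * d_model / head_dim))
        ffn_dim = round(beta * d_model)
        dims.append((n_heads, ffn_dim))
    return dims

# Example: a toy 8-layer model with d_model=1024 and 64-dimensional heads.
for layer, (heads, ffn) in enumerate(layerwise_dims(8, 1024, 64)):
    print(f"layer {layer}: {heads:2d} heads, FFN width {ffn}")
```

The effect is that early layers stay narrow and cheap while later layers get more capacity, so a fixed parameter budget is spent where it helps most.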

A comparison table of OpenELM with other similar small AI language models, taken from Apple's OpenELM research paper

Apple has also released the code for CoreNet, the library used to train OpenELM, along with training recipes that allow the model weights to be reproduced, which is a rarity among large tech companies. As Apple notes in the paper's abstract, the reproducibility and transparency of large language models are critical to advancing open research, ensuring the trustworthiness of results, and enabling investigation into issues such as data and model bias.

By publishing source code, model weights, and training materials, Apple hopes to "empower and enrich the open research community." It also cautions that because the models were trained on publicly available datasets, there is a risk that they may produce inaccurate, harmful, biased, or objectionable output in response to user prompts.

Although Apple has yet to integrate these new AI language models into its consumer devices, the iOS 18 update expected to be announced at WWDC in June is rumored to include new AI features that rely on on-device processing to protect user privacy. In addition, Apple may partner with Google or OpenAI to handle more complex AI tasks that require off-device processing, with the aim of greatly improving Siri's capabilities.

Original link: https://arstechnica.com/information-technology/2024/04/apple-releases-eight-small-ai-language-models-aimed-at-on-device-use/
