
Apple AI is finally here! From 270 million to 3 billion parameters, the code for all four large models is open source, and AI technology continues to "soar".

Author: Titanium Media APP

(Image source: Apple's official website)

Apple has suddenly announced big news.

In the early morning of April 25, Beijing time, Apple released OpenELM, described as an "efficient language model family with an open-source training and inference framework," on the Hugging Face platform.

It is understood that OpenELM comes in four sizes: 270 million, 450 million, 1.1 billion, and 3 billion parameters, positioning it as an ultra-small-scale model; by comparison, Microsoft's Phi-3 model has 3.8 billion parameters. Such small models are cheaper to run and can run on devices such as mobile phones and laptops.

At the same time, ahead of the WWDC24 developer conference, Apple has fully open-sourced the OpenELM model weights, inference code, datasets, and training logs. Apple has also open-sourced its neural network library, CoreNet.


Back in February, Apple CEO Tim Cook said that Apple's generative AI capabilities would arrive "later this year." Reports suggest that iOS 18, expected in June, could be the "biggest" update in iOS history, with the first AI-focused iPhone arriving in September.

Now, Apple appears to be racing to catch up with the industry in this new wave of AI.

Code: https://github.com/apple/corenet

Hugging Face: https://huggingface.co/apple/OpenELM

Paper: https://arxiv.org/abs/2404.14619

With half the pre-training tokens, Apple's 1.1-billion-parameter model is more accurate than competing products

With ChatGPT becoming popular worldwide, mobile phone manufacturers such as Samsung, Google, and Xiaomi have in recent months pushed large language models onto phones, tablets, and other devices for photo processing, enhanced word processing, and more, turning them into a major selling point. Apple, by contrast, has revealed little and offers few similar built-in features, relying mainly on third-party tools to achieve comparable results.

During the February earnings call, Cook first unveiled plans for generative AI, saying Apple will integrate AI technology into its software platforms (iOS, iPadOS, and macOS) later this year.

"I just want to say that I think there's a huge opportunity for Apple in generative AI and AI, and there's no need to reveal more details and not exceed our expectations," Cook said. Going forward, we will continue to invest in these and other technologies that will shape the future. That includes AI, where we continue to spend a lot of time and energy, and we're excited to share details of the work we're doing in that space later this year. We're very excited about it. ”

In fact, Apple has been active in generative AI since the start of the year. In March, Apple's technical team published the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training," disclosing for the first time Apple's large model MM1, which has 30 billion parameters, supports multimodality, and uses an MoE architecture; more than half of the authors are Chinese.

Now, Apple's genuinely open-source model for mobile phones, tablets, and other on-device scenarios has finally arrived.

According to the paper, the open-sourced OpenELM comes in two variants, instruction-tuned and pre-trained, each in four sizes: 270 million, 450 million, 1.1 billion, and 3 billion parameters. It can generate text and code, translate, and summarize.

Although the smallest version has only 270 million parameters, Apple pre-trained on public datasets including RefinedWeb, a deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling about 1.8 trillion tokens, which is one of the main reasons it performs so well at small parameter counts.
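Since the weights are published on Hugging Face, the released checkpoints can be tried with the standard transformers workflow. Below is a minimal sketch, not Apple's official example; the repo id "apple/OpenELM-270M" and the reuse of Meta's Llama-2 tokenizer are assumptions based on the release described above, so check the model card before running.

```python
# Minimal sketch: loading an OpenELM checkpoint with Hugging Face transformers.
# Repo ids below are assumptions; verify them against the actual model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

# OpenELM ships custom modeling code, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)
# OpenELM is reported to reuse Meta's Llama tokenizer rather than shipping its own
# (the Llama repo is gated, so access must be granted first).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```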

For example, the 1.1-billion-parameter OpenELM is 2.36% more accurate than the 1.2-billion-parameter OLMo model while using only half as much pre-training data.


For training, Apple used CoreNet as the framework and ran 350,000 iterations with the Adam optimizer. Well-known Apple research such as MobileOne, CVNets, MobileViT, and FastViT is built on CoreNet.
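The article does not reproduce Apple's CoreNet training configuration. Purely as an illustration of what an Adam-based language-model training loop looks like, here is a generic PyTorch sketch; the model, data loader, and hyperparameters are placeholders, not the settings Apple actually used.

```python
# Illustrative only: a generic Adam training loop for a causal language model.
# Not Apple's CoreNet configuration; hyperparameters are placeholders.
import torch

def train(model, dataloader, num_iters=350_000, lr=1e-4, device="cuda"):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    step = 0
    while step < num_iters:
        for batch in dataloader:  # each batch: dict with "input_ids" and "labels"
            optimizer.zero_grad()
            outputs = model(
                input_ids=batch["input_ids"].to(device),
                labels=batch["labels"].to(device),
            )
            outputs.loss.backward()   # cross-entropy loss computed by the model
            optimizer.step()
            step += 1
            if step >= num_iters:
                break
```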

Apple also said in the paper that, unlike the previous practice of releasing only model weights and inference code while pre-training on private datasets, this release includes a complete framework for training and evaluating language models on public datasets, including training logs, multiple checkpoints, and pre-training configurations. Apple also released code to convert the models to MLX format for inference and fine-tuning on Apple devices.
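As a rough sketch of what running such an MLX-converted checkpoint could look like with the community mlx-lm package on Apple silicon: the repository id below is hypothetical, and Apple's own conversion scripts and MLX workflow may differ from this.

```python
# Rough sketch using the community mlx-lm package on Apple silicon.
# The repo id is hypothetical; the official conversion flow may differ.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/OpenELM-270M-Instruct")  # hypothetical repo id
text = generate(
    model,
    tokenizer,
    prompt="Explain on-device inference in one sentence.",
    max_tokens=64,
)
print(text)
```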

"This general release aims to strengthen and strengthen the open research community and pave the way for future open research work. Apple's research team said.

In addition, OpenELM uses no learnable bias parameters in its fully connected layers, applies RMSNorm for pre-normalization, and uses rotary position embeddings (RoPE) to encode positional information. OpenELM also replaces multi-head attention with grouped-query attention, replaces the traditional feed-forward network with a SwiGLU FFN, and uses Flash Attention to compute scaled dot-product attention, allowing training and inference with fewer resources. Apple likewise applies on-the-fly tokenization and data filtering, simplifying the experimental workflow and adding flexibility, and uses the same tokenizer as Meta's Llama to ensure consistency across experiments.
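To make two of those components concrete, here is a minimal PyTorch sketch of RMSNorm pre-normalization and a bias-free SwiGLU feed-forward block, written for illustration rather than taken from Apple's CoreNet code; the dimensions are arbitrary.

```python
# Illustrative PyTorch versions of two building blocks named above: RMSNorm and
# a SwiGLU feed-forward network. Not Apple's implementation; dimensions are arbitrary.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization (no mean subtraction, no bias)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLUFFN(nn.Module):
    """Feed-forward block with a SwiGLU gate; note the bias-free linear layers."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))

x = torch.randn(2, 16, 512)                 # (batch, sequence, model dim)
y = SwiGLUFFN(512, 1376)(RMSNorm(512)(x))   # pre-normalize, then feed forward
print(y.shape)                              # torch.Size([2, 16, 512])
```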

This time, Apple has been genuinely committed to open source, releasing everything end to end. In just over a day, the model received more than 1,100 stars on GitHub.

At present, the large-model field is divided into two camps, open source and closed source. Well-known closed-source players at home and abroad include OpenAI, Anthropic, Google, Midjourney, Baidu, and Mobvoi, while the open-source camp includes Meta, Microsoft, Google, SenseTime, Baichuan Intelligence, and 01.AI.

Apple, long a leader of the closed ecosystem in mobile phones, has now, unusually, joined the open-source large-model camp. Some analysts believe this may follow Google's playbook: attract users with open source first, then use closed-source products to turn a commercial profit.

At the same time, it also shows Apple's firm determination to enter the field of AI large models.

Wang Xiaogang, co-founder and chief scientist of SenseTime, a company with both on-device and open-source models, recently told Titanium Media App that open source remains vital to the development of the whole community and is an important driving force. Ultimately, the development of large models and their many applications still needs to be pushed forward by the whole community. Applications of large models span different levels, and different industries have different needs, so a rich open-source community is very important.

AI technology continues to "soar", and OpenAI receives the world's first DGX H200

Apple is not alone. In the early hours of this morning, AI news at home and abroad kept pouring in as the technology continued to "soar".

This morning, OpenAI co-founder and president Greg Brockman tweeted that Nvidia had handed the world's first DGX H200 to the company, a move aimed at "advancing AI, computing, and humanity."

At the same time, he also posted a group photo, which also included Nvidia CEO Jensen Huang and OpenAI CEO Sam Altman, and it seemed that the three of them were very happy.


Back in 2016, shortly after OpenAI was founded, Huang personally delivered the world's first DGX-1 supercomputer, equipped with eight NVIDIA P100 chips, to OpenAI's office.

The DGX-1, which cost more than $1 million, was the result of three years of painstaking work by Huang and 3,000 Nvidia employees. It dramatically boosted OpenAI's computing power, cutting training that would have taken a year down to just one month.

At the time, OpenAI was still a non-profit start-up, and the supercomputer was undoubtedly a very important gift. Elon Musk, Sam Altman, and other early employees were so excited that they signed the DGX-1.

On November 13, 2023, NVIDIA announced its new-generation AI GPU platform around the H200 and the Grace Hopper superchip. Compared with the H100, it roughly doubles memory capacity and bandwidth, supports up to 19.5TB of memory in its largest configurations, and reaches 128 petaFLOPS of FP8 AI performance, with availability expected in the second quarter of 2024.

According to Huang, this is a new AI supercomputer for trillion-parameter-scale models, offering massive shared memory with linear scalability for giant AI models, and it holds great potential in the generative AI era.

Now, Huang personally gave the world's first DGX H200 to OpenAI.

Meanwhile, according to CTech, Nvidia has acquired Run:ai, an Israeli AI infrastructure orchestration and management company, for about $700 million. Run:ai was reportedly founded in 2018 and had raised $118 million to date; Nvidia is also reported to have acquired Deci.

In addition, in the early hours of this morning, Cognition, the company behind the world's first "AI code engineer," was reported to have closed a $175 million financing round led by Founders Fund; in just one month, the company's valuation jumped from $350 million to $2 billion, drawing widespread attention.

Gartner analyst John-David Lovelock said that with first-tier players such as Anthropic and OpenAI dominating, the scope of AI investment is "spreading out".

"The amount of billions of dollars in investment has slowed and is almost over, and hot money is pouring in a new direction – AI applications. "Large models require significant investment, but the market is now more influenced by tech companies that will leverage existing AI products, services, and offerings to build new products." ”

Seth Rosenberg, a partner at Greylock, believes the appetite for funding a "large number of new players" in AI is inherently limited. In the early stages of the cycle, investment in foundation models was very capital-intensive; AI applications and agents require far less capital, which may explain the decline in absolute dollar funding.

Umesh Padval, managing director at Thomvest Ventures, attributed the overall pullback in AI investment to lower-than-expected growth. The initial enthusiasm, he said, has given way to reality: AI faces technical and go-to-market challenges that may take years to solve and fully overcome.

"The slowdown in AI investment reflects the recognition that we are still exploring the early stages of AI technology development and its application across industries. While the long-term market potential remains enormous, the initial enthusiasm has been tempered by the complexity and challenges of rolling out AI technology in real-world applications...... This indicates that the investment environment is more mature and sensitive. Umesh Padval said.

Today, AI continues to "soar," but the direction of the market is shifting quickly: on-device models, AI applications, and industry-specific models will become the new trends across the AI field this year.

(This article was first published on Titanium Media App, author | Lin Zhijia, editor | Hu Runfeng)
