
Apple AI is finally here! From 270 million to 3 billion parameters, the code for all four large models is open source, and AI technology continues to "soar".

Author: Titanium Media APP

(Image source: Apple's official website)

Apple has suddenly announced big news.

In the early morning of April 25, Beijing time, Apple released OpenELM, described as an "efficient language model family with an open-source training and inference framework," on the Hugging Face platform.

It is understood that OpenELM comes in four sizes: 270 million, 450 million, 1.1 billion, and 3 billion parameters, positioning it as an ultra-small-scale model; by comparison, Microsoft's Phi-3 model has 3.8 billion parameters. Such small models are cheaper to run and can run on devices such as mobile phones and laptops.

At the same time, ahead of the WWDC24 developer conference, Apple has fully open-sourced the OpenELM model weights, inference code, datasets, and training logs. Apple has also open-sourced its neural network library, CoreNet.


Back in February, Apple CEO Tim Cook said that Apple's generative AI capabilities would arrive "later this year." Reports suggest that iOS 18, expected in June, could be the "biggest" update in iOS history, with the first AI-focused iPhone arriving in September.

Now, Apple appears to be racing to catch up with the industry in this new wave of AI.

Code: https://github.com/apple/corenet

Hugging Face: https://huggingface.co/apple/OpenELM

Paper: https://arxiv.org/abs/2404.14619

With half the pre-training tokens, Apple's 1.1-billion-parameter model is more accurate than competing products

With ChatGPT becoming popular worldwide, mobile phone manufacturers such as Samsung, Google, and Xiaomi have in recent months pushed large language models onto phones, tablets, and other devices for photo processing, enhanced word processing, and more, turning them into a major selling point. Apple, by contrast, has revealed little and offers few similar built-in features, relying mainly on third-party tools to achieve comparable results.

During the February earnings call, Cook first unveiled plans for generative AI, saying Apple will integrate AI technology into its software platforms (iOS, iPadOS, and macOS) later this year.

"I just want to say that I think there's a huge opportunity for Apple in generative AI and AI, and there's no need to reveal more details and not exceed our expectations," Cook said. Going forward, we will continue to invest in these and other technologies that will shape the future. That includes AI, where we continue to spend a lot of time and energy, and we're excited to share details of the work we're doing in that space later this year. We're very excited about it. ”

In fact, Apple has been active in generative AI since the start of the year. In March, Apple's technical team published the paper "MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training," disclosing for the first time Apple's large model MM1, which has 30 billion parameters, supports multimodality, and uses an MoE architecture; more than half of the authors are Chinese.

Now, Apple's genuinely open-source model for mobile phones, tablets, and other on-device scenarios has finally arrived.

According to the paper, the open-sourced OpenELM comes in two variants, instruction-tuned and pre-trained, each in four sizes: 270 million, 450 million, 1.1 billion, and 3 billion parameters. It can generate text and code, translate, and summarize.

Although the smallest version has only 270 million parameters, Apple pre-trained on public datasets including RefinedWeb, a deduplicated PILE, a subset of RedPajama, and a subset of Dolma v1.6, totaling about 1.8 trillion tokens, which is one of the main reasons it performs so well at small parameter counts.
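Since the weights are published on Hugging Face, the released checkpoints can be tried with the standard transformers workflow. Below is a minimal sketch, not Apple's official example; the repo id "apple/OpenELM-270M" and the reuse of Meta's Llama-2 tokenizer are assumptions based on the release described above, so check the model card before running.

```python
# Minimal sketch: loading an OpenELM checkpoint with Hugging Face transformers.
# Repo ids below are assumptions; verify them against the actual model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

# OpenELM ships custom modeling code, so trust_remote_code is required.
model = AutoModelForCausalLM.from_pretrained("apple/OpenELM-270M", trust_remote_code=True)
# OpenELM is reported to reuse Meta's Llama tokenizer rather than shipping its own
# (the Llama repo is gated, so access must be granted first).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```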

For example, the 1.1-billion-parameter OpenELM is 2.36% more accurate than the 1.2-billion-parameter OLMo model while using only half as much pre-training data.


For training, Apple used CoreNet as the framework and ran 350,000 iterations with the Adam optimizer. Well-known Apple research such as MobileOne, CVNets, MobileViT, and FastViT is built on CoreNet.
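The article does not reproduce Apple's CoreNet training configuration. Purely as an illustration of what an Adam-based language-model training loop looks like, here is a generic PyTorch sketch; the model, data loader, and hyperparameters are placeholders, not the settings Apple actually used.

```python
# Illustrative only: a generic Adam training loop for a causal language model.
# Not Apple's CoreNet configuration; hyperparameters are placeholders.
import torch

def train(model, dataloader, num_iters=350_000, lr=1e-4, device="cuda"):
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    step = 0
    while step < num_iters:
        for batch in dataloader:  # each batch: dict with "input_ids" and "labels"
            optimizer.zero_grad()
            outputs = model(
                input_ids=batch["input_ids"].to(device),
                labels=batch["labels"].to(device),
            )
            outputs.loss.backward()   # cross-entropy loss computed by the model
            optimizer.step()
            step += 1
            if step >= num_iters:
                break
```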

Apple also said in the paper that, unlike the previous practice of releasing only model weights and inference code while pre-training on private datasets, this release includes a complete framework for training and evaluating language models on public datasets, including training logs, multiple checkpoints, and pre-training configurations. Apple also released code to convert the models to MLX format for inference and fine-tuning on Apple devices.
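As a rough sketch of what running such an MLX-converted checkpoint could look like with the community mlx-lm package on Apple silicon: the repository id below is hypothetical, and Apple's own conversion scripts and MLX workflow may differ from this.

```python
# Rough sketch using the community mlx-lm package on Apple silicon.
# The repo id is hypothetical; the official conversion flow may differ.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/OpenELM-270M-Instruct")  # hypothetical repo id
text = generate(
    model,
    tokenizer,
    prompt="Explain on-device inference in one sentence.",
    max_tokens=64,
)
print(text)
```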

"This general release aims to strengthen and strengthen the open research community and pave the way for future open research work. Apple's research team said.

In addition, OpenELM uses no learnable bias parameters in its fully connected layers, applies RMSNorm for pre-normalization, and uses rotary position embeddings (RoPE) to encode positional information. OpenELM also replaces multi-head attention with grouped-query attention, replaces the traditional feed-forward network with a SwiGLU FFN, and uses Flash Attention to compute scaled dot-product attention, allowing training and inference with fewer resources. Apple likewise applies on-the-fly tokenization and data filtering, simplifying the experimental workflow and adding flexibility, and uses the same tokenizer as Meta's Llama to ensure consistency across experiments.
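To make two of those components concrete, here is a minimal PyTorch sketch of RMSNorm pre-normalization and a bias-free SwiGLU feed-forward block, written for illustration rather than taken from Apple's CoreNet code; the dimensions are arbitrary.

```python
# Illustrative PyTorch versions of two building blocks named above: RMSNorm and
# a SwiGLU feed-forward network. Not Apple's implementation; dimensions are arbitrary.
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization (no mean subtraction, no bias)."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLUFFN(nn.Module):
    """Feed-forward block with a SwiGLU gate; note the bias-free linear layers."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(nn.functional.silu(self.gate_proj(x)) * self.up_proj(x))

x = torch.randn(2, 16, 512)                 # (batch, sequence, model dim)
y = SwiGLUFFN(512, 1376)(RMSNorm(512)(x))   # pre-normalize, then feed forward
print(y.shape)                              # torch.Size([2, 16, 512])
```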

This time, Apple has been genuinely committed to open source, releasing everything end to end. In just over a day, the model received more than 1,100 stars on GitHub.

At present, the large-model field is divided into two camps, open source and closed source. Well-known closed-source players at home and abroad include OpenAI, Anthropic, Google, Midjourney, Baidu, and Mobvoi, while the open-source camp includes Meta, Microsoft, Google, SenseTime, Baichuan Intelligence, and 01.AI.

Apple, long a leader of the closed ecosystem in mobile phones, has now, unusually, joined the open-source large-model camp. Some analysts believe this may follow Google's playbook: attract users with open source first, then use closed-source products to turn a commercial profit.

At the same time, it also shows Apple's firm determination to enter the field of AI large models.

Wang Xiaogang, co-founder and chief scientist of SenseTime, a company with both on-device and open-source models, recently told Titanium Media App that open source remains vital to the development of the whole community and is an important driving force. Ultimately, the development of large models and their many applications still needs to be pushed forward by the whole community. Applications of large models span different levels, and different industries have different needs, so a rich open-source community is very important.

AI technology continues to "soar", and OpenAI receives the world's first DGX H200

Apple is not alone. In the early hours of this morning, AI news at home and abroad kept pouring in as the technology continued to "soar".

This morning, OpenAI co-founder and president Greg Brockman tweeted that Nvidia had handed the world's first DGX H200 to the company, a move aimed at "advancing AI, computing, and humanity."

At the same time, he also posted a group photo, which also included Nvidia CEO Jensen Huang and OpenAI CEO Sam Altman, and it seemed that the three of them were very happy.


Back in 2016, shortly after OpenAI was founded, Huang personally delivered the world's first DGX-1 supercomputer, equipped with eight NVIDIA P100 chips, to OpenAI's office.

The DGX-1, which cost more than $1 million, was the result of three years of painstaking work by Huang and 3,000 Nvidia employees. It dramatically boosted OpenAI's computing power, cutting training that would have taken a year down to just one month.

At the time, OpenAI was still a non-profit start-up, and the supercomputer was undoubtedly a very important gift. Elon Musk, Sam Altman, and other early employees were so excited that they signed the DGX-1.

On November 13, 2023, NVIDIA announced its new-generation AI GPU platform around the H200 and the Grace Hopper superchip. Compared with the H100, it roughly doubles memory capacity and bandwidth, supports up to 19.5TB of memory in its largest configurations, and reaches 128 petaFLOPS of FP8 AI performance, with availability expected in the second quarter of 2024.

According to Huang, this is a new AI supercomputer for trillion-parameter-scale models, offering massive shared memory with linear scalability for giant AI models, and it holds great potential in the generative AI era.

Now, Huang personally gave the world's first DGX H200 to OpenAI.

Meanwhile, according to CTech, Nvidia has acquired Run:ai, an Israeli AI infrastructure orchestration and management company, for about $700 million. Run:ai was reportedly founded in 2018 and had raised $118 million to date; Nvidia is also reported to have acquired Deci.

In addition, in the early hours of this morning, Cognition, the company behind the world's first "AI code engineer," was reported to have closed a $175 million financing round led by Founders Fund; in just one month, the company's valuation jumped from $350 million to $2 billion, drawing widespread attention.

Gartner analyst John-David Lovelock said that with first-tier players such as Anthropic and OpenAI dominating, the scope of AI investment is "spreading out".

"The amount of billions of dollars in investment has slowed and is almost over, and hot money is pouring in a new direction – AI applications. "Large models require significant investment, but the market is now more influenced by tech companies that will leverage existing AI products, services, and offerings to build new products." ”

Seth Rosenberg, a partner at Greylock, believes the appetite for funding a "large number of new players" in AI is inherently limited. In the early stages of the cycle, investment in foundation models was very capital-intensive; AI applications and agents require far less capital, which may explain the decline in absolute dollar funding.

Umesh Padval, managing director at Thomvest Ventures, attributed the overall pullback in AI investment to lower-than-expected growth. The initial enthusiasm, he said, has given way to reality: AI faces technical and go-to-market challenges that may take years to solve and fully overcome.

"The slowdown in AI investment reflects the recognition that we are still exploring the early stages of AI technology development and its application across industries. While the long-term market potential remains enormous, the initial enthusiasm has been tempered by the complexity and challenges of rolling out AI technology in real-world applications...... This indicates that the investment environment is more mature and sensitive. Umesh Padval said.

Today, AI continues to "soar," but the direction of the market is shifting quickly: on-device models, AI applications, and industry-specific models will become the new trends across the AI field this year.

(This article was first published on Titanium Media App, author | Lin Zhijia, editor | Hu Runfeng)
