
20,000-word review: OpenAI's technical underlying logic

Since the release of ChatGPT, the technology, products, and startup ecosystem in the AI field have been iterating almost weekly. As the trigger of this AI boom and the industry's de facto leader (a position it may hold for a long time), OpenAI has a broad and far-reaching influence on the industry ecosystem.

Starting from OpenAI's AGI vision, this paper first analyzes how OpenAI, step by step, formed the LLM development route we observe today on the basis of two important technical judgments, Scale and the Generative Model, and examines the underlying logic of this technical roadmap. Building on that analysis of vision and technology selection, the report fits OpenAI's historical behavior to this roadmap, tries to explain many otherwise confusing historical decisions, and extrapolates its future behavior. Finally, the report gives its own analysis of the development of the ecosystem and industrial chain built on large models, and raises some questions for the reader to consider.

This article is the product of our comprehensive, systematic, and deep reverse engineering of OpenAI. It offers a distinctive perspective for analyzing OpenAI's historical behavior and predicting its future actions from its underlying vision, and we hope it helps practitioners engaged in large model research, development, and investment in China.

1. OpenAI's AGI vision and its adherence to the GPT technical path

1.1 OpenAI's AGI vision

Before starting the analysis, let us review OpenAI's descriptions of its AGI goal at different times:

“Our goal is to advance digital intelligence in the way that is most likely to benefit humanity as a whole, unconstrained by a need to generate financial return.” (Introducing OpenAI, December 11, 2015)

“Our mission is to ensure that artificial general intelligence—AI systems that are generally smarter than humans—benefits all of humanity.” (Planning for AGI and beyond, February 14, 2023)

The first change is the addition of a definition of AGI: AI systems that are generally smarter than humans.

The second change is the shift from pursuing no financial return at all to ensuring that AGI benefits all of humanity.

There is currently no agreed, precise definition of AGI. The first change is a judgment OpenAI reached through several years of exploration; the essence of its pursuit of AGI has not changed. The second reflects the adjustment of OpenAI's ownership structure and commercialization strategy after deeper technical exploration; the logic behind it is expanded in detail later.

Overall, given the high consistency between OpenAI's historical rhetoric and actions, we have reason to believe that OpenAI has always pursued, and will continue to pursue, inclusive AGI as its first goal. This assumption is the basic premise of the ecosystem deductions later in this article.

1.2 OpenAI's seemingly incomprehensible GPT "belief" over the past 5 years

Under the AGI vision, OpenAI has firmly chosen, over the past 5 years, to keep investing in LLMs (Large Language Models) built on the GPT (Generative Pre-trained Transformer) architecture. During this period, OpenAI's lonely and surprisingly large investment looked to outsiders like an act of faith. But once we understand the nature of OpenAI's technology choices, we find it was actually a rational judgment grounded in deep technical insight.

OpenAI's development can be roughly divided into three stages:

1.2.1 Phase 1: AGI Implementation Path Exploration (November 2015~June 2017)

During this period, OpenAI's technical path to AGI had not yet converged, and it explored projects including OpenAI Gym (and Robotics), OpenAI Five (Dota 2), and a series of Generative Models.

It is worth noting that these projects all use Unsupervised Learning or RL (Reinforcement Learning), which do not require annotated data and therefore scale well. At the time of OpenAI's founding, Unsupervised Learning and RL were difficult, industrially immature algorithm paths that were even harder to scale, yet OpenAI seemed to focus exclusively on them and tried to scale them anyway.

Studying OpenAI's articles and the presentations of Ilya Sutskever (OpenAI's chief scientist) during this period, we can glimpse OpenAI's two important technical judgments:

Important technical judgment 1: Scale

All of Ilya's presentations during this period emphasized the importance of scale. In fact, going back to AlexNet, which made Ilya and his collaborators famous in 2012, the core of its algorithm was also to use the parallel computing power of GPUs to scale up a neural network. The idea of scaling basic algorithms runs through Ilya's nearly ten years of research. It is reasonable to assume that it is precisely this pursuit of Scale that leads Ilya and OpenAI to place so much emphasis on RL and the Generative Model.

For example, in roughly the same period, AlphaGo chose to combine search-technique variants with RL to improve algorithm performance, while OpenAI Five, playing Dota 2, chose a pure scaled-RL approach (the RL agents released during this period also played a huge role later).

Later, the well-known 2019 article "The Bitter Lesson" by Rich Sutton made the same point: "Throughout the 70-year history of AI development, finding ways to leverage greater computing power has always been the most effective means."

It is also under this concept of algorithmic Scale that OpenAI pays great attention to engineering-minded algorithm work and algorithm-minded engineering, and has built a team structure and computing infrastructure in which algorithms and engineering cooperate closely.

Important technical judgment 2: Generative model

OpenAI's June 2016 article "Generative Models" argued: "One of OpenAI's core goals is to understand the world (physical and virtual), and the generative model is the most probable path to achieve this goal."

The April 2017 article "Learning to Generate Reviews and Discovering Sentiment", which accompanied the Unsupervised Sentiment Neuron, pointed out that "really good prediction is related to understanding" and that "after being trained only to predict the next character, the neural network automatically learned to analyze sentiment".

This article did not receive much attention at the time and was even rejected by ICLR 2018, but our analysis suggests that this result profoundly influenced OpenAI's subsequent research and laid the foundation for the next stage, in which OpenAI went all-in on the GPT route.

1.2.2 Phase 2: Technology Path Convergence and Exploring the Engineering Limits of GPT Path (June 2017 ~ December 2022)

In 2017 the Transformer was born. Being far friendlier to the parallel training of language models, it completed the last link OpenAI needed. Since then, OpenAI has established the GPT-architecture LLM as its main direction, gradually transferred resources to LLMs, and set out to explore the engineering limits of the GPT algorithm path. At this stage, OpenAI's huge bet on the GPT path seemed inconceivable to outsiders at the time.

In June 2018, OpenAI released GPT-1; two months later Google released BERT. BERT's performance on downstream comprehension tasks was stunning: it not only beat GPT-1 (117M parameters), but essentially erased the significance of much upstream NLP task research.

While scholars across NLP turned to BERT research, OpenAI pressed on and launched GPT-2 (1.5B) in February 2019. Although GPT-2 performed well on generative tasks, it still lagged behind BERT on comprehension tasks.

In this context, OpenAI stuck to the GPT route and significantly increased the speed of Scale, launching GPT-3 in May 2020: 175B parameters (100x GPT-2) trained on 500B tokens (50x GPT-2).

The enormous cost of the GPT route directly drove the restructuring of OpenAI's equity architecture and the transformation of its commercialization strategy. In March 2019, OpenAI was restructured from a non-profit into a capped-profit organization (a 100x profit cap for all shareholders). Sam Altman wrote in the announcement: "We'll need to invest billions of dollars in upcoming years into large-scale cloud compute, attracting and retaining talented people, and building AI supercomputers. We want to increase our ability to raise capital while still serving our mission, and no pre-existing legal structure we know of strikes the right balance. Our solution is to create OpenAI LP as a hybrid of a for-profit and nonprofit—which we are calling a 'capped-profit' company." This shows how determined OpenAI was at this point to explore the technical path to AGI through GPT.

In terms of commercialization, OpenAI launched a commercial API. GPT-3 not only performed well on generative tasks but had also begun to catch up on comprehension tasks; its few-shot and zero-shot learning abilities in particular attracted the attention of a large number of startups. Over the following two years, the application ecosystem built on the GPT-3 API continued to develop and gradually prospered, giving birth to a series of star companies: Jasper (ARR of $90 million in 2022), Repl.it, Copy.ai, and so on. Between the release of GPT-3 and the formation of this ecosystem (2020-2022), OpenAI did not launch a next-generation model, but instead began to focus on the alignment problem.
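The few-shot learning that drew startups to the GPT-3 API amounts to packing worked examples into the prompt itself, with no fine-tuning. A minimal sketch of that prompt pattern (the function name and format below are our own illustration, not OpenAI's API):

```python
def build_few_shot_prompt(task_description, examples, query):
    """Assemble a few-shot prompt: task description, worked examples, then the query."""
    lines = [task_description, ""]
    for inp, out in examples:
        lines.append(f"Input: {inp}")
        lines.append(f"Output: {out}")
        lines.append("")
    # The model is left to complete the final "Output:" line.
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("hello", "bonjour")],
    "goodbye",
)
```

The resulting string would be sent as-is to a completion endpoint; the in-context examples steer the model toward the task without any gradient update.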

GPT-3 demonstrated a strong understanding of language and, used well, could do many tasks. But its understanding was not human-like; in other words, it was hard to get GPT-3 to do what you asked, even when it could. As the understanding and reasoning capabilities of the base model increase, OpenAI believes Alignment becomes especially important. To make the model respond accurately and faithfully to human requests, OpenAI released InstructGPT in January 2022, and in March 2022 published "Training Language Models to Follow Instructions with Human Feedback", detailing the method of aligning the model through instruction fine-tuning. The subsequent iteration of InstructGPT is the well-known GPT-3.5, which received widespread praise after launch; OpenAI later made GPT-3.5 the default API model in place of GPT-3.
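The human-feedback stage of this method trains a reward model on pairwise human preferences; the standard pairwise objective is -log sigmoid(r_chosen - r_rejected). A minimal numeric sketch (the reward values are toy numbers of our own choosing, not OpenAI's implementation):

```python
import math

def preference_loss(r_chosen, r_rejected):
    """Bradley-Terry pairwise loss used to train preference reward models:
    -log sigmoid(r_chosen - r_rejected). Small when the chosen response
    already scores well above the rejected one, large otherwise."""
    margin = r_chosen - r_rejected
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# The loss pushes the reward of the human-preferred answer above the rejected one.
good = preference_loss(2.0, -1.0)   # margin +3 -> small loss
bad = preference_loss(-1.0, 2.0)    # margin -3 -> large loss
```

Minimizing this loss over many labeled comparisons yields the reward signal that the RL stage (PPO in the paper) then optimizes against.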

Up to this point, OpenAI's LLM offerings were all API products, aimed mainly at the B-side, researchers, and individual developers.

1.2.3 Phase 3: Post-ChatGPT Phase (December 2022 ~ present)

On November 30, 2022, just as the industry expected the release of GPT-4, OpenAI suddenly released ChatGPT, a conversational product developed in less than a month, triggering this round of the AI boom. According to multiple sources, ChatGPT was released after OpenAI learned that Anthropic was about to release Claude (an LLM-based conversational product, in early access from March 14, 2023). It is reasonable to think that ChatGPT's popularity, and the AI boom it set off, exceeded OpenAI's expectations and plans.

GPT-4's base model actually finished training in August 2022. OpenAI took a more cautious approach to an LLM with increasingly strong understanding and reasoning capabilities, spending 6 months on extensive testing and patching around Alignment, safety, and factuality. On March 14, 2023, OpenAI released GPT-4 together with a report that disclosed almost no technical details. Meanwhile, the currently available GPT-4 API is a version with limited few-shot capability; the full-capability base model is not open to the public.

The release of ChatGPT set off a chain reaction:

C-side: For the first time, ChatGPT gave C-end users without programming skills an interface for interacting with an LLM, letting the public explore LLM capabilities across every kind of scenario. Taking education as an example, a sample survey by US media reported that 89% of college students and 22% of K-12 students were already using ChatGPT for assignments and essays. As of March 2023, ChatGPT's website had more than 100 million unique visitors (not deduplicated across devices). On March 23, 2023, the release of ChatGPT plugins led more people to believe ChatGPT may develop into a new super traffic portal (a question well worth separate discussion, but beyond the topic of this article).

Tech giants: Microsoft, OpenAI's deepest partner, has on the one hand trimmed and integrated its internal AI departments, and on the other fully embraced the GPT product line. Google has moved on multiple fronts: the original LaMDA team released the dialogue product Bard, the PaLM team released the PaLM API, and Google invested $300 million in Anthropic, OpenAI's main competitor. Meta released and open-sourced the LLaMA model; the LLaMA+LoRA ecosystem is currently the most active in open-source LLMs (e.g., Alpaca-13B and Vicuna-13B). Amazon and the open-source community around HuggingFace are cooperating more actively on the LLM ecosystem. Our analysis suggests that OpenAI's competition with Meta is currently more at the technical level, with no short-term impact on Meta's core business; the OpenAI+Microsoft combination, however, has a potentially huge business-level impact on Google and Amazon, which we analyze later.

Entrepreneurial ecosystem: On the one hand, ChatGPT's rapid penetration on the C-end has sparked a new round of AI entrepreneurship, and the flood of C-side application cases has inspired and accelerated the ecosystem's development. On the other hand, the uncertain boundaries of LLM capability and of OpenAI's own product line leave applications built on the GPT base, as well as traditional applications, worried that their product value will be wiped out. We return to this question after dissecting OpenAI's behavioral logic and the LLM industry-chain ecosystem.

OpenAI: The chain reaction across industry and ecosystem clearly exceeded OpenAI's expectations. From OpenAI's subsequent actions, we infer three core impacts:

(1) OpenAI may have the ambition to do C-side

The commercialization potential of C-end traffic, and the ability to collect more non-public data, are highly valuable for OpenAI's model training, basic research, and ecosystem development. The ChatGPT plugin, released this month, is a typical C-side move.

(2) OpenAI can reduce its dependence on huge capital investments through moderate commercialization

Part of OpenAI's vision is to make AGI benefit human society, but the huge investment that AGI R&D requires forced OpenAI to seek capital from technology giants. The contradiction here drew criticism from academia and directly or indirectly caused significant brain drain. Moderate commercialization gives OpenAI the opportunity to reduce, or even escape, its dependence on tech giants. We speculate that OpenAI's commercialization strategy will continue to balance inclusiveness against sustainable independent development. Judging where that balance sits is crucial for the industry-chain analysis that follows.

(3) Strengthen research investment and action in alignment and safety

The rapid penetration of LLM capabilities on both the C-side and B-side has also rapidly expanded the risk and impact of malicious use, making security issues more urgent.

At the same time, LLMs' currently serious Hallucination problem (confidently generated nonsense that is hard to tell from truth) hinders deep B-side adoption and damages the C-side content environment. Interaction with humans can reduce Hallucination, but is not necessarily the most fundamental solution. Making models respond accurately and faithfully to human demands through Alignment research will be the focus of OpenAI's next phase.

2. What is OpenAI's technical path selection (GPT-architecture LLM) based on?

First of all, after studying a large number of interviews, courses, and papers, we venture this speculation: OpenAI believes the essence of an AGI base model is to achieve maximum lossless compression of the largest effective dataset.

2.1 OpenAI believes: AGI intelligence ≈ generalization ability

Generalization is a technical term: "Generalization refers to your model's ability to adapt properly to new, previously unseen data, drawn from the same distribution as the one used to create the model."

More colloquially, generalization is the process of extrapolating from the known to the unknown. Improving the generalization ability of models underlies every advance in deep learning.

OpenAI believes that the essence of AGI intelligence lies in the pursuit of stronger generalization: the stronger the generalization ability, the higher the level of intelligence.

Note that generalization ability does not equal generalization efficiency; this distinction is developed in the next section. It was also the biggest point of non-consensus between OpenAI and the rest of the industry at the time of its founding.

2.2 Model generalization capability ≈ Model generalization efficiency × Training data scale

We believe that the more efficiently a model generalizes, and the larger its training dataset, the more intelligent the model.

This conclusion can be derived with mathematical rigor, but since the author's mathematics limits a first-principles treatment, after consulting professionals we give the following abstract formula for understanding:

Model intelligence (generalization ability) ≈ Model generalization efficiency × Training data scale

For the mathematical and abstract arguments, we suggest the relevant articles by Zhou Xinyu (https://zhuanlan.zhihu.com/p/619511222) and Xinran (https://zhuanlan.zhihu.com/p/616903436); they are not expanded here.

2.2.1 Model generalization efficiency ≈ Model compression efficiency

The minimum description length of a valid method for accomplishing a task represents the maximum understanding of that task. Therefore, a model's compression efficiency can approximately quantify its generalization efficiency.

The task of AGI can be understood as maximizing the generalization of the real world represented by the training dataset by compressing the training dataset.

The minimum description length of an AGI model can be quantified as the compression efficiency of the model.

Under this understanding, the larger the parameter count of a GPT model, the higher its level of intelligence (large parameter count → high compression efficiency → high generalization efficiency → high intelligence).

(1) The GPT model performs lossless compression of the training data (mathematical inference)

(2) The larger the GPT model's parameter count, the higher its compression efficiency (mathematical inference)

(3) The GPT model is the current SOTA (state-of-the-art) lossless text compressor (empirical status quo)
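The link between prediction and compression behind points (1)-(3) can be made concrete: under arithmetic coding, a model that assigns probability p to each next symbol can encode it in about -log2(p) bits, so better prediction means fewer bits. A toy sketch of this ideal code length (our own illustrative example, not a real GPT compressor):

```python
import math
from collections import Counter

def code_length_bits(text, prob):
    """Ideal (arithmetic-coding) code length: sum of -log2 P(symbol) over the text."""
    return sum(-math.log2(prob(ch)) for ch in text)

text = "abababababababab"

# Model 1: uniform over 26 letters -- "understands" nothing about the data.
uniform = lambda ch: 1 / 26

# Model 2: empirical character frequencies -- a better model of the data.
counts = Counter(text)
unigram = lambda ch: counts[ch] / len(text)

bits_uniform = code_length_bits(text, uniform)   # ~75.2 bits
bits_unigram = code_length_bits(text, unigram)   # 16 bits (1 bit/char)
```

A language model plays the role of `prob` at vastly greater scale: the lower its next-token loss, the shorter the code it implicitly assigns to its training corpus.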

2.2.2 The scale and diversity of training data are essential to improving model generalization

As mentioned earlier, the task of AGI is to maximally generalize the training dataset. So why does a model's generalization ability not equal its generalization efficiency?

Because generalization efficiency only addresses "maximum generalization" and ignores the "training dataset". Traditional academia held that only algorithmic innovation was worth pursuing, and that the scale of training datasets was merely an engineering issue with no research value. The long-term goal pursued by mainstream academia was therefore an efficient way for models to acquire intelligence, rather than the intelligence of the models themselves.

OpenAI, by contrast, after deeply understanding the nature of generalization ability, chose to pursue both a larger training dataset (the Scale of training data) and greater generalization efficiency (the Scale of model parameters).

Wanting the fastest way to Scale the training dataset, OpenAI naturally chose text first. Over the past five years, it has therefore pushed the limits of training data size and parameter count on the single modality easiest to scale: text. LLMs are just the starting point; once text data is stretched to its limit, we have reason to believe OpenAI will further expand the training data modalities, including observable data (special text, images, videos, etc.) and data not directly observable (data from interacting with the virtual and physical worlds).

2.3 Summary of OpenAI's technical path selection logic

The preceding analysis of OpenAI's technical philosophy is quite abstract; we try to organize and summarize the logic of its technical path selection together with its historical behavior, as shown below:

To sum up, OpenAI believes that the essence of the AGI basic model is to achieve the maximum lossless compression of the largest effective data set.

Under this technical understanding, the LLM route of GPT architecture is the optimal technical path selection in the past 5 years, and the scale of model parameters and training data is an inevitable behavior.

2.4 OpenAI's technical path selection controversy

The original report raises some thought-provoking questions about OpenAI's technical path choices. For reasons of space, we only pose the questions here; for more detail, please see the original report.

Does AGI intelligence equal generalization ability? That is, between the ability to understand and generalize across general tasks (represented by OpenAI) and the ability to research complex, hard scientific tasks (represented by DeepMind), which better represents the intelligence level of AGI?

Is what LLMs learn only "book intelligence"? Some scholars argue that knowledge and understanding learned from language cannot form an effective mapping onto the physical world, so LLM intelligence is shallow intelligence.

One model rules all? Although the generalization and understanding ability of GPT-route large models is high and still improving, the Hallucination problem inherent to this route will persist. So must there remain space for different vertical models in scenarios with near-zero fault tolerance and high reliability requirements (such as API calls in complex vertical scenarios)?

Are instruction fine-tuning and RLHF the right path to solving the alignment problem? On the one hand, instruction fine-tuning and RLHF are of limited help in aligning LLM bases with increasingly strong capabilities. On the other hand, instruction fine-tuning sacrifices reasoning performance in exchange for alignment (the Alignment Tax).

Is the GPT route without Memory? Current GPT-family models perform well on single tasks, but information from previous interactions cannot automatically be written into the tokens of the next interaction. AutoGPT and similar tools can only brute-force replay the history, incurring excessive token costs. This makes GPT models unfriendly to complex systems engineering and continuous production workflows.
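The token cost of brute-force history replay can be sketched directly: resending the full history on every turn makes total tokens processed grow quadratically with conversation length. The model below is a simplification of our own (it ignores responses and system prompts):

```python
def naive_replay_tokens(turn_lengths):
    """Total tokens processed when every call resends the full accumulated history."""
    total, history = 0, 0
    for n in turn_lengths:
        history += n        # the new turn joins the context
        total += history    # each call pays for all accumulated context
    return total

# 10 turns of 100 tokens each: 100 + 200 + ... + 1000 = 5500 tokens billed,
# not 1000 -- cost grows with the square of conversation length.
cost = naive_replay_tokens([100] * 10)
```

This quadratic blow-up is why stateless APIs push agent frameworks toward summarization or external memory rather than raw replay.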

3. Understanding OpenAI's past and future through the essence of its technology choices

3.1 Fit: Historical behavior interpretation of OpenAI

Based on the foregoing, OpenAI's vision is to pursue inclusive AGI. The technical concept of OpenAI is: the essence of AGI intelligence is to pursue generalization, so the essence of AGI basic model is to achieve maximum lossless compression of the largest effective data set.

Based on this, we try to explain OpenAI's historical behavior. In the process we came to appreciate the rarity of the combination Sam Altman (business) + Ilya Sutskever (algorithms) + Greg Brockman (engineering). OpenAI's results today are the product of close collaboration among its algorithm, engineering, data, product, and GTM teams.

3.1.1 Technology

(1) Why did OpenAI stick to the GPT route when BERT performed so well on downstream understanding tasks (well above GPT-1 and GPT-2)?

As analyzed earlier, OpenAI pursues model generalization ability. All supervised learning tasks are a subset of what an unsupervised language model learns. Choosing supervised learning for short-term gains on specific tasks is therefore not the essential approach.

Early BERT performed so well on understanding subtasks because supervised learning on specific datasets acquires an understanding of those tasks faster. When the parameters of an unsupervised model such as GPT are large enough and the corpus rich enough, the tasks of supervised learning can be completed through unsupervised language learning alone.

Therefore, OpenAI's adherence to the GPT route is an inevitable and simple choice.
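The claim that an unsupervised language model subsumes supervised tasks can be illustrated with zero-shot classification: compare the probabilities the model assigns to candidate label words as continuations of a prompt. The sketch below hard-codes a toy next-token distribution purely to show the mechanism; a real system would query an actual LLM for these probabilities:

```python
# A toy stand-in for an LM's next-token distribution P(next_word | context).
# In a real LLM these numbers come from the model; they are hard-coded here
# only to illustrate the selection mechanism.
toy_lm = {
    "The movie was wonderful. Sentiment:": {"positive": 0.8, "negative": 0.2},
    "The movie was awful. Sentiment:":     {"positive": 0.1, "negative": 0.9},
}

def zero_shot_classify(context, labels):
    """Pick the label the language model considers the likeliest continuation."""
    dist = toy_lm[context]
    return max(labels, key=lambda w: dist.get(w, 0.0))

pred = zero_shot_classify("The movie was awful. Sentiment:",
                          ["positive", "negative"])
```

No task-specific head or labeled fine-tuning set is involved: the "classifier" is just next-token prediction, which is exactly the sense in which supervised tasks fall out of the unsupervised objective.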

(2) Why has Scale continued in the past, and will it continue to scale significantly in the future?

The Scale from GPT-1 to GPT-3 was a double scaling, of training data volume and parameter count, on the text modality. Among them:

Scaling the volume of training data is an inevitable choice for improving AGI generalization. At present, text is the easiest data to scale; but as understanding of the text modality gradually saturates, OpenAI will inevitably turn to the harder scaling method: increasing the number of data modalities. Indeed, GPT-3.5 added special text data (code) to training, and GPT-4 introduced modalities such as images.

Scaling model parameters is a by-product of combining the currently optimal architecture (Transformer) with the optimal algorithm path (GPT) to improve AGI generalization. If OpenAI finds a more efficient algorithm in the future, an AGI base model of equal intelligence may not need more parameters.

(3) Why is the building of engineering capacity a high priority?

Against the backdrop of its non-consensus with traditional academia, OpenAI recognized the importance of model scaling early. It therefore established algorithm teams with engineering capability (the Pretraining and Alignment groups) and an engineering team with algorithmic understanding (the Scaling group), and built an organizational structure in which the two cooperate closely: the engineering team provides highly scalable infrastructure for the algorithm team, and the algorithm team designs model training in an industrialized way.

Some facts that give a glimpse of its engineering capability (industrialized model production):

OpenAI can already industrially train hyperscale models and accurately predict their performance. In 2021~2022, OpenAI and Azure rebuilt OpenAI's infrastructure. GPT-3's training was the first use of this infrastructure, and some bugs were found and fixed along the way. After the infrastructure bugs were fixed, GPT-4's training ran stably and completed in one pass. Using this foundation, the OpenAI team ran small-model experiments with only 1/10,000 of the compute in the early stage of GPT-4 training, and accurately predicted the final loss of the full GPT-4 model from the losses of those small experiments.
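Predicting a large model's final loss from small runs is, in outline, a scaling-law extrapolation: fit a power law to (compute, loss) points from cheap experiments and evaluate it at the full budget. A minimal sketch on synthetic data (the power-law form and constants here are our own assumptions for illustration, not OpenAI's actual method):

```python
import numpy as np

# Hypothetical small-model runs: (compute, final loss) pairs. Here they are
# generated from an assumed power law loss = a * compute^(-b); in practice
# they would come from real training runs at 1/10,000 of the target compute.
a_true, b_true = 10.0, 0.25
compute_small = np.array([1e3, 1e4, 1e5, 1e6])
loss_small = a_true * compute_small ** (-b_true)

# A power law is linear in log-log space: log(loss) = log(a) - b * log(compute).
# Fit it by linear least squares.
slope, intercept = np.polyfit(np.log(compute_small), np.log(loss_small), 1)
b_fit, log_a_fit = -slope, intercept

# Extrapolate to a "large model" compute budget 10,000x beyond the experiments.
compute_large = 1e10
loss_pred = np.exp(log_a_fit) * compute_large ** (-b_fit)
```

On real runs the points scatter around the trend and the functional form must be chosen carefully, but the workflow, cheap points plus a fitted curve plus extrapolation, is the same idea as the loss prediction described above.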

Open-sourced OpenAI Triton: with no CUDA experience, developers can write efficient GPU code in Python, with various GPU programming optimizations (memory coalescing, shared memory management, intra-SM scheduling) handled automatically.

We believe the gap between OpenAI's engineering capability and that of most current LLM teams is the gap between an industrialized model factory and a model workshop. This large engineering gap makes it even harder for most LLM companies to catch up with SOTA models.

(4) Why cut projects like Robotics all in LLM?

Simply put, because Robotics technology temporarily lags behind AI, RL is difficult to scale there.

In fact, the RL used in the Robotics projects also fits OpenAI's technical aesthetic, and the interaction between RL and the world (virtual and physical), together with the high-dimensional representations learnable from it, is something OpenAI is eager to explore. But at the time, limited by the early state of Robotics itself, robots could not scale, which in turn limited the scaling of the RL algorithms and data. So OpenAI chose to cut projects like Robotics and go all in on LLMs.

But we have reason to judge that this is a phase choice. When the time is ripe, it is inevitable that large models will be combined with Robotics, or other endpoints that can interact with the world, to learn higher AGI intelligence through that interaction. Indeed, OpenAI led a Series A investment of roughly $20 million in the humanoid robotics company 1X in March 2023.

(5) Why is there a Hallucination problem?

The AGI intelligence OpenAI pursues is maximum model generalization. The purpose of an LLM is not to "fit" the training set, but to losslessly capture the essential law (probability distribution) the training set represents, so as to understand data beyond the training set. As a result, the LLM generates content outside the training set, producing the Hallucination problem.

We can expect the Hallucination problem to ease gradually as the AGI base model's capabilities improve. For now, however, OpenAI will use patching schemes such as pre- and post-processing models to temporarily mitigate Hallucination, so that LLMs are more usable and less harmful.
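A pre- and post-processing patch of the kind mentioned here can be sketched as a thin wrapper around the base model: screen the input, generate, then screen the output before returning it. The functions below are toy stand-ins of our own; a real deployment would call an LLM and a dedicated moderation model:

```python
def answer_with_guardrails(model, moderate, prompt):
    """Wrap a base model with pre- and post-processing checks, one common way
    to patch over hallucination/safety gaps at the product layer."""
    if not moderate(prompt):
        return "[request refused]"
    draft = model(prompt)
    if not moderate(draft):
        return "[response withheld]"
    return draft

# Toy stand-ins for the base model and the moderation check.
blocklist = {"forbidden"}
moderate = lambda text: not any(word in text for word in blocklist)
model = lambda prompt: f"echo: {prompt}"

ok = answer_with_guardrails(model, moderate, "hello")
blocked = answer_with_guardrails(model, moderate, "forbidden question")
```

The point of the pattern is that neither check changes the base model itself: it is a patch layered around the model, which is exactly why the article treats it as a temporary mitigation rather than a fundamental fix.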

At the same time, it should be noted that there are fallacies and value conflicts in the text training corpus of LLM, and how to construct a "value judgment" for LLM is also a problem worthy of in-depth study.

3.1.2 Products

We believe all of OpenAI's product behavior can be explained by its two goals for product work and the two business flywheels derived from them. The two core goals:

Design product forms that can help OpenAI collect more effective data in pursuit of higher AGI intelligence.

Design products that are based on the current AGI model capabilities and are more inclusive of the masses.

From these goals, two business flywheels are derived:

(1) More inclusive AGI products and "data-application" flywheel

The goal of such products is to build, around the AGI model's capabilities, products that the C-end public and B-end companies can use easily and effectively, so that AGI benefits human society. Among them:

ChatGPT

API for the GPT-1-4 series

Codex API

and so on are such products. C-end users can improve the efficiency of daily tasks and solve problems through them; B-end users can obtain AGI model capabilities through them, build product solutions in vertical scenarios, and iterate their data barriers and product advantages through the "data-application" flywheel.


(2) Collect more effective data to feed back the basic model and the "data-model" flywheel

The goal of such products is to build specific product scenarios based on OpenAI's model capabilities and technical reserves, attract users with specific abilities or interests, accumulate specific effective data through user behavior feedback, and feed back the AGI basic model. This type of product has different product forms and markets due to the data required and the user groups that can contribute data.

DALL·E and CLIP: image-text paired data

ChatGPT Plugins: data on how users solve complex tasks through applications and APIs

OpenAI Codex Playground: data on building different applications with code

OpenAI Universe: Various reinforcement learning tasks and training data

Rubik's Cube (robotic hand): data on model interaction with the physical world


(3) Migration and game between two data flywheels

A key and interesting fact: these two goals and their derived business flywheels actually have some subtle structural contradictions, which are the underlying reasons behind some confusing phenomena and behaviors. OpenAI's own products and its upper-layer ecological application products will migrate and game between the two data flywheels.


Migration 1: The goal of OpenAI's own products may migrate from collecting data to feed the large model toward building an ecosystem that benefits the public

Typical cases include the API products for the GPT family of models. GPT-1 and GPT-2 were OpenAI's early LLM products; at that stage the model needed more high-quality text data, so the API was opened only to a limited number of high-quality users, free or at a very low floating price. By the time GPT-3 was released, the marginal ROI of general text data for improving the model had declined, so OpenAI set standard pricing and opened the product to more users. Today this product line is a standard offering that no longer requires a waitlist.

Migration 2: Improvements in OpenAI's base-model capabilities will pull users of some upper-layer ecosystem applications over to OpenAI's own products; a typical case is Jasper versus ChatGPT. Because of the GPT series' alignment problems and the API's unfriendliness to C-end users, it was difficult for ordinary users to tap LLM's language understanding and generation capabilities before ChatGPT was released. Building on its understanding of and experience with GPT model capabilities, Jasper created a marketing-content generation platform superior to every competitor on the market and rapidly grew to $90 million ARR in a little over a year. The launch of ChatGPT, however, quickly eroded Jasper's advantage, and a product layered so thinly on top of model capabilities led the market to question its moat. Although revenue is still growing at a high rate, Jasper has had to transform from a marketing-content generation platform into marketing-workflow SaaS to secure a safer niche. This kind of migration is not subjectively designed by OpenAI, but it is an inevitable consequence of improving base-model capabilities.
Game 1: Competition over scenarios and user-behavior data that help improve AGI's general capabilities; a typical case is ChatGPT Plugins versus Langchain. Langchain is an open-source project in the GPT ecosystem that gives developers a way to build applications by combining private data and real-time search results with LLM capabilities, and it is an important component of that ecosystem. Langchain is one of the most active players in the current ecosystem, and the company received a first investment round of $10 million from Benchmark Capital in March 2023. Yet just a week after Langchain announced the funding, OpenAI launched the ChatGPT Plugins suite. Plugins can: (1) call Internet data to solve the freshness problem; (2) access third-party private data; (3) operate external applications. This abundance of useful capability components directly squeezes Langchain's living space. Contrary to the mainstream market view that "Plugins were launched by OpenAI for commercial purposes, to build an app store for the LLM era", we believe the essence of Plugins is to obtain behavioral data on how users employ applications and APIs to solve specific tasks.

It is worth noting that the scenario of "correctly understanding user intent, then accurately and reliably selecting and using the right tools to complete the task" is currently highly contested. Besides OpenAI, Adept AI, Inflection AI, and Meta's Toolformer model are all competing for this niche. Further, if LLMs are truly to become the next generation of human-machine interfaces, accuracy and reliability are a must.
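The tool-use scenario contested here can be sketched as a "plan, execute, observe" loop, the control pattern underlying plugin and agent systems. The sketch below is hypothetical and fully stubbed: the model's planning step is replaced by a simple rule so the control flow runs as-is.

```python
# Minimal sketch of an LLM tool-use loop (the pattern behind systems such as
# ChatGPT Plugins or Langchain agents). All names are hypothetical; the
# "model" is a stub so the dispatch structure is runnable without any API.
from typing import Callable, Dict, Tuple

# Tool registry: name -> callable. A real system would describe these tools
# to the model in natural language (e.g. via an OpenAPI spec) so the model
# itself can choose among them.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(eval(expr, {"__builtins__": {}})),
    "search": lambda q: f"[stub search results for: {q}]",
}

def fake_model_plan(task: str) -> Tuple[str, str]:
    """Stand-in for the LLM's tool-selection step: returns (tool, argument)."""
    if any(ch.isdigit() for ch in task):
        return "calculator", task
    return "search", task

def run_task(task: str) -> str:
    tool_name, arg = fake_model_plan(task)   # 1. model picks a tool
    observation = TOOLS[tool_name](arg)      # 2. tool is executed
    # 3. a real agent feeds the observation back to the model, which either
    #    calls another tool or produces the final answer; we stop after one pass.
    return f"{tool_name} -> {observation}"

print(run_task("2+3*4"))             # calculator -> 14
print(run_task("latest LLM news"))
```

The hard parts OpenAI and its competitors are fighting over are precisely the two stubbed steps: reliably mapping intent to the right tool, and deciding when the loop is done.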

Game 2: Data in deep vertical scenarios competes with users

Typical cases include BloombergGPT. On March 30, 2023, Bloomberg released its self-developed vertical GPT model, BloombergGPT, with 50B parameters trained on about 700B tokens, roughly half private financial data and half public data. It performs far better on private financial tasks than current general GPT models.


In other words, if a vertical field's tasks are complex enough and its data unique and plentiful enough, developing a vertical large model instead of embracing the general LLM ecosystem may be a reasonable play, at least in the short term.

Overall, the product migration and game between these two data flywheels will continue.

3.1.3 GTM (Go-To-Market) and commercialization

Overall, we believe that OpenAI's GTM and commercialization strategy is a trade-off between benefiting the public and maintaining its independence, and the company will continue to swing in trade-offs.

(1) From OpenAI to OpenAI LP: the transition from non-profit to limited profit

At its founding, OpenAI had only the vision of inclusive AGI: it had not thought through the technical path and greatly underestimated the capital required. During the two years OpenAI operated as a non-profit, its total financing is estimated at only about $10-30 million. 2018~2019 was OpenAI's most difficult funding period. After committing to the GPT-architecture LLM path in 2017, training GPT-1 and GPT-2 burned through almost all of its money. It could neither afford the sky-high cost of training next-generation models nor recruit the best talent in the industry (in fact, research talent was already being poached by Google).

In this context, the non-profit OpenAI restructured into the limited-profit OpenAI LP in March 2019. Since the restructuring, OpenAI has accepted roughly $13 billion of investment from Microsoft. This has allowed OpenAI to offer high salaries to attract top talent, bear high AI training costs, build super-scale AI infrastructure, and accelerate both algorithm exploration and product development.

However, the high dependence on technology giants has led to questions about OpenAI's inclusive vision and loss of independence both internally and externally, and even led to the loss of some core employees.

We believe that AGI is a capital-intensive industry, and OpenAI must find an operating model that sustainably explores AGI. Obtaining external financial support and commercializing one's own products are the two current options. The commercialization of its own products is a more controllable model for OpenAI and can maintain its independence. Therefore, we judge that OpenAI will further commercialize it, but will not aim to maximize revenue or profit. The most fundamental goal of OpenAI is to explore the limits of AGI intelligence.

The limited-profit commercialization strategy makes OpenAI's GTM and commercialization decisions different from those of traditional technology giants, which in turn shapes the industry ecosystem.

(2) The honeymoon period of Microsoft's cooperation with OpenAI

Since Microsoft first invested in OpenAI in 2019, the two sides have embarked on a textbook-level strategic cooperation.

What OpenAI gets:

Funding: The two rounds of investment totaled about US$3 billion in 2019 and 2021, with an additional US$10 billion reported in January 2023;

Engineering Infra: Azure has a dedicated team supporting the training and inference of OpenAI's models. More importantly, in 2021-2022 the infra team led by Azure and Greg Brockman rebuilt OpenAI's entire infrastructure, yielding highly stable and scalable model-training infra (predictable scaling is important to OpenAI);

Diverse and high-quality special data: special text data such as GitHub and Bing;

C-end mindshare and rich general application scenarios: GitHub (73 million developer users), the Office suite (145 million daily active users), and Xbox (90 million Xbox Live monthly active users) give OpenAI high-quality general scenarios — developers, general productivity and marketing tools, and games — in which to test LLM applications, forming a unique data flywheel with the LLM;

B-side customer resources and vertical scenarios: Azure owns 95% of the Fortune 500 companies, and more than 250,000 companies use Microsoft Dynamics 365 and Microsoft Power Platform;

What Microsoft got:

The first thing to note is that Microsoft is the most diversified of the technology giants: its largest business, Azure, accounts for 31% of revenue and its second largest, Office, for 24%, whereas the single main business of Silicon Valley giants such as Google, Amazon, Meta, and Apple accounts for more than 50% of their revenue.

Azure: As OpenAI's cloud service provider, Azure is the exclusive platform for OpenAI products in public-cloud scenarios. If the AI share of human digital activity grows significantly in the future and OpenAI's products take the lion's share of it, then Azure is likely to capture the lion's share of the incremental cloud-inference market. At the same time, if the large-scale training infrastructure jointly developed by Azure and OpenAI is opened up, Azure could also win most of the cloud training market. In the long run, this poses a challenge to AWS.

Office suite (defense): every product in the Office suite is being challenged by new players (Notion, Airtable, etc.). Combining OpenAI-powered Copilot with the Office suite not only upgrades the individual products but also amplifies the advantage of linkage across them.

Bing Search (defense): Many investors believe Bing Search will disrupt Google. We take a different view. Bing's patch-like integration with ChatGPT does not change the essence of the search experience, though it will siphon off some of Google's search traffic. What truly has the potential to disrupt Google Search is an LLM-native search product like Perplexity. Moreover, adding an LLM to search increases the cost of a single query (by various estimates, 2~3 times that of a traditional query if unoptimized), squeezing the margins of the traditional search business. Google depends on search far more than Microsoft does, so it sits in the more strategically uncomfortable position. But for both Bing and Google Search, historically high search-advertising revenue has kept them from truly building a Perplexity-style new engine for information and knowledge acquisition in the LLM era.
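The "2~3 times the cost per query" claim can be sanity-checked with a back-of-envelope calculation. All numbers below are illustrative assumptions, not measured figures:

```python
# Back-of-envelope for the "LLM raises per-query cost 2~3x" claim.
# Every number here is an assumption chosen for illustration only.
trad_query_cost = 0.3          # cents per traditional search query (assumed)
tokens_per_answer = 500        # prompt + completion tokens per LLM answer (assumed)
llm_cost_per_1k_tokens = 1.2   # cents per 1K tokens for a large model (assumed)

llm_extra = tokens_per_answer / 1000 * llm_cost_per_1k_tokens  # 0.6 cents
total = trad_query_cost + llm_extra                            # 0.9 cents
print(f"LLM-augmented query ≈ {total / trad_query_cost:.1f}x traditional cost")
```

Under these assumptions, attaching an LLM answer to every query triples its cost, which is why the margin pressure falls hardest on whoever serves the most queries.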

This stage is the honeymoon period for OpenAI and Microsoft to work together. However, it is worth noting that the value distribution of model vendors and cloud service vendors in the industry chain will still produce a game in the future, and it is unknown how long the honeymoon period between Microsoft and OpenAI can last.

(3) The C-end market ChatGPT won unexpectedly: expanding from the base-model layer to the application layer

ChatGPT let OpenAI unexpectedly capture the C-end market, with more than 1 billion total visits and more than 100 million unique visitors in the past four months. From the perspective of ecosystem health, a base layer reaching into the application layer is taboo in any industry chain, since it badly undermines upper-layer players' trust in the base layer; yet OpenAI has shown great "unscrupulousness" on this issue. The behavior can be explained from the perspectives of the AGI vision and better data scale:

Lower-cost data acquisition: through C-end traffic and mindshare, ChatGPT and OpenAI have become synonymous with LLMs and the de facto industry standard. As a new technology and product, this mindshare lets OpenAI continuously acquire user data at lower GTM cost.

Acquiring valid data from richer scenarios: take Plugins, for example. We speculate that general conversation data now has little marginal value for GPT-4, whereas the data Plugins collect on completing user tasks through tool use is very valuable, and may be the key to LLMs becoming a truly new generation of human-computer interfaces (the highly contested field mentioned earlier).

Optimizing model capabilities with more long-tail conversations and use cases: this both accelerates alignment and safety research and surfaces more potential scenarios.

Serving the original intention of inclusive AGI: commercialization gives OpenAI huge self-sustaining potential, creating the opportunity to reduce dependence on the giants and develop healthily and sustainably in the future.

(4) Build an ecosystem and complement the technology partners needed by AGI through investment

In 2021, OpenAI announced the launch of a $100 million startup fund called the OpenAI Startup Fund. The main investment targets are as follows:

Application-tier companies

Startups can use new capabilities before OpenAI releases new tools publicly, giving them an edge over competitors, while OpenAI gains deep access to data and early feedback from a variety of scenarios.

In the future, the LLM ecosystem will not only have OpenAI as a model layer player, but will have multiple model vendors and a large number of vertical applications. Strong partnerships through investment can make OpenAI and its partners' flywheels bigger and faster.


Cutting-edge technology companies such as chips and robots

OpenAI's exploration of AGI is expected to lead the industry for a long time, which will drive OpenAI to seek more advanced products and tools for its own research needs: for example, chips with new architectures to serve larger-scale, more multimodal model training, and more advanced, lower-cost robots that give OpenAI the chance to scale RL through interaction with the physical world.

3.2 Prediction: Inference of future behavior of OpenAI

3.2.1 Technology

As analyzed above, under the technical understanding and aesthetics of OpenAI, the scale of data and parameter quantities is an inevitable choice, and Generative Model and Transformer are the best choices at present. Based on this, we boldly make some predictions about OpenAI's next technical actions:

(1) Further increase the effective data that LLM has not seen and embrace multimodality


General text data: The marginal benefit becomes lower, and more other types of text data are introduced, such as code, other computable languages

Image and video data: training efficiency for these modalities under the Transformer architecture is very low, and the cost of scaling grows quadratically or faster

Interaction data with the bit (digital) world: as mentioned earlier, OpenAI has always wanted to do RL (reinforcement learning); in the past, RL in robotics was hard to scale, but the bit world offers a large number of user scenarios to try

Interaction data with the physical world: scaling RL through interaction with the physical world via robots; progress here largely depends on how fast robotics develops
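The quadratic cost of scaling image and video data under a vanilla Transformer, noted in the list above, can be made concrete. The token counts and model width below are illustrative assumptions (e.g. a ViT-style 16x16 patching of a 1024x1024 image), not figures from any actual model:

```python
# Why image/video scaling is costly under a vanilla Transformer: self-attention
# cost grows with the square of sequence length.
def attention_ops(seq_len: int, d_model: int = 4096) -> float:
    """Rough FLOPs for one self-attention layer: ~2 * n^2 * d (assumed formula)."""
    return 2 * seq_len**2 * d_model

text_tokens = 1_000                 # a long text passage (assumed)
image_tokens = (1024 // 16) ** 2    # 64*64 = 4096 patches for one image
video_tokens = image_tokens * 32    # 32 frames, no temporal compression (assumed)

for name, n in [("text", text_tokens), ("image", image_tokens), ("video", video_tokens)]:
    print(f"{name:>5}: {n:>7} tokens, ~{attention_ops(n):.2e} attention FLOPs/layer")
```

Under these assumptions a single image costs roughly 17x the attention compute of a long paragraph, and a short video clip over 17,000x, which is why a more compute-efficient architecture becomes urgent once multimodal data enters the mix.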

(2) RL's Scale

Like the Generative Model, RL is an algorithm that fits OpenAI's aesthetic. Although RL has contributed relatively little to the GPT-series models released so far, GPT-3.5 already showed surprising results from an initial combination of Instruction Tuning and scaled-up RL. In the future, OpenAI is expected to use more scalable RL (RLHF, RLAIF) to assist base-model training. And with so much C-end traffic now in hand, it cannot be ruled out that OpenAI will turn some products into RL agents to assist training (for example, using ChatGPT Plugins for RL training on developer behavior and tool use).
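The structure of RLHF — human preferences stand behind a reward signal, and the policy is nudged toward higher-reward behavior — can be caricatured in a few lines. Everything below is a synthetic toy (canned responses, a hand-written stand-in for the learned reward model, and a crude weight update), not an implementation of any real training recipe:

```python
import random
random.seed(0)

# Toy sketch of the RLHF structure: a "policy" samples among canned responses,
# a stub "reward model" (standing in for a model trained on human preference
# labels) scores the sample, and the policy's sampling weights are nudged
# toward higher-reward responses.
responses = ["rude answer", "helpful answer", "off-topic answer"]
weights = [1.0, 1.0, 1.0]                        # uniform initial policy

def reward_model(response: str) -> float:
    # Stand-in for a learned preference model: humans preferred "helpful".
    return 1.0 if "helpful" in response else -0.2

def train_step(lr: float = 0.3) -> None:
    i = random.choices(range(len(responses)), weights=weights)[0]
    r = reward_model(responses[i])
    weights[i] = max(0.01, weights[i] + lr * r)  # reinforce in proportion to reward

for _ in range(200):
    train_step()

best = max(range(len(responses)), key=lambda i: weights[i])
print("policy now favors:", responses[best])     # -> helpful answer
```

The scaling question raised above is exactly about this loop: human labels (RLHF) are expensive and slow, while AI-generated feedback (RLAIF) or product telemetry can, in principle, generate the reward signal at far larger scale.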

(3) Robotics and Embodied AGI

We believe that at this point in time OpenAI cares more about Robotics for AGI than AGI for Robotics: using robots' ability to interact with the environment and perceive sensory information to deepen the foundation model's understanding of the physical world and its cognitive reasoning capabilities.

(4) Seek new algorithm architecture that can scale more modal data more efficiently

Transformer is still OpenAI's preferred algorithm architecture. It scales efficiently for the text modality but inefficiently for modalities such as image and video. After GPT-4, OpenAI's need for a more efficient architecture has therefore become more urgent. We have reason to believe OpenAI is internally running training experiments with Transformer variants and even newer architectures.

(5) Deep understanding of the reasoning and emergence ability of the model

Academic understanding of the emergence and reasoning capabilities of LLMs is still in its early stages. We believe next-word prediction accuracy and reasoning ability must be mathematically connected in high-dimensional space, but the connections are complex and hard to study. The best technical innovations come from understanding what is already known, so deep research in this area would be valuable.

(6) Increase the reliability, controllability and safety of the model

Reliability: mitigation of the Hallucination problem;

Controllability: accurately understanding and executing tasks. Today ChatGPT integrates Wolfram, a transitional solution in the form of a third-party component; in the future, effort will go into the controllability of the model itself;

Security: Do not do evil and not be used by evil people.

Across these three points, doing alignment well is crucial; RLHF (Reinforcement Learning from Human Feedback) is only the first step.

3.2.2 Products

We believe that at this stage, OpenAI's product strategy will continue to have "further improving AGI model capabilities" as the primary goal, and "making AGI products more widely and rationally used" as the secondary goal.

(1) In order to further improve the capabilities of AGI models, OpenAI will design more products that can obtain effective data, conduct model experiments, and interact with users to iterate

The key here is valid data. As noted earlier, Ilya's technical aesthetic has favored "scaling basic algorithms"; similarly, on the data side, we believe OpenAI will prefer data that is easy to scale and easy to train on. In the future, OpenAI may fold products into the model-training process, making user behavior part of training.

(2) In order to make AGI products more widely and rationally used, OpenAI will be more careful to control the rhythm of model capabilities released to the public

AGI does not merely raise social productivity; it raises the rate at which productivity progresses. Sam Altman has highlighted AI safety, the AI-driven widening of the wealth gap, and other social issues in several articles and interviews. The GPT-3.5 model has in fact already begun to affect many types of work in human society, and GPT-4 was released in a capability-restricted form. It is foreseeable that OpenAI will work with more social research institutions to anticipate the impact of model capabilities, and will slow the pace of capability releases to give affected industries a buffer period.

3.2.3 GTM and commercialization

(1) In terms of GTM strategy, OpenAI will continue to capture the attention of the C side, and at the same time develop more diversified ecological integration with the B side

C-end traffic gives OpenAI effective channels for both collecting data and generating self-sustaining revenue, and we predict OpenAI will keep pursuing greater C-end traffic, longer user sessions, and deeper user behavior. Attention and mindshare matter especially for C-end products: Anthropic's conversational product Claude is comparable to ChatGPT, yet its C-end awareness and traffic are far below those of ChatGPT and Bard.

On the B side, OpenAI will continue to accelerate the "data-model" flywheel through all-round cooperation with Microsoft's ecosystem, incentives for startups, and investment.

(2) Limited commercialization

Based on the preceding analysis of OpenAI's vision and limited-profit structure, we believe OpenAI's product pricing will follow a framework of inclusiveness plus organizational sustainability. Concretely:

Products whose primary goal is feeding data back to the model are free;

C-end general products are priced at cost (and may even be free in the future);

B-end products are sold with limited profit;

Overall, OpenAI's limited-profit structure makes its GTM and commercialization different from fully commercial companies'. But as the de facto leader of the industry chain and setter of industry standards, its GTM and commercialization strategy will greatly influence the industry.

4. Analysis of LLM industry chain

4.1 LLM ecology from a macro perspective

4.1.1 Current industry incremental revenue distribution estimates: application layer 30%~40%, model layer 0%~10%, computing infrastructure services 50%~70%


(1) The application layer takes away 30%~40% of the value

According to A16Z's survey of US LLM startups, pure application vendors' gross margin is about 60%~80%, with 20%~40% of revenue spent on inference and model fine-tuning;

Application vendors' users and revenue are growing rapidly, and several vendors' ARR has already reached $100 million;

Although the number of users and revenue are growing rapidly, many application vendors are facing key problems such as low user retention rate, intensified competition and shallow moats.

(2) The model layer takes away 0%~10% of the value

Based on GPT-3.5's parameter count and pricing, we speculate that OpenAI prices the API almost at cost or at very low gross margin. And according to interviews with competing overseas LLM companies, rivals with comparable models are still optimizing inference cost to match GPT-3.5's price (and have not yet reached it);

In the future, if pure model vendors' capabilities become homogenized with OpenAI's standard products, their inference prices will inevitably have to track the commercialization strategy of a limited-profit OpenAI over the long term. LLM training costs are extremely high, so pure model vendors face great commercialization pressure.
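The at-cost pricing speculation can be sanity-checked with a rough per-token compute estimate. The ~2N FLOPs-per-generated-token rule for an N-parameter dense model is a standard approximation; the GPU throughput, utilization, and rental price below are assumptions for illustration:

```python
# Rough check of whether GPT-3.5-level API pricing is near raw compute cost.
params = 175e9               # assumed GPT-3.5-scale parameter count
flops_per_token = 2 * params # standard ~2N FLOPs/token approximation

a100_flops = 312e12          # A100 dense FP16 peak, FLOPs/s
utilization = 0.3            # assumed effective utilization at inference
a100_cost_per_hour = 2.0     # assumed cloud rental price, USD

tokens_per_second = a100_flops * utilization / flops_per_token
cost_per_1k_tokens = a100_cost_per_hour / 3600 / tokens_per_second * 1000
print(f"~${cost_per_1k_tokens:.4f} per 1K tokens (compute only)")
```

Under these assumptions the compute-only cost lands around $0.002 per 1K tokens, in the same neighborhood as GPT-3.5's published API price at the time, which is consistent with the near-cost-pricing hypothesis (real serving adds batching gains but also memory-bandwidth limits, so this is only an order-of-magnitude check).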

(3) The computing infrastructure service layer (computing hardware + cloud computing) takes 50%~70% of the value

Captures 20%~40% of the value on the inference side;

Training costs are extremely high: at current A100 prices, training a 100-billion-parameter-scale model (GPT-3.5) costs roughly 20 million RMB. After LLMs enter the multimodal stage, growth in SOTA training compute is expected to outpace the decline in unit compute cost, and more model-layer players will enter in the short term, so the LLM training market is expected to grow rapidly over the next 1~3 years.
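The ~20 million RMB figure can be reproduced with the standard ~6ND training-FLOPs approximation (N parameters, D tokens). The parameter and token counts, cluster utilization, and A100 rental price below are assumptions for illustration:

```python
# Back-of-envelope for "~20 million RMB to train a 100B-scale model".
params = 175e9               # model parameters (GPT-3-scale, assumed)
tokens = 300e9               # training tokens (assumed)
train_flops = 6 * params * tokens          # ~3.15e23 FLOPs, standard 6ND rule

a100_flops = 312e12          # A100 dense FP16 peak, FLOPs/s
utilization = 0.4            # assumed effective cluster utilization
a100_hour_rmb = 25           # assumed rental price per A100-hour, RMB

gpu_hours = train_flops / (a100_flops * utilization) / 3600
print(f"~{gpu_hours:,.0f} A100-hours, ~{gpu_hours * a100_hour_rmb / 1e6:.0f}M RMB")
```

Under these assumptions the run needs roughly 700,000 A100-hours, landing in the high-teens of millions of RMB, consistent with the estimate in the text; the same formula shows why multimodal scaling (larger N and D) pushes training spend up faster than unit compute costs fall.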

With more LLM players entering on the training side, multimodal models scaling further, and inference-side LLM usage at the start of explosive growth, the cloud computing and computing hardware markets will accelerate, and the competitive structure of cloud vendors may change greatly.

(4) Due to the early stage of development of the current LLM ecosystem, the ecological niche of developer tools is not yet stable, so this article will not discuss it.

4.1.2 In the future, the application layer will grow rapidly and gross profit may improve, and the competition at the model layer will intensify, and computing infrastructure vendors will continue to grow rapidly

It should be noted that LLM is still in its large-scale R&D period: many new players have just entered, the application-layer potential of LLM has not yet been tapped, large-scale penetration has not begun, and training costs have not been amortized. That is why cloud computing and hardware vendors are the biggest winners of this period. We believe this value-chain distribution reflects the early stage of the LLM industry; once the ecosystem truly forms, the distribution will look very different.

(1) Application layer: As the potential of LLM in various application scenarios is tapped, the application layer will accelerate its growth. At the same time, because intensified competition at the model layer may trigger a price war there, application-layer gross margins are expected to improve. However, homogeneous applications will fall into price wars of their own, so application-layer companies must build barriers beyond base-model capabilities; we believe the application-layer companies that differentiate their products or establish network effects will capture the greatest value in the chain.

(2) Model layer: OpenAI's pricing strategy will become the pricing standard for pure model APIs. OpenAI is expected to stick to its inclusive, limited-profit commercialization strategy (e.g. ChatGPT's 90% price cut in March 2023), and LLM companies without significant technical advantages will find it hard to sell model APIs. Only companies that truly command the global SOTA model and cost control will hold pricing power.

(3) Computing infrastructure service layer (computing hardware + cloud computing): training and inference grow together, giving the whole sector a new growth curve. New growth may also mean an industry reshuffle, and how to cooperate with LLM players to seize the initiative is critical for cloud vendors. Also watch for the possibility of some application-layer or hardware-layer companies building new clouds.

Having surveyed the macro pattern of the current LLM ecosystem, we now zoom in on its parts and openly raise some topics worth discussing. With the industry in a stage of drastic change, the views below reflect our current understanding and are offered mainly to stimulate discussion.

4.2 Will LLM enter a price war, and will the price converge at the model layer to the price of cloud computing?

Before discussing this issue, two questions need to be asked:

(1) What is the value point of LLM? Is it the ability to capture, understand, and reason over information, or a revolution in human-computer interfaces?

Under the former view, the model's development goal is to further improve complex reasoning and higher intelligence; under the latter, the immediate imperative is better understanding of human tasks and more reliable, accurate tool use. The two development focuses are subtly diverging.

(2) How do new LLM companies position themselves? As AGI companies exploring the limits of AI intelligence, as regional mirror-image copies of OpenAI, or as commercial LLM companies?

We believe that at this stage, replicating GPT-3.5 and ChatGPT is essentially an engineering problem, and replicating the OpenAI SOTA model after GPT-4 requires algorithm research capabilities. Exploring AGI requires strong technical insight, independent technical judgment (OpenAI is not necessarily the right answer), true AGI belief and long-term patience.

It is undeniable that GPT-3.5 and ChatGPT already have full commercialization potential.

However, we believe that, from the perspective of model capability, GPT-3.5/ChatGPT-level capabilities will be reached by many LLM teams within 1~2 years. If a company's model capabilities stop at that level, a price war for model APIs is inevitable and prices will eventually tend toward cost. Only vendors who can continuously and exclusively iterate out SOTA models will hold pricing power.

On the other hand, from the product-form perspective, an API by itself is not a platform but merely a channel. Building a platform product with aggregation capabilities on top of AGI model capabilities, and occupying a favorable niche, can extract more value.

To be clear, in the long run we do not believe the value of this AI wave will be absorbed by infrastructure vendors. Unlike China's first CV (Computer Vision) wave after 2010, today's high-value downstream LLM scenarios are highly divergent and will not converge to 1~3 standard scenarios (face recognition in security, identity verification, etc.). The LLM model layer will command more of a premium.

4.3 Will LLM companies with different paths diverge or converge?

As noted in the previous question, LLM companies with different self-positioning and goals will diverge in the short term, and long-term work takes time to yield staged results (the GPT route took 5 years).

We believe the development of LLM models is likely to follow a "convergence-divergence-convergence" process: short-term work converges heavily, then diverges across vertical fields, and converges again once long-term work produces staged results.

4.4 LLM: Open Source VS Closed Source?

Looking at the text-to-image space, Stable Diffusion and Midjourney are still in a tug-of-war. In the LLM field, LLaMA+LoRA projects are everywhere, and everyone can train a large model. How will the two ecosystems evolve?

We offer one analytical lens: open source is, in essence, a mode of product development and GTM. A community's activity cannot be equated with commercial value. For LLM R&D, does open source provide value that closed source cannot? Whatever the GTM path, what the customer ultimately pays for is product value. Can open-source products match the capabilities and service experience of closed-source ones?

4.5 How big will the increment of the compute infrastructure layer be? Are there opportunities for new clouds?

On April 5, 2023, ChatGPT Plus paused new paid signups, allegedly because Microsoft's computing resources were exhausted. Whether or not that report is accurate, it is evident that LLMs have driven, and will continue to drive, demand for computing infrastructure, and may even trigger a reshuffle of the cloud computing industry. How much incremental demand AI brings to cloud computing depends on how much human activity in the bit world will be permeated by AI; that requires forecasting model capabilities and analyzing each segment, which we will not cover in detail here.

Among the four inference platforms NVIDIA released at GTC in March 2023 is the H100 NVL (2 GPUs, 94 GB HBM3 each). Why not 80 GB (the single-card memory of the other platforms) x2? Because 160 GB cannot hold the weights of GPT-3's 175B parameters. At the same conference NVIDIA released DGX Cloud, which lets enterprises rent clusters directly for AI model training and fine-tuning, bypassing the complexity of deploying infrastructure and leapfrogging traditional cloud vendors. It makes us wonder: has the huge compute increment brought by AI ignited NVIDIA's ambition to do cloud computing itself?
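That memory constraint is easy to check with back-of-the-envelope arithmetic: counting weights only (ignoring KV cache, activations, and runtime overhead) and using decimal GB, GPT-3's 175B parameters at one byte each overflow 2x80 GB but fit in 2x94 GB:

```python
GPT3_PARAMS = 175e9  # GPT-3 parameter count

def weights_gb(params: float, bytes_per_param: int) -> float:
    """Memory needed for model weights alone, in decimal GB."""
    return params * bytes_per_param / 1e9

for precision, bpp in [("FP16", 2), ("FP8/INT8", 1)]:
    need = weights_gb(GPT3_PARAMS, bpp)
    print(f"{precision}: {need:.0f} GB | fits 2x80GB: {need <= 160} | fits 2x94GB: {need <= 188}")
# FP16 needs 350 GB: exceeds both 160 GB and 188 GB
# FP8/INT8 needs 175 GB: exceeds 160 GB but fits in 188 GB
```

So only at 8-bit precision does the full model fit on the NVL pair without splitting across more devices, which is consistent with NVIDIA pitching the H100 NVL as a GPT-3-class inference platform.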

On the other hand, does an LLM company whose model truly far exceeds competitors' have the opportunity to extend downward and build a new cloud? As analyzed earlier, computing infrastructure is the ecosystem's most certain link for sustainable profit with real barriers. If a SOTA LLM were exclusively bound to one cloud service, downstream customers might become stickier to the SOTA LLM than to the cloud provider; the potential opportunity here deserves in-depth study.

Undoubtedly, for cloud service providers both new and old, the strategy for competing on LLMs is crucial (on the day this report was published, AWS released Amazon Bedrock and officially joined the fray).

4.6 Do downstream applications and tools have stable living space?

The contrasting situations of Jasper and Langchain have sparked heated debate among entrepreneurs: will OpenAI, with its rapidly upgrading capabilities, gradually cannibalize the living space of downstream applications and tools?

We think entrepreneurs can break the problem into two questions:

(1) Question 1: Will the constantly upgrading base-model capabilities of AGI naturally subsume my product's core competitiveness?

If the product's core competitiveness is merely a thin wrapper around model capabilities, the company's position is naturally unstable. Application-layer companies should strive to build network effects or data accumulation around their own business. In Jasper's case, if the company can shift its core competitiveness from standalone "intelligent marketing content production" to "the most intelligent all-in-one marketing platform", the threat of competition from ChatGPT shrinks greatly. Of course, that puts Jasper in competition with traditional marketing platforms such as Salesforce and HubSpot; who wins between new and old players in each vertical scenario is itself a topic worth studying.

(2) Question 2: To keep developing AGI, does OpenAI want the data in my scenario?

This question comes back to the game between two data flywheels, not merely a game of technology. OpenAI will keep wanting irreplaceable, valuable data that its models have not yet learned.

Langchain's scenario holds exactly the data OpenAI wants: how developers orchestrate various tools to complete user tasks when building applications. Yet that scenario depends heavily on the GPT ecosystem, so its scenes and data are naturally recaptured by OpenAI.

Bloomberg's is not. We believe a GPT model fine-tuned on Bloomberg's data would beat BloombergGPT on both performance and cost. But Bloomberg controls deep financial scenarios and a volume of private data large and unique enough to give it bargaining power with OpenAI. Of course, there is a prisoner's dilemma at another level: if you choose not to embrace the general-model ecosystem, will you lose to competitors built on top of large models?

4.7 Value distribution between the model layer and the application layer

First, since OpenAI effectively controls industry pricing power for LLM models, and given our judgment that OpenAI will keep pursuing its vision of broadly beneficial AGI under its capped-profit structure, we believe OpenAI will not deliberately encroach on the profit margins of downstream applications.

So as the parameter count of the underlying LLM grows year by year, will the model's inference cost become unbearable for downstream applications?

We judge not, because different intelligent-content scenarios require different model capabilities and can bear different model prices. For example, writing 10 pieces of marketing copy for Xiaohongshu might take one hour from an employee earning 5,000 yuan a month, while 10 revision opinions on a cross-border legal contract might take one hour from an overseas lawyer billing $400 an hour. The two scenarios' sensitivity to model cost obviously differs by orders of magnitude.
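The price-sensitivity gap can be made concrete with rough numbers. All figures below are illustrative assumptions (an assumed API price, an assumed token count per task, a rough 7 CNY/USD rate and 22x8-hour work month), not quoted rates:

```python
# Rough comparison of model inference cost vs. the human labor it substitutes.
# All figures are illustrative assumptions.
PRICE_PER_1K_TOKENS = 0.002   # assumed API price, USD per 1K tokens
TOKENS_PER_TASK = 2_000       # assumed tokens to produce one deliverable

model_cost = TOKENS_PER_TASK / 1000 * PRICE_PER_1K_TOKENS   # USD per task

# Scenario 1: marketing copy, 5,000 CNY/month employee, 1 hour for 10 pieces
copy_human_cost = 5000 / 7 / (22 * 8) / 10    # USD per piece
# Scenario 2: legal review, $400/hour lawyer, 1 hour for 10 opinions
legal_human_cost = 400 / 10                   # USD per opinion

for label, human in [("marketing copy", copy_human_cost),
                     ("legal opinion", legal_human_cost)]:
    print(f"{label}: human ${human:.2f} vs model ${model_cost:.4f} "
          f"-> headroom {human / model_cost:.0f}x")
# marketing copy: roughly 100x headroom; legal opinion: roughly 10,000x
```

Even under these crude assumptions the legal scenario can absorb a model price about two orders of magnitude higher than the marketing scenario, which is why rising per-token prices need not crush high-value applications.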

4.8 Super traffic ingress on the C-side? Platform vs. Pipeline?

OpenAI undoubtedly demonstrates the potential of a new generation of C-side traffic entry point. But traffic can become either a pipeline or a platform, and the commercial value of the two is very different.

As Packy McCormick points out in "Attention is All You Need", OpenAI was the first to attract Attention with Intelligence. ChatGPT has established direct connections with hundreds of millions of users, serves them at low marginal cost, and can build demand-driven multi-sided network effects at decreasing marginal cost, making it the most promising super C-end aggregation platform. The plugin interface is completely different from a traditional API and may have an even more profound impact on the C-side, which we will not expand on today.

At the same time, Google is not to be underestimated: after Bard recently swapped its underlying model for PaLM, its capabilities improved greatly. Today's Bard still feels nerdy next to ChatGPT, but with Google's technical depth and its many C-end products with a billion users each, it has every potential to build a new-generation LLM-based C-end aggregation platform.

By contrast, Anthropic's Claude is considered on par with ChatGPT in intelligence, but its platform potential is far from being unleashed.

Not every LLM follower can replicate the GPT model + ChatGPT + Plugin path. As analyzed earlier, OpenAI's position today is the result of a combination of technology, product, and GTM. Even in a relatively independent regional market like China, success will require truly leading technical and strategic capabilities combined.

Closing Thoughts

The above is a condensed version of OneMoreAI's original report; beyond a deeper and more specific analysis of the material above, the full report leaves many questions for further study and discussion.

The LLM industry is still in its infancy, its ecosystem is still unsettled, and the future is full of uncertainty. Starting from the idea of reverse-engineering OpenAI, we have tried to explain and predict the behavior of the industry's most critical player, hoping to establish a macro framework for systematically discussing the LLM ecosystem, and to welcome this historic wave of AI together.

References:

Introducing OpenAI https://openai.com/blog/introducing-openai

Planning for AGI and Beyond https://openai.com/blog/planning-for-agi-and-beyond

Generative models https://openai.com/research/generative-models

Unsupervised Sentiment Neuron https://openai.com/research/unsupervised-sentiment-neuron

Improving Language Understanding by Generative Pre-Training https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf

Language Models are Unsupervised Multitask Learners https://cdn.openai.com/better-language-models/language_models_are_unsupervised_multitask_learners.pdf

Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165

OpenAI LP https://openai.com/blog/openai-lp

Aligning language models to follow instructions https://openai.com/research/instruction-following

Training Language Models to Follow Instructions with Human Feedback https://arxiv.org/abs/2203.02155

ChatGPT https://openai.com/blog/chatgpt

GPT-4 Technical Report https://cdn.openai.com/papers/gpt-4.pdf

https://www.forbes.com/sites/chriswestfall/2023/01/28/educators-battle-plagiarism-as-89-of-students-admit-to-using-open-ais-chatgpt-for-homework/

https://openai.com/blog/chatgpt-plugins

https://www.similarweb.com

Bard https://bard.google.com/

PaLM API https://blog.google/technology/ai/ai-developers-google-cloud-workspace/

LLaMA: Open and Efficient Foundation Language Models https://arxiv.org/abs/2302.13971

Alpaca: A Strong, Replicable Instruction-Following Model https://crfm.stanford.edu/2023/03/13/alpaca.html

Vicuna: An Open-Source Chatbot Impressing GPT-4 with 90%* ChatGPT Quality https://vicuna.lmsys.org/

Compression for AGI - Jack Rae | Stanford MLSys #76 https://www.youtube.com/watch?v=dO4TPJkeaaU&t=247s

AI Today and Vision of the Future (Ilya Sutskever interviewed by NVIDIA's Jensen Huang) https://youtu.be/ZZ0atq2yYJw

OpenAI Meta-Learning and Self-Play https://www.youtube.com/watch?v=9EN_HoEk3KY

Minds, brains, and programs https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/abs/minds-brains-and-programs/DC644B47A4299C637C89772FACC2706A

Mastering the game of Go with deep neural networks and tree search https://storage.googleapis.com/deepmind-media/alphago/AlphaGoNaturePaper.pdf

Improving alignment of dialogue agents via targeted human judgements https://arxiv.org/pdf/2209.14375.pdf

https://alphacode.deepmind.com/

Constitutional AI: Harmlessness from AI Feedback https://arxiv.org/pdf/2212.08073.pdf

Evaluating Large Language Models Trained on Code https://arxiv.org/pdf/2107.03374.pdf

https://www.linkedin.com/posts/weights-biases_peter-welinder-of-openai-on-how-they-use-activity-7042149010198974464-28DP

OpenAI Triton https://github.com/openai/triton

BloombergGPT: A Large Language Model for Finance https://arxiv.org/pdf/2303.17564.pdf

https://fortune.com/2023/03/27/altman-vs-musk-openai-treads-on-teslas-robot-turf-with-investment-in-norways-1x/

https://www.reuters.com/technology/microsoft-talks-invest-10-bln-chatgpt-owner-semafor-2023-01-10/

Technology and wealth inequality https://blog.samaltman.com/technology-and-wealth-inequality

Introducing Claude https://www.anthropic.com/index/introducing-claude

Who Owns the Generative AI Platform? https://a16z.com/2023/01/19/who-owns-the-generative-ai-platform/
