
Analysis of AI + Web3: New Production Relations Empower the Era of Artificial Intelligence

Author: MarsBit

Original author: Frank-Zhang.eth

Source: Twitter

Note: This article is from @dvzhangtz on Twitter, compiled by Mars Finance as follows:

The author believes that artificial intelligence itself represents a new kind of productivity and is the direction of human development. The combination of Web3 and AI will make Web3 the new production relationship of this new era, and a way to organize future human society that avoids an absolute monopoly by the AI giants.

As a long-time front-line Web3 investor and a former AI researcher, I feel it is my duty to write a mapping of this track.


1. Objectives of this article

In order to understand AI more fully, we need to understand:

1. Some basic AI concepts, such as: what is machine learning, and why do we need large language models?

2. The steps of AI development, including data acquisition, model pre-training, model fine-tuning, and model use, and what each step actually does.

3. Some emerging directions, such as: external knowledge bases, federated learning, ZKML, FHEML, prompt learning, and skill neurons.

4. What are the corresponding Web3 projects along the entire AI chain?

5. Along the entire AI chain, which links carry greater value or are more likely to produce large projects?

When describing these concepts, I try not to use formulas or definitions, but to use analogies.

This article covers as many new terms as possible; the author hopes to leave an impression in the reader's mind, so that when a term comes up again later, the reader can come back and check where it sits in the overall knowledge structure.

2. Basic concepts

Part 1

The Web3 + AI projects we are familiar with today mostly build on one strand of artificial intelligence and machine learning: neural networks.

The following parts mainly define some basic concepts: artificial intelligence, machine learning, neural networks, training, loss functions, gradient descent, reinforcement learning, and expert systems.

Part 2

Artificial intelligence

Definition: Artificial intelligence is a new technical science that researches and develops theories, methods, technologies, and application systems for simulating, extending, and expanding human intelligence. The research goal of artificial intelligence is to enable intelligent machines to listen, see, speak, think, learn, and act.

My definition: the machine gives results that are the same as a human's, so that it is hard to tell which is which (the Turing test).


Part 3

Expert system

If a task has clear steps and clearly defined knowledge to apply, you can use an expert system.
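As a toy illustration (not from the original thread), an expert system is just human-written rules that the machine applies mechanically; the rules and ticket texts below are hypothetical.

```python
# Minimal, hypothetical expert-system sketch: knowledge is written down by
# humans as explicit if/then rules, and the machine simply applies them.
RULES = [
    (lambda t: "refund" in t, "Route the ticket to the billing team."),
    (lambda t: "password" in t, "Send the password-reset instructions."),
    (lambda t: True, "Escalate to a human agent."),  # default rule
]

def expert_system(ticket: str) -> str:
    for condition, action in RULES:
        if condition(ticket.lower()):
            return action

print(expert_system("I forgot my password"))  # -> password-reset rule fires
```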

Part 4

If a task is hard to describe step by step:

1. With labeled data: machine learning, such as analyzing the sentiment of text (see the data sketch after this list)

Example of the training data required (both lines below use the same Chinese phrase "你配吗", but with different intent):

The locksmith asked me: "你配吗" (Do you want a copy made?) - neutral

The burly Xiao Wang next door asked me: "你配吗" (Do you deserve it?) - negative

2. With virtually no labeled data: reinforcement learning, such as playing chess
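To make "labeled data" concrete, here is a minimal sketch (not from the original article) of what a tiny sentiment dataset looks like in code; the texts and labels are illustrative.

```python
# Hypothetical labeled training data for sentiment classification:
# each example pairs an input text with a human-assigned label.
train_data = [
    ("The locksmith asked me: 'Want a copy made?'", "neutral"),
    ("The burly Xiao Wang next door asked me: 'Do you deserve it?'", "negative"),
    ("The delivery arrived early and the food was still hot!", "positive"),
]

# In reinforcement learning (e.g. chess) there are no per-example labels;
# the model only receives a reward signal (win/lose) after it acts.
```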


Part 5

How neural networks teach machines a piece of knowledge

Machine learning now covers a very wide range of techniques; here we will only discuss its most classic approach: neural networks.

How does a neural network teach a machine a piece of knowledge? We can use an analogy:

Suppose you want to teach a puppy to pee on a mat (a classic example, no offense intended); this corresponds to teaching a machine a piece of knowledge.

Method 1: if the dog pees on the mat, reward it with a piece of meat; if not, spank it.

Method 2: if the dog pees on the mat, reward it with a piece of meat; if not, spank it, and the farther it is from the mat, the harder the spank (compute a loss function).

Method 3: every time the dog takes a step, make a judgment: if the step is toward the mat, reward a piece of meat; if not, spank (the loss function is computed at every training step).

Method 4: every time the dog takes a step, make a judgment: if the step is toward the mat, reward a piece of meat; if not, spank; and place a piece of meat in the direction of the mat to lure the dog toward it (the loss function is computed at every training step, and then gradient descent moves the parameters in the direction that reduces the loss the most). A small numeric sketch of this follows below.
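To connect the analogy to the math, here is a minimal NumPy sketch (not from the original article) of a loss function and gradient descent for a one-parameter model; the data and learning rate are made up.

```python
import numpy as np

# Toy data: y is roughly 3 * x, so the model should learn a weight w close to 3.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.1, 5.9, 9.2, 11.8])

w = 0.0    # the dog starts far from the mat
lr = 0.01  # how big a step it takes each time

for step in range(200):
    pred = w * x                          # the dog takes a step (a prediction)
    loss = np.mean((pred - y) ** 2)       # how far it is from the mat (loss function)
    grad = np.mean(2 * (pred - y) * x)    # which direction reduces the loss the most
    w -= lr * grad                        # gradient descent: move toward the mat

print(round(w, 2))  # close to 3 after training
```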


Part 6

Why have neural networks advanced by leaps and bounds in the last decade?

Because in the past decade, human beings have advanced by leaps and bounds in computing power, data, and algorithms.

Computing power: neural networks were actually proposed in the last century, but the hardware of the time took too long to run them. With the development of chip technology in this century, the computing power of chips has doubled roughly every 18 months, and GPUs that excel at parallel computing have emerged, making the computation time of neural networks "acceptable".

Data: Social media and the Internet have accumulated a lot of training data, and large manufacturers also have related automation needs.

Models: With computing power and data, researchers have developed a series of more efficient and accurate models.

"Computing power", "data", and "model" are also known as the three elements of artificial intelligence.

Part 7

Why large language models (LLMs) are important

Why you should care: we are here today because you are curious about AI + Web3; AI caught fire because of ChatGPT; and ChatGPT is one example of a large language model.

Why do we need large language models: As we said above, machine learning requires training data, but the cost of large-scale data labeling is too high. Large language models solve this problem in an ingenious way.


Part 8

BERT – the first large language model

What if we don't have labeled training data? A human sentence is itself a narrative. We can use cloze (fill-in-the-blank) tasks to create training data.

We can take a passage, mask out some of its words, and let a model based on the Transformer architecture (the details are not important here) predict what words should fill those blanks (let the dog find the mat);

If the model predicts wrongly, compute a loss function and do gradient descent (if the dog walks toward the mat, reward it with a piece of meat; if not, spank it; and place a piece of meat in the direction of the mat to lure the dog toward it).

In this way, every passage on the Internet can become training data. This training process is called "pre-training", which is why large language models are also called pre-trained models. You can give such a model a sentence and let it guess, word by word, what should be said next; the experience is the same as what we now have with ChatGPT.

My understanding of pre-training: Pre-training allows the machine to learn general knowledge from the corpus and develop a "sense of language".
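As a concrete illustration of the cloze idea, the Hugging Face transformers library exposes BERT-style masked-word prediction as a "fill-mask" pipeline; a minimal sketch, assuming the library and model weights are available:

```python
from transformers import pipeline

# Ask a pre-trained BERT to fill in the blank (the cloze task described above).
fill = pipeline("fill-mask", model="bert-base-uncased")

for candidate in fill("The capital of France is [MASK]."):
    # Each candidate comes with the predicted token and a confidence score.
    print(candidate["token_str"], round(candidate["score"], 3))
```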


Part 9

Subsequent development of large language models

After BERT came out, everyone found that it worked!

Just make the model bigger and the training data bigger, and the results keep getting better; what followed was essentially a rush to scale up.

Exploding training data: BERT was trained on all of Wikipedia plus book data; later models expanded the training data to English data from the entire web, and then to all languages across the web.

Model parameter counts have also been skyrocketing.


3. Steps of AI development

Part 1

Pre-training data acquisition

Pre-training generally requires a huge amount of data, which means crawling web pages across the entire network, accumulating terabytes of data, and then preprocessing it.

After data collection is complete, a large amount of computing power has to be mobilized: hundreds of A100 GPUs or TPUs are used for pre-training.
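A minimal sketch (not from the article) of the kind of preprocessing that follows crawling: cleaning whitespace, dropping very short pages, and removing exact duplicates; the thresholds are arbitrary.

```python
import hashlib
import re

def preprocess(raw_docs):
    """Clean, filter, and de-duplicate crawled text before pre-training."""
    seen, cleaned = set(), []
    for doc in raw_docs:
        text = re.sub(r"\s+", " ", doc).strip()  # normalize whitespace
        if len(text) < 200:                      # drop very short pages (arbitrary cutoff)
            continue
        digest = hashlib.md5(text.encode("utf-8")).hexdigest()
        if digest in seen:                       # exact-duplicate removal
            continue
        seen.add(digest)
        cleaned.append(text)
    return cleaned
```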

Part 2

Secondary pre-training of the model

(Optional) Pre-training lets the machine learn general human knowledge from the corpus and develop a "sense of language", but if we want the model to have more knowledge of a particular domain, we can take a corpus from that domain and feed it into the model for secondary pre-training.

For example, Meituan, as a food-delivery platform, needs a large model that knows more about food delivery. Meituan therefore used its Meituan/Dianping business corpus for secondary pre-training and developed MT-BERT.

My understanding of secondary pre-training: secondary pre-training allows the model to become an expert in a certain scenario
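A compressed sketch of secondary (continued) pre-training with the masked-language-model objective, using Hugging Face transformers; the domain corpus, model choice, and single training step are illustrative and not Meituan's actual setup.

```python
import torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
collator = DataCollatorForLanguageModeling(tok, mlm_probability=0.15)

# Hypothetical domain corpus (e.g. food-delivery reviews and FAQs).
domain_corpus = [
    "The rider delivered the order within thirty minutes.",
    "This restaurant's packaging kept the noodles warm.",
]

enc = tok(domain_corpus, truncation=True, padding=True)
features = [{"input_ids": ids} for ids in enc["input_ids"]]
batch = collator(features)  # randomly masks 15% of the tokens

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
out = model(**batch)        # one continued-pre-training step on domain text
out.loss.backward()
optimizer.step()
```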

Part 3

Model fine-tune training

(Optional) If you want the pre-trained model to become an expert at a specific task, such as sentiment classification, topic extraction, or reading comprehension, you can fine-tune the model on data for that task.

But here the data needs to be labeled; for example, sentiment classification requires data like the following:

The locksmith asked me: "你配吗" (Do you want a copy made?) - neutral

The burly Xiao Wang next door asked me: "你配吗" (Do you deserve it?) - negative

My understanding of fine-tuning: fine-tuning makes the model an expert at a certain task.
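A minimal fine-tuning sketch, assuming transformers and torch are available; the texts, labels, and hyperparameters are illustrative. A pre-trained encoder plus a small classification head is trained on the labeled sentiment examples above.

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)  # 0 = neutral, 1 = negative

texts = ["The locksmith asked me: 'Want a copy made?'",
         "The burly Xiao Wang next door asked me: 'Do you deserve it?'"]
labels = torch.tensor([0, 1])

batch = tok(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):                       # a few passes over the tiny dataset
    out = model(**batch, labels=labels)  # the loss is computed internally
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```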


Part 4

It should be noted that training a model requires transferring large amounts of data between graphics cards. A large category of today's AI + Web3 projects is decentralized computing power: people around the world contribute their idle machines to perform certain tasks. But using this kind of computing power for full distributed pre-training is very, very difficult, and even distributed fine-tune training requires very clever design, because the time spent transferring data between graphics cards can exceed the compute time itself.

Part 5

Model usage

Model usage is also known as model inference: the process of using the model after it has been trained.

Compared with training, model inference does not require transferring large amounts of data between graphics cards, so distributed inference is comparatively easy to do.
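Because each inference request is independent, requests can be fanned out to unrelated machines without any inter-GPU communication; a toy sketch, where run_model is a hypothetical stand-in for a call to a model hosted on a contributor's machine:

```python
from concurrent.futures import ThreadPoolExecutor

def run_model(prompt: str) -> str:
    # Placeholder for a network call to a remotely hosted model.
    return f"answer to: {prompt}"

prompts = ["What is Web3?", "Explain gradient descent", "Summarize BERT"]

# The requests do not depend on each other, so separate workers can serve them.
with ThreadPoolExecutor(max_workers=3) as pool:
    answers = list(pool.map(run_model, prompts))

print(answers)
```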

4. The latest applications of large models

Part 1

External knowledge base

Why: We want the model to know a small amount of knowledge in our domain, but we don't want to spend a lot of money retraining the model

Method: embed a large amount of document data (e.g. PDFs) into a vector database, retrieve the relevant pieces, and feed them to the model as background information in the input

Examples: Baidu Cloud, Myshell
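A minimal retrieval sketch, assuming the sentence-transformers package: documents are embedded into vectors, the ones closest to the question are retrieved, and they are prepended to the prompt as background context. The documents and model name are illustrative.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Stand-in for text extracted from the user's PDFs.
docs = [
    "Our refund policy allows returns within 14 days.",
    "The warranty covers manufacturing defects for two years.",
    "Support is available on weekdays from 9am to 6pm.",
]
doc_vecs = embedder.encode(docs, normalize_embeddings=True)

question = "How long do I have to return a product?"
q_vec = embedder.encode([question], normalize_embeddings=True)[0]

# Cosine similarity (vectors are normalized, so a dot product is enough).
best = docs[int(np.argmax(doc_vecs @ q_vec))]

prompt = f"Background: {best}\n\nQuestion: {question}"
print(prompt)  # this prompt would then be sent to the language model
```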

Prompt learning

Reason: the external knowledge base cannot meet our need to customize the model, but we don't want to bear the cost of training the parameters of the entire model

Method: do not train the model itself; use the training data only to learn what kind of prompt should be written

Case: Widely used today
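A toy sketch of the prompt-learning idea: the model's parameters stay frozen, and the labeled data is used only to pick the best-performing prompt template; llm() is a hypothetical stand-in for any hosted model call.

```python
def llm(prompt: str) -> str:
    # Hypothetical call to a frozen language model (e.g. a hosted API).
    return "negative" if "deserve" in prompt else "neutral"

templates = [
    "Classify the sentiment of this sentence as neutral or negative: {text}",
    "Does the speaker sound hostile? Answer neutral or negative: {text}",
]

labeled = [("Want a copy made?", "neutral"),
           ("Do you deserve it?", "negative")]

def accuracy(template):
    # "Training" only searches over prompts; no model weights are updated.
    hits = sum(llm(template.format(text=t)) == y for t, y in labeled)
    return hits / len(labeled)

best_template = max(templates, key=accuracy)
print(best_template)
```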

Part 2

Federated Learning (FL)

Reason: to train a model we have to provide our own data, which leaks our privacy; for some financial and medical institutions this is unacceptable

Method: each institution trains the model locally on its own data, and only the models are then gathered in one place and merged

Case in point: Flock
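A minimal sketch of the aggregation step (federated averaging): each institution trains locally and ships only model weights, which a coordinator averages; the weight dictionaries here are illustrative.

```python
import numpy as np

def federated_average(client_weights):
    """Average the model parameters sent in by each participating institution."""
    keys = client_weights[0].keys()
    return {k: np.mean([w[k] for w in client_weights], axis=0) for k in keys}

# Hypothetical weights trained locally by two institutions on their private data.
hospital = {"layer1": np.array([0.2, 0.4]), "bias": np.array([0.1])}
bank     = {"layer1": np.array([0.6, 0.0]), "bias": np.array([0.3])}

global_model = federated_average([hospital, bank])
print(global_model)  # raw data never leaves either institution
```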

FHEML

Reason: federated learning requires each participant to train a model locally, which is too high a barrier for many participants

Method: use fully homomorphic encryption (FHE) so the model can be trained directly on encrypted data

Cons: Extremely slow and expensive

Cases: ZAMA, Privasea

Part 3

ZKML

Reason: when we use a model service provided by someone else, we want to confirm that they are really serving the model we asked for, rather than swapping in a small model to fool us

Method: have the provider use ZK to generate a proof that they really performed the computation they claim to have performed

Cons: Slow and expensive

Case in point: Modulus

Skill neurons

Reason: today's models are black boxes; we feed them a lot of training data but don't know what they have learned. We hope for a way to optimize a model in a specific direction, such as stronger emotional perception or a higher level of morality

Method: the model is like a brain; some groups of neurons handle emotion, others handle morality. If we can find these neurons, we can optimize in a targeted way

Case: a future research direction

5. Mapping Web3 projects onto the AI chain

Part 1

The author will divide it into three categories:

Infra: the infrastructure of decentralized AI

Middleware: Enables Infra to better serve the application layer

Application layer: applications that directly face consumers (C-side) or businesses (B-side)

Part 2

Infra layer: AI infrastructure always breaks down into three categories: data, computing power, and algorithms (models)

Decentralized Algorithms (Models):

@TheBittensorHub (research report: x.com/dvzhangtz/stat..), @flock_io

Decentralized Computing Power:

General-purpose computing power: @akashnet_, @ionet

Specialized computing power: @rendernetwork (rendering), @gensynai (AI), @heuris_ai (AI), @exa_bits (AI)

Decentralized Data:

Data labeling: @PublicAI_, QuestLab

Storage: IPFS, FIL

Oracle: Chainlink

Indexing: The Graph

Part 3

Middleware: How to make Infra better serve the application layer

Privacy: @zama FHE, @Privasea_ai

Validation: EZKL, @ModulusLabs, @gizatechxyz

Application layer: In fact, it is difficult to classify all applications, and only a few of them can be listed

Data analysis

@_kaitoai, @DuneAnalytics, Adot

Agent

Market: @myshell_ai

Web3 Knowledge Chatbots: @qnaweb3

Agents that perform operations for users: @autonolas

6. Where are big projects most likely to emerge?

First, as in other fields, Infra is prone to producing large projects, especially decentralized models and decentralized computing power, whose marginal costs the author feels are low.

Then, inspired by @owenliang60, the author feels that if a killer application appears at the application layer, it could also become a top-level project.

Looking back at the history of large models, it was a killer application, ChatGPT, that pushed them into the spotlight; ChatGPT was not a major technological leap, but an optimization for the chat task. Perhaps the AI + Web3 space will also see phenomenal applications like StepN or Friend.tech in the future. We will wait and see.
