
Why do processor instructions need to be pipelined?

Author: Hard Ten

An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase instruction throughput (the number of instructions completed per unit of time). The basic idea is to break the processing of an instruction into a series of independent steps, with storage at the end of each step. This allows the control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all the steps at once.


Meaning of the pipeline:

Similar to a factory production line, a piece of work is divided into a number of fixed processes.


CPU pipelining decomposes each instruction into multiple steps and overlaps the steps of different instructions, so that several instructions are processed in parallel and the program runs faster. Each step is handled by its own dedicated circuit; when a step completes, the instruction moves on to the next step, while the previous step starts processing the following instruction. (The principle is the same as a production line.)


CPU instruction pipeline

Based on what has been described so far, the process of instructions entering the pipeline, moving through its stages, and leaving it should be fairly intuitive to us programmers.

The i486 has a five-stage pipeline: Fetch; D1 (main decode); D2 (translate); EX (execute); and WB (write-back). An instruction can be in any one of these stages.

(Figure: instructions flowing through the i486 five-stage pipeline)
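To make the flow of instructions through the stages concrete, here is a minimal C sketch (an illustration of the idea, not the i486's actual control logic) that prints which instruction occupies each of the five stages in every clock cycle, assuming no stalls:

#include <stdio.h>

int main(void) {
    const char *stage[5] = { "Fetch", "D1", "D2", "EX", "WB" };
    const int n = 4;                        /* number of instructions */
    for (int cycle = 0; cycle < n + 5 - 1; cycle++) {
        printf("cycle %d:", cycle + 1);
        for (int s = 0; s < 5; s++) {
            int instr = cycle - s;          /* instruction currently in stage s */
            if (instr >= 0 && instr < n)
                printf("  %s=I%d", stage[s], instr + 1);
        }
        printf("\n");
    }
    return 0;
}

With four instructions this prints eight cycles in total (4 + 5 - 1), instead of the 4 × 5 = 20 cycles a design that ran one stage per cycle without overlap would need.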

But such a pipeline has an obvious weakness. Consider the following instructions, whose job is to swap the contents of two variables:

XOR a, b
XOR b, a
XOR a, b

Processors from the 8086 up to the 386 have no pipeline; the processor executes only one instruction at a time, so on those architectures this code runs without any problem.

But the i486 is the first x86 processor with a pipeline. What happens when it executes the code above? Watching many instructions run through the pipeline at once is confusing, so refer back to the pipeline diagram above and walk through it step by step:

1. In the first step, the first instruction enters the fetch stage;

2. In the second step, the first instruction enters the decode stage, and the second instruction enters the fetch stage at the same time;

3. In the third step, the first instruction enters the translate stage, the second instruction enters the decode stage, and the third instruction enters the fetch stage.

4. In the fourth step, however, a problem appears: the first instruction enters the execute stage, but the other instructions cannot continue to move forward.

5. The second XOR instruction needs the result a computed by the first XOR instruction, but that result is not written back until the first instruction has finished executing.

As a result, the other instructions in the pipeline wait at their current stage until the execute and write-back stages of the first instruction have completed. The second instruction waits for the first to complete before entering the next stage, and the third instruction likewise waits for the second.

This phenomenon is known as a pipeline stall, or a pipeline bubble.
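To see how much the bubbles cost, here is a small C sketch under a deliberately simplified timing model (one cycle per stage, and a dependent instruction may not enter EX until the producing instruction has completed WB — real i486 timing differs):

#include <stdio.h>

int main(void) {
    const int NSTAGE = 5;           /* F, D1, D2, EX, WB                       */
    const int EX = 3, WB = 4;       /* stage indices                           */
    int n = 3;                      /* XOR a,b / XOR b,a / XOR a,b             */
    int wb_done[3];                 /* cycle in which each instruction finishes WB */

    int ideal = n + NSTAGE - 1;     /* no-hazard pipeline time                 */

    for (int i = 0; i < n; i++) {
        int ex_cycle = i + EX;                  /* earliest EX with no stall   */
        if (i > 0 && ex_cycle <= wb_done[i - 1])
            ex_cycle = wb_done[i - 1] + 1;      /* wait for the previous result */
        wb_done[i] = ex_cycle + (WB - EX);      /* WB follows EX               */
    }
    printf("ideal: %d cycles, with RAW stalls: %d cycles\n",
           ideal, wb_done[n - 1] + 1);
    return 0;
}

Under these assumptions the three dependent XORs take 9 cycles instead of the ideal 7; the extra cycles are the bubbles described above.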

Common Concepts:

1. Pipeline depth: the number of stages (beats) in the pipeline.

2. Throughput rate: the number of tasks the pipeline can complete per unit time.

3. Maximum throughput rate: the throughput rate reached once the pipeline is in a steady state of uninterrupted flow.

4. Speedup ratio: the ratio of the time taken by the equivalent sequential (non-pipelined) mode of operation to the time taken by the pipelined mode.

Pipeline indicators:

1. Pipelining does not reduce the latency of a single task, but it improves the throughput of the overall workload.

2. Multiple different tasks operate at the same time, using different resources.

3. The potential speedup equals the number of pipeline stages.

4. The rate of the pipeline is limited by the slowest stage.

5. If the execution times of the stages are unbalanced, the speedup is reduced.

6. The time needed to fill the pipeline at the start and to drain it at the end also reduces the speedup.
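These points can be checked with a small worked example. The following sketch uses illustrative numbers (5 stages, 100 tasks, one stage optionally twice as slow) to compute throughput and speedup:

#include <stdio.h>

int main(void) {
    double t = 1.0;       /* time per balanced stage (arbitrary units) */
    int k = 5;            /* pipeline depth (number of stages)         */
    int n = 100;          /* number of tasks/instructions              */

    /* Balanced pipeline: the first task needs k*t, each further task adds t. */
    double seq  = (double)n * k * t;           /* non-pipelined time          */
    double pipe = (k + n - 1) * t;             /* pipelined time              */
    printf("speedup (balanced)   = %.2f (potential maximum %d)\n", seq / pipe, k);
    printf("max throughput       = %.2f tasks per unit time\n", 1.0 / t);

    /* Unbalanced pipeline: the slowest stage sets the clock.                 */
    double t_slow = 2.0 * t;                          /* one stage twice as slow    */
    double seq_u  = n * ((k - 1) * t + t_slow);       /* sequential time, same work */
    double pipe_u = (k + n - 1) * t_slow;             /* clock set by slowest stage */
    printf("speedup (unbalanced) = %.2f\n", seq_u / pipe_u);
    return 0;
}

With these numbers the balanced speedup is about 4.8 (close to the 5-stage maximum), while making one stage twice as slow drops it to about 2.9, illustrating points 4 and 5.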

The ARM7, common in the low-power embedded field, adopts a three-stage pipeline.

(Figure: the ARM7 three-stage pipeline)

(Figure: pipeline of the MIPS architecture)

(Figure: pipeline of the x86 architecture)

A pipeline is a design method used in a processor to execute instructions. It breaks down the execution process of an instruction into multiple stages and allows multiple instructions to be processed at the same time. This design can improve the performance of the processor, but it can also have some implications.

Here are some of the effects of pipelines on processor performance:

  1. Increased instruction throughput: pipelines allow different stages of multiple instructions to be processed at the same time, increasing instruction throughput. More instructions complete per unit time, speeding up program execution.
  2. Increased latency: although pipelines increase throughput, they also introduce some latency. Since an instruction is broken into multiple stages, each stage takes a certain amount of time, and if one stage is slow, the efficiency of the whole pipeline suffers.
  3. Data hazards and control hazards: hazards may occur during pipelined execution. A data hazard arises when an instruction must wait for the result of a previous instruction, while a control hazard arises when a branch instruction may change the flow of the program. These hazards can cause pipeline stalls or flushes.
  4. Complex design and management: pipeline design and management is relatively complex; stage coordination, error handling, branch prediction, and so on must all be considered. This makes processor design more complex, potentially increasing cost and difficulty.

Hazards that may be encountered during instruction execution, and their solutions:

  1. Structural hazards (结构冒险):
    • More hardware: a structural hazard arises when multiple instructions try to use the same hardware resource at the same time. It can be eased by adding hardware resources, for example more functional or execution units, so that more instructions can be processed simultaneously.
    • Register renaming: renaming lets multiple instructions use the same architectural register at the same time without conflict, which helps avoid delays caused by structural hazards.
    • Compiler scheduling: the compiler can reduce structural hazards by reordering instructions or inserting suitable ones, for example rearranging data accesses and operations to make better use of the hardware resources.
  2. Data hazards (数据冒险):
    • Compiler scheduling: the compiler can reduce data hazards by rescheduling the order in which instructions execute. By taking the data dependences between instructions into account, it can arrange them to reduce read-after-write, write-after-write, and write-after-read hazards (see the example after this list).
    • Out-of-order execution: the processor can execute instructions as their operands become available rather than strictly in program order, which reduces the impact of data hazards and improves efficiency.
    • Register renaming: renaming also helps reduce data hazards, since letting multiple instructions use the same architectural register without conflict removes the false (write-after-write and write-after-read) dependences.
  3. Control hazards (控制冒险):
    • Branch prediction: the processor can predict the execution path of a branch instruction, avoiding stalls caused by branches. Correct predictions reduce pipeline flushes and improve execution efficiency.
    • Speculative execution: some processors continue executing instructions before the branch outcome is known. If the guess is correct, efficiency improves; otherwise the processor must roll back to the correct execution path.

Overall, these techniques are designed to maximize processor performance while overcoming the structural, data, and control hazards that may be encountered during instruction execution.
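As a concrete illustration of the compiler-scheduling idea for data hazards, here are two functionally equivalent C loops. In the second, independent operations are interleaved so that there is always work available that does not depend on the immediately preceding result (whether a real compiler emits this schedule depends on the toolchain and its flags):

#include <stdio.h>
#include <stddef.h>

/* Every addition depends on the one before it, so a pipeline without
 * forwarding (or with a long-latency adder) keeps stalling on the RAW hazard. */
double sum_dependent(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two independent accumulation chains give the scheduler something to issue
 * while the other chain's result is still in flight; the chains are only
 * combined at the end. */
double sum_scheduled(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];        /* chain 0 */
        s1 += a[i + 1];    /* chain 1, independent of chain 0 */
    }
    if (i < n)
        s0 += a[i];        /* leftover element when n is odd */
    return s0 + s1;
}

int main(void) {
    double a[] = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    size_t n = sizeof a / sizeof a[0];
    printf("%f %f\n", sum_dependent(a, n), sum_scheduled(a, n));
    return 0;
}

Because floating-point addition is not associative, the two versions can differ in the last bits; this is exactly the kind of reordering a compiler may only perform under relaxed floating-point rules, or that a programmer does by hand.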

What problems arise from too many pipeline stages?

While pipelining can improve processor performance, an excessive number of pipeline stages also introduces problems, including:

1. More clock cycles per instruction: each pipeline stage takes one clock cycle, and every instruction must pass through every stage, so too many stages increases the total number of cycles each instruction spends in the pipeline.
2. Increased latency: a deep pipeline adds delays in passing data and control signals between stages. These delays reduce the pipeline's efficiency, because an instruction waiting at one stage can block the flow of the instructions behind it.
3. Complex design and control: as the number of stages grows, processor design and control become more complex. More hardware resources and more intricate logic are needed to keep the stages working together, which raises design complexity and manufacturing cost.
4. More structural hazards: a structural hazard is a situation where multiple instructions try to use the same hardware resource at the same time. More pipeline stages can lead to more structural hazards, because competition for hardware resources increases.
5. Increased energy consumption: more pipeline stages usually means more power. Every stage consumes power, and there may be more voltage and current fluctuation on each clock edge, raising the processor's overall power consumption.
6. Challenges for branch prediction: as the number of stages increases, handling branch instructions becomes harder. A longer pipeline makes branch mispredictions more expensive, because the branch outcome is resolved at a later stage and more speculative work has to be thrown away.

When designing a processor, the number of pipeline stages must be weighed against performance, so that improving performance does not introduce too many problems and too much complexity. Different kinds of applications and requirements may call for different pipeline depths; the sketch below illustrates how a deeper pipeline raises the cost of each branch misprediction.
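A back-of-the-envelope model of point 6 above (the numbers are hypothetical, not measurements): the deeper the pipeline, the more work is discarded on each branch misprediction, so the effective CPI rises.

#include <stdio.h>

int main(void) {
    double branch_freq  = 0.20;   /* fraction of instructions that are branches */
    double mispred_rate = 0.05;   /* fraction of branches predicted wrongly     */
    int    depths[]     = { 5, 10, 20 };

    for (int i = 0; i < 3; i++) {
        int depth = depths[i];
        /* Assume the branch resolves near the end of the pipeline, so a
         * misprediction wastes roughly (depth - 1) cycles of fetched work.   */
        double penalty = depth - 1;
        double cpi = 1.0 + branch_freq * mispred_rate * penalty;
        printf("depth %2d: effective CPI = %.3f\n", depth, cpi);
    }
    return 0;
}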

Overview of pipeline processing

A CPU can process instructions in several working modes: sequentially, with overlap between adjacent instructions, or pipelined; the pipelined mode works on the same principle as the assembly line described earlier.

The handling of an instruction is divided into fetching the instruction, analyzing (decoding) it, and executing it.


If the execution time of each stage is an equal time t, then processing n instructions sequentially takes a total of 3n×t.

Advantages: simple control.

Disadvantages: slow, and the machine's functional units are poorly utilized.

Overlap: during the interpretation of two adjacent instructions, different stages of their interpretation overlap in time.

This includes one-time overlap, advance-control (lookahead) techniques, and parallelism across multiple functional units.

If each adjacent instruction is advanced by one stage (all three stages overlapped): T = 3×t + (n-1)×t = (n+2)×t

One-time overlap: the instruction fetch is folded into the analysis and execution steps, and at any moment only the "execute" of the previous instruction is allowed to overlap with the "analyze" of the next instruction: T = (n+1)×t
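A quick numeric check of the three timing formulas above, using concrete values (this is just the arithmetic of the formulas, with n and t chosen arbitrarily):

#include <stdio.h>

int main(void) {
    int n = 10;    /* number of instructions          */
    int t = 1;     /* time per stage, arbitrary units */

    int sequential       = 3 * n * t;       /* fetch, analyze, execute strictly in series */
    int full_overlap     = (n + 2) * t;     /* adjacent instructions shifted by one stage */
    int one_time_overlap = (n + 1) * t;     /* fetch folded in; only execute(k) overlaps
                                               analyze(k+1)                               */

    printf("sequential:       %d t\n", sequential);
    printf("full overlap:     %d t\n", full_overlap);
    printf("one-time overlap: %d t\n", one_time_overlap);
    return 0;
}

For n = 10 this gives 30t, 12t, and 11t respectively; note the last two are not directly comparable, since the one-time-overlap formula hides the fetch time inside the other two steps.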

If the stage times are not equal, the actual execution time is:

(Figure: formula for the actual execution time when the stage times are unequal)

Advance control (lookahead): the analysis unit and the execution unit can each work continuously, analyzing and executing instructions respectively. It combines prefetching with buffering: by controlling the instruction stream and the data stream ahead of time, the instruction analyzer and the execution unit are kept working continuously and in parallel as much as possible.

Execution time:

(Figure: execution-time formula under advance control)

Parallel operation of multiple functional units: a processor with multiple functional units spreads the many functions of the ALU across several specialized units that can work in parallel, so the rate at which instructions complete is greatly increased.


Advance control: modern instruction sets are complex, and the time required for "analysis" and for "execution" often differs greatly, which leaves functional units idle; advance-control techniques are therefore needed.

(Figure: analysis and execution of instructions when their times are unequal)

(Figure: the instruction execution process when an advance buffer stack is used)

Advance control:

Generally, advance (lookahead) buffer stacks are used.

There are generally four types of buffer stacks:

Advance instruction buffer stack

When main memory is busy, the instruction analyzer can get the instructions it needs from the advance instruction buffer stack.

Advance operation stack

Used for conditional branches and the like.

Advance read (operand) stack

A buffer between main memory and the arithmetic unit, used to smooth the flow of data between them.

Post write stack

Data that has not yet been completely written to main memory can be staged in the post write stack.

Structure of a processor with advance control:

(Figure: structure of a processor with advance control)

Buffer-depth design under advance control:

An example of a worst-case (boundary) calculation:

Suppose the advance instruction buffer stack starts completely full, with a buffer depth of D1.

At this moment, instructions flow out of the buffer stack at the fastest possible rate, while new instructions flow in at the slowest possible rate.

Assume the analyzer is handling its most favourable instruction sequence, so the average time to analyze one instruction is t1.

The worst case on the input side is that instruction fetch is very slow, with an average time of t2 to fetch one instruction.

Suppose L1 instructions are analyzed during the time it takes the advance buffer stack to go from full to empty.

Then: L1×t1 = (L1 - D1)×t2, which gives the required depth D1 = L1×(t2 - t1)/t2.
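A small numeric check of this relation, with hypothetical values chosen for L1, t1, and t2:

#include <stdio.h>

int main(void) {
    double L1 = 8.0;   /* instructions analyzed while the stack drains (assumed) */
    double t1 = 1.0;   /* fastest average analysis time per instruction          */
    double t2 = 4.0;   /* slowest average fetch time per instruction             */

    /* From L1*t1 = (L1 - D1)*t2, the required buffer depth is: */
    double D1 = L1 * (t2 - t1) / t2;
    printf("required buffer depth D1 = %.1f instructions\n", D1);
    return 0;
}

With these values, D1 = 8 × (4 - 1) / 4 = 6 instructions.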


The i486 processor, introduced in 1989, brought in a five-stage pipeline. At that point there is no longer just one instruction running in the CPU: each stage of the pipeline works on a different instruction at the same time, and this design more than doubled the i486's performance at the same clock frequency. The fetch stage of the five-stage pipeline takes an instruction out of the instruction cache (8 KB on the i486); the second stage (D1, main decode) translates the fetched instruction into a specific operation; the third stage (D2, translate) converts memory addresses and offsets; the fourth stage (EX) actually executes the operation; and the fifth stage (WB) writes the result back to a register or to memory. Because the processor runs multiple instructions at the same time, program performance is greatly improved.

The processor generally consists of the following functional units:

Fetch unit

Decode unit

Execution unit

Load/store unit (a load fetches data from memory, while a store saves data to memory)

Exception/interrupt unit

Power management unit

A pipeline is usually built from the fetch, decode, execute, and load/store units. Each unit repeats its own work cycle after cycle, handling the next instruction passed to it.

Superpipelining

Superpipelining raises the clock frequency by subdividing the pipeline into finer stages, so that the machine completes one or even several operations per cycle; in essence it trades time for space.

"Superpipelined" is relative to a baseline processor: an ordinary CPU pipeline has the basic four stages of instruction prefetch, decode, execute, and write-back. A superpipelined CPU is one whose internal pipeline exceeds the usual 5-6 steps; the Pentium Pro's pipeline, for example, is as long as 14 stages. The more stages the pipeline is divided into, the less work each stage has to do per cycle, so the design can run at a higher clock frequency. An everyday analogy: five people relaying logs corresponds to a five-stage pipeline; superpipelining refines the process so that ten people relay the logs (a ten-stage pipeline), and the whole job clearly finishes sooner. As the saying goes, there is strength (and efficiency) in numbers.
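A rough way to see the frequency gain and its limits (the delays below are made-up numbers): splitting the same total amount of logic into more stages shortens the critical path of each stage, but the fixed pipeline-register overhead per stage prevents the clock frequency from scaling forever.

#include <stdio.h>

int main(void) {
    double total_logic = 10.0;   /* total logic delay per instruction, ns (assumed) */
    double latch       = 0.2;    /* pipeline-register overhead per stage, ns        */
    int    stages[]    = { 4, 8, 14, 28 };

    for (int i = 0; i < 4; i++) {
        int k = stages[i];
        double period = total_logic / k + latch;   /* clock period = slowest stage */
        printf("%2d stages: clock period %.3f ns, frequency %.2f GHz\n",
               k, period, 1.0 / period);
    }
    return 0;
}

The frequency keeps rising with more stages, but by less and less each time, which is one reason deeper is not always better.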


Superscalar

Superscalar means that there is more than one pipeline inside the CPU, so that more than one instruction can be completed in each clock cycle; this is called superscalar technology. Its essence is to trade space for time.

A superscalar architecture is a form of parallel operation that implements instruction-level parallelism within a single processor core. It achieves a higher CPU throughput at the same clock frequency.
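An idealized estimate (a simplifying model, not a real microarchitecture) of how issue width changes the cycle count for n fully independent instructions on a k-stage pipeline:

#include <stdio.h>

int main(void) {
    int n = 100;      /* independent instructions (best case for superscalar) */
    int k = 5;        /* pipeline depth                                       */

    for (int width = 1; width <= 4; width *= 2) {
        /* Ideal machine: 'width' instructions enter the pipeline per cycle. */
        int groups = (n + width - 1) / width;        /* cycles spent issuing */
        int cycles = groups + k - 1;                 /* plus pipeline drain  */
        printf("issue width %d: about %d cycles, IPC ~= %.2f\n",
               width, cycles, (double)n / cycles);
    }
    return 0;
}

Real programs rarely reach these numbers, because data dependences and branches limit how many instructions can actually be issued together.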

