
Why do processor instructions need to be pipelined?

Author: Hard Ten

An instruction pipeline is a technique used in the design of computers and other digital electronic devices to increase instruction throughput (the number of instructions completed per unit of time). The basic idea is to break the processing of an instruction into a series of independent steps, with storage at the end of each step. This allows the control circuitry to issue instructions at the processing rate of the slowest step, which is much faster than the time needed to perform all the steps at once.


Meaning of the pipeline:

Similar to a factory production line, a piece of work is divided into a number of fixed processes.


CPU pipelining decomposes each instruction into multiple steps and overlaps the steps of different instructions, so that several instructions are processed in parallel and the program runs faster. Each step is handled by its own dedicated circuit; when a step completes, the instruction moves on to the next step, while the previous step starts processing the following instruction. (The principle is the same as a production line.)


CPU instruction pipeline

Based on what has been described so far, the process of instructions entering the pipeline, moving through its stages, and leaving it should be fairly intuitive to us programmers.

The i486 has a five-stage pipeline: Fetch; D1 (main decode); D2 (translate); EX (execute); and WB (write-back). An instruction can be in any one of these stages.

(Figure: instructions flowing through the i486 five-stage pipeline)
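To make the flow of instructions through the stages concrete, here is a minimal C sketch (an illustration of the idea, not the i486's actual control logic) that prints which instruction occupies each of the five stages in every clock cycle, assuming no stalls:

#include <stdio.h>

int main(void) {
    const char *stage[5] = { "Fetch", "D1", "D2", "EX", "WB" };
    const int n = 4;                        /* number of instructions */
    for (int cycle = 0; cycle < n + 5 - 1; cycle++) {
        printf("cycle %d:", cycle + 1);
        for (int s = 0; s < 5; s++) {
            int instr = cycle - s;          /* instruction currently in stage s */
            if (instr >= 0 && instr < n)
                printf("  %s=I%d", stage[s], instr + 1);
        }
        printf("\n");
    }
    return 0;
}

With four instructions this prints eight cycles in total (4 + 5 - 1), instead of the 4 × 5 = 20 cycles a design that ran one stage per cycle without overlap would need.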

But such a pipeline has an obvious weakness. Consider the following instructions, whose job is to swap the contents of two variables:

XOR a, b
XOR b, a
XOR a, b

Processors from the 8086 up to the 386 have no pipeline; the processor executes only one instruction at a time, so on those architectures this code runs without any problem.

But the i486 is the first x86 processor with a pipeline. What happens when it executes the code above? Watching many instructions run through the pipeline at once is confusing, so refer back to the pipeline diagram above and walk through it step by step:

1. In the first step, the first instruction enters the fetch stage;

2. In the second step, the first instruction enters the decode stage, and the second instruction enters the fetch stage at the same time;

3. In the third step, the first instruction enters the translate stage, the second instruction enters the decode stage, and the third instruction enters the fetch stage.

4. In the fourth step, however, a problem appears: the first instruction enters the execute stage, but the other instructions cannot continue to move forward.

5. The second XOR instruction needs the result a computed by the first XOR instruction, but that result is not written back until the first instruction has finished executing.

As a result, the other instructions in the pipeline wait at their current stage until the execute and write-back stages of the first instruction have completed. The second instruction waits for the first to complete before entering the next stage, and the third instruction likewise waits for the second.

This phenomenon is known as a pipeline stall, or a pipeline bubble.
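To see how much the bubbles cost, here is a small C sketch under a deliberately simplified timing model (one cycle per stage, and a dependent instruction may not enter EX until the producing instruction has completed WB — real i486 timing differs):

#include <stdio.h>

int main(void) {
    const int NSTAGE = 5;           /* F, D1, D2, EX, WB                       */
    const int EX = 3, WB = 4;       /* stage indices                           */
    int n = 3;                      /* XOR a,b / XOR b,a / XOR a,b             */
    int wb_done[3];                 /* cycle in which each instruction finishes WB */

    int ideal = n + NSTAGE - 1;     /* no-hazard pipeline time                 */

    for (int i = 0; i < n; i++) {
        int ex_cycle = i + EX;                  /* earliest EX with no stall   */
        if (i > 0 && ex_cycle <= wb_done[i - 1])
            ex_cycle = wb_done[i - 1] + 1;      /* wait for the previous result */
        wb_done[i] = ex_cycle + (WB - EX);      /* WB follows EX               */
    }
    printf("ideal: %d cycles, with RAW stalls: %d cycles\n",
           ideal, wb_done[n - 1] + 1);
    return 0;
}

Under these assumptions the three dependent XORs take 9 cycles instead of the ideal 7; the extra cycles are the bubbles described above.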

Common Concepts:

1. Pipeline depth: the number of stages (beats) in the pipeline.

2. Throughput rate: the number of tasks the pipeline can complete per unit time.

3. Maximum throughput rate: the throughput rate reached once the pipeline is in a steady state of uninterrupted flow.

4. Speedup ratio: the ratio of the time taken by the equivalent sequential (non-pipelined) mode of operation to the time taken by the pipelined mode.

Pipeline indicators:

1. Pipelining does not reduce the latency of a single task, but it improves the throughput of the overall workload.

2. Multiple different tasks operate at the same time, using different resources.

3. The potential speedup equals the number of pipeline stages.

4. The rate of the pipeline is limited by the slowest stage.

5. If the execution times of the stages are unbalanced, the speedup is reduced.

6. The time needed to fill the pipeline at the start and to drain it at the end also reduces the speedup.
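These points can be checked with a small worked example. The following sketch uses illustrative numbers (5 stages, 100 tasks, one stage optionally twice as slow) to compute throughput and speedup:

#include <stdio.h>

int main(void) {
    double t = 1.0;       /* time per balanced stage (arbitrary units) */
    int k = 5;            /* pipeline depth (number of stages)         */
    int n = 100;          /* number of tasks/instructions              */

    /* Balanced pipeline: the first task needs k*t, each further task adds t. */
    double seq  = (double)n * k * t;           /* non-pipelined time          */
    double pipe = (k + n - 1) * t;             /* pipelined time              */
    printf("speedup (balanced)   = %.2f (potential maximum %d)\n", seq / pipe, k);
    printf("max throughput       = %.2f tasks per unit time\n", 1.0 / t);

    /* Unbalanced pipeline: the slowest stage sets the clock.                 */
    double t_slow = 2.0 * t;                          /* one stage twice as slow    */
    double seq_u  = n * ((k - 1) * t + t_slow);       /* sequential time, same work */
    double pipe_u = (k + n - 1) * t_slow;             /* clock set by slowest stage */
    printf("speedup (unbalanced) = %.2f\n", seq_u / pipe_u);
    return 0;
}

With these numbers the balanced speedup is about 4.8 (close to the 5-stage maximum), while making one stage twice as slow drops it to about 2.9, illustrating points 4 and 5.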

The ARM7, common in the low-power embedded field, adopts a three-stage pipeline.

(Figure: the ARM7 three-stage pipeline)

(Figure: pipeline of the MIPS architecture)

(Figure: pipeline of the x86 architecture)

A pipeline is a design method used in a processor to execute instructions. It breaks down the execution process of an instruction into multiple stages and allows multiple instructions to be processed at the same time. This design can improve the performance of the processor, but it can also have some implications.

Here are some of the effects of pipelines on processor performance:

  1. Increased instruction throughput: pipelines allow different stages of multiple instructions to be processed at the same time, increasing instruction throughput. More instructions complete per unit time, speeding up program execution.
  2. Increased latency: although pipelines increase throughput, they also introduce some latency. Since an instruction is broken into multiple stages, each stage takes a certain amount of time, and if one stage is slow, the efficiency of the whole pipeline suffers.
  3. Data hazards and control hazards: hazards may occur during pipelined execution. A data hazard arises when an instruction must wait for the result of a previous instruction, while a control hazard arises when a branch instruction may change the flow of the program. These hazards can cause pipeline stalls or flushes.
  4. Complex design and management: pipeline design and management is relatively complex; stage coordination, error handling, branch prediction, and so on must all be considered. This makes processor design more complex, potentially increasing cost and difficulty.

Hazards that may be encountered during instruction execution, and their solutions:

  1. Structural hazards (结构冒险):
    • More hardware: a structural hazard arises when multiple instructions try to use the same hardware resource at the same time. It can be eased by adding hardware resources, for example more functional or execution units, so that more instructions can be processed simultaneously.
    • Register renaming: renaming lets multiple instructions use the same architectural register at the same time without conflict, which helps avoid delays caused by structural hazards.
    • Compiler scheduling: the compiler can reduce structural hazards by reordering instructions or inserting suitable ones, for example rearranging data accesses and operations to make better use of the hardware resources.
  2. Data hazards (数据冒险):
    • Compiler scheduling: the compiler can reduce data hazards by rescheduling the order in which instructions execute. By taking the data dependences between instructions into account, it can arrange them to reduce read-after-write, write-after-write, and write-after-read hazards (see the example after this list).
    • Out-of-order execution: the processor can execute instructions as their operands become available rather than strictly in program order, which reduces the impact of data hazards and improves efficiency.
    • Register renaming: renaming also helps reduce data hazards, since letting multiple instructions use the same architectural register without conflict removes the false (write-after-write and write-after-read) dependences.
  3. Control hazards (控制冒险):
    • Branch prediction: the processor can predict the execution path of a branch instruction, avoiding stalls caused by branches. Correct predictions reduce pipeline flushes and improve execution efficiency.
    • Speculative execution: some processors continue executing instructions before the branch outcome is known. If the guess is correct, efficiency improves; otherwise the processor must roll back to the correct execution path.

Overall, these techniques are designed to maximize processor performance while overcoming the structural, data, and control hazards that may be encountered during instruction execution.
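As a concrete illustration of the compiler-scheduling idea for data hazards, here are two functionally equivalent C loops. In the second, independent operations are interleaved so that there is always work available that does not depend on the immediately preceding result (whether a real compiler emits this schedule depends on the toolchain and its flags):

#include <stdio.h>
#include <stddef.h>

/* Every addition depends on the one before it, so a pipeline without
 * forwarding (or with a long-latency adder) keeps stalling on the RAW hazard. */
double sum_dependent(const double *a, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* Two independent accumulation chains give the scheduler something to issue
 * while the other chain's result is still in flight; the chains are only
 * combined at the end. */
double sum_scheduled(const double *a, size_t n) {
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += a[i];        /* chain 0 */
        s1 += a[i + 1];    /* chain 1, independent of chain 0 */
    }
    if (i < n)
        s0 += a[i];        /* leftover element when n is odd */
    return s0 + s1;
}

int main(void) {
    double a[] = { 1.0, 2.0, 3.0, 4.0, 5.0 };
    size_t n = sizeof a / sizeof a[0];
    printf("%f %f\n", sum_dependent(a, n), sum_scheduled(a, n));
    return 0;
}

Because floating-point addition is not associative, the two versions can differ in the last bits; this is exactly the kind of reordering a compiler may only perform under relaxed floating-point rules, or that a programmer does by hand.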

What problems arise from too many pipeline stages?

While pipelining can improve processor performance, an excessive number of pipeline stages also introduces problems, including:

1. More clock cycles per instruction: each pipeline stage takes one clock cycle, and every instruction must pass through every stage, so too many stages increases the total number of cycles each instruction spends in the pipeline.
2. Increased latency: a deep pipeline adds delays in passing data and control signals between stages. These delays reduce the pipeline's efficiency, because an instruction waiting at one stage can block the flow of the instructions behind it.
3. Complex design and control: as the number of stages grows, processor design and control become more complex. More hardware resources and more intricate logic are needed to keep the stages working together, which raises design complexity and manufacturing cost.
4. More structural hazards: a structural hazard is a situation where multiple instructions try to use the same hardware resource at the same time. More pipeline stages can lead to more structural hazards, because competition for hardware resources increases.
5. Increased energy consumption: more pipeline stages usually means more power. Every stage consumes power, and there may be more voltage and current fluctuation on each clock edge, raising the processor's overall power consumption.
6. Challenges for branch prediction: as the number of stages increases, handling branch instructions becomes harder. A longer pipeline makes branch mispredictions more expensive, because the branch outcome is resolved at a later stage and more speculative work has to be thrown away.

When designing a processor, the number of pipeline stages must be weighed against performance, so that improving performance does not introduce too many problems and too much complexity. Different kinds of applications and requirements may call for different pipeline depths; the sketch below illustrates how a deeper pipeline raises the cost of each branch misprediction.
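A back-of-the-envelope model of point 6 above (the numbers are hypothetical, not measurements): the deeper the pipeline, the more work is discarded on each branch misprediction, so the effective CPI rises.

#include <stdio.h>

int main(void) {
    double branch_freq  = 0.20;   /* fraction of instructions that are branches */
    double mispred_rate = 0.05;   /* fraction of branches predicted wrongly     */
    int    depths[]     = { 5, 10, 20 };

    for (int i = 0; i < 3; i++) {
        int depth = depths[i];
        /* Assume the branch resolves near the end of the pipeline, so a
         * misprediction wastes roughly (depth - 1) cycles of fetched work.   */
        double penalty = depth - 1;
        double cpi = 1.0 + branch_freq * mispred_rate * penalty;
        printf("depth %2d: effective CPI = %.3f\n", depth, cpi);
    }
    return 0;
}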

Overview of pipeline processing

A CPU can process instructions in several working modes: sequentially, with overlap between adjacent instructions, or pipelined; the pipelined mode works on the same principle as the assembly line described earlier.

The handling of an instruction is divided into fetching the instruction, analyzing (decoding) it, and executing it.


If the execution time of each stage is an equal time t, then processing n instructions sequentially takes a total of 3n×t.

Advantages: simple control.

Disadvantages: slow, and the machine's functional units are poorly utilized.

Overlap: during the interpretation of two adjacent instructions, different stages of their interpretation overlap in time.

This includes one-time overlap, advance-control (lookahead) techniques, and parallelism across multiple functional units.

If each adjacent instruction is advanced by one stage (all three stages overlapped): T = 3×t + (n-1)×t = (n+2)×t

One-time overlap: the instruction fetch is folded into the analysis and execution steps, and at any moment only the "execute" of the previous instruction is allowed to overlap with the "analyze" of the next instruction: T = (n+1)×t
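A quick numeric check of the three timing formulas above, using concrete values (this is just the arithmetic of the formulas, with n and t chosen arbitrarily):

#include <stdio.h>

int main(void) {
    int n = 10;    /* number of instructions          */
    int t = 1;     /* time per stage, arbitrary units */

    int sequential       = 3 * n * t;       /* fetch, analyze, execute strictly in series */
    int full_overlap     = (n + 2) * t;     /* adjacent instructions shifted by one stage */
    int one_time_overlap = (n + 1) * t;     /* fetch folded in; only execute(k) overlaps
                                               analyze(k+1)                               */

    printf("sequential:       %d t\n", sequential);
    printf("full overlap:     %d t\n", full_overlap);
    printf("one-time overlap: %d t\n", one_time_overlap);
    return 0;
}

For n = 10 this gives 30t, 12t, and 11t respectively; note the last two are not directly comparable, since the one-time-overlap formula hides the fetch time inside the other two steps.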

If the stage times are not equal, the actual execution time is:

(Figure: formula for the actual execution time when the stage times are unequal)

Advance control (lookahead): the analysis unit and the execution unit can each work continuously, analyzing and executing instructions respectively. It combines prefetching with buffering: by controlling the instruction stream and the data stream ahead of time, the instruction analyzer and the execution unit are kept working continuously and in parallel as much as possible.

Execution time:

(Figure: execution-time formula under advance control)

Parallel operation of multiple functional units: a processor with multiple functional units spreads the many functions of the ALU across several specialized units that can work in parallel, so the rate at which instructions complete is greatly increased.


Advance control: modern instruction sets are complex, and the time required for "analysis" and for "execution" often differs greatly, which leaves functional units idle; advance-control techniques are therefore needed.

(Figure: analysis and execution of instructions when their times are unequal)

(Figure: the instruction execution process when an advance buffer stack is used)

Advance control:

Generally, advance (lookahead) buffer stacks are used.

There are generally four types of buffer stacks:

Advance instruction buffer stack

When main memory is busy, the instruction analyzer can get the instructions it needs from the advance instruction buffer stack.

Advance operation stack

Used for conditional branches and the like.

Advance read (operand) stack

A buffer between main memory and the arithmetic unit, used to smooth the flow of data between them.

Post write stack

Data that has not yet been completely written to main memory can be staged in the post write stack.

Structure of a processor with advance control:

(Figure: structure of a processor with advance control)

Buffer-depth design under advance control:

An example of a worst-case (boundary) calculation:

Suppose the advance instruction buffer stack starts completely full, with a buffer depth of D1.

At this moment, instructions flow out of the buffer stack at the fastest possible rate, while new instructions flow in at the slowest possible rate.

Assume the analyzer is handling its most favourable instruction sequence, so the average time to analyze one instruction is t1.

The worst case on the input side is that instruction fetch is very slow, with an average time of t2 to fetch one instruction.

Suppose L1 instructions are analyzed during the time it takes the advance buffer stack to go from full to empty.

Then: L1×t1 = (L1 - D1)×t2, which gives the required depth D1 = L1×(t2 - t1)/t2.
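A small numeric check of this relation, with hypothetical values chosen for L1, t1, and t2:

#include <stdio.h>

int main(void) {
    double L1 = 8.0;   /* instructions analyzed while the stack drains (assumed) */
    double t1 = 1.0;   /* fastest average analysis time per instruction          */
    double t2 = 4.0;   /* slowest average fetch time per instruction             */

    /* From L1*t1 = (L1 - D1)*t2, the required buffer depth is: */
    double D1 = L1 * (t2 - t1) / t2;
    printf("required buffer depth D1 = %.1f instructions\n", D1);
    return 0;
}

With these values, D1 = 8 × (4 - 1) / 4 = 6 instructions.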


The i486 processor, introduced in 1989, brought in a five-stage pipeline. At that point there is no longer just one instruction running in the CPU: each stage of the pipeline works on a different instruction at the same time, and this design more than doubled the i486's performance at the same clock frequency. The fetch stage of the five-stage pipeline takes an instruction out of the instruction cache (8 KB on the i486); the second stage (D1, main decode) translates the fetched instruction into a specific operation; the third stage (D2, translate) converts memory addresses and offsets; the fourth stage (EX) actually executes the operation; and the fifth stage (WB) writes the result back to a register or to memory. Because the processor runs multiple instructions at the same time, program performance is greatly improved.

The processor generally consists of the following functional units:

Fetch unit

Decode unit

Execution unit

Load/store unit (a load fetches data from memory, while a store saves data to memory)

Exception/interrupt unit

Power management unit

A pipeline is usually built from the fetch, decode, execute, and load/store units. Each unit repeats its own work cycle after cycle, handling the next instruction passed to it.

Superpipelining

Superpipelining raises the clock frequency by subdividing the pipeline into finer stages, so that the machine completes one or even several operations per cycle; in essence it trades time for space.

"Superpipelined" is relative to a baseline processor: an ordinary CPU pipeline has the basic four stages of instruction prefetch, decode, execute, and write-back. A superpipelined CPU is one whose internal pipeline exceeds the usual 5-6 steps; the Pentium Pro's pipeline, for example, is as long as 14 stages. The more stages the pipeline is divided into, the less work each stage has to do per cycle, so the design can run at a higher clock frequency. An everyday analogy: five people relaying logs corresponds to a five-stage pipeline; superpipelining refines the process so that ten people relay the logs (a ten-stage pipeline), and the whole job clearly finishes sooner. As the saying goes, there is strength (and efficiency) in numbers.
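A rough way to see the frequency gain and its limits (the delays below are made-up numbers): splitting the same total amount of logic into more stages shortens the critical path of each stage, but the fixed pipeline-register overhead per stage prevents the clock frequency from scaling forever.

#include <stdio.h>

int main(void) {
    double total_logic = 10.0;   /* total logic delay per instruction, ns (assumed) */
    double latch       = 0.2;    /* pipeline-register overhead per stage, ns        */
    int    stages[]    = { 4, 8, 14, 28 };

    for (int i = 0; i < 4; i++) {
        int k = stages[i];
        double period = total_logic / k + latch;   /* clock period = slowest stage */
        printf("%2d stages: clock period %.3f ns, frequency %.2f GHz\n",
               k, period, 1.0 / period);
    }
    return 0;
}

The frequency keeps rising with more stages, but by less and less each time, which is one reason deeper is not always better.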


Superscalar

Superscalar means that there is more than one pipeline inside the CPU, so that more than one instruction can be completed in each clock cycle; this is called superscalar technology. Its essence is to trade space for time.

A superscalar architecture is a form of parallel operation that implements instruction-level parallelism within a single processor core. It achieves a higher CPU throughput at the same clock frequency.
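An idealized estimate (a simplifying model, not a real microarchitecture) of how issue width changes the cycle count for n fully independent instructions on a k-stage pipeline:

#include <stdio.h>

int main(void) {
    int n = 100;      /* independent instructions (best case for superscalar) */
    int k = 5;        /* pipeline depth                                       */

    for (int width = 1; width <= 4; width *= 2) {
        /* Ideal machine: 'width' instructions enter the pipeline per cycle. */
        int groups = (n + width - 1) / width;        /* cycles spent issuing */
        int cycles = groups + k - 1;                 /* plus pipeline drain  */
        printf("issue width %d: about %d cycles, IPC ~= %.2f\n",
               width, cycles, (double)n / cycles);
    }
    return 0;
}

Real programs rarely reach these numbers, because data dependences and branches limit how many instructions can actually be issued together.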

