Abstract: On April 7, MLPerf, a globally recognized AI benchmark, released its latest results. In the Tiny v0.7 list, which targets low-power, energy-efficient IoT devices, a joint software and hardware optimization scheme based on the Xuantie RISC-V C906 processor from T-Head (Pingtou Ge) took first place in all four benchmarks, with results more than 10 times the best scores of the other submitters. This also means that the Xuantie RISC-V C906 has become the AIoT compute core with the highest energy efficiency available today.

The day after the results were announced, Xin Zhixun interviewed Meng Jianyi, vice president of Alibaba's T-Head (Pingtou Ge), and Yang Jing, head of ecosystem at T-Head, to find out how the Xuantie RISC-V C906 processor took four global first places in the AI benchmark by such a wide margin. Meng Jianyi and Yang Jing also shared T-Head's latest technical and ecosystem progress in RISC-V, along with their views on the future of the RISC-V industry.
A new opportunity for the RISC-V architecture: energy-efficient AI computing on CPUs
As is well known, Intel's x86 architecture and the Arm architecture remain the mainstream instruction set architectures in today's CPU market: x86 dominates PCs and servers, while Arm dominates mobile devices. The IoT (Internet of Things) market is different. Demand there is highly fragmented and very sensitive to power consumption and cost, so processors based on x86 or Arm struggle to meet its many customized requirements.
By contrast, the RISC-V architecture that has emerged in recent years offers a very simple instruction set, modularity, extensibility, and an open, royalty-free specification. These traits give it a natural advantage in IoT, where they make it easier to build low-power, energy-efficient, low-cost processors tailored to specific needs. More importantly, neither x86 nor Arm holds a decisive ecosystem advantage in IoT, so RISC-V's growth there is not held back. And because RISC-V is open, companies can take part in global collaborative innovation on the technology while also meeting their need for relatively independent and controllable development, which is why many Chinese manufacturers have embraced it.
In recent years, with the rise of edge computing and artificial intelligence (AI), more of the AI computing once done in the cloud has been pushed out to the edge. This reduces network bandwidth consumption, cuts data processing latency, helps keep user data secure, and improves the overall AI experience. Against this backdrop, the AIoT (Artificial Intelligence of Things) market is placing higher demands on edge AI capability.
AIoT chips are more sensitive to cost and power consumption, and each market segment has its own requirements. As a result, unlike cloud or mobile chips, most AIoT chips cannot simply attach or integrate a custom AI accelerator to handle AI computing, and they rely more heavily on the CPU instead.
Meng Jianyi likewise said that many IoT AI scenarios do not actually need an AI accelerator. Applications that require less than 1 TOPS of compute in particular can already be handled by optimizing and improving the CPU's AI capability, which benefits the chip's cost, power consumption, debuggability, and ease of development.
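To make the 1 TOPS threshold concrete, the following minimal Python sketch estimates how many operations per second a model needs and compares that with 1 TOPS. The workload figures (`macs_per_inference`, `inferences_per_second`) are hypothetical values chosen for illustration and do not come from the article or from MLPerf; counting one multiply-accumulate as two operations is a common convention, assumed here.

```python
ONE_TOPS = 1e12  # 1 tera-operations per second


def required_ops_per_second(macs_per_inference: int, inferences_per_second: float) -> float:
    """Estimate sustained compute demand.

    Assumption: one multiply-accumulate (MAC) counts as two operations
    (one multiply plus one add).
    """
    return 2 * macs_per_inference * inferences_per_second


# Hypothetical workload, NOT taken from the article:
# a small model with 5 million MACs per inference, run 10 times per second.
macs_per_inference = 5_000_000
inferences_per_second = 10

demand = required_ops_per_second(macs_per_inference, inferences_per_second)
fits_under_one_tops = demand < ONE_TOPS

print(f"demand: {demand / ONE_TOPS:.6f} TOPS, under 1 TOPS: {fits_under_one_tops}")
```

A workload of this size sits several orders of magnitude below 1 TOPS, which is the kind of case Meng Jianyi describes as a candidate for CPU-only inference. Whether a given CPU can actually sustain the demand depends on its real throughput, which this sketch does not model.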
Because RISC-V CPUs have lower power consumption and cost than CPUs based on architectures such as x86 and Arm, tapping further into their AI capability has become a new focus for many AIoT chip makers. With power consumption held in check, the AI energy efficiency of a RISC-V CPU matters all the more.
What does the C906's first place in four AI tests signify?
The MLPerf benchmark in which T-Head Semiconductor's Xuantie RISC-V C906 took part is currently one of the most authoritative AI benchmarks in the world. Tiny is a performance test category that MLPerf added in recent years for low-power, cost-effective IoT scenarios. It is mainly used to demonstrate chip makers' software and hardware performance and optimization capabilities in the increasingly widespread intelligent IoT market.
It is understood that the Tiny v0.7 submissions covered a variety of CPU architectures, including Arm, RISC-V, and in-house designs. Without using an accelerator, Alibaba submitted software and hardware optimization results on the Allwinner D1, which is built on the T-Head Xuantie RISC-V C906 processor core. While meeting the accuracy requirements, the submission set new records on all four benchmarks (visual wake words, image classification, keyword spotting, and anomaly detection), the best performance by a RISC-V design in the history of the MLPerf Tiny benchmark.
△ The four Xuantie C906 results shown on the MLPerf website
According to the Tiny v0.7 inference performance data, the Xuantie C906's four results were 12.6, 20.8, 16.2, and 10.9 times the best results of the other submitters, respectively, which shows its performance advantage in AIoT.
△ Comparison of performance data from the MLPerf Tiny v0.7 list
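As a quick check on the "more than 10 times" claim, the short Python sketch below summarizes the four speedup factors quoted above. The factors themselves are from the article. Using the geometric mean as a single summary figure is a choice made here for illustration, since it is a common way to average ratios; it is not a figure reported by MLPerf or by T-Head.

```python
from statistics import geometric_mean

# Speedup factors quoted in the article: the C906 result relative to the
# best result from any other submitter, one factor per benchmark.
speedups = [12.6, 20.8, 16.2, 10.9]

smallest = min(speedups)
largest = max(speedups)
summary = geometric_mean(speedups)  # summary statistic chosen for illustration

print(f"smallest speedup: {smallest}x")
print(f"largest speedup:  {largest}x")
print(f"geometric mean:   {summary:.2f}x")
```

The smallest factor is 10.9, so every one of the four results is indeed more than 10 times the best competing score.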
It is understood that AI benchmarks used to focus mainly on a chip's AI performance, an area in which MLPerf has earned internationally recognized authority. After several years of development, and especially in IoT, the AI energy efficiency of chips has drawn growing attention, and MLPerf has accordingly launched a benchmark for AI energy efficiency in the IoT field.
Meng Jianyi told Xin Zhixun: "Alibaba has long invested in and built up expertise in AI energy efficiency, so when Tiny was launched we naturally wanted to show what we could do and took part in the test. Coming first on all four benchmarks proves that our technical route is correct."
It should be noted that the results were obtained without any accelerator, so they fully reflect the AI processing capability of the T-Head Xuantie C906 itself.
"From T-Head's perspective, our positioning is to provide native AI support on RISC-V rather than to build AI accelerators, so everything runs on the CPU, and we prove our overall capability through collaborative software and hardware innovation. Customers can develop more customized AI accelerators on this basis," Meng Jianyi emphasized. "We feel the real value here lies in advancing the RISC-V industry. It proves not only that RISC-V is viable for energy-efficient AI processing but also that it has an edge over other architectures, and everyone can keep doing better in this direction."
Calista Redmond, CEO of RISC-V International, also said: "Competition in AI for the Internet of Things is fierce, and targeted optimization at every level is essential to achieving breakthroughs at very low power. Alibaba's work is a testament to its leadership in the RISC-V industry and gives confidence to the global RISC-V community and ecosystem."
Software-hardware co-optimization is the key
So what is the secret behind the T-Head Xuantie C906's first place in all four AI tests, by a wide margin over the best scores of its peers?
According to reports, the Xuantie C906's result is mainly due to Alibaba's software and hardware co-innovation and optimization across the whole stack, from hardware to compilation to algorithms to applications.
First, at the hardware level, the Xuantie C906 is the industry's earliest mass-produced RISC-V processor with the vector extension. It uses a 5- to 8-stage pipeline and is equipped with high-performance single- and double-precision floating-point units and a 128-bit vector unit that supports vector operations in INT8/INT16/INT32/INT64 and BF16/FP16/FP32/FP64 formats. The C906 also optimizes data prefetching with multi-channel, multi-mode prefetch technology, which greatly increases data access bandwidth.
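To illustrate why narrow data types matter on a 128-bit vector unit, here is a minimal Python sketch that works out how many elements of each listed format fit in one 128-bit vector, then walks a dot product in chunks of that size. It is an idealized view: the names `lanes_per_vector` and `chunked_dot` are hypothetical helpers written for this example, and the sketch ignores register grouping, the actual datapath width, and instruction throughput, none of which the article specifies.

```python
VLEN_BITS = 128  # vector width stated in the article

# Element widths in bits for the formats the article lists.
ELEMENT_BITS = {
    "INT8": 8, "INT16": 16, "INT32": 32, "INT64": 64,
    "BF16": 16, "FP16": 16, "FP32": 32, "FP64": 64,
}


def lanes_per_vector(element_bits: int, vlen_bits: int = VLEN_BITS) -> int:
    """Number of elements of the given width that fit in one vector."""
    return vlen_bits // element_bits


def chunked_dot(a, b, lanes):
    """Dot product computed `lanes` elements at a time.

    This mimics how a vector unit steps through data; it is plain Python
    and says nothing about real hardware speed.
    """
    total = 0
    for start in range(0, len(a), lanes):
        chunk_a = a[start:start + lanes]
        chunk_b = b[start:start + lanes]
        total += sum(x * y for x, y in zip(chunk_a, chunk_b))
    return total


lane_table = {name: lanes_per_vector(bits) for name, bits in ELEMENT_BITS.items()}
for name, lanes in lane_table.items():
    print(f"{name}: {lanes} elements per {VLEN_BITS}-bit vector")

# 40 elements processed 16 at a time takes three steps (16 + 16 + 8).
result = chunked_dot(list(range(40)), [1] * 40, lane_table["INT8"])
```

Under these assumptions an INT8 vector holds four times as many elements as an FP32 vector, which is one reason low-precision formats are attractive for CPU-only inference.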
Second, at the compilation level, T-Head further optimized its neural network model deployment toolset HHB (Heterogeneous Honey Badger) and its open source neural network acceleration library CSI-NN2 for the Xuantie CPU platform. This maps AI operators onto the hardware more effectively and raises the AI performance of Xuantie CPUs. Both HHB and CSI-NN2 are now open source.
Third, at the algorithm level, Alibaba Cloud's Aurora heterogeneous computing acceleration platform SinianML was used to apply compression, distillation, scaling, and network architecture search to the neural network of each benchmark, making the standard models more computationally efficient while still reaching the required accuracy targets. The optimization experience of Alibaba IoT, Ant IoT, and the DAMO Academy Speech Lab in their respective fields was also brought in to extend the gains in each segment.
Fourth, at the application level, after several years of development the Xuantie RISC-V processors cover low-power, energy-efficient, and high-performance scenarios, and OpenXuantie supports multiple operating systems (AliOS, FreeRTOS, RT-Thread, Linux, Android, and others). They are widely used in smart home appliances, vehicles, industrial control, edge computing, and other fields, which allows the Xuantie RISC-V processors to be continuously optimized for the needs of many different applications.
In short, the hardware side of the Xuantie RISC-V effort is mainly the processor, while the software side is mainly the AI compilation framework and the tools that optimize network structure at the upper layer. The co-optimization benefits from collaboration between the Alibaba Cloud AI team and the T-Head team.
Earlier, at the Apsara (Yunqi) Conference in September 2019, Alibaba DAMO Academy released its first self-developed AI chip for the cloud, the Hanguang 800, billed at the time as the world's most powerful AI inference chip. In the first round of AI inference benchmark results announced by MLPerf at that time, the Hanguang 800 achieved the best single-chip results in four scenarios of the ResNet-50 v1.5 image classification benchmark.
Meng Jianyi told Xin Zhixun: "Alibaba Cloud has rich experience in optimizing AI compilation and AI frameworks, and its ability to optimize AI models is very strong. At the RISC-V processor level, T-Head optimizes the libraries to make the most of our hardware structure, which ultimately yields better AI capability."
So can other RISC-V chip makers catch up with the T-Head Xuantie RISC-V C906 in AI energy efficiency through similar software and hardware co-optimization?
Meng Jianyi believes the key is whether a vendor can break down its existing separation between software and hardware, optimize at the system level, and achieve real software-hardware collaboration. As an open architecture, RISC-V has a natural advantage in co-optimization, and other manufacturers can optimize for their own application scenarios. The bar for doing so, of course, is high.
"You need a deep understanding of AI frameworks, models, and so on to optimize the upper layers well, and upper-layer optimization also needs the cooperation of the underlying hardware. This is a system-level capability: you have to look not only at the hardware and software but also at the application. Alibaba's advantage lies in Alibaba Cloud's deep accumulation in this field over the years," Meng Jianyi said.
According to reports, every current processor based on the Xuantie 9 series can be upgraded with the software and hardware tools T-Head provides, and can use this co-optimization capability to greatly improve overall AI energy efficiency.
It should be pointed out that four of T-Head's mass-produced Xuantie RISC-V processor IP cores (E902, E906, C906, and C910), along with the HHB neural network model deployment toolset and the CSI-NN2 neural network acceleration library for AI on Xuantie RISC-V processors, are now fully open source. Customers can therefore build on them and carry out deeper optimization of their own for specific fields.
It is understood that, as products iterate with customers, the instruction set architecture and hardware architecture of the Xuantie 9 series are basically stable. Updates will continue in the underlying libraries and in upper-layer application support, resource usage, and the matching of algorithms to the hardware architecture, to help customers adapt.
Meng Jianyi said: "Xuantie RISC-V provides a basic software and hardware capability. As partners deepen their understanding of their scenarios, they can do better on top of what we provide. I think RISC-V chips should be able to show more advantages in low power consumption, low cost, high energy efficiency, and AI in the future."
Yang Jing also stressed: "Xuantie RISC-V's software-hardware co-optimization capability can be replicated. We hope to help customers start from their industry applications and keep optimizing through collaborative software and hardware innovation to further improve energy efficiency."
The future of RISC-V in the mobile and server markets
As noted earlier, RISC-V's low power consumption and low cost make it ideal for the IoT market, and the RISC-V ecosystem is currently developing mainly around IoT applications. That does not mean, however, that RISC-V has no chance of entering the Arm-dominated mobile market or the x86-dominated server market.
On October 13, 2021, T-Head announced that its RISC-V-based Xuantie C910 was successfully running Android, including applications such as the Chrome browser. This was the first time the industry had brought Android support to the RISC-V architecture, and it means RISC-V may break out of its existing application scenarios and become a new option for mobile chip design.
According to Xin Zhixun, T-Head is continuing to advance the Android ecosystem on the RISC-V CPU architecture, and more progress and releases can be expected.
RISC-V's energy efficiency and low cost may well bring a better experience and lower costs to mobile devices. Its disadvantage is equally obvious, however: as a new architecture, it lacks ecosystem support on mobile devices as well as on PCs and servers.
In addition, the completeness and processing capability of the RISC-V architecture still need work before it can meet the requirements of the mobile or PC/server markets. Yet as RISC-V CPU cores move toward high performance and take on more complex functions, they may become increasingly bloated, power consumption may rise significantly, and the barrier to development may rise sharply. At present, high-performance RISC-V CPU IP is mainly supplied by a few RISC-V developers (such as SiFive and T-Head) through IP licensing. In that situation, RISC-V's original advantages of a reduced instruction set, low power consumption, and low cost become less apparent, making it hard to compete with Arm, which holds a decisive ecosystem advantage in mobile, or with x86 in servers.
Meng Jianyi acknowledged that as the RISC-V architecture moves into high-performance territory and grows larger, it may indeed lose some of its low-power and low-cost advantages. Even so, he said, RISC-V still has great potential to compete with Arm and x86.
"RISC-V is a new, future-oriented architecture with a modular design. It is already fairly complete for IoT, but expanding into the mobile and data center markets requires adding many corresponding modules, so it has to go through a process of growing from small to large. As we develop and complete the RISC-V architecture today, we examine the problems of the Arm and x86 architectures, critically inherit some of their experience, and improve on it in the new RISC-V architecture. Although the RISC-V architecture as a whole will become larger, it is a new and more complete architecture, or at the very least a new and highly promising one that the industry can redefine today," Meng Jianyi said.
It is understood that the RISC-V Foundation currently has more than 2,000 member companies and about 60 internal technical committees, which are actively improving RISC-V technology on every front. The members include not only hardware and software developers but also chip makers from various industries that have long used the Arm architecture. Even Intel, the leading x86 processor maker, has joined the RISC-V Foundation. Intel and AMD have also invested in SiFive, a well-known RISC-V IP vendor.
Meng Jianyi said: "RISC-V's technical development draws on global collaboration. Because the architecture is open, everyone can take part, keep improving it, and avoid the problems that the Arm and x86 architectures ran into. So as the RISC-V architecture becomes more fully featured, it may grow very large, but it will not be bloated, and we will keep it lean. I think that is a shared goal for all of us working on the RISC-V architecture."
"The server market is still dominated by x86, but it is changing very quickly, and with Arm joining in, vendors keep raising new requirements for infrastructure. As a new architecture, RISC-V can respond quickly to data center requirements, such as new data volumes and changing model requirements," Yang Jing further explained. "Compared with x86 and Arm, RISC-V is more flexible and open, and it can keep adapting as the data center sees new breakthroughs and growth."
Where does RISC-V IP go from here?
In the past two years, as the RISC-V architecture has gained popularity and attracted capital, many RISC-V IP suppliers have emerged, such as T-Head, SiFive, Nuclei (Xinlai), Andes (Jingxin), and StarFive (Saifang).
From a market standpoint, however, the semiconductor IP market is not large, and the CPU IP segment is smaller still. Even Arm, whose CPU IP dominates the entire mobile market, had revenue of only about $1.98 billion in 2020. On top of that, there is plenty of open source RISC-V IP on the market, and some device makers choose to develop their own RISC-V IP. All of this means the RISC-V IP market is likely to face extremely fierce competition.
In this regard, Meng Jianyi said that because RISC-V is an open architecture, its business model is bound to differ from those of x86 and Arm. Many mainstream RISC-V IP vendors, T-Head included, are currently exploring new business models.
"T-Head has in fact open-sourced several of its own IP cores and is gradually moving down a more open path. We are also exploring how to benefit customers by giving them, on RISC-V, capabilities that their original technology did not offer. That is what we have been trying to do in both technical and business innovation," Meng Jianyi said.
In Xin Zhixun's view, the path T-Head has taken is not that of a pure RISC-V IP company. IP development requires heavy investment, and open-sourcing the resulting IP is clearly a money-losing proposition on its own. So why would T-Head do it? Evidently, Alibaba places more weight on growing the RISC-V ecosystem.
"For every $1 invested in core IP today, the entire ecosystem can increase its output by $20. So we should not limit ourselves to asking how much money our IP can make. We have to take part in the larger ecosystem and help partners explore more commercial and ecosystem possibilities. Only when everyone can grow within this ecosystem and get what they need can the overall RISC-V ecosystem grow better and the Xuantie processors mature faster," Yang Jing explained.
Author: Xin Zhixun - Langke Sword