5 Code Detection Tools Commonly Used at Alibaba | Alibaba DevOps Practice Guide

Summary: As a business evolves and its teams grow, software scale and call chains become more complex. Without a sound code detection mechanism, a team that relies on functional verification alone accumulates ever more expensive technical debt. Developers end up spending large amounts of time and energy finding and fixing code defects, which slows iteration, hurts collaboration, and can even lead to serious security incidents.

Author: Yang Yu, Algorithm Expert, Alibaba Cloud Cloud Effect

Article source: Alibaba DevOps Practice Guide

In day-to-day research and development, the problems we face with code assets fall into two main categories: code quality problems and code security vulnerabilities.

Code quality issues

Code quality is a well-worn topic. Everyone knows it matters, yet few know how to promote and maintain it as the team's shared asset. On the one hand, developers may neglect quality in order to ship features on time; on the other hand, developers differ in their coding habits and in how they understand a program. Over time, declining code quality tends to feed on itself: quality slips under heavy business pressure, development efficiency drops as a result, and business pressure increases further, creating a vicious circle.

Code security issues

Security problems are often hidden in coding logic written without security awareness and in open source dependencies that are never scanned or maintained. They are hard to catch in time during daily development and code review.

Code security issues can be analyzed from two angles:

  1. Coding security issues, that is, violations of security specifications. Keeping non-compliant code out of the enterprise code base reduces privacy data leaks, injection risks, and security policy vulnerabilities.
  2. Dependency security issues, that is, security vulnerabilities introduced by the open source third-party components a project depends on. According to the Synopsys 2020 open source security report, more than 99% of organizations use open source technologies. The benefits of open source components are well known: they enable technical exchange, let teams stand on the shoulders of giants, reduce development costs, shorten iteration cycles, and improve software quality. But the convenience of open source software also hides many security risks. According to the audits, 75% of code bases contained security vulnerabilities and 49% contained high-risk issues, while 82% of code bases were still using components that were more than 4 years out of date.

Code security therefore needs two things. First, admission checks: configure secure coding checks and quality gates according to business scenarios and specifications. Second, regular maintenance: detect and fix newly disclosed security vulnerabilities promptly.

Solutions

Code quality inspection

Tool 1: Java code specification detection

In Alibaba's own practice, historical barriers and differences in business style left the organizations with widely varying project structures, code styles, and conventions, which led to high communication costs, inefficient cooperation, and high maintenance costs. At its current scale, the group needs professional engineering teams that iterate and build intensively rather than reinventing the wheel. A truly professional team always has a unified development specification, which stands for efficiency, shared understanding, pride in the craft, and sustainability.

Against this background, Alibaba drew up the Alibaba Java Development Manual as the development specification for its internal Java engineers. It covers coding conventions, unit testing conventions, exception and logging conventions, MySQL conventions, project conventions, and security conventions. It distills the experience of nearly 10,000 Alibaba Java engineers and has been tested and refined repeatedly in large-scale production use.

Traffic rules appear to restrict the right to drive, but they actually protect everyone's safety. Who would dare go on the road if there were no speed limits, no traffic lights, and no rule to keep to the right? Likewise, a development specification for software is not meant to stifle the creativity and elegance of code. It limits excessive individualism, promotes a reasonable degree of standardization, and lets everyone work together in a commonly accepted way.

The goals of the code specification are therefore:

  1. Code for efficiency: unify standards to improve communication efficiency and R&D effectiveness.
  2. Code for quality: prevent problems before they occur, raise quality awareness and system maintainability, and reduce the failure rate.
  3. Code with passion: work in the spirit of craftsmanship, pursue excellence, and polish code until it is first-rate.

The specification is deeply integrated into Alibaba's development activities through tools such as IDE plug-ins, pipeline-integrated checks, and code review integration. The cloud-based code hosting platform Codeup also integrates Java code specification detection, giving developers quick and convenient checks at the code submission and code review stages.

Tool 2: Intelligent code patch recommendation

Defect detection and patch recommendation have been hard problems in software engineering for decades, and they are among the issues that researchers and front-line developers care about most. The defects meant here are not network vulnerabilities or system defects but defects hidden in the code itself. Helping developers identify and fix them can greatly improve software quality.

Building on the defect detection methods popular in industry and academia, and after analyzing and working around their limitations, Alibaba's Codeup algorithm engineers proposed a new algorithm that analyzes code defects more accurately and efficiently and recommends fixes. The work was accepted at the International Conference on Software Engineering (ICSE). The algorithm works in four steps:

  1. Find repair commits by matching keywords in the commit message, and keep only commits that touch fewer than 5 files (a commit touching too many files may dilute the repair behavior). This step depends heavily on good commit habits, so we hope developers commit carefully and write meaningful messages.
  2. Extract the file-level deletions and additions from these repair commits as Defect and Patch pairs (DP Pairs). This step is noisy.
  3. Cluster similar defect and patch code together with an improved DBSCAN method that clusters the defect and patch fragments at the same time. Clustering similar defects and fixes removes much of the noise left by the previous step, and the mistakes that many people made in historical commits are a strong reference.
  4. Summarize the defect code and patch code with a self-developed template extraction method, and adapt the templates to the context according to the variables involved.
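
Steps 1 and 2 above can be sketched roughly as follows. This is a minimal illustration, not the published algorithm: the keyword list, the Commit structure, and the use of Python's difflib to pair deleted and added lines are all assumptions made for the example, and the clustering and template extraction steps are not covered.

```python
import difflib
from dataclasses import dataclass

# Hypothetical keyword list; the article does not say which keywords are used.
FIX_KEYWORDS = ("fix", "bug", "repair", "defect")
MAX_FILES = 5  # the article keeps commits that touch fewer than 5 files


@dataclass
class Commit:
    """Hypothetical commit record: message plus old/new source per file."""
    message: str
    files: dict  # path -> (old_source, new_source)


def is_repair_commit(commit: Commit) -> bool:
    """Step 1: keyword match on the message, and fewer than MAX_FILES files."""
    message = commit.message.lower()
    has_keyword = any(keyword in message for keyword in FIX_KEYWORDS)
    return has_keyword and len(commit.files) < MAX_FILES


def extract_dp_pairs(commit: Commit) -> list:
    """Step 2: per file, pair each replaced block (defect) with its replacement (patch)."""
    pairs = []
    for path, (old, new) in commit.files.items():
        old_lines, new_lines = old.splitlines(), new.splitlines()
        matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "replace":  # lines deleted and re-added in the same place
                pairs.append((path, old_lines[i1:i2], new_lines[j1:j2]))
    return pairs


commits = [
    Commit("Fix NPE when name is null",
           {"A.java": ("if (name.equals(x)) {\n  run();\n}",
                       "if (x.equals(name)) {\n  run();\n}")}),
    Commit("Add new report page", {"B.java": ("", "class B {}")}),
]
dp_pairs = [pair
            for commit in commits if is_repair_commit(commit)
            for pair in extract_dp_pairs(commit)]
print(dp_pairs)
```

Only the first commit passes the filter, so the feature commit contributes no pairs. Plain keyword matching is crude (a word such as "prefix" also contains "fix"), which is one source of the noise that the clustering step is meant to reduce.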

The code patch recommendation service is currently applied to code scanning of merge requests. During code review it detects code fragments that can be improved and suggests optimizations, and it steadily raises the quality of enterprise code by capturing the manual experience accumulated in past reviews.

Code security detection

Tool 3: Sensitive information detection

In recent years the industry has seen a number of incidents in which sensitive information (API keys, database credentials, OAuth tokens, and so on) was unintentionally leaked through certain sites, exposing enterprises to security risks and even direct financial losses.

We face similar problems in our own practice: hard-coded secrets appear very frequently, and there is no effective mechanism for identifying them. Developers and business managers therefore urgently need a stable and sound method and system for detecting sensitive information. Our survey showed that most existing tools simply use rule matching or information entropy alone, so their recall or accuracy falls short of expectations. We therefore built SecretRadar, a sensitive information detection tool that uses a multi-layer detection model combining rule matching and information entropy with contextual semantics.

SecretRadar's implementation has three layers. The first layer uses rule matching, the traditional technique for recognizing sensitive information. Rule matching is accurate and extensible, but it depends heavily on fixed lengths, prefixes, and variable names, so it copes poorly with the different coding styles of different developers and is prone to false negatives. For cases that fixed rules struggle to capture, the second layer uses an information entropy algorithm. Information entropy measures how random a line of code is, and it works well for recognizing randomly generated keys and random identity information. It has limitations too: higher recall comes with more false positives. The third layer therefore filters and refines the results with methods such as template clustering and contextual semantic analysis. It extracts common keywords by aggregating the entropy results and combines contextual semantics with the current syntax structure to improve the model's accuracy.
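
To make the first two layers concrete, here is a minimal sketch of rule matching followed by a Shannon entropy check. It is not SecretRadar's implementation: the rule pattern, the token pattern, and the 4.0 bits-per-character threshold are assumptions chosen for illustration, and the third layer (template clustering and contextual analysis) is left out.

```python
import math
import re
from collections import Counter

# Layer 1: illustrative rule only (hypothetical pattern, not SecretRadar's rule set).
RULES = [
    re.compile(r"(?i)\b(password|passwd|secret|api_?key|token)\b\s*[:=]\s*['\"]([^'\"]{6,})['\"]"),
]
# Layer 2 candidates: quoted strings of 20+ key-like characters (assumed shape).
TOKEN = re.compile(r"['\"]([A-Za-z0-9+/=_\-]{20,})['\"]")
ENTROPY_THRESHOLD = 4.0  # assumed cut-off, in bits per character


def shannon_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def scan_line(line: str):
    """Return 'rule', 'entropy', or None for a single line of code."""
    for rule in RULES:
        if rule.search(line):
            return "rule"
    for match in TOKEN.finditer(line):
        if shannon_entropy(match.group(1)) >= ENTROPY_THRESHOLD:
            return "entropy"
    return None


for line in ('password = "hunter2hunter2"',
             'String k = "q8Zr1LxT0vKd93NfYb7PwGh2";',
             'String s = "aaaaaaaaaaaaaaaaaaaaaaaa";'):
    print(scan_line(line), "|", line)
```

The second line has no telltale variable name, so the rule misses it and only the entropy check catches it. The third line is long but not random, so neither layer fires. Lowering the threshold would raise recall at the price of more false positives, which is the trade-off the third layer is meant to address.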

The sensitive information detection tool serves not only our internal developers but also more than 20,000 code repositories and 3,000 enterprises on the Cloud Effect platform, and it has helped developers resolve more than 90,000 hard-coded secrets.

Tool 4: Source code vulnerability detection

Alibaba uses the Sourcebrella Pinpoint detection engine to detect source code vulnerabilities, chiefly injection risks and security policy risks.

The Sourcebrella detection engine is the product of a decade of research by the Prism Research Group at the Hong Kong University of Science and Technology. It builds on nearly ten years of international research in software verification, with improvements and innovations of its own, to form an independently designed and implemented, technically advanced software verification system. Its main verification method is to translate the program into mathematical expressions such as first-order logic and linear algebra, and to reason about the causes of defects with formal verification techniques. So far four papers on its core technology have been published, one at PLDI and three at ICSE. Interested readers can follow the links below:

  • Pinpoint: Fast and Precise Sparse Value Flow Analysis for Million Lines of Code [1] http://t.tb.cn/0qxIpFV5sRD5uxOcgED7o
  • SMOKE: Scalable Path-Sensitive Memory Leak Detection for Millions of Lines of Code [2] http://t.tb.cn/2l96Jh2yqOGowsfs4oVk2m
  • Pipelining Bottom-up Data Flow Analysis [3] https://qingkaishi.github.io/public_pdfs/ICSE2020a.pdf
  • Conquering the Extensional Scalability Problem for Value-Flow Analysis Frameworks [4] https://qingkaishi.github.io/public_pdfs/ICSE2020b.pdf

In large, highly active open source projects, the Sourcebrella detection engine can find defects that have stayed hidden for more than 10 years and that other detection tools on the market fail to report; its MySQL findings [5] are one example. It can scan a large open source project of 2 million lines in 1.5 hours. While keeping scans this efficient, it holds the false positive rate at around 15%. For complex, bulky projects, both its scanning efficiency and its false positive rate are among the best in the industry.

"Source code vulnerability detection" integrates the security analysis capabilities of the Sourcebrella detection engine, which delivers better results in analysis accuracy, speed, and depth. Its core advantages are:

  1. Supports bytecode analysis, so code logic in second- and third-party packages is not missed.
  2. Good at analyzing logic along long call chains that cross functions.
  3. Handles indirect data modification caused by references, pointers, and the like.
  4. High accuracy: better than similar tools such as Clang and Infer in precision and in identifying real problems.
  5. Good performance: analysis of a single application currently completes in about 5 minutes on average.

The Sourcebrella detection engine precisely tracks how data flows through the code. Its deep, high-precision analysis of function call chains finds deep-seated problems that span several layers of functions. When it reports a defect, it also shows how the problem is triggered, with the complete control flow and data flow involved. This helps developers understand and fix the problem quickly, improve software quality cheaply at an early stage of development, greatly reduce production costs, and improve R&D efficiency.
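
As a toy illustration of reporting a defect together with the data flow that triggers it, the sketch below searches a hand-written value-flow graph for paths from an untrusted source to a sensitive sink. It is not the engine's algorithm: the graph, the node names in "function:variable" form, and the notion of a sanitizer node are all hypothetical, and a real engine would build the graph from the program and check path feasibility.

```python
def find_flows(edges, sources, sinks, sanitizers=frozenset()):
    """Depth-first search for source-to-sink paths that do not pass a sanitizer."""
    flows = []

    def visit(node, path):
        if node in sanitizers:
            return  # data is considered safe beyond this point
        if node in sinks:
            flows.append(path)
            return
        for nxt in edges.get(node, []):
            if nxt not in path:  # do not loop around cycles
                visit(nxt, path + [nxt])

    for src in sorted(sources):
        visit(src, [src])
    return flows


# Hypothetical value-flow edges across three layers of functions.
edges = {
    "controller.read_param:id": ["service.load:user_id"],
    "service.load:user_id": ["dao.query:sql", "service.audit:escaped_id"],
    "service.audit:escaped_id": ["dao.log:sql"],
}
flows = find_flows(
    edges,
    sources={"controller.read_param:id"},
    sinks={"dao.query:sql", "dao.log:sql"},
    sanitizers={"service.audit:escaped_id"},
)
for flow in flows:
    print(" -> ".join(flow))
```

Each reported flow is the full chain of hops from the request parameter to the query, which is the kind of trace that lets a developer see where a cross-function problem starts and where it lands. The branch that passes through the sanitizer is not reported.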

Tool 5: Dependency package vulnerability detection

We wanted to give developers an effective mechanism for detecting and managing the security and trustworthiness of open source components, so we built the dependency package vulnerability detection service and the dependency package security report. In practice, developers generally tell us that fixing a dependency vulnerability usually costs more than fixing a vulnerability in their own code, so they are unwilling or unable to deal with such problems. There are two reasons. First, most vulnerabilities are not introduced directly: a third-party component that the project depends on in turn depends on other components. Second, developers cannot be sure which version is both clean and compatible.

To make repairs easier, we identify and analyze the reference relationships among dependencies, clearly mark each dependency as direct or indirect, and locate the file that introduces each package, so that developers can quickly find the key problem locations. Because one dependency may be affected by several vulnerabilities, we also aggregate the vulnerability data and recommend a version upgrade that fixes them, which developers can evaluate and choose whether to adopt. By analyzing API changes and code call chains between versions, we estimate the cost of an upgrade and automatically create remediation reviews for developers, helping them maintain code security more efficiently.
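
The idea of marking direct and indirect dependencies and aggregating fixes can be sketched as follows. This is an illustration under stated assumptions, not the service's implementation: the dependency graph, the package names, and the rule that a vulnerability stays fixed in every version from its fix version onward are all hypothetical.

```python
from collections import deque


def dependency_paths(graph, root):
    """Breadth-first search from the project; shortest introduction path per package."""
    paths = {root: [root]}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, []):
            if dep not in paths:
                paths[dep] = paths[node] + [dep]
                queue.append(dep)
    return paths


def classify(graph, root, vulnerable):
    """Mark each vulnerable package as direct or indirect and show how it is pulled in."""
    paths = dependency_paths(graph, root)
    findings = []
    for pkg in sorted(vulnerable):
        if pkg in paths and pkg != root:
            kind = "direct" if len(paths[pkg]) == 2 else "indirect"
            findings.append((pkg, kind, " -> ".join(paths[pkg])))
    return findings


def lowest_version_fixing_all(fixed_in):
    """Assuming fixes persist in later versions, the highest fix version covers them all."""
    return max(fixed_in, key=lambda v: tuple(int(part) for part in v.split(".")))


# Hypothetical project graph and vulnerability data.
graph = {
    "my-app": ["web-framework", "json-lib"],
    "web-framework": ["logging-lib"],
}
findings = classify(graph, "my-app", vulnerable={"json-lib", "logging-lib"})
for finding in findings:
    print(finding)
print(lowest_version_fixing_all(["2.9.8", "2.10.1", "2.9.10"]))
```

Versions are compared as tuples of integers rather than as strings, so 2.10.1 correctly ranks above 2.9.10. Whether the recommended version is also compatible is a separate question, which is why the text above measures API changes and call chains before suggesting an upgrade.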

All five of the Alibaba automatic code detection tools above, for both code quality and code security, can be tried for free in the cloud-based Codeup. To try them: create or import a code repository, open the repository, click Settings, click Integration Services, and enable Code Detection.

【About Cloud Effect】

Cloud Effect is a one-stop BizDevOps platform for the cloud-native era. It supports public cloud, dedicated cloud, and hybrid cloud deployments, and it uses new cloud-native technologies and new R&D models to help innovative startups and enterprises undergoing digital transformation quickly achieve both R&D agility and organizational agility, building a "dual-agile" organization and a tenfold improvement in performance.

Try it now: Alibaba Cloud Cloud Effect, the new DevOps platform for the cloud-native era
