From manual analysis to intelligent analysis, how to quickly get started with traffic analysis?

Author: Zhiwei Data
IT construction across industries in China is developing rapidly. As business volume grows, data is consolidated, and new business systems go live quickly, the operations and maintenance (O&M) department, which is responsible for keeping the business network running, faces great challenges and pressure. Critical services and applications are hosted on infrastructure, web applications, middleware, and databases, and business systems must support fast, flexible, on-demand pluggable deployment, so services become much more complex and harder to maintain. How to monitor these complex business systems effectively and guard against risk, how to ensure high performance and high availability for key services, and how to optimize existing O&M processes to keep improving management and O&M have become new problems.

Take one enterprise as an example. Its data center runs a large number of load balancing devices, which frequently raise large numbers of "Limiting closed port RST response" alarms. Locating the fault takes a long time on each occasion, so troubleshooting is inefficient. The specific difficulties are as follows:

  • The business environment keeps getting more complex and fault location is slow: the number of business systems keeps growing and each depends heavily on related resources, so when a problem occurs the components must be checked one by one;
  • Maintenance personnel face a large amount of repetitive manual troubleshooting every day, which is time-consuming, laborious, and error-prone, and new tools are urgently needed to improve efficiency;
  • The daily O&M workflow is disorganized or has no standard process, so work efficiency is low and customer complaints continue unabated.

To address these O&M pain points, the nCompass traffic analysis platform starts from users' actual situation and uses data as the entry point for business-oriented, visual, intelligent analysis. It addresses practical O&M management problems in six steps: alarm occurrence, data backtracking, data analysis, fault location, codifying the analysis process, and intelligent analysis.

An alarm appears

The F5 devices raise a large number of "Limiting closed port RST response" alarm messages.

【Alarm diagram】

As the figure above shows, a large number of "Limiting closed port RST response" alarms appeared on the enterprise's F5 devices and kept recurring, and the O&M personnel did not know where to start.

Data backtracking

nCompass backtracks the problem through data visualization, filtering on combinations of dimensions and indicators. It can query Reset packet information and the number of Reset packets in each VLAN, and the filtered table shows which VLAN has the highest number of Resets.

【Dimension selection diagram】

【Indicator selection diagram】

【Data table diagram】

As the "Data table diagram" above shows, once dimensions and indicators are selected in the nCompass data table, all the relevant data is displayed in the table. The displayed data shows that VLAN2007 has the highest number of Resets.
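The ranking step described above can be sketched in a few lines of Python. This is only an illustration of the idea, not nCompass code: the record layout, the `resets_per_vlan` helper, and every value in `FLOWS` are assumptions made up for the example.

```python
from collections import defaultdict

# Hypothetical flow records: (vlan_id, client_ip, server_ip, client_resets, server_resets).
# The addresses come from documentation ranges; no value is taken from the article's data.
FLOWS = [
    (2007, "192.0.2.5", "198.51.100.16", 3, 900),
    (2007, "192.0.2.6", "198.51.100.16", 2, 700),
    (2007, "192.0.2.7", "198.51.100.20", 1, 40),
    (2010, "192.0.2.8", "198.51.100.30", 4, 12),
    (2012, "192.0.2.9", "198.51.100.31", 0, 5),
]

def resets_per_vlan(flows):
    """Sum client-side and server-side Reset counts for each VLAN."""
    totals = defaultdict(int)
    for vlan, _client, _server, client_rst, server_rst in flows:
        totals[vlan] += client_rst + server_rst
    # Highest Reset count first, like sorting the data table by the indicator.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

ranking = resets_per_vlan(FLOWS)
top_vlan, top_count = ranking[0]
print(f"VLAN {top_vlan} has the most Resets: {top_count}")
```

Here the VLAN is the dimension and the Reset count is the indicator; sorting by the indicator puts the VLAN to drill into at the top.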

Data analysis

Next, the VLAN with the highest number of Resets is drilled into by adding dimensions to the table. The client-side Reset column shows no obviously abnormal IP addresses, while the server-side Reset column shows that server 0.16 accounts for more than 80% of the server-side Resets.

【Client drill-down diagram】

【Server side drill-down diagram】

Having found that VLAN2007 has the highest number of Resets, we drill down to analyze it, starting with the clients. The "Client drill-down diagram" above shows that although the total number of client Resets is high, it averages out to only a few or a few dozen per client, so a client-side anomaly can be ruled out first. Next, we drill down into the servers. The "Server side drill-down diagram" above shows that the server-side Reset indicator reaches 1,565,194 for a single IP, *.*.0.16, so we conclude that server *.*.0.16 is the abnormal IP.
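The two drill-down checks, the average number of Resets per client and the share held by the top server, can be sketched as follows. The data and the `drill_down` helper are hypothetical and only mirror the shape of the reasoning above.

```python
from collections import Counter

# Hypothetical per-flow Reset counts inside one VLAN:
# (client_ip, server_ip, client_resets, server_resets). Illustrative values only.
VLAN_FLOWS = [
    ("192.0.2.5", "198.51.100.16", 12, 900),
    ("192.0.2.6", "198.51.100.16", 30, 700),
    ("192.0.2.7", "198.51.100.20", 18, 150),
    ("192.0.2.8", "198.51.100.21", 20, 50),
]

def drill_down(flows):
    """Return (average Resets per client, top server IP, top server's share)."""
    client_resets = Counter()
    server_resets = Counter()
    for client, server, client_rst, server_rst in flows:
        client_resets[client] += client_rst
        server_resets[server] += server_rst
    avg_per_client = sum(client_resets.values()) / len(client_resets)
    top_server, top_count = server_resets.most_common(1)[0]
    share = top_count / sum(server_resets.values())
    return avg_per_client, top_server, share

avg_per_client, top_server, share = drill_down(VLAN_FLOWS)
print(f"Average Resets per client: {avg_per_client:.1f}")
print(f"Top server {top_server} holds {share:.0%} of server-side Resets")
```

A low per-client average with one server holding most of the server-side Resets is the pattern that points at a single abnormal server rather than at the clients.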

Then, through nCompass DNS resolution, 0.16 is found to correspond to the domain name telemetry.********.com, which is not a company domain name.

【DNS Resolution Diagram】

With the abnormal IP identified, we feed it into the built-in nCompass DNS resolution view. The "DNS Resolution Diagram" shows that the abnormal IP *.*.0.16 corresponds to the domain name telemetry.*******.com. Checks against DNS and the CMDB confirm that telemetry.*******.com is not one of the company's normal domain names.
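The DNS and CMDB check can be pictured as two lookups: map the IP to a domain, then test whether that domain belongs to a registered company domain. The sketch below uses in-memory dictionaries in place of the real integrations; `DNS_RECORDS`, `CMDB_DOMAINS`, and both function names are assumptions, and the domains are reserved example names rather than the masked domain in the article.

```python
# Hypothetical lookup tables standing in for the DNS and CMDB integrations.
DNS_RECORDS = {"198.51.100.16": "telemetry.vendor.example"}
CMDB_DOMAINS = {"corp.example", "shop.example"}

def is_company_domain(domain, known_domains):
    """True if the domain equals, or is a subdomain of, a CMDB-registered domain."""
    domain = domain.lower().rstrip(".")
    return any(
        domain == known or domain.endswith("." + known)
        for known in known_domains
    )

def classify_ip(ip, dns_records, known_domains):
    """Return (ip, resolved domain or None, verdict)."""
    domain = dns_records.get(ip)
    if domain is None:
        return ip, None, "unresolved"
    verdict = "company" if is_company_domain(domain, known_domains) else "external"
    return ip, domain, verdict

result = classify_ip("198.51.100.16", DNS_RECORDS, CMDB_DOMAINS)
print(result)
```

Matching on a leading dot keeps a name such as `evilcorp.example` from being mistaken for a subdomain of `corp.example`.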

Fault location

After 0.16 was added to the blacklist on the F5 so that its traffic is intercepted, server-side Resets overall dropped significantly, and so did the trend of "Limiting closed port RST response" alarms. This shows that the sudden increase in these alarms was caused by 0.16.

【Alarm trend diagram】

After traffic for the abnormal IP address is blocked, the "Alarm trend diagram" above shows that the alarm trend decreases significantly.
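One simple way to put a number on "decreases significantly" is to compare the average alarm count before and after the block. The sketch below assumes alarm counts per time bucket are available; the counts, the bucket size, and the `relative_drop` helper are all made up for illustration.

```python
from statistics import mean

# Hypothetical alarm counts per 5-minute bucket (illustrative values only).
ALARM_COUNTS = [120, 135, 128, 140, 131, 126, 14, 9, 11, 8, 10, 12]
BLOCK_INDEX = 6  # assumed: first bucket after the IP was blocked

def relative_drop(counts, split):
    """Fractional decrease of the mean alarm count after index `split`."""
    before = mean(counts[:split])
    after = mean(counts[split:])
    return (before - after) / before

drop = relative_drop(ALARM_COUNTS, BLOCK_INDEX)
print(f"Alarm rate dropped by {drop:.0%} after the block")
```

A large drop that coincides with the block supports the conclusion that the blocked IP caused the alarms, although a before/after comparison alone does not rule out other changes made in the same window.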

Codifying the analysis process

As a new-generation intelligent analysis platform built on data visualization, nCompass not only offers visual analysis that locates faults quickly, but also captures the way O&M personnel analyze problems, preserving individual experience in the system knowledge base.

【Code-based diagram】

As the "Code-based diagram" above shows, nCompass can codify the analysis experience gained from each solved problem. When a similar problem occurs, the saved analysis can be invoked directly to analyze the problem with one click, improving O&M efficiency. For programming experts, or for complex analysis scenarios, the product also provides a Python editor for writing complex data analysis scripts.
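To make "codified experience" concrete, here is a minimal sketch of a playbook registry in plain Python: an analysis is saved under an alarm keyword and replayed when a matching alarm arrives. It does not show the nCompass scripting API, which the article does not describe; `PLAYBOOKS`, `playbook`, and `run_playbook` are hypothetical names.

```python
# Hypothetical registry: alarm keyword -> function that returns the saved analysis steps.
PLAYBOOKS = {}

def playbook(alarm_keyword):
    """Decorator that registers a function as the playbook for an alarm keyword."""
    def register(func):
        PLAYBOOKS[alarm_keyword.lower()] = func
        return func
    return register

@playbook("closed port RST")
def analyze_rst_alarm(context):
    """Steps taken from the walkthrough in this article."""
    return [
        f"Rank VLANs by Reset count over {context['window']}",
        "Drill down into the top VLAN by client and server IP",
        "Resolve the top server IP through DNS and check it against the CMDB",
    ]

def run_playbook(alarm_text, context):
    """Run the first registered playbook whose keyword appears in the alarm text."""
    text = alarm_text.lower()
    for keyword, func in PLAYBOOKS.items():
        if keyword in text:
            return func(context)
    return []  # no saved experience for this alarm yet

steps = run_playbook("Limiting closed port RST response", {"window": "the last hour"})
for number, step in enumerate(steps, start=1):
    print(number, step)
```

The point of the design is that the steps are written down once, by the person who solved the problem, and can then be triggered by anyone who sees the same alarm.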

Intelligent analysis

When an alarm is generated, the system automatically calls the intelligent knowledge base and produces a detailed analysis report on the fault. The report includes the analysis object, the abnormal phenomenon, the analysis conclusion, specific troubleshooting commands, and follow-up suggestions. It also includes the detailed data from the analysis process to back up the conclusion and to support O&M personnel in deciding on the next step.

【Intelligent analysis report diagram】

nCompass collects data from multiple sources: it not only analyzes traffic but also integrates with DNS, the CMDB, and other systems for deeper correlation analysis. As the "Intelligent analysis report diagram" above shows, the analysis object is VLAN2007, where an abnormal Reset phenomenon was logged on August 22. The report analyzes the traffic, automatically calls DNS and the CMDB to determine that the domain name is abnormal, and gives reasonable suggestions for handling the phenomenon.
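The report sections listed above map naturally onto a small data structure. The sketch below is one possible shape, assuming a plain-text rendering; the `AnalysisReport` class, its field names, and the sample values are illustrative and are not the product's actual report format. The command entry is left as a placeholder because the real commands depend on the device.

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisReport:
    """Report sections named in the article; the field names are hypothetical."""
    analysis_object: str
    anomaly: str
    conclusion: str
    commands: list = field(default_factory=list)
    suggestions: list = field(default_factory=list)
    evidence: dict = field(default_factory=dict)

    def render(self):
        """Render the report as plain text, one item per line."""
        lines = [
            f"Analysis object: {self.analysis_object}",
            f"Abnormal phenomenon: {self.anomaly}",
            f"Conclusion: {self.conclusion}",
        ]
        lines += [f"Command: {cmd}" for cmd in self.commands]
        lines += [f"Suggestion: {item}" for item in self.suggestions]
        lines += [f"Evidence: {key} = {value}" for key, value in self.evidence.items()]
        return "\n".join(lines)

report = AnalysisReport(
    analysis_object="VLAN2007",
    anomaly="Abnormal number of server-side Resets",
    conclusion="Top server IP resolves to a domain not registered in the CMDB",
    commands=["<device-specific block command goes here>"],
    suggestions=["Block the IP on the load balancer and watch the alarm trend"],
    evidence={"server_reset_share": "above 80%"},
)
print(report.render())
```

Keeping the evidence next to the conclusion lets the reader of the report verify the reasoning instead of taking the verdict on trust.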

Across these six steps (alarm occurrence, data backtracking, data analysis, fault location, codifying the analysis process, and intelligent analysis), the nCompass traffic analysis platform moves O&M personnel from manual analysis to intelligent analysis. Besides supplying a large amount of supporting data, it gives O&M personnel one-click fault analysis, which greatly reduces the dependence on specialist experience in specific fields during O&M data analysis and improves the team's overall troubleshooting efficiency. It addresses the difficulty of locating faults in daily O&M, the heavy and highly repetitive workload, and the lack of a standard process, and makes fault analysis "simple, fast, with one-click output of analysis results".

(Note: The figures in this article use demo data and do not reflect real data. If you have any questions or doubts about the content, please contact us.)