0. Article Series Links
- Machine Learning in SLS (01): Time Series Statistical Modeling
- Machine Learning in SLS (02): Time Series Clustering Modeling
- Machine Learning in SLS (03): Time Series Anomaly Detection Modeling
- Machine Learning in SLS (04): Rule-Based Pattern Mining
- Machine Learning in SLS (05): Time Series Forecasting
1. High-Frequency Detection Scenarios
1.1 Scenario 1
A cluster has N machines, and each machine reports M time-series metrics (CPU, memory, IO, traffic, and so on). Modeling each curve individually means hand-writing a large amount of repetitive SQL and places a heavy computational load on the platform. How can we use SQL to handle this scenario more efficiently?
1.2 Scenario 2
After running anomaly detection over the system's N time-series curves, how do we quickly find out which of those curves actually contain anomalies?
2. Platform Experiments
2.1 Solution 1
For the problem described in Scenario 1, we impose the following constraints on the data. The data is stored in a Log Service LogStore with the following structure:
timestamp: unix_time_stamp
machine: name1
metricName: cpu0
metricValue: 50
---
timestamp: unix_time_stamp
machine: name1
metricName: cpu1
metricValue: 50
---
timestamp: unix_time_stamp
machine: name1
metricName: mem
metricValue: 50
---
timestamp: unix_time_stamp
machine: name2
metricName: mem
metricValue: 60
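For intuition, here is a small Python sketch of this flattened layout (the helper function and sample values are hypothetical, not part of the platform): every metric reading is an independent record carrying the same four fields, which is what lets a single LogStore hold all N × M curves.

```python
import time

# Hypothetical helper: build one flattened record per metric reading,
# mirroring the LogStore schema shown above.
def make_record(machine, metric_name, metric_value, ts=None):
    return {
        "timestamp": ts if ts is not None else int(time.time()),
        "machine": machine,
        "metricName": metric_name,
        "metricValue": metric_value,
    }

# Sample records matching the example above (timestamps are made up).
records = [
    make_record("name1", "cpu0", 50, ts=1550000000),
    make_record("name1", "cpu1", 50, ts=1550000000),
    make_record("name1", "mem", 50, ts=1550000000),
    make_record("name2", "mem", 60, ts=1550000000),
]
```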
From the LogStore above, we first retrieve the time-series data for all N metrics:
* | select timestamp - timestamp % 60 as time, machine, metricName, avg(metricValue) from log group by time, machine, metricName
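The expression `timestamp - timestamp % 60` floors each timestamp to its one-minute bucket. A rough Python equivalent of the query's grouping logic (for intuition only, not how the platform executes it; the sample data is invented):

```python
from collections import defaultdict

# Floor each timestamp to its 60-second bucket, then average metricValue
# per (bucket, machine, metricName) group, mirroring the SQL's GROUP BY.
def aggregate(records, bucket_seconds=60):
    groups = defaultdict(list)
    for r in records:
        bucket = r["timestamp"] - r["timestamp"] % bucket_seconds
        groups[(bucket, r["machine"], r["metricName"])].append(r["metricValue"])
    return {key: sum(vals) / len(vals) for key, vals in groups.items()}

records = [
    {"timestamp": 1550000005, "machine": "name1", "metricName": "cpu0", "metricValue": 40},
    {"timestamp": 1550000035, "machine": "name1", "metricName": "cpu0", "metricValue": 60},
    {"timestamp": 1550000065, "machine": "name1", "metricName": "cpu0", "metricValue": 70},
]
agg = aggregate(records)
# The first two readings share the bucket starting at 1549999980,
# so their values are averaged; the third falls into the next bucket.
```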
Now we run a batch time-series anomaly detection algorithm over those results, producing detection results for all N metrics:
* |
select machine, metricName, ts_predicate_arma(time, value, 5, 1, 1) as res from (
select
timestamp - timestamp % 60 as time,
machine, metricName,
avg(metricValue) as value
from log group by time, machine, metricName )
group by machine, metricName
The SQL above yields results with the following structure:
| machine | metricName | [[time, src, pred, upper, lower, prob]] |
| ------- | ---------- | --------------------------------------- |
Next, we apply a matrix transpose to convert these results into the format below; the concrete SQL is as follows:
* |
select
machine, metricName,
res[1] as ts, res[2] as ds, res[3] as preds, res[4] as uppers, res[5] as lowers, res[6] as probs
from ( select machine, metricName, array_transpose(ts_predicate_arma(time, value, 5, 1, 1)) as res from (
select
timestamp - timestamp % 60 as time,
machine, metricName,
avg(metricValue) as value
from log group by time, machine, metricName )
group by machine, metricName )
After transposing the two-dimensional array, we split each row's contents into separate columns and obtain the expected result, in the following format:
| machine | metricName | ts | ds | preds | uppers | lowers | probs |
| ------- | ---------- | -- | -- | ----- | ------ | ------ | ----- |
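To see what the transpose step is doing: per curve, `ts_predicate_arma` returns an array of rows of the form `[time, src, pred, upper, lower, prob]`, and `array_transpose` flips that array so each field becomes its own column, letting one result row carry the whole curve as parallel arrays. A minimal Python sketch of that flip (the numeric values are invented for illustration):

```python
# Transpose a list of rows into a list of columns, mirroring what
# array_transpose does to the 2-D result of ts_predicate_arma.
def array_transpose(rows):
    return [list(col) for col in zip(*rows)]

# Two hypothetical detection rows: [time, src, pred, upper, lower, prob].
rows = [
    [1550000040, 50.0, 49.5, 55.0, 44.0, 0.0],
    [1550000100, 51.0, 50.2, 56.0, 45.0, 0.0],
]
cols = array_transpose(rows)
# cols[0] is the ts column, cols[1] the ds (source) column, and so on,
# matching res[1] .. res[6] in the SQL above.
```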
2.2 Solution 2
Given the batch detection results, how do we quickly filter out the curves that contain a particular kind of anomaly? Log Service provides a filter operation for anomaly detection results.
select ts_anomaly_filter(lineName, ts, ds, preds, probs, nWatch, anomalyType)
The anomalyType parameter takes the following values:
- 0: report all anomalies
- 1: report rising-edge anomalies only
- -1: report falling-edge anomalies only
The nWatch parameter means:
- the number of observations to inspect, counting back nWatch points from the last valid observation in the actual time series.
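To make these parameter semantics concrete, here is an illustrative Python sketch. It is not the platform's actual implementation: the probability threshold and the direction rule (source value above the prediction counts as a rising edge) are assumptions made for illustration.

```python
# Illustrative only: scan the last n_watch observations of one curve and
# report whether any of them is an anomaly of the requested direction.
# ASSUMPTIONS: a point is anomalous when prob >= threshold, and its
# direction is +1 (rising) when src > pred, else -1 (falling).
def has_recent_anomaly(ds, preds, probs, n_watch, anomaly_type, threshold=0.5):
    for src, pred, prob in list(zip(ds, preds, probs))[-n_watch:]:
        if prob < threshold:
            continue
        direction = 1 if src > pred else -1
        if anomaly_type == 0 or anomaly_type == direction:
            return True
    return False

# A hypothetical curve whose last point spikes well above its prediction.
ds    = [50.0, 50.5, 90.0]
preds = [50.1, 50.4, 50.6]
probs = [0.0, 0.0, 0.99]
```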
A concrete usage example:
* |
select
ts_anomaly_filter(lineName, ts, ds, preds, probs, cast(5 as bigint), cast(1 as bigint))
from
( select
concat(machine, '-', metricName) as lineName,
res[1] as ts, res[2] as ds, res[3] as preds, res[4] as uppers, res[5] as lowers, res[6] as probs
from ( select machine, metricName, array_transpose(ts_predicate_arma(time, value, 5, 1, 1)) as res from (
select
timestamp - timestamp % 60 as time,
machine, metricName,
avg(metricValue) as value
from log group by time, machine, metricName )
group by machine, metricName ) )
The result above is a Row-typed value; we can extract its individual fields as follows:
* |
select
res.name, res.ts, res.ds, res.preds, res.probs
from
( select
ts_anomaly_filter(lineName, ts, ds, preds, probs, cast(5 as bigint), cast(1 as bigint)) as res
from
( select
concat(machine, '-', metricName) as lineName,
res[1] as ts, res[2] as ds, res[3] as preds, res[4] as uppers, res[5] as lowers, res[6] as probs
from (
select
machine, metricName, array_transpose(ts_predicate_arma(time, value, 5, 1, 1)) as res
from (
select
timestamp - timestamp % 60 as time,
machine, metricName, avg(metricValue) as value
from log group by time, machine, metricName )
group by machine, metricName ) ) )
With these operations, we can filter the results of batch anomaly detection, making it easier for users to configure alerts in bulk.
3. Shameless Plug
3.1 Going Further with Log Service
Here are demos of the various Log Service features:
An overall introduction to Log Service, with assorted demos.
For more advanced material, see:
the Log Service learning path.
3.2 Contact Us
For corrections, documentation help, or best-practice contributions, contact: 悟冥
For questions, join our DingTalk group: