
StatsD: Simulated Tests of Sending the Five Metric Types


StatsD Metric Types

Counting

gorets:1|c
           

This is a simple counter. Add 1 to the "gorets" bucket. At each flush the current count is sent and reset to 0. If the count at flush is 0 then you can opt to send no metric at all for this counter, by setting config.deleteCounters (applies only to graphite backend). StatsD will send both the rate as well as the count at each flush.
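As a minimal sketch of how a client might emit the counter above, the following Python builds the wire-format line and sends it over UDP. The helper names format_metric and send_metric are hypothetical, and the address 127.0.0.1:8125 is an assumption about where a StatsD daemon is listening.

```python
import socket


def format_metric(bucket: str, value, metric_type: str, sample_rate: float = 1.0) -> str:
    """Build one StatsD line such as 'gorets:1|c' or 'gorets:1|c|@0.1'."""
    line = f"{bucket}:{value}|{metric_type}"
    if sample_rate != 1.0:
        # The optional third field carries the sample rate.
        line += f"|@{sample_rate}"
    return line


def send_metric(line: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget one line over UDP (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("ascii"), (host, port))


if __name__ == "__main__":
    send_metric(format_metric("gorets", 1, "c"))
```

Because UDP is connectionless, the send succeeds even when nothing is listening, which is convenient for simulated tests but means delivery is never confirmed.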

Sampling

gorets:1|c|@0.1
           

Tells StatsD that this counter is being sampled and is sent only 1/10th of the time.
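The sampling idea can be sketched in two halves: the client drops most events, and the receiver scales what it does get by the inverse of the rate. Both function names here are hypothetical, and the scaling shown is a simplified model of what a receiver does rather than StatsD's own code.

```python
import random


def maybe_sample(bucket: str, value: int, sample_rate: float, rng=random.random):
    """Return a sampled counter line, or None when this event is skipped."""
    if rng() >= sample_rate:
        return None
    return f"{bucket}:{value}|c|@{sample_rate}"


def scaled_count(value: int, sample_rate: float) -> float:
    """Amount the receiver adds to the bucket: value * (1 / sample_rate)."""
    return value * (1 / sample_rate)
```

With a rate of 0.1, roughly one event in ten is sent and each received event counts as ten, so the estimated total stays close to the true count while network traffic drops by about 90%.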

Timing

glork:320|ms|@0.1
           

The glork took 320ms to complete this time. StatsD figures out percentiles, average (mean), standard deviation, sum, lower and upper bounds for the flush interval. The percentile threshold can be tweaked with config.percentThreshold.

The percentile threshold can be a single value, or a list of values, and will generate the following list of stats for each threshold:

stats.timers.$KEY.mean_$PCT
stats.timers.$KEY.upper_$PCT
stats.timers.$KEY.sum_$PCT
           

Where $KEY is the stats key you specify when sending to statsd, and $PCT is the percentile threshold.

Note that the mean metric is the mean value of all timings recorded during the flush interval, whereas mean_$PCT is the mean of all timings which fell into the $PCT percentile for that flush interval. The same holds for sum and upper. See issue #157 for a more detailed explanation of the calculation.
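To make the difference between mean and mean_$PCT concrete, here is a simplified sketch of the per-flush calculation. The function name timer_stats is hypothetical, and the round-half-up rule used to decide how many timings fall within the threshold is an assumption; StatsD's exact rounding may differ.

```python
def timer_stats(values, pct=90):
    """Summarise one flush interval of timings (simplified model)."""
    if not values:
        raise ValueError("no timings recorded in this flush interval")
    ordered = sorted(values)
    count = len(ordered)
    # Number of timings inside the pct-th percentile (rounding is an assumption).
    keep = max(1, int(count * pct / 100 + 0.5))
    within = ordered[:keep]
    return {
        "lower": ordered[0],
        "upper": ordered[-1],
        "sum": sum(ordered),
        "mean": sum(ordered) / count,
        f"upper_{pct}": within[-1],
        f"sum_{pct}": sum(within),
        f"mean_{pct}": sum(within) / keep,
    }
```

For ten timings of 10, 20, ..., 100 ms and a threshold of 90, the slowest timing is excluded from the _90 values, so upper_90 is 90 while upper is 100.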

If the count at flush is 0 then you can opt to send no metric at all for this timer, by setting config.deleteTimers.

Use the config.histogram setting to instruct statsd to maintain histograms over time. Specify which metrics to match and a corresponding list of ordered non-inclusive upper limits of bins (class intervals). Use inf to denote infinity; a lower limit of 0 is assumed. Each flushInterval, statsd will store how many values (absolute frequency) fall within each bin (class interval), for all matching metrics. Examples:

  • no histograms for any timer (default):
    []
  • histogram to only track render durations, with unequal class intervals and catchall for outliers:
    [ { metric: 'render', bins: [ 0.01, 0.1, 1, 10, 'inf'] } ]
  • histogram for all timers except 'foo' related, with equal class interval and catchall for outliers:
    [ { metric: 'foo', bins: [] },
      { metric: '', bins: [ 50, 100, 150, 200, 'inf'] } ]

Statsd also maintains a counter for each timer metric. The 3rd field specifies the sample rate for this counter (in this example @0.1). The field is optional and defaults to 1.

Note:

  • first match for a metric wins.
  • bin upper limits may contain decimals.
  • this is actually more powerful than what's strictly considered histograms, as you can make each bin arbitrarily wide, i.e. class intervals of different sizes.
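The histogram rules above (first match wins, non-inclusive upper limits, 'inf' as a catchall) can be sketched as follows. The function names are hypothetical, and matching a metric by substring, so that an empty string matches every timer, is an assumption about how the match is performed.

```python
import math


def pick_bins(key, config):
    """Return the bins of the first entry whose 'metric' is a substring of key."""
    for entry in config:
        if entry["metric"] in key:  # '' is a substring of every key
            return entry["bins"]
    return []


def histogram(values, bins):
    """Count values per bin; upper limits are non-inclusive, lower limit is 0."""
    limits = [math.inf if b == "inf" else b for b in bins]
    counts = [0] * len(limits)
    for v in values:
        for i, upper in enumerate(limits):
            if v < upper:
                counts[i] += 1
                break
    return counts
```

With the last example configuration, a timer whose key contains 'foo' matches the first entry and gets no bins, while every other timer falls through to the catchall entry.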

Gauges

StatsD now also supports gauges, arbitrary values, which can be recorded.

gaugor:333|g
           

If the gauge is not updated at the next flush, it will send the previous value. You can opt to send no metric at all for this gauge, by setting config.deleteGauges.

Adding a sign to the gauge value will change the value, rather than setting it.

gaugor:-10|g
gaugor:+4|g
           

So if gaugor was 333, those commands would set it to 333 - 10 + 4, or 327.

Note:

This implies you can't explicitly set a gauge to a negative number without first setting it to zero.
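The set-versus-adjust rule for gauges can be modelled in a few lines. This is a sketch of the semantics described above, not StatsD's own code; apply_gauge and the gauges dictionary are hypothetical names.

```python
def apply_gauge(gauges: dict, bucket: str, raw: str) -> None:
    """Apply one gauge value: a leading '+' or '-' adjusts, otherwise it sets."""
    if raw[0] in "+-":
        gauges[bucket] = gauges.get(bucket, 0) + float(raw)
    else:
        gauges[bucket] = float(raw)
```

Sending "333", then "-10", then "+4" leaves the gauge at 327. To store a negative value such as -5, send "0" first and then "-5".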

Sets

StatsD supports counting unique occurrences of events between flushes, using a Set to store all occurring events.

uniques:765|s
           

If the count at flush is 0 then you can opt to send no metric at all for this set, by setting config.deleteSets.
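A rough model of the set behaviour: each bucket keeps the distinct values seen since the last flush, and the flush reports how many there were. The class name SetBuckets is hypothetical, and storing values as strings is an assumption made for this sketch.

```python
from collections import defaultdict


class SetBuckets:
    """Track unique values per bucket between flushes (simplified model)."""

    def __init__(self):
        self._sets = defaultdict(set)

    def add(self, bucket, value):
        # Repeated values are stored once, so only unique events are counted.
        self._sets[bucket].add(str(value))

    def flush(self):
        """Return the unique count per bucket and start a fresh interval."""
        counts = {bucket: len(members) for bucket, members in self._sets.items()}
        self._sets.clear()
        return counts
```

Sending uniques:765|s twice and uniques:766|s once within one flush interval would therefore report a count of 2.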

Multi-Metric Packets

StatsD supports receiving multiple metrics in a single packet by separating them with a newline.

gorets:1|c\nglork:320|ms\ngaugor:333|g\nuniques:765|s
           

Be careful to keep the total length of the payload within your network's MTU. There is no single good value to use, but here are some guidelines for common network scenarios:

  • Fast Ethernet (1432) - This is most likely for Intranets.
  • Gigabit Ethernet (8932) - Jumbo frames can make use of this feature much more efficient.
  • Commodity Internet (512) - If you are routing over the internet a value in this range will be reasonable. You might be able to go higher, but you are at the mercy of all the hops in your route.
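One way to respect these limits is to batch metric lines greedily until the next line would push the payload past the chosen size. The function name pack_metrics is hypothetical, and the default of 1432 bytes simply reuses the Fast Ethernet guideline above.

```python
def pack_metrics(lines, max_bytes=1432):
    """Group metric lines into newline-joined payloads of at most max_bytes."""
    packets, current, size = [], [], 0
    for line in lines:
        encoded = len(line.encode("ascii"))
        if encoded > max_bytes:
            raise ValueError(f"metric line exceeds max_bytes: {line!r}")
        # One extra byte for the newline separator when the packet is non-empty.
        extra = encoded + (1 if current else 0)
        if current and size + extra > max_bytes:
            packets.append("\n".join(current))
            current, size = [], 0
            extra = encoded
        current.append(line)
        size += extra
    if current:
        packets.append("\n".join(current))
    return packets
```

Each returned string can then be sent as a single UDP datagram. Lowering max_bytes to 512 would suit the commodity Internet case.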

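Abroad, a whole series of tools has grown up around StatsD, and mature projects have begun adding StatsD compatibility. By direction, they can be grouped into the several categories shown in the figure.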


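With data and information in hand, a lot becomes possible: data integration, visualization, visualization plus storage, event streams, and even all-in-one solutions that combine them. For different needs and different markets, each direction can create its own distinct value. The following sections briefly introduce these directions.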

Integrations

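StatsD itself does not define what a metric means, so collecting data from a database or an operating system requires writing scripts. Datadog has made an outstanding contribution here: its dd-agent project has as many as 150 contributors on GitHub and is compatible with more than 60 operating systems, middleware products, and databases.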


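Beyond that, Librato and AppFirst have also joined the StatsD camp. Infrastructure management solutions such as Puppet and Chef have also begun to support installing StatsD in bulk across infrastructure.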

Visualization & Data Hosting

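Data alone is not enough; only good visualization brings out its value. In visualization, the influential Graphite serves as a visualization component that also ships with its own storage. Many people complain, however, that Graphite's built-in interface is too ugly. Thanks to the open source world, we have Grafana, which can be deployed directly on nginx and uses node.js for data fetching. For visualization alone, Grafana does the best job: its display formats are rich and its configuration options are exhaustive. SignalFx, a later entrant, has also joined the competition.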


On top of data visualization, some services have begun to offer hosting for visualized data, for example Hosted Graphite.

Time Series Databases and Event Processing Engines

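In fact, StatsD and time series databases emerged hand in hand. It is on top of OpenTSDB and InfluxDB that StatsD's applications have gradually filled out. InfluxDB is an open source distributed database for time series, events, and metrics, written in Go with no external dependencies. For operations engineers, OpenTSDB can collect real-time status information about infrastructure and services, and show a cluster's hardware and software errors, performance changes, and performance bottlenecks.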

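As for event processing engines, Bosun, for example, is a newer monitoring and alerting system written in Go (golang). It supports defining complex alerting rules and data sources such as OpenTSDB, Graphite, and Logstash-Elasticsearch. Riemann has also begun integrating with time series databases and with StatsD-based all-in-one solutions, to make up for the weakness of some data presentation products in alerting.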

All-in-One Solutions

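So is there a solution that combines data integration, visualization, data storage, and event stream processing in one? For small and medium-sized companies, startups especially, building in-house or using existing open source tools for monitoring tends to run into problems: cost has to be considered, and there is the fear of hitting pitfalls. Alongside the specialized directions above, vendors offering all-in-one solutions have appeared to meet this need. Abroad, such vendors include Datadog, Librato, and others. Datadog counts heavyweight customers such as Facebook and Airbnb and is very much in the spotlight.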
