
RabbitMQ Monitoring

System Level

- CPU  
    - user, system, iowait & idle percentages
- MEM
    - used, buffered, cached & free percentages
- Virtual Memory 
    - dirty page flushes, writeback volume
- Disk I/O
    - operations & amount of data transferred per unit time, time to service operations
- Free disk space
    - node data directory
- File descriptors
    - beam.smp vs. max system limit
- TCP connections
    - ESTABLISHED, CLOSE_WAIT, TIME_WAIT
- Network throughput
    - bytes received, bytes sent, maximum network throughput
- Network latency
    - between all RabbitMQ nodes in a cluster as well as to/from clients
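
For the TCP connections item, per-state counts can be read on Linux without extra tools by parsing /proc/net/tcp. The sketch below makes three assumptions: a Linux host, IPv4 sockets only (IPv6 sockets are listed in /proc/net/tcp6), and the default AMQP port 5672. count_tcp_states and read_proc_net_tcp are hypothetical helpers, not part of RabbitMQ.

```python
from collections import Counter
from typing import Optional

# Subset of Linux TCP state codes (include/net/tcp_states.h) as printed in the
# "st" column of /proc/net/tcp; any other state is counted as OTHER.
TCP_STATES = {
    "01": "ESTABLISHED",
    "06": "TIME_WAIT",
    "08": "CLOSE_WAIT",
    "0A": "LISTEN",
}


def read_proc_net_tcp(path: str = "/proc/net/tcp") -> str:
    """Return the raw text of the kernel's IPv4 TCP socket table (Linux only)."""
    with open(path) as f:
        return f.read()


def count_tcp_states(proc_net_tcp: str, local_port: Optional[int] = None) -> Counter:
    """Count sockets per TCP state, optionally only those bound to local_port."""
    counts: Counter = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        # local_address looks like "0100007F:1628"; the port is hex after the colon.
        port = int(local_address.rsplit(":", 1)[1], 16)
        if local_port is not None and port != local_port:
            continue
        counts[TCP_STATES.get(state, "OTHER")] += 1
    return counts
```

For example, count_tcp_states(read_proc_net_tcp(), local_port=5672) counts the AMQP listener and client sockets. From a shell, ss -tan reports the same states.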
      

RabbitMQ Level

Cluster Monitoring

Cluster-wide monitoring data can be retrieved from any node.

API: GET /api/overview

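| Metric | JSON field name |
|---|---|
| Cluster name | cluster_name |
| Cluster-wide message rates | message_stats |
| Total number of connections | object_totals.connections |
| Total number of channels | object_totals.channels |
| Total number of queues | object_totals.queues |
| Total number of consumers | object_totals.consumers |
| Total number of messages (ready + unacknowledged) | queue_totals.messages |
| Messages ready for delivery | queue_totals.messages_ready |
| Unacknowledged messages | queue_totals.messages_unacknowledged |
| Messages published recently | message_stats.publish |
| Message publishing rate | message_stats.publish_details.rate |
| Messages delivered to consumers recently | message_stats.deliver_get |
| Message delivery rate | message_stats.deliver_get_details.rate |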
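
Any HTTP client can read the fields above. The sketch below uses only the Python standard library. It assumes the management plugin listens on its default port 15672 with HTTP basic authentication. fetch_json and overview_metrics are hypothetical helper names. Every lookup falls back to a default because message_stats may be missing or empty before the cluster has seen any traffic.

```python
import base64
import json
import urllib.request


def fetch_json(base_url: str, path: str, user: str, password: str):
    """GET a management API path with HTTP basic auth and decode the JSON body."""
    req = urllib.request.Request(base_url.rstrip("/") + path)
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", f"Basic {token}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def overview_metrics(overview: dict) -> dict:
    """Flatten the cluster-wide fields from a GET /api/overview response."""
    # "or {}" also covers an empty list, which message_stats may be on an idle cluster.
    totals = overview.get("object_totals") or {}
    queues = overview.get("queue_totals") or {}
    stats = overview.get("message_stats") or {}
    return {
        "cluster_name": overview.get("cluster_name"),
        "connections": totals.get("connections", 0),
        "channels": totals.get("channels", 0),
        "queues": totals.get("queues", 0),
        "consumers": totals.get("consumers", 0),
        "messages": queues.get("messages", 0),
        "messages_ready": queues.get("messages_ready", 0),
        "messages_unacknowledged": queues.get("messages_unacknowledged", 0),
        "published": stats.get("publish", 0),
        "publish_rate": (stats.get("publish_details") or {}).get("rate", 0.0),
        "delivered": stats.get("deliver_get", 0),
        "deliver_get_rate": (stats.get("deliver_get_details") or {}).get("rate", 0.0),
    }
```

For example: overview_metrics(fetch_json("http://localhost:15672", "/api/overview", "guest", "guest")). By default, the guest user can only connect from localhost.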

Node Monitoring

APIs for retrieving node information:

GET /api/nodes/{node} returns the state of a single node.

GET /api/nodes returns statistics for all cluster members.

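| Metric | JSON field name |
|---|---|
| Total memory used | mem_used |
| Memory usage threshold (high watermark) | mem_limit |
| Memory alarm, raised when memory use exceeds the threshold | mem_alarm |
| Free disk space threshold | disk_free_limit |
| Disk alarm, raised when free disk space drops below the configured limit | disk_free_alarm |
| Total file descriptors available | fd_total |
| File descriptors currently in use | fd_used |
| File descriptor open attempts | io_file_handle_open_attempt_count |
| Sockets available | sockets_total |
| Sockets in use | sockets_used |
| Message store disk reads | message_stats.disk_reads |
| Message store disk writes | message_stats.disk_writes |
| Inter-node communication links | cluster_links |
| GC runs | gc_num |
| Bytes reclaimed by GC | gc_bytes_reclaimed |
| Erlang process limit | proc_total |
| Erlang processes in use | proc_used |
| Erlang runtime run queue length | run_queue |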
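
A monitoring job will usually turn these node fields into alerts. The sketch below takes the already-decoded JSON list returned by GET /api/nodes. For each node, it reports whether the node is down, whether mem_alarm or disk_free_alarm is set, and whether file descriptor, socket, or Erlang process usage reaches a given ratio. The function name node_problems and the 0.9 ratio are illustrative assumptions, not RabbitMQ defaults.

```python
# (used, total) field pairs from the table above, compared as a usage ratio.
RATIO_PAIRS = [
    ("fd_used", "fd_total"),
    ("sockets_used", "sockets_total"),
    ("proc_used", "proc_total"),
]


def node_problems(nodes: list, ratio_limit: float = 0.9) -> dict:
    """Return {node name: [problem, ...]} for a decoded GET /api/nodes response.

    ratio_limit is a hypothetical alert threshold, not a RabbitMQ setting.
    Nodes with no problems are omitted from the result.
    """
    problems = {}
    for node in nodes:
        issues = []
        # Only flag a node as down when it is explicitly reported as not running.
        if not node.get("running", True):
            issues.append("not running")
        if node.get("mem_alarm"):
            issues.append(f"memory alarm: mem_used={node.get('mem_used')}, "
                          f"mem_limit={node.get('mem_limit')}")
        if node.get("disk_free_alarm"):
            issues.append(f"disk alarm: disk_free_limit={node.get('disk_free_limit')}")
        for used_key, total_key in RATIO_PAIRS:
            used = node.get(used_key) or 0
            total = node.get(total_key) or 0
            if total and used / total >= ratio_limit:
                issues.append(f"{used_key}/{total_key} at {used / total:.0%}")
        if issues:
            problems[node.get("name", "?")] = issues
    return problems
```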

Individual Queue Monitoring

API endpoint: GET /api/queues/{vhost}/{qname}

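| Metric | JSON field name |
|---|---|
| Memory | memory |
| Total number of messages (ready + unacknowledged) | messages |
| Messages ready for delivery | messages_ready |
| Unacknowledged messages | messages_unacknowledged |
| Message publishing rate | message_stats.publish_details.rate |
| Messages delivered recently | message_stats.deliver_get |
| Message delivery rate | message_stats.deliver_get_details.rate |
| Other message statistics | message_stats.* (see the management plugin's HTTP API documentation) |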
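
When calling the queue endpoint, the virtual host and queue name must be percent-encoded. In particular, the default vhost "/" becomes %2F. The sketch below builds that path and condenses the table's fields into a small summary. queue_path, queue_summary, and the "growing backlog" heuristic are hypothetical and added only for illustration.

```python
from urllib.parse import quote


def queue_path(vhost: str, queue: str) -> str:
    """Build /api/queues/{vhost}/{qname} with both segments percent-encoded."""
    # safe="" makes quote() encode "/" too, so the default vhost "/" becomes %2F.
    return f"/api/queues/{quote(vhost, safe='')}/{quote(queue, safe='')}"


def queue_summary(q: dict) -> dict:
    """Pick the per-queue fields from a decoded GET /api/queues/{vhost}/{qname} body."""
    stats = q.get("message_stats") or {}
    publish_rate = (stats.get("publish_details") or {}).get("rate", 0.0)
    deliver_rate = (stats.get("deliver_get_details") or {}).get("rate", 0.0)
    ready = q.get("messages_ready", 0)
    return {
        "memory": q.get("memory", 0),
        "messages": q.get("messages", 0),
        "messages_ready": ready,
        "messages_unacknowledged": q.get("messages_unacknowledged", 0),
        "publish_rate": publish_rate,
        "deliver_get_rate": deliver_rate,
        # Hypothetical heuristic: publishing outpaces delivery while messages wait.
        "growing": publish_rate > deliver_rate and ready > 0,
    }
```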

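Health Checks

Check whether any resource alarms are in effect in the cluster: rabbitmq-diagnostics -q alarms

Check whether RabbitMQ is running (the application has not been stopped with stop_app and the node is not paused): rabbitmq-diagnostics check_running

Check whether the current node has any alarms in effect. If it does, the command exits with a non-zero status: rabbitmq-diagnostics check_local_alarms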
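
The check_* commands report their result through the exit code, which makes them easy to wrap in a scheduler or probe. The sketch below is a minimal Python wrapper built on that behavior. run_check, run_all, and CHECKS are hypothetical names. It assumes rabbitmq-diagnostics is on PATH and can reach the node, which normally requires the shared Erlang cookie.

```python
import subprocess
from typing import List, Sequence, Tuple

# The check_* commands listed above; each exits non-zero when its check fails.
CHECKS: List[List[str]] = [
    ["rabbitmq-diagnostics", "check_running"],
    ["rabbitmq-diagnostics", "check_local_alarms"],
]


def run_check(cmd: Sequence[str], timeout: float = 60.0) -> Tuple[bool, str]:
    """Run one health-check command and return (passed, combined output)."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout)
    except (FileNotFoundError, subprocess.TimeoutExpired) as exc:
        # A missing binary or an unresponsive node counts as a failed check.
        return False, str(exc)
    return proc.returncode == 0, proc.stdout + proc.stderr


def run_all(checks: Sequence[Sequence[str]] = CHECKS) -> bool:
    """Run every check, print PASS/FAIL per command, and return overall success."""
    all_ok = True
    for cmd in checks:
        ok, output = run_check(cmd)
        print("PASS" if ok else "FAIL", " ".join(cmd))
        if not ok:
            print(output.strip())
            all_ok = False
    return all_ok
```

A cron job or container probe could call run_all() and exit with status 1 when it returns False.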
      