
Logstash: exporting data from Elasticsearch to HDFS in CSV format

I won't go into what Logstash is here; the configuration below tells the whole story.

1. input: read from Elasticsearch

input {
  elasticsearch {
    hosts => "192.168.1.16:9200"   # IP address and port of your Elasticsearch node
    index => "position"            # index to read from
    size => 10000                  # number of hits fetched per scroll request
    query => '{"query":{"bool":{"disable_coord":false,"adjust_pure_negative":true,"boost":1}},"_source":{"includes":["_id","ent_status","formatted_address","dom","city","adcode","level","ent_type","city_code","data_date","update_date","pripid","province","entname","district","location"]}}'   # the query to run
    scroll => "5m"                 # keep the scroll context alive for 5 minutes
    docinfo => true                # expose the hit's _index/_type/_id under @metadata
  }
}
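With docinfo => true the elasticsearch input keeps each hit's metadata (_index, _type, _id) under the event's @metadata field, which is what lets the output codec further down reference %{[@metadata][_id]}. If you ever need the id as an ordinary field instead, a minimal sketch would look like this (the field name es_doc_id is just an example, not part of the original setup):

filter {
  mutate {
    # copy the Elasticsearch document id out of @metadata into a regular field
    add_field => { "es_doc_id" => "%{[@metadata][_id]}" }
  }
}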
           

2. filter: reshape the incoming events

The documents coming back from Elasticsearch are shaped roughly as shown below (the original post had a screenshot of the data here).
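A minimal sketch of one such document, with made-up values; only the shape matters, and the real field list is the one in the _source includes above:

{
  "entname": "Example Co., Ltd.",
  "province": "Beijing",
  "city": "Beijing",
  "data_date": "2019-06-01",
  "location": {
    "lat": 39.9042,
    "lon": 116.4074
  }
}

lat and lon sit one level down inside location, so the filter below copies location into a temporary string field and re-parses it, which promotes lat and lon to the top level of the event.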
filter {
  # lat and lon live one level down inside location; copy location into a
  # temporary field and run it through the json filter so the parsed keys
  # land at the top level of the event, which makes them easy to reference later
  mutate {
    add_field => {
      "local_value" => "%{location}"
    }
  }
  json {
    source => "local_value"
    remove_field => ["location", "local_value"]
  }
  # some documents have no data_date; referencing it later would emit the
  # literal text %{data_date}, so add an empty field when it is missing
  if ![data_date] {
    mutate {
      add_field => {
        "data_date" => ""
      }
    }
  }
}
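After this filter stage the sample document from above would come out roughly like this (Logstash's own @timestamp and @version fields omitted): location and the temporary local_value field are gone, and lat/lon are top-level fields that the output codec can reference directly.

{
  "entname": "Example Co., Ltd.",
  "province": "Beijing",
  "city": "Beijing",
  "data_date": "2019-06-01",
  "lat": 39.9042,
  "lon": 116.4074
}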
           

3. output: write to HDFS as CSV

output {
  webhdfs {
    host => "192.168.100.11"
    port => 2222
    user => "hadoop"
    flush_size => 5000
    idle_flush_time => 10
    path => "/tmp/position/test-%{+YYYY}-%{+MM}-%{+dd}/position-%{+HH}.csv"
    # the fields to write, separated by \u0001; the separator is a non-printing
    # control character, so it does not show up between the %{...} references in this snippet
    codec => plain {
      format => "%{[@metadata][_id]}%{entname}%{pripid}%{ent_type}%{ent_status}%{adcode}%{city_code}%{formatted_address}%{district}%{province}%{city}%{lon}%{lat}%{level}%{dom}%{data_date}%{update_date}"
    }
  }
}
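One practical note in case the pipeline fails to start: webhdfs is a separate output plugin in some Logstash distributions. If yours does not bundle it, it can usually be installed with Logstash's plugin tool:

bin/logstash-plugin install logstash-output-webhdfs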
           
