天天看點

ES 基本使用《一》--分析

1. 建立索引

建立測試資料資料來源官方文檔:

PUT /megacorp/employee/1
{
    "first_name" : "John",
    "last_name" :  "Smith",
    "age" :        25,
    "about" :      "I love to go rock climbing",
    "interests": [ "sports", "music" ]
}

PUT /megacorp/employee/2
{
    "first_name" :  "Jane",
    "last_name" :   "Smith",
    "age" :         32,
    "about" :       "I like to collect rock albums",
    "interests":  [ "music" ]
}

PUT /megacorp/employee/3
{
    "first_name" :  "Douglas",
    "last_name" :   "Fir",
    "age" :         35,
    "about":        "I like to build cabinets",
    "interests":  [ "forestry" ]
}
           

2. 搜尋

2.1 搜尋全部員工資料

GET /megacorp/employee/_search    搜尋全部員工 資料存在hits 數組下面
{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 1,
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 1,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [
                  "sports",
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "3",
            "_score": 1,
            "_source": {
               "first_name": "Douglas",
               "last_name": "Fir",
               "age": 35,
               "about": "I like to build cabinets",
               "interests": [
                  "forestry"
               ]
            }
         }
      ]
   }
}

           

2.2 搜尋單條員工資訊

GET /megacorp/employee/1    1号員工,原始json 文檔存在在_source字段裡面
{
   "_index": "megacorp",
   "_type": "employee",
   "_id": "1",
   "_version": 1,
   "found": true,
   "_source": {
      "first_name": "John",
      "last_name": "Smith",
      "age": 25,
      "about": "I love to go rock climbing",
      "interests": [
         "sports",
         "music"
      ]
   }
}
           

2.3 條件查找

GET /megacorp/employee/_search?q=last_name:Smith  get參數模式加上q=字段名:value ,查找last name是Smith 的員工
{
   "took": 16,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.2876821,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [
                  "sports",
                  "music"
               ]
            }
         }
      ]
   }
}
           

3. 使用DSL搜尋

查詢字元串搜尋便于通過指令行完成特定(ad hoc)的搜尋,但是它也有局限性(參閱簡單搜尋章節)。Elasticsearch提供豐富且靈活的查詢語言叫做DSL查詢(Query DSL),它允許你建構更加複雜、強大的查詢。

DSL(Domain Specific Language特定領域語言)以JSON請求體的形式出現。我們可以這樣表示之前關于“Smith”的查詢:

GET /megacorp/employee/_search
{
    "query" : {
        "match" : {
            "last_name" : "Smith"
        }
    }
} 
以上是使用DSL搜尋形式,這會傳回與之前查詢相同的結果。你可以看到有些東西改變了,我們不再使用查詢字元串(query string)做為參數,而是使用請求體代替
---------------------------------------------------------------------
{
   "took": 101,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.2876821,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [
                  "sports",
                  "music"
               ]
            }
         }
      ]
   }
}
           

3.1 複雜查詢

GET /megacorp/employee/_search    
{
    "query": { 
    "bool": {
        "must":   { "match": { "last_name": "Smith" }},
        "filter": 
            { "range": { "age": { "gte": "30" }}}
        
    }
    }
}
--------------------------------------------------------------

{
   "took": 1,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.2876821,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [
                  "music"
               ]
            }
         }
      ]
   }
}
           

可以用 bool 查詢來實作你的需求。這種查詢将多查詢組合在一起,成為使用者自己想要的布爾查詢。它接收以下參數:

must  文檔  必須  比對這些條件才能被包含進來。 must_not  文檔  必須不  比對這些條件才能被包含進來。 should  如果滿足這些語句中的任意語句,将增加  _score  ,否則,無任何影響。它們主要用于修正每個文檔的相關性得分。 filter 必須  比對,但它以不評分、過濾模式來進行。這些語句對評分沒有貢獻,隻是根據過濾标準來排除或包含文檔。 5.5 後不支援filtered ,使用 bool/must/filter 代替,官方文檔:https://www.elastic.co/guide/cn/elasticsearch/guide/current/combining-queries-together.html

主意官方沒有用query 開始,直接用bool ,我用官方是報錯的

{
    "bool": {
        "must":     { "match": { "title": "how to make millions" }},
        "must_not": { "match": { "tag":   "spam" }},
        "should": [
            { "match": { "tag": "starred" }}
        ],
        "filter": {
          "range": { "date": { "gte": "2014-01-01" }} 
        }
    }
}
           

3.2 全文搜尋

  搜尋所有喜歡“rock climbing”的員工

GET /megacorp/employee/_search   結果出來的根據相關性排序,記錄二其實跟我們搜尋的相關性不大
{
    "query" : {
        "match" : {
            "about" : "rock climbing"
        }
    }
}
------------------------------------------------

{
   "took": 1385,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.53484553,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 0.53484553,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [
                  "sports",
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 0.26742277,
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [
                  "music"
               ]
            }
         }
      ]
   }
}
           

3.3 短語搜尋

 全文搜尋搜尋的結果不一定是你想要的,比如有的不是攀岩的員工也展示出來,下面查詢match_phrase能查出所有喜歡攀岩的使用者展示出來。

GET /megacorp/employee/_search
{
    "query" : {
        "match_phrase" : {
            "about" : "rock climbing"
        }
    },
     "highlight": {                             高亮顯示字段
        "fields" : {
            "about" : {}
        }
    }
}
-------------------------------------
{
   "took": 429,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 1,
      "max_score": 0.53484553,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 0.53484553,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [
                  "sports",
                  "music"
               ]
            },
            "highlight": {
               "about": [
                  "I love to go <em>rock</em> <em>climbing</em>" 高亮顯示
               ]
            }
         }
      ]
   }
}
           

4. 聚合查詢

4.1 簡單聚合 

最後,我們還有一個需求需要完成:允許管理者在職員目錄中進行一些分析。 Elasticsearch有一個功能叫做聚合(aggregations),它允許你在資料上生成複雜的分析統計。它很像SQL中的GROUP BY但是功能更強大。

GET /megacorp/employee/_search 
{ 
  "aggs": { 
    "all_interests": { 
      "terms": { "field": "interests" } 
    } 
  } 
}
----------------------------------------------------------
{
   "took": 6,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 1,
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 1,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [
                  "sports",
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "3",
            "_score": 1,
            "_source": {
               "first_name": "Douglas",
               "last_name": "Fir",
               "age": 35,
               "about": "I like to build cabinets",
               "interests": [
                  "forestry"
               ]
            }
         }
      ]
   },
   "aggregations": {
      "all_interests": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "music",
               "doc_count": 2
            },
            {
               "key": "forestry",
               "doc_count": 1
            },
            {
               "key": "sports",
               "doc_count": 1
            }
         ]
      }
   }
}
           

報錯:Fielddata is disabled on text fields by default. Set fielddata=true on [your_field_name] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory.

解決:參考官方:https://www.elastic.co/guide/en/elasticsearch/reference/current/fielddata.html

在建立索引時,這個字段預設采用的是text 的類型,而該類型并不支援聚合(aggregations),因為這個如果load 到記憶體需要消耗很多空間。官方在設定打開前,特别提醒請考慮是否真的有必要打開。是以請自己考慮!!

是以我們打開一個fielddata  的參數:

PUT /megacorp/_mapping/employee
{
  "properties": {
    "interests": { 
      "type":     "text",
      "fielddata": true
    }
  }
}
           

4.2 條件聚合

  加上where條件的聚合,查找所有姓"Smith"的人最大的共同點(興趣愛好):

GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "last_name": "smith"
    }
  },
  "aggs": {
    "all_interests": {
      "terms": {
        "field": "interests"
      }
    }
  }
}
------------------------------------------------------

{
   "took": 9,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 2,
      "max_score": 0.2876821,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 0.2876821,
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 0.2876821,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [
                  "sports",
                  "music"
               ]
            }
         }
      ]
   },
   "aggregations": {
      "all_interests": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "music",
               "doc_count": 2
            },
            {
               "key": "sports",
               "doc_count": 1
            }
         ]
      }
   }
}
           

4.3 分級聚合

   統計每種興趣下職員的平均年齡:

GET /megacorp/employee/_search
{
    "aggs" : {
        "all_interests" : {
            "terms" : { "field" : "interests" },
            "aggs" : {
                "avg_age" : {
                    "avg" : { "field" : "age" }
                }
            }
        }
    }
}
------------------------------------------------------

{
   "took": 182,
   "timed_out": false,
   "_shards": {
      "total": 5,
      "successful": 5,
      "failed": 0
   },
   "hits": {
      "total": 3,
      "max_score": 1,
      "hits": [
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "2",
            "_score": 1,
            "_source": {
               "first_name": "Jane",
               "last_name": "Smith",
               "age": 32,
               "about": "I like to collect rock albums",
               "interests": [
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "1",
            "_score": 1,
            "_source": {
               "first_name": "John",
               "last_name": "Smith",
               "age": 25,
               "about": "I love to go rock climbing",
               "interests": [
                  "sports",
                  "music"
               ]
            }
         },
         {
            "_index": "megacorp",
            "_type": "employee",
            "_id": "3",
            "_score": 1,
            "_source": {
               "first_name": "Douglas",
               "last_name": "Fir",
               "age": 35,
               "about": "I like to build cabinets",
               "interests": [
                  "forestry"
               ]
            }
         }
      ]
   },
   "aggregations": {
      "all_interests": {
         "doc_count_error_upper_bound": 0,
         "sum_other_doc_count": 0,
         "buckets": [
            {
               "key": "music",
               "doc_count": 2,
               "avg_age": {
                  "value": 28.5
               }
            },
            {
               "key": "forestry",
               "doc_count": 1,
               "avg_age": {
                  "value": 35
               }
            },
            {
               "key": "sports",
               "doc_count": 1,
               "avg_age": {
                  "value": 25
               }
            }
         ]
      }
   }
}
           

官方文檔:https://www.elastic.co/guide/cn/elasticsearch/guide/current/index.html

繼續閱讀