1. Search articles by user ID, hidden flag, article ID, and post date
(1) Insert some test article data
POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }
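To start, we use only four fields. ES stores JSON documents, so it is very extensible and flexible: if business requirements grow later and the document needs more fields, we can simply add them at any time. In a relational database such as MySQL, adding columns to an existing table is much more painful. It requires schema-altering statements and may also affect the application code.
The mapping that ES generated automatically for these documents: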
GET /forum/_mapping/article
{
"forum": {
"mappings": {
"article": {
"properties": {
"articleID": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"hidden": {
"type": "boolean"
},
"postDate": {
"type": "date"
},
"userID": {
"type": "long"
}
}
}
}
}
}
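As of ES 5.2, a dynamically mapped string field gets two fields by default. One is the field itself (e.g. articleID), of type text, which is analyzed. The other is field.keyword (articleID.keyword), which is not analyzed and has ignore_above set to 256: values longer than 256 characters are not indexed in this sub-field (they are skipped, not truncated).
(2) Search articles by user ID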
GET /forum/article/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"userID" : 1
}
}
}
}
}
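term filter/query: the search text is not analyzed. It is looked up in the inverted index exactly as you typed it.
For example, if the search text were analyzed, "hello world" --> "hello" and "world", and the two terms would each be looked up in the inverted index.
With term, "hello world" --> "hello world", and the inverted index is searched for the single term "hello world".
(3) Search for articles that are not hidden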
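The difference can be sketched with a toy inverted index in Python. This is only an illustration: `analyze`, `build_index`, `term_lookup`, and `match_lookup` are hypothetical helper names, and the regex tokenizer is a rough stand-in for real analysis, not what ES actually runs.

```python
import re

def analyze(text):
    # Rough stand-in for analysis (an assumption): split on
    # non-alphanumeric characters and lowercase each token.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

def build_index(docs):
    # Map each analyzed token to the set of document ids containing it.
    index = {}
    for doc_id, value in docs.items():
        for token in analyze(value):
            index.setdefault(token, set()).add(doc_id)
    return index

def term_lookup(index, text):
    # term: the search text is NOT analyzed, it is looked up as-is.
    return sorted(index.get(text, set()))

def match_lookup(index, text):
    # Analyzed search: each token is looked up separately (OR semantics).
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return sorted(hits)

docs = {1: "hello world", 2: "hello there"}
index = build_index(docs)

print(term_lookup(index, "hello world"))   # []      no single term "hello world" exists
print(term_lookup(index, "hello"))         # [1, 2]
print(match_lookup(index, "hello world"))  # [1, 2]
```

The index only contains the tokens "hello", "world", and "there", so an unanalyzed lookup of the whole string finds nothing.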
GET /forum/article/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"hidden" : false
}
}
}
}
}
(4) Search articles by post date
GET /forum/article/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"postDate" : "2017-01-01"
}
}
}
}
}
(5) Search articles by article ID
GET /forum/article/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"articleID" : "XHDK-A-1293-#fJ3"
}
}
}
}
}
The query returns no hits:
{
"took": 1,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 0,
"max_score": null,
"hits": []
}
}
Searching on articleID.keyword instead finds the article:
GET /forum/article/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"articleID.keyword" : "XHDK-A-1293-#fJ3"
}
}
}
}
}
{
"took": 2,
"timed_out": false,
"_shards": {
"total": 5,
"successful": 5,
"failed": 0
},
"hits": {
"total": 1,
"max_score": 1,
"hits": [
{
"_index": "forum",
"_type": "article",
"_id": "1",
"_score": 1,
"_source": {
"articleID": "XHDK-A-1293-#fJ3",
"userID": 1,
"hidden": false,
"postDate": "2017-01-01"
}
}
]
}
}
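articleID.keyword is a sub-field that recent ES versions create automatically, and it is not analyzed. So when an articleID value comes in, it is indexed twice: once as the field itself, which is analyzed, with the resulting tokens placed in the inverted index; and once as articleID.keyword, which is not analyzed, so the whole string goes into the inverted index as a single term (as long as it is no longer than 256 characters).
So when a term filter has to match a text field, consider using the built-in field.keyword sub-field. There is one catch: by default, values longer than 256 characters are not indexed in that sub-field. It is therefore better to create the mapping yourself and specify not_analyzed. In ES 5.x you no longer need to specify not_analyzed; setting type=keyword is enough.
(6) Inspect the analysis result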
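A small Python sketch of how ignore_above behaves on the keyword sub-field, under the assumption that the limit is a simple character count. `keyword_terms` is a hypothetical helper, not an ES API. The point is that an over-long value is skipped entirely rather than cut down to its first 256 characters.

```python
IGNORE_ABOVE = 256  # default on the auto-generated .keyword sub-field

def keyword_terms(value, ignore_above=IGNORE_ABOVE):
    # Hypothetical model of a keyword field: the whole value becomes one
    # term, unless it is longer than ignore_above, in which case nothing
    # is indexed for it (it is NOT truncated).
    if len(value) > ignore_above:
        return []
    return [value]

short_id = "XHDK-A-1293-#fJ3"
long_value = "x" * 257

print(keyword_terms(short_id))    # ['XHDK-A-1293-#fJ3']
print(keyword_terms(long_value))  # []
```

A term filter on field.keyword can therefore never match a document whose value exceeds the limit, which is why an explicit type=keyword mapping is the safer choice.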
GET /forum/_analyze
{
"field": "articleID",
"text": "XHDK-A-1293-#fJ3"
}
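A text field is analyzed by default, so when the inverted index is built, every articleID is tokenized. After tokenization the original articleID no longer exists in the inverted index; only the individual tokens do.
term does not analyze the search text: XHDK-A-1293-#fJ3 --> XHDK-A-1293-#fJ3. But when articleID was indexed, XHDK-A-1293-#fJ3 --> xhdk, a, 1293, fj3, so the search term matches nothing.
(7) Rebuild the index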
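The mismatch can be reproduced offline with the four test articles. In this sketch the regex in `analyze` is only an approximation of the default analyzer (an assumption), and `text_index` / `keyword_index` are hypothetical stand-ins for the inverted indexes of an analyzed field and a keyword field.

```python
import re

def analyze(text):
    # Approximation of the default analyzer (assumption): split on
    # non-alphanumeric characters, then lowercase.
    return [t.lower() for t in re.findall(r"[A-Za-z0-9]+", text)]

article_ids = {
    1: "XHDK-A-1293-#fJ3",
    2: "KDKE-B-9947-#kL5",
    3: "JODL-X-1937-#pV7",
    4: "QQPX-R-3956-#aD8",
}

text_index = {}     # analyzed field: one entry per token
keyword_index = {}  # keyword field: one entry per whole value
for doc_id, value in article_ids.items():
    for token in analyze(value):
        text_index.setdefault(token, []).append(doc_id)
    keyword_index.setdefault(value, []).append(doc_id)

query = "XHDK-A-1293-#fJ3"
print(analyze(query))                # ['xhdk', 'a', '1293', 'fj3']
print(text_index.get(query, []))     # []   the full ID is not a term in the analyzed index
print(text_index.get("xhdk", []))    # [1]  only individual tokens can be matched
print(keyword_index.get(query, []))  # [1]  the keyword index holds the exact value
```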
DELETE /forum
PUT /forum
{
"mappings": {
"article": {
"properties": {
"articleID": {
"type": "keyword"
}
}
}
}
}
POST /forum/article/_bulk
{ "index": { "_id": 1 }}
{ "articleID" : "XHDK-A-1293-#fJ3", "userID" : 1, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 2 }}
{ "articleID" : "KDKE-B-9947-#kL5", "userID" : 1, "hidden": false, "postDate": "2017-01-02" }
{ "index": { "_id": 3 }}
{ "articleID" : "JODL-X-1937-#pV7", "userID" : 2, "hidden": false, "postDate": "2017-01-01" }
{ "index": { "_id": 4 }}
{ "articleID" : "QQPX-R-3956-#aD8", "userID" : 2, "hidden": true, "postDate": "2017-01-02" }
(8) Search again by article ID and post date
GET /forum/article/_search
{
"query" : {
"constant_score" : {
"filter" : {
"term" : {
"articleID" : "XHDK-A-1293-#fJ3"
}
}
}
}
}
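2. Summary of what we learned
(1) term filter: searches by exact value; numbers, booleans, and dates support it natively
(2) A string field must be mapped as not_analyzed (type=keyword in ES 5.x) when the index is created before a term query can match its full value
(3) Equivalent to a single WHERE condition in SQL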
select *
from forum.article
where articleID='XHDK-A-1293-#fJ3'
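To make the correspondence with the SQL condition concrete, here is a minimal Python sketch that builds the same request body used in the searches above from a field/value pair. `term_filter` is a hypothetical helper name; it only constructs the JSON and does not talk to a cluster.

```python
import json

def term_filter(field, value):
    # Build a constant_score + term filter body, the rough counterpart of
    # "WHERE field = value" (hypothetical helper, not a client API).
    return {
        "query": {
            "constant_score": {
                "filter": {
                    "term": {field: value}
                }
            }
        }
    }

body = term_filter("articleID", "XHDK-A-1293-#fJ3")
print(json.dumps(body, indent=2))
```

The printed JSON can be sent as the body of GET /forum/article/_search.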
Reference:
Elasticsearch進階實戰 (Elasticsearch Advanced in Practice)