Handling Relationships in Elasticsearch

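● In a relational database you usually normalize data; in Elasticsearch you usually denormalize it

● Benefits of denormalizing: faster reads / no table joins / no row locks

● Elasticsearch is not good at handling relationships. Four approaches are commonly used:

○ Object type

○ Nested object (Nested Object)

○ Parent/child relationship (Parent / Child)

○ Application-side joins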

DELETE blog

Define the mapping for the blog index

PUT /blog
{
  "mappings": {
    "properties": {
      "content": {
        "type": "text"
      },
      "time": {
        "type": "date"
      },
      "user": {
        "properties": {
          "city": {
            "type": "text"
          },
          "userid": {
            "type": "long"
          },
          "username": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

Index one blog document

PUT blog/_doc/1
{
  "content":"I like Elasticsearch",
  "time":"2019-01-01T00:00:00",
  "user":{
    "userid":1,
    "username":"Jack",
    "city":"Shanghai"
  }
}

Find posts whose content contains Elasticsearch and whose author is Jack

POST blog/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "content": "Elasticsearch"
          }
        },
        {
          "match": {
            "user.username": "Jack"
          }
        }
      ]
    }
  }
}


DELETE my_movies

Mapping for the movies index

PUT my_movies
{
  "mappings": {
    "properties": {
      "actors": {
        "properties": {
          "first_name": {
            "type": "keyword"
          },
          "last_name": {
            "type": "keyword"
          }
        }
      },
      "title": {
        "type": "text",
        "fields": {
          "keyword": {
            "type": "keyword",
            "ignore_above": 256
          }
        }
      }
    }
  }
}

Index one movie document

POST my_movies/_doc/1
{
  "title":"Speed",
  "actors":[
    {
      "first_name":"Keanu",
      "last_name":"Reeves"
    },
    {
      "first_name":"Dennis",
      "last_name":"Hopper"
    }
  ]
}

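We query the same way as before, looking for an actor with first_name Keanu and last_name Hopper (the search request below). It should return nothing, because no single actor has that name.

The document is returned anyway. When an object field is stored, the boundaries between the inner objects are not preserved: the JSON is flattened into key-value pairs.

For example: "title": "Speed"

"actors.first_name": ["Keanu", "Dennis"]

"actors.last_name": ["Reeves", "Hopper"]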
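The flattening described above can be imitated in plain Python to see why the mismatched name pair still matches. This is only a simplified sketch, not how Elasticsearch is implemented; the helper names `flatten` and `matches_all` are made up for illustration, and matching is reduced to exact value lookup.

```python
from collections import defaultdict


def flatten(doc, prefix=""):
    """Flatten a JSON document into {dotted_field: [values]}.

    Hypothetical helper: it mimics how an array of plain objects loses
    the boundary between its elements once it is indexed.
    """
    flat = defaultdict(list)
    for key, value in doc.items():
        path = f"{prefix}{key}"
        items = value if isinstance(value, list) else [value]
        for item in items:
            if isinstance(item, dict):
                # Merge the inner object's fields under the dotted path.
                for sub_path, sub_values in flatten(item, path + ".").items():
                    flat[sub_path].extend(sub_values)
            else:
                flat[path].append(item)
    return dict(flat)


def matches_all(flat_doc, conditions):
    """Simplified bool/must: every field must contain its expected value."""
    return all(value in flat_doc.get(field, []) for field, value in conditions.items())


movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

flat_movie = flatten(movie)
print(flat_movie)

# Keanu and Hopper belong to different actors, yet both values are present
# in the flattened document, so the combined condition is satisfied.
hit = matches_all(flat_movie, {"actors.first_name": "Keanu", "actors.last_name": "Hopper"})
print(hit)
```

Once the two actors are merged into per-field value lists, nothing records which first name went with which last name, so any combination of the stored values matches.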

The Nested data type solves this problem.

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "actors.first_name": "Keanu"
          }
        },
        {
          "match": {
            "actors.last_name": "Hopper"
          }
        }
      ]
    }
  }
}

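Nested Data Type

The nested data type allows each object in an array of objects to be indexed (stored) independently.

Internally, each nested object is stored as a Lucene document separate from its parent, and the two are joined at query time.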
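As a rough illustration of what independent indexing buys, the sketch below checks each actor object on its own, so all conditions must hold within a single object. The function name `nested_match` is hypothetical and the matching is exact equality, which is a simplification of what a real nested query does.

```python
def nested_match(doc, path, conditions):
    """Return True if at least one object under `path` meets every condition.

    Each object is evaluated separately, imitating nested objects that are
    kept as separate documents and joined back to the parent at query time.
    """
    return any(
        all(obj.get(field) == value for field, value in conditions.items())
        for obj in doc.get(path, [])
    )


movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

# No single actor is named Keanu Hopper, so this pair does not match.
wrong_pair = nested_match(movie, "actors", {"first_name": "Keanu", "last_name": "Hopper"})
# Keanu Reeves is one actor object, so this pair matches.
right_pair = nested_match(movie, "actors", {"first_name": "Keanu", "last_name": "Reeves"})
print(wrong_pair, right_pair)
```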

Recreate the index and set the type of actors to nested.

The fields of the nested object are declared under its properties.

DELETE my_movies
PUT my_movies
{
  "mappings": {
    "properties": {
      "actors": {
        "type": "nested",
        "properties": {
          "first_name": {
            "type": "keyword"
          },
          "last_name": {
            "type": "keyword"
          }
        }
      },
      "title": {
        "type": "text"
      }
    }
  }
}

POST my_movies/_doc/1
{
  "title":"Speed",
  "actors":[
    {
      "first_name":"Keanu",
      "last_name":"Reeves"
    },
    {
      "first_name":"Dennis",
      "last_name":"Hopper"
    }
  ]
}

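Nested objects must also be searched with a nested query that names the nested path.

Wrap the relevant part of the query in nested and set path to the nested field.

With a mismatched actor name (Keanu Hopper) no document is returned; with a correct one (Keanu Reeves) the document is found.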

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "Speed"
          }
        },
        {
          "nested": {
            "path": "actors",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "actors.first_name": "Keanu"
                    }
                  },
                  {
                    "match": {
                      "actors.last_name": "Hopper"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}

POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "Speed"
          }
        },
        {
          "nested": {
            "path": "actors",
            "query": {
              "bool": {
                "must": [
                  {
                    "match": {
                      "actors.first_name": "Keanu"
                    }
                  },
                  {
                    "match": {
                      "actors.last_name": "Reeves"
                    }
                  }
                ]
              }
            }
          }
        }
      ]
    }
  }
}
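The two requests above differ only in the last name, so the shared shape can be captured in a small builder. The sketch below only assembles the request body as a Python dict and prints it as JSON; it does not contact a cluster, and the function name `nested_must_query` is an assumption made for illustration.

```python
import json


def nested_must_query(title, path, field_values):
    """Build a bool query: match on title AND a nested bool/must under `path`.

    Hypothetical helper; `field_values` maps field names inside the nested
    object (without the path prefix) to the values to match.
    """
    nested_clauses = [
        {"match": {f"{path}.{field}": value}}
        for field, value in field_values.items()
    ]
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"title": title}},
                    {
                        "nested": {
                            "path": path,
                            "query": {"bool": {"must": nested_clauses}},
                        }
                    },
                ]
            }
        }
    }


body = nested_must_query(
    "Speed", "actors", {"first_name": "Keanu", "last_name": "Reeves"}
)
print(json.dumps(body, indent=2))
```

The printed JSON has the same structure as the request bodies above and could be pasted after `POST my_movies/_search` in the console.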

Nested Aggregation

Goal: group by the actors' first names (the first_name field).

If we write the aggregation the usual way, it does not work: no buckets come back.

POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actor_name": {
      "terms": {
        "field": "actors.first_name"
      }
    }
  }
}

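To aggregate over nested objects, first declare a nested aggregation with the path, then put the actual aggregation inside it as a sub-aggregation.

In the result, the sub-aggregation under nested produces buckets, while the top-level terms aggregation does not.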

POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actors": {
      "nested": {
        "path": "actors"
      },
      "aggs": {
        "actor_name": {
          "terms": {
            "field": "actors.first_name",
            "size": 10
          }
        }
      }
    },
    "actor_name": {
      "terms": {
        "field": "actors.first_name"
      }
    }
  }
}
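The difference between the two aggregations in the request above can be imitated with a simplified model in which the values of a nested field are visible only on the nested objects, not on the root document. This is a sketch under that assumption, with hypothetical helper names, not a description of Elasticsearch internals.

```python
from collections import Counter

movies = [
    {
        "title": "Speed",
        "actors": [
            {"first_name": "Keanu", "last_name": "Reeves"},
            {"first_name": "Dennis", "last_name": "Hopper"},
        ],
    }
]


def terms_on_root(docs, nested_path, field):
    """Terms aggregation over root documents only.

    Assumption of this sketch: fields under `nested_path` are not visible
    on the root document, so a dotted nested field yields no values here.
    """
    counts = Counter()
    for doc in docs:
        root_fields = {k: v for k, v in doc.items() if k != nested_path}
        if field in root_fields:
            counts[root_fields[field]] += 1
    return counts


def terms_in_nested(docs, nested_path, field):
    """Terms aggregation run inside the nested context: one count per nested object."""
    counts = Counter()
    for doc in docs:
        for obj in doc.get(nested_path, []):
            if field in obj:
                counts[obj[field]] += 1
    return counts


root_buckets = terms_on_root(movies, "actors", "actors.first_name")
nested_buckets = terms_in_nested(movies, "actors", "first_name")
print(dict(root_buckets))    # empty: the root document exposes no such field
print(dict(nested_buckets))  # one bucket per first name found in the nested objects
```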

https://www.elastic.co/guide/en/elasticsearch/reference/7.1/query-dsl-nested-query.html
