Using Elasticsearch 7.x (Part 5): The ICU Analysis Plugin

2023-04-13 20:53:24

1. List the installed analysis plugins (run from the Elasticsearch bin directory)

$ ./elasticsearch-plugin list
analysis-icu
analysis-ik
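If a plugin is missing from this list, it can normally be installed with `./elasticsearch-plugin install analysis-icu`, followed by a node restart. As a small sketch, the check can also be scripted. The helper name `missing_plugins` and the `REQUIRED` set below are illustrative assumptions, not part of Elasticsearch; the function only parses text in the one-name-per-line format shown above.

```python
# Plugins this article relies on (an assumption for illustration).
REQUIRED = {"analysis-icu", "analysis-ik"}


def missing_plugins(list_output, required=REQUIRED):
    """Return the required plugin names absent from the command output."""
    # Each non-empty line of `elasticsearch-plugin list` is one plugin name.
    installed = {line.strip() for line in list_output.splitlines() if line.strip()}
    return sorted(set(required) - installed)


sample = "analysis-icu\nanalysis-ik\n"
print(missing_plugins(sample))           # prints []
print(missing_plugins("analysis-ik\n"))  # prints ['analysis-icu']
```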

The following introduction to ICU analysis is quoted from the web:

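The ICU Analysis plugin is a set of libraries that integrates the Lucene ICU module into Elasticsearch. In essence, ICU adds broader support for Unicode and globalization, which gives better text segmentation for Asian languages. From the Elasticsearch point of view, the plugin provides new text analysis components, such as the `icu_analyzer` analyzer used later in this article and the `icu_tokenizer` tokenizer.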

Common analyzers

1) Built-in analyzers
GET _analyze
{
  "text": ["他是一个前端开发工程师"],
  "analyzer": "standard"
}

GET _analyze
{
  "text": ["他是一个前端开发工程师"],
  "analyzer": "keyword"
}

2) IK analyzer
GET _analyze
{
  "text": ["他是一个前端开发工程师"],
  "analyzer": "ik_max_word"
}

3) ICU analyzer
GET _analyze
{
  "text": ["他是一个前端开发工程师"],
  "analyzer": "icu_analyzer"
}
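The same comparison can be driven from a script instead of the Kibana console. The following is a minimal sketch that assumes an unsecured node at `http://localhost:9200` (an assumption, adjust for your cluster) and uses only the Python standard library. The function names are illustrative, and the network call is only made when `compare` is invoked.

```python
import json
import urllib.request

# Assumption: an unsecured local node; change this for your cluster.
ES_URL = "http://localhost:9200"

# The analyzers exercised in the requests above.
ANALYZERS = ["standard", "keyword", "ik_max_word", "icu_analyzer"]


def build_analyze_body(text, analyzer):
    """Return the JSON body used by the _analyze requests above."""
    return {"text": [text], "analyzer": analyzer}


def analyze(text, analyzer, base_url=ES_URL):
    """POST the body to <base_url>/_analyze and return the token strings."""
    data = json.dumps(build_analyze_body(text, analyzer)).encode("utf-8")
    request = urllib.request.Request(
        base_url + "/_analyze",
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [t["token"] for t in payload["tokens"]]


def compare(text, analyzers=ANALYZERS, base_url=ES_URL):
    """Map each analyzer name to the tokens it produces for the same text."""
    return {name: analyze(text, name, base_url) for name in analyzers}
```

Calling `compare("他是一个前端开发工程师")` against a node with both plugins installed returns one token list per analyzer, which makes the differences easy to see side by side.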

ICU analyzer test result:

{
  "tokens" : [
    {
      "token" : "他是",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "一个",
      "start_offset" : 2,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "前端",
      "start_offset" : 4,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "开发",
      "start_offset" : 6,
      "end_offset" : 8,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "工程",
      "start_offset" : 8,
      "end_offset" : 10,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "师",
      "start_offset" : 10,
      "end_offset" : 11,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    }
  ]
}
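To read the offsets in this response: each `start_offset`/`end_offset` pair marks the slice of the input that produced the token. The sketch below copies the six tokens from the response and checks that claim in Python. It assumes the text contains only characters from the Basic Multilingual Plane, where the offsets (as far as I know, counted in UTF-16 code units) coincide with Python string indices. The helper names are illustrative.

```python
TEXT = "他是一个前端开发工程师"

# (token, start_offset, end_offset) copied from the response above.
TOKENS = [
    ("他是", 0, 2),
    ("一个", 2, 4),
    ("前端", 4, 6),
    ("开发", 6, 8),
    ("工程", 8, 10),
    ("师", 10, 11),
]


def check_offsets(text, tokens):
    """Return True if every token equals the slice its offsets point at."""
    return all(text[start:end] == token for token, start, end in tokens)


def covers_text(text, tokens):
    """Return True if the tokens tile the text with no gaps or overlaps."""
    position = 0
    for _, start, end in tokens:
        if start != position:
            return False
        position = end
    return position == len(text)


print(check_offsets(TEXT, TOKENS))  # prints True
print(covers_text(TEXT, TOKENS))    # prints True
```

Here the ICU analyzer splits the sentence into five two-character words plus the single character "师", and the tokens cover the whole input.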
