Index Modules: ICU Analysis Plugin (ICU Normalization, ICU Folding, ICU Collation)

ICU Analysis Plugin

The ICU analysis plugin provides Unicode normalization, collation, and folding. The plugin is called elasticsearch-analysis-icu.

The plugin includes the following analysis components:

ICU Normalization

Normalizes characters as explained here. It registers itself by default under icu_normalizer or icuNormalizer using the default settings. It accepts a name parameter, which can take one of the following values: nfc, nfkc, and nfkc_cf. Here is a sample setting:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "normalization" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_normalizer"]
                }
            }
        }
    }
}      
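
To make the three name values more concrete, here is a minimal Python sketch that approximates them with the standard library's unicodedata module. It does not call ICU or Elasticsearch; the function name normalize is hypothetical, and the nfkc_cf branch is an assumption (NFKC plus case folding) that ignores some details of the real mapping, such as removal of default-ignorable code points.

```python
import unicodedata


def normalize(text: str, name: str) -> str:
    """Approximate the nfc, nfkc, and nfkc_cf modes (illustration only, not ICU)."""
    if name == "nfc":
        # Canonical composition: "e" + combining acute becomes a single "é".
        return unicodedata.normalize("NFC", text)
    if name == "nfkc":
        # Compatibility composition: also maps ligatures and full-width forms.
        return unicodedata.normalize("NFKC", text)
    if name == "nfkc_cf":
        # Assumption: NFKC, then case folding, then NFKC again.
        # The real NFKC_Casefold mapping also drops default-ignorable code points.
        folded = unicodedata.normalize("NFKC", text).casefold()
        return unicodedata.normalize("NFKC", folded)
    raise ValueError(f"unsupported normalization name: {name}")
```

For example, normalize("\uff21BC", "nfkc") turns the full-width "Ａ" into a plain "A" and returns "ABC", while the same input with "nfkc_cf" is additionally case folded to "abc".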

ICU Folding

Folds Unicode characters based on UTR#30. It registers itself under the names icu_folding and icuFolding.

The filter also does lowercasing, which means the lowercase filter can normally be left out. Sample setting:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "folding" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_folding"]
                }
            }
        }
    }
}      
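
As a rough illustration of what folding does to a token, the following Python sketch strips combining marks and case folds the result. This is only an approximation built on the standard library, not the UTR#30 folding that ICU implements, and the function name fold is hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Rough stand-in for folding: remove accents, then case fold (not UTR#30)."""
    # Decompose so that accents become separate combining characters.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop every character with a non-zero canonical combining class.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Case fold, which is why a separate lowercase step is usually unnecessary.
    return unicodedata.normalize("NFKC", stripped.casefold())
```

With this sketch, fold("Résumé") returns "resume", so a query for either spelling would match the same indexed token.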

Filtering

Folding can be restricted to a set of Unicode characters with the unicodeSetFilter parameter. This is useful for a non-internationalized search engine that needs to retain a set of national characters that are primary letters in a specific language. See syntax for the UnicodeSet here.

The following example exempts Swedish characters from folding. Note that the filtered characters are NOT lowercased, which is why the lowercase filter is added after the folding filter below.

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "folding" : {
                    "tokenizer" : "standard",
                    "filter" : ["my_icu_folding", "lowercase"]
                }
            },
            "filter" : {
                "my_icu_folding" : {
                    "type" : "icu_folding",
                    "unicodeSetFilter" : "[^åäöÅÄÖ]"
                }
            }
        }
    }
}      
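
The interaction between the exemption set and the trailing lowercase filter can be sketched in Python as follows. This is an approximation using the standard library rather than ICU; the names SWEDISH, fold_char, fold_except, and analyze are hypothetical, and the per-character folding is the same accent-stripping simplification, not UTR#30.

```python
import unicodedata

# Precomposed Swedish letters å ä ö Å Ä Ö, written as escapes to avoid ambiguity.
SWEDISH = "\u00e5\u00e4\u00f6\u00c5\u00c4\u00d6"


def fold_char(ch: str) -> str:
    """Approximate folding of one character: strip accents, then case fold."""
    decomposed = unicodedata.normalize("NFKD", ch)
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()


def fold_except(text: str, exempt: str = SWEDISH) -> str:
    """Fold every character except those in the exempt set, which pass through as-is."""
    return "".join(ch if ch in exempt else fold_char(ch) for ch in text)


def analyze(text: str) -> str:
    """Mimic the filter chain above: filtered folding, then lowercase."""
    return fold_except(text).lower()
```

Here fold_except("Åsa Évora") keeps the exempt "Å" untouched, including its case, and returns "Åsa evora"; only the subsequent lowercase step in analyze produces "åsa evora".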

ICU Collation

Uses the collation token filter. You can either specify the collation rules (defined here) using the rules parameter (which can point to a location or be expressed inline in the settings; the location can be relative to the config location), or use the language parameter (further specialized by country and variant). By default, it registers under icu_collation or icuCollation and uses the default locale.

Here is a sample setting:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["icu_collation"]
                }
            }
        }
    }
}      

And here is a sample of custom collation:

{
    "index" : {
        "analysis" : {
            "analyzer" : {
                "collation" : {
                    "tokenizer" : "keyword",
                    "filter" : ["myCollator"]
                }
            },
            "filter" : {
                "myCollator" : {
                    "type" : "icu_collation",
                    "language" : "en"
                }
            }
        }
    }
}
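
To show why sorting on a collation key differs from sorting on raw code points, here is a toy Python sketch. It is not ICU collation and does not follow any locale's rules; toy_sort_key is a hypothetical helper that simply compares accent-stripped, case-folded text first and uses the original string as a tie-breaker.

```python
import unicodedata


def toy_sort_key(text: str) -> tuple:
    """Toy stand-in for a collation key (assumption: not locale-aware, not ICU)."""
    decomposed = unicodedata.normalize("NFKD", text)
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Primary comparison ignores accents and case; the original text breaks ties.
    return (base.casefold(), text)


words = ["zebra", "\u00c4pfel", "apple", "Zoo"]

# Plain code point order puts uppercase first and accented letters last.
by_code_point = sorted(words)

# Ordering by the toy key groups words by their base letters instead.
by_toy_key = sorted(words, key=toy_sort_key)
```

In this sketch by_code_point is ["Zoo", "apple", "zebra", "Äpfel"], whereas by_toy_key is ["Äpfel", "apple", "zebra", "Zoo"]. A real collator applies language-specific rules, which is what the language and rules parameters select.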


