Preface

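Elasticsearch provides powerful search capabilities, but relevance scoring alone is sometimes not enough. For those cases it offers the function_score query, which lets you customize how documents are scored. This article walks through function_score with simple examples. The most authoritative reference for the function-score-query remains the official documentation: the function_score reference.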

Preparing the Base Data

We create a news index with the following fields:

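Field	Type	Description
id	long	News ID
title	text	Title
tags	keyword	Tags
read_count	long	Number of reads
like_count	long	Number of likes
comment_count	long	Number of comments
rank	double	Custom weight
location	geo_point	Longitude and latitude where the article was published
pub_time	date	Publication time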

Create the Elasticsearch mapping:

PUT /news
{
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "title": {
        "type": "text",
        "analyzer": "standard"
      },
      "tags": {
        "type": "keyword"
      },
      "read_count": {
        "type": "long"
      },
      "like_count": {
        "type": "long"
      },
      "comment_count": {
        "type": "long"
      },
      "rank": {
        "type": "double"
      },
      "location": {
        "type": "geo_point"
      },
      "pub_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss||yyyy-MM-dd HH:mm||yyyy-MM-dd||epoch_millis"
      }
    }
  }
}

Prepare the test data:

id	title	tags	read_count	like_count	comment_count	rank	location (lon,lat)	pub_time
1	台風“杜蘇芮”登陸福建晉江 多部門多地全力應對	台風;杜蘇芮;福建	10000	2000	600	0.0	118.55199,24.78144	2023-07-29 09:47
2	受台風“杜蘇芮”影響 北京7月29日至8月1日将有強降雨	台風;杜蘇芮;北京	1000	200	60	0.0	116.23128,40.22077	2023-06-29 14:49:38
3	杭州解除台風藍色預警信号	台風;杭州	100	20	6	0.99	120.21201,30.208	2020-07-29 14:49:38

Bulk-load the data into Elasticsearch:

POST _bulk
{"create": {"_index": "news", "_id": 1}}
{"comment_count":600,"id":1,"like_count":2000,"location":[118.55199,24.78144],"pub_time":"2023-07-29 09:47","rank":0.0,"read_count":10000,"tags":["台風","杜蘇芮","福建"],"title":"台風“杜蘇芮”登陸福建晉江 多部門多地全力應對"}
{"create": {"_index": "news", "_id": 2}}
{"comment_count":60,"id":2,"like_count":200,"location":[116.23128,40.22077],"pub_time":"2023-06-29 14:49:38","rank":0.0,"read_count":1000,"tags":["台風","杜蘇芮","北京"],"title":"受台風“杜蘇芮”影響 北京7月29日至8月1日将有強降雨"}
{"create": {"_index": "news", "_id": 3}}
{"comment_count":6,"id":3,"like_count":20,"location":[120.21201,30.208],"pub_time":"2020-07-29 14:49:38","rank":0.99,"read_count":100,"tags":["台風","杭州"],"title":"杭州解除台風藍色預警信号"}

Using random_score

We use random_score to see what weight, score_mode, and boost_mode each do. Let's start with a demo:

GET /news/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": { "title": "台風" }
      },
      "functions": [
        {
          "random_score": {},
          "weight": 1
        },
        {
          "filter": { "match": { "title": "杭州" } },
          "weight": 42
        }
      ],
      "score_mode": "sum",
      "boost_mode": "replace"
    }
  }
}

The corresponding Java query code:

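import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.common.lucene.search.function.CombineFunction;
import org.elasticsearch.common.lucene.search.function.FunctionScoreQuery;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.index.query.functionscore.FunctionScoreQueryBuilder;
import org.elasticsearch.index.query.functionscore.ScoreFunctionBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;

        // restClient is assumed to be an initialized RestHighLevelClient.
        QueryBuilder baseQuery = QueryBuilders.matchQuery("title", "台風");
        FunctionScoreQueryBuilder.FilterFunctionBuilder[] functions = {
                // random_score with weight 1, applied to every hit (no seed, as in the JSON query)
                new FunctionScoreQueryBuilder.FilterFunctionBuilder(ScoreFunctionBuilders.randomFunction().setWeight(1)),
                // extra weight of 42 when the title also matches "杭州"
                new FunctionScoreQueryBuilder.FilterFunctionBuilder(
                        QueryBuilders.matchQuery("title", "杭州"),
                        ScoreFunctionBuilders.weightFactorFunction(42))
        };
        FunctionScoreQueryBuilder query = QueryBuilders.functionScoreQuery(baseQuery, functions)
                .scoreMode(FunctionScoreQuery.ScoreMode.SUM)
                .boostMode(CombineFunction.REPLACE);
        SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder().query(query);
        SearchRequest searchRequest = new SearchRequest("news")
                .searchType(SearchType.DFS_QUERY_THEN_FETCH)
                .source(searchSourceBuilder);
        SearchResponse response = restClient.search(searchRequest, RequestOptions.DEFAULT);
        for (SearchHit hit : response.getHits()) {
            System.out.println(hit.getScore() + " " + hit.getSourceAsString());
        }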

Query results:

[Figure: search results screenshot, captioned "A Hands-On Guide to Custom Scoring with Elasticsearch function_score"]

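This query uses function_score. The inner query searches title for "台風", and functions adds two scoring functions: a random_score that generates a random score with a weight of 1, and a bare weight of 42 that applies only when the title also matches "杭州".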

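random_score, as the name suggests, generates a random score between 0 and 1 (0 inclusive, 1 exclusive). One use case that comes to mind: one day the product team demands that every user sees different news, a "personalized feed for everyone", and gives you a single day to build it. With random_score, every fetch returns the data in a random order, so each user sees different news. This is far simpler than implementing random ordering in MySQL and delivers the "personalized" effect at practically zero cost.
weight multiplies the score a function produces by the given factor. In the demo above, the first function has weight=1 and the second has weight=42. In the search results, "杭州解除台風藍色預警信号" scores 42.40192 while the hit below it scores only 0.8194501, because only the former matches the filter and, with score_mode set to sum, receives the additional 42 points.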
score_mode

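score_mode controls how the scores produced by the individual entries in functions are combined. Here I used sum, which adds the random_score result to the 42 points from the filter function: the top score of 42.40192 is 0.40192 from random_score plus 42 from the filter. The default score_mode is multiply, and there are six options in total: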

score_mode	How scores are combined
multiply	scores are multiplied (default)
sum	scores are summed
avg	scores are averaged
first	the first function that has a matching filter is applied
max	maximum score is used
min	minimum score is used
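
To make the table concrete, here is a minimal Java sketch of how four of the modes fold the per-function scores together. It is an illustration under assumptions, not Elasticsearch's implementation: the class and method names are hypothetical, each input is taken to be a function's score already multiplied by its weight, and avg and first are left out because avg is a weighted average and first depends on which filter matches.

```java
import java.util.function.DoubleBinaryOperator;

// Illustrative only: mimics how score_mode folds several function scores into one.
// Names are hypothetical and not part of any Elasticsearch API.
class ScoreModeSketch {
    enum ScoreMode {
        MULTIPLY((a, b) -> a * b),
        SUM(Double::sum),
        MAX(Math::max),
        MIN(Math::min);

        final DoubleBinaryOperator op;

        ScoreMode(DoubleBinaryOperator op) {
            this.op = op;
        }
    }

    // Each input is one function's score already multiplied by its weight.
    static double combine(ScoreMode mode, double... weightedScores) {
        double result = weightedScores[0];
        for (int i = 1; i < weightedScores.length; i++) {
            result = mode.op.applyAsDouble(result, weightedScores[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        double random = 0.40192 * 1; // random_score value from the demo, weight 1
        double filter = 1.0 * 42;    // bare weight function: 1 times weight 42
        System.out.println("sum      = " + combine(ScoreMode.SUM, random, filter));
        System.out.println("multiply = " + combine(ScoreMode.MULTIPLY, random, filter));
    }
}
```

With the demo's numbers, sum reproduces the 42.40192 seen for the Hangzhou article, while the default multiply would have produced a much smaller value for the same hit.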

boost_mode

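boost_mode controls how the combined score from functions is merged with the score of the query. With replace, as used here, the functions score completely replaces the query score. There are six boost_mode options in total: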

boost_mode	How the score is computed
multiply	query score and function score are multiplied (default)
replace	only function score is used, the query score is ignored
sum	query score and function score are added
avg	average
max	max of query score and function score
min	min of query score and function score
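
The six boost_mode options can be sketched in a few lines of Java. This is a hedged illustration with hypothetical names, not an Elasticsearch API; the query score of 0.5 in main is a made-up value used only to show the arithmetic.

```java
// Illustrative only: mimics how boost_mode merges the query score with the
// combined function score. Names are hypothetical, not an Elasticsearch API.
class BoostModeSketch {
    enum BoostMode { MULTIPLY, REPLACE, SUM, AVG, MAX, MIN }

    static double apply(BoostMode mode, double queryScore, double functionScore) {
        switch (mode) {
            case MULTIPLY: return queryScore * functionScore;
            case REPLACE:  return functionScore; // query score is ignored
            case SUM:      return queryScore + functionScore;
            case AVG:      return (queryScore + functionScore) / 2.0;
            case MAX:      return Math.max(queryScore, functionScore);
            case MIN:      return Math.min(queryScore, functionScore);
            default:       throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }

    public static void main(String[] args) {
        double queryScore = 0.5;         // made-up relevance score of the match query
        double functionScore = 42.40192; // combined function score from the demo
        for (BoostMode mode : BoostMode.values()) {
            System.out.println(mode + " -> " + apply(mode, queryScore, functionScore));
        }
    }
}
```

Because the random_score demo uses replace, the final score equals the function score regardless of how well the title matched the query text.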

Using script_score

With script_score, the user can apply arbitrary functions to fields in the document and compute whatever score they need. Let's go straight to a demo:

GET /news/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": { "title": "台風" }
      },
      "script_score": {
        "script": {
          "params": {
            "readCount": 1,
            "likeCount":5,
            "commentCount":10
          },
          "source": "Math.log(params.readCount* doc['read_count'].value +params.likeCount* doc['like_count'].value+params.commentCount* doc['comment_count'].value) "
        }
      },
      "boost_mode": "multiply"
    }
  }
}

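Every news article has a read count, a like count, and a comment count. We can derive a score from these three metrics to measure how hot an article is, then multiply that hotness by the query score so that hotter articles rank higher. This demo uses simple weights: read counts are generally the largest, like counts come next, and comment counts are the smallest, so the rarer signals get the larger weights (1, 5, and 10 respectively).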

$$\text{hotness} = \ln(\text{comment\_count} \times 10 + \text{like\_count} \times 5 + \text{read\_count})$$

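The hotness calculation here is deliberately simple for demonstration. A real formula would be considerably more complex, and different categories of articles may carry different importance.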
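
To see what the script yields for the three test documents, the same expression can be evaluated locally. The sketch below is plain Java with hypothetical names, uses the counters from the bulk request, and relies on Math.log being the natural logarithm, as it is in the script.

```java
// Illustrative only: evaluates the demo's script_score expression outside
// Elasticsearch. Class and method names are hypothetical.
class HotnessSketch {
    // Same expression as the script: natural log of the weighted sum of counters.
    static double hotness(long readCount, long likeCount, long commentCount) {
        return Math.log(1 * readCount + 5 * likeCount + 10 * commentCount);
    }

    public static void main(String[] args) {
        // Counters taken from the bulk request: read, like, comment.
        System.out.println("doc 1: " + hotness(10000, 2000, 600));
        System.out.println("doc 2: " + hotness(1000, 200, 60));
        System.out.println("doc 3: " + hotness(100, 20, 6));
    }
}
```

The weighted sums are 26000, 2600, and 260, so the hotness values come out at roughly 10.17, 7.86, and 5.56: the logarithm compresses a 100x difference in raw counts into less than a 2x difference in score. One caveat worth checking against your Elasticsearch version: a document whose three counters are all zero gives Math.log(0), which is negative infinity, and recent versions reject negative scores from script_score, so adding 1 inside the logarithm is a common safeguard.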

Using field_value_factor

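field_value_factor can be thought of as a set of built-in script_score functions that Elasticsearch provides. Writing a script_score every time is not very convenient, so having built-in functions that work out of the box is much handier. Let's go straight to a demo: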

GET /news/_search
{
    "query": {
        "function_score": {
            "query": {
                "match": {
                    "title": "台風"
                }
            },
            "field_value_factor": {
                "field": "rank",
                "factor": 10,
                "modifier": "sqrt",
                "missing": 1
            },
            "boost_mode": "multiply"
        }
    }
}

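Here field_value_factor is equivalent to a script_score of sqrt(10 * doc['rank'].value). factor is the multiplier applied to the field value (default 1), missing is the value used when a document does not have the field (1 here), modifier is the function applied to the result, and field is the field to read.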

The following modifier functions are available:

modifier	How the score is computed
none	Do not apply any multiplier to the field value
log	Take the common logarithm of the field value. Because this function will return a negative value and cause an error if used on values between 0 and 1, it is recommended to use log1p instead.
log1p	Add 1 to the field value and take the common logarithm
log2p	Add 2 to the field value and take the common logarithm
ln	Take the natural logarithm of the field value. Because this function will return a negative value and cause an error if used on values between 0 and 1, it is recommended to use ln1p instead.
ln1p	Add 1 to the field value and take the natural logarithm
ln2p	Add 2 to the field value and take the natural logarithm
square	Square the field value (multiply it by itself)
sqrt	Take the square root of the field value
reciprocal	Reciprocate the field value, same as 1/x where x is the field’s value
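
A small Java sketch can show how factor, modifier, and missing interact. It is an illustration with hypothetical names covering only some of the modifiers, and it assumes the order modifier(factor * value), with missing substituted for the field value before the factor is applied; verify that against the official documentation for your version.

```java
import java.util.function.DoubleUnaryOperator;

// Illustrative only: mimics field_value_factor for a handful of modifiers.
// Names are hypothetical and not part of any Elasticsearch API.
class FieldValueFactorSketch {
    enum Modifier {
        NONE(v -> v),
        LOG1P(v -> Math.log10(v + 1)), // common logarithm of (value + 1)
        LN1P(v -> Math.log1p(v)),      // natural logarithm of (value + 1)
        SQUARE(v -> v * v),
        SQRT(Math::sqrt),
        RECIPROCAL(v -> 1.0 / v);

        final DoubleUnaryOperator fn;

        Modifier(DoubleUnaryOperator fn) {
            this.fn = fn;
        }
    }

    // fieldValue == null stands for a document that lacks the field.
    static double score(Double fieldValue, double factor, Modifier modifier, double missing) {
        double value = (fieldValue != null) ? fieldValue : missing;
        // Assumed order: multiply by factor first, then apply the modifier.
        return modifier.fn.applyAsDouble(factor * value);
    }

    public static void main(String[] args) {
        // Demo settings: field rank, factor 10, modifier sqrt, missing 1.
        System.out.println("rank 0.99 -> " + score(0.99, 10, Modifier.SQRT, 1));
        System.out.println("rank 0.0  -> " + score(0.0, 10, Modifier.SQRT, 1));
        System.out.println("no rank   -> " + score(null, 10, Modifier.SQRT, 1));
    }
}
```

With the test data, document 3 (rank 0.99) gets a factor of about 3.15, while documents 1 and 2 have rank 0.0 and therefore a factor of 0. Since the demo uses boost_mode multiply, those two hits end up with a final score of 0, and missing does not help because the field is present. That is worth keeping in mind when choosing default values for a rank field.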

Using Decay Functions

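A decay function scores a document by the distance between one of its fields and a given value: the closer the distance, the higher the score, and the farther, the lower. This fits decay by publication time well: the older a news article, the lower its score and the further down it ranks. Let's go straight to a demo: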

GET /news/_search
{
    "query": {
        "function_score": {
            "query": {
                "match": {
                    "title": "台風"
                }
            },
            "functions": [
                {
                    "gauss": {
                        "pub_time": {
                            "origin": "now",
                            "offset": "7d",
                            "scale": "60d",
                            "decay": 0.9
                        }
                    }
                },
                {
                    "exp": {
                        "location": {
                            "origin": {
                                "lat": 30.25308,
                                "lon": 120.21551
                            },
                            "offset": "50km",
                            "scale": "50km",
                            "decay": 0.1
                        }
                    }
                }
            ],
            "score_mode": "sum", 
            "boost_mode": "sum"
        }
    }
}

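Search results:

[Figure: search results]

There are three decay functions: gauss (Gaussian), linear, and exp (exponential). The exact formulas are in the official documentation; here we focus on what the four parameters do.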

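origin is the reference point from which distance is measured. Above, the origin for publication time is the current time, and the origin for the geo decay is the user's search location: if I am in Hangzhou, origin is the coordinates of Hangzhou.
offset is the distance within which no decay is applied. In the demo, pub_time has an offset of 7d, meaning news published within the last 7 days is not decayed and gets a score of 1. The geo offset of 50km means news within 50 km of the user is not decayed; news within 50 km is basically all local Hangzhou news, so decaying it is unnecessary.
scale and decay are best understood from the three decay curves in the official documentation: at a distance of scale beyond the offset, the score has decayed to decay times its original value. For the time decay, offset=7d, scale=60d, decay=0.9 together mean that news from the last 7 days is not decayed and news from 67 days ago (7d+60d) scores 0.9. For the geo decay, offset=50km, scale=50km, decay=0.1 mean that distances within 50 km are not decayed and documents 100 km away (50km+50km) score 0.1.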
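
The parameter descriptions above can be checked numerically. The following Java sketch implements the three decay curves as I understand them from the official documentation; the class and method names are hypothetical, and distances are plain numbers (days or kilometres) rather than Elasticsearch date or geo values.

```java
// Illustrative only: the gauss, exp, and linear decay curves on plain numbers.
// Names are hypothetical and not part of any Elasticsearch API.
class DecaySketch {
    // Distance that counts toward decay: anything inside the offset is free.
    static double effective(double distance, double offset) {
        return Math.max(0.0, Math.abs(distance) - offset);
    }

    static double gauss(double distance, double offset, double scale, double decay) {
        // sigma^2 is chosen so that the score equals decay at offset + scale.
        double sigmaSquared = -(scale * scale) / (2.0 * Math.log(decay));
        double d = effective(distance, offset);
        return Math.exp(-(d * d) / (2.0 * sigmaSquared));
    }

    static double exp(double distance, double offset, double scale, double decay) {
        double lambda = Math.log(decay) / scale;
        return Math.exp(lambda * effective(distance, offset));
    }

    static double linear(double distance, double offset, double scale, double decay) {
        double s = scale / (1.0 - decay);
        return Math.max(0.0, (s - effective(distance, offset)) / s);
    }

    public static void main(String[] args) {
        // Time decay from the demo: offset 7 days, scale 60 days, decay 0.9.
        System.out.println("gauss,  3 days old: " + gauss(3, 7, 60, 0.9));
        System.out.println("gauss, 67 days old: " + gauss(67, 7, 60, 0.9));
        // Geo decay from the demo: offset 50 km, scale 50 km, decay 0.1.
        System.out.println("exp,  30 km away: " + exp(30, 50, 50, 0.1));
        System.out.println("exp, 100 km away: " + exp(100, 50, 50, 0.1));
    }
}
```

All three curves return 1 inside the offset and exactly decay at offset + scale; they differ only in how they behave between and beyond those two points, which is why the choice of function matters most for documents far from the origin.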

Summary

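Elasticsearch's function_score gives us several flexible custom scoring strategies. In a real project you need to combine these strategies sensibly and tune their parameters to meet your own search requirements. This article covered the usage of function_score; in a follow-up I will use a real search application to show how to combine and configure these functions to achieve the desired search results.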
