ES Similarity Algorithm Settings (Continued)
Published: 2019-06-28


Tuning BM25

One of the nice features of BM25 is that, unlike TF/IDF, it has two parameters that allow it to be tuned:

k1
This parameter controls how quickly an increase in term frequency results in term-frequency saturation. The default value is 1.2. Lower values result in quicker saturation, and higher values in slower saturation.
b
This parameter controls how much effect field-length normalization should have. A value of 0.0 disables normalization completely, and a value of 1.0 normalizes fully. The default is 0.75.
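
To make the two parameters concrete, here is a minimal Python sketch of the textbook BM25 term-frequency component. The function name `bm25_tf` and the sample numbers are illustrative assumptions, and Lucene's actual implementation differs in details (for example, it stores field lengths in a lossy encoded form), so treat this as an aid to intuition rather than as Elasticsearch's code.

```python
def bm25_tf(tf, field_len, avg_field_len, k1=1.2, b=0.75):
    """Textbook BM25 term-frequency component (hypothetical helper)."""
    # Length normalization: b=0 ignores field length, b=1 applies it fully.
    norm = 1.0 - b + b * (field_len / avg_field_len)
    # Saturating term frequency: the value approaches k1 + 1 as tf grows.
    return tf * (k1 + 1.0) / (tf + k1 * norm)


if __name__ == "__main__":
    # Compare how quickly the score flattens out for different k1 values,
    # using a field of exactly average length.
    for k1 in (0.5, 1.2, 2.0):
        scores = [round(bm25_tf(tf, 100, 100, k1=k1), 3) for tf in (1, 2, 5, 20)]
        print(f"k1={k1}: {scores}")
```

In this form the score can never exceed k1 + 1, which is why a lower k1 saturates sooner, and with b set to 0.0 the field length drops out of the calculation entirely.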

The practicalities of tuning BM25 are another matter. The default values for k1 and b should be suitable for most document collections, but the optimal values really depend on the collection. Finding good values for your collection is a matter of adjusting, checking, and adjusting again.

The similarity algorithm can be set on a per-field basis. It’s just a matter of specifying the chosen algorithm in the field’s mapping:

PUT /my_index
{
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type":       "string",
          "similarity": "BM25"
        },
        "body": {
          "type":       "string",
          "similarity": "default"
        }
      }
    }
  }
}

The title field uses BM25 similarity.

The body field uses the default similarity.

Currently, it is not possible to change the similarity mapping for an existing field. You would need to reindex your data in order to do that.

Configuring BM25

Configuring a similarity is much like configuring an analyzer. Custom similarities can be specified when creating an index. For instance:

PUT /my_index
{
  "settings": {
    "similarity": {
      "my_bm25": {
        "type": "BM25",
        "b":    0
      }
    }
  },
  "mappings": {
    "doc": {
      "properties": {
        "title": {
          "type":       "string",
          "similarity": "my_bm25"
        },
        "body": {
          "type":       "string",
          "similarity": "BM25"
        }
      }
    }
  }
}

Reference: https://www.elastic.co/guide/en/elasticsearch/guide/current/changing-similarities.html
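
When trying several parameter combinations, it can help to generate the index-creation body rather than edit it by hand. The sketch below is one possible way to do that in Python; the function `build_index_body` and its argument names are hypothetical, and the JSON layout simply mirrors the my_bm25 request above (including its older "string" field type and "doc" mapping type, which newer Elasticsearch versions may not accept).

```python
import json


def build_index_body(k1=1.2, b=0.75, similarity_name="my_bm25"):
    """Build an index-creation body with a custom BM25 similarity.

    Hypothetical helper: the structure follows the request shown above,
    with k1 and b exposed as arguments.
    """
    return {
        "settings": {
            "similarity": {
                # Custom similarity definition, referenced by name below.
                similarity_name: {"type": "BM25", "k1": k1, "b": b}
            }
        },
        "mappings": {
            "doc": {
                "properties": {
                    # title uses the custom similarity.
                    "title": {"type": "string", "similarity": similarity_name},
                    # body uses the built-in BM25 with default parameters.
                    "body": {"type": "string", "similarity": "BM25"},
                }
            }
        },
    }


if __name__ == "__main__":
    # b=0.0 disables field-length normalization for the title field.
    print(json.dumps(build_index_body(b=0.0), indent=2))
```

The printed JSON can be sent as the body of the PUT /my_index request. Because the similarity of an existing field cannot be changed, each combination needs its own freshly created index.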

Reposted from: http://wdfel.baihongyu.com/
