Elasticsearch (11): Highlight and Suggest

1. Highlight: highlighting search matches

Three highlighter types

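unified: the default highlighter, built on the Lucene Unified Highlighter.

plain: the standard Lucene highlighter. It works best for simple queries on a single field; because it re-analyzes the text and builds a small in-memory index for every document and field it highlights, it becomes costly on many or large fields.

fvh (fast vector highlighter): suited to large fields and more complex queries. It requires term_vector set to with_positions_offsets in the mapping.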

Specifying the highlighter type

# type can be unified, plain, or fvh (fvh requires term_vector: with_positions_offsets on the field in the mapping)
GET /person/_search
{
  "query": {
    "match": {
      "name": "测试"
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "type": "unified"
      }
    }
  }
}

# Create the mapping; setting term_vector makes the fast vector highlighter usable on the name field
PUT /person
{
  "mappings": {
    "properties": {
      "name": {
        "analyzer": "ik_max_word",
        "type": "text",
        "term_vector": "with_positions_offsets"
      },
      "age": {
        "type": "long"
      },
      "des": {
        "analyzer": "ik_max_word",
        "type": "text"
      }
    }
  }
}

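Per-field highlighting

The default highlight tags are <em> and </em>. Settings placed under an individual field, such as type, pre_tags and post_tags, apply to that field only.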

GET /person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "测试"
          }
        },
        {
          "match": {
            "des": "测试"
          }
        }
      ]
    }
  },
  "highlight": {
    "fields": {
      "name": {
        "type": "fvh",
        "pre_tags": "<b>",
        "post_tags": "</b>"
      },
      "des": {}
    }
  }
}
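
On the client side, the highlighted fragments come back in a highlight object next to each hit's _source. The following is a minimal Python sketch of reading them, assuming the usual _search response shape; the helper name highlighted_fragments and the sample response are hypothetical and hand-written for illustration, not output from a real cluster.

```python
from typing import Any, Dict, List


def highlighted_fragments(response: Dict[str, Any], field: str) -> List[str]:
    """Collect the highlight fragments of one field across all hits."""
    fragments: List[str] = []
    for hit in response.get("hits", {}).get("hits", []):
        # A hit that did not match on this field has no entry for it.
        fragments.extend(hit.get("highlight", {}).get(field, []))
    return fragments


# Hand-written sample in the shape of a _search response (values are made up).
sample_response = {
    "hits": {
        "hits": [
            {
                "_id": "1",
                "_source": {"name": "测试用户"},
                "highlight": {"name": ["<b>测试</b>用户"]},
            },
            {"_id": "2", "_source": {"name": "other"}},
        ]
    }
}

print(highlighted_fragments(sample_response, "name"))
```

Hits without a highlight entry are skipped rather than treated as errors, since a bool/should query can match on one field only.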

Global highlight settings

GET /person/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "name": "测试"
          }
        },
        {
          "match": {
            "des": "测试"
          }
        }
      ]
    }
  },
  "highlight": {
    "pre_tags": "<b>",
    "post_tags": "</b>",
    "fields": {
      "name": {},
      "des": {}
    }
  }
}
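
When the request body is assembled in application code, the global settings and the per-field overrides can be kept apart. This is only a sketch: build_highlight and its parameters are hypothetical names, and the tags are written as arrays, the form used in the Elasticsearch reference.

```python
import json
from typing import Any, Dict, Iterable, Optional


def build_highlight(
    fields: Iterable[str],
    pre_tag: str = "<em>",
    post_tag: str = "</em>",
    overrides: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Build a highlight section with global tags and optional per-field settings."""
    overrides = overrides or {}
    return {
        "pre_tags": [pre_tag],
        "post_tags": [post_tag],
        # Fields without an override get an empty object and inherit the global tags.
        "fields": {name: dict(overrides.get(name, {})) for name in fields},
    }


highlight = build_highlight(
    ["name", "des"], "<b>", "</b>", overrides={"name": {"type": "fvh"}}
)
print(json.dumps({"highlight": highlight}, ensure_ascii=False, indent=2))
```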

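2. Suggest: search suggestions

There are four suggesters: term suggester, phrase suggester, completion suggester, and context suggester.

2.1 term suggester

Suggests corrections for each term of the input text based on edit distance; candidates are ranked by similarity score and document frequency.

Parameters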

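text: the text the user searched for
field: the field to fetch candidate suggestions from
analyzer: the analyzer used to analyze the suggest text
size: the maximum number of suggestions returned per term
sort: how suggestions are ordered per term; one of two values:
- score: sort by score first, then document frequency, then the term itself
- frequency: sort by document frequency first, then score, then the term itself
max_edits: the maximum edit distance a candidate may have to be considered a suggestion. Only 1 or 2 is allowed; any other value results in a bad request error. Defaults to 2
prefix_length: the minimum number of leading characters that must match for a term to be a candidate
min_doc_freq: the minimum number of documents a suggestion must appear in
suggest_mode: which suggestions are returned; one of three values:
- missing: only suggest for input terms that are not in the index (a term that exists gets no suggestions)
- popular: only suggest terms that occur in more documents than the original term
- always: suggest any matching term, whether or not the original term exists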
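
The interaction between suggest_mode and max_edits can be illustrated with a toy model in Python. This is not how Elasticsearch implements the term suggester: toy_term_suggest, the doc_freq table, and the ordering (document frequency, then distance, then term, loosely imitating sort: frequency) are all assumptions made for the illustration.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def toy_term_suggest(
    term: str,
    doc_freq: Dict[str, int],
    suggest_mode: str = "missing",
    max_edits: int = 2,
) -> List[str]:
    """Toy model of suggest_mode over a term -> document frequency table."""
    own_freq = doc_freq.get(term, 0)
    if suggest_mode == "missing" and own_freq > 0:
        return []  # the term exists in the index, so nothing is suggested
    candidates = [
        t for t in doc_freq if t != term and edit_distance(term, t) <= max_edits
    ]
    if suggest_mode == "popular":
        # keep only terms that occur in more documents than the input term
        candidates = [t for t in candidates if doc_freq[t] > own_freq]
    return sorted(candidates, key=lambda t: (-doc_freq[t], edit_distance(term, t), t))


# Made-up document frequencies for the terms of a hypothetical title field.
doc_freq = {"baoqiang": 5, "baoqian": 1, "baoqing": 3}

for mode in ("missing", "popular", "always"):
    print(mode, toy_term_suggest("baoqian", doc_freq, mode))
```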

Suggest modes (default: missing)

GET /news/_search
{
  "suggest": {
    "missing_suggest": {
      "text": "baoqian baoqiang",
      "term": {
        "suggest_mode": "missing",
        "field": "title"
      }
    },
    "popular_suggest": {
      "text": "baoqian baoqiang",
      "term": {
        "suggest_mode": "popular",
        "field": "title"
      }
    },
    "always_suggest": {
      "text": "baoqian baoqiang",
      "term": {
        "suggest_mode": "always",
        "field": "title"
      }
    }
  }
}
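
Each named suggestion in the response is a list with one entry per input token, and each entry carries its own options. A small Python sketch of reading that structure follows; suggestion_texts is a hypothetical helper, and the sample response is hand-written with made-up scores and frequencies.

```python
from typing import Any, Dict, List


def suggestion_texts(response: Dict[str, Any], name: str) -> Dict[str, List[str]]:
    """Map each input token of one named suggestion to its suggested texts."""
    result: Dict[str, List[str]] = {}
    for entry in response.get("suggest", {}).get(name, []):
        result[entry["text"]] = [opt["text"] for opt in entry.get("options", [])]
    return result


# Hand-written sample in the shape of a term suggester response (values are made up).
sample_response = {
    "suggest": {
        "missing_suggest": [
            {
                "text": "baoqian",
                "offset": 0,
                "length": 7,
                "options": [{"text": "baoqiang", "score": 0.85, "freq": 3}],
            },
            {"text": "baoqiang", "offset": 8, "length": 8, "options": []},
        ]
    }
}

print(suggestion_texts(sample_response, "missing_suggest"))
```

In this sample the second token already exists in the index, so under missing mode its options list is empty.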

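2.2 phrase suggester

Compared with the term suggester, the phrase suggester takes context into account, that is, the other tokens in the sentence, rather than matching each token by edit distance alone. It can pick better suggestions based on co-occurrence and frequency.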

GET /news/_search
{
  "suggest": {
    "my-suggestion": {
      "text": "baoqing baoqiang",
      "phrase": {
        "field": "title",
        "size": 3,
        "highlight": {
          "pre_tag": "<h1>",
          "post_tag": "</h1>"
        },
        "direct_generator": [
          {
            "suggest_mode": "always",
            "field": "content"
          },
          {
            "suggest_mode": "popular",
            "field": "content"
          }
        ]
      }
    }
  }
}

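2.3 completion suggester (supports Chinese)

Autocomplete (search-as-you-type). Suggestions are served from in-memory data structures, so lookups are very fast. Three kinds of query are supported: prefix, fuzzy, and regex.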

# Create the mapping and declare a sub-field of type completion
PUT suggest_carinfo
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "ik_max_word",
        "fields": {
          "suggest": {
            "type": "completion",
            "analyzer": "ik_max_word"
          }
        }
      },
      "content": {
        "type": "text",
        "analyzer": "ik_max_word"
      }
    }
  }
}
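
With this mapping, any value indexed into title also feeds the title.suggest sub-field. To try the queries in this section, the index needs some documents; below is a Python sketch that builds a _bulk request body for that purpose. The helper bulk_index_body and the sample titles are hypothetical, and sending the body (POST /_bulk with Content-Type application/x-ndjson) is left to whatever HTTP client is at hand.

```python
import json
from typing import Iterable


def bulk_index_body(index: str, titles: Iterable[str]) -> str:
    """Build an NDJSON body for the _bulk API: one action line plus one source line per title."""
    lines = []
    for title in titles:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps({"title": title}, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


# Made-up sample titles for the suggest_carinfo index.
body = bulk_index_body("suggest_carinfo", ["宝马5系", "奥迪A6L"])
print(body)
```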

prefix query

Prefix-based suggestion is the most commonly used kind of completion query. Matching on the prefix alone gives low recall.

prefix: the text typed by the client
field: the field that holds the suggestions
size: the number of suggestions to return
skip_duplicates: whether to filter out duplicate suggestions, default false

GET suggest_carinfo/_search?pretty
{
  "suggest": {
    "car_suggest": {
      "prefix": "A6",
      "completion": {
        "field": "title.suggest"
      }
    }
  }
}
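
The request above relies on the defaults for size and skip_duplicates. As a sketch of where the listed parameters sit in the body, the Python helper below builds the same kind of request with them spelled out; completion_suggest_body and its argument names are hypothetical, and the default size of 5 is taken from the Elasticsearch reference rather than from this article.

```python
import json
from typing import Any, Dict, Optional


def completion_suggest_body(
    prefix: str,
    field: str,
    size: int = 5,
    skip_duplicates: bool = False,
    fuzziness: Optional[Any] = None,
    name: str = "car_suggest",
) -> Dict[str, Any]:
    """Build a completion suggester request body for a prefix (optionally fuzzy) query."""
    completion: Dict[str, Any] = {
        "field": field,
        "size": size,
        "skip_duplicates": skip_duplicates,
    }
    if fuzziness is not None:
        # Adding a fuzzy object turns the prefix query into a fuzzy prefix query.
        completion["fuzzy"] = {"fuzziness": fuzziness}
    return {"suggest": {name: {"prefix": prefix, "completion": completion}}}


body = completion_suggest_body("A6", "title.suggest", size=10, skip_duplicates=True)
print(json.dumps(body, ensure_ascii=False, indent=2))
```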

fuzzy query

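fuzziness: the allowed edit distance, default AUTO
transpositions: if true, a transposition counts as one change instead of two, default true
min_length: the minimum input length before fuzzy suggestions are returned, default 3
prefix_length: the number of leading input characters that are not checked for fuzzy alternatives, default 1
unicode_aware: if true, all measurements (fuzzy edit distance, transpositions, lengths) are taken in Unicode code points instead of bytes. This is slightly slower than working on raw bytes, so it defaults to false
skip_duplicates: whether to filter out duplicate suggestions, default false

POST suggest_carinfo/_search
{
  "suggest": {
    "car_suggest": {
      "prefix": "宝马5系",
      "completion": {
        "field": "title.suggest",
        "skip_duplicates": true,
        "fuzzy": {
          "fuzziness": 2
        }
      }
    }
  }
}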
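
What the transpositions flag changes can be shown with a small edit-distance function. This is an illustrative Python sketch (optimal string alignment distance), not the code Elasticsearch runs; the function name and sample strings are hypothetical. Python strings are sequences of code points, so the counts here correspond to the unicode_aware: true way of measuring.

```python
def edit_distance(a: str, b: str, transpositions: bool = True) -> int:
    """Edit distance where swapping two adjacent characters optionally costs one edit."""
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            if (
                transpositions
                and i > 1
                and j > 1
                and a[i - 1] == b[j - 2]
                and a[i - 2] == b[j - 1]
            ):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[-1][-1]


print(edit_distance("bmw", "bwm", transpositions=True))
print(edit_distance("bmw", "bwm", transpositions=False))
```

With the flag on, the swapped pair counts as a single change; with it off, the same typo costs two substitutions, so it uses up a fuzziness budget of 2 on its own.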

regex query

The prefix can be expressed as a regular expression. This form is not recommended.

POST suggest_carinfo/_search
{
  "suggest": {
    "car_suggest": {
      "regex": "[\\s\\S]*",
      "completion": {
        "field": "title.suggest"
      }
    }
  }
}