Closed
Description
Sorry for the wrong proposal. The problem should only be related to gse-bleve.
- Gse version (or commit ref): 0.70.1
- Go version: 1.17
- Operating system and bit: Ubuntu 20.04 64bit
- Can you reproduce the bug at Examples:
- Yes (provide example code)
- No
- Not relevant
- Provide example code:
package main

import (
	"fmt"
	"os"

	"github.com/blevesearch/bleve/v2"
	gse "github.com/vcaesar/gse-bleve"
)

func main() {
	opt := gse.Option{
		Index: "test.blv",
		// Dicts: "embed, ja",
		// Dicts: "embed, zh",
		Dicts: "dict.txt",
		Stop:  "",
		Opt:   "search-hmm",
		Trim:  "trim",
	}

	index, err := gse.New(opt)
	if err != nil {
		fmt.Println("new mapping error is: ", err)
		return
	}

	text := `見解では、謙虚なヴォードヴィリアンのベテランは、運命の犠牲者と悪役の両方の変遷として代償を払っています`
	// Check the error of the first Index call right away, before err is reused.
	err = index.Index("1", text)
	if err != nil {
		fmt.Println("index error: ", err)
	}
	index.Index("3", text+"浮き沈み")
	index.Index("4", `In view, a humble vaudevillian veteran cast vicariously as both victim and villain vicissitudes of fate.`)
	index.Index("2", `It's difficult to understand the sum of a person's life.`)

	query := "運命の犠牲者"
	req := bleve.NewSearchRequest(bleve.NewQueryStringQuery(query))
	req.Highlight = bleve.NewHighlight()
	res, err := index.Search(req)
	fmt.Println(res, err)

	// Clean up the on-disk index created for this test.
	os.RemoveAll("test.blv")
}
I've tested these dictionary configurations to try to track down the bug:
- Using "embed, zh" as Dicts: a 43 MB binary is generated.
- Using "dict.txt" (a custom dictionary, only 3 lines) as Dicts: a 43 MB binary is generated.
- Replacing gse-bleve with the default bleve: an 11 MB binary is generated.
It seems that whatever dictionary is configured, the embedded dictionaries are always included in the binary. I tried to track down the bug, but I failed.
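One possible explanation, which this report does not confirm, is that the dictionaries are compiled into the package as package-level data (for example through //go:embed), while Dicts is only a string read at run time. If so, the linker has to keep the data in every build, because the code path that uses it stays reachable whatever value is passed. The sketch below illustrates that mechanism only: embeddedDict and loadDict are hypothetical names, a string literal stands in for a real embedded file, and none of this is taken from the gse source.

```go
package main

import (
	"fmt"
	"strings"
)

// embeddedDict stands in for a dictionary compiled into the package.
// In a real package this would typically be filled by a //go:embed
// directive; a literal keeps the sketch self-contained.
var embeddedDict = "運命 100 n\n犠牲者 80 n\n"

// loadDict is a hypothetical loader. It decides at run time whether to
// use the embedded data, so embeddedDict stays reachable, and therefore
// linked into the binary, even when the caller never asks for it.
func loadDict(dicts string) string {
	for _, d := range strings.Split(dicts, ",") {
		if strings.TrimSpace(d) == "embed" {
			return embeddedDict
		}
	}
	// A real loader would read the file named by dicts here.
	return ""
}

func main() {
	fmt.Println(len(loadDict("embed, zh")) > 0) // true: embedded data is used
	fmt.Println(len(loadDict("dict.txt")) > 0)  // false: embedded data is unused but still linked
}
```

Under that assumption, the size could only shrink if the embedded data were selected at compile time, for instance by moving it behind a build tag or into a separate package that is imported only when needed.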
Thanks for your help.