
Porter: A Handy Stemming Algorithm for English

🕓 by ego008

The Porter Stemming Algorithm already has implementations in many languages:

http://tartarus.org/~martin/PorterStemmer/
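To give a feel for what a stemmer does, here is a minimal sketch of just Step 1a of the original Porter algorithm, the step that strips plural endings. The function name `step1a` is made up for illustration and does not come from any of the libraries listed here. The real algorithm runs several more steps after this one.

```python
def step1a(word):
    """Apply only Step 1a of the original Porter algorithm (plural endings)."""
    # Rules are tried in order; the first (longest) matching suffix wins.
    rules = [("sses", "ss"), ("ies", "i"), ("ss", "ss"), ("s", "")]
    for suffix, replacement in rules:
        if word.endswith(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


for w in ["caresses", "ponies", "caress", "cats"]:
    print(w, "->", step1a(w))
```

The stems need not be dictionary words ("ponies" becomes "poni"). What matters is that related word forms end up with the same stem.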

Python implementations: https://pypi.python.org/pypi/stemming/1.0 (pure Python) and https://pypi.python.org/pypi/PorterStemmer (wraps a C implementation)

porterstemmer example

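from porterstemmer import Stemmer

# A Stemmer instance is callable: pass it a word and it returns the stem.
stemmer = Stemmer()

# Plain and unicode strings are both accepted.
print(stemmer("foo"))
print(stemmer(u"foo"))
print(stemmer("er"))
print(stemmer(u"er"))

# Empty input.
print(stemmer(""))
print(stemmer(u""))

# Calling with no argument is expected to raise.
try:
    stemmer()
except Exception:
    print("exception raised.")

# So is passing None.
try:
    stemmer(None)
except Exception:
    print("exception raised.")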

Go implementation: https://github.com/agonopol/go-stem

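package main

import (
    "bufio"
    "os"

    stemmer "github.com/agonopol/go-stem" // the package itself is named stemmer
)

func main() {
    in := bufio.NewScanner(os.Stdin)
    out := bufio.NewWriter(os.Stdout)
    defer out.Flush()

    // Read one word per line. Scanner strips the line ending,
    // so only the word itself reaches the stemmer.
    for in.Scan() {
        // Copy the token so Stem never touches the scanner's internal buffer.
        word := []byte(in.Text())
        out.Write(stemmer.Stem(word))
        out.WriteString("\n")
    }
}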
