ego008

Porter: a handy stemming algorithm for English

The Porter Stemming Algorithm already has implementations in many languages:

http://tartarus.org/~martin/PorterStemmer/
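
To give a feel for what a stemmer does, here is a minimal sketch of just the first rule group of the algorithm (step 1a, which strips plural suffixes). The function name step_1a is made up for this illustration and is not part of any of the libraries listed here; the real algorithm goes on to apply several more steps after this one, and the sketch assumes the input is already lowercase.

```python
def step_1a(word):
    """Apply Porter step 1a (plural suffixes) to a lowercase word.

    Illustrative only: the full algorithm runs further steps after this one.
    """
    # Rules are tried in order; only the first matching suffix is rewritten.
    if word.endswith("sses"):
        return word[:-2]  # sses -> ss
    if word.endswith("ies"):
        return word[:-2]  # ies -> i
    if word.endswith("ss"):
        return word       # ss -> ss (unchanged)
    if word.endswith("s"):
        return word[:-1]  # s -> (removed)
    return word


for w in ["caresses", "ponies", "ties", "caress", "cats"]:
    print("%s -> %s" % (w, step_1a(w)))
```

Results such as "poni" and "ti" are not dictionary words. That is expected: a stem only needs to be the same for related word forms, not readable.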

Python implementations
https://pypi.python.org/pypi/stemming/1.0 (pure Python)
https://pypi.python.org/pypi/PorterStemmer (wraps the C implementation)

porterstemmer example

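from porterstemmer import Stemmer

stemmer = Stemmer()

# The same words as plain string literals and as unicode literals.
print(stemmer("foo"))
print(stemmer(u"foo"))
print(stemmer("er"))
print(stemmer(u"er"))

# Empty input.
print(stemmer(""))
print(stemmer(u""))

# Calling with no argument is expected to raise.
try:
    stemmer()
except Exception:
    print("exception raised.")

# Passing None is expected to raise as well.
try:
    stemmer(None)
except Exception:
    print("exception raised.")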

Go implementation: https://github.com/agonopol/go-stem

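package main

import (
    "bufio"
    "os"

    // The package is named stemmer, so alias the import explicitly.
    stemmer "github.com/agonopol/go-stem"
)

func main() {
    in := bufio.NewReader(os.Stdin)
    out := bufio.NewWriter(os.Stdout)
    defer out.Flush()

    // Read one word per line from stdin and write its stem to stdout.
    // The loop stops at the first read error (including io.EOF), so a
    // final line without a trailing newline is not processed.
    for word, err := in.ReadSlice('\n'); err == nil; word, err = in.ReadSlice('\n') {
        out.Write(stemmer.Stem(word))
        out.WriteString("\n")
    }
}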
