The input to the train() and evaluate() functions needs to be a list of lists of tuples: each inner list represents one sentence, and each tuple in it is a pair of (token, tag) strings.
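For example, a minimal sketch of that structure, using the placeholder foo/bar tags from the sample files below:

```python
# Each inner list is one sentence; each element is a (token, tag) 2-tuple.
tagged_sentences = [
    [('This', 'foo'), ('is', 'foo'), ('a', 'foo'), ('sentence', 'bar'), ('.', '.')],
    [('That', 'foo'), ('is', 'foo'), ('another', 'foo'), ('sentence', 'bar'), ('.', '.')],
]

# Every element must be a pair of strings.
assert all(isinstance(tok, str) and isinstance(tag, str)
           for sent in tagged_sentences for tok, tag in sent)
```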
Given train.txt and test.txt:
$ cat train.txt
This foo
is foo
a foo
sentence bar
. .
That foo
is foo
another foo
sentence bar
in foo
conll bar
format bar
. .
$ cat test.txt
What foo
is foo
this foo
sentence bar
? ?
How foo
about foo
that foo
sentence bar
? ?
Read the CoNLL-format files into lists of tuples:
# Using https://github.com/alvations/lazyme
>>> from lazyme import per_section
>>> tagged_train_sentences = [[tuple(token.split('\t')) for token in sent] for sent in per_section(open('train.txt'))]
# Or otherwise
>>> def per_section(it, is_delimiter=lambda x: x.isspace()):
... """
... From http://stackoverflow.com/a/25226944/610569
... """
... ret = []
... for line in it:
... if is_delimiter(line):
... if ret:
... yield ret # OR ''.join(ret)
... ret = []
... else:
... ret.append(line.rstrip()) # OR ret.append(line)
... if ret:
... yield ret
...
>>>
>>> tagged_test_sentences = [[tuple(token.split('\t')) for token in sent] for sent in per_section(open('test.txt'))]
>>> tagged_test_sentences
[[('What', 'foo'), ('is', 'foo'), ('this', 'foo'), ('sentence', 'bar'), ('?', '?')], [('How', 'foo'), ('about', 'foo'), ('that', 'foo'), ('sentence', 'bar'), ('?', '?')]]
Now we can train, tag and evaluate:
>>> from lazyme import per_section
>>> tagged_train_sentences = [[tuple(token.split('\t')) for token in sent] for sent in per_section(open('train.txt'))]
>>> from nltk.tag.perceptron import PerceptronTagger
>>> pct = PerceptronTagger(load=False)
>>> pct.train(tagged_train_sentences)
>>> pct.tag('Where do I find a foo bar sentence ?'.split())
[('Where', 'foo'), ('do', 'foo'), ('I', '.'), ('find', 'foo'), ('a', 'foo'), ('foo', 'bar'), ('bar', 'foo'), ('sentence', 'bar'), ('?', '.')]
>>> tagged_test_sentences = [[tuple(token.split('\t')) for token in sent] for sent in per_section(open('test.txt'))]
>>> pct.evaluate(tagged_test_sentences)
0.8
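For reference, evaluate() here is just token-level accuracy: the fraction of tokens whose predicted tag matches the gold tag. A minimal sketch of that computation (the accuracy() helper is hypothetical, not part of the NLTK API, and assumes gold and predicted sentences align token for token):

```python
def accuracy(gold_sentences, predicted_sentences):
    """Token-level accuracy: correctly tagged tokens / total tokens."""
    correct = total = 0
    for gold, pred in zip(gold_sentences, predicted_sentences):
        for (g_tok, g_tag), (p_tok, p_tag) in zip(gold, pred):
            correct += (g_tag == p_tag)
            total += 1
    return correct / total

gold = [[('What', 'foo'), ('is', 'foo'), ('this', 'foo'), ('sentence', 'bar'), ('?', '?')]]
pred = [[('What', 'foo'), ('is', 'foo'), ('this', 'bar'), ('sentence', 'bar'), ('?', '?')]]
print(accuracy(gold, pred))  # 4 of 5 tags match
```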
Thanks for this! This was a great explanation. – ellen