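Simple Python implementation of doc2vec?

I am trying to implement doc2vec in gensim, but I keep running into errors, and there is not enough documentation or help on the web. Here is part of my working code: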
from gensim.models import Doc2Vec
from gensim.models.doc2vec import LabeledSentence

class LabeledLineSentence(object):
    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        # Yield one labeled sentence per line, tagged with its line index.
        with open(self.filename, 'r') as f:
            for uid, line in enumerate(f):
                print(LabeledSentence(line.split(), tags=['TXT_%s' % uid]))
                yield LabeledSentence(words=line.split(), tags=['TXT_%s' % uid])

sentences = LabeledLineSentence('myfile.txt')
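The corpus is read more than once (one pass to build the vocabulary, then one pass per training call), so it helps that the class above reopens the file on every iteration instead of being a one-shot generator. The sketch below illustrates that property without gensim. The `Labeled` namedtuple is only a stand-in for `LabeledSentence`, and the class name `LabeledLines`, the temporary file, and its two lines are assumptions made so the example is self-contained.

```python
import os
import tempfile
from collections import namedtuple

# Stand-in for gensim's LabeledSentence (assumption: only .words and .tags matter here).
Labeled = namedtuple("Labeled", ["words", "tags"])


class LabeledLines(object):
    """Re-readable corpus: every call to iter() reopens the file."""

    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        with open(self.filename, "r") as f:
            for uid, line in enumerate(f):
                yield Labeled(words=line.split(), tags=["TXT_%s" % uid])


# Write a tiny hypothetical file so the sketch runs on its own.
path = os.path.join(tempfile.mkdtemp(), "myfile.txt")
with open(path, "w") as f:
    f.write("hi how are you\ni like dogs\n")

corpus = LabeledLines(path)
first_pass = list(corpus)
second_pass = list(corpus)  # a plain generator would already be exhausted here
print(first_pass == second_pass)
```

Because `__iter__` starts from the top of the file each time, the same object can be handed to both the vocabulary scan and every training pass.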
What my txt file looks like:

hi how are you
hi how are you
hi how are you
its such a great day
its such a great day
its such a great day
i like dogs
i like cats
i like snakes
the ice cream was yummy
the cake was awesome

Initialize the model:

model = Doc2Vec(alpha=0.025, min_alpha=0.025, size=50, window=5, min_count=5,
                dm=1, workers=8, sample=1e-5)
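One thing worth checking with a file this small is the `min_count=5` setting, which makes gensim ignore every word that occurs fewer than five times in the corpus. The sketch below counts word frequencies with plain Python, assuming the file contains only the eleven sample lines shown in the question; the variable names are illustrative.

```python
from collections import Counter

# Sample lines copied from the question (assumed to be the whole file).
lines = [
    "hi how are you",
    "hi how are you",
    "hi how are you",
    "its such a great day",
    "its such a great day",
    "its such a great day",
    "i like dogs",
    "i like cats",
    "i like snakes",
    "the ice cream was yummy",
    "the cake was awesome",
]

min_count = 5  # value passed to Doc2Vec in the question

# Count how often each whitespace-separated token occurs across all lines.
counts = Counter(word for line in lines for word in line.split())

# Words that would survive the min_count filter.
kept = sorted(word for word, n in counts.items() if n >= min_count)

print("highest frequency:", max(counts.values()))
print("words kept with min_count=%d: %s" % (min_count, kept))
```

Under that assumption no word reaches the threshold, so a lower value such as `min_count=1` is probably needed for a toy corpus like this one.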
For example, the printed output:
LabeledSentence(['hi', 'how', 'are', 'you'], ['TXT_0'])
LabeledSentence(['hi', 'how', 'are', 'you'], ['TXT_1'])
LabeledSentence(['hi', 'how', 'are', 'you'], ['TXT_2'])
LabeledSentence(['its', 'such', 'a', 'great', 'day'], ['TXT_3'])
LabeledSentence(['its', 'such', 'a', 'great', 'day'], ['TXT_4'])
Here is where I get the error:
for epoch in range(500):
    try:
        print('epoch %d' % epoch)
        model.train(sentences)
        model.alpha *= 0.99
        model.min_alpha = model.alpha
    except (KeyboardInterrupt, SystemExit):
        break
RuntimeError: you must first build vocabulary before training the model
Any idea why?
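The RuntimeError says the vocabulary has not been built yet, which matches the code: nothing calls `build_vocab` before the first `train`. A minimal sketch of the missing step follows, assuming the `model` and `sentences` objects from the question. The helper name `train_with_vocab` and its `passes` and `decay` parameters are hypothetical, and the `train` call mirrors the older gensim API used in the question; newer gensim releases also require `total_examples` and `epochs` arguments to `train`.

```python
def train_with_vocab(model, corpus, passes=10, decay=0.99):
    """Build the vocabulary once, then run the manual training loop."""
    # The step missing from the question's loop: scan the corpus once so the
    # model knows its words and document tags before any training happens.
    model.build_vocab(corpus)

    for _ in range(passes):
        # Same call as in the question. Newer gensim releases also expect
        # total_examples= and epochs= arguments here.
        model.train(corpus)
        # Manual learning-rate decay, as in the question.
        model.alpha *= decay
        model.min_alpha = model.alpha
    return model

# Hypothetical usage with the objects from the question:
#     train_with_vocab(model, sentences, passes=500)
```

The helper only relies on the model exposing `build_vocab`, `train`, `alpha` and `min_alpha`, so the ordering can be checked with a dummy object before running it against a real model.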