2016-10-29

Simple Python implementation of doc2vec?

I'm trying to implement doc2vec with gensim, but I keep running into errors, and I can't find enough documentation or help on the web. Here is part of my code:

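from gensim.models import Doc2Vec
# LabeledSentence is the older, deprecated name for TaggedDocument in gensim
from gensim.models.doc2vec import LabeledSentence

class LabeledLineSentence(object):
    """Streams one tagged document per line of `filename`."""
    def __init__(self, filename):
        self.filename = filename

    def __iter__(self):
        # Re-opens the file on every pass, so the corpus can be iterated more than once
        with open(self.filename, 'r') as f:
            for uid, line in enumerate(f):
                doc = LabeledSentence(words=line.split(), tags=['TXT_%s' % uid])
                print(doc)  # debug output
                yield doc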

sentences = LabeledLineSentence('myfile.txt') 
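Using a class with `__iter__` (rather than a plain generator) matters here, because gensim makes several passes over the corpus: one to build the vocabulary, then one per training epoch. A generator would be exhausted after the first pass. A minimal pure-Python sketch of the difference, assuming a hypothetical `TaggedDoc` namedtuple as a stand-in for gensim's `TaggedDocument(words, tags)` and an in-memory `io.StringIO` in place of `myfile.txt`:

```python
import io
from collections import namedtuple

# Hypothetical stand-in for gensim's TaggedDocument(words, tags)
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])

TEXT = "hi how are you\nits such a great day\ni like dogs\n"


class ReiterableCorpus:
    """Yields one TaggedDoc per line; every iter() call starts a fresh pass."""

    def __init__(self, text):
        self.text = text  # plays the role of the filename in this sketch

    def __iter__(self):
        # A new StringIO per pass mimics re-opening the file each time
        with io.StringIO(self.text) as f:
            for uid, line in enumerate(f):
                yield TaggedDoc(words=line.split(), tags=["TXT_%d" % uid])


def one_shot_generator(text):
    """A plain generator: it can only be consumed once."""
    for uid, line in enumerate(io.StringIO(text)):
        yield TaggedDoc(words=line.split(), tags=["TXT_%d" % uid])


corpus = ReiterableCorpus(TEXT)
first_pass = list(corpus)
second_pass = list(corpus)
print(len(first_pass), len(second_pass))  # 3 3

gen = one_shot_generator(TEXT)
print(len(list(gen)), len(list(gen)))  # 3 0
```

The second pass over the generator yields nothing, which is why streaming corpora for gensim are usually written as re-iterable classes like `LabeledLineSentence`.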

Here is what my txt file looks like:

hi how are you
hi how are you
hi how are you
its such a great day
its such a great day
its such a great day
i like dogs
i like cats
i like snakes
the ice cream was yummy
the cake was awesome

Model initialization:

model = Doc2Vec(alpha=0.025, min_alpha=0.025, size=50, window=5, min_count=5,
                dm=1, workers=8, sample=1e-5)
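Separately from the error itself, `min_count=5` tells gensim to drop every word that occurs fewer than five times in the corpus. Assuming the file contains exactly the eleven lines shown, a quick frequency count suggests no word reaches that threshold, so the vocabulary would be empty even after it is built; gensim checks for a non-empty vocabulary before training, so this can surface as the same RuntimeError. A plain-Python sketch of that check (`surviving_words` is a hypothetical helper, not a gensim function):

```python
from collections import Counter

# The eleven lines of myfile.txt as shown above
LINES = [
    "hi how are you",
    "hi how are you",
    "hi how are you",
    "its such a great day",
    "its such a great day",
    "its such a great day",
    "i like dogs",
    "i like cats",
    "i like snakes",
    "the ice cream was yummy",
    "the cake was awesome",
]


def surviving_words(lines, min_count):
    """Hypothetical helper: words whose corpus frequency is >= min_count."""
    counts = Counter(word for line in lines for word in line.split())
    return sorted(w for w, c in counts.items() if c >= min_count)


counts = Counter(word for line in LINES for word in line.split())
print(max(counts.values()))            # 3: the most frequent words appear 3 times
print(surviving_words(LINES, 5))       # []: nothing survives min_count=5
print(len(surviving_words(LINES, 1)))  # 21 distinct words survive min_count=1
```

On a toy corpus this small, `min_count=1` keeps every word; on real data a higher threshold is usually what prunes rare noise.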

For example, the printed output looks like this:

LabeledSentence(['hi', 'how', 'are', 'you'], ['TXT_0']) 
LabeledSentence(['hi', 'how', 'are', 'you'], ['TXT_1']) 
LabeledSentence(['hi', 'how', 'are', 'you'], ['TXT_2']) 
LabeledSentence(['its', 'such', 'a', 'great', 'day'], ['TXT_3']) 
LabeledSentence(['its', 'such', 'a', 'great', 'day'], ['TXT_4']) 

Here is the training loop and the error it raises:

for epoch in range(500):
    try:
        print('epoch %d' % epoch)
        model.train(sentences)
        model.alpha *= 0.99             # decay the learning rate after each pass
        model.min_alpha = model.alpha   # keep it constant within the next pass
    except (KeyboardInterrupt, SystemExit):
        break

RuntimeError: you must first build vocabulary before training the model 

Why is this happening?

Answers