IndexError while using the Gensim package for LDA topic modeling

I have a total of 54892 documents containing 360331 unique tokens. The length of the dictionary is 88. Whenever I run the script below, I get an IndexError.

mm = corpora.MmCorpus('PRC.mm') 
dictionary = corpora.Dictionary('PRC.dict') 
lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=dictionary, num_topics=50, update_every=0, chunksize=19188, passes=650)

This is the traceback:

Traceback (most recent call last):
  File "C:\Users\modelDeTopics.py", line 19, in <module>
    lda = gensim.models.ldamodel.LdaModel(corpus=mm, id2word=dictionary, num_topics=50, update_every=0, chunksize=19188, passes=650)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 265, in __init__
    self.update(corpus)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 445, in update
    self.do_estep(chunk, other)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 365, in do_estep
    gamma, sstats = self.inference(chunk, collect_sstats=True)
  File "C:\Python27\lib\site-packages\gensim-0.8.6-py2.7.egg\gensim\models\ldamodel.py", line 318, in inference
    expElogbetad = self.expElogbeta[:, ids]
IndexError: index 8 is out of bounds for axis 1 with size 8

I checked online, where it was mentioned that this could be related to the computer's RAM. I am using Windows 7 32-bit with 4GB RAM. What changes should I make to the script?

Please help!

Source

2014-01-23 Animesh Pandey

There seems to be a problem with your dictionary. 88 unique words does not sound reasonable.
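
One way to test this is to compare the largest token id in the corpus with the size of the dictionary. The sketch below is plain Python and only assumes that the corpus can be iterated as lists of (token_id, value) pairs, which is how a Gensim `MmCorpus` behaves; the function names and the toy data are made up for illustration. A possible cause, not confirmed anywhere in this thread, is that `corpora.Dictionary('PRC.dict')` builds a new dictionary from its argument instead of loading the saved file; a dictionary saved with `save()` is normally read back with `corpora.Dictionary.load('PRC.dict')`.

```python
def max_token_id(corpus):
    """Return the largest token id in a bag-of-words corpus, or -1 if it is empty."""
    largest = -1
    for doc in corpus:
        for token_id, _value in doc:
            if token_id > largest:
                largest = token_id
    return largest


def check_vocabulary(corpus, id2word):
    """Return (is_consistent, vocab_size, max_id).

    The corpus and the dictionary are consistent only if every token id
    in the corpus is smaller than the number of dictionary entries.
    """
    vocab_size = len(id2word)
    max_id = max_token_id(corpus)
    return max_id < vocab_size, vocab_size, max_id


# Toy data standing in for the real corpus and dictionary (illustrative only).
toy_corpus = [[(0, 1.0), (3, 2.0)], [(8, 1.0)]]
toy_id2word = {i: "w%d" % i for i in range(8)}  # only ids 0..7 are known

ok, vocab_size, max_id = check_vocabulary(toy_corpus, toy_id2word)
print(ok, vocab_size, max_id)  # prints: False 8 8
```

With the real objects this would be called as `check_vocabulary(mm, dictionary)`. If it reports an inconsistency, the dictionary does not match the corpus it is being used with, and the model's topic-word matrix ends up narrower than the token ids it is asked to index.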

If you post the full log, it will offer more information.
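
To capture such a log, logging can be switched on before the model is trained. Gensim reports progress through Python's standard `logging` module, so a minimal setup along these lines should be enough; the helper name `enable_logging` is made up for this sketch and is not part of Gensim.

```python
import logging


def enable_logging(level=logging.INFO):
    """Send log records at `level` and above from all loggers to the console."""
    root = logging.getLogger()
    root.setLevel(level)
    if not root.handlers:
        # Attach a console handler only if nothing is configured yet.
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s : %(levelname)s : %(message)s")
        )
        root.addHandler(handler)
    return root


root_logger = enable_logging()
logging.getLogger("lda_debug").info("logging enabled")  # hypothetical logger name
```

Calling this at the top of the script, before the corpus and dictionary are loaded, makes the library's INFO messages appear in the console output, which can then be posted together with the traceback.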

Source

2014-01-30 20:35:36 Radim
