2014-11-13 4 views

Answers

1

I think you could take a look at simple_preprocess in gensim.utils:

gensim.utils.simple_preprocess(doc, deacc=False, min_len=2, max_len=15)
Convert a document into a list of tokens.

This lowercases, tokenizes, de-accents (optional). The output are final tokens = unicode strings, that won't be processed any further.
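
To give a feel for what that description means, here is a rough standard-library sketch of the same steps (lowercase, optional de-accenting, tokenizing, length filtering). It is only an approximation, not gensim's actual implementation: the names `rough_preprocess`, `strip_accents` and `TOKEN_RE` are made up for illustration, and the token pattern is an assumption. With gensim installed you would call `gensim.utils.simple_preprocess(doc, deacc=True)` directly instead.

```python
import re
import unicodedata

# Assumption: a token is a run of alphabetic characters (no digits, no underscore).
# This is an illustrative pattern, not necessarily the one gensim uses.
TOKEN_RE = re.compile(r"[^\W\d_]+", re.UNICODE)


def strip_accents(text):
    """Remove combining accent marks, e.g. 'é' becomes 'e'."""
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return unicodedata.normalize("NFC", stripped)


def rough_preprocess(doc, deacc=False, min_len=2, max_len=15):
    """Lowercase, optionally de-accent, tokenize, and drop too-short/too-long tokens."""
    text = doc.lower()
    if deacc:
        text = strip_accents(text)
    return [tok for tok in TOKEN_RE.findall(text) if min_len <= len(tok) <= max_len]


tokens = rough_preprocess("H\u00e9llo, World! A caf\u00e9 is open.", deacc=True)
print(tokens)  # ['hello', 'world', 'cafe', 'is', 'open']
```

Note that the single-letter word "A" is dropped because of the default `min_len=2`, which mirrors the defaults in the signature quoted above.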