spaCy's sentence tokenization is bad (?)
Why does spaCy's sentence splitter/tokenizer work so poorly? nltk seems to work fine. Here is my small experiment:
import spacy
nlp = spacy.load('fr')
import nltk
text_fr = u"Je suis parti a la boulangerie. J'ai achete trois croissants. C'etait super bon."
nltk.sent_tokenize(text_fr)
# [u'Je suis parti a la boulangerie.',
# u"J'ai achete trois croissants.",
# u"C'etait super bon."]
doc = nlp(text_fr)
for s in doc.sents:
    print(s)  # parenthesized form runs on both Python 2 and Python 3
# Je suis parti
# a la boulangerie. J'ai
# achete trois croissants. C'
# etait super bon.
The same behavior shows up for English. For this piece of text:
text = u"I went to the library. I did not know what book to buy, but then the lady working there helped me. It was cool. I discovered a lot of new things."
With spaCy (after nlp = spacy.load('en')) I get:
I
went to the library. I
did not know what book to buy, but
then the lady working there helped me. It was cool. I discovered a
lot of new things.
With nltk, by contrast, the result looks good:
[u'I went to the library.',
u'I did not know what book to buy, but then the lady working there helped me.',
u'It was cool.',
u'I discovered a lot of new things.']
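As a sanity check on where the boundaries should fall, here is a minimal punctuation-based splitter that needs only the standard library. It is a rough sketch, not nltk's Punkt algorithm and not spaCy's segmentation; `naive_sent_split` is a hypothetical helper name, and the rule it uses (split after `.`, `!` or `?` when whitespace and an uppercase ASCII letter follow) is an assumption that will fail on abbreviations such as "Dr. Smith".

```python
import re

# Assumed rule: a sentence ends at '.', '!' or '?' when the next
# non-space character is an uppercase ASCII letter. Naive on purpose.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def naive_sent_split(text):
    """Split text into sentences with a simple punctuation heuristic."""
    return [part.strip() for part in _BOUNDARY.split(text) if part.strip()]


text = ("I went to the library. I did not know what book to buy, but then "
        "the lady working there helped me. It was cool. "
        "I discovered a lot of new things.")

for sentence in naive_sent_split(text):
    print(sentence)
```

On the English sample this yields the same four sentences as nltk, which suggests the spaCy output above is a segmentation problem rather than anything unusual in the text.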
: "Currently, sentence dependency – dada
My spacy version was too old (0.100); with v2, spacy works as expected. – dada
Yes, update your spacy version. – alvas
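Since the fix came down to the installed version, a quick check before debugging further may save time. The sketch below reads the installed spaCy version from package metadata (Python 3.8+) and compares it with 2.0, the version reported to work in the comments. `version_tuple` and `needs_upgrade` are hypothetical helper names, and the `(2, 0)` threshold is taken from this thread, not from any official compatibility statement.

```python
import re
from importlib import metadata


def version_tuple(version):
    """Turn a string such as '0.100.7' into a tuple of ints: (0, 100, 7)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at suffixes like 'dev0'
        parts.append(int(match.group()))
    return tuple(parts)


def needs_upgrade(version, minimum=(2, 0)):
    """Return True if the version is older than the assumed minimum."""
    return version_tuple(version) < minimum


try:
    installed = metadata.version("spacy")
    print("spaCy", installed, "needs upgrade:", needs_upgrade(installed))
except metadata.PackageNotFoundError:
    print("spaCy is not installed")
```

If the check reports an old version, `pip install -U spacy` upgrades the package. Note that newer releases load models by their full package name instead of shortcuts like 'fr', so the `spacy.load` call may need adjusting as well.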