2016-11-18 18 views

quanteda error in R: predict.textmodel_NB_fitted not implemented

I'm trying to predict sentiment with quanteda's Naive Bayes (NB) model using this code:

X_train <- c("I love this sandwich.", 
      "This is an amazing place!", 
      "I feel very good about these beers.", 
      "This is my best work.", 
      "What an awesome view", 
      "I do not like this restaurant", 
      "I am tired of this stuff.", 
      "I can't deal with this", 
      "He is my sworn enemy!", 
      "this guy is horrible.") 

Y_train <- c(1,1,1,1,1,0,0,0,0,0) 

X_test <- c("The beer was good.", 
      "I do not enjoy my job", 
      "I ain't feeling dandy today.", 
      "I feel amazing! pos", 
      "Gary is a friend of mine.", 
      "I can't believ I'm doing this.", 
      "very sad about Iran", 
      "You're the only one who can see this cause no one else is following me this is for you because you're pretty awesome", 
      "ok thats it you win.", 
      "My horsie is moving on Saturday morning.", 
      "times by like a million", 
      "but i'm proud.", 
      "i want a hug)") 
Y_test <- c(1,0,0,1,1,0,0,1,1,0,1,1,1) 
dfm_mat <- dfm(X_train) 
tfidf_mat <- tfidf(dfm_mat, normalize = TRUE) 
model <- textmodel_NB(tfidf_mat, Y_train, distribution = "multinomial") 

predict(model, X_test) 

And I get the following error message:

Error in newdata %*% t(log(object$PwGc)) : not-yet-implemented method for <character> %*% <dgeMatrix> 
5.stop(gettextf("not-yet-implemented method for <%s> %%*%% <%s>", class(x), class(y)), domain = NA) 
4.newdata %*% t(log(object$PwGc)) 
3.newdata %*% t(log(object$PwGc)) 
2.predict.textmodel_NB_fitted(model, X_test) 
1.predict(model, X_test) 

Running: quanteda_0.9.8.5
R version 3.3.1 (2016-06-21)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.10

Does anyone know what is going on?



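The problem here is that you are trying to predict from the fitted Naive Bayes model on a character vector. The predict method is not defined for character vectors: it needs a numeric document-feature matrix for the matrix product shown in the traceback. The error message does not say so directly, but that is what it means.

The solution is to predict on a dfm object, and specifically on one whose features match those of the training dfm.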

# this creates a test dfm, and matches its features to the training dfm 
dfm_test <- dfm_select(dfm(X_test), dfm_mat) 
## found 15 features from 36 supplied types in a dfm, padding 0s for another 21 

Then the predict() method works fine:

predict(model, dfm_test) 
## Predicted textmodel of type: Naive Bayes 
##    lp(1)  lp(0)  Pr(1) Pr(0) Predicted 
## text1 -4.2419639 -4.3728368 0.5327 0.4673   1 
## text2 -15.1799166 -14.8238632 0.4119 0.5881   0 
## text3 -4.2637198 -4.2239433 0.4901 0.5099   0 
## text4 -11.3125631 -11.5833225 0.5673 0.4327   1 
## text5 -7.9101340 -7.7336472 0.4560 0.5440   0 
## text6 -11.5324821 -11.2864767 0.4388 0.5612   0 
## text7 -7.7907806 -8.0525264 0.5651 0.4349   1 
## text8 -18.3944576 -18.5330895 0.5346 0.4654   1 
## text9 -0.6931472 -0.6931472 0.5000 0.5000   1 
## text10 -7.7792864 -7.7569503 0.4944 0.5056   0 
## text11 -4.3754953 -4.2186861 0.4609 0.5391   0 
## text12 -0.6931472 -0.6931472 0.5000 0.5000   1 
## text13 -4.2637198 -4.2239433 0.4901 0.5099   0 
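
For intuition about why the features have to match, here is a minimal base-R sketch of the same idea. It is not quanteda's implementation: the vocabulary, the probabilities, and the helper align_features() are all hypothetical, and class priors are assumed equal and left out. It only shows that the product from the traceback needs a numeric matrix whose columns are exactly the training features, in the same order.

```r
# Hypothetical toy training vocabulary (illustration only).
train_feats <- c("love", "good", "horrible", "tired")

# Log word likelihoods: classes in rows, training features in columns.
# This plays the role of log(object$PwGc) in the traceback.
log_PwGc <- log(rbind("1" = c(0.4, 0.4, 0.1, 0.1),
                      "0" = c(0.1, 0.1, 0.4, 0.4)))
colnames(log_PwGc) <- train_feats

# Test counts with a different feature set: "beer" was never seen in
# training, while "love" and "tired" are absent from the test documents.
test_counts <- matrix(c(1, 1, 0,
                        0, 0, 2),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("text1", "text2"),
                                      c("good", "beer", "horrible")))

# Hypothetical helper: keep the shared features, drop the unseen ones,
# and pad the missing training features with zeros.
align_features <- function(x, feats) {
  out <- matrix(0, nrow = nrow(x), ncol = length(feats),
                dimnames = list(rownames(x), feats))
  shared <- intersect(colnames(x), feats)
  out[, shared] <- x[, shared, drop = FALSE]
  out
}

aligned <- align_features(test_counts, train_feats)

# Now the dimensions conform: (documents x features) %*% (features x classes).
scores <- aligned %*% t(log_PwGc)
pred <- colnames(scores)[apply(scores, 1, which.max)]
```

Without the alignment step, test_counts %*% t(log_PwGc) fails because the column counts differ, and passing raw text fails even earlier because a character vector is not a matrix at all.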

Thanks a lot, Ken. It seems obvious once you have the solution :) – alEx