Using tidytext::unnest_tokens with a nested list-column approach and purrr

I have a data frame containing survey responses, where each row represents a different person. One column ("Text") holds the answers to a free-text question. Here is a simple data frame for this example:

Satisfaction <- c("Satisfied", "Satisfied", "Dissatisfied", "Satisfied", "Dissatisfied")
Text <- c("I'm very satisfied with the services", "Your service providers are always late which causes me a lot of frustration", "You should improve your staff training, service providers have bad customer service", "Everything is great!", "Service is bad")
Gender <- c("M", "M", "F", "M", "F")
df <- data.frame(Satisfaction, Text, Gender)

I would like to use tidytext::unnest_tokens so that I can do text analysis on each row, such as sentiment scores, word counts, etc. First, I converted the Text column to character:

df$Text <- as.character(df$Text)

Next, I added an id column, grouped by it, tokenized the text, and nested the data frame:

library(dplyr)
library(tidyr)
library(tidytext)

df <- df %>%
  mutate(id = row_number()) %>%
  group_by(id) %>%
  unnest_tokens(word, Text) %>%
  nest(-id)

This seems to work so far, but how do I now use purrr::map to operate on the nested list column "word"? For example, what if I want to use dplyr::mutate to create a new column with the number of words in each row?

Also, is there a better way to nest the data frame so that only the "Text" column becomes a nested list?

2017-02-13 Mike

It's not clear what you want. You can do text analysis without using tidyr::nest; just stop after unnest_tokens. If you only want to nest the word column, you can use nest(word), but for that to work you first need to ungroup the data frame (or not group by id in the first place). – FlorianGD
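If you do want the nested list-column layout the question asks about, here is a minimal sketch of the comment's suggestion. It assumes tidyr 1.0 or later, where the columns to nest are passed as a named argument (so nest(word) is written nest(words = word)), and purrr for map_int(). It never calls group_by(), so nothing has to be ungrouped, and only the tokenized word column ends up inside the list column. The names nested_df and words are illustrative, not from the original post.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tidytext)

df <- tibble(
  Satisfaction = c("Satisfied", "Satisfied", "Dissatisfied", "Satisfied", "Dissatisfied"),
  Text = c("I'm very satisfied with the services",
           "Your service providers are always late which causes me a lot of frustration",
           "You should improve your staff training, service providers have bad customer service",
           "Everything is great!",
           "Service is bad"),
  Gender = c("M", "M", "F", "M", "F")
)

nested_df <- df %>%
  mutate(id = row_number()) %>%
  # One row per token; no group_by(), so there is nothing to ungroup before nesting
  unnest_tokens(word, Text) %>%
  # Collapse only the `word` column into a list column named `words`;
  # Satisfaction, Gender and id stay as ordinary columns (one row per response)
  nest(words = word) %>%
  # Each element of `words` is a one-column tibble, so nrow() is its word count
  mutate(num_words = map_int(words, nrow))

nested_df
```

Because each element of words is a small tibble, any function that takes a data frame can be mapped over it the same way, e.g. map(words, ~ count(.x, word)).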

I like using purrr::map for modeling different groups, but for what you're doing, I think you can just stick with plain dplyr.

You can set up your data frame like this:

library(dplyr) 
library(tidytext) 

Satisfaction <- c("Satisfied", 
        "Satisfied", 
        "Dissatisfied", 
        "Satisfied", 
        "Dissatisfied") 

Text <- c("I'm very satisfied with the services", 
      "Your service providers are always late which causes me a lot of frustration", 
      "You should improve your staff training, service providers have bad customer service", 
      "Everything is great!", 
      "Service is bad") 

Gender <- c("M","M","F","M","F") 

df <- tibble(Satisfaction, Text, Gender) 

tidy_df <- df %>% 
    mutate(id = row_number()) %>% 
    unnest_tokens(word, Text)

Then, to find the number of words per line, for example, you can use group_by and mutate:

tidy_df %>% 
    group_by(id) %>% 
    mutate(num_words = n()) %>% 
    ungroup()
#> # A tibble: 37 × 5
#>    Satisfaction Gender    id      word num_words
#>           <chr>  <chr> <int>     <chr>     <int>
#> 1     Satisfied      M     1       i'm         6
#> 2     Satisfied      M     1      very         6
#> 3     Satisfied      M     1 satisfied         6
#> 4     Satisfied      M     1      with         6
#> 5     Satisfied      M     1       the         6
#> 6     Satisfied      M     1  services         6
#> 7     Satisfied      M     2      your        13
#> 8     Satisfied      M     2   service        13
#> 9     Satisfied      M     2 providers        13
#> 10    Satisfied      M     2       are        13
#> # ... with 27 more rows
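The group_by() + mutate() version keeps one row per word. If what you want instead is one row per respondent, with the original text plus a word-count column, a minimal sketch is to count() tokens per id and join the counts back onto the original data frame. This assumes dplyr 0.8.0 or later for the name argument of count(); word_counts and df_with_counts are illustrative names.

```r
library(dplyr)
library(tidytext)

df <- tibble(
  Satisfaction = c("Satisfied", "Satisfied", "Dissatisfied", "Satisfied", "Dissatisfied"),
  Text = c("I'm very satisfied with the services",
           "Your service providers are always late which causes me a lot of frustration",
           "You should improve your staff training, service providers have bad customer service",
           "Everything is great!",
           "Service is bad"),
  Gender = c("M", "M", "F", "M", "F")
)

word_counts <- df %>%
  mutate(id = row_number()) %>%
  unnest_tokens(word, Text) %>%
  # count() collapses to one row per response: the id and its number of tokens
  count(id, name = "num_words")

# Attach the counts to the original, untokenized responses (one row per person)
df_with_counts <- df %>%
  mutate(id = row_number()) %>%
  left_join(word_counts, by = "id")

df_with_counts
```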

You can do sentiment analysis by implementing an inner_join; check out some examples here.
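As a rough illustration of the inner_join approach, the sketch below joins the tokens against the Bing lexicon that ships with tidytext (get_sentiments("bing"), with columns word and sentiment) and scores each response as the number of positive words minus the number of negative words. The lexicon choice and the positive-minus-negative score are assumptions made for this example, not the only way to do it; responses with no matching words get a score of 0 rather than NA. sentiment_by_response, df_sentiment and sentiment_score are illustrative names.

```r
library(dplyr)
library(tidytext)

df <- tibble(
  Satisfaction = c("Satisfied", "Satisfied", "Dissatisfied", "Satisfied", "Dissatisfied"),
  Text = c("I'm very satisfied with the services",
           "Your service providers are always late which causes me a lot of frustration",
           "You should improve your staff training, service providers have bad customer service",
           "Everything is great!",
           "Service is bad"),
  Gender = c("M", "M", "F", "M", "F")
)

tidy_df <- df %>%
  mutate(id = row_number()) %>%
  unnest_tokens(word, Text)

# Keep only tokens found in the Bing lexicon, then score each response
sentiment_by_response <- tidy_df %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  group_by(id) %>%
  summarise(sentiment_score = sum(sentiment == "positive") - sum(sentiment == "negative"))

# Attach scores to the original responses; ids with no matched words get 0
df_sentiment <- df %>%
  mutate(id = row_number()) %>%
  left_join(sentiment_by_response, by = "id") %>%
  mutate(sentiment_score = coalesce(sentiment_score, 0L))

df_sentiment
```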

2017-04-07 19:43:13

Thanks for the help and the example! – Mike
