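I am trying to extract information from a variable in a data frame. I am using R 3.3.3. I want to extract text from a formatted string by splitting it at successive colons.
The information looks like this: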
t <- "China: Officially the People's Republic of China (PRC), is a unitary sovereign state in East Asia and the world's most populous country, with a population of around 1.404 billion. GUAM: Is an unincorporated and organized territory of the United States in Micronesia in the western Pacific Ocean. MICRONESIA: Is a subregion of Oceania, comprising thousands of small islands in the western Pacific Ocean. It has a shared cultural history with two other island regions, Polynesia to the east and Melanesia to the south. DOMINCAN REPUBLIC: Is a country located on the island of Hispaniola, in the Greater Antilles archipelago of the Caribbean region."
I would like to break each section out into a separate variable, like this:
w = "China: Officially the People's Republic of China (PRC), is a unitary sovereign state in East Asia and the world's most populous country, with a population of around 1.404 billion."
x = "GUAM: Is an unincorporated and organized territory of the United States in Micronesia in the western Pacific Ocean."
y = "MICRONESIA: Is a subregion of Oceania, comprising thousands of small islands in the western Pacific Ocean. It has a shared cultural history with two other island regions, Polynesia to the east and Melanesia to the south."
z = "DOMINCAN REPUBLIC: Is a country located on the island of Hispaniola, in the Greater Antilles archipelago of the Caribbean region."
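I am having some difficulty extracting this information. Questions such as this and this were very helpful. From them, I gathered that some form of stringr/gsub could be used to get this information, but I cannot figure out how to specify a range within a gsub statement.
I was able to work out how to pull the first part: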
> test4 <- gsub("(.*{1})(:.*)", "\\1", t)
> test4
[1] "China: Officially the People's Republic of China (PRC), is a unitary sovereign state in East Asia and the world's most populous country, with a population of around 1.404 billion. GUAM: Is an unincorporated and organized territory of the United States in Micronesia in the western Pacific Ocean. MICRONESIA: Is a subregion of Oceania, comprising thousands of small islands in the western Pacific Ocean. It has a shared cultural history with two other island regions, Polynesia to the east and Melanesia to the south. DOMINCAN REPUBLIC"
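As a side note on why the call above keeps everything up to the last colon: `.*` is greedy, so the first group grows as far as it can while still leaving a colon for the second group. Below is a minimal sketch on a shortened, hypothetical sample string `s` (a stand-in with the same "LABEL: text." layout, not the original `t`). Excluding colons from the first group makes the match stop at the first colon instead.

```r
# Hypothetical shortened sample with the same layout as t
s <- "CHINA: A country in East Asia. GUAM: A territory in Micronesia. MICRONESIA: A subregion of Oceania."

# Greedy: the first group runs to the LAST colon, so only the final
# description is dropped
greedy <- sub("(.*)(:.*)", "\\1", s)

# [^:]* cannot cross a colon, so the first group stops at the FIRST colon
first_label <- sub("^([^:]*):.*$", "\\1", s)

print(greedy)
print(first_label)
```

This only isolates the first label. It does not by itself split the string into sections.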
It would be nice if I did not have to clean up the "DOMINICAN REPUBLIC" part at the end of the string.
To summarize my overall question:
1. How do you extract the text between successive colons in a string (first to second colon, second to third, and so on)?
2. Is there a way to keep the word that precedes each colon as well?
Any information or guidance would be appreciated.
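One possible approach, sketched here under stated assumptions rather than as a definitive answer, is to split the string with base R's `strsplit` instead of carving it up with `gsub`. The split point is the whitespace that follows a period and precedes a label. The sketch assumes every label after the first is written in capital letters (spaces allowed) and ends with a colon. The sample string `s` and the names `parts`, `labels` and `descriptions` are hypothetical, chosen only for illustration.

```r
# Hypothetical shortened sample with the same layout as t, including a
# two-word label and a section that contains more than one sentence
s <- paste(
  "China: A state in East Asia.",
  "GUAM: A territory in Micronesia.",
  "MICRONESIA: A subregion of Oceania. It has many islands.",
  "DOMINCAN REPUBLIC: A country on Hispaniola."
)

# Split on whitespace that is preceded by a period (lookbehind) and
# followed by an all-caps label ending in a colon (lookahead).
# Lookarounds need perl = TRUE; the label itself is not consumed,
# so it stays at the start of each piece.
parts <- strsplit(s, "(?<=\\.)\\s+(?=[A-Z][A-Z ]*:)", perl = TRUE)[[1]]

# Everything before the first colon of each piece is its label
labels <- sub(":.*$", "", parts)

# The remainder of each piece, without the label and the colon
descriptions <- sub("^[^:]*:\\s*", "", parts)

# Name the pieces so they can be looked up by label
names(parts) <- labels
print(parts)
```

Each element of `parts` keeps the word before the colon, so no trailing label needs to be cleaned up afterwards. The pattern would mis-split a section in which a sentence begins with a capitalized word followed by a colon, so it is worth checking against the real data.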
Great work! It will take me some time to fully understand how you are splitting the data, but thank you very much! – Jbnimble
Thank you so much! I really appreciate it! – Jbnimble