Parsing a website with BeautifulSoup and Selenium

I am trying to compare average temperatures with actual temperatures by scraping them from: https://usclimatedata.com/climate/binghamton/new-york/united-states/usny0124

I can successfully collect the page's source code, but I have trouble parsing it so that I get only the values for the high temps, low temps, rainfall, and the averages under the "History" tab. I can't seem to address the right class/id; the only result I get is "None".

This is what I have so far, with the last line being an attempt to get only the high temps:

from lxml import html 
from bs4 import BeautifulSoup 
from selenium import webdriver 

url = "https://usclimatedata.com/climate/binghamton/new-york/unitedstates/usny0124" 
browser = webdriver.Chrome() 
browser.get(url) 
soup = BeautifulSoup(browser.page_source, "lxml") 
data = soup.find("table", {'class': "align_right_climate_table_data_td_temperature_red"})

Source

2017-11-27 that1blondekid

This is probably a question better suited for https://codereview.stackexchange.com/ – kuanb

@kuanb it would be quickly closed on codereview. They review *working code only*. – alecxe

Ah, thanks @alecxe. My mistake. – kuanb

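First of all, align_right and temperature_red are two different classes. You have joined them into one name and, for some reason, added table_data_td. Also, the elements that have these two classes are td elements, not table.

In any case, to get the climate table, it looks like you should be looking for the div element with id="climate_table":

climate_table = soup.find(id="climate_table")

Another important thing to note is the potential for "timing" issues here: when you read the driver.page_source value (browser.page_source in this code), the climate information might not be there yet. This is usually handled by adding an Explicit Wait after navigating to the page: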

from selenium import webdriver 
from selenium.webdriver.common.by import By 
from selenium.webdriver.support.ui import WebDriverWait 
from selenium.webdriver.support import expected_conditions as EC 
from bs4 import BeautifulSoup 


url = "https://usclimatedata.com/climate/binghamton/new-york/unitedstates/usny0124" 
browser = webdriver.Chrome() 

try: 
    browser.get(url) 

    # wait for the climate data to be loaded 
    WebDriverWait(browser, 10).until(EC.presence_of_element_located((By.ID, "climate_table"))) 

    soup = BeautifulSoup(browser.page_source, "lxml") 
    climate_table = soup.find(id="climate_table") 

    print(climate_table.prettify()) 
finally: 
    browser.quit()

Note the addition of try/finally, which safely closes the browser in case of an error. That also helps to avoid "hanging" browser windows.
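To illustrate the earlier point about the two classes, here is a minimal sketch that runs against a small inline HTML snippet instead of the live site. The markup is an assumption modeled on the class names mentioned in this thread (the temperature_blue class and the cell values are made up for illustration), so the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption based on the class names discussed
# in this thread, not a copy of the real page.
SAMPLE_HTML = """
<div id="climate_table">
  <table>
    <tr>
      <th>Average high in F</th>
      <td class="align_right temperature_red">29</td>
      <td class="align_right temperature_red">32</td>
    </tr>
    <tr>
      <th>Average low in F</th>
      <td class="align_right temperature_blue">15</td>
      <td class="align_right temperature_blue">17</td>
    </tr>
  </table>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The joined class name from the question matches no element, so find() returns None.
missing = soup.find("table", {"class": "align_right_climate_table_data_td_temperature_red"})

# A CSS selector with two class parts matches only <td> elements carrying both classes.
high_cells = soup.select("td.align_right.temperature_red")
high_temps = [int(cell.get_text(strip=True)) for cell in high_cells]

print(missing)
print(high_temps)
```

With the real page, the same selector could be applied to the soup built from browser.page_source once the wait has completed, provided the live markup really uses these classes on its td cells.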

Also, look into pandas.read_html(), which can read your climate information table into a DataFrame automatically.
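As a rough sketch of the pandas.read_html() suggestion, the example below parses a small inline table rather than the live page. The table layout and values are assumptions made up for illustration, and it assumes an HTML parser that pandas can use (lxml, or bs4 together with html5lib) is installed.

```python
from io import StringIO

import pandas as pd

# Hypothetical table: layout and values are made up for illustration.
SAMPLE_HTML = """
<div id="climate_table">
  <table>
    <thead>
      <tr><th>Month</th><th>Jan</th><th>Feb</th></tr>
    </thead>
    <tbody>
      <tr><td>Average high in F</td><td>29</td><td>32</td></tr>
      <tr><td>Average low in F</td><td>15</td><td>17</td></tr>
    </tbody>
  </table>
</div>
"""

# read_html() returns a list with one DataFrame per <table> it finds.
# Wrapping the markup in StringIO avoids passing a literal HTML string,
# which newer pandas versions deprecate.
tables = pd.read_html(StringIO(SAMPLE_HTML))
climate_df = tables[0]

print(climate_df)
```

In the Selenium flow above, the same call could probably be fed StringIO(str(climate_table)) or StringIO(browser.page_source) after the explicit wait; which of the returned tables holds the data you want depends on the real page.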

Source

2017-11-27 23:11:59 alecxe
