2017-12-16

I'm trying to get the product sizes from a website using BeautifulSoup, but I'm stuck. How do I get the text of the span tags inside the li tags with BeautifulSoup? The sizes I want to end up with are:

S, M, L, XL, XXL, XXXL, 4XL, 5XL 

My code:

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

myurl = 'https://www.aliexpress.com/item/Vfemage-Womens-Elegant-Ruched-Bow-Contrast-Patchwork-3-4-Sleeve-Vintage-Pinup-Work-Office-Party-Fitted/32831085887.html?spm=2114.search0103.3.12.iQlXqu&ws_ab_test=searchweb0_0,searchweb201602_3_10152_10065_10151_10344_10068_10345_10342_10325_10343_51102_10546_10340_10548_10341_10609_10541_10084_10083_10307_10610_10539_10312_10313_10059_10314_10534_100031_10604_10603_10103_10605_10594_10142_10107,searchweb201603_25,ppcSwitch_5&algo_expid=a3e03a67-d922-4c90-aba7-d3cc80101a75-1&algo_pvid=a3e03a67-d922-4c90-aba7-d3cc80101a75&rmStoreLevelAB=0' 

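# Download the page and release the connection
uClient = uReq(myurl)
page_html = uClient.read()
uClient.close()

# Parse the HTML
page_soup = soup(page_html, "html.parser")

# findAll returns a list of every matching <ul>
size = page_soup.findAll("ul", {"id": "j-sku-list-2"})
print(size)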

This returns:

[ 
<ul class="sku-attr-list util-clearfix" data-sku-prop-id="5" data-sku-show-type="none" id="j-sku-list-2"> 
    <li><a data-role="sku" data-sku-id="100014064" href="javascript:void(0)" id="sku-2-100014064"><span>S</span></a></li> 
    <li><a data-role="sku" data-sku-id="361386" href="javascript:void(0)" id="sku-2-361386"><span>M</span></a></li> 
    <li><a data-role="sku" data-sku-id="361385" href="javascript:void(0)" id="sku-2-361385"><span>L</span></a></li> 
    <li><a data-role="sku" data-sku-id="100014065" href="javascript:void(0)" id="sku-2-100014065"><span>XL</span></a></li> 
    <li><a data-role="sku" data-sku-id="4182" href="javascript:void(0)" id="sku-2-4182"><span>XXL</span></a></li> 
    <li><a data-role="sku" data-sku-id="4183" href="javascript:void(0)" id="sku-2-4183"><span>XXXL</span></a></li> 
    <li><a data-role="sku" data-sku-id="200000990" href="javascript:void(0)" id="sku-2-200000990"><span>4XL</span></a></li> 
    <li><a data-role="sku" data-sku-id="200000991" href="javascript:void(0)" id="sku-2-200000991"><span>5XL</span></a></li> 
</ul>] 

Answer

You need to go one level further down from the ul: find the li elements inside it and call get_text() on each one to get just the text:

sizes = page_soup.find("ul", {"id":"j-sku-list-2"}).find_all("li") 
print([size.get_text(strip=True) for size in sizes]) 
# prints ['S', 'M', 'L', 'XL', 'XXL', 'XXXL', '4XL', '5XL'] 
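
Note that find() returns a single tag (or None when nothing matches), while findAll() in the question returns a list, which is why the whole ul was printed. As a rough sketch of the same idea that runs without hitting the network, the snippet below parses a trimmed copy of the markup from the question. The helper name extract_sizes and the SAMPLE_HTML constant are made up for illustration and are not part of BeautifulSoup; the None check is an added assumption that the live page might not always contain the list.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question (three of the eight sizes).
SAMPLE_HTML = """
<ul class="sku-attr-list util-clearfix" id="j-sku-list-2">
    <li><a data-role="sku" data-sku-id="100014064" href="javascript:void(0)" id="sku-2-100014064"><span>S</span></a></li>
    <li><a data-role="sku" data-sku-id="361386" href="javascript:void(0)" id="sku-2-361386"><span>M</span></a></li>
    <li><a data-role="sku" data-sku-id="361385" href="javascript:void(0)" id="sku-2-361385"><span>L</span></a></li>
</ul>
"""


def extract_sizes(html, list_id="j-sku-list-2"):
    """Return the text of every <li> inside the <ul> with the given id.

    Hypothetical helper for illustration; returns [] if the <ul> is missing.
    """
    page_soup = BeautifulSoup(html, "html.parser")
    # find() gives back one Tag, or None when no element matches.
    ul = page_soup.find("ul", {"id": list_id})
    if ul is None:
        return []
    return [li.get_text(strip=True) for li in ul.find_all("li")]


print(extract_sizes(SAMPLE_HTML))                 # ['S', 'M', 'L']
print(extract_sizes("<p>no size list here</p>"))  # []
```

Returning an empty list instead of letting None.find_all raise an AttributeError is just one possible choice; raising an explicit error may suit a scraper better.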

Or, more concisely, with a CSS selector:

sizes = page_soup.select("ul#j-sku-list-2 li")
print([size.get_text(strip=True) for size in sizes])
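
If you also need the SKU ids that sit on the a tags, the same selector approach can be pushed one step further. This is only a sketch against a trimmed copy of the markup from the question; the function name sizes_by_sku_id and the SAMPLE_HTML constant are made up for illustration, and it assumes every a tag in the list carries a data-sku-id attribute, as in the output shown in the question.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup shown in the question (three of the eight sizes).
SAMPLE_HTML = """
<ul class="sku-attr-list util-clearfix" id="j-sku-list-2">
    <li><a data-role="sku" data-sku-id="100014064" href="javascript:void(0)" id="sku-2-100014064"><span>S</span></a></li>
    <li><a data-role="sku" data-sku-id="361386" href="javascript:void(0)" id="sku-2-361386"><span>M</span></a></li>
    <li><a data-role="sku" data-sku-id="361385" href="javascript:void(0)" id="sku-2-361385"><span>L</span></a></li>
</ul>
"""


def sizes_by_sku_id(html):
    """Map each data-sku-id value to its size label (hypothetical helper)."""
    page_soup = BeautifulSoup(html, "html.parser")
    # Select only the <a> tags that actually have a data-sku-id attribute.
    links = page_soup.select("ul#j-sku-list-2 li a[data-sku-id]")
    return {a["data-sku-id"]: a.get_text(strip=True) for a in links}


print(sizes_by_sku_id(SAMPLE_HTML))
# {'100014064': 'S', '361386': 'M', '361385': 'L'}
```

select() always returns a list, so an absent ul simply produces an empty dict here rather than an error.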