Scraping a forum with Beautiful Soup - how to exclude quoted replies?
I'm new to Beautiful Soup and to web scraping in general. I'm happy enough with how far I've gotten so far, but I've hit a bit of a snag.
I'm trying to scrape all the posts in a particular thread, but I want to exclude the text of quoted replies.
I want to scrape the text of these posts without scraping the text inside the areas marked with red boxes (the quoted replies).
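The part of the HTML I want to exclude is nested inside the post message div that I need to select, and that is what I'm struggling with. Here is the relevant HTML, followed by my code: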
<div id="post_message_39096267"><!-- google_ad_section_start --><div style="margin:20px; margin-top:5px; ">
<div class="smallfont" style="margin-bottom:2px">Quote:</div>
<table cellpadding="6" cellspacing="0" border="0" width="100%">
<tbody><tr>
<td class="alt2" style="border:1px inset">
<div>
Originally Posted by <strong>SAAN</strong>
<a href="http://www.city-data.com/forum/economics/2056372-minimum-wage-vs-liveable-wage-post33645660.html#post33645660" rel="nofollow"><img class="inlineimg li fs-viewpost" src="http://pics3.city-data.com/trn.gif" border="0" alt="View Post" title="View Post"></a>
</div>
<div style="font-style:italic">I agree with trying to buy a
cheap car outright, the problem is everyone I know that has done that $2-
5000 car, always ended up with these huge repair bills that are equivalent
to car payments. Most cars after 100K will need all sort of regulatr
maintance that is easily a $200 repair to go along with anything that may
break which is common with cars as they age.<br>
<br>
I have a 2yr old im making payments on and 14yr old car that is paid off,
but needs $2000 in maintenance. When car shopping this summer, I saw many
cars i could buy outright, but after adding u everything needed to make sure
it needs nothing, your back into the price range of a car payment.</div>
</td>
</tr>
</tbody></table>
</div>Depends on how long the car loan would be stretched. Just because you
can get an 8 year loan and reduce payments to a level like the repairs on
your old car doesn't make it a good idea, especially for new cars that <a
href="/knowledge/Depreciation.html" title="View 'depreciate' definition from
Wikipedia" class="knldlink" rel="nofollow">depreciate</a> quickly. You'd
just be putting yourself into negative equity territory.<!--
google_ad_section_end --></div>
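Based on the HTML above, every quote is wrapped in a div whose first child is <div class="smallfont">Quote:</div>, and the poster's own reply is the text that follows that wrapper. A minimal sketch, assuming this markup holds on every page of the thread, is to find that label inside each post and detach its parent (the whole quote block) before calling get_text(). The remove_quotes helper and the trimmed SAMPLE_HTML string are illustrative, not part of the original code.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the post HTML from the question (quoted text shortened).
SAMPLE_HTML = """
<div id="post_message_39096267"><div style="margin:20px; margin-top:5px; ">
<div class="smallfont" style="margin-bottom:2px">Quote:</div>
<table cellpadding="6" cellspacing="0" border="0" width="100%"><tbody><tr>
<td class="alt2" style="border:1px inset">
<div>Originally Posted by <strong>SAAN</strong></div>
<div style="font-style:italic">I agree with trying to buy a cheap car outright.</div>
</td></tr></tbody></table>
</div>Depends on how long the car loan would be stretched.</div>
"""


def remove_quotes(post):
    """Detach every quote block from a post_message <div>, in place.

    Assumes the vBulletin-style markup shown in the question: each quote is a
    wrapper <div> whose first child is <div class="smallfont">Quote:</div>.
    """
    for label in post.find_all("div", class_="smallfont"):
        if label.get_text(strip=True) == "Quote:":
            # The label's parent is the wrapper holding the label and the quoted table.
            label.parent.extract()
    return post


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
post = soup.find("div", id=lambda value: value and value.startswith("post_message_"))
remove_quotes(post)
print(post.get_text(" ", strip=True))
```

In the scraping loop, calling remove_quotes(post) just before post.get_text() should be enough. extract() is used instead of decompose() to stay safe if quotes are nested: an inner quote's label is still a valid (detached) element after its outer quote has been removed.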
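Hopefully this helps you understand what I'm talking about. If you could give me a code example of what would work, and explain it to me as if I were literally nine years old, that would be a huge help. Here is my code so far: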
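from bs4 import BeautifulSoup
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen

base_url = 'http://www.city-data.com/forum/economics/2056372-minimum-wage-vs-liveable-wage'
num_pages = 101
page_range = range(1, num_pages + 1)
clean_posts = []

for page in page_range:
    print("Reading page:", page, "...")
    # Page 1 is "...-wage.html"; later pages add "-<n>" before ".html"
    if page == 1:
        page_url = urlopen(base_url + '.html')
    else:
        page_url = urlopen(base_url + '-' + str(page) + '.html')
    # Name the parser explicitly so bs4 does not have to guess (and warn)
    soup = BeautifulSoup(page_url, "html.parser")
    # Each post body is a <div id="post_message_NNNN">
    postData = soup.find_all("div", id=lambda value: value and value.startswith("post_message_"))
    posts = []
    for post in postData:
        # get_text() works on the Tag directly; no need to re-parse str(post).
        # Keep it as str: encoding to bytes would break .replace("\t", "") in Python 3.
        posts.append(post.get_text().strip().replace("\t", ""))
    # Replace newlines with spaces so words on adjacent lines don't run together
    posts_stripped = [x.replace("\n", " ") for x in posts]
    clean_posts.append(posts_stripped)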
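To fit quote handling into the loop above without modifying the parsed tree, one option is to collect only the text nodes that are not part of a quote. The sketch below assumes the same vBulletin markup as the HTML in the question (quoted text inside td class="alt2", the label in div class="smallfont"); is_quoted and extract_posts are hypothetical helper names. Inside the loop you would call clean_posts.append(extract_posts(page_url.read())) on the downloaded page.

```python
from bs4 import BeautifulSoup, Comment


def is_quoted(string):
    """True if a text node belongs to a quote block (assumed vBulletin markup)."""
    # Quoted text sits inside <td class="alt2"> within the quote table.
    if string.find_parent("td", class_="alt2") is not None:
        return True
    # The "Quote:" label itself is a <div class="smallfont">.
    parent = string.parent
    return parent.name == "div" and "smallfont" in parent.get("class", [])


def extract_posts(page_html):
    """Return one cleaned string per post_message div, with quotes left out."""
    soup = BeautifulSoup(page_html, "html.parser")
    posts = soup.find_all("div", id=lambda value: value and value.startswith("post_message_"))
    cleaned = []
    for post in posts:
        pieces = [
            s.strip()
            for s in post.find_all(string=True)
            # Skip <!-- google_ad_section --> comments, quoted text and blank nodes
            if not isinstance(s, Comment) and not is_quoted(s) and s.strip()
        ]
        cleaned.append(" ".join(pieces))
    return cleaned


PAGE = """
<div id="post_message_1"><!-- google_ad_section_start --><div style="margin:20px; margin-top:5px; ">
<div class="smallfont" style="margin-bottom:2px">Quote:</div>
<table><tbody><tr><td class="alt2"><div>Originally Posted by <strong>SAAN</strong></div>
<div style="font-style:italic">Old cars need repairs.</div></td></tr></tbody></table>
</div>Depends on how long the loan is.<!-- google_ad_section_end --></div>
<div id="post_message_2">A post with no quote.</div>
"""

print(extract_posts(PAGE))
```

Joining the stripped pieces with single spaces also takes care of the tab and newline cleanup that the original loop does with replace().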
Thanks in advance, I'd be extremely grateful!
Cheers, Diarmaid