For loop issue with BeautifulSoup

I am trying to extract some of the columns from http://www.immihelp.com/h1b-sponsoring-companies-database/display-2-2010.html into a CSV sheet.
from bs4 import BeautifulSoup
import urllib2
import csv

f = csv.writer(open("H1B_apps.csv", "w"))
f.writerow(["Name", "Jobs", "Positions", "Wage", "City", "State", "Zip"]) # Write column headers as the first line
for x in range (2,5):
    soup = BeautifulSoup(urllib2.urlopen('http://www.immihelp.com/h1b-sponsoring-companies-database/display-'+str(x)+'-2010.html').read())
    table = soup.find('table', cellspacing = '1')
    rows = table.findAll('tr')
    for tr in rows:
        cols = tr.findAll('nobr')
        for data in cols:
            name = cols[0].findAll(text=True)
            jobs = cols[1].findAll(text=True)
            position = cols[2].findAll(text=True)
            wage = cols[3].findAll(text=True)
            city = cols[4].findAll(text=True)
            state = cols[5].findAll(text=True)
            zip = cols[6].findAll(text=True)
            print(name,jobs,position,wage,city,state,zip)
            f.writerow([name,jobs,position,wage,city,state,zip])
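The code generally seems to work fine. However, I have the following problems:
- The output keeps repeating itself 7 times (something is wrong with my loop, but I can't figure it out).
- The output text comes out as [u'text']; I just want the text itself.
Any help would be appreciated.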
Here is a sample of the output:

([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER SUPPORT SPECIALISTS'], [u'43139.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER SUPPORT SPECIALISTS'], [u'43139.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER SUPPORT SPECIALISTS'], [u'43139.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER SUPPORT SPECIALISTS'], [u'43139.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'55994.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'55994.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'55994.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'55994.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'55994.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'55994.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'55994.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'67995.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'67995.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'67995.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'67995.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])
([u'22ND CENTURY TECHNOLOGIES, INC'], [u'1'], [u'COMPUTER PROGRAMMERS'], [u'67995.0/Year'], [u'SOMERSET'], [u'NJ'], [u'08873'])

Thanks.
`findAll` is designed to find *all* matches, even when there is only one. So the output of `findAll` is a list of everything that was found, not a single item. To get the first match, either access the first element of the list (`findAll(...)[0]`) or use `find` in the first place. – poke
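To illustrate the difference the comment describes, here is a minimal sketch on a tiny inline fragment rather than the real page. The HTML string and variable names are made up for the example, and it is written for Python 3 with the `html.parser` backend; `find_all(string=True)` is the current spelling of `findAll(text=True)`.

```python
from bs4 import BeautifulSoup

# Hypothetical one-cell fragment standing in for a cell of the real table.
html = "<table><tr><td><nobr>SOMERSET</nobr></td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

# find returns a single Tag (or None when nothing matches).
cell = soup.find("nobr")

# find_all always returns a list, even for a single match,
# which is why the question's output looks like [u'SOMERSET'].
as_list = cell.find_all(string=True)

# get_text returns the text itself as a plain string.
as_text = cell.get_text(strip=True)

print(as_list)
print(as_text)
```

If only the string is wanted, `get_text()` on each cell should avoid the list wrapper entirely.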
Thanks. When I try `findAll(...)[0]`, I get an IndexError for cols[5]. I tried `find` and it worked, but I still get each data entry 7 times. – user3316270
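On the remaining repetition: in the posted code, the body of `for data in cols:` never uses `data`, so it appears to run once per cell and emit the same row each time, which would give seven copies for seven cells. The IndexError is consistent with some `tr` rows (perhaps a header row) having fewer than seven `nobr` cells, though that is a guess about the page. Below is a minimal sketch of the loop without the inner `for`, using plain lists in place of the parsed cells so it runs without the network. The names `rows`, `write_rows`, and `buffer` are illustrative and not part of the original code.

```python
import csv
import io

# Hypothetical stand-in for the cell texts of each <tr>. The empty first row
# mimics a row with no <nobr> cells (an assumption about the real page).
rows = [
    [],
    ["22ND CENTURY TECHNOLOGIES, INC", "1", "COMPUTER SUPPORT SPECIALISTS",
     "43139.0/Year", "SOMERSET", "NJ", "08873"],
    ["22ND CENTURY TECHNOLOGIES, INC", "1", "COMPUTER PROGRAMMERS",
     "55994.0/Year", "SOMERSET", "NJ", "08873"],
]


def write_rows(rows, out):
    """Write one CSV line per complete row and return how many were written."""
    writer = csv.writer(out)
    writer.writerow(["Name", "Jobs", "Positions", "Wage", "City", "State", "Zip"])
    written = 0
    for cols in rows:
        # Skip rows that do not have all seven cells; indexing cols[5] or
        # cols[6] on such a row is what raises IndexError.
        if len(cols) < 7:
            continue
        # No inner loop over cols: each table row is written exactly once.
        writer.writerow(cols[:7])
        written += 1
    return written


buffer = io.StringIO()
count = write_rows(rows, buffer)
print(buffer.getvalue())
```

With the real page, each entry of `rows` would be built from `tr.findAll('nobr')`, taking the text of each cell before writing.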