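import re
from urllib.parse import urljoin  # Python 3; in Python 2 this was urlparse.urljoin

domain = ...  # base URL of the page
html = ...  # HTML source of the page
# Collect every quoted href value
links = re.findall(r'href=[\'"](.*?)[\'"]', html)
# Resolve each non-empty link against the base URL
links = [urljoin(domain, link) for link in links if link]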
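
For anyone who wants to run this end to end, here is a minimal sketch with the placeholders filled in. The page URL is the one from the question; the sample HTML, its four links, and the helper name `same_domain_links` are made up for illustration. It also adds a same-domain filter, which the snippet above does not do, by comparing hosts with `urlparse`.

```python
import re
from urllib.parse import urljoin, urlparse

# Page URL taken from the question; the HTML below is a hypothetical sample.
page_url = "http://somedomain.com/somedir/example.html"
html = """
<a href="http://somedomain.com/about.html">absolute, same domain</a>
<a href="other.html">relative to the page</a>
<a href="/top.html">relative to the site root</a>
<a href="http://elsewhere.example/page.html">different domain</a>
"""

def same_domain_links(page_url, html):
    """Return absolute URLs found in html that share page_url's host."""
    host = urlparse(page_url).netloc
    # Same regex as above: grab every quoted href value.
    hrefs = re.findall(r'href=[\'"](.*?)[\'"]', html)
    # Resolve relative links against the page URL.
    absolute = [urljoin(page_url, href) for href in hrefs if href]
    # Keep only links whose host matches the page's host.
    return [url for url in absolute if urlparse(url).netloc == host]

links = same_domain_links(page_url, html)
for link in links:
    print(link)
```

Note that `urljoin` resolves `other.html` against the page's directory (`/somedir/`), not the site root, which simple string concatenation with the domain would get wrong.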

Source

2010-05-27 01:45:43 hoju

Something like:

doc.search("a").map do |a|
    # a["href"] returns the attribute value; to_s turns a missing href into ""
    url = a["href"].to_s
    # this part could be a lot more robust, but you get the idea...
    url.match("^http://") ? url : "http://somedomain.com/#{url}"
end.select { |url| url.match("^http://somedomain.com") }

Like Nate's answer, the Python example assumes a link is on the same domain when it does not start with "http".
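
The "does not start with http" test is a loose heuristic: it also treats things like `mailto:` addresses and protocol-relative `//host/path` links as same-domain paths. A rough sketch of a stricter check, assuming Python 3's `urllib.parse`; the helper name `is_relative_link` is hypothetical:

```python
from urllib.parse import urlparse

def is_relative_link(href):
    """True when href has neither a scheme nor a host of its own."""
    parts = urlparse(href)
    return not parts.scheme and not parts.netloc

# A few sample hrefs (made up for illustration).
samples = [
    "page.html",
    "/dir/page.html",
    "http://somedomain.com/page.html",
    "//cdn.example/script.js",
    "mailto:someone@example.com",
]
for href in samples:
    print(href, is_relative_link(href))
```

Only the first two samples count as relative here; the others carry their own scheme or host and should not have the domain prepended.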

Source

2010-05-27 02:13:09

Nokogiri: I have an HTML document located at http://somedomain.com/somedir/example.html. The document contains four links.

Answers
