2017-12-24 30 views

Python whois library returns nothing for an active domain name

I'm trying to use the Python whois library to collect the whois records of some websites.

The problem is that for some websites, such as nih.gov, which is an active domain name, the result contains almost nothing:

import whois
w = whois.whois("nih.gov")
print(w)
{u'updated_date': None, u'status': u'ACTIVE', u'name': None, u'dnssec': None, u'city': None, u'expiration_date': None, u'zipcode': None, u'domain_name': u'NIH.GOV', u'country': None, u'whois_server': None, u'state': None, u'registrar': None, u'referral_url': None, u'address': None, u'name_servers': None, u'org': None, u'creation_date': None, u'emails': None} 

What is the problem, and how should I handle it, either with the library or in general?

Compare [whois nih.gov](https://www.whois.com/whois/nih.gov) with [whois stackoverflow.com](https://www.whois.com/whois/stackoverflow.com). That is simply all the information 'whois' provides for nih.gov. – unutbu

Answers

Here is some code that can do the job:

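import sys
import socket
import time

def whois(ip):
    # Connect to ARIN's WHOIS server on TCP port 43; the timeout also applies to recv()
    s = socket.create_connection(("whois.arin.net", 43), timeout=3)
    # "n <ip>" asks ARIN for the network records that contain this IP
    s.sendall(('n ' + ip + '\r\n').encode())

    response = b""

    # overall time limit in seconds
    start_time = time.monotonic()
    time_limit = 3
    try:
        while True:
            data = s.recv(4096)
            response += data
            # stop when the server closes the connection or the time limit is hit
            if not data or time.monotonic() - start_time >= time_limit:
                break
    except socket.timeout:
        # the server went quiet; keep whatever was received so far
        pass
    finally:
        s.close()

    print(response.decode(errors="replace"))

def main():
    domain = sys.argv[1]
    # ARIN indexes IP addresses, so resolve the host name first
    ip = socket.gethostbyname(domain)
    whois(ip)

if __name__ == "__main__":
    main()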
For example:

c:\Temp>py test.py www.google.com 

# 
# ARIN WHOIS data and services are subject to the Terms of Use 
# available at: https://www.arin.net/whois_tou.html 
# 
# If you see inaccuracies in the results, please report at 
# https://www.arin.net/public/whoisinaccuracy/index.xhtml 
# 


# 
# The following results may also be obtained via: 
# https://whois.arin.net/rest/nets;q=216.58.213.196?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2 
# 

NetRange:  216.58.192.0 - 216.58.223.255 
CIDR:   216.58.192.0/19 
NetName:  GOOGLE 
NetHandle:  NET-216-58-192-0-1 
Parent:   NET216 (NET-216-0-0-0-0) 
NetType:  Direct Allocation 
OriginAS:  AS15169 
Organization: Google LLC (GOGL) 
RegDate:  2012-01-27 
Updated:  2012-01-27 
Ref:   https://whois.arin.net/rest/net/NET-216-58-192-0-1 


OrgName:  Google LLC 
OrgId:   GOGL 
Address:  1600 Amphitheatre Parkway 
City:   Mountain View 
StateProv:  CA 
PostalCode:  94043 
Country:  US 
RegDate:  2000-03-30 
Updated:  2017-12-21 
Ref:   https://whois.arin.net/rest/org/GOGL 


OrgAbuseHandle: ABUSE5250-ARIN 
OrgAbuseName: Abuse 
OrgAbusePhone: +1-650-253-0000 
OrgAbuseEmail: [email protected] 
OrgAbuseRef: https://whois.arin.net/rest/poc/ABUSE5250-ARIN 

OrgTechHandle: ZG39-ARIN 
OrgTechName: Google LLC 
OrgTechPhone: +1-650-253-0000 
OrgTechEmail: [email protected] 
OrgTechRef: https://whois.arin.net/rest/poc/ZG39-ARIN 


# 
# ARIN WHOIS data and services are subject to the Terms of Use 
# available at: https://www.arin.net/whois_tou.html 
# 
# If you see inaccuracies in the results, please report at 
# https://www.arin.net/public/whoisinaccuracy/index.xhtml 
# 

And for www.nih.gov specifically, we get:

c:\Temp>py test.py www.nih.gov 

# 
# ARIN WHOIS data and services are subject to the Terms of Use 
# available at: https://www.arin.net/whois_tou.html 
# 
# If you see inaccuracies in the results, please report at 
# https://www.arin.net/public/whoisinaccuracy/index.xhtml 
# 


# 
# The following results may also be obtained via: 
# https://whois.arin.net/rest/nets;q=23.21.241.1?showDetails=true&showARIN=false&showNonArinTopLevelNet=false&ext=netref2 
# 

NetRange:  23.20.0.0 - 23.23.255.255 
CIDR:   23.20.0.0/14 
NetName:  AMAZON-EC2-USEAST-10 
NetHandle:  NET-23-20-0-0-1 
Parent:   NET23 (NET-23-0-0-0-0) 
NetType:  Direct Allocation 
OriginAS:  AS16509 
Organization: Amazon.com, Inc. (AMAZO-4) 
RegDate:  2011-09-19 
Updated:  2014-09-03 
Comment:  The activity you have detected originates from a dynamic hosting environment. 
Comment:  For fastest response, please submit abuse reports at http://aws-portal.amazon.com/gp/aws/html-forms-controller/contactus/AWSAbuse 
Comment:  For more information regarding EC2 see: 
Comment:  http://ec2.amazonaws.com/ 
Comment:  All reports MUST include: 
Comment:  * src IP 
Comment:  * dest IP (your IP) 
Comment:  * dest port 
Comment:  * Accurate date/timestamp and timezone of activity 
Comment:  * Intensity/frequency (short log extracts) 
Comment:  * Your contact details (phone and email) Without these we will be unable to identify the correct owner of the IP address at that point in time. 
Ref:   https://whois.arin.net/rest/net/NET-23-20-0-0-1 


OrgName:  Amazon.com, Inc. 
OrgId:   AMAZO-4 
Address:  Amazon Web Services, Inc. 
Address:  P.O. Box 81226 
City:   Seattle 
StateProv:  WA 
PostalCode:  98108-1226 
Country:  US 
RegDate:  2005-09-29 
Updated:  2017-01-28 
Comment:  For details of this service please see 
Comment:  http://ec2.amazonaws.com/ 
Ref:   https://whois.arin.net/rest/org/AMAZO-4 


OrgAbuseHandle: AEA8-ARIN 
OrgAbuseName: Amazon EC2 Abuse 
OrgAbusePhone: +1-206-266-4064 
OrgAbuseEmail: [email protected] 
OrgAbuseRef: https://whois.arin.net/rest/poc/AEA8-ARIN 

OrgTechHandle: ANO24-ARIN 
OrgTechName: Amazon EC2 Network Operations 
OrgTechPhone: +1-206-266-4064 
OrgTechEmail: [email protected] 
OrgTechRef: https://whois.arin.net/rest/poc/ANO24-ARIN 

OrgNOCHandle: AANO1-ARIN 
OrgNOCName: Amazon AWS Network Operations 
OrgNOCPhone: +1-206-266-4064 
OrgNOCEmail: [email protected] 
OrgNOCRef: https://whois.arin.net/rest/poc/AANO1-ARIN 


# 
# ARIN WHOIS data and services are subject to the Terms of Use 
# available at: https://www.arin.net/whois_tou.html 
# 
# If you see inaccuracies in the results, please report at 
# https://www.arin.net/public/whoisinaccuracy/index.xhtml 
# 
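The replies above are plain `Key: value` text, so pulling out a single field (for example `OrgName`, or `Registrar` in a domain WHOIS reply) comes down to splitting lines. A minimal sketch, assuming you change `whois()` to return `response.decode()` instead of printing it; `parse_whois_fields` is a hypothetical helper name, not part of any library:

```python
from collections import defaultdict


def parse_whois_fields(text):
    """Collect 'Key: value' lines from a raw WHOIS reply into {key: [values, ...]}.

    Lines starting with '#' or '%' are treated as comments and skipped.
    Repeated keys (e.g. several Address: or Comment: lines) keep every value in order.
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#%":
            continue
        # split on the first colon only, so URLs in the value stay intact
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()].append(value.strip())
    return dict(fields)
```

For the www.nih.gov output above, `parse_whois_fields(text).get("OrgName")` would give `["Amazon.com, Inc."]`. Field names differ between registries, so check the raw text before relying on a particular key.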

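Another option

Here is one more option.

This chunk of code fetches the HTML of the whois page from another service (whois.com), strips it down to plain text, and writes it to a file (whoisData.txt) in the current working directory. You can modify it to suit your needs; I've just written the basics.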

import io
import sys
import urllib.request
from bs4 import BeautifulSoup

def writeFile(text):
    # the with block closes the file automatically
    with io.open('whoisData.txt', "w", encoding="utf-8") as f:
        f.write(text)

def readHTML(domain):
    url = 'https://www.whois.com/whois/' + domain
    # some sites reject urllib's default User-Agent, so send a browser-like one
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    html = urllib.request.urlopen(req).read()
    # name the parser explicitly to avoid BeautifulSoup's warning
    soup = BeautifulSoup(html, "html.parser")

    # kill all script and style elements
    for script in soup(["script", "style"]):
        script.extract()  # rip it out

    # get text
    text = soup.get_text()

    # break into lines and remove leading and trailing space on each
    lines = (line.strip() for line in text.splitlines())
    # break multi-headlines into a line each (phrases are separated by two spaces)
    chunks = (phrase.strip() for line in lines for phrase in line.split("  "))
    # drop blank lines
    text = '\n'.join(chunk for chunk in chunks if chunk)
    writeFile(text)

def main():
    domain = sys.argv[1]
    readHTML(domain)

if __name__ == "__main__":
    main()

I borrowed some parts (for parsing the HTML) from here.
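Note that the first approach queries ARIN with the site's IP address, so it returns the network that hosts the site (for www.nih.gov, an Amazon EC2 range) rather than the domain's registration record. A sketch of a direct domain lookup over port 43, assuming whois.iana.org answers a TLD query with a `refer:` (or `whois:`) line naming the registry's WHOIS server; `query_whois`, `find_referral` and `domain_whois` are hypothetical names:

```python
import socket
import sys


def query_whois(server, query, timeout=5.0):
    """Send one WHOIS query (TCP port 43, query + CRLF) and read until the server closes."""
    chunks = []
    with socket.create_connection((server, 43), timeout=timeout) as s:
        s.sendall((query + "\r\n").encode())
        while True:
            data = s.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode(errors="replace")


def find_referral(text):
    """Return the server named on a 'refer:' or 'whois:' line, or None if there is none."""
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in ("refer", "whois") and value.strip():
            return value.strip()
    return None


def domain_whois(domain):
    """Ask IANA which server handles the TLD, then ask that server about the domain."""
    tld = domain.rstrip(".").rsplit(".", 1)[-1]
    iana_reply = query_whois("whois.iana.org", tld)
    server = find_referral(iana_reply)
    if server is None:
        return iana_reply
    return query_whois(server, domain)


if __name__ == "__main__":
    print(domain_whois(sys.argv[1]))
```

Pass the registered domain (nih.gov) rather than a host name such as www.nih.gov. For thin registries such as .com, the registry reply may in turn name a registrar's WHOIS server, which this sketch does not follow.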

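Thanks @oBit91, I'll use it; it's very efficient. One more thing: when I used the Python whois library, I was able to extract the registrar record from the whois response. Can we extract the registrar name with something like response.decode().registrar? Really sorry, you don't have to write it all out for me. –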

But it doesn't seem to return the WHOIS record. I tested it with python test.py bbc.com, and the printed values are no more useful than the output of my whois.whois("bbc.com"). –

@ShahroozPooryousef I added a second option that uses an online web service and parses its HTML page. Either way, you can see which one suits you best. – oBit91