2017-10-28

Please help. I'm stuck trying to understand why Splash did not return a rendered HTML response: JavaScript is not being rendered into HTML in the Splash response.

  • First, the endpoint loads successfully, and
  • the Scrapy FormRequest and the SplashRequest both log in, but when I print response.body, the page has not been rendered.

Additional info: the page adds more results as you scroll down, and page.com is not the real web page. Thank you!

import scrapy
from scrapy_splash import SplashRequest, SplashFormRequest

class LoginSpider(scrapy.Spider):
    name = 'page'
    start_urls = ['https://www.page.com']

    def parse(self, response):
        return scrapy.FormRequest(
            'https://www.page.com/login/loginInitAction.do?method=processLogin',
            formdata={'username': 'userid', 'password': 'key', 'remember': 'on'},
            callback=self.after_login
        )

    def after_login(self, response):
        yield SplashRequest(
            "https://www.page.com/search/all/simple?typeaheadTermType=&typeaheadTermId=&searchType=21&keywords=&pageValue=22",
            self.parse_page2,
            meta={
                'splash': {
                    'endpoint': 'render.html',
                    'args': {'wait': 10, 'render_all': 1, 'html': 1},
                }
            }
        )

    def parse_page2(self, response):
        print(response.body)
        return

Console output:

2017-10-28 11:53:43 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot) 
2017-10-28 11:53:43 [scrapy.utils.log] INFO: Overridden settings: {'SPIDER_LOADER_WARN_ONLY': True} 
2017-10-28 11:53:43 [scrapy.middleware] INFO: Enabled extensions: 
['scrapy.extensions.corestats.CoreStats', 
'scrapy.extensions.telnet.TelnetConsole', 
'scrapy.extensions.logstats.LogStats'] 
2017-10-28 11:53:43 [scrapy.middleware] INFO: Enabled downloader 
middlewares: 
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware', 
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware', 
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware', 
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware', 
'scrapy.downloadermiddlewares.retry.RetryMiddleware', 
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware', 
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware', 
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware', 
'scrapy_splash.SplashCookiesMiddleware', 
'scrapy_splash.SplashMiddleware', 
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware', 
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware', 
'scrapy.downloadermiddlewares.stats.DownloaderStats'] 
2017-10-28 11:53:43 [scrapy.middleware] INFO: Enabled spider middlewares: 
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware', 
'scrapy_splash.SplashDeduplicateArgsMiddleware', 
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware', 
'scrapy.spidermiddlewares.referer.RefererMiddleware', 
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware', 
'scrapy.spidermiddlewares.depth.DepthMiddleware'] 
2017-10-28 11:53:43 [scrapy.middleware] INFO: Enabled item pipelines: 
[] 
2017-10-28 11:53:43 [scrapy.core.engine] INFO: Spider opened 
2017-10-28 11:53:43 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min) 
2017-10-28 11:53:43 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023 
2017-10-28 11:53:44 [scrapy.downloadermiddlewares.redirect] DEBUG: 
Redirecting (301) to <GET https://www.page.com/technology/home.jsp> from 
<GET https://www.page.com> 
2017-10-28 11:53:45 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.page.com/technology/home.jsp> (referer: None) 
2017-10-28 11:53:45 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://www.page.com/login/loginInitAction.do?method=processLogin> (referer: https://www.page.com/technology/home.jsp) 
2017-10-28 11:53:59 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.page.com/search/all/simple?typeaheadTermType=&typeaheadTermId=&searchType=21&keywords=&pageValue=1 via http://192.168.0.20:8050/render.html> (referer: None) 

Answer


You need to send the session cookies to stay logged in, but scrapy-splash does not handle cookies when you use the render.html endpoint. To make cookies work, try something like this:

import scrapy 
from scrapy_splash import SplashRequest 

script = """ 
function main(splash) 
    splash:init_cookies(splash.args.cookies) 
    assert(splash:go(splash.args.url)) 
    assert(splash:wait(0.5)) 

    return { 
    url = splash:url(), 
    cookies = splash:get_cookies(), 
    html = splash:html(), 
    } 
end 
""" 

class MySpider(scrapy.Spider):

    # ...
    def parse(self, response):
        # ...
        yield SplashRequest(url, self.parse_result,
            endpoint='execute',
            cache_args=['lua_source'],
            args={'lua_source': script},
        )

This example is adapted from the scrapy-splash README. To better understand why this is necessary, see here.
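For reference, here is a minimal sketch (not part of the answer above) of the kind of JSON object the execute endpoint returns for that Lua script, and how the rendered HTML is pulled out of it. The field values below are hypothetical placeholders:

```python
import json

# Hypothetical JSON body that Splash's 'execute' endpoint could return
# for the Lua script above (the script returns url, cookies, and html).
raw_body = json.dumps({
    "url": "https://www.page.com/search/all/simple",
    "cookies": [{"name": "JSESSIONID", "value": "abc123"}],
    "html": "<html><body>rendered results</body></html>",
})

# scrapy-splash decodes this body for you and exposes it as
# response.data on a SplashJsonResponse; decoding it by hand here
# just shows where the rendered page ends up.
data = json.loads(raw_body)
html = data["html"]        # the JavaScript-rendered page
cookies = data["cookies"]  # session cookies the script carried over
```

In a real `parse_result` callback you would read the same fields from `response.data['html']` instead of decoding the JSON yourself.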