For completeness, here is an implementation of my answer. In the end, I found it easier to modify my AWS configuration (as recommended by Jordan Phillips) than to subclass S3FeedStorage (as recommended by starrify).
# Adapted from trcook/docker-scrapy
FROM python:alpine
RUN apk --update add libxml2-dev libxslt-dev libffi-dev gcc musl-dev libgcc openssl-dev
RUN pip install scrapy botocore awscli
RUN aws configure set aws_access_key_id foo
RUN aws configure set aws_secret_access_key bar
RUN aws configure set default.region eu-central-1
RUN aws configure set default.s3.signature_version s3v4
COPY . /scraper
WORKDIR /scraper
CMD ["scrapy", "crawl", "quotes"]
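The four aws configure set lines in the Dockerfile only write two INI-style files for the default profile: the key ID and secret go to ~/.aws/credentials, while the region and the nested s3 signature_version go to ~/.aws/config. As a rough sketch in stdlib Python (not the AWS CLI's own code), the function below writes equivalent content. The name write_aws_files is hypothetical, and the exact whitespace the CLI produces may differ.

```python
import configparser
from pathlib import Path


def write_aws_files(home, key_id, secret, region, signature_version="s3v4"):
    """Hypothetical helper approximating the four `aws configure set` lines.

    Assumes the default profile and the standard ~/.aws layout.
    """
    aws_dir = Path(home) / ".aws"
    aws_dir.mkdir(parents=True, exist_ok=True)

    # aws_access_key_id / aws_secret_access_key belong in the credentials file.
    creds = configparser.ConfigParser(interpolation=None)
    creds["default"] = {
        "aws_access_key_id": key_id,
        "aws_secret_access_key": secret,
    }
    with open(aws_dir / "credentials", "w") as fh:
        creds.write(fh)

    # default.region and default.s3.signature_version belong in the config
    # file; "s3" is a nested block, written as an indented continuation line.
    config = configparser.ConfigParser(interpolation=None)
    config["default"] = {
        "region": region,
        "s3": "\nsignature_version = " + signature_version,
    }
    with open(aws_dir / "config", "w") as fh:
        config.write(fh)
    return aws_dir
```

Calling write_aws_files(home, "foo", "bar", "eu-central-1") mirrors the values baked into the image above.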
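The Dockerfile above is the one I used to run the scraper; in it, foo and bar stand for my actual AWS access key ID and AWS secret access key, respectively. After building the image with docker build --tag quotes . and starting it with docker run quotes, the scraper runs without errors: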
2017-05-16 13:03:37 [scrapy.utils.log] INFO: Scrapy 1.3.3 started (bot: tutorial)
2017-05-16 13:03:37 [scrapy.utils.log] INFO: Overridden settings: {'BOT_NAME': 'tutorial', 'NEWSPIDER_MODULE': 'tutorial.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['tutorial.spiders']}
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: env
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: assume-role
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: shared-credentials-file
2017-05-16 13:03:37 [botocore.credentials] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/endpoints.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/s3/2006-03-01/service-2.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/_retry.json
2017-05-16 13:03:37 [botocore.client] DEBUG: Registering retry handlers for service: s3
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f8c2f2f6a60>
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f8c2f2f6840>
2017-05-16 13:03:37 [botocore.client] DEBUG: Switching signature version for service s3 to version s3v4 based on config file override.
2017-05-16 13:03:37 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2017-05-16 13:03:37 [botocore.client] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.feedexport.FeedExporter',
'scrapy.extensions.logstats.LogStats']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-05-16 13:03:37 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2017-05-16 13:03:37 [scrapy.core.engine] INFO: Spider opened
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: env
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: assume-role
2017-05-16 13:03:37 [botocore.credentials] DEBUG: Looking for credentials via: shared-credentials-file
2017-05-16 13:03:37 [botocore.credentials] INFO: Found credentials in shared credentials file: ~/.aws/credentials
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/endpoints.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/s3/2006-03-01/service-2.json
2017-05-16 13:03:37 [botocore.loaders] DEBUG: Loading JSON file: /usr/local/lib/python3.6/site-packages/botocore/data/_retry.json
2017-05-16 13:03:37 [botocore.client] DEBUG: Registering retry handlers for service: s3
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f8c2f2f6a60>
2017-05-16 13:03:37 [botocore.hooks] DEBUG: Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f8c2f2f6840>
2017-05-16 13:03:37 [botocore.client] DEBUG: Switching signature version for service s3 to version s3v4 based on config file override.
2017-05-16 13:03:37 [botocore.endpoint] DEBUG: Setting s3 timeout as (60, 60)
2017-05-16 13:03:37 [botocore.client] DEBUG: Defaulting to S3 virtual host style addressing with path style addressing fallback.
2017-05-16 13:03:37 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-05-16 13:03:37 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6023
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2017-05-16 13:03:38 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/2/> (referer: None)
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
'text': '“The world as we have created it is a process of our thinking. It '
'cannot be changed without changing our thinking.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'J.K. Rowling',
'tags': ['abilities', 'choices'],
'text': '“It is our choices, Harry, that show what we truly are, far more '
'than our abilities.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
'text': '“There are only two ways to live your life. One is as though nothing '
'is a miracle. The other is as though everything is a miracle.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Jane Austen',
'tags': ['aliteracy', 'books', 'classic', 'humor'],
'text': '“The person, be it gentleman or lady, who has not pleasure in a good '
'novel, must be intolerably stupid.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Marilyn Monroe',
'tags': ['be-yourself', 'inspirational'],
'text': "“Imperfection is beauty, madness is genius and it's better to be "
'absolutely ridiculous than absolutely boring.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Albert Einstein',
'tags': ['adulthood', 'success', 'value'],
'text': '“Try not to become a man of success. Rather become a man of value.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'André Gide',
'tags': ['life', 'love'],
'text': '“It is better to be hated for what you are than to be loved for what '
'you are not.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Thomas A. Edison',
'tags': ['edison', 'failure', 'inspirational', 'paraphrased'],
'text': "“I have not failed. I've just found 10,000 ways that won't work.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Eleanor Roosevelt',
'tags': ['misattributed-eleanor-roosevelt'],
'text': '“A woman is like a tea bag; you never know how strong it is until '
"it's in hot water.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/1/>
{'author': 'Steve Martin',
'tags': ['humor', 'obvious', 'simile'],
'text': '“A day without sunshine is like, you know, night.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Marilyn Monroe',
'tags': ['friends', 'heartbreak', 'inspirational', 'life', 'love', 'sisters'],
'text': "“This life is what you make it. No matter what, you're going to mess "
"up sometimes, it's a universal truth. But the good part is you get "
"to decide how you're going to mess it up. Girls will be your friends "
"- they'll act like it anyway. But just remember, some come, some go. "
"The ones that stay with you through everything - they're your true "
"best friends. Don't let go of them. Also remember, sisters make the "
"best friends in the world. As for lovers, well, they'll come and go "
'too. And baby, I hate to say it, most of them - actually pretty much '
"all of them are going to break your heart, but you can't give up "
"because if you give up, you'll never find your soulmate. You'll "
'never find that half who makes you whole and that goes for '
"everything. Just because you fail once, doesn't mean you're gonna "
'fail at everything. Keep trying, hold on, and always, always, always '
"believe in yourself, because if you don't, then who will, sweetie? "
'So keep your head high, keep your chin up, and most importantly, '
"keep smiling, because life's a beautiful thing and there's so much "
'to smile about.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'J.K. Rowling',
'tags': ['courage', 'friends'],
'text': '“It takes a great deal of bravery to stand up to our enemies, but '
'just as much to stand up to our friends.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Albert Einstein',
'tags': ['simplicity', 'understand'],
'text': "“If you can't explain it to a six year old, you don't understand it "
'yourself.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Bob Marley',
'tags': ['love'],
'text': '“You may not be her first, her last, or her only. She loved before '
'she may love again. But if she loves you now, what else matters? '
"She's not perfect—you aren't either, and the two of you may never be "
'perfect together but if she can make you laugh, cause you to think '
'twice, and admit to being human and making mistakes, hold onto her '
'and give her the most you can. She may not be thinking about you '
'every second of the day, but she will give you a part of her that '
"she knows you can break—her heart. So don't hurt her, don't change "
"her, don't analyze and don't expect more than she can give. Smile "
'when she makes you happy, let her know when she makes you mad, and '
"miss her when she's not there.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Dr. Seuss',
'tags': ['fantasy'],
'text': '“I like nonsense, it wakes up the brain cells. Fantasy is a '
'necessary ingredient in living.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Douglas Adams',
'tags': ['life', 'navigation'],
'text': '“I may not have gone where I intended to go, but I think I have '
'ended up where I needed to be.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Elie Wiesel',
'tags': ['activism',
'apathy',
'hate',
'indifference',
'inspirational',
'love',
'opposite',
'philosophy'],
'text': "“The opposite of love is not hate, it's indifference. The opposite "
"of art is not ugliness, it's indifference. The opposite of faith is "
"not heresy, it's indifference. And the opposite of life is not "
"death, it's indifference.”"}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Friedrich Nietzsche',
'tags': ['friendship',
'lack-of-friendship',
'lack-of-love',
'love',
'marriage',
'unhappy-marriage'],
'text': '“It is not a lack of love, but a lack of friendship that makes '
'unhappy marriages.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Mark Twain',
'tags': ['books', 'contentment', 'friends', 'friendship', 'life'],
'text': '“Good friends, good books, and a sleepy conscience: this is the '
'ideal life.”'}
2017-05-16 13:03:38 [scrapy.core.scraper] DEBUG: Scraped from <200 http://quotes.toscrape.com/page/2/>
{'author': 'Allen Saunders',
'tags': ['fate', 'life', 'misattributed-john-lennon', 'planning', 'plans'],
'text': '“Life is what happens to us while we are making other plans.”'}
2017-05-16 13:03:38 [scrapy.core.engine] INFO: Closing spider (finished)
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_ascii_metadata at 0x7f8c2f2b0ae8>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function sse_md5 at 0x7f8c2f2acea0>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function convert_body_to_file_like_object at 0x7f8c2f2b1268>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function validate_bucket_name at 0x7f8c2f2ace18>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-parameter-build.s3.PutObject: calling handler <function generate_idempotent_uuid at 0x7f8c2f2aca60>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function conditionally_calculate_md5 at 0x7f8c2f2acd90>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <function add_expect_header at 0x7f8c2f2b0378>
2017-05-16 13:03:38 [botocore.handlers] DEBUG: Adding expect 100 continue header to request.
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-call.s3.PutObject: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [botocore.endpoint] DEBUG: Making request for OperationModel(name=PutObject) (verify_ssl=True) with params: {'url_path': '/apkmirror/quotes3.json', 'query_string': {}, 'method': 'PUT', 'headers': {'User-Agent': 'Botocore/1.5.49 Python/3.6.1 Linux/4.4.0-75-generic', 'Content-MD5': 'U+PeT0soEYWoCF4DMQXEzA==', 'Expect': '100-continue'}, 'body': <tempfile._TemporaryFileWrapper object at 0x7f8c2f22e2b0>, 'url': 'https://s3.eu-central-1.amazonaws.com/apkmirror/quotes3.json', 'context': {'client_region': 'eu-central-1', 'client_config': <botocore.config.Config object at 0x7f8c2e7a7438>, 'has_streaming_input': True, 'auth_type': None, 'signing': {'bucket': 'apkmirror'}}}
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event request-created.s3.PutObject: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f8c2e7a73c8>>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event choose-signer.s3.PutObject: calling handler <function set_operation_specific_signer at 0x7f8c2f2ac950>
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event before-sign.s3.PutObject: calling handler <function fix_s3_host at 0x7f8c2f42dd08>
2017-05-16 13:03:38 [botocore.auth] DEBUG: Calculating signature using v4 auth.
2017-05-16 13:03:38 [botocore.auth] DEBUG: CanonicalRequest:
PUT
/apkmirror/quotes3.json
content-md5:U+PeT0soEYWoCF4DMQXEzA==
host:s3.eu-central-1.amazonaws.com
x-amz-content-sha256:UNSIGNED-PAYLOAD
x-amz-date:20170516T130338Z
content-md5;host;x-amz-content-sha256;x-amz-date
UNSIGNED-PAYLOAD
2017-05-16 13:03:38 [botocore.auth] DEBUG: StringToSign:
AWS4-HMAC-SHA256
20170516T130338Z
20170516/eu-central-1/s3/aws4_request
929e3a39776d42c15c4c7c197c718f67b6105341ed4a269365c6e6ed88378a69
2017-05-16 13:03:38 [botocore.auth] DEBUG: Signature:
81a1c8014fa22d52d371a8aea10d47e0f32e8913dcc18b2f1210c7ce458311e4
2017-05-16 13:03:38 [botocore.endpoint] DEBUG: Sending http request: <PreparedRequest [PUT]>
2017-05-16 13:03:38 [botocore.vendored.requests.packages.urllib3.connectionpool] INFO: Starting new HTTPS connection (1): s3.eu-central-1.amazonaws.com
2017-05-16 13:03:38 [botocore.awsrequest] DEBUG: Waiting for 100 Continue response.
2017-05-16 13:03:38 [botocore.awsrequest] DEBUG: 100 Continue response seen, now sending request body.
2017-05-16 13:03:38 [botocore.vendored.requests.packages.urllib3.connectionpool] DEBUG: "PUT /apkmirror/quotes3.json HTTP/1.1" 200 0
2017-05-16 13:03:38 [botocore.parsers] DEBUG: Response headers: {'x-amz-id-2': 'WB/HgvEGKd7ysqcRa1vodr2znuevKA+fTTX/2elIAcID05t7Ex2G7UTM+rl/AhvIPeB+0gL4YaY=', 'x-amz-request-id': '9C449953B48DA63F', 'date': 'Tue, 16 May 2017 13:03:39 GMT', 'etag': '"53e3de4f4b281185a8085e033105c4cc"', 'content-length': '0', 'server': 'AmazonS3'}
2017-05-16 13:03:38 [botocore.parsers] DEBUG: Response body:
b''
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <botocore.retryhandler.RetryHandler object at 0x7f8c2e774e10>
2017-05-16 13:03:38 [botocore.retryhandler] DEBUG: No retry needed.
2017-05-16 13:03:38 [botocore.hooks] DEBUG: Event needs-retry.s3.PutObject: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f8c2e7a7780>>
2017-05-16 13:03:38 [scrapy.extensions.feedexport] INFO: Stored jsonlines feed (20 items) in: s3://apkmirror/quotes3.json
2017-05-16 13:03:38 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 675,
'downloader/request_count': 3,
'downloader/request_method_count/GET': 3,
'downloader/response_bytes': 5976,
'downloader/response_count': 3,
'downloader/response_status_count/200': 2,
'downloader/response_status_count/404': 1,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2017, 5, 16, 13, 3, 38, 317079),
'item_scraped_count': 20,
'log_count/DEBUG': 75,
'log_count/INFO': 11,
'response_received_count': 3,
'scheduler/dequeued': 2,
'scheduler/dequeued/memory': 2,
'scheduler/enqueued': 2,
'scheduler/enqueued/memory': 2,
'start_time': datetime.datetime(2017, 5, 16, 13, 3, 37, 897491)}
2017-05-16 13:03:38 [scrapy.core.engine] INFO: Spider closed (finished)
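The botocore.credentials lines near the top of the log show why this works: botocore first looks for credentials in environment variables ("via: env"), then tries assume-role, then the shared credentials file, where it finds the keys written by aws configure set. The sketch below is a simplified stdlib model of that order, not botocore's implementation: it covers only the env and shared-credentials-file steps, always reads the [default] profile (ignoring AWS_PROFILE and session tokens), and the function name find_credentials is hypothetical.

```python
import configparser
import os
from pathlib import Path


def find_credentials(environ=None, home=None):
    """Simplified model of the lookup order visible in the log.

    Returns (source, access_key, secret_key), or None if nothing is found.
    The assume-role step and botocore's later providers are skipped.
    """
    environ = os.environ if environ is None else environ
    home = Path.home() if home is None else Path(home)

    # 1. Environment variables take precedence ("Looking for credentials via: env").
    key = environ.get("AWS_ACCESS_KEY_ID")
    secret = environ.get("AWS_SECRET_ACCESS_KEY")
    if key and secret:
        return ("env", key, secret)

    # 2. Fall back to the [default] profile in ~/.aws/credentials.
    parser = configparser.ConfigParser(interpolation=None)
    if parser.read(home / ".aws" / "credentials") and parser.has_section("default"):
        section = parser["default"]
        if "aws_access_key_id" in section and "aws_secret_access_key" in section:
            return ("shared-credentials-file",
                    section["aws_access_key_id"],
                    section["aws_secret_access_key"])
    return None
```

Because environment variables win, exporting AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY in the container would take precedence over the baked-in files.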
Also, I no longer need to set the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY settings in my spider, since the credentials are 'picked up' from these configuration files.
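The answer doesn't show where the feed target is configured, so the fragment below is an assumption: a hypothetical settings.py excerpt reconstructed from the bucket, key, and format in the "Stored jsonlines feed (20 items) in: s3://apkmirror/quotes3.json" log line, using the Scrapy 1.x feed-export settings FEED_URI and FEED_FORMAT. The point is what is absent: with no AWS keys in the Scrapy settings, botocore resolves them on its own.

```python
# settings.py (hypothetical excerpt; values taken from the log above)
FEED_URI = "s3://apkmirror/quotes3.json"
FEED_FORMAT = "jsonlines"

# Deliberately no AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY here:
# botocore falls back to ~/.aws/credentials written by `aws configure set`.
```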
This appears to be an unresolved issue in Scrapy: https://github.com/scrapy/scrapy/issues/2448