How To Produce Custom JSON Output From Scrapy?
I am working on a Scrapy script which should make output like: { 'state': 'FL', 'date': '2017-11-03T14:52:26.007Z', 'games': [ { 'name':'Game1' }, { '
Solution 1:
Ref. https://stackoverflow.com/a/43698923/8964297
You could try to write your own pipeline like this:
Put this into your pipelines.py file:
import json


class JsonWriterPipeline(object):

    def open_spider(self, spider):
        # Your scraped items will be saved in the file 'scraped_items.json'.
        # You can change the filename to whatever you want.
        self.file = open('scraped_items.json', 'w')
        self.file.write("[")
        self.first_item = True

    def close_spider(self, spider):
        self.file.write("]")
        self.file.close()

    def process_item(self, item, spider):
        # Write the separator before every item except the first, so the
        # array has no trailing comma and the file stays valid JSON.
        if not self.first_item:
            self.file.write(",\n")
        self.first_item = False
        line = json.dumps(
            dict(item),
            indent=4,
            sort_keys=True,
            separators=(',', ': ')
        )
        self.file.write(line)
        return item
Then modify your settings.py to include the following:
ITEM_PIPELINES = {
    'YourSpiderName.pipelines.JsonWriterPipeline': 300,
}
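Change YourSpiderName to the name of your Scrapy project package (the directory that contains pipelines.py). Despite the placeholder's name, this is the project module, not the name of an individual spider.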
Note that the file gets written directly by the pipeline, so you don't have to specify file and format with the -o and -t command line parameters.
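If you need the nested shape from the question (a single object with 'state', 'date' and a 'games' list) rather than a flat array, one possible variation is to buffer the items and write everything once when the spider closes. The following is only a rough sketch under some assumptions: the class name WrappedJsonPipeline, the output_path attribute and the hard-coded state value are made up for illustration, and it assumes every scraped item is one entry of 'games' and that all items fit in memory.

```python
import json
from datetime import datetime, timezone


class WrappedJsonPipeline:
    """Collect all items, then write one JSON object when the spider closes."""

    # Hypothetical placeholders, not taken from the original answer:
    # adjust the filename and the 'state' value to your own project.
    output_path = 'scraped_items.json'
    state = 'FL'

    def open_spider(self, spider):
        # Start with an empty buffer for this crawl.
        self.games = []

    def process_item(self, item, spider):
        # Only buffer the item here; nothing is written to disk yet.
        self.games.append(dict(item))
        return item

    def close_spider(self, spider):
        # UTC timestamp with millisecond precision and a trailing 'Z',
        # matching the style of the date shown in the question.
        now = datetime.now(timezone.utc).isoformat(timespec='milliseconds')
        document = {
            'state': self.state,
            'date': now.replace('+00:00', 'Z'),
            'games': self.games,
        }
        with open(self.output_path, 'w') as f:
            json.dump(document, f, indent=4)
```

It would be registered in ITEM_PIPELINES the same way as the pipeline above. Because the file is written only in close_spider, nothing is saved if the process is killed before the spider finishes.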
Hope this gets you closer to what you need.