Extracting Images In Scrapy
I've read through a few other answers here but I'm missing something fundamental. I'm trying to extract the images from a website with a CrawlSpider. settings.py BOT_NAME = 'healt
Solution 1:
If you want to use the standard ImagesPipeline, you need to change your parse_items method to something like:
from urllib.parse import urljoin  # Python 3; on Python 2: from urlparse import urljoin
...
def parse_items(self, response):
    content = Selector(response=response).xpath('//body')
    for nodes in content:
        # build absolute URLs (the pipeline cannot request relative ones)
        img_urls = [urljoin(response.url, src)
                    for src in nodes.xpath('//img/@src').extract()]
        item = HealthycommItem()
        item['page_heading'] = nodes.xpath("//title").extract()
        item["page_title"] = nodes.xpath("//h1/text()").extract()
        item["page_link"] = response.url
        item["page_content"] = nodes.xpath('//div[@class="CategoryDescription"]').extract()
        # use "image_urls" instead of "image_url"
        item['image_urls'] = img_urls
        yield item
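To see what the URL-joining step does with the different kinds of src values a page can contain, here is a small standalone sketch. The page URL and image paths are made-up examples, not taken from the question.

```python
from urllib.parse import urljoin

# Hypothetical page URL and <img src="..."> values, for illustration only
page_url = "http://example.com/category/page.html"
srcs = [
    "/static/logo.png",                    # root-relative
    "img/photo.jpg",                       # relative to the page's directory
    "http://cdn.example.com/banner.gif",   # already absolute
]

# Same list comprehension shape as in parse_items
absolute = [urljoin(page_url, src) for src in srcs]

for url in absolute:
    print(url)
```

Already-absolute URLs pass through unchanged, so it is safe to run every src through the join.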
And your item definition needs "images" and "image_urls" fields (plural, not singular).
The other way is to set IMAGES_URLS_FIELD and IMAGES_RESULT_FIELD in settings.py so that they match the field names your item definition already uses.
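A sketch of the relevant settings.py entries for that second approach. The field names 'image_url' and 'image' are assumptions about what your item currently declares, the store path is a placeholder, and the pipeline path shown is the one used by Scrapy 1.0 and later (older releases had it under scrapy.contrib).

```python
# settings.py (fragment)

# Enable the standard images pipeline (requires Pillow to be installed)
ITEM_PIPELINES = {
    "scrapy.pipelines.images.ImagesPipeline": 1,
}

# Directory where downloaded images are stored; placeholder path
IMAGES_STORE = "/path/to/images"

# Tell the pipeline which item fields to read from and write to.
# 'image_url' and 'image' are assumed names; use whatever your item defines.
IMAGES_URLS_FIELD = "image_url"
IMAGES_RESULT_FIELD = "image"
```

Whichever field you point IMAGES_URLS_FIELD at must still hold a list of absolute URLs, not a single string.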