
Python Scrapy Parse Extracted Link With Another Function

I am new to Scrapy. I am trying to scrape Yellow Pages for learning purposes, and everything works fine, but I also want the email address. To get it, I need to visit the links extracted inside each listing and parse them with another function.

Solution 1:

You are yielding a dict with a Request inside it; Scrapy won't dispatch that Request because it doesn't know it's there (Requests are not scheduled automatically just by being created). You need to yield the Request object itself.

In the parse_email function, in order to "remember" which item each email belongs to, you need to pass the rest of the item data along with the request. You can do this with the meta argument.

Example:

in parse:

yield scrapy.Request(url, callback=self.parse_email, meta={'item': {
    'name': brickset.css(NAME_SELECTOR).extract_first(),
    'address': brickset.css(ADDRESS_SELECTOR).extract_first(),
    'phone': brickset.css(PHONE).extract_first(),
    'website': brickset.css(WEBSITE).extract_first(),
}})

in parse_email:

item = response.meta['item']  # the partially filled item this email belongs to
email = response.xpath(EMAIL_SELECTOR).extract_first()  # may be None if no match
item['email'] = email.replace('mailto:', '') if email else None
return item
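To see how the two callbacks hand the item off, here is a minimal sketch of the meta round-trip that runs without a crawl: plain dicts stand in for Scrapy's Request and Response objects, and all names (the listing fields, `parse`, `parse_email`) are illustrative, not real Scrapy API.

```python
# Sketch of passing a partial item from one callback to the next via meta.
# A plain dict plays the role of scrapy.Request / the Response that follows it.

def parse(listing):
    """Extract the item fields, then 'request' the detail page,
    carrying the partial item along in meta."""
    item = {
        'name': listing['name'],
        'website': listing['website'],
    }
    # Stand-in for: yield scrapy.Request(url, callback=self.parse_email,
    #                                    meta={'item': item})
    return {'url': listing['website'], 'meta': {'item': item}}

def parse_email(response):
    """Recover the partial item from meta and complete it with the email."""
    item = response['meta']['item']
    item['email'] = response['body_email'].replace('mailto:', '')
    return item

# Simulated crawl: the dict returned by parse plays the Request, and we
# fabricate the detail-page 'response' it would produce.
request = parse({'name': 'Acme Plumbing', 'website': 'http://example.com'})
response = {'meta': request['meta'], 'body_email': 'mailto:info@example.com'}
item = parse_email(response)
# item now holds name, website, and the cleaned email address
```

The point is only the hand-off: whatever you put in `meta` on the Request comes back untouched on the matching Response, so the second callback can finish building the same item the first one started.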
