
Python Scrapy Parse Extracted Link With Another Function

I am new to Scrapy. I am trying to scrape Yellow Pages for learning purposes, and everything works fine, but I also want the email address. To get it, I need to visit the links extracted inside each listing and parse them with another function.

Solution 1:

You are yielding a dict with a Request inside it; Scrapy won't dispatch that Request because it doesn't know it's there (Requests are not scheduled automatically just by being created). You need to yield the Request object itself.

In the parse_email function, in order to "remember" which item each email belongs to, you need to pass the rest of the item data along with the request. You can do this with the meta argument.

Example:

in parse:

yield scrapy.Request(url, callback=self.parse_email, meta={'item': {
    'name': brickset.css(NAME_SELECTOR).extract_first(),
    'address': brickset.css(ADDRESS_SELECTOR).extract_first(),
    'phone': brickset.css(PHONE).extract_first(),
    'website': brickset.css(WEBSITE).extract_first(),
}})

in parse_email:

item = response.meta['item']  # the partially filled item this email belongs to
email = response.xpath(EMAIL_SELECTOR).extract_first()  # may be None if no match
item['email'] = email.replace('mailto:', '') if email else None
return item
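To see how the two callbacks hand the item off, here is a minimal sketch of the meta round-trip that runs without a crawl: plain dicts stand in for Scrapy's Request and Response objects, and all names (the listing fields, `parse`, `parse_email`) are illustrative, not real Scrapy API.

```python
# Sketch of passing a partial item from one callback to the next via meta.
# A plain dict plays the role of scrapy.Request / the Response that follows it.

def parse(listing):
    """Extract the item fields, then 'request' the detail page,
    carrying the partial item along in meta."""
    item = {
        'name': listing['name'],
        'website': listing['website'],
    }
    # Stand-in for: yield scrapy.Request(url, callback=self.parse_email,
    #                                    meta={'item': item})
    return {'url': listing['website'], 'meta': {'item': item}}

def parse_email(response):
    """Recover the partial item from meta and complete it with the email."""
    item = response['meta']['item']
    item['email'] = response['body_email'].replace('mailto:', '')
    return item

# Simulated crawl: the dict returned by parse plays the Request, and we
# fabricate the detail-page 'response' it would produce.
request = parse({'name': 'Acme Plumbing', 'website': 'http://example.com'})
response = {'meta': request['meta'], 'body_email': 'mailto:info@example.com'}
item = parse_email(response)
# item now holds name, website, and the cleaned email address
```

The point is only the hand-off: whatever you put in `meta` on the Request comes back untouched on the matching Response, so the second callback can finish building the same item the first one started.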
