Python Scrapy Parse Extracted Link With Another Function
I am new to Scrapy. I am scraping Yellow Pages for learning purposes and everything works fine, but I also want the email address; to get it, I need to visit the links extracted inside the parse callback.
Solution 1:
You are yielding a dict with a Request inside of it. Scrapy won't dispatch that request because it doesn't know it's there (requests are not dispatched automatically just by being created). You need to yield the actual Request object.
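For illustration, here is a minimal sketch of the difference inside a spider's parse method (detail_url and the field names are hypothetical, not from the original code):

# Not dispatched: the Request is just a value stored inside the yielded dict,
# so Scrapy treats it as item data and never visits the page.
yield {'name': 'Acme Plumbing', 'details': scrapy.Request(detail_url, callback=self.parse_email)}

# Dispatched: the Request object itself is yielded, so Scrapy schedules parse_email.
yield scrapy.Request(detail_url, callback=self.parse_email)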
In the parse_email callback, in order to "remember" which item each email belongs to, you need to pass the rest of the item data along with the request. You can do this with the meta argument.
Example:
In parse:
yield scrapy.Request(url, callback=self.parse_email, meta={'item': {
    'name': brickset.css(NAME_SELECTOR).extract_first(),
    'address': brickset.css(ADDRESS_SELECTOR).extract_first(),
    'phone': brickset.css(PHONE).extract_first(),
    'website': brickset.css(WEBSITE).extract_first(),
}})
In parse_email:
item = response.meta['item'] # The item this email belongs to
item['email'] = response.xpath(EMAIL_SELECTOR).extract_first().replace('mailto:', '')
return item
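Putting the two callbacks together, here is one way the whole spider could look. This is only a sketch under assumptions: the CSS/XPath selectors and the start URL are placeholders that will likely need adapting to the actual Yellow Pages markup, and it guards against listings without a website link or pages without a mailto: link.

import scrapy


class YellowPagesSpider(scrapy.Spider):
    name = 'yellowpages'
    # Placeholder search URL; adjust the query parameters to your own search.
    start_urls = ['https://www.yellowpages.com/search?search_terms=plumber&geo_location_terms=NY']

    def parse(self, response):
        for brickset in response.css('div.result'):  # hypothetical listing selector
            website = brickset.css('a.track-visit-website::attr(href)').extract_first()
            item = {
                'name': brickset.css('a.business-name span::text').extract_first(),
                'address': brickset.css('div.street-address::text').extract_first(),
                'phone': brickset.css('div.phones::text').extract_first(),
                'website': website,
            }
            if website:
                # Follow the business website and carry the partially built item along.
                yield scrapy.Request(website, callback=self.parse_email, meta={'item': item})
            else:
                yield item  # nothing to follow, emit the item as-is

    def parse_email(self, response):
        item = response.meta['item']  # the item started in parse
        email = response.xpath('//a[starts-with(@href, "mailto:")]/@href').extract_first()
        item['email'] = email.replace('mailto:', '') if email else None
        yield item

Note that extract_first() returns None when nothing matches, which is why the email line checks for that before calling replace().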