How To Yield In Scrapy Without A Request?
I am trying to crawl a defined list of URLs with Scrapy 2.4 where each of those URLs can have up to 5 paginated URLs that I want to follow. Now also the system works, I do have one
Solution 1:
If I understand you're question correct you just need to change to start at ?pn=1 and ignore the one without pn=null, here's an option how i would do it, which also only requires one parse method.
start_urls = [
'https://example...',
'https://example2...',
]
def start_requests(self):
for url in self.start_urls:
#how many pages to crawl
for i in range(1,6):
yield scrapy.Request(
url=url + f'&pn={str(i)}'
)
def parse(self, response):
self.logger.info('Parsing %s', response.url)
Post a Comment for "How To Yield In Scrapy Without A Request?"