How To Yield In Scrapy Without A Request?
I am trying to crawl a defined list of URLs with Scrapy 2.4, where each of those URLs can have up to 5 paginated pages that I want to follow. The system basically works, but I do have one…
Solution 1:
If I understand your question correctly, you just need to start at ?pn=1 and skip the URL without a pn parameter. Here's how I would do it, which also requires only one parse method.
import scrapy

# Spider class and name added for completeness; the original snippet
# showed only the attributes and methods.
class PaginatedSpider(scrapy.Spider):
    name = 'paginated'

    start_urls = [
        'https://example...',
        'https://example2...',
    ]

    def start_requests(self):
        for url in self.start_urls:
            # crawl pages 1 through 5 of each start URL
            for i in range(1, 6):
                yield scrapy.Request(url=url + f'&pn={i}')

    def parse(self, response):
        self.logger.info('Parsing %s', response.url)
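The URL generation inside start_requests can be sketched standalone, without Scrapy, to see exactly which requests get queued. The base URLs below are placeholders; the sketch assumes each already contains a query string, since the page number is appended with '&':

```python
# Placeholder base URLs (assumption: each already has a ?query part,
# so '&pn=N' can be appended directly).
start_urls = [
    'https://example.com/search?q=foo',
    'https://example.com/search?q=bar',
]

def paginated_urls(urls, pages=5):
    """Yield each base URL with page numbers 1..pages appended."""
    for url in urls:
        for i in range(1, pages + 1):
            yield f'{url}&pn={i}'

urls = list(paginated_urls(start_urls))
# 2 base URLs x 5 pages = 10 requests, pages 1 through 5 per URL
```

If a start URL has no query string yet, you would append '?pn=N' for the first parameter instead; Scrapy itself does not rewrite the URL for you.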