
How To Yield In Scrapy Without A Request?

I am trying to crawl a defined list of URLs with Scrapy 2.4, where each of those URLs can have up to 5 paginated URLs that I want to follow. While the system works, I do have one issue.

Solution 1:

If I understand your question correctly, you just need to change the spider to start at ?pn=1 and ignore the URL where pn is null. Here's one way I would do it, which also only requires a single parse method:

import scrapy


class ExampleSpider(scrapy.Spider):
    # Spider class and name are illustrative; the original snippet
    # showed only the methods below.
    name = 'example'

    start_urls = [
        'https://example...',
        'https://example2...',
    ]

    def start_requests(self):
        for url in self.start_urls:
            # How many pages to crawl: pn=1 through pn=5.
            for i in range(1, 6):
                yield scrapy.Request(url=url + f'&pn={i}')

    def parse(self, response):
        self.logger.info('Parsing %s', response.url)
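
For completeness, here is a minimal sketch of how such a spider could be run as a standalone script using Scrapy's CrawlerProcess; the ExampleSpider name is the illustrative one from the snippet above:

from scrapy.crawler import CrawlerProcess

# Run the spider in-process; assumes ExampleSpider (the illustrative
# class from the snippet above) is defined or imported in this script.
process = CrawlerProcess(settings={'LOG_LEVEL': 'INFO'})
process.crawl(ExampleSpider)
process.start()  # blocks until the crawl finishes

Alternatively, if the spider lives inside a Scrapy project, running scrapy crawl example from the project directory achieves the same thing.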
