Skip to content Skip to sidebar Skip to footer

Empty List Response Extract On Scrapy

I'm new on scrapy and i have to crawl a webpage for a test. So I use the code below on a terminal but its returns a empty list i Don't understand why. When i use the same command o

Solution 1:

First of all, when I consulted the source code of the page you seemed interested to scrape the title Iced Teas in a header tags <h1>. Am I right ?

Second, I tried scrapy shell sessions to understand the issue. It seems to be a settings of user-agent request's headers. Look at the code sessions below:

Without user-agent set

scrapy shell https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas
In [1]: response.css('.tileList-title').extract()                               
Out[1]: []
view(response) #open the given response in your local web browser, for inspection.

screenshot without user-agent

With user agent set

scrapy shell https://www.woolworths.com.au/shop/browse/drinks/cordials-juices-iced-teas/iced-teas -s USER_AGENT='Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'

In [1]: response.css('.tileList-title').extract()                               
Out[1]: ['<h1 class="tileList-title" ng-if="$ctrl.listTitle" tabindex="-1">Iced Teas</h1>']
#now as you can see it does not return an empty list.
view(response)

screenshot with user-agent

So to improve your future practices, know you can use -s KEYWORDSETTING=value in your scrapy shell sessions. Here the settings key words for scrapy. And to check with view(response) to see if the requests returns the expected content even if it sent a 200. For my experience, with view(response) you can see that the content page, and even source code sometimes, is a little different when you use it in scrapy shell than when you use it in a normal browser. So that's a good practice to check with this shortcut. Here the shorcuts for scrapy. They are mentioned at each scrapy shell session too.

Post a Comment for "Empty List Response Extract On Scrapy"