Parsing Stray Text With Scrapy

February 28, 2024

Any idea how to extract 'TEXT TO GRAB' from this piece of markup (the snippet is reproduced in Solution 2 below)?

Solution 1:

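Not ideal, but this selects the first text node that follows the navigation-pipe span (the result still includes the surrounding whitespace, so strip it before use):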

text_to_grab = response.xpath('//span[@class="navigation-pipe"]/following-sibling::text()[1]').extract_first()

Solution 2:

It's not an ideal solution but it should do the trick:

from scrapy import Selector

content="""
<span class="navigation_page"><span><a itemprop="url" href="http://www.example.com"><span itemprop="title">LINK</span></a></span><span class="navigation-pipe">&gt;</span>
    TEXT TO GRAB
</span>
"""
sel = Selector(text=content)
item = sel.css(".navigation_page::text")
print(item.extract()[-1].strip())

Or like this, collapsing the whitespace inside each text node before joining:

sel = Selector(text=content)
item = ''.join([' '.join(items.split()) for items in sel.css("span.navigation_page::text").extract()])
print(item)

Output:

TEXT TO GRAB
