
Select All Divs Except Ones With Certain Classes In BeautifulSoup

As discussed in this question, one can easily get all divs with certain classes. But here I have a list of classes that I want to exclude, and I want to get all divs that don't have any of those classes.

Solution 1:

Using a CSS selector, try this (note that the selectors inside :not() are not quoted):

divs = soup.select("div:not(.class1, .class2, .class3)")
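
If the classes to exclude live in a Python list, the selector can be built from it. The following is a minimal sketch, assuming plain class names that need no CSS escaping; the names classes_to_ignore and selector are only illustrative.

```python
from bs4 import BeautifulSoup

html = """
<div class="c1"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
<div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Hypothetical list of classes to exclude
classes_to_ignore = ["c1", "c2"]

# Build "div:not(.c1, .c2)"; assumes class names that need no CSS escaping
selector = "div:not({})".format(", ".join("." + c for c in classes_to_ignore))

# Divs without any class attribute are selected too
divs = soup.select(selector)
print(divs)
```

Because :not() only rules out the listed classes, a div with no class attribute at all is also returned.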


Solution 2:

Alternatively, pass a function as the class_ filter:

soup.find_all('div', class_=lambda x: x not in classToIgnore)

Example

from bs4 import BeautifulSoup
html = """
<div class="c1"></div>
<div class="c1"></div>
<div class="c2"></div>
<div class="c3"></div>
<div class="c4"></div>
"""
soup = BeautifulSoup(html, 'html.parser')
classToIgnore = ["c1", "c2"]
print(soup.find_all('div', class_=lambda x: x not in classToIgnore))

Output

[<div class="c3"></div>, <div class="c4"></div>]
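
One caveat, as far as I can tell: when a div carries several classes, BeautifulSoup checks each class value against the function separately, so a div like class="c1 extra" may still be returned because "extra" is not in the ignore list. A sketch of a stricter variant is to filter on the whole tag instead; the helper name is_wanted_div is made up for this example.

```python
from bs4 import BeautifulSoup

html = """
<div class="c1 extra"></div>
<div class="c3"></div>
<div></div>
"""
soup = BeautifulSoup(html, "html.parser")
class_to_ignore = {"c1", "c2"}

def is_wanted_div(tag):
    # tag.get("class", []) is the list of class names, or [] if the
    # attribute is absent; keep the div only if none of them is ignored
    return tag.name == "div" and class_to_ignore.isdisjoint(tag.get("class", []))

wanted = soup.find_all(is_wanted_div)
print(wanted)
```

Here the first div is dropped because one of its classes is in the ignore set, while the div without a class is kept.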

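If you are dealing with nested divs, an unwanted inner div still shows up inside the markup of its matched parent. In that case, try deleting the unwanted divs with decompose() first and then just call find_all('div'). Note that this modifies the soup in place.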

for div in soup.find_all('div', class_=lambda x: x in classToIgnore):
    div.decompose()
print(soup.find_all('div'))

This might leave some extra whitespace where the removed divs were, but you can strip that off easily later.
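
To make the nested case concrete, here is a small sketch with made-up markup that compares the outer div before and after the unwanted inner div is removed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a wanted outer div wrapping one unwanted inner div
html = '<div class="c3"><div class="c1">skip</div><div class="c4">keep</div></div>'
soup = BeautifulSoup(html, "html.parser")
class_to_ignore = ["c1", "c2"]

# Before: the outer div's markup still contains the unwanted inner div
before = str(soup.find("div", class_="c3"))

# Remove every div that has one of the ignored classes (mutates the soup)
for div in soup.find_all("div", class_=lambda x: x in class_to_ignore):
    div.decompose()

remaining = soup.find_all("div")
after = str(remaining[0])
print(before)
print(after)
```

After the loop, only the c3 and c4 divs remain, and the outer div no longer contains the c1 div.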

