
Replacing Html Tags With Beautifulsoup

I'm currently reformatting some HTML pages with BeautifulSoup, and I ran into a bit of a problem. The original HTML has things like this:
  • stff&l

    Solution 1:

    This question probably referred to an older version of BeautifulSoup, because with bs4 you can simply use the unwrap method, which removes a tag but keeps its contents:

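    from bs4 import BeautifulSoup

    # The output below matches the lxml parser, which adds <html><body>.
    s = BeautifulSoup('<li><div><p><strong>stff</strong></p></div><li>', 'lxml')
    s.div.unwrap()
    >> <div></div>
    s.p.unwrap()
    >> <p></p>
    s
    >> <html><body><li><strong>stff</strong></li><li></li></body></html>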
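To unwrap every matching tag rather than just the first one, the same call can be applied in a loop. This is a minimal sketch; the two-item markup is made up for illustration, and html.parser is used so that no extra parser needs to be installed.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with two list items (not from the question).
html_data = (
    "<li><div><p><strong>stff</strong></p></div></li>"
    "<li><div><p>more</p></div></li>"
)
soup = BeautifulSoup(html_data, "html.parser")

# find_all returns a list, so unwrapping while looping over it is safe.
for tag in soup.find_all(["div", "p"]):
    tag.unwrap()  # drop the tag itself, keep its children in place

print(soup)  # <li><strong>stff</strong></li><li>more</li>
```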

    Solution 2:

    What you want to do can be done using replaceWith (spelled replace_with in bs4). You have to create the element you want to use as the replacement, and then pass it as the argument to replaceWith. The documentation for replaceWith is pretty clear on how to do this.
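As a rough illustration of that approach, the sketch below builds a replacement tag with new_tag and swaps it in with replace_with. The sample markup is a well-formed variant of the snippet from the question, and the choice of a plain p tag as the replacement is an assumption made only for this example.

```python
from bs4 import BeautifulSoup

# Assumed sample markup, based on the snippet in the question.
soup = BeautifulSoup(
    "<li><div><p><strong>stff</strong></p></div></li>", "html.parser"
)

# Build the replacement: a new <p> tag carrying the text of the old <div>.
new_tag = soup.new_tag("p")
new_tag.string = soup.div.get_text()

# Swap the whole <div> subtree for the new tag.
soup.div.replace_with(new_tag)

print(soup)  # <li><p>stff</p></li>
```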

    Solution 3:

    I saw many answers to this simple question. I also came here looking for something useful, but unfortunately I didn't find what I was looking for. After a few tries I found a simple solution, and here it is:

    from bs4 import BeautifulSoup

    # htmlData is the HTML source string to convert.
    soup = BeautifulSoup(htmlData, "html.parser")

    h2_headers = soup.find_all("h2")

    for header in h2_headers:
        header.name = "h1"  # renames the h2 tag to h1

    All h2 tags are now h1 tags. You can convert any tag this way, just by changing its name.
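A self-contained version of the renaming approach might look like the sketch below. The html_data string is invented for illustration. It also shows that attributes and text on the renamed tags are left untouched.

```python
from bs4 import BeautifulSoup

# Hypothetical input, chosen to show that attributes survive the rename.
html_data = '<h2 class="title">First</h2><p>text</p><h2>Second</h2>'
soup = BeautifulSoup(html_data, "html.parser")

for header in soup.find_all("h2"):
    header.name = "h1"  # only the tag name changes

print(soup)  # <h1 class="title">First</h1><p>text</p><h1>Second</h1>
```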

    Solution 4:

    You can write your own function to strip tags:

    import re
    
    def strip_tags(string):
        return re.sub(r'<.*?>', '', string)
    
    strip_tags("<li><div><p><strong>stff</strong></p></div><li>")
    'stff'
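The same function also handles tags that carry attributes, since the non-greedy pattern matches everything between each pair of angle brackets. The sketch below uses made-up markup and additionally assumes you want character entities decoded, which the regex alone does not do; html.unescape from the standard library covers that step.

```python
import html
import re

def strip_tags(string):
    # Non-greedy match removes each <...> tag, attributes included.
    return re.sub(r'<.*?>', '', string)

# Hypothetical markup with an attribute and a character entity.
markup = '<li class="item"><p>fish &amp; <strong>chips</strong></p></li>'

text = strip_tags(markup)
print(text)                 # fish &amp; chips
print(html.unescape(text))  # fish & chips
```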

    Solution 5:

    A simple solution is to work on the whole node (here, the div) as a string:

    1. Convert the node to a string.
    2. Replace the opening <tag> with the required tag or string.
    3. Replace the corresponding closing tag with an empty string.
    4. Parse the modified string again by passing it to BeautifulSoup.

      What I have done for mint

      Example:

      <div class="col-md-12 option" itemprop="text"><span class="label label-info">A</span>
      
      **-2<sup>31</sup> to 2<sup>31</sup>-1**
      

      sup = opts.sup
      if sup:  # opts contains a sup tag
          # Convert opts to a string and replace the sup tags.
          opt = str(opts).replace("<sup>", "^").replace("</sup>", "")
          # Parse the modified string back into a BeautifulSoup object.
          s = BeautifulSoup(opt, 'lxml')

          # Reassign to the required variable after manipulation.
          opts = s.find("div", class_="col-md-12 option")
      

      Output:

      -2^31 to 2^31-1

      Without the manipulation it would look like this: -231 to 231-1
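For comparison, a similar result can be had without the round trip through a string, by replacing each sup tag in the parsed tree. Note that this swaps in replace_with instead of the str.replace technique described in this answer. The markup is reconstructed from the example above, and the closing div and the spacing are assumptions.

```python
from bs4 import BeautifulSoup

# Markup reconstructed from the example; closing </div> is assumed.
html_data = (
    '<div class="col-md-12 option" itemprop="text">'
    '<span class="label label-info">A</span> '
    '-2<sup>31</sup> to 2<sup>31</sup>-1</div>'
)
soup = BeautifulSoup(html_data, "html.parser")
opts = soup.find("div", class_="col-md-12 option")

# Replace each <sup>…</sup> with "^" followed by its text.
for sup in opts.find_all("sup"):
    sup.replace_with("^" + sup.get_text())

print(opts.get_text())  # A -2^31 to 2^31-1
```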
      
