Stop BeautifulSoup From Removing Whitespace

BeautifulSoup is removing whitespace before newlines between tags: print BeautifulSoup('
\n
') The cod

Solution 1:

As a workaround, you could replace every <section>...</section> with <pre>...</pre> before parsing. BeautifulSoup treats <pre> as a whitespace-preserving tag, so the spaces inside it are kept intact. For example:

from bs4 import BeautifulSoup
import re

html = "<?xml version='1.0' encoding='UTF-8'?><section>    \n</section>"
# Rename the opening and closing <section> tags to <pre>
html = re.sub(r'(</?)section(>)', r'\1pre\2', html)
soup = BeautifulSoup(html, "lxml")

print(repr(soup.pre.text))    # repr used to show where the spaces are

Giving you:

'    \n'
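
The substitution in this answer only matches bare <section> and </section> tags. If the opening tag may carry attributes, the renaming step could be sketched as below. This is a minimal illustration using only the standard library; the helper name rename_tag is hypothetical and not part of BeautifulSoup, and it uses a regular expression rather than a real HTML parser.

```python
import re

def rename_tag(html, old, new):
    """Rename every <old ...> and </old> tag in html to new (hypothetical helper)."""
    # (</?) captures "<" or "</"; \b keeps "section" from matching "sectional";
    # ([^>]*>) captures any attributes plus the closing ">".
    pattern = re.compile(r'(</?)' + re.escape(old) + r'\b([^>]*>)')
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), html)

html = "<?xml version='1.0' encoding='UTF-8'?><section>    \n</section>"
converted = rename_tag(html, "section", "pre")
print(repr(converted))
```

The converted string can then be passed to BeautifulSoup exactly as in the answer above. Because it is regex-based, this sketch would also rewrite matching text inside comments or CDATA, so it is best suited to small, well-formed inputs.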
