Extract HTML Tags From A Text File Through Iteration And Append Them To A List And Ignore All Other Characters In Python

November 28, 2023

I want to read an HTML file and extract only the tags from it. Read one character at a time from the file, ignoring everything until a '<' is reached (ignore the '<' as well). Re

Solution 1:

In [10]: re.findall('<(.*?)>', html)
Out[10]: ['html', 'body', 'h1', '/h1', 'h2', 'h2', '/body', '/html']
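
For comparison, the character-by-character loop described in the question might look roughly like the sketch below. The function name `extract_tags` and the `sample` string are made up for illustration, and the sketch assumes the whole file has already been read into a string (for example with `open(path).read()`).

```python
def extract_tags(text):
    """Collect the text found between each '<' and the next '>'."""
    tags = []
    current = None  # None while outside a tag; a list of characters while inside one
    for ch in text:
        if ch == '<':
            current = []  # start a new tag and drop the '<' itself
        elif ch == '>' and current is not None:
            tags.append(''.join(current))  # tag finished: store it without the '>'
            current = None
        elif current is not None:
            current.append(ch)  # inside a tag: keep the character
        # any character outside a tag is ignored
    return tags


sample = "<html><body><h1>Title</h1><p>Some text</p></body></html>"  # hypothetical input
print(extract_tags(sample))
# ['html', 'body', 'h1', '/h1', 'p', '/p', '/body', '/html']
```

Like the regex, this keeps everything inside the angle brackets, so a tag with attributes comes back as one string such as `a href="x"`.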

Simply use a regular expression, as above, or the standard library's HTMLParser.
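
A minimal sketch of the HTMLParser route, using `html.parser.HTMLParser` from the standard library. The class name `TagCollector`, the helper `collect_tags`, and the sample input are assumptions made for this example. Closing tags are stored with a leading '/' to mirror the regex output.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record the name of every start and end tag in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)  # attributes are dropped; only the tag name is kept

    def handle_endtag(self, tag):
        self.tags.append('/' + tag)


def collect_tags(html):
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


sample = "<html><body><h1>Title</h1></body></html>"  # hypothetical input
print(collect_tags(sample))
# ['html', 'body', 'h1', '/h1', '/body', '/html']
```

Unlike the regex, the parser lowercases tag names, drops attributes here, and does not report markup that sits inside an HTML comment, so the two approaches can give different lists for the same file.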
