Skip to content Skip to sidebar Skip to footer

Python: Creating Row Breaks In A List For Openpyxl To Recognise In .xlsx

I am scraping information from a URL I can successfully get the information into a .xlsx It isn't in the format I want it to be in. element_rows = [] for table_row in Elements.find

Solution 1:

I would recommend you write to your excel file as you go along. For each table row, create a list of lists containing any sub rows present. You can then use Python's zip_longest() function to return a sub entry for each row with blanks where one list is shorter than another, for example:

from itertools import zip_longest
from bs4 import BeautifulSoup
import openpyxl


html = """
<table>
  <tr>
    <td><p>a</p><p>b</p></td>
    <td><p>1</p><p>2</p><p>3</p></td>
    <td><p>d</p></td>
  </tr>
  <tr>
    <td><p>a</p><p>b</p></td>
    <td><p>1</p><p>2</p><p>3</p></td>
    <td><p>d</p></td>
  </tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.table

wb = openpyxl.Workbook()
ws = wb.active
output_row = 1

for table_row in table.find_all('tr'):
    cells = table_row.find_all('td')
    row = [[row.text for row in cell.find_all('p')] for cell in cells]

    for row_number, cells in enumerate(zip_longest(*row, fillvalue=""), start=output_row):
        for col_number, value in enumerate(cells, start=1):
            ws.cell(column=col_number, row=row_number, value=value)

    output_row += len(cells)

wb.save('output.xlsx')

This would give you the following output:

Excel screenshot

The enumerate() function can be used to give you an incrementing number for each entry in a list. This can be used to give you suitable row and column numbers for openpyxl cells.


Solution 2:

Currently you're still putting all values into the same cell even though you're adding line breaks.

What you need to do is add new rows for each word. These will need to be of the form [None, 'Pear'] if you want the values in the second column.


Post a Comment for "Python: Creating Row Breaks In A List For Openpyxl To Recognise In .xlsx"