I want to extract the data from the text file shown below and store it in a pandas DataFrame that has 8 columns.
Lorem | Ipsum | is | simply | dummy
text | of | the | printing | and
typesetting | industry. | Lorem
more | recently | with | desktop | publishing | software | like | Aldus
Ipsum | has | been | the | industry's
standard | dummy | text | ever | since | the | 1500s
took | a | galley | of | type | and
scrambled | it | to | make | a | type | specimen | book
It | has | survived | not | only | five | centuries, | but
the | leap | into | electronic | typesetting
remaining | essentially | unchanged
It | was | popularised | in | the | 1960s | with | the
Lorem | Ipsum | passages, | and
PageMaker | including | versions | of | Lorem | Ipsum
The values on each line are separated by a pipe character, and each value belongs in its own cell of the corresponding row. My end goal is to have the data in a DataFrame in the format below.
Column 1 | Column 2 | Column 3 | Column 4 | Column 5 | Column 6 | Column 7 | Column 8
-------------------------------------------------------------------------------------
Lorem | Ipsum | is | simply | dummy |
text | of | the | printing | and |
typesetting| industry. | Lorem |
more | recently | with | desktop | publishing| software | like | Aldus |
and so on…..
I tried the following, but I am unable to add the data to the DataFrame dynamically.
import pandas as pd

with open(file) as f:
    data = f.read().split('\n')

columns = ['Column 1', 'Column 2', 'Column 3', 'Column 4', 'Column 5', 'Column 6', 'Column 7', 'Column 8']
df = pd.DataFrame(columns=columns)
for i in data:
    row = i.split(' | ')
    # Only the first five cells are handled: rows with fewer cells raise IndexError
    # and cells beyond the fifth are dropped. DataFrame.append was removed in pandas 2.0.
    df = df.append({'Column 1': f'{row[0]}', 'Column 2': f'{row[1]}', 'Column 3': f'{row[2]}', 'Column 4': f'{row[3]}', 'Column 5': f'{row[4]}'}, ignore_index = True)
The code above adds each row's cells manually, but I need a dynamic approach: how do I append the rows so that every cell is added, however many cells a row contains?
>Solution :
import pandas as pd
text = """
Lorem | Ipsum | is | simply | dummy
text | of | the | printing | and
typesetting | industry. | Lorem
more | recently | with | desktop | publishing | software | like | Aldus
Ipsum | has | been | the | industry's
standard | dummy | text | ever | since | the | 1500s
took | a | galley | of | type | and
scrambled | it | to | make | a | type | specimen | book
It | has | survived | not | only | five | centuries, | but
the | leap | into | electronic | typesetting
remaining | essentially | unchanged
It | was | popularised | in | the | 1960s | with | the
Lorem | Ipsum | passages, | and
PageMaker | including | versions | of | Lorem | Ipsum
"""
# Create a 'jagged' list of words...
data = [i.split(" | ") for i in text.strip().split("\n")]
# ... which you can pass to pd.DataFrame directly:
columns = [f"Column {i}" for i in range(1, 9)]
df = pd.DataFrame(data, columns=columns)
df:
       Column 1     Column 2     Column 3    Column 4     Column 5  Column 6    Column 7  Column 8
 0        Lorem        Ipsum           is      simply        dummy      None        None      None
 1         text           of          the    printing          and      None        None      None
 2  typesetting    industry.        Lorem        None         None      None        None      None
 3         more     recently         with     desktop   publishing  software        like     Aldus
 4        Ipsum          has         been         the   industry's      None        None      None
 5     standard        dummy         text        ever        since       the       1500s      None
 6         took            a       galley          of         type       and        None      None
 7    scrambled           it           to        make            a      type    specimen      book
 8           It          has     survived         not         only      five  centuries,       but
 9          the         leap         into  electronic  typesetting      None        None      None
10    remaining  essentially    unchanged        None         None      None        None      None
11           It          was  popularised          in          the     1960s        with       the
12        Lorem        Ipsum    passages,         and         None      None        None      None
13    PageMaker    including     versions          of        Lorem     Ipsum        None      None
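
Since the question reads from a file rather than an inline string, the same idea can be sketched for a file-like input. This is only a sketch under a few assumptions: the helper name read_pipe_rows and its n_cols parameter are made up for illustration, io.StringIO stands in for the real file, and short rows are padded explicitly so the result has 8 columns even if no line in the file happens to contain 8 cells.

```python
import io

import pandas as pd


def read_pipe_rows(lines, n_cols=8):
    """Hypothetical helper: build a DataFrame from pipe-separated lines
    whose number of cells varies from line to line."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. one left by a trailing newline
        # Split on the pipe and trim the whitespace around each cell.
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) > n_cols:
            raise ValueError(f"line has {len(cells)} cells, expected at most {n_cols}")
        # Pad short rows with None so every row has exactly n_cols cells.
        rows.append(cells + [None] * (n_cols - len(cells)))
    columns = [f"Column {i}" for i in range(1, n_cols + 1)]
    return pd.DataFrame(rows, columns=columns)


# io.StringIO stands in for the real file here; with a file on disk you
# could pass the object returned by open(...) instead.
sample = io.StringIO(
    "Lorem | Ipsum | is | simply | dummy\n"
    "typesetting | industry. | Lorem\n"
    "more | recently | with | desktop | publishing | software | like | Aldus\n"
)
df = read_pipe_rows(sample)

# Optional: show empty strings instead of missing values in the padded cells.
filled = df.fillna("")
```

The rows are collected in a plain list and the DataFrame is built once at the end, which avoids growing the DataFrame row by row inside the loop.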