Consider a dataframe from txt file.
Data txt file: txt
I removed the unwanted lines that contains "Iteration"
import pandas as pd
data = pd.read_csv('results.txt', sep=" ", header=None)
data = data[~data[1].astype(str).str.startswith('Iteration')]
Now its like this:
pd
I wanted to remove all the columns that contains ":"
to result something like: {2, 7}{2, 7}{2, 7}{2, 7}{1, 2, 7}{1, 2, 6, 7} {1, 2, 6, 7}{1, 2, 3, 6, 7}{1, 2, 3, 6, 7} {1, 3, 4, 6, 7}{1, 3, 4, 5, 6} {3, 4, 5, 6, 7}{1, 3, 4, 5, 7} {1, 4, 5, 6, 7}, …….etc.
and to merge columns such as: "{2," " 7}," into one column "{2,7}"
Then I can replace all the column names with timestamps (6:30, 7:00 etc) to make it a proper table like :
6:30 7:00 7:30 8:00 8:30 9:00
{2,7} {2,7} {2,7} {2,7} {1,2,7} {1,2,6,7}
{5,6} {5,6} {5,6} {5,6} {1,5,6} {1,4,5,6}
….
My first step now is to remove columns with ":" , I tried these:
data.loc[:, ~(data[0:122].astype(str).apply(lambda x: x.str.contains(':'))).any()]
data.loc[:, ~(data.contains(':')).any()]
obviously they won’t work since contains() cannot be use on data frames, I am not very familier with pandas. I searched for many other answers and wasn’t able to get it right. I wonder what is the correct way to remove columns with cells that contains ":"
Or if my approch full of unnecessary steps, its there a better approach to turn the txt file into a dataframe of
6:30 7:00 7:30 8:00 8:30 9:00
{2,7} {2,7} {2,7} {2,7} {1,2,7} {1,2,6,7}
{5,6} {5,6} {5,6} {5,6} {1,5,6,} {1,4,5,6}
like sort it when reading the txt file at first place and loop through lines and make dictionaries instead?
‘6:30’: {2, 7}
….
>Solution :
Preprocess your file before create dataframe:
import pandas as pd
import ast
data = []
with open('results.txt') as fp:
for line in fp:
if line.startswith('Shifts:'):
data.append(ast.literal_eval(line.split(':', maxsplit=1)[1].strip()))
df = pd.DataFrame.from_dict(data)
Output:
>>> df
6:30 7:00 7:30 8:00 8:30 ... 16:30 17:00 17:30 18:00 18:30
0 {2, 7} {2, 7} {2, 7} {2, 7} {1, 2, 7} ... {8, 3, 4, 5} {8, 5} {8, 5} {8, 5} {8, 5}
1 {5, 6} {5, 6} {5, 6} {5, 6} {1, 5, 6} ... {8, 2, 3, 7} {8, 3} {8, 3} {8, 3} {8, 3}
2 {8, 1} {8, 1} {8, 1} {8, 1} {8, 1, 7} ... {3, 4, 5, 6} {4, 5} {4, 5} {4, 5} {4, 5}
3 {1, 6} {1, 6} {1, 6} {1, 6} {1, 5, 6} ... {8, 2, 3, 7} {2, 7} {2, 7} {2, 7} {2, 7}
4 {3, 4} {3, 4} {3, 4} {3, 4} {3, 4, 5} ... {8, 1, 6, 7} {8, 7} {8, 7} {8, 7} {8, 7}
.. ... ... ... ... ... ... ... ... ... ... ...
142 {5, 6} {5, 6} {5, 6} {5, 6} {5, 6, 7} ... {8, 1, 3, 4} {1, 4} {1, 4} {1, 4} {1, 4}
143 {5, 6} {5, 6} {5, 6} {5, 6} {3, 5, 6} ... {8, 1, 4, 7} {8, 1} {8, 1} {8, 1} {8, 1}
144 {1, 2} {1, 2} {1, 2} {1, 2} {1, 2, 5} ... {3, 4, 6, 7} {4, 7} {4, 7} {4, 7} {4, 7}
145 {8, 7} {8, 7} {8, 7} {8, 7} {8, 5, 7} ... {1, 2, 3, 4} {2, 3} {2, 3} {2, 3} {2, 3}
146 {1, 4} {1, 4} {1, 4} {1, 4} {1, 4, 5} ... {8, 2, 3, 7} {8, 7} {8, 7} {8, 7} {8, 7}
[147 rows x 25 columns]