Pandas/Python Drop columns base on syntax ':' existence in cell

June 22, 2022

Consider a dataframe from txt file.

Data txt file: txt

I removed the unwanted lines that contains "Iteration"

import pandas as pd
data = pd.read_csv('results.txt', sep=" ", header=None)
data = data[~data[1].astype(str).str.startswith('Iteration')]

Now its like this:
pd

I wanted to remove all the columns that contains ":"
to result something like: {2, 7}{2, 7}{2, 7}{2, 7}{1, 2, 7}{1, 2, 6, 7} {1, 2, 6, 7}{1, 2, 3, 6, 7}{1, 2, 3, 6, 7} {1, 3, 4, 6, 7}{1, 3, 4, 5, 6} {3, 4, 5, 6, 7}{1, 3, 4, 5, 7} {1, 4, 5, 6, 7}, …….etc.

and to merge columns such as: "{2," " 7}," into one column "{2,7}"
Then I can replace all the column names with timestamps (6:30, 7:00 etc) to make it a proper table like :

6:30 7:00 7:30 8:00 8:30 9:00

{2,7} {2,7} {2,7} {2,7} {1,2,7} {1,2,6,7}

{5,6} {5,6} {5,6} {5,6} {1,5,6} {1,4,5,6}

….

My first step now is to remove columns with ":" , I tried these:

data.loc[:, ~(data[0:122].astype(str).apply(lambda x: x.str.contains(':'))).any()]
data.loc[:, ~(data.contains(':')).any()]

obviously they won’t work since contains() cannot be use on data frames, I am not very familier with pandas. I searched for many other answers and wasn’t able to get it right. I wonder what is the correct way to remove columns with cells that contains ":"

Or if my approch full of unnecessary steps, its there a better approach to turn the txt file into a dataframe of

6:30 7:00 7:30 8:00 8:30 9:00

{2,7} {2,7} {2,7} {2,7} {1,2,7} {1,2,6,7}

{5,6} {5,6} {5,6} {5,6} {1,5,6,} {1,4,5,6}

like sort it when reading the txt file at first place and loop through lines and make dictionaries instead?
‘6:30’: {2, 7}
….

>Solution :

Preprocess your file before create dataframe:

import pandas as pd
import ast

data = []
with open('results.txt') as fp:
    for line in fp:
        if line.startswith('Shifts:'):
            data.append(ast.literal_eval(line.split(':', maxsplit=1)[1].strip()))
df = pd.DataFrame.from_dict(data)

Output:

>>> df
       6:30    7:00    7:30    8:00       8:30  ...         16:30   17:00   17:30   18:00   18:30
0    {2, 7}  {2, 7}  {2, 7}  {2, 7}  {1, 2, 7}  ...  {8, 3, 4, 5}  {8, 5}  {8, 5}  {8, 5}  {8, 5}
1    {5, 6}  {5, 6}  {5, 6}  {5, 6}  {1, 5, 6}  ...  {8, 2, 3, 7}  {8, 3}  {8, 3}  {8, 3}  {8, 3}
2    {8, 1}  {8, 1}  {8, 1}  {8, 1}  {8, 1, 7}  ...  {3, 4, 5, 6}  {4, 5}  {4, 5}  {4, 5}  {4, 5}
3    {1, 6}  {1, 6}  {1, 6}  {1, 6}  {1, 5, 6}  ...  {8, 2, 3, 7}  {2, 7}  {2, 7}  {2, 7}  {2, 7}
4    {3, 4}  {3, 4}  {3, 4}  {3, 4}  {3, 4, 5}  ...  {8, 1, 6, 7}  {8, 7}  {8, 7}  {8, 7}  {8, 7}
..      ...     ...     ...     ...        ...  ...           ...     ...     ...     ...     ...
142  {5, 6}  {5, 6}  {5, 6}  {5, 6}  {5, 6, 7}  ...  {8, 1, 3, 4}  {1, 4}  {1, 4}  {1, 4}  {1, 4}
143  {5, 6}  {5, 6}  {5, 6}  {5, 6}  {3, 5, 6}  ...  {8, 1, 4, 7}  {8, 1}  {8, 1}  {8, 1}  {8, 1}
144  {1, 2}  {1, 2}  {1, 2}  {1, 2}  {1, 2, 5}  ...  {3, 4, 6, 7}  {4, 7}  {4, 7}  {4, 7}  {4, 7}
145  {8, 7}  {8, 7}  {8, 7}  {8, 7}  {8, 5, 7}  ...  {1, 2, 3, 4}  {2, 3}  {2, 3}  {2, 3}  {2, 3}
146  {1, 4}  {1, 4}  {1, 4}  {1, 4}  {1, 4, 5}  ...  {8, 2, 3, 7}  {8, 7}  {8, 7}  {8, 7}  {8, 7}

[147 rows x 25 columns]