Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas/Python Drop columns base on syntax ':' existence in cell

Consider a dataframe from txt file.

Data txt file: txt

I removed the unwanted lines that contains "Iteration"

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

import pandas as pd
data = pd.read_csv('results.txt', sep=" ", header=None)
data = data[~data[1].astype(str).str.startswith('Iteration')]

Now its like this:
pd

I wanted to remove all the columns that contains ":"
to result something like: {2, 7}{2, 7}{2, 7}{2, 7}{1, 2, 7}{1, 2, 6, 7} {1, 2, 6, 7}{1, 2, 3, 6, 7}{1, 2, 3, 6, 7} {1, 3, 4, 6, 7}{1, 3, 4, 5, 6} {3, 4, 5, 6, 7}{1, 3, 4, 5, 7} {1, 4, 5, 6, 7}, …….etc.

and to merge columns such as: "{2," " 7}," into one column "{2,7}"
Then I can replace all the column names with timestamps (6:30, 7:00 etc) to make it a proper table like :

6:30 7:00 7:30 8:00 8:30 9:00

{2,7} {2,7} {2,7} {2,7} {1,2,7} {1,2,6,7}

{5,6} {5,6} {5,6} {5,6} {1,5,6} {1,4,5,6}

….

My first step now is to remove columns with ":" , I tried these:

data.loc[:, ~(data[0:122].astype(str).apply(lambda x: x.str.contains(':'))).any()]
data.loc[:, ~(data.contains(':')).any()]

obviously they won’t work since contains() cannot be use on data frames, I am not very familier with pandas. I searched for many other answers and wasn’t able to get it right. I wonder what is the correct way to remove columns with cells that contains ":"

Or if my approch full of unnecessary steps, its there a better approach to turn the txt file into a dataframe of

6:30 7:00 7:30 8:00 8:30 9:00

{2,7} {2,7} {2,7} {2,7} {1,2,7} {1,2,6,7}

{5,6} {5,6} {5,6} {5,6} {1,5,6,} {1,4,5,6}

like sort it when reading the txt file at first place and loop through lines and make dictionaries instead?
‘6:30’: {2, 7}
….

>Solution :

Preprocess your file before create dataframe:

import pandas as pd
import ast

data = []
with open('results.txt') as fp:
    for line in fp:
        if line.startswith('Shifts:'):
            data.append(ast.literal_eval(line.split(':', maxsplit=1)[1].strip()))
df = pd.DataFrame.from_dict(data)

Output:

>>> df
       6:30    7:00    7:30    8:00       8:30  ...         16:30   17:00   17:30   18:00   18:30
0    {2, 7}  {2, 7}  {2, 7}  {2, 7}  {1, 2, 7}  ...  {8, 3, 4, 5}  {8, 5}  {8, 5}  {8, 5}  {8, 5}
1    {5, 6}  {5, 6}  {5, 6}  {5, 6}  {1, 5, 6}  ...  {8, 2, 3, 7}  {8, 3}  {8, 3}  {8, 3}  {8, 3}
2    {8, 1}  {8, 1}  {8, 1}  {8, 1}  {8, 1, 7}  ...  {3, 4, 5, 6}  {4, 5}  {4, 5}  {4, 5}  {4, 5}
3    {1, 6}  {1, 6}  {1, 6}  {1, 6}  {1, 5, 6}  ...  {8, 2, 3, 7}  {2, 7}  {2, 7}  {2, 7}  {2, 7}
4    {3, 4}  {3, 4}  {3, 4}  {3, 4}  {3, 4, 5}  ...  {8, 1, 6, 7}  {8, 7}  {8, 7}  {8, 7}  {8, 7}
..      ...     ...     ...     ...        ...  ...           ...     ...     ...     ...     ...
142  {5, 6}  {5, 6}  {5, 6}  {5, 6}  {5, 6, 7}  ...  {8, 1, 3, 4}  {1, 4}  {1, 4}  {1, 4}  {1, 4}
143  {5, 6}  {5, 6}  {5, 6}  {5, 6}  {3, 5, 6}  ...  {8, 1, 4, 7}  {8, 1}  {8, 1}  {8, 1}  {8, 1}
144  {1, 2}  {1, 2}  {1, 2}  {1, 2}  {1, 2, 5}  ...  {3, 4, 6, 7}  {4, 7}  {4, 7}  {4, 7}  {4, 7}
145  {8, 7}  {8, 7}  {8, 7}  {8, 7}  {8, 5, 7}  ...  {1, 2, 3, 4}  {2, 3}  {2, 3}  {2, 3}  {2, 3}
146  {1, 4}  {1, 4}  {1, 4}  {1, 4}  {1, 4, 5}  ...  {8, 2, 3, 7}  {8, 7}  {8, 7}  {8, 7}  {8, 7}

[147 rows x 25 columns]
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading