Python – generalised function to subset columns


I am currently trying to create a generalised function that subsets a dataset based on a list of column names passed in as an argument.
The function works when one column is specified, but fails when more than one column is specified.
I would like a function that can accept multiple column names as an argument.

import pandas as pd
testdb = pd.DataFrame({'first': [1, 3, 4], 'second': [1, 3, 4], 'last': [1, 3, 4], 'static': [1, 3, 4]})

def subsetting(df, cols):
    print(df.loc[:, [cols, "static"]])

# this works
subsetting(testdb,'first')

# this does not work
subsetting(testdb,str('first','second'))
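For reference, here is a minimal plain-Python sketch (no pandas needed) of why the second call fails. `str()` only accepts a second positional argument as an encoding for decoding bytes, so `str('first', 'second')` raises `TypeError` before `subsetting` even runs. It also shows that `[cols, "static"]` would nest a list of names rather than flatten it. The variable names are illustrative only.

```python
# str() treats a second argument as an encoding for decoding bytes,
# so passing two column names raises TypeError.
error = None
try:
    str("first", "second")
except TypeError as exc:
    error = exc
print(type(error).__name__, error)

# Even with a real list, [cols, "static"] nests the list as one element
# instead of producing separate column names.
cols = ["first", "second"]
nested = [cols, "static"]
print(nested)  # [['first', 'second'], 'static']

# Iterable unpacking (*cols) flattens the names into the outer list.
flat = [*cols, "static"]
print(flat)    # ['first', 'second', 'static']
```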

Solution:

I would design the API like this.

You can pass in a list or tuple of the column names you want to select from the dataframe and then use the

df.loc[:, [*cols, "static"]]

syntax to unpack it into separate column names, i.e.:

>>> import pandas as pd
>>> 
>>> testdb = pd.DataFrame(
...     {"first": [1, 3, 4], "second": [1, 3, 4], "last": [1, 3, 4], "static": [1, 3, 4]}
... )
>>> 
>>> 
>>> def subsetting(df, cols):
...     print(df.loc[:, [*cols, "static"]])
... 
>>> 
>>> subsetting(testdb, cols=("first", "second"))
   first  second  static
0      1       1       1
1      3       3       3
2      4       4       4
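One caveat, offered as a sketch rather than a definitive fix: because a bare string is itself iterable, calling this version as `subsetting(testdb, "first")` would unpack to the labels `'f'`, `'i'`, `'r'`, `'s'`, `'t'`, none of which are columns. A slightly more general version might wrap a single name in a list and return the subset instead of printing it. The `always` parameter and the variadic `subsetting_args` function are hypothetical additions for illustration, not part of the original answer.

```python
import pandas as pd

def subsetting(df, cols, always=("static",)):
    # `always` is a hypothetical parameter for this sketch: columns
    # appended to every selection (here just "static").
    # A bare string is itself iterable, so [*"first"] would expand to
    # ['f', 'i', 'r', 's', 't']; wrap a single name in a list first.
    if isinstance(cols, str):
        cols = [cols]
    # Unpack both iterables into one flat list of column labels and
    # return the subset instead of printing it, so callers can reuse it.
    return df.loc[:, [*cols, *always]]

def subsetting_args(df, *cols):
    # Hypothetical variadic variant: each column name is its own
    # positional argument, e.g. subsetting_args(df, "first", "second").
    return df.loc[:, [*cols, "static"]]

testdb = pd.DataFrame(
    {"first": [1, 3, 4], "second": [1, 3, 4], "last": [1, 3, 4], "static": [1, 3, 4]}
)

one = subsetting(testdb, "first")
two = subsetting(testdb, ["first", "second"])
three = subsetting_args(testdb, "first", "second")
print(two)
```

Returning the DataFrame lets the caller chain further operations. The variadic form is closest to passing the names as separate arguments, which is what the question's `str('first','second')` attempt appears to have been aiming for.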
