Can you get a function's name as a string from a numpy function? (e.g. extract "median" from `np.median`)

This is kind of a strange question, but I have created a function that uses pivot_table plus some filtering and renaming, so I can reuse it across a bunch of pivot/aggregation use cases.

One parameter of the function is a list of aggregation functions, e.g. [np.median]. Another parameter is a string naming that aggregation function, e.g. "median".

I have this latter parameter solely so I can use it to filter columns. Is there a way to derive that string from np.median itself? Ideally I wouldn’t need a string parameter in addition to the numpy function.

The challenge I am finding is that np.median (or any numpy aggregation function) is a function object, not a string. Inside a list it displays as [<function median at 0x7fa1d043b700>], so I can’t use string-splitting operations to pull "median" out of it.

Is this possible?

Sample Dataframe

import numpy as np
import pandas as pd

data = [
    ["2", "dog", "groomed", 100],
    ["2", "dog", "groomed", 90],
    ["2", "dog", "ungroomed", 30],
    ["3", "cat", "groomed", 25],
    ["3", "cat", "ungroomed", 10],
]

df = pd.DataFrame(data, columns=["ID", "pet", "status", "amount"])

Function

from typing import List

def long_to_wide_reshape_w_agg(
    input_df: pd.DataFrame,
    index_list: List[str],
    col_to_pivot: str,
    vals: str,
    suffix: str,
    aggs: List = [np.mean],  # a list, so pivot_table yields two-level columns to flatten
    agg_method: str = "mean",
):

    # identify possible values for the column we want to pivot on
    # we need these to filter out columns we want to rename in later steps
    cols = input_df[col_to_pivot].unique().tolist()
    str_cols = [x for x in cols if isinstance(x, str)]

    reshaped_df = input_df.pivot_table(
        index=index_list,
        columns=col_to_pivot,
        aggfunc=aggs,
        values=vals,
    ).reset_index()

    # flatten hierarchical index
    reshaped_df.columns = [" ".join(col).strip() for col in reshaped_df.columns.values]

    # identify columns to rename
    cols_to_rename = [
        s for s in reshaped_df.columns.values if any(subs in s for subs in str_cols)
    ]
    tuple_cols = tuple(cols_to_rename)

    # rename columns as needed
    reshaped_df = reshaped_df.rename(
        columns=lambda col: f"{col}{suffix}" if col in tuple_cols else col
    )
    # remove spaces and replace with underscores
    reshaped_df.columns = [cols.replace(" ", "_") for cols in reshaped_df.columns]

    # based on agg_method chosen, filter to ensure there are no null values for those columns
    col1, col2 = [col for col in reshaped_df.columns if agg_method in col]

    print(col1)
    print(col2)

    # do stuff with col1/col2

    return reshaped_df

Function Use Case

long_to_wide_reshape_w_agg(
    input_df=df,
    index_list=["ID", "pet"],
    col_to_pivot="status",
    aggs=[np.median],
    vals="amount",
    suffix="_amount",
    agg_method="median",
)
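
For orientation, here is a minimal sketch of just the pivot and flatten steps on the sample data, so the column names that the later filter sees are visible. It reuses only the calls from the function above; the variable names `wide` and `flat_cols` are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

data = [
    ["2", "dog", "groomed", 100],
    ["2", "dog", "groomed", 90],
    ["2", "dog", "ungroomed", 30],
    ["3", "cat", "groomed", 25],
    ["3", "cat", "ungroomed", 10],
]
df = pd.DataFrame(data, columns=["ID", "pet", "status", "amount"])

# Pivot with a *list* of aggregation functions, as in the call above.
wide = df.pivot_table(
    index=["ID", "pet"],
    columns="status",
    aggfunc=[np.median],
    values="amount",
).reset_index()

# Flatten the two-level column index the same way the function does.
flat_cols = [" ".join(col).strip() for col in wide.columns.values]
print(flat_cols)
```

On the sample data this should list `ID` and `pet` followed by one `median ...` column per status value, which is why filtering on the string "median" picks out exactly the aggregated columns.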

Solution

How about using .__name__?

...

col1, col2 = [col for col in reshaped_df.columns if aggs[0].__name__ in col]

...

Because…

>>> np.median
<function numpy.median(a, axis=None, out=None, overwrite_input=False, keepdims=False)>

>>> np.median.__name__
'median'
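
Building on this, here is a small sketch of how the separate string parameter might be dropped: derive the names from the `aggs` list itself and use them to filter columns. The helper name `agg_names` and the hard-coded `columns` list are hypothetical, chosen for illustration. The fallback for callables without a `__name__` attribute is an assumption about what you would want, not something stated in the answer.

```python
import numpy as np


def agg_names(aggs):
    """Return a name per aggregation: strings pass through, callables use __name__."""
    names = []
    for agg in aggs:
        if isinstance(agg, str):
            names.append(agg)
        else:
            # Assumption: fall back to repr() for callables that lack __name__.
            names.append(getattr(agg, "__name__", repr(agg)))
    return names


names = agg_names([np.median, np.mean, "sum"])
print(names)  # ['median', 'mean', 'sum']

# Illustrative column names, shaped like those the question's function produces.
columns = ["ID", "pet", "median_groomed_amount", "median_ungroomed_amount"]
wanted = agg_names([np.median])
matched = [col for col in columns if any(name in col for name in wanted)]
print(matched)  # ['median_groomed_amount', 'median_ungroomed_amount']
```

Note that a lambda reports '<lambda>' as its name, so anonymous functions would still need an explicit label.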