Follow

Follow

Contact

Home Create binary columns after groupby based on occurrence

Questions

Create binary columns after groupby based on occurrence

byMR

July 25, 2023

An empty df w particular cols of interest (col1-5)

dfw_columns = pd.DataFrame({
    "col1": [],
    "col2": [],
    "col3": [],
    "col4": [],
    "col5": []
})

The df w actual entries

df = pd.DataFrame({
    "Name": ["abc", "abc", "abc", "def", "def", "ghi", "ghi"],
    "colids": ["col1", "col33", np.nan, "col5", "col1", "col2", np.nan]

})

Place values in the dfw_columns based on occurrence (1 or 0) in df for each Name and colids.

Desired output (after filling the empty dfw_columns)

desireddf = pd.DataFrame({
    "Name": ["abc", "def", "ghi"],
    "col1": [1,1, 0],
    "col2": [0,0, 1],
    "col3": [0,0, 0],
    "col4": [0,0, 0],
    "col5": [0,1,0]
})
desireddf

>Solution :

IIUC, you can pd.crosstab + .reindex:

cols_of_interest = ['col1', 'col2', 'col3', 'col4', 'col5']

out = pd.crosstab(df['Name'], df['colids']).reindex(columns=cols_of_interest, fill_value=0)
print(out)

Prints:

colids  col1  col2  col3  col4  col5
Name                                
abc        1     0     0     0     0
def        1     0     0     0     1
ghi        0     1     0     0     0

dataframe

byMR

Published July 25, 2023

Add a comment

Leave a ReplyCancel reply

Read more

Questions

Typing a polymorphic function in TypeScript

byMR

July 25, 2023

Questions

Component Constructors In HappyX

byMR

July 25, 2023

Questions

Not seeing network messages in Microsoft Edge

byMR

July 25, 2023

Questions

Same Python code returning different answer in Leetcode

byMR

July 25, 2023

Questions

how to define integer or double numbers from input argument

byMR

July 25, 2023

Questions

How can I train distilBERT more efficiently on My large text classification task?

byMR

July 25, 2023