An empty df w particular cols of interest (col1-5)
dfw_columns = pd.DataFrame({
"col1": [],
"col2": [],
"col3": [],
"col4": [],
"col5": []
})
The df w actual entries
df = pd.DataFrame({
"Name": ["abc", "abc", "abc", "def", "def", "ghi", "ghi"],
"colids": ["col1", "col33", np.nan, "col5", "col1", "col2", np.nan]
})
Place values in the dfw_columns based on occurrence (1 or 0) in df for each Name and colids.
Desired output (after filling the empty dfw_columns)
desireddf = pd.DataFrame({
"Name": ["abc", "def", "ghi"],
"col1": [1,1, 0],
"col2": [0,0, 1],
"col3": [0,0, 0],
"col4": [0,0, 0],
"col5": [0,1,0]
})
desireddf
>Solution :
IIUC, you can pd.crosstab + .reindex:
cols_of_interest = ['col1', 'col2', 'col3', 'col4', 'col5']
out = pd.crosstab(df['Name'], df['colids']).reindex(columns=cols_of_interest, fill_value=0)
print(out)
Prints:
colids col1 col2 col3 col4 col5
Name
abc 1 0 0 0 0
def 1 0 0 0 1
ghi 0 1 0 0 0