Let’s say I have the following data:
df=pd.DataFrame({"id":[1,1,1,2,2,3,4],
"date":[2019,2019,2020,2020,2020,2020,2021],
"subgroup":["con","ind","ind","con","ind","ind","ind"],
"value":[1,None,2,None,1,3,4]})
I want to group by ID and DATE and, among those duplicates, get a column which counts the number of missing values in the Value column based on the value in the Subgroup column (in this case when Subgroup=="Ind")
The output would look like this:
id date subgroup value count
1 2019 con 1 1
1 2019 ind None 1
1 2020 ind 2 0
2 2020 con None 0
2 2020 ind 1 0
3 2020 ind 3 0
4 2021 ind 4 0
How can I achieve this?
>Solution :
df['counter'] = 0
df.loc[(df.subgroup=='ind') & (df.value.isna()), 'counter'] = 1
df['goal'] = df.groupby(["id","date"])['counter'].transform('sum')
df = df.drop(columns='counter')
but as Alollz pointed out your sample code does not produce your sample dataframe.