Impute NaN for categorical data depending on its "Type" column

I have a DataFrame with a Name column and a Signal column. I want to fill the NaN values in Signal, but the fill value should depend on the row's Name: each NaN should be imputed with the most frequent Signal value for that Name. For example:

Timestamp   Name  Signal
2021-01-01  A     On
2021-01-02  A     NaN
2021-01-03  A     On
2021-01-01  B     Off
2021-01-02  B     Off
2021-01-03  B     NaN

For Name A, the NaN in the Signal column should be imputed with "On", since that is the most frequent value for A. For Name B, it should be filled with "Off", because that is the most frequent value for B.

How can I achieve this?

Solution:

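# Fill each Name group's missing Signal values with that group's most frequent value.
# transform() keeps the original index and touches only the Signal column.
df['Signal'] = df.groupby('Name')['Signal'].transform(lambda s: s.fillna(s.value_counts().index[0]))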

Output:

>>> df
    Timestamp Name Signal
0  2021-01-01    A     On
1  2021-01-02    A     On
2  2021-01-03    A     On
3  2021-01-01    B    Off
4  2021-01-02    B    Off
5  2021-01-03    B    Off
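
For readers who want to run this end to end, here is a self-contained sketch of the same idea. It rebuilds the sample table from the question and applies `groupby(...).transform` to the Signal column only. The helper name `fill_with_group_mode` and the decision to leave a group untouched when all of its values are missing are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the table in the question
df = pd.DataFrame({
    "Timestamp": ["2021-01-01", "2021-01-02", "2021-01-03"] * 2,
    "Name": ["A", "A", "A", "B", "B", "B"],
    "Signal": ["On", np.nan, "On", "Off", "Off", np.nan],
})


def fill_with_group_mode(s: pd.Series) -> pd.Series:
    """Fill missing values in one group's Series with its most frequent value."""
    modes = s.mode()  # ignores NaN by default; empty if every value is missing
    if modes.empty:
        return s  # assumption: nothing to impute from, so leave the group as-is
    return s.fillna(modes.iloc[0])


# transform() returns a result aligned with the original index,
# so it can be assigned straight back to the column
df["Signal"] = df.groupby("Name")["Signal"].transform(fill_with_group_mode)
print(df)
```

If two values tie for most frequent within a group, `Series.mode()` returns them in sorted order, so this sketch picks the first one in that order.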