Get the most frequent value of several variables

May 17, 2022

I am trying to get the most frequent value of each variable in a dataset in Python.
For example, I want to know the most frequent preferred color per state.

import pandas as pd

data = {'Name': ['Tom', 'nick', 'krish', 'jack', 'John', 'Bettany', 'Leo', 'Aubrie', 'Martha', 'Grant'],
        'Age': [20, 21, 19, 18, 24, 25, 26, 26, 27, 25],
        'Prefered color': ['green', 'green', 'red', 'blue', 'white', 'black', 'green', 'blue', 'red', 'white'],
        'state': ['Utah', 'Utah', 'Idaho', 'California', 'Texas', 'Arizona', 'Idaho', 'California', 'Idaho', 'Texas']}
df = pd.DataFrame(data)
df

I would like to see a table like this:

Utah - Green 
Idaho - Red
Texas - White
Arizona - Black

>Solution :

Use groupby with mode. A group can have more than one mode when values tie, so join the modes into a single comma-separated string with str.cat:

>>> df.groupby("state")["Prefered color"].agg(lambda x: x.mode().str.cat(sep=","))
state
Arizona       black
California     blue
Idaho           red
Texas         white
Utah          green
Name: Prefered color, dtype: object
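
If you would rather have exactly one value per state, or want the most frequent value of several columns at once, here is a minimal sketch. It assumes ties can be broken by taking the first mode (mode() returns its values sorted, so that is the smallest or alphabetically first one). The helper name first_mode and the column name most_frequent_color are made up for illustration and are not part of pandas.

```python
import pandas as pd

data = {
    "Name": ["Tom", "nick", "krish", "jack", "John",
             "Bettany", "Leo", "Aubrie", "Martha", "Grant"],
    "Age": [20, 21, 19, 18, 24, 25, 26, 26, 27, 25],
    "Prefered color": ["green", "green", "red", "blue", "white",
                       "black", "green", "blue", "red", "white"],
    "state": ["Utah", "Utah", "Idaho", "California", "Texas",
              "Arizona", "Idaho", "California", "Idaho", "Texas"],
}
df = pd.DataFrame(data)


def first_mode(s):
    # Hypothetical helper: Series.mode() returns all modes in sorted order,
    # so taking the first one is an arbitrary but repeatable tie-break.
    return s.mode().iloc[0]


# One column: a tidy two-column table, one row per state.
top_color = (
    df.groupby("state")["Prefered color"]
      .agg(first_mode)
      .reset_index(name="most_frequent_color")
)
print(top_color)

# Several columns: the same helper is applied to each selected column per group.
top_values = df.groupby("state")[["Prefered color", "Age"]].agg(first_mode)
print(top_values)
```

Note that when every value in a group occurs once (the ages in Utah, for example), all of them are modes, so the first one is returned. If you need to see the ties, keep the str.cat version above.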