I am trying to create a new DataFrame that lists every column alongside its unique values. I have the following code, but I think I am referencing the column of the df incorrectly inside the loop.
# Create empty df
df_unique = pd.DataFrame()
# Loop to take unique values from each column and append to df
for col in df:
    list = df(col).unique().tolist()
    df_unique.loc[len(df_unique)] = list
To visualize what I am hoping to achieve, I’ve included a before and after example below.
Before
ID  Name     Zip    Type
01  Bennett  10115  House
02  Sally    10119  Apt
03  Ben      11001  House
04  Bennett  10119  House
After
Column  List_of_unique
ID      01, 02, 03, 04
Name    Bennett, Sally, Ben
Zip     10115, 10119, 11001
Type    House, Apt
>Solution :
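In your loop, df(col) tries to call the DataFrame like a function, which raises a TypeError; a column is selected with square brackets, df[col]. You can also skip the loop entirely and use: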
>>> import numpy as np
>>> df.apply(np.unique)
ID                   [1, 2, 3, 4]
Name        [Ben, Bennett, Sally]
Zip         [10115, 10119, 11001]
Type                 [Apt, House]
dtype: object
# OR, to keep the order of first appearance (np.unique sorts) and get the requested two-column layout
>>> (df.apply(lambda x: ', '.join(x.unique().astype(str)))
...    .rename_axis('Column').rename('List_of_unique').reset_index())
   Column       List_of_unique
0      ID           1, 2, 3, 4
1    Name  Bennett, Sally, Ben
2     Zip  10115, 10119, 11001
3    Type           House, Apt
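
If you would rather keep the explicit loop from the question, here is a minimal sketch of one way to repair it. It assumes the "Before" table is loaded with ID as strings (my assumption, so the leading zeros survive), and it collects the rows in a plain Python list (the name rows is just illustrative) before building df_unique in one step, instead of assigning into an empty frame with .loc.

```python
import pandas as pd

# Sample data mirroring the "Before" table. Storing ID as strings is an
# assumption made here so that the leading zeros are preserved.
df = pd.DataFrame({
    "ID": ["01", "02", "03", "04"],
    "Name": ["Bennett", "Sally", "Ben", "Bennett"],
    "Zip": [10115, 10119, 11001, 10119],
    "Type": ["House", "Apt", "House", "House"],
})

# Collect one record per column, then build the result frame in one step.
rows = []
for col in df.columns:
    # Square brackets select a column; df(col) would try to call the frame.
    unique_values = df[col].unique().tolist()
    rows.append({
        "Column": col,
        # Join the values into a single comma-separated string.
        "List_of_unique": ", ".join(str(v) for v in unique_values),
    })

df_unique = pd.DataFrame(rows, columns=["Column", "List_of_unique"])
print(df_unique)
```

Series.unique() returns values in order of first appearance, so the result matches the "After" table row for row. Naming the variable unique_values rather than list also avoids shadowing the built-in list type.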