Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Map unique values in 2 columns to integers

I have a dataframe with 2 categorical columns (col1, col2).

  col1 col2
0    A   DE
1    A    B
2    B   BA
3    A    A
4    C    C 

I want to map the unique string values to integers, for example (A:0, B:1, BA:2, C:3, DE:4)

  col1 col2  ideal1  ideal2
0    A   DE       0       4
1    A    B       0       1
2    B   BA       1       2
3    A    A       0       0
4    C    C       3       3

I am have tried to use factorize or category, but I am not getting the same unique value for both columns, as can be seen from ROW C:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

Here is my code:

df = pd.DataFrame({'col1': ["A", "A", "B", "A" , "C"], 'col2': ["DE", "B", "BA", "A", "C"]}) 

#ideal map alphabetical: A:0, B:1, BA:2, C:3, DE:4

 #ideal result 
df["ideal1"] = [0, 0, 1,0, 3]
df["ideal2"] = [4,1,2,0,3]


 #trial #1 --> C value 2 & 3 : not matching
df["cat1"] = df['col1'].astype("category").cat.codes
df["cat2"] = df['col2'].astype("category").cat.codes

 #trial #2 --> C value 2 & 4 : not matching 
df["fac1"] = pd.factorize(df["col1"])[0]
df["fac2"] = pd.factorize(df["col2"])[0]


 
print (df)

OUT: 

  col1 col2  ideal1  ideal2  cat1  cat2  fac1  fac2
0    A   DE       0       4     0     4     0     0
1    A    B       0       1     0     1     0     1
2    B   BA       1       2     1     2     1     2
3    A    A       0       0     0     0     0     3
4    C    C       3       3     2     3     2     4

>Solution :

To get the same categories across columns you need to reshape to a single dimension first. Then use factorize and restore the original shape.

Here is an example using stack/unstack:

x = df.stack()
x[:] = x.factorize()[0]
df2 = x.unstack()

Output:

  col1 col2
0    0    1
1    0    2
2    2    3
3    0    0
4    4    4

Joining to the original data:

x = df.stack()
x[:] = x.factorize()[0]
df2 = df.join(x.unstack().add_suffix('_cat'))

Output:

  col1 col2 col1_cat col2_cat
0    A   DE        0        1
1    A    B        0        2
2    B   BA        2        3
3    A    A        0        0
4    C    C        4        4
alphabetical order

If you really want alphabetical order, you could create you own custom mapping dictionary:

import numpy as np
cats = {k:v for v,k in enumerate(np.unique(df.values))}
df.replace(cats)

Output:

   col1  col2
0     0     4
1     0     1
2     1     2
3     0     0
4     3     3
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading