Home Map unique values in 2 columns to integers

Questions

Map unique values in 2 columns to integers

January 9, 2022

I have a dataframe with 2 categorical columns (col1, col2).

  col1 col2
0    A   DE
1    A    B
2    B   BA
3    A    A
4    C    C

I want to map the unique string values to integers, for example (A:0, B:1, BA:2, C:3, DE:4)

  col1 col2  ideal1  ideal2
0    A   DE       0       4
1    A    B       0       1
2    B   BA       1       2
3    A    A       0       0
4    C    C       3       3

I am have tried to use factorize or category, but I am not getting the same unique value for both columns, as can be seen from ROW C:

Here is my code:

df = pd.DataFrame({'col1': ["A", "A", "B", "A" , "C"], 'col2': ["DE", "B", "BA", "A", "C"]}) 

#ideal map alphabetical: A:0, B:1, BA:2, C:3, DE:4

 #ideal result 
df["ideal1"] = [0, 0, 1,0, 3]
df["ideal2"] = [4,1,2,0,3]


 #trial #1 --> C value 2 & 3 : not matching
df["cat1"] = df['col1'].astype("category").cat.codes
df["cat2"] = df['col2'].astype("category").cat.codes

 #trial #2 --> C value 2 & 4 : not matching 
df["fac1"] = pd.factorize(df["col1"])[0]
df["fac2"] = pd.factorize(df["col2"])[0]


 
print (df)

OUT: 

  col1 col2  ideal1  ideal2  cat1  cat2  fac1  fac2
0    A   DE       0       4     0     4     0     0
1    A    B       0       1     0     1     0     1
2    B   BA       1       2     1     2     1     2
3    A    A       0       0     0     0     0     3
4    C    C       3       3     2     3     2     4

>Solution :

To get the same categories across columns you need to reshape to a single dimension first. Then use factorize and restore the original shape.

Here is an example using stack/unstack:

x = df.stack()
x[:] = x.factorize()[0]
df2 = x.unstack()

Output:

  col1 col2
0    0    1
1    0    2
2    2    3
3    0    0
4    4    4

Joining to the original data:

x = df.stack()
x[:] = x.factorize()[0]
df2 = df.join(x.unstack().add_suffix('_cat'))

Output:

  col1 col2 col1_cat col2_cat
0    A   DE        0        1
1    A    B        0        2
2    B   BA        2        3
3    A    A        0        0
4    C    C        4        4

alphabetical order

If you really want alphabetical order, you could create you own custom mapping dictionary:

import numpy as np
cats = {k:v for v,k in enumerate(np.unique(df.values))}
df.replace(cats)

Output:

   col1  col2
0     0     4
1     0     1
2     1     2
3     0     0
4     3     3

dataframe

byMR

Published January 09, 2022

Add a comment

Name of objects containing tables in BigQuery

byMR

January 9, 2022

Questions

How to Type a request parameter with express server

byMR

January 9, 2022

Questions

Binary Search: Program doesn't terminate

byMR

January 9, 2022

Questions

When I modify a duplicated Geom object, this also modifies the underlying original Geom. Why?

byMR

January 9, 2022

Questions

Add rows based on a cell value

byMR

January 9, 2022

Questions

Force (rectangular) image to display as square with side length = variable parent div's width

byMR

January 9, 2022

Map unique values in 2 columns to integers

MEDevel.com: Open-source for Healthcare and Education

>Solution :

alphabetical order

Like this:

Leave a ReplyCancel reply

Read more

Name of objects containing tables in BigQuery

How to Type a request parameter with express server

Binary Search: Program doesn't terminate

When I modify a duplicated Geom object, this also modifies the underlying original Geom. Why?

Add rows based on a cell value

Force (rectangular) image to display as square with side length = variable parent div's width

Keep Up to Date with the Most Important News

Map unique values in 2 columns to integers

MEDevel.com: Open-source for Healthcare and Education

>Solution :

alphabetical order

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Name of objects containing tables in BigQuery

How to Type a request parameter with express server

Binary Search: Program doesn't terminate

When I modify a duplicated Geom object, this also modifies the underlying original Geom. Why?

Add rows based on a cell value

Force (rectangular) image to display as square with side length = variable parent div's width

Discover more from Dev solutions