Label encoding a pandas DataFrame: same label for the same value

Here is a snippet of my df:

        0    1    2    3    4    5   ...   11    12    13    14    15    16
0      BSO  PRV  BSI  TUR  WSP  ACP  ...  HLR   HEX   HEX  None  None  None
1      BSO  PRV  BSI  TUR  WSP  ACP  ...  HLF   HLR   HEX   HEX   HEX  None
2      BSO  PRV  BSI  HLF  HLR  TUR  ...  HEX   RSO   RSI   HEX   HEX   HEX
3      BSO  PRV  BSI  HLF  HLR  TUR  ...  RSO   RSI   HEX   HEX   HEX  None
4      BSO  PRV  BSI  HLF  TUR  WSP  ...  RSO   RSI   HLR   HEX   HEX   HEX
    ...  ...  ...  ...  ...  ...  ...  ...   ...   ...   ...   ...   ...
32607  BSO  PRV  BSI  TUR  WSP  ACP  ...  HEX  None  None  None  None  None
32608  BSO  PRV  BSI  TUR  WSP  ACP  ...  HEX  None  None  None  None  None
32609  BSO  PRV  BSI  TUR  WSP  ACP  ...  HEX  None  None  None  None  None
32610  BSO  PRV  BSI  TUR  WSP  ACP  ...  HEX  None  None  None  None  None
32611  BSO  PRV  BSI  TUR  WSP  ACP  ...  HEX  None  None  None  None  None

Each cell is a string, and I want to label-encode the DataFrame so that each distinct string maps to the same integer everywhere it appears: for example, all 'BSO' = 1, all 'PRV' = 2, and so on. The particular integers do not matter as long as they are consistent. I would like to exclude the None values if possible, but it is fine if not.

I tried df.apply(le.fit_transform) and the result was:

       0   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16
0       0   0   0   2   2   0   1   1   3   2   1   2   0   0   1   1   1
1       0   0   0   2   2   0   1   1   1   3   3   1   2   0   0   0   1
2       0   0   0   0   0   1   2   4   0   0   0   0   4   3   0   0   0
3       0   0   0   0   0   1   3   0   1   0   0   4   3   0   0   0   1
4       0   0   0   0   1   2   2   0   1   0   0   4   3   2   0   0   0
    ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..  ..
32607   0   0   0   2   2   0   1   2   2   1   2   0   5   4   1   1   1
32608   0   0   0   2   2   0   1   2   2   1   2   0   5   4   1   1   1
32609   0   0   0   2   2   0   1   2   2   1   2   0   5   4   1   1   1
32610   0   0   0   2   2   0   1   2   2   1   2   0   5   4   1   1   1
32611   0   0   0   2   2   0   1   2   2   1   2   0   5   4   1   1   1

As you can see by comparing the two outputs, the integers are not consistent: the same string gets different codes in different places.

>Solution :

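The problem is that apply works column by column by default (axis=0), so the encoder is refit on every column and each column gets its own numbering. Try:

df.apply(le.fit_transform, axis=1)

The axis=1 argument makes apply call le.fit_transform once per row instead. Note that the encoder is still refit on every call, so a string is only guaranteed to keep the same integer within a row, not necessarily across rows.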
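
If the same string has to get the same integer across the whole frame, rather than within one row or column at a time, one option is to build a single mapping from every distinct value and apply it to all columns. The following is a minimal sketch and not part of the original answer: the small df is made-up stand-in data, the names mapping and encoded are illustrative, and it assumes the missing cells are real None/NaN values rather than the string 'None'.

```python
import pandas as pd

# Made-up stand-in for the frame in the question.
df = pd.DataFrame(
    [
        ["BSO", "PRV", "HEX", None],
        ["BSO", "HLF", "HEX", "HEX"],
        ["PRV", "BSO", None, None],
    ]
)

# Collect every cell into one long Series and keep the distinct,
# non-missing values in order of first appearance.
flat = pd.Series(df.to_numpy().ravel())
uniques = flat.dropna().unique()

# One shared mapping for the whole frame; codes start at 1.
mapping = {label: code for code, label in enumerate(uniques, start=1)}

# Apply the same mapping to every column. Values that are not in the
# mapping (the missing cells) become NaN, so convert to the nullable
# integer dtype to keep the codes as integers.
encoded = df.apply(lambda col: col.map(mapping)).astype("Int64")

print(mapping)
print(encoded)
```

Missing cells stay missing (<NA>) instead of receiving a code. The codes follow the order of first appearance, which is arbitrary; any other one-to-one mapping would work as well.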

Hope it helps.
