python pandas normalization

by MR

June 10, 2022

I have a CSV file in which one field needs to be normalized into separate records:

    +-----+---------+
    | id  | field   |
    +-----+---------+
    | 1   | A-a,B-b |
    | 2   | C-c     |
    +-----+---------+

Some records hold two comma-separated tuples in that field. Each tuple should become its own record:

    +-----+---------+
    | id  | field   |
    +-----+---------+
    | 1   | A-a     |
    | 1   | B-b     |
    | 2   | C-c     |
    +-----+---------+

Each tuple should then be split into two fields:

    +-----+---------+---------+
    | id  | field_1 | field_2 |
    +-----+---------+---------+
    | 1   | A       | a       |
    | 1   | B       | b       |
    | 2   | C       | c       |
    +-----+---------+---------+

I have this solution for the last step:

    df[['field_1', 'field_2']] = df['field'].str.split('-', expand=True)

but I’m missing the first step. Can you help?

> Solution:

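    # Turn each comma-separated string into a list of tuples
    df['field'] = df['field'].str.split(',')
    # Give every list element its own row (the original index is repeated)
    df1 = df.explode('field')
    # Split each tuple on '-' into two new columns
    df1[['field_1', 'field_2']] = df1['field'].str.split('-', expand=True)
    df1

       id field field_1 field_2
    0   1   A-a       A       a
    0   1   B-b       B       b
    1   2   C-c       C       c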
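
If it helps, the same steps can be put together as a self-contained sketch that runs end to end. The inline `CSV_TEXT` stands in for the real file and the helper name `normalize_field` is made up for illustration; both are assumptions, not part of the question. It also assumes every tuple contains a `-` and that a fresh 0..n-1 index is wanted (`ignore_index=True` needs pandas 1.1 or later).

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file from the question.
CSV_TEXT = 'id,field\n1,"A-a,B-b"\n2,C-c\n'


def normalize_field(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per comma-separated tuple, split into field_1 and field_2."""
    # Split on commas into lists, then explode so each list element gets its own row.
    # assign() works on a copy, so the caller's DataFrame is left unchanged.
    out = df.assign(field=df["field"].str.split(",")).explode(
        "field", ignore_index=True
    )
    # Split each "X-x" tuple at the first '-' into two columns.
    out[["field_1", "field_2"]] = out["field"].str.split("-", n=1, expand=True)
    # Drop the intermediate column to match the target layout.
    return out.drop(columns="field")


df = pd.read_csv(io.StringIO(CSV_TEXT))
result = normalize_field(df)
print(result)
```

With a real file, `pd.read_csv("your_file.csv")` would replace the `io.StringIO` call. Dropping `ignore_index=True` keeps the repeated index shown in the output above.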
