Pandas: Populate rows for same index in a group with different values

Sample df:

In [2004]: df
Out[2004]: 
   index table_name column_name data_type  default  max_length
0      0   f_person      active   integer      NaN         NaN
1      0   f_person        actv   integer      NaN         NaN
2      5   f_person         ssn   varchar      NaN       256.0
3      5   f_person         ssn   varchar      NaN        99.0
4      6   f_person          pl   varchar     10.0       256.0
5      6   f_person          pl    bigint      NaN       256.0
6      8   f_person      prefix   varchar      NaN       256.0
7      8   f_person      prefix   integer      NaN       256.0

For rows that share the same index value, I want to add a new column, schema, and give each row in the group a different value. The number of rows per group will always be <= 2.

Expected Output:

In [2006]: df
Out[2006]: 
   index table_name column_name data_type  default  max_length schema
0      0   f_person      active   integer      NaN         NaN     s1
1      0   f_person        actv   integer      NaN         NaN     s2
2      5   f_person         ssn   varchar      NaN       256.0     s1
3      5   f_person         ssn   varchar      NaN        99.0     s2
4      6   f_person          pl   varchar     10.0       256.0     s1
5      6   f_person          pl    bigint      NaN       256.0     s2
6      8   f_person      prefix   varchar      NaN       256.0     s1
7      8   f_person      prefix   integer      NaN       256.0     s2

I solved it using a for loop, but there must be a better way. Can someone please suggest a more idiomatic pandas approach?

Solution:

Assuming you want to populate from a list of defined values:

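# values to assign, in order, to the rows of each group
values = ['s1', 's2']

# map each within-group position to its value: {0: 's1', 1: 's2'}
d = dict(enumerate(values))

# cumcount() numbers the rows of each group 0, 1, ...; map() swaps the number for its value
df['schema'] = df.groupby('index').cumcount().map(d)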

Otherwise, this is already covered in previous questions.

Output:

   index table_name column_name data_type  default  max_length schema
0      0   f_person      active   integer      NaN         NaN     s1
1      0   f_person        actv   integer      NaN         NaN     s2
2      5   f_person         ssn   varchar      NaN       256.0     s1
3      5   f_person         ssn   varchar      NaN        99.0     s2
4      6   f_person          pl   varchar     10.0       256.0     s1
5      6   f_person          pl    bigint      NaN       256.0     s2
6      8   f_person      prefix   varchar      NaN       256.0     s1
7      8   f_person      prefix   integer      NaN       256.0     s2
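
For anyone who wants to try this without the original data, here is a minimal self-contained sketch. The frame is a reduced stand-in for the sample above (only the index and column_name columns are kept), and the names lookup and position are illustrative choices, not part of the question.

```python
import pandas as pd

# Reduced stand-in for the sample frame: only the columns the demo needs.
df = pd.DataFrame({
    "index": [0, 0, 5, 5, 6, 6, 8, 8],
    "column_name": ["active", "actv", "ssn", "ssn", "pl", "pl", "prefix", "prefix"],
})

values = ["s1", "s2"]
lookup = dict(enumerate(values))  # {0: 's1', 1: 's2'}

# cumcount() gives each row its position inside its group: 0 for the first row, 1 for the second.
position = df.groupby("index").cumcount()

# Replace each position with the value stored under that key.
df["schema"] = position.map(lookup)

print(df)
```

Note that a position with no key in the dictionary maps to NaN, so if a group ever had more rows than there are entries in values, the extra rows would be left empty rather than raising an error.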