How do I replicate rows in groups with size < 3 within a DataFrame?

I’m new to pandas. I have the following dataframe:

ID A B
0 Data Data
1 Data Data
2 Data Data
3 Data Data
3 Data Data
3 Data Data
3 Data Data

I want to repeat every row 3 times if it belongs to a group (rows sharing the same ID) with fewer than 3 rows. The result should look like this:

ID A B
0 Data Data
0 Data Data
0 Data Data
1 Data Data
1 Data Data
1 Data Data
2 Data Data
2 Data Data
2 Data Data
3 Data Data
3 Data Data
3 Data Data
3 Data Data

Does anyone have ideas? Thanks in advance.

Solution:

Use Series.value_counts to count the rows per ID, map counts below 3 to a repeat factor of 3 and all other counts to 1 (no repeat), then look up each row's factor with Series.map and repeat the rows with Index.repeat inside DataFrame.loc:

s = df['ID'].value_counts().lt(3).map({True:3, False:1})

df = df.loc[df.index.repeat(df['ID'].map(s))]
print (df)
   ID     A     B
0   0  Data  Data
0   0  Data  Data
0   0  Data  Data
1   1  Data  Data
1   1  Data  Data
1   1  Data  Data
2   2  Data  Data
2   2  Data  Data
2   2  Data  Data
3   3  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data
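
To run the steps above end to end, here is a minimal self-contained sketch. The sample frame is reconstructed from the question, with the placeholder string "Data" in columns A and B; the names `counts`, `repeats`, and `out` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the question; "Data" is a placeholder value.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 3, 3, 3],
    "A": ["Data"] * 7,
    "B": ["Data"] * 7,
})

# Number of rows per ID (the index holds the IDs, the values hold the counts).
counts = df["ID"].value_counts()

# Groups with fewer than 3 rows get a repeat factor of 3, all others 1.
repeats = counts.lt(3).map({True: 3, False: 1})

# Look up each row's factor by its ID, then repeat the index labels accordingly.
out = df.loc[df.index.repeat(df["ID"].map(repeats))]
print(out)
```

The repeated rows keep their original index labels, so the index contains duplicates. If a fresh 0..n-1 index is preferred, `out.reset_index(drop=True)` should do it.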

If some group has 2 rows, each of those rows is repeated 3 times as well, so the group ends up with 6 rows. For example, starting from:

print (df)
   ID     A     B
0   0  Data  Data
1   1  Data  Data
2   2  Data  Data
3   2  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data

s = df['ID'].value_counts().lt(3).map({True:3, False:1})
print (s)
3    1
2    3
0    3
1    3
Name: ID, dtype: int64

df = df.loc[df.index.repeat(df['ID'].map(s))]
print (df)
   ID     A     B
0   0  Data  Data
0   0  Data  Data
0   0  Data  Data
1   1  Data  Data
1   1  Data  Data
1   1  Data  Data
2   2  Data  Data
2   2  Data  Data
2   2  Data  Data
3   2  Data  Data
3   2  Data  Data
3   2  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data
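
As an alternative to value_counts plus map, the group size can be computed per row with groupby and transform. This is a sketch of a different route to the same result, not the method used in the answer above; the names `size`, `repeats`, and `out` are illustrative, and the sample frame is the one with a 2-row group.

```python
import numpy as np
import pandas as pd

# Sample data with one 2-row group (ID 2); "Data" is a placeholder value.
df = pd.DataFrame({
    "ID": [0, 1, 2, 2, 3, 3, 3],
    "A": ["Data"] * 7,
    "B": ["Data"] * 7,
})

# Size of the group each row belongs to, aligned with the original rows.
size = df.groupby("ID")["ID"].transform("size")

# Repeat factor per row: 3 for groups smaller than 3, otherwise 1.
repeats = np.where(size < 3, 3, 1)

# Repeat the rows and replace the duplicated index labels with a fresh index.
out = df.loc[df.index.repeat(repeats)].reset_index(drop=True)
print(out)
```

Because transform returns one value per row, no second lookup by ID is needed; which variant reads better is largely a matter of taste.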

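If instead you need rows in 1-row groups repeated 3 times, rows in 2-row groups repeated 2 times, and no repeat otherwise (a factor of 1), change the mapping (applied to the same starting DataFrame as above):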

s = df['ID'].value_counts().map({1:3, 2:2}).fillna(1)
print (s)
3    1.0
2    2.0
0    3.0
1    3.0
Name: ID, dtype: float64

df = df.loc[df.index.repeat(df['ID'].map(s))]
print (df)
   ID     A     B
0   0  Data  Data
0   0  Data  Data
0   0  Data  Data
1   1  Data  Data
1   1  Data  Data
1   1  Data  Data
2   2  Data  Data
2   2  Data  Data
3   2  Data  Data
3   2  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data
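
Because counts missing from the dictionary become NaN, the mapped Series above ends up as float64. A small defensive variation, offered here as an assumption about what is safest rather than something the original answer requires, is to cast the factors back to integers before repeating. The names `s` and `out` follow the answer; the sample frame is reconstructed.

```python
import pandas as pd

# Sample data with one 2-row group (ID 2); "Data" is a placeholder value.
df = pd.DataFrame({
    "ID": [0, 1, 2, 2, 3, 3, 3],
    "A": ["Data"] * 7,
    "B": ["Data"] * 7,
})

# Map group sizes to repeat factors: size 1 -> 3, size 2 -> 2.
# Sizes not listed become NaN, so fill them with 1 and cast back to int.
s = df["ID"].value_counts().map({1: 3, 2: 2}).fillna(1).astype(int)

# Look up each row's factor by its ID and repeat the rows.
out = df.loc[df.index.repeat(df["ID"].map(s))]
print(out)
```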