I’m new to pandas. I have the following dataframe:
| ID | A | B |
|---|---|---|
| 0 | Data | Data |
| 1 | Data | Data |
| 2 | Data | Data |
| 3 | Data | Data |
| 3 | Data | Data |
| 3 | Data | Data |
| 3 | Data | Data |
I want to repeat each row 3 times whenever its ID group has fewer than 3 rows. The resulting dataframe should look like this:
| ID | A | B |
|---|---|---|
| 0 | Data | Data |
| 0 | Data | Data |
| 0 | Data | Data |
| 1 | Data | Data |
| 1 | Data | Data |
| 1 | Data | Data |
| 2 | Data | Data |
| 2 | Data | Data |
| 2 | Data | Data |
| 3 | Data | Data |
| 3 | Data | Data |
| 3 | Data | Data |
| 3 | Data | Data |
Does anyone have ideas? Thanks in advance.
>Solution :
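Use `Series.value_counts` to count the rows per `ID`, then turn each count into a repeat factor: 3 if the count is less than 3, otherwise 1 (no repeat). Map these factors back onto the `ID` column with `Series.map`, and repeat the rows with `Index.repeat` inside `DataFrame.loc`: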
```python
s = df['ID'].value_counts().lt(3).map({True: 3, False: 1})
df = df.loc[df.index.repeat(df['ID'].map(s))]
print(df)
   ID     A     B
0   0  Data  Data
0   0  Data  Data
0   0  Data  Data
1   1  Data  Data
1   1  Data  Data
1   1  Data  Data
2   2  Data  Data
2   2  Data  Data
2   2  Data  Data
3   3  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data
```
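For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same approach. The sample frame is rebuilt from the question with the placeholder `Data` strings, and the names `factor`, `repeats`, `out`, and `out_reset` are only illustrative. The final `reset_index(drop=True)` is an optional extra step that the original answer does not use.

```python
import pandas as pd

# Sample frame from the question; the "Data" strings are placeholders.
df = pd.DataFrame({
    "ID": [0, 1, 2, 3, 3, 3, 3],
    "A": ["Data"] * 7,
    "B": ["Data"] * 7,
})

# Per-ID repeat factor: 3 for groups with fewer than 3 rows, otherwise 1.
factor = df["ID"].value_counts().lt(3).map({True: 3, False: 1})

# Look up the factor for every row, then repeat the index labels accordingly.
repeats = df["ID"].map(factor)
out = df.loc[df.index.repeat(repeats)]

# Optional (not in the original answer): replace the duplicated index
# labels with a fresh 0..n-1 index.
out_reset = out.reset_index(drop=True)

print(out["ID"].tolist())
```

The repeated rows keep their original index labels, which is why the printed output above shows `0`, `0`, `0`, and so on in the index column.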
If a group has 2 rows, each of those rows is also repeated 3 times, so the group ends up with 6 rows. For example, starting from:
```python
print(df)
   ID     A     B
0   0  Data  Data
1   1  Data  Data
2   2  Data  Data
3   2  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data
```
```python
s = df['ID'].value_counts().lt(3).map({True: 3, False: 1})
print(s)
3    1
2    3
0    3
1    3
Name: ID, dtype: int64
```
```python
df = df.loc[df.index.repeat(df['ID'].map(s))]
print(df)
   ID     A     B
0   0  Data  Data
0   0  Data  Data
0   0  Data  Data
1   1  Data  Data
1   1  Data  Data
1   1  Data  Data
2   2  Data  Data
2   2  Data  Data
2   2  Data  Data
3   2  Data  Data
3   2  Data  Data
3   2  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data
```
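The same result can probably also be reached without the intermediate lookup Series. The sketch below is a variation, not the method from the answer above: it computes each row's group size directly with `groupby(...).transform("size")` and picks the repeat factor with `numpy.where`. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Second sample frame: ID 2 has two rows, ID 3 has three.
df = pd.DataFrame({
    "ID": [0, 1, 2, 2, 3, 3, 3],
    "A": ["Data"] * 7,
    "B": ["Data"] * 7,
})

# Size of the group each row belongs to, aligned with df's index.
group_size = df.groupby("ID")["ID"].transform("size")

# 3 copies for rows in groups smaller than 3, otherwise a single copy.
repeats = np.where(group_size < 3, 3, 1)

out = df.loc[df.index.repeat(repeats)]
print(out["ID"].tolist())
```

Here too, both rows with ID 2 are tripled, so that group ends up with 6 rows.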
If instead rows should be repeated 3 times in groups of size 1, 2 times in groups of size 2, and not at all otherwise (repeat factor 1), change the mapping:
```python
s = df['ID'].value_counts().map({1: 3, 2: 2}).fillna(1)
print(s)
3    1.0
2    2.0
0    3.0
1    3.0
Name: ID, dtype: float64
```
```python
df = df.loc[df.index.repeat(df['ID'].map(s))]
print(df)
   ID     A     B
0   0  Data  Data
0   0  Data  Data
0   0  Data  Data
1   1  Data  Data
1   1  Data  Data
1   1  Data  Data
2   2  Data  Data
2   2  Data  Data
3   2  Data  Data
3   2  Data  Data
4   3  Data  Data
5   3  Data  Data
6   3  Data  Data
```
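Because `fillna` runs after the `map` has produced `NaN` values, the repeat factors come out as floats. A small variation, sketched below with illustrative names, casts them back to integers with `.astype(int)` so the counts passed to `Index.repeat` are whole numbers. The cast is an addition of mine, not part of the answer above.

```python
import pandas as pd

# Same sample frame as above: ID 2 has two rows, ID 3 has three.
df = pd.DataFrame({
    "ID": [0, 1, 2, 2, 3, 3, 3],
    "A": ["Data"] * 7,
    "B": ["Data"] * 7,
})

# Group size -> repeat factor. Sizes missing from the dict become NaN,
# are filled with 1, and the result is cast back to integers.
factor = df["ID"].value_counts().map({1: 3, 2: 2}).fillna(1).astype(int)

out = df.loc[df.index.repeat(df["ID"].map(factor))]

# Rows per ID after repeating.
print(out["ID"].value_counts().sort_index().to_dict())
```

Note that a group of 2 rows repeated 2 times ends up with 4 rows, which matches the output shown above.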