I have a dataframe
A 2
B 4
C 3
and I would like to make a data frame with the following
A 0
A 1
B 0
B 0
B 0
B 1
C 0
C 0
C 1.
So for B, I want to make 4 rows and each one is 0 except for the last one which is 1. Similarly, for A, I’ll have 2 rows and the first one has a 0 and the second one has a 1.
In general, if I have a row in the original table with X n, I want to return n rows in the new table with n-1 of them being X 0 and the final one as X 1.
Is there a way to do this in R? Or Python or SQL?
>Solution :
In R, we may use uncount to replicate the rows from the second column and replace the second column with binary by converting the first to logical column (duplicated)
library(tidyr)
library(dplyr)
df1 %>%
uncount(v2) %>%
mutate(v2 = +(!duplicated(v1, fromLast = TRUE)))
-output
v1 v2
1 A 0
2 A 1
3 B 0
4 B 0
5 B 0
6 B 1
7 C 0
8 C 0
9 C 1
Or in Python
import pandas as pd
df1 = pd.DataFrame({"v1":["A", "B", "C"], "v2": [2, 4, 3]})
df2 = df1.reindex(df1.index.repeat(df1.v2))
df2['v2'] = (~df2.duplicated(subset = ['v2'], keep = "last")) + 0
df2
v1 v2
0 A 0
0 A 1
1 B 0
1 B 0
1 B 0
1 B 1
2 C 0
2 C 0
2 C 1
data
df1 <- structure(list(v1 = c("A", "B", "C"), v2 = c(2L, 4L, 3L)),
class = "data.frame", row.names = c(NA,
-3L))