Add a new row at selected place in pandas data frame

May 21, 2024

I have the following dataframe, which holds a large amount of data:

    Column1         Column2
0    10001         252207
1    100018        219559
2    100068        251102
3    100089        107320
4    100111        250975
5    100111        28540
6    100112        252253
7    100157        17883
.   ...            ...
10000 100998         1231233

I would like to insert a new row with the value "t # {int}" in Column1 above each group of equal values, that is, wherever the value in Column1 differs from the one in the previous row. This is the output I want to get:

    Column1         Column2
0    t # 0           NULL
1    10001          252207
2    t # 1           NULL
3    100018         219559
4    t # 2           NULL
5    100088         251102
6    100088         107320
7    t # 3           NULL
8    100111         250975
9    100111         28540
10    t # 4           NULL
11    100112        252253
12    t # 5          NULL
13    100157        17883
...   ...            ...
end-3  t # {int}    NULL
end-2  100998       1231233
end-1  100998       3333
end    100998       4123

What I’m trying to do is first create a copy of Column1 and then add the rows I want:

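    import pandas

    with open("week-1-algorithm.txt", "r") as f:
        text = [line.split() for line in f]

    df = pandas.DataFrame(
        text,
        columns=["Column1", "Column2"],
    )

    new_df = df["Column1"].copy()
    iteration_number = 0
    # Stop one short of the end so that new_df[i+1] stays in range
    for i in range(len(new_df) - 1):
        if new_df[i] != new_df[i+1]:
            # Assigning to an existing label overwrites that row instead of inserting one
            new_df.loc[i+1] = f't # {iteration_number}'
            iteration_number += 1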

Could anyone help me on how I can do this? All I get is overwriting data, not adding it.

Solution:

Assuming that your dataframe is already sorted, you can group by Column1, then add a header row to each group:

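import pandas as pd

frames = [
    subframe
    # sort=False keeps the groups in the order they appear in the dataframe
    for i, (_, group) in enumerate(df.groupby("Column1", sort=False))
    for subframe in [
        # Header row for the group; Column2 is filled with NaN by concat
        pd.DataFrame([f"t # {i}"], columns=["Column1"]),
        group,
    ]
]

result = pd.concat(frames, ignore_index=True)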
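
If the dataframe is not guaranteed to be sorted, a variant that works on consecutive runs of equal values (rather than on groups) may be closer to what the question describes. The following is only a sketch under that assumption: the function name `add_group_headers`, the helper columns `_run` and `_pos`, and the sample data are made up for illustration and are not part of the original question or answer.

```python
import pandas as pd


def add_group_headers(df, key="Column1"):
    """Insert a 't # n' header row before each run of equal values in `key`.

    Hypothetical helper: `_run` and `_pos` are temporary column names
    chosen for this sketch and are dropped before returning.
    """
    # A new run starts wherever the value differs from the row above.
    starts = df[key].ne(df[key].shift())
    run_id = starts.cumsum() - 1  # 0-based run number for every row

    # One header row per run; other columns stay missing (NaN) after concat.
    headers = pd.DataFrame({key: "t # " + run_id[starts].astype(str)})
    headers["_run"] = run_id[starts].to_numpy()
    headers["_pos"] = -1  # sorts before every data row of the same run

    body = df.assign(_run=run_id.to_numpy(), _pos=range(len(df)))

    out = pd.concat([headers, body], ignore_index=True)
    out = out.sort_values(["_run", "_pos"])
    return out.drop(columns=["_run", "_pos"]).reset_index(drop=True)


# Small made-up sample in the same shape as the question's data.
sample = pd.DataFrame(
    {
        "Column1": ["10001", "100018", "100111", "100111", "100112"],
        "Column2": ["252207", "219559", "250975", "28540", "252253"],
    }
)
result = add_group_headers(sample)
print(result)
```

Because the header rows and the data rows are concatenated once and then sorted, this avoids building one small dataframe per group, which might matter for a large input. Whether it is actually faster on your data is something you would need to measure.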