Add a new row at selected place in pandas data frame

I have the following dataframe with large amounts of data

    Column1         Column2
0    10001         252207
1    100018        219559
2    100068        251102
3    100089        107320
4    100111        250975
5    100111        28540
6    100112        252253
7    100157        17883
.   ...            ...
10000 100998         1231233

I would like to insert a new row containing the value “t # {int}” in Column1 above each run of rows that share the same Column1 value, that is, whenever the value in Column1 differs from the one in the previous row. Below is the output that I want to get:

    Column1         Column2
0    t # 0           NULL
1    10001          252207
2    t # 1           NULL
3    100018         219559
4    t # 2           NULL
5    100088         251102
6    100088         107320
7    t # 3           NULL
8    100111         250975
9    100111         28540
10    t # 4           NULL
11    100112        252253
12    t # 5          NULL
13    100157        17883
...   ...            ...
end-3  t # {int}    NULL
end-2  100998       1231233
end-1  100998       3333
end    100998       4123

What I’m trying to do is first create a new object from Column1 and then add the rows I want to it:

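    import pandas

    with open("week-1-algorithm.txt", "r") as f:
        text = [line.split() for line in f]

    df = pandas.DataFrame(
        text,
        columns=["Column1", "Column2"],
    )

    new_df = df["Column1"].copy()
    iteration_number = 0
    # Stop one short of the end so that new_df[i + 1] always exists
    for i in range(len(new_df) - 1):
        if new_df[i] != new_df[i + 1]:
            # Label i + 1 already exists, so this overwrites the row instead of inserting one
            new_df.loc[i + 1] = f"t # {iteration_number}"
            iteration_number += 1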

Could anyone help me with this? My attempt only overwrites existing rows instead of inserting new ones.

Solution:

Assuming that your dataframe is already sorted, you can group by Column1, then add a header row to each group:

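import pandas as pd

# Build an alternating list of frames: a one-row header, then the group's rows
frames = [
    subframe
    for i, (_, group) in enumerate(df.groupby("Column1"))
    for subframe in [
        # Header row; Column2 is absent here, so it becomes NaN after concat
        pd.DataFrame([f"t # {i}"], columns=["Column1"]),
        group,
    ]
]

result = pd.concat(frames, ignore_index=True)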
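If the data might not be sorted, or if a value can reappear later in the file and should start a new block, one possible variation is to group on runs of consecutive equal values rather than on the values themselves. The sketch below assumes that this is the intended behaviour; the helper name `add_run_headers` and its `key` parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def add_run_headers(df: pd.DataFrame, key: str = "Column1") -> pd.DataFrame:
    """Insert a 't # n' row before each run of equal consecutive values in `key`.

    Hypothetical helper: the name and signature are illustrative only.
    """
    # True on the first row of every run (value differs from the previous row).
    starts = df[key].ne(df[key].shift())
    # Number the runs 0, 1, 2, ... in order of appearance.
    run_id = starts.cumsum() - 1

    frames = []
    # Grouping on the run number keeps the runs in their original order.
    for n, group in df.groupby(run_id):
        # Header row: only `key` is set, so other columns become NaN after concat.
        frames.append(pd.DataFrame({key: [f"t # {n}"]}))
        frames.append(group)

    if not frames:  # empty input: nothing to insert
        return df.copy()
    return pd.concat(frames, ignore_index=True)


# Small sample in the shape of the question's data (values read as strings).
sample = pd.DataFrame(
    {
        "Column1": ["10001", "100018", "100111", "100111", "100112"],
        "Column2": ["252207", "219559", "250975", "28540", "252253"],
    }
)
result = add_run_headers(sample)
```

Grouping directly on Column1 sorts the group keys by default and merges repeats that are not adjacent. Grouping on the run number avoids both, at the cost of one extra pass to compute it.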