Inserting row difference values in original DataFrame

November 7, 2023

I have a read.csv DataFrame which is continuously updating with addition of one new row on every run of my script which looks like ….

df = pd.read_csv(file_path)
print(df.to_string(index=False))

timestamp    Puts   Calls  PutCh  CallCh  ChDiff

09:41:12 AM 2027891 1820724 280101  200974   79127
09:48:51 AM 2075976 1862053 328186  242303   85883
09:58:48 AM 2091487 1885842 343697  266092   77605
10:08:21 AM 2091879 1918592 344089  298842   45247
02:26:00 PM 1995234 1941917 247444  322167  -74723
02:44:36 PM 1990071 1934874 242281  315124  -72843
02:56:17 PM 1970892 1938472 223102  318722  -95620

Now I want the difference of each succeeding row from previous one for which I have read about df.diff(). So I dropped the timestamp column to get a new datafame as df1 and wrote my script …

df1.diff()

and got my output as ….

    Puts   Calls    PutCh  CallCh    ChDiff
     NaN     NaN      NaN     NaN       NaN
 48085.0 41329.0  48085.0 41329.0    6756.0
 15511.0 23789.0  15511.0 23789.0   -8278.0
   392.0 32750.0    392.0 32750.0  -32358.0
-96645.0 23325.0 -96645.0 23325.0 -119970.0
 -5163.0 -7043.0  -5163.0 -7043.0    1880.0
-19179.0  3598.0 -19179.0  3598.0  -22777.0

Here I want these difference values are added to my original DataFrame(df) in bracket for every column. More elaborately my output should looks like(here timestamp column also should be there as in my df)….

Puts    Calls   PutCh   CallCh  ChDiff
2027891 1820724 280101  200974  79127
2075976 1862053 328186  242303  85883
(48085) (41329) (48085) (41329) (6756)
2091487 1885842 343697  266092  77605
(15511) (23789) (15511) (23789) (-8278)
2091879 1918592 344089  298842  45247
(392)   (32750) (392)   (32750) (-32358)

Is there any way to do same plz ??

>Solution :

Convert the output of diff to string, add the parentheses and concat back to the original, finally sort_index to reorganize the rows in order:

tmp = (df.drop(columns='timestamp').diff()
         .iloc[1:]
         .apply(lambda s: '('+s.astype(str)+')')
      )

out = pd.concat([df, tmp]).sort_index()

Output:

     timestamp        Puts      Calls       PutCh     CallCh       ChDiff
0  09:41:12 AM     2027891    1820724      280101     200974        79127
1  09:48:51 AM     2075976    1862053      328186     242303        85883
1          NaN   (48085.0)  (41329.0)   (48085.0)  (41329.0)     (6756.0)
2  09:58:48 AM     2091487    1885842      343697     266092        77605
2          NaN   (15511.0)  (23789.0)   (15511.0)  (23789.0)    (-8278.0)
3  10:08:21 AM     2091879    1918592      344089     298842        45247
3          NaN     (392.0)  (32750.0)     (392.0)  (32750.0)   (-32358.0)
4  02:26:00 PM     1995234    1941917      247444     322167       -74723
4          NaN  (-96645.0)  (23325.0)  (-96645.0)  (23325.0)  (-119970.0)
5  02:44:36 PM     1990071    1934874      242281     315124       -72843
5          NaN   (-5163.0)  (-7043.0)   (-5163.0)  (-7043.0)     (1880.0)
6  02:56:17 PM     1970892    1938472      223102     318722       -95620
6          NaN  (-19179.0)   (3598.0)  (-19179.0)   (3598.0)   (-22777.0)