Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to substract rows of a Pandas dataframe based upon some conditions?

I am performing analysis on this dataset.

After using the code below, I am left with the cleaned version of the data.

covid_df.drop(columns = ["Sno", "Time"], inplace = True)
covid_df["State/UnionTerritory"] = covid_df["State/UnionTerritory"].replace({
    "Bihar****": "Bihar",
    "Maharashtra***": "Maharashtra", 
    "Madhya Pradesh***": "Madhya Pradesh", 
    "Karanataka": "Karnataka",
    "Telangana": "Telengana",
    "Himanchal Pradesh": "Himachal Pradesh",
    "Dadra and Nagar Haveli": "Dadra and Nagar Haveli and Daman and Diu",
    "Daman & Diu": "Dadra and Nagar Haveli and Daman and Diu"
    })

invalid_states = ["Cases being reassigned to states", "Unassigned"]

for invalid_state in invalid_states:
  invalid_state_index = covid_df.loc[covid_df["State/UnionTerritory"] == invalid_state, :].index
  covid_df.drop(index = invalid_state_index, inplace = True)
covid_df = covid_df.groupby(["State/UnionTerritory", "Date"], as_index = False).sum()
covid_df["Date"] = pd.to_datetime(covid_df["Date"])
covid_df.sort_values(by = ["State/UnionTerritory", "Date"], inplace = True)

This cleaned data has the cumulative cases for each State/UnionTerritory for each Date. How can I extract the daily new cases for each State/UnionTerritory?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

One way I have in mind is by subtracting the previous row from the current row if both have the same State/UnionTerritory value. What would be the most efficient way to do this?

Would be very appreciated if you could suggest a better way of cleaning the data.

>Solution :

You could use shift.
For example:

df = pd.DataFrame({'cumul': [0, 2, 3, 5, 7]})
df['quantity'] = df - df.shift(1)

quantity will be:

   quantity
0    NaN
1    2.0
2    1.0
3    2.0
4    2.0

You can then fillna or just change the zeroth value in quantity for the zeroth value in cumul.

Edit: prepare the dataframe by applying your conditions first 🙂

Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading