How to grab last row of datetime in Pandas dataframe?

April 5, 2022

II currently have a very large .csv with 2 million rows. I’ve read in the csv and only have 2 columns, number and timestamp (in unix). My goal is to grab the last and largest number for each day (eg. 1/1/2021, 1/2/2021, etc.)

I have converted unix to datetime and used df.groupby(‘timestamp’).tail(1) but am still not able to return the last row per day. Am I using the groupby wrong?

import pandas as pd

def main():
    df = pd.read_csv('blocks.csv', usecols=['number', 'timestamp'])
    print(df.head())
    df['timestamp'] = pd.to_datetime(df['timestamp'],unit='s')
    x = df.groupby('timestamp').tail(1)
    print(x)

if __name__ == '__main__':
    main()

Desired Output:

number timestamp

11,509,218 2021-01-01

11,629,315 2021-01-02

11,782,116 2021-01-03

12,321,123 2021-01-04

…

>Solution :

The "problem" lies in the grouper, use .dt.date for correct grouping (assuming your data is already sorted):

x = df.groupby(df['timestamp'].dt.date).tail(1)
print(x)

pandas-groupby

byMR

Published April 05, 2022

Add a comment

What is the function for Varied amount input data for Python?

byMR

April 5, 2022

Questions

Setting up Google Drive API on NodeJS using a service account

byMR

April 5, 2022

Questions

Change <input type=color> value onChange

byMR

April 5, 2022

Questions

Python Dictionary Sort based on Value

byMR

April 5, 2022

Questions

How to add a new "top" row to a table in jquery without deleting the other rows

byMR

April 5, 2022

Questions

Pandas Dataframe select row based on a condition and the previous N rows that are previous the condition

byMR

April 5, 2022

How to grab last row of datetime in Pandas dataframe?