Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Pandas data frame with time-series data – grouping without aggregating data

I have the following pandas dataframe:

    import pandas as pd
    df4 = pd.DataFrame({'timestamp':['2022-10-01 01:00:00', 
                                     '2022-10-02 01:00:00', 
                                     '2022-10-03 01:00:00', 
                                     '2022-10-04 01:00:00', 
                                     '2022-10-05 01:00:00',
                                     '2022-10-01 02:00:00', 
                                     '2022-10-02 02:00:00', 
                                     '2022-10-03 02:00:00', 
                                     '2022-10-04 02:00:00', 
                                     '2022-10-05 02:00:00'],
                       'A': [1,2,3,4,5,6,7,8,9,10],
                       'B': [10,9,8,7,6,5,4,3,2,1]}
                      )
    df4['timestamp'] = df4['timestamp'].astype('datetime64')
    df4

that gives the following data frame:

| timestamp          | A| B |
|--------------------|--| --|
| 2022-10-01 01:00:00| 1| 10|
| 2022-10-02 01:00:00| 2| 9 |
| 2022-10-01 01:00:00| 3| 8 |
| 2022-10-02 01:00:00| 4| 7 |
| 2022-10-01 01:00:00| 5| 6 |
| 2022-10-02 01:00:00| 6| 5 |
| 2022-10-01 01:00:00| 7| 4 |
| 2022-10-02 01:00:00| 8| 3 |
| 2022-10-01 01:00:00| 9| 2 |
| 2022-10-02 01:00:00| 10| 1|

I am trying to get a group on the day level of timestamp, across the y column, and without aggregation function, but with collecting all data in a new column (as a pandas series or NumPy array):

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

| timestamp  | no_name          | y    |
|------------| -----------------| -----|
| 2022-10-01 | 0 [1,3,5,7,9]    | A    |
| 2022-10-01 | 0 [10,8, 6, 4, 2]| B    |
| 2022-10-02 | 0 [2,4, 6, 8, 10]| A    |
| 2022-10-02 | 0 [9,7, 5, 3, 1] | B    |

I tried to follow the suggestions on the link Output pandas grouped data frame without aggregation :

but

    df6 = df4.groupby('timestamp').agg( lambda g: dict([(k, g[k].tolist()) for k in g]))

gives KeyError: 1

Also, I tried the following:

    df4.melt(id_vars=['timestamp'], var_name='target', value_name='value')

to get the target values in one column, the grouping across the timestamp (on hour level) and without aggregation did not give the desired result.

Any suggestions? What else could I try to solve it?

thx

>Solution :

Add this to your code:

df4['day'] = df4['timestamp'].dt.strftime('%Y-%m-%d')
grouped = df4.groupby('day')[['A', 'B']].agg(list)
result = grouped.melt(ignore_index=False)
print(result)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading