Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Having the same index values when pivoting a dataframe from long to wide format gives an average value

Context: I’m trying to pivot a long format dataframe to a wide format dataframe, however, I’m noticing a weird pattern on the wide format dataframe. It seems that if we have repeated values for the index (in my case, a date), it’s almost like it’s giving me an average instead of repeating each index value and keeping the original values?

Here’s a minimal reproducible example:

    import datetime
    import pandas as pd
long_dataframe = pd.DataFrame({"Date": [
    datetime.datetime.strptime("01-01-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-01-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-02-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-02-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-03-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-04-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-04-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-01-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-01-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-02-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-02-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-03-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-04-2020", '%m-%d-%Y').date(),
    datetime.datetime.strptime("01-04-2020", '%m-%d-%Y').date()
], "A": [
    "category_X", "category_X", "category_X", "category_X", "category_X", "category_X", "category_X",
    "category_Y", "category_Y", "category_Y", "category_Y", "category_Y", "category_Y", "category_Y"], "Values": [30, 40, 20, 30, 40, 50, 60,25,30,42,54,21,23,30]})

wide_dataframe = long_dataframe.reset_index().pivot_table(
    index="Date", columns="A", values="Values")

wide_dataframe

Which gives me this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

A           category_X  category_Y
Date
2020-01-01        35.0        27.5
2020-01-02        25.0        48.0
2020-01-03        40.0        21.0
2020-01-04        55.0        26.5

How can I make it so that I see the repeated dates with their original values? Why is it that for 2020-01-01 its giving the value in between this date (30 and 40)?

Desired output would look something like this:

A           category_X    category_Y
Date
2020-01-01          30       ...
2020-01-01          40
2020-01-02          20
2020-01-02          30
2020-01-03          40
2020-01-04          50
2020-01-04          60

How can I do this while keeping duplicated indices?

I was thinking of giving each row a unique ID, but I’d really like to do this directly using the dates if possible (without creting any additional IDs)

Thank you!

>Solution :

pivot_table automatically aggregates. Since you cannot have duplicate values of the index after pivoting, you need to create a unique index. You can do this with groupby() and cumcount().

long_dataframe['count'] = long_dataframe.groupby('A').cumcount()

wide_dataframe = long_dataframe.pivot(index=['Date', 'count'], columns='A', values='Values') \
                               .reset_index() \
                               .drop('count', axis=1)

Output:

A        Date  category_X  category_Y
0  2020-01-01          30          25
1  2020-01-01          40          30
2  2020-01-02          20          42
3  2020-01-02          30          54
4  2020-01-03          40          21
5  2020-01-04          50          23
6  2020-01-04          60          30
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading