Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Extract weekly data from daily and reshape it from long to wide format using Pandas

Given a sample data as follows, I hope to extract one data entry for each week, if for the week having multiple entries, then I will use the largest weekday’s data as for that week:

          date  variable      value
0    2020-11-4  quantity   564.0000
1   2020-11-11  quantity   565.0000
2   2020-11-18  quantity   566.0000
3   2020-11-25  quantity   566.0000
4    2020-11-2     price  1829.1039
5    2020-11-3     price  1789.5883
6    2020-11-4     price  1755.4307
7    2020-11-5     price  1750.0727
8    2020-11-6     price  1746.7239
9    2020-11-9     price  1756.1005
10  2020-11-10     price  1752.0820
11  2020-11-11     price  1814.3693
12  2020-11-12     price  1833.7922
13  2020-11-13     price  1833.7922
14  2020-11-16     price  1784.2302
15  2020-11-17     price  1764.1376
16  2020-11-18     price  1770.1654
17  2020-11-19     price  1757.4400
18  2020-11-20     price  1770.1654

To get week number of each date, I use df['week_number'] = pd.to_datetime(df['date']).dt.week.

          date  variable      value  week_number
0    2020-11-4  quantity   564.0000           45  --> to keep
1   2020-11-11  quantity   565.0000           46  --> to keep
2   2020-11-18  quantity   566.0000           47  --> to keep
3   2020-11-25  quantity   566.0000           48  --> to keep
4    2020-11-2     price  1829.1039           45
5    2020-11-3     price  1789.5883           45
6    2020-11-4     price  1755.4307           45
7    2020-11-5     price  1750.0727           45
8    2020-11-6     price  1746.7239           45  --> to keep, since it's the largest weekday for this week
9    2020-11-9     price  1756.1005           46
10  2020-11-10     price  1752.0820           46
11  2020-11-11     price  1814.3693           46
12  2020-11-12     price  1833.7922           46
13  2020-11-13     price  1833.7922           46  --> to keep, since it's the largest weekday for this week
14  2020-11-16     price  1784.2302           47
15  2020-11-17     price  1764.1376           47
16  2020-11-18     price  1770.1654           47
17  2020-11-19     price  1757.4400           47
18  2020-11-20     price  1770.1654           47  --> to keep, since it's the largest weekday for this week

Finally, I will reshape rows indicating to_keep to the expected result as follow:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

   variable  the_45th_week  the_46th_week  the_47th_week  the_48th_week
0  quantity       564.0000       565.0000       566.0000          566.0
1     price      1756.1005      1833.7922      1770.1654            NaN

How could I manipulate data to get the expected result? Sincere thanks.

EDIT:

df = df.sort_values(by=['variable','date'], ascending=False)
df.drop_duplicates(['variable', 'week_number'], keep='last')

Out:

          date  variable      value  week_number
0    2020-11-4  quantity   564.0000           45
3   2020-11-25  quantity   566.0000           48
2   2020-11-18  quantity   566.0000           47
1   2020-11-11  quantity   565.0000           46
4    2020-11-2     price  1829.1039           45
14  2020-11-16     price  1784.2302           47
10  2020-11-10     price  1752.0820           46

>Solution :

In your solution is possible add pivot with rename:

df['week_number'] = pd.to_datetime(df['date']).dt.week
df = df.sort_values(by=['variable','date'], ascending=False)
df = df.drop_duplicates(['variable', 'week_number'], keep='last')

f = lambda x: f'the_{x}th_week'
df = df.pivot('variable','week_number','value').rename(columns=f)
print (df)
week_number  the_45th_week  the_46th_week  the_47th_week  the_48th_week
variable                                                               
price            1829.1039       1752.082      1784.2302            NaN
quantity          564.0000        565.000       566.0000          566.0

Or remove DataFrame.drop_duplicates, so is possible use DataFrame.pivot_table with aggregate function last:

df['week_number'] = pd.to_datetime(df['date']).dt.week
df = df.sort_values(by=['variable','date'], ascending=False)

f = lambda x: f'the_{x}th_week'
df = df.pivot_table(index='variable',columns='week_number',values='value', aggfunc='last').rename(columns=f)
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading