How to save into .csv files based on weeks

I have a data set from a .csv file with the header created_at, text and label, as shown below:

created_at,text,label
2021-07-24,Newzeland Wins the worldcup,Sport
2021-07-25,ABC Wins the worldcup,Sport
2021-07-26,Hello the worldcup,Sport
2021-07-27,Cricket worldcup,Sport
2021-07-28,Rugby worldcup,Sport
2021-07-29,LLL Wins,Sport
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
2021-08-04,KKK Wins the worldcup,Sport
2021-08-05,YYY Wins the worldcup,Sport
2021-08-06,GGG Wins the worldcup,Sport
2021-08-07,FFF Wins the worldcup,Sport
2021-08-08,SSS Wins the worldcup,Sport
2021-08-09,XYZ Wins the worldcup,Sport
2021-08-10,PQR Wins the worldcup,Sport

How can I save these rows into separate .csv files based on weeks?
For example, I want week1.csv to contain only the first 7 days of the data set above (2021-07-24 to 2021-07-30), week2.csv the next 7 days (2021-07-31 to 2021-08-06), and so on.

week1.csv

created_at,text,label
2021-07-24,Newzeland Wins the worldcup,Sport
2021-07-25,ABC Wins the worldcup,Sport
2021-07-26,Hello the worldcup,Sport
2021-07-27,Cricket worldcup,Sport
2021-07-28,Rugby worldcup,Sport
2021-07-29,LLL Wins,Sport
2021-07-30,MMM Wins the worldcup,Sport

>Solution :

IIUC you can compute a week period and use groupby:

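import pandas as pd

# df is the DataFrame loaded from the source .csv file
# weekly period for each row, with weeks ending on Friday
group = pd.to_datetime(df['created_at']).dt.to_period('W-FRI')

for i, (g, d) in enumerate(df.groupby(group), start=1):
    print(f'saving week {i}: {g}')
    d.to_csv(f'week{i}.csv', index=False)  # index=False: do not write the row index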

NB. This uses weeks ending on Friday as the period, since the first date (2021-07-24) is a Saturday.
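
The groupby approach above can be tried end to end with the sketch below. It is only an illustration under a few assumptions: the sample rows are inlined through io.StringIO instead of being read from the original file, the helper name split_by_week is hypothetical, and the files are written to a temporary directory rather than the working directory.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# Sample rows copied from the question, inlined so the sketch is self-contained.
SAMPLE = """created_at,text,label
2021-07-24,Newzeland Wins the worldcup,Sport
2021-07-25,ABC Wins the worldcup,Sport
2021-07-26,Hello the worldcup,Sport
2021-07-27,Cricket worldcup,Sport
2021-07-28,Rugby worldcup,Sport
2021-07-29,LLL Wins,Sport
2021-07-30,MMM Wins the worldcup,Sport
2021-07-31,RRR Wins the worldcup,Sport
2021-08-01,OOO Wins the worldcup,Sport
2021-08-02,JJJ Wins the worldcup,Sport
2021-08-03,YYY Wins the worldcup,Sport
2021-08-04,KKK Wins the worldcup,Sport
2021-08-05,YYY Wins the worldcup,Sport
2021-08-06,GGG Wins the worldcup,Sport
2021-08-07,FFF Wins the worldcup,Sport
2021-08-08,SSS Wins the worldcup,Sport
2021-08-09,XYZ Wins the worldcup,Sport
2021-08-10,PQR Wins the worldcup,Sport
"""


def split_by_week(df, out_dir, freq="W-FRI"):
    """Write one CSV per weekly period (hypothetical helper); return the paths."""
    periods = pd.to_datetime(df["created_at"]).dt.to_period(freq)
    paths = []
    for i, (_, chunk) in enumerate(df.groupby(periods), start=1):
        path = Path(out_dir) / f"week{i}.csv"
        chunk.to_csv(path, index=False)  # keep the row index out of the file
        paths.append(path)
    return paths


df = pd.read_csv(io.StringIO(SAMPLE))
out_dir = tempfile.mkdtemp()  # temporary directory, an assumption of this sketch
paths = split_by_week(df, out_dir)

# Read each file back and count its rows.
week_sizes = [len(pd.read_csv(p)) for p in paths]
print(week_sizes)  # [7, 7, 4]
```

Reading the files back is just a quick check that each one holds the rows of a single week and only the three original columns.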

To compute this programmatically from the first day, use:

s = pd.to_datetime(df['created_at'])
# day before the first date, upper-cased (e.g. 'FRI'); assumes rows are sorted by date
dow = (s.iloc[0]-pd.Timedelta('1d')).strftime("%a").upper()
group = s.dt.to_period(f'W-{dow}')

output:

saving week 1: 2021-07-24/2021-07-30
saving week 2: 2021-07-31/2021-08-06
saving week 3: 2021-08-07/2021-08-13

files (contents shown as DataFrames; the left-hand column is the row index):

week1.csv
   created_at                         text  label
0  2021-07-24  Newzeland Wins the worldcup  Sport
1  2021-07-25        ABC Wins the worldcup  Sport
2  2021-07-26           Hello the worldcup  Sport
3  2021-07-27             Cricket worldcup  Sport
4  2021-07-28               Rugby worldcup  Sport
5  2021-07-29                     LLL Wins  Sport
6  2021-07-30        MMM Wins the worldcup  Sport

week2.csv
    created_at                   text  label
7   2021-07-31  RRR Wins the worldcup  Sport
8   2021-08-01  OOO Wins the worldcup  Sport
9   2021-08-02  JJJ Wins the worldcup  Sport
10  2021-08-03  YYY Wins the worldcup  Sport
11  2021-08-04  KKK Wins the worldcup  Sport
12  2021-08-05  YYY Wins the worldcup  Sport
13  2021-08-06  GGG Wins the worldcup  Sport

week3.csv
    created_at                   text  label
14  2021-08-07  FFF Wins the worldcup  Sport
15  2021-08-08  SSS Wins the worldcup  Sport
16  2021-08-09  XYZ Wins the worldcup  Sport
17  2021-08-10  PQR Wins the worldcup  Sport
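
As an alternative to deriving the anchor day from the first date, the rows can be binned into fixed 7-day blocks counted from the earliest date. This is a different technique from the period-based grouping above, sketched here under the assumption that created_at holds one valid date per row; the frame below only recreates the date range from the question, with the other columns left out.

```python
import pandas as pd

# Recreate the date column from the question (2021-07-24 to 2021-08-10).
df = pd.DataFrame(
    {"created_at": pd.date_range("2021-07-24", "2021-08-10", freq="D").strftime("%Y-%m-%d")}
)

s = pd.to_datetime(df["created_at"])
# Whole days since the earliest date, integer-divided into 7-day blocks, 1-based.
week_no = (s - s.min()).dt.days // 7 + 1

# Number of rows in each block, in week order.
sizes = week_no.value_counts().sort_index().tolist()
print(sizes)  # [7, 7, 4]

for week, chunk in df.groupby(week_no):
    # chunk.to_csv(f"week{week}.csv", index=False) would write the file here
    print(week, chunk["created_at"].iloc[0], chunk["created_at"].iloc[-1])
```

Because the blocks are counted from the minimum date, this version does not depend on the row order or on weekday abbreviations.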