Advertisements
Consider the following real dataset
import requests
import zipfile
import io
import pandas as pd
import numpy as np
from datetime import datetime
from dateutil import relativedelta
url = 'http://qed.econ.queensu.ca/jae/datasets/hsiao003/hcw-data.zip'
filename = 'hcw-data.txt'
r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()
df = (pd.read_csv(url, sep='\t+', header=None, engine='python')
.stack().rename_axis(['time', 'id']).rename('gdp').reset_index()
.assign(time=lambda x: x['time'] + 1))
df['id'] = df['id']+1
df = df[['id', 'time', 'gdp']]
df = df.sort_values(by=['id', 'time'])
df = df.reset_index()
start_date = datetime.strptime("1993-01-01", "%Y-%m-%d")
# periods means how many dates you want
date_list = pd.date_range(start_date, periods=61, freq='Q')
df['dates'] = pd.DataFrame({'dates': date_list})
Here we have a dataset of GDP per capita across 25 nations. I want to generate a column of the dataframe, for each unit (id
) that reflects the quarter-year of the time series. So, each id
from 1…25 would have 1993q1…2008q1 indexed to time
1…61. My current code returns the dataframe
index id time gdp dates
0 0 1 1 0.0620 1993-03-31
1 25 1 2 0.0590 1993-06-30
2 50 1 3 0.0580 1993-09-30
3 75 1 4 0.0620 1993-12-31
4 100 1 5 0.0790 1994-03-31
... .. ... ... ...
1520 1424 25 57 0.1110 NaT
1521 1449 25 58 0.1167 NaT
1522 1474 25 59 0.1002 NaT
1523 1499 25 60 0.1017 NaT
1524 1524 25 61 0.1238 NaT
[1525 rows x 5 columns]
This I guess is part of the way there, but how would I do this for each unit in question such that there are no missing values for the dates
column?
>Solution :
Try:
df['dates'] = (pd.to_datetime("1993-01-01") + pd.tseries.offsets.QuarterEnd() * df['time']).dt.to_period('Q').dt.strftime('%Yq%q')
print(df)
Prints:
...
1274 25 51 0.104000 2005q3
1299 25 52 0.104000 2005q4
1324 25 53 0.104000 2006q1
1349 25 54 0.110000 2006q2
1374 25 55 0.108000 2006q3
1399 25 56 0.111000 2006q4
1424 25 57 0.111000 2007q1
1449 25 58 0.116700 2007q2
1474 25 59 0.100200 2007q3
1499 25 60 0.101700 2007q4
1524 25 61 0.123800 2008q1