Generating Quarters by Unit in Dataframe

Advertisements

Consider the following real dataset

import requests
import zipfile
import io
import pandas as pd
import numpy as np

from datetime import datetime
from dateutil import relativedelta

url = 'http://qed.econ.queensu.ca/jae/datasets/hsiao003/hcw-data.zip'
filename = 'hcw-data.txt'

r = requests.get(url)
z = zipfile.ZipFile(io.BytesIO(r.content))
z.extractall()


df = (pd.read_csv(url, sep='\t+', header=None, engine='python')
        .stack().rename_axis(['time', 'id']).rename('gdp').reset_index()
        .assign(time=lambda x: x['time'] + 1))
df['id'] = df['id']+1

df = df[['id', 'time', 'gdp']]

df = df.sort_values(by=['id', 'time'])

df = df.reset_index()
start_date = datetime.strptime("1993-01-01", "%Y-%m-%d")

# periods means how many dates you want
date_list = pd.date_range(start_date, periods=61, freq='Q')

df['dates'] = pd.DataFrame({'dates': date_list})

Here we have a dataset of GDP per capita across 25 nations. I want to generate a column of the dataframe, for each unit (id) that reflects the quarter-year of the time series. So, each id from 1…25 would have 1993q1…2008q1 indexed to time 1…61. My current code returns the dataframe

      index  id  time     gdp      dates
0         0   1     1  0.0620 1993-03-31
1        25   1     2  0.0590 1993-06-30
2        50   1     3  0.0580 1993-09-30
3        75   1     4  0.0620 1993-12-31
4       100   1     5  0.0790 1994-03-31
    ...  ..   ...     ...        ...
1520   1424  25    57  0.1110        NaT
1521   1449  25    58  0.1167        NaT
1522   1474  25    59  0.1002        NaT
1523   1499  25    60  0.1017        NaT
1524   1524  25    61  0.1238        NaT

[1525 rows x 5 columns]

This I guess is part of the way there, but how would I do this for each unit in question such that there are no missing values for the dates column?

>Solution :

Try:

df['dates'] = (pd.to_datetime("1993-01-01") + pd.tseries.offsets.QuarterEnd() * df['time']).dt.to_period('Q').dt.strftime('%Yq%q')
print(df)

Prints:

...

1274  25    51  0.104000  2005q3
1299  25    52  0.104000  2005q4
1324  25    53  0.104000  2006q1
1349  25    54  0.110000  2006q2
1374  25    55  0.108000  2006q3
1399  25    56  0.111000  2006q4
1424  25    57  0.111000  2007q1
1449  25    58  0.116700  2007q2
1474  25    59  0.100200  2007q3
1499  25    60  0.101700  2007q4
1524  25    61  0.123800  2008q1

Leave a ReplyCancel reply