Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Joining Function outputs with existing columns as new variable

I’m looking to do a conversion from datetime/pandas date’s storage system to the one SAS uses for the purpose of joining with a variable.

Basically would like to go from this:

ID  date
101 20-01-01
102 21-01-01
103 22-01-01

To this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

ID DateID  date
101 10114995 20-01-01
102 10214996 21-01-01
103 10314997 22-01-01

I’ve written a function to pass to each row:

def SASdate(datein):
        return str(math.trunc(pd.to_timedelta(pd.to_datetime(datein) -  pd.to_datetime('1960-1-1'), unit='D') / pd.Timedelta(days=1)))

But I believe that it is FAR from optimized as:

  1. It takes a long time to run on my actual dataset.
  2. The output doesn’t seem to concatenate as desired.

e.g.:

str(df['ID']) + df['date'].apply([SASdate])
Out: 
                                             SASdate
0  0    101\n1    102\n2    103\nName: ID, dtype:...
1  0    101\n1    102\n2    103\nName: ID, dtype:...
2  0    101\n1    102\n2    103\nName: ID, dtype:...

I’m sure there’s a lambda method or better way to concatenate these that I’m missing, but it’s not readily apparent to me.

Any suggestions on how to accomplish this would be appreciated.

Some code below to generate a sample dataframe:

df = pd.DataFrame({'ID':  [101,102,103],
        'date': ['20-01-01', '21-01-01', '22-01-01']
        })

>Solution :

Just use your function outside apply in a vectorized way:

def SASdate(datein):
    origin = pd.to_datetime('1960-1-1')
    return pd.to_datetime(datein, format='%d-%m-%y').sub(origin).dt.days

df['DateID'] = df['ID'].astype(str) + SASdate(df['date']).astype(str)

Output:

>>> df
    ID      date    DateID
0  101  20-01-01  10114995
1  102  21-01-01  10214996
2  103  22-01-01  10314997
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading