Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Splitting a column to create new columns

I have a dataframe df and a column LongColumn that that has rows like this:

ABC.DEF.GHI.Address.Street.Phone.Email.Category1.Beta.MyResults.New
ABC.DEG.GGI.Address.Postcode.Phone.Email.Category2.Alpha.MyResults.Old
ABC.DEG.GGI.JFK.Address.Postcode.Phone.Email.Category3.Alpha.MyResults.Old
DEG.III.JFK.Address.Postcode.Phone.Email.Category2.Beta.MyResults.Old

I am only interested in the rows that contain MyResults I want to take the three parts Category1.Beta.MyResults, Category2.Alpha.MyResults etc. and make three columns out of them, but since there is a different number of "words" in every string, I want to take them bottom-up:

from pyspark.sql import functions as F
out_df = df.withColumn('class', split(df['LongColumn'], '\.').getItem(-3)).withColumn('object', split(df['LongColumn'], '\.').getItem(-2)).withColumn('title', split(df['LongColumn'], '\.').getItem(-1))

But this leads to empty new columns, so the syntax might be wrong. How do I take them bottom up?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

Use expr() to select the elements within the array dynamically – (length of array – 4), (length of array – 3), (length of array – 2).

data_sdf. \
    filter(func.upper(func.col('long_column')).like('%MYRESULT%')). \
    withColumn('long_col_arr', func.split('long_column', '\.')). \
    withColumn('categories', func.expr('long_col_arr[size(long_col_arr) - 4]')). \
    withColumn('letters', func.expr('long_col_arr[size(long_col_arr) - 3]')). \
    withColumn('results', func.expr('long_col_arr[size(long_col_arr) - 2]')). \
    show(truncate=False)

# +----------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+-------+---------+
# |long_column                                                           |long_col_arr                                                                      |categories|letters|results  |
# +----------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+-------+---------+
# |ABC.DEF.GHI.Address.Street.Phone.Email.Category1.Beta.MyResults.New   |[ABC, DEF, GHI, Address, Street, Phone, Email, Category1, Beta, MyResults, New]   |Category1 |Beta   |MyResults|
# |ABC.DEG.GGI.Address.Postcode.Phone.Email.Category2.Alpha.MyResults.Old|[ABC, DEG, GGI, Address, Postcode, Phone, Email, Category2, Alpha, MyResults, Old]|Category2 |Alpha  |MyResults|
# +----------------------------------------------------------------------+----------------------------------------------------------------------------------+----------+-------+---------+
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading