
PySpark: retrieve a variable number of values from a DataFrame

Using PySpark, I have reached a point where I can no longer move forward.
I receive a string containing the names of certain fields separated by a hyphen (-); the number of these fields is variable.
I need a way to read (and concatenate) the values of those fields from a given table.

Assuming the field names are in a "columnsnames" variable and the table (DataFrame) is called df, how can I solve this problem?

columnsnames = columnsnames1.split("-")
df = spark.read.parquet(path_table + table_name)

EDIT: I need to read the values of the columns in columnsnames. I tried:


for c in columnsnames:
F.col(c)

but it didn’t work.

Solution:

You can use concat after unpacking the list of columnsnames with the * operator.

import pyspark.sql.functions as F

columnsnames = ['s', 'd', 'f']  # e.g. the result of columnsnames1.split("-")
df = spark.createDataFrame([('abcd', '123', '456')], ['s', 'd', 'f'])

df.select(F.concat(*[F.col(c) for c in columnsnames])).show()

Output

+---------------+
|concat(s, d, f)|
+---------------+
|     abcd123456|
+---------------+