Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

Transformation with spark

I have such a data table, the number of records exceeds a million rows, the unique fields attr_id, attr_name are not more than 20

object_id attr_id attr_name value
7315dcbf-6178-4aa7-97ba-8875e2542471 f5af0127-fd4c-409b-afda-b896a80f7fe1 name Oleg
7315dcbf-6178-4aa7-97ba-8875e2542471 ebdd03e5-a380-46bd-b505-f1f166be6409 age 22
7315dcbf-6178-4aa7-97ba-8875e2542471 735fab29-cde8-4b34-979f-864b375f820f number 22222
50b9d918-b0ab-41d4-957c-24f506ceb737 f5af0127-fd4c-409b-afda-b896a80f7fe1 name Max
50b9d918-b0ab-41d4-957c-24f506ceb737 ebdd03e5-a380-46bd-b505-f1f166be6409 age
50b9d918-b0ab-41d4-957c-24f506ceb737 735fab29-cde8-4b34-979f-864b375f820f number 32222

I want to transform it to this format

object_id name age number
7315dcbf-6178-4aa7-97ba-8875e2542471 Oleg 22 22222
50b9d918-b0ab-41d4-957c-24f506ceb737 Max 32222

Is it possible to do this using spark, if so, how, or what material to read?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can simply pivot the dataframe as follows:

from pyspark.sql import functions as F
dataframe.groupBy("object_id")\
         .pivot("attr_name")\
         .agg(F.first(F.col("value")))\
         .show(truncate=False)
+------------------------------------+----+----+------+
|object_id                           |age |name|number|
+------------------------------------+----+----+------+
|50b9d918-b0ab-41d4-957c-24f506ceb737|null|Max |32222 |
|7315dcbf-6178-4aa7-97ba-8875e2542471|22  |Oleg|22222 |
+------------------------------------+----+----+------+
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading