Create a dataframe in Pyspark using random values from a list

I need to convert this code into its PySpark equivalent. I cannot use pandas to create the dataframe.

This is how I create the dataframe using pandas:

df['Name'] = np.random.choice(["Alex","James","Michael","Peter","Harry"], size=3)
df['ID'] = np.random.randint(1, 10, 3)
df['Fruit'] = np.random.choice(["Apple","Grapes","Orange","Pear","Kiwi"], size=3)

The dataframe should look like this in PySpark:

df

Name   ID  Fruit
Alex   3   Apple
James  6   Grapes
Harry  5   Pear

I have tried the following for one column:

sdf1 = spark.createDataFrame([(k,) for k in ['Alex','James', 'Harry']]).orderBy(rand()).limit(6).show()

Solution:

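You can first create a pandas dataframe and then convert it into a PySpark dataframe. Alternatively, if pandas is not an option, you can draw the three random NumPy arrays, convert their values to native Python types, zip them into rows, and create the Spark dataframe like this: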

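import numpy as np
from pyspark.sql import SparkSession

# Reuse the active session if one exists (e.g. in a notebook), otherwise start one
spark = SparkSession.builder.getOrCreate()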

names = [str(x) for x in np.random.choice(["Alex", "James", "Michael", "Peter", "Harry"], size=3)]
ids = [int(x) for x in np.random.randint(1, 10, 3)]
fruits = [str(x) for x in np.random.choice(["Apple", "Grapes", "Orange", "Pear", "Kiwi"], size=3)]

df = spark.createDataFrame(list(zip(names, ids, fruits)), ["Name", "ID", "Fruit"])

df.show()

# Example output (values are random and will differ between runs):
#+-------+---+------+
#|   Name| ID| Fruit|
#+-------+---+------+
#|  Peter|  8|  Pear|
#|Michael|  7|  Kiwi|
#|  Harry|  4|Orange|
#+-------+---+------+
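
If NumPy is not available either, the same rows can be built with Python's standard `random` module. It returns native `str` and `int` values, so no type conversion is needed before passing them to Spark. The following is a minimal sketch and not part of the original answer: `random_rows` and `to_spark_df` are hypothetical helper names, and `spark` is assumed to be an existing SparkSession.

```python
import random

NAMES = ["Alex", "James", "Michael", "Peter", "Harry"]
FRUITS = ["Apple", "Grapes", "Orange", "Pear", "Kiwi"]


def random_rows(n, seed=None):
    """Build n (Name, ID, Fruit) tuples from native Python values.

    Hypothetical helper. Names and fruits are drawn with replacement,
    like np.random.choice. random.randint(1, 9) includes both ends,
    which matches np.random.randint(1, 10) and its exclusive upper bound.
    """
    rng = random.Random(seed)
    return [
        (rng.choice(NAMES), rng.randint(1, 9), rng.choice(FRUITS))
        for _ in range(n)
    ]


def to_spark_df(spark, rows):
    """Create a Spark dataframe from the rows (assumes an active SparkSession)."""
    return spark.createDataFrame(rows, ["Name", "ID", "Fruit"])


rows = random_rows(3)

# With an active SparkSession named `spark`:
# df = to_spark_df(spark, rows)
# df.show()
```

Passing a `seed` makes the rows reproducible, which can help when the dataframe is used in tests.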