How do I append new rows to a PySpark DataFrame guaranteeing a unique ID?

I have two PySpark DataFrame objects that I wish to concatenate. One of the DataFrames, df_a, has a column unique_id derived using pyspark.sql.functions.monotonically_increasing_id(). The other DataFrame, df_b, does not. I want to append the rows of df_b to df_a, but I need to generate values for the unique_id column that do not coincide with any of the values in df_a.unique_id.

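from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

df_a = spark.createDataFrame(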
    [
        (1, "a", 42949672960),
        (2, "b", 85899345920),
        (3, "c", 128849018880)
    ],
    ["number", "letter", "unique_id"]
)

df_b = spark.createDataFrame(
    [
        (3, "c"),
        (4, "c"),
        (5, "d")
    ],
    ["number", "letter"]
)
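# Attempted approach: these IDs are generated independently of df_a,
# so they may repeat values already present in df_a.unique_id.
df_b = df_b.withColumn("unique_id", F.monotonically_increasing_id())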

df = df_a.union(df_b)
df.show()

I looked to see if pyspark.sql.functions.monotonically_increasing_id() took a parameter enforcing a minimum value, but it does not.

One final thing to note: df_a is a massive DataFrame that needs to be appended to regularly. If I needed to reassign unique ids to df_a using a function other than pyspark.sql.functions.monotonically_increasing_id() to make a potential solution work long-term, I could do so once, but not every time I append new data.

Any direction would be appreciated. Thank you!

Solution:

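You can add a constant offset to monotonically_increasing_id(). The generated IDs are non-negative, so shifting them by the current maximum of df_a.unique_id plus one guarantees that none of them coincides with an existing value. Only df_b is relabelled, so df_a never needs new IDs. This assumes df_a is non-empty; on an empty DataFrame the maximum comes back as None.

# Largest ID currently in df_a.
n = df_a.select(F.max('unique_id').alias('max_n')).first().max_n
# Every ID generated for df_b is >= 0, so after the shift it is > n.
df_b = df_b.withColumn("unique_id", F.monotonically_increasing_id() + F.lit(n + 1))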
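
To see why the offset is enough, here is a minimal pure-Python sketch of the arithmetic, with no Spark involved. It assumes the bit layout that the Spark documentation describes for the current implementation of monotonically_increasing_id(): partition index in the upper 31 bits, row position within the partition in the lower 33 bits. The helper names decode and offset_ids and the raw IDs chosen for df_b are hypothetical, picked only for illustration.

```python
# Bit layout documented for Spark's monotonically_increasing_id():
# partition index in the upper 31 bits, row position in the lower 33 bits.
ROW_BITS = 33


def decode(unique_id):
    """Split an ID into (partition index, row position within the partition)."""
    return unique_id >> ROW_BITS, unique_id & ((1 << ROW_BITS) - 1)


def offset_ids(existing_ids, fresh_ids):
    """Shift freshly generated IDs past the largest existing one."""
    # With no existing IDs there is nothing to avoid, so start at 0.
    start = max(existing_ids) + 1 if existing_ids else 0
    return [start + i for i in fresh_ids]


# df_a.unique_id values from the question.
existing = [42949672960, 85899345920, 128849018880]
# Hypothetical raw IDs for df_b: two rows in partition 0, one in partition 1.
fresh = [0, 1, 1 << ROW_BITS]

shifted = offset_ids(existing, fresh)

print([decode(i) for i in existing])  # [(5, 0), (10, 0), (15, 0)]
print(shifted)                        # [128849018881, 128849018882, 137438953473]
print(set(shifted).isdisjoint(existing))  # True
```

The decoded values show that the IDs in df_a are the first rows of partitions 5, 10 and 15. Because every shifted ID is strictly greater than the old maximum, the same step can be repeated on each append without touching df_a. The maximum does grow with every append, so the sum has to stay within the 64-bit signed range of Spark's long type.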