I am attempting to read the first X rows of a delta table into a dataframe, and then write (overwrite) that back to the delta table. Here is the code:
# read from entire delta table into dataframe
revEnrichRef = spark.read.format("delta").load("/mnt/tables/myTable")
# retrieve first 5 rows
dfSubset = revEnrichRef.head(5)
dfSubset.write.format("delta").mode("overwrite").save("/mnt/tables/myTable")
At this point I get the error: 'list' object has no attribute 'write'
I guess that means head returns a list rather than a new dataframe. What I really want is a solution that returns the first X rows as a dataframe. Alternatively, a way to do this without an intermediary dataframe would be just as good. Any help is appreciated. Thanks
You can do so with the limit method. This returns a dataframe limited to the number of rows passed as the argument.
dfSubset = revEnrichRef.limit(5)
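The head method is an action: head(5) collects 5 rows from your dataframe to the driver and returns them as a list of Row objects (or a single Row object if you call head() with no argument). A list has no write attribute, which is why you get the error.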
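Putting the pieces together, here is a minimal sketch of the full read, limit, overwrite round trip. The helper name overwrite_with_first_rows and its parameters are hypothetical, introduced only for illustration; it assumes an active SparkSession with Delta Lake available. Note that without an orderBy, which rows limit keeps is not guaranteed.

```python
def overwrite_with_first_rows(spark, path, n):
    """Hypothetical helper: keep only n rows of the Delta table at `path`."""
    # limit() is a transformation, so it returns a new DataFrame (not a list)
    subset = spark.read.format("delta").load(path).limit(n)
    # A DataFrame has a .write attribute, so the overwrite now works
    subset.write.format("delta").mode("overwrite").save(path)
    return subset

# Usage, assuming an active SparkSession named `spark`:
# overwrite_with_first_rows(spark, "/mnt/tables/myTable", 5)
```

This also covers the "no intermediary dataframe" variant from the question, since the read, limit and write can be chained in a single expression if you prefer.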