Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

PySpark: How to filter on multiple columns coming from a list?

I have a dataframe that will get filtered (not null) on multiple columns. These columns are coming from a config file.
Lets say I have a config file that has filterCols: COLUMN_1,COLUMN_2...,COLUMN_10.
In my code I can hardcode the column names like:

df = dataDF.filter(~col("COLUMN_1").isNull() & 
                   ~col("COLUMN_2").isNull() &
                   ......
                   ~col("COLUMN_10").isNull())

But these columns can be removed, as and when required, or more columns can be added as well.
If I follow the above approach, I will have to change and deploy the code everytime.

Is there a way where I can loop over these columns and then filter?
I tried something like this:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

colList = ['COLUMN_1', 'COLUMN_2', ...,'COLUMN_10']
df = dataDF
for name in colList:
    df = df.filter(~col(name).isNull())

But df is showing zero records.

>Solution :

You could generate a query string based on your columns and use SparkSQL.

Example:

spark = SparkSession.builder.getOrCreate()
data = [
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 3},
    {"a": 1, "b": 2, "c": 3, "d": 4},
]
df = spark.createDataFrame(data)
columns = ["a", "b", "c", "d"]
df.createTempView("table")
df = spark.sql(
    "SELECT * FROM table WHERE {}".format(
        " AND ".join(x + " IS NOT NULL" for x in columns)
    )
)

Result:

+---+---+---+---+                                                               
|a  |b  |c  |d  |
+---+---+---+---+
|1  |2  |3  |4  |
+---+---+---+---+
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading