Replacing numerical values above a threshold with a fixed number in PySpark

Here is my DataFrame:

+--------+-------------+
| subs_no|airport_score|
+--------+-------------+
|10385193|         1.85|
|10003076|       138.75|
|10100559|       382.95|
|10867116|         37.0|
|10164103|        12.95|
|10090458|         25.9|
|11049702|        12.95|
|10128459|          7.4|
|10064536|         5.55|
|10153463|         51.8|
|10040542|          3.7|
|10108980|         51.8|
|10003439|         14.8|
|10003375|          7.4|
|10012363|         29.6|
|10009808|         11.1|
|10001949|         1.85|
|10031025|        49.95|
|11020659|          3.7|
|10050972|         44.4|
+--------+-------------+

Here’s what I want: every score above 100 should become 100.

+--------+-------------+
| subs_no|airport_score|
+--------+-------------+
|10385193|         1.85|
|10003076|          100|
|10100559|          100|
|10867116|         37.0|
|10164103|        12.95|
|10090458|         25.9|
|11049702|        12.95|
|10128459|          7.4|
|10064536|         5.55|
|10153463|         51.8|
|10040542|          3.7|
|10108980|         51.8|
|10003439|         14.8|
|10003375|          7.4|
|10012363|         29.6|
|10009808|         11.1|
|10001949|         1.85|
|10031025|        49.95|
|11020659|          3.7|
|10050972|         44.4|
+--------+-------------+

Solution:

You can do this with a when-otherwise expression: when() tests a condition on each row and otherwise() supplies the value for rows that do not match.

Data Preparation

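import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Sample scores from 0 to 190 in steps of 10
df = pd.DataFrame({'airport_score': list(range(0, 200, 10))})

sparkDF = spark.createDataFrame(df)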

sparkDF.show()

+-------------+
|airport_score|
+-------------+
|            0|
|           10|
|           20|
|           30|
|           40|
|           50|
|           60|
|           70|
|           80|
|           90|
|          100|
|          110|
|          120|
|          130|
|          140|
|          150|
|          160|
|          170|
|          180|
|          190|
+-------------+

Case When

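from pyspark.sql import functions as F

# Scores above 100 become 100; every other score is kept unchanged
sparkDF = sparkDF.withColumn(
    'airport_score',
    F.when(F.col('airport_score') > 100, 100).otherwise(F.col('airport_score'))
)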

sparkDF.show()

+-------------+
|airport_score|
+-------------+
|            0|
|           10|
|           20|
|           30|
|           40|
|           50|
|           60|
|           70|
|           80|
|           90|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
+-------------+
