Here is my DataFrame:
+--------+-------------+
| subs_no|airport_score|
+--------+-------------+
|10385193|         1.85|
|10003076|       138.75|
|10100559|       382.95|
|10867116|         37.0|
|10164103|        12.95|
|10090458|         25.9|
|11049702|        12.95|
|10128459|          7.4|
|10064536|         5.55|
|10153463|         51.8|
|10040542|          3.7|
|10108980|         51.8|
|10003439|         14.8|
|10003375|          7.4|
|10012363|         29.6|
|10009808|         11.1|
|10001949|         1.85|
|10031025|        49.95|
|11020659|          3.7|
|10050972|         44.4|
+--------+-------------+
Here's what I want: every score greater than 100 should become 100.
+--------+-------------+
| subs_no|airport_score|
+--------+-------------+
|10385193|         1.85|
|10003076|          100|
|10100559|          100|
|10867116|         37.0|
|10164103|        12.95|
|10090458|         25.9|
|11049702|        12.95|
|10128459|          7.4|
|10064536|         5.55|
|10153463|         51.8|
|10040542|          3.7|
|10108980|         51.8|
|10003439|         14.8|
|10003375|          7.4|
|10012363|         29.6|
|10009808|         11.1|
|10001949|         1.85|
|10031025|        49.95|
|11020659|          3.7|
|10050972|         44.4|
+--------+-------------+
>Solution :
You can easily do this with a when-otherwise expression.
Data Preparation
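import pandas as pd
from pyspark.sql import SparkSession, functions as F

# `sql` is the Spark entry point used below; a SparkSession is assumed here.
sql = SparkSession.builder.getOrCreate()

# Sample scores from 0 to 190 in steps of 10.
df = pd.DataFrame({'airport_score': list(range(0, 200, 10))})
sparkDF = sql.createDataFrame(df)
sparkDF.show()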
+-------------+
|airport_score|
+-------------+
|            0|
|           10|
|           20|
|           30|
|           40|
|           50|
|           60|
|           70|
|           80|
|           90|
|          100|
|          110|
|          120|
|          130|
|          140|
|          150|
|          160|
|          170|
|          180|
|          190|
+-------------+
Case When
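# Replace scores of 100 or more with 100 and keep all other scores unchanged.
# (A score of exactly 100 stays 100 either way, so >= and > give the same result.)
sparkDF = sparkDF.withColumn(
    'airport_score',
    F.when(F.col('airport_score') >= 100, 100).otherwise(F.col('airport_score'))
)
sparkDF.show()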
+-------------+
|airport_score|
+-------------+
|            0|
|           10|
|           20|
|           30|
|           40|
|           50|
|           60|
|           70|
|           80|
|           90|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
|          100|
+-------------+