I have a PySpark DataFrame df:
| status | Flag |
|---|---|
| present | 1 |
| present | 0 |
| na | 1 |
| Void | 0 |
| present | 1 |
| notpresent | 0 |
| present | 0 |
| present | 0 |
| ok | 1 |
I want to set Flag to 1 wherever status is present or ok.
Expected:
| status | Flag |
|---|---|
| present | 1 |
| present | 1 |
| na | 1 |
| Void | 0 |
| present | 1 |
| notpresent | 0 |
| present | 1 |
| present | 1 |
| ok | 1 |
>Solution :
You can do this with withColumn and when. Recreate the Flag column, setting it to 1 where status is ok or present and keeping the existing value otherwise. Note that isin compares values exactly, so a differently cased status such as Present or OK would not match.
from pyspark.sql import SparkSession
from pyspark.sql.functions import when, col, lit
spark = SparkSession.builder.getOrCreate()
data = [
('present', 0),
('ok', 0),
('present', 1),
('void', 0),
('na', 1),
('notpresent', 0)
]
df = spark.createDataFrame(data, ['status', 'Flag'])
df.show()
df.withColumn('Flag', when(col('status').isin(['ok', 'present']), lit(1)).otherwise(col('Flag'))).show()
Output (the original DataFrame first, then the updated one):
+----------+----+
|    status|Flag|
+----------+----+
|   present|   0|
|        ok|   0|
|   present|   1|
|      void|   0|
|        na|   1|
|notpresent|   0|
+----------+----+

+----------+----+
|    status|Flag|
+----------+----+
|   present|   1|
|        ok|   1|
|   present|   1|
|      void|   0|
|        na|   1|
|notpresent|   0|
+----------+----+
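
To check the rule against the exact rows from the question without a Spark session, the same logic can be sketched in plain Python. This is only an illustration of what when(...).otherwise(...) does for each row. The names update_flag and TARGET_STATUSES are hypothetical helpers introduced here, not part of PySpark.

```python
# Plain-Python sketch of the per-row rule; update_flag and TARGET_STATUSES
# are hypothetical names used for illustration, not PySpark APIs.
TARGET_STATUSES = {"present", "ok"}


def update_flag(status, flag):
    """Return 1 if status is one of the target statuses, else keep flag."""
    return 1 if status in TARGET_STATUSES else flag


# Rows copied from the input table in the question.
rows = [
    ("present", 1),
    ("present", 0),
    ("na", 1),
    ("Void", 0),
    ("present", 1),
    ("notpresent", 0),
    ("present", 0),
    ("present", 0),
    ("ok", 1),
]

# Apply the rule to every row, keeping the status unchanged.
updated = [(status, update_flag(status, flag)) for status, flag in rows]

for status, flag in updated:
    print(status, flag)
```

The resulting flags line up with the Expected table in the question. As with isin, the membership test is an exact match, so a status like Present would keep its existing flag.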