Home PySpark: Get first character of each word in string

Questions

PySpark: Get first character of each word in string

January 25, 2022

For an assignment I have been asked to shorten the names of clients to only the first letter of each name where they are separated by a space character.

I found a lot of solutions for this in Python, but I am not able to translate this to a dataframe.

The DF looks like this:

| ID       | Name           | 
| -------- | -------------- |
| 1        | John Doe       |
| 2        | Roy Lee Winters|
| 3        | Mary-Kate Baron|

My desired output would be:

| ID | Name | Shortened_name| 
| -------- | -------- | -------------- |
| 1 | John Doe | JD |
| 2 | Roy Lee Winters | RLW |
| 3 | Mary-Kate Baron | MB |

I’ve had some result with the code below but this is not working when there are more then 2 names. I would also like to have some more ‘flexible’ code as some people have 4 or 5 names where others only have 1.

df.withColumn("col1", F.substring(F.split(F.col("Name"), " ").getItem(0), 1, 1))\
  .withColumn("col2", F.substring(F.split(F.col("Name"), " ").getItem(1), 1, 1))\
  .withColumn('Shortened_name', F.concat('col1', 'col2'))

>Solution :

You can split the Name column then use transform function on the resulting array to get first letter of each element:

from pyspark.sql import functions as F

df = spark.createDataFrame([(1, "John Doe"), (2, "Roy Lee Winters"), (3, "Mary-Kate Baron")], ["ID", "Name"])

df1 = df.withColumn(
    "Shortened_name",
    F.array_join(F.expr("transform(split(Name, ' '), x -> left(x, 1))"), "")
)

df1.show()
# +---+---------------+--------------+
# | ID|           Name|Shortened_name|
# +---+---------------+--------------+
# |  1|       John Doe|            JD|
# |  2|Roy Lee Winters|           RLW|
# |  3|Mary-Kate Baron|            MB|
# +---+---------------+--------------+

Or by using aggregate function:

df1 = df.withColumn(
    "Shortened_name",
    F.expr("aggregate(split(Name, ' '), '', (acc, x) -> acc || left(x, 1))")
)

apache-spark-sql

byMR

Published January 25, 2022

Add a comment

Two elements aren't centered in mobile view

byMR

January 25, 2022

Questions

Can't fetch data from SQLite database in AS

byMR

January 25, 2022

Questions

How to Xpath from Form

byMR

January 25, 2022

Questions

How do I delete a row when two dataframes in pandas do not have the same length? – python

byMR

January 25, 2022

Questions

Every distinct Date between DateA and Date B -TSQL

byMR

January 25, 2022

Questions

Filter function with 2 array sets

byMR

January 25, 2022

PySpark: Get first character of each word in string

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

Two elements aren't centered in mobile view

Can't fetch data from SQLite database in AS

How to Xpath from Form

How do I delete a row when two dataframes in pandas do not have the same length? – python

Every distinct Date between DateA and Date B -TSQL

Filter function with 2 array sets

Keep Up to Date with the Most Important News

PySpark: Get first character of each word in string

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

Two elements aren't centered in mobile view

Can't fetch data from SQLite database in AS

How to Xpath from Form

How do I delete a row when two dataframes in pandas do not have the same length? – python

Every distinct Date between DateA and Date B -TSQL

Filter function with 2 array sets

Discover more from Dev solutions