I have a pandas DataFrame with a few columns. I want to convert one of the string columns into an array of fixed-length strings.
Here is what the current table looks like:
+-----+--------------------+--------------------+
|col1 | col2 | col3 |
+-----+--------------------+--------------------+
| 1 |Marco | LITMATPHY |
| 2 |Lucy | NaN |
| 3 |Andy | CHMHISENGSTA |
| 4 |Nancy | COMFRNPSYGEO |
| 5 |Fred | BIOLIT |
+-----+--------------------+--------------------+
How can I split each string in col3 into an array of 3-character strings, as shown below?
PS: col3 can contain blanks or NaN, and those should be replaced with an empty array.
+-----+--------------------+----------------------------+
|col1 | col2 | col3 |
+-----+--------------------+----------------------------+
| 1 |Marco | ['LIT','MAT','PHY'] |
| 2 |Lucy | [] |
| 3 |Andy | ['CHM','HIS','ENG','STA'] |
| 4 |Nancy | ['COM','FRN','PSY','GEO'] |
| 5 |Fred | ['BIO','LIT'] |
+-----+--------------------+----------------------------+
>Solution :
Use textwrap.wrap:
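import textwrap
import pandas as pd
# Split each string into chunks of up to 3 characters; NaN becomes an empty list
df['col3'].apply(lambda x: textwrap.wrap(x, 3) if pd.notna(x) else [])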
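To check how textwrap.wrap treats the edge cases from the question, here is a minimal sketch on plain Python values. The helper name wrap_codes and the sample list are made up for illustration. It assumes the codes contain no spaces, because textwrap.wrap treats whitespace as a break point.

```python
import textwrap

def wrap_codes(value, width=3):
    """Hypothetical helper: split a string into chunks of at most `width` characters."""
    # NaN is a float and None is not a str, so both become an empty list.
    if not isinstance(value, str):
        return []
    # textwrap.wrap breaks words longer than `width`; an empty string yields [].
    return textwrap.wrap(value, width)

# Sample values modelled on col3, plus a blank and a string whose length is not a multiple of 3.
samples = ["LITMATPHY", float("nan"), "BIOLIT", "", "LITMA"]
for value in samples:
    print(repr(value), "->", wrap_codes(value))
```

A named function like this can be passed straight to df['col3'].apply in place of the lambda.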
If some strings have lengths that aren’t a multiple of 3, the leftover letters end up in a shorter final element. If you only want strings of length 3, you can chain one more apply to drop that element:
# The `x and` guard keeps empty lists (NaN or blank rows) from raising IndexError on x[-1]
(df['col3']
 .apply(lambda x: textwrap.wrap(x, 3) if pd.notna(x) else [])
 .apply(lambda x: x[:-1] if x and len(x[-1]) != 3 else x))
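If you would rather not depend on textwrap, the same split can be sketched with plain slicing. This is an alternative technique, not the approach above. The function name split_fixed and its keep_partial flag are hypothetical, and stripping surrounding whitespace is an assumption about how blanks should be handled.

```python
def split_fixed(value, width=3, keep_partial=True):
    """Hypothetical helper: cut a string into consecutive `width`-character slices."""
    # NaN (a float) and None are not strings, so they map to an empty list.
    if not isinstance(value, str):
        return []
    # Assumption: surrounding whitespace is noise, so a blank cell becomes [].
    text = value.strip()
    chunks = [text[i:i + width] for i in range(0, len(text), width)]
    # Optionally drop a trailing chunk that is shorter than `width`.
    if not keep_partial and chunks and len(chunks[-1]) < width:
        chunks = chunks[:-1]
    return chunks

print(split_fixed("CHMHISENGSTA"))
print(split_fixed("LITMA"))
print(split_fixed("LITMA", keep_partial=False))
print(split_fixed(float("nan")))
```

With the frame from the question, this could be applied as df['col3'].apply(split_fixed), or as df['col3'].apply(split_fixed, keep_partial=False) to discard incomplete chunks.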