Splitting a string into elements creates a space that I can't remove

I have a DataFrame with a ‘genre’ column containing strings like ‘drama, comedy, action’.

I want to split each string into separate elements, like ‘drama’, ‘comedy’, ‘action’, so I used:

Genre=[]

for genre_type in books['genre'].astype('str'):
    Genre.append(genre_type.split(','))
    
books['genres_1']=Genre

But the result has a leading space in every genre after the first one, like ‘drama’, ‘_comedy’, ‘_action’. (I used an underscore to represent the space because otherwise it’s hard to see.)


So I tried:

Genre_clean=[]
for x in books['genres_1'].astype('str'):
    Genre_clean.append(x.strip(' '))
Genre_clean

But the space remains. What am I doing wrong?

My full code is below:

import pandas as pd

# Creating sample dataframes
books = pd.DataFrame()
books['genre']=['drama, comedy, action', 'romance, sci-fi, drama','horror']

# Splitting genre
Genre=[]
for genre_type in books['genre'].astype('str'):
    Genre.append(genre_type.split(','))
    
books['genres_1']=Genre

# trying to remove the space
Genre_clean=[]
for x in books['genres_1'].astype('str'):
    Genre_clean.append(x.strip(' '))
Genre_clean

Solution:

Avoid explicit Python loops and list comprehensions over pandas columns when a built-in pandas method does the same job. Look up the equivalent pandas function for what you want to do; it is usually shorter and often faster, and otherwise you get little benefit from using pandas.

See: pandas str functions

import pandas as pd

books = pd.DataFrame()
books['genre']=['drama, comedy, action', 'romance, sci-fi, drama','horror']

# Split on comma + space so the separator's space is consumed too
books.genre = books.genre.str.split(', ')
print(books)

Output:

                      genre
0   [drama, comedy, action]
1  [romance, sci-fi, drama]
2                  [horror]
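
As for why the attempt in the question left the spaces in place, the sketch below retraces it on the same sample data. It assumes the column holds plain Python lists after the split, as in the question: `astype('str')` then turns each list into a single string (brackets and quotes included), and `strip(' ')` only trims the two ends of that string, so the spaces inside it are untouched. The variable names here are illustrative only.

```python
import pandas as pd

books = pd.DataFrame({'genre': ['drama, comedy, action', 'romance, sci-fi, drama', 'horror']})

# Splitting on ',' alone leaves a space at the start of every item after the first.
parts = books['genre'].str.split(',')
first_row = parts.iloc[0]

# astype('str') converts each list to one string: its printed representation.
as_text = parts.astype('str').iloc[0]

# strip(' ') only removes spaces at the very start and end of that one string,
# so the spaces in front of the inner items survive.
stripped = as_text.strip(' ')

print(first_row)
print(repr(as_text))
print(stripped == as_text)
```

Splitting on `', '` instead works because the space is treated as part of the separator, so it never ends up in the pieces.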

If you want this as a string, you can join the list again with:

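books.genre = books.genre.str.join(',')
# Or, starting from the original strings, all at once:
# books.genre = books.genre.str.split(', ').str.join(',')
# Or, starting from the original strings, just replace spaces with nothing
# (note: this also removes spaces inside a genre name):
# books.genre = books.genre.str.replace(' ', '')
print(books)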

Output:

                  genre
0   drama,comedy,action
1  romance,sci-fi,drama
2                horror
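
Splitting on the exact string `', '` assumes every comma is followed by exactly one space. If the real data is less regular, one possible variation is to split on a regular expression that absorbs any whitespace around the comma. This is a sketch rather than part of the answer above: the messy sample row and the `genres_clean` column name are made up for illustration, and it relies on pandas treating a multi-character pattern passed to `str.split` as a regular expression by default.

```python
import pandas as pd

# Hypothetical data with inconsistent spacing around the commas.
books = pd.DataFrame({'genre': ['drama,comedy ,  action', ' romance, sci-fi, drama ', 'horror']})

# Trim the ends of each string first, then split on a comma together with
# any whitespace on either side of it.
books['genres_clean'] = books['genre'].str.strip().str.split(r'\s*,\s*')

print(books['genres_clean'].tolist())
```

Spaces inside a genre name are kept, since only whitespace next to a comma or at the ends of the string is removed.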