Pandas DataFrame: .replace() and .strip() methods returning NaN values

I read a PDF file using tabula, which returns a list of DataFrames (one per table), and used .concat() to combine them into a single DataFrame:

import pandas as pd
import tabula

# tabula.read_pdf returns a list of DataFrames, one per detected table
dfs = tabula.read_pdf('card_details.pdf', pages='all')
df = pd.concat(dfs, ignore_index=True)

I want to clean some of this data, as the column containing card numbers also has some non-numeric characters (question marks) in it. I've tried using .replace() and .strip() to remove them, and both worked on a DataFrame that I made myself:

df['card_number'] = df['card_number'].str.strip('?')

or


df['card_number'] = df['card_number'].str.replace(r'\D+', '', regex=True)

However, when I use either method on the DataFrame read from the PDF, it returns NaN for most of the data. Here are some screenshots of the DataFrame before and after.

DataFrame before cleaning

DataFrame after cleaning

Out of 15,309 rows, only about 2,400 are not NaN, yet only around 50 rows actually contain non-numeric characters. I really don't understand what's happening here, as even card numbers without any non-numeric characters are becoming null. Any ideas on what I may be doing wrong?

> Solution:

This happens when the column contains actual numeric data: tabula parses purely numeric card numbers as ints or floats, and the .str accessor returns NaN for any value that is not a string. Cast the whole column to string first:

df['card_number'] = df['card_number'].astype(str).str.replace(r'\D+', '', regex=True)
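A minimal sketch of the failure mode, using a made-up mixed-type Series (the values here are illustrative, not from the original PDF): rows parsed as numbers become NaN under `.str` methods, while `astype(str)` keeps every row.

```python
import pandas as pd

# Mixed-type column: purely numeric entries may be parsed as ints,
# while entries containing '?' stay as strings (hypothetical sample data).
s = pd.Series([4929123456789012, '4929987654321098?', 4485111122223333])

# .str methods only operate on actual str values; non-string rows become NaN.
broken = s.str.replace(r'\D+', '', regex=True)

# Casting to string first preserves every row, then the regex strips '?'.
cleaned = s.astype(str).str.replace(r'\D+', '', regex=True)
print(cleaned)
```

Note that if tabula parsed any card numbers as floats, `astype(str)` would produce strings like `'1234.0'`, and stripping non-digits would leave a trailing `0`; in that case casting via a nullable integer dtype (e.g. `astype('Int64').astype(str)`) before cleaning may be safer.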