Pandas DataFrame: .replace() and .strip() methods returning NaN values

I read a PDF file using tabula, which returns a list of DataFrames (one per table), and used .concat() to combine them into a single DataFrame:

import pandas as pd
import tabula

# tabula.read_pdf returns a list of DataFrames, one per detected table
dfs = tabula.read_pdf('card_details.pdf', pages='all')
df = pd.concat(dfs, ignore_index=True)

I want to clean some of this data, as the column containing card numbers also has some non-numeric characters (question marks) in it. I've tried using .replace() and .strip() to remove them, and both worked on a DataFrame that I made myself:

df['card_number'] = df['card_number'].str.strip('?')

or


df['card_number'] = df['card_number'].str.replace(r'\D+', '', regex=True)

However, when I use either method on the DataFrame read from the PDF, it returns NaN for most of the data. Here are some screenshots of the DataFrame before and after.

DataFrame before cleaning

DataFrame after cleaning

Out of 15,309 rows, only about 2,400 are not NaN, yet only around 50 rows actually contain non-numeric characters. I really don't understand what's happening here, as even card numbers without any non-numeric characters are becoming null. Any ideas on what I may be doing wrong?

> Solution:

This happens when the column contains actual numeric data: tabula parses purely numeric card numbers as ints or floats, and the .str accessor returns NaN for any value that is not a string. Cast the whole column to string first:

df['card_number'] = df['card_number'].astype(str).str.replace(r'\D+', '', regex=True)
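A minimal sketch of the failure mode, using a made-up mixed-type Series (the values here are illustrative, not from the original PDF): rows parsed as numbers become NaN under `.str` methods, while `astype(str)` keeps every row.

```python
import pandas as pd

# Mixed-type column: purely numeric entries may be parsed as ints,
# while entries containing '?' stay as strings (hypothetical sample data).
s = pd.Series([4929123456789012, '4929987654321098?', 4485111122223333])

# .str methods only operate on actual str values; non-string rows become NaN.
broken = s.str.replace(r'\D+', '', regex=True)

# Casting to string first preserves every row, then the regex strips '?'.
cleaned = s.astype(str).str.replace(r'\D+', '', regex=True)
print(cleaned)
```

Note that if tabula parsed any card numbers as floats, `astype(str)` would produce strings like `'1234.0'`, and stripping non-digits would leave a trailing `0`; in that case casting via a nullable integer dtype (e.g. `astype('Int64').astype(str)`) before cleaning may be safer.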