Home How to check valid number in pandas dataframe column?

Questions

How to check valid number in pandas dataframe column?

February 10, 2022

I am working on a project for university in which I received a dataframe:

      import pandas as pd

      df = pd.DataFrame({'id': [0,1,2,3,4,5,6,7,8],
                         'number': ['00005485', '9999', '11111', '458426', '9999999999',
                                    '11111111', '800000', '99999', '1111']})

      print(df)

      id    number
      0    00005485
      1    9999
      2    11111
      3    458426
      4    9999999999
      5    11111111
      6    800000
      7    99999
      8    1111

I would like to generate a dataframe with only the lines that have a valid ‘number’. For a ‘number’ to be considered valid, it cannot have more than four repeating digits that are the same.

I made a code using filter by ‘number’ column as follows:

      df_result = df[(df['number'] != '9999') & (df['number'] != '99999') &
                     (df['number'] != '999999') & (df['number'] != '9999999') &
                     (df['number'] != '99999999') & (df['number'] != '999999999') &
                     (df['number'] != '9999999999') & (df['number'] != '9999999999') &
                     (df['number'] != '1111') & (df['number'] != '11111') &
                     (df['number'] != '111111') & (df['number'] != '1111111') &
                     (df['number'] != '111111111') & (df['number'] != '11111111')]

       print(df_result)

       id   number
       0    00005485
       3    458426
       6    800000

The output is correct. However, I would like to discover another way to do this operation on the dataframe.

>Solution :

You can use regular expression for this, in particular, \1, which matches the first group:

valid = df[~df['number'].astype(str).str.match(r'^(\d)\1{3,}$')]

So that basically says to select all rows that have the start of the string (^), a digit (\d), and then 3+ more ({3,}) of that same digit (\1), and then the end of the string ($).

Output:

>>> valid
     number
0  00005485
3    458426
6    800000

If you want to select invalid numbers instead of valid ones, remove the tilde ~ in the condition:

invalid = df[df['number'].astype(str).str.match(r'^(\d)\1{3,}$')]

Output:

>>> invalid
       number
1        9999
2       11111
4  9999999999
5    11111111
7       99999
8        1111

dataframe

byMR

Published February 10, 2022

Add a comment

How to use forEach() method in Java 8 loop n+1 time but my list size is random like n, n-1, n-2

byMR

February 10, 2022

Questions

Show Only NaT rows in python

byMR

February 10, 2022

Questions

What is this -Command error and how do i get rid of it?

byMR

February 10, 2022

Questions

I am making a php login system, and I don't know what is going wrong

byMR

February 10, 2022

Questions

evaluate `resin(x)` function in string with regular expression

byMR

February 10, 2022

Questions

How to check inner part of 3D array numpy

byMR

February 10, 2022

How to check valid number in pandas dataframe column?

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

How to use forEach() method in Java 8 loop n+1 time but my list size is random like n, n-1, n-2

Show Only NaT rows in python

What is this -Command error and how do i get rid of it?

I am making a php login system, and I don't know what is going wrong

evaluate `resin(x)` function in string with regular expression

How to check inner part of 3D array numpy

Keep Up to Date with the Most Important News

How to check valid number in pandas dataframe column?

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

How to use forEach() method in Java 8 loop n+1 time but my list size is random like n, n-1, n-2

Show Only NaT rows in python

What is this -Command error and how do i get rid of it?

I am making a php login system, and I don't know what is going wrong

evaluate `resin(x)` function in string with regular expression

How to check inner part of 3D array numpy

Discover more from Dev solutions