Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

How to filter rows based on cell contents in a row-based expression

I read some data from a file. The first column is assigned ‘object’ type because of the XXX in the very first data row:

tips = pd.read_csv("tips.csv")
print(tips.head())
print(tips.info())

total_bill   tip     sex smoker  day    time  size    
0        xxx  1.01  Female     No  Sun  Dinner     2    
1      10.34  1.66    Male     No  Sun  Dinner     3    
2      21.01  3.50    Male     No  Sun  Dinner     3    
3      23.68  3.31    Male     No  Sun  Dinner     2    
4      24.59  3.61  Female     No  Sun  Dinner     4    
<class 'pandas.core.frame.DataFrame'>    
RangeIndex: 244 entries, 0 to 243    
Data columns (total 7 columns):    
 #   Column      Non-Null Count  Dtype      
---  ------      --------------  -----      
 0   total_bill  244 non-null    object     
 1   tip         244 non-null    float64    
 2   sex         244 non-null    object     
 3   smoker      244 non-null    object     
 4   day         244 non-null    object     
 5   time        244 non-null    object     
 6   size        244 non-null    int64 

So, this will fail because of that one XXX in the first row of data where a number should be:

tips['tip_pct'] = tips['tip'] / (tips['total_bill'] - tips['tip'])

How do I rewrite the above line to filter out the bad row, without actually changing the contents of the DataFrame?

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

>Solution :

You can wrap the column that has the ‘xxx’ in pd.to_numeric using errors='coerce'. This will convert string type values to NaN so your operation can happen and your dataframe will be unchanged

tips['tip_pct'] = tips['tip'] / (pd.to_numeric(tips['total_bill'],errors='coerce') - tips['tip'])

  total_bill   tip     sex smoker  day time  size  Unnamed: 4   tip_pct
0        xxx  1.01  Female     No  Sun     Dinner           2       NaN
1      10.34  1.66    Male     No  Sun     Dinner           3  0.191244
2      21.01  3.50    Male     No  Sun     Dinner           3  0.199886
3      23.68  3.31    Male     No  Sun     Dinner           2  0.162494
4      24.59  3.61  Female     No  Sun     Dinner           4  0.172069
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading