Checking whether strings with special characters are in a DataFrame column in Python

I have a dataframe such as

COL1 
jcf7180001991334_2-HSPs_+__SP_1
jcf7180001991334:23992-26263(+):SP_2
jcf7180001988059:2889-4542(-):SP_3

and a list:

the_list = ['jcf7180001991334_2-HSPs_+__SP_1', 'not_in_tab1', 'jcf7180001991334:23992-26263(+):SP_2', 'not_intab2', 'not_intab3', 'jcf7180001988059:2889-4542(-):SP_3']

When I iterate over that list like this:

for element in the_list:
    if element in df['COL1']:
        print(element, " in df")
    else:
        print(element, " not in df")

I expect to get the following output:

jcf7180001991334_2-HSPs_+__SP_1 in df 
not_in_tab1 not in df
jcf7180001991334:23992-26263(+):SP_2 in df
not_intab2 not in df
not_intab3 not in df
jcf7180001988059:2889-4542(-):SP_3 in df

But instead none of them are found in the df, and I get:

jcf7180001991334_2-HSPs_+__SP_1 not in df 
not_in_tab1 not in df
jcf7180001991334:23992-26263(+):SP_2 not in df
not_intab2 not in df
not_intab3 not in df
jcf7180001988059:2889-4542(-):SP_3 not in df

I guess it is because of the special characters in the elements, such as parentheses and + or -? Does anyone know how to deal with that?

Solution:

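The special characters are not the problem. When used on a pandas Series such as df['COL1'], the in operator checks whether the value is in the Series index (here the row labels 0, 1, 2), not in its values.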
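A quick way to see this, using a hypothetical Series holding the three values from the question with the default index 0, 1, 2: an integer label is found by in, while an actual stored string is only found once you test against .values.

```python
import pandas as pd

# Hypothetical three-row column with the default RangeIndex 0, 1, 2
s = pd.Series(['jcf7180001991334_2-HSPs_+__SP_1',
               'jcf7180001991334:23992-26263(+):SP_2',
               'jcf7180001988059:2889-4542(-):SP_3'])

print(0 in s)                                               # True: 0 is an index label
print('jcf7180001991334:23992-26263(+):SP_2' in s)          # False: values are not checked
print('jcf7180001991334:23992-26263(+):SP_2' in s.values)   # True: searches the values
```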

To search the column's values instead, test membership against df['COL1'].values:

import pandas as pd

data = {
    "COL1": ['jcf7180001991334_2-HSPs_+__SP_1',
             'jcf7180001991334:23992-26263(+):SP_2',
             'jcf7180001988059:2889-4542(-):SP_3']
}

df = pd.DataFrame(data)

the_list = ['jcf7180001991334_2-HSPs_+__SP_1', 'not_in_tab1',
            'jcf7180001991334:23992-26263(+):SP_2', 'not_intab2',
            'not_intab3', 'jcf7180001988059:2889-4542(-):SP_3']

for element in the_list:
    # Check the column's values (a NumPy array), not its index
    if element in df['COL1'].values:
        print(element, " in df")
    else:
        print(element, " not in df")

[Output]

jcf7180001991334_2-HSPs_+__SP_1  in df
not_in_tab1  not in df
jcf7180001991334:23992-26263(+):SP_2  in df
not_intab2  not in df
not_intab3  not in df
jcf7180001988059:2889-4542(-):SP_3  in df
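As a vectorized alternative (not part of the original answer), Series.isin checks every element of the list against the column's values in one call. It is an exact-match test, not a regex, so +, ( and ) need no escaping. A minimal sketch reusing the question's data:

```python
import pandas as pd

df = pd.DataFrame({"COL1": [
    'jcf7180001991334_2-HSPs_+__SP_1',
    'jcf7180001991334:23992-26263(+):SP_2',
    'jcf7180001988059:2889-4542(-):SP_3',
]})

the_list = ['jcf7180001991334_2-HSPs_+__SP_1', 'not_in_tab1',
            'jcf7180001991334:23992-26263(+):SP_2', 'not_intab2',
            'not_intab3', 'jcf7180001988059:2889-4542(-):SP_3']

# Boolean Series: True where the list element equals some value in COL1
found = pd.Series(the_list).isin(df['COL1'])

for element, present in zip(the_list, found):
    print(element, "in df" if present else "not in df")
```

If you prefer to keep the plain loop, building set(df['COL1']) once and testing element in that set gives the same result with constant-time lookups.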