Checking whether a value is in a column with special characters in Python


I have a dataframe such as

COL1 
jcf7180001991334_2-HSPs_+__SP_1
jcf7180001991334:23992-26263(+):SP_2
jcf7180001988059:2889-4542(-):SP_3

and a list:

the_list = ['jcf7180001991334_2-HSPs_+__SP_1','not_in_tab1','jcf7180001991334:23992-26263(+):SP_2','not_intab2','not_intab3','jcf7180001988059:2889-4542(-):SP_3']

and by iterating over that list such as :

for element in the_list:
 if element in df['COL1']:
  print(element, " in df")
 else:
  print(element, " not in df")

I should then get the following output :

jcf7180001991334_2-HSPs_+__SP_1 in df 
not_in_tab1 not in df
jcf7180001991334:23992-26263(+):SP_2 in df
not_intab2 not in df
not_intab3 not in df
jcf7180001988059:2889-4542(-):SP_3 in df

But instead I cannot find any of them in the df, and I get:

jcf7180001991334_2-HSPs_+__SP_1 not in df 
not_in_tab1 not in df
jcf7180001991334:23992-26263(+):SP_2 not in df
not_intab2 not in df
not_intab3 not in df
jcf7180001988059:2889-4542(-):SP_3 not in df

I guess it is because of the special characters within the elements, such as parentheses and + or -? Does someone know how to deal with that?

>Solution :

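By default, the in operator applied to a pandas Series checks whether the value is in the index, not in the values. The special characters are not the cause.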
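A minimal sketch of that behaviour, using a two-row Series made up purely for illustration (the values 'a(+)' and 'b(-)' are not from the question):

```python
import pandas as pd

# Hypothetical toy Series; its default index labels are 0 and 1
s = pd.Series(['a(+)', 'b(-)'])

print(0 in s)              # membership is tested against the index labels
print('a(+)' in s)         # a value is not an index label, so this is False
print('a(+)' in s.values)  # testing against the underlying array of values
```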

To test against the column's values instead, use df['COL1'].values:

import pandas as pd
data = {
  "COL1": ['jcf7180001991334_2-HSPs_+__SP_1', 'jcf7180001991334:23992-26263(+):SP_2', 'jcf7180001988059:2889-4542(-):SP_3']}

df = pd.DataFrame(data)

the_list=['jcf7180001991334_2-HSPs_+__SP_1', 'not_in_tab1', 'jcf7180001991334:23992-26263(+):SP_2', 'not_intab2', 'not_intab3','jcf7180001988059:2889-4542(-):SP_3'] 

for element in the_list:

 if element in df['COL1'].values:  # look in the values, not the index
  print(element, " in df")
 else:
  print(element, " not in df")

[Output]

jcf7180001991334_2-HSPs_+__SP_1  in df
not_in_tab1  not in df
jcf7180001991334:23992-26263(+):SP_2  in df
not_intab2  not in df
not_intab3  not in df
jcf7180001988059:2889-4542(-):SP_3  in df
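As a possible alternative, not part of the original answer: if the list or the column is large, it may be worth building a set once, or using Series.isin for a vectorized check. The names col_values and mask below are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "COL1": ['jcf7180001991334_2-HSPs_+__SP_1',
             'jcf7180001991334:23992-26263(+):SP_2',
             'jcf7180001988059:2889-4542(-):SP_3']
})

the_list = ['jcf7180001991334_2-HSPs_+__SP_1', 'not_in_tab1',
            'jcf7180001991334:23992-26263(+):SP_2', 'not_intab2',
            'not_intab3', 'jcf7180001988059:2889-4542(-):SP_3']

# Build a set of the column's values once, so each lookup avoids
# rescanning the whole column
col_values = set(df["COL1"])
for element in the_list:
    print(element, "in df" if element in col_values else "not in df")

# Vectorized variant: one boolean per item of the_list
mask = pd.Series(the_list).isin(df["COL1"])
```

Both variants compare the strings exactly, so characters such as parentheses, + or - need no escaping.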
