I have a large dataframe and need to find the values that contain numbers or symbols so that I can eliminate them; only letters and spaces are allowed. My attempt below changes nothing:
df.NAME=df.NAME.replace(r"(/^[a-zA-Z\s]*$/)",np.nan,regex=True)
Any suggestions?
Thank you
>Solution :
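Your attempt fails because the slashes are JavaScript-style regex delimiters; in Python they are matched as literal / characters, so the pattern never matches. It also describes the valid values rather than the ones to eliminate. If you need to keep only the items made up of letters and spaces, you need
df = df[df['NAME'].str.contains(r"^[a-zA-Z\s]*$", na=False, regex=True)]
That keeps only the rows whose NAME value consists entirely of ASCII letters and/or whitespace. na=False treats missing values as non-matching, so those rows are dropped as well.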
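As a rough illustration of the filter above, here is a minimal sketch on made-up data (only the NAME column name comes from the question; the sample values and the variable names mask, kept and cleaned are assumptions). It also shows one possible way to keep every row and set the offending values to NaN instead, which is closer to what the replace call in the question was aiming for.

```python
import pandas as pd

# Hypothetical sample data; only the column name NAME comes from the question.
df = pd.DataFrame({"NAME": ["John Smith", "R2D2", "Anna-Lee", "Mary Ann", None]})

# True where the whole value consists of ASCII letters and whitespace only.
# na=False makes missing values count as non-matching.
mask = df["NAME"].str.contains(r"^[a-zA-Z\s]*$", na=False, regex=True)

# Option 1: drop the rows that contain digits or symbols.
kept = df[mask]

# Option 2: keep every row, but replace the offending values with NaN.
cleaned = df.assign(NAME=df["NAME"].where(mask))

print(kept)
print(cleaned)
```

Because the mask is computed once with a vectorized string method, the same boolean Series can be reused for either option without scanning the column again.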
To support any Unicode letters, you’d need
df = df[df['NAME'].str.contains(r"^(?:[^\W\d_]|\s)*$", na=False, regex=True)]
where (?:[^\W\d_]|\s) matches either any Unicode letter (including precomposed accented letters such as é) or a whitespace char.
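To see how the Unicode-aware pattern differs from the ASCII-only one, here is a small sketch on invented names (the sample values and variable names are assumptions, not part of the question). The NFC normalization step is an extra precaution added for this example: as far as I know, Python's \w does not cover standalone combining marks, so decomposed accents would otherwise be rejected.

```python
import pandas as pd

# Hypothetical sample data with ASCII and non-ASCII names.
df = pd.DataFrame(
    {"NAME": ["José Álvarez", "Mary Ann", "Łukasz 3", "O'Neil", "Иван Петров"]}
)

# Compose accented letters into single code points before matching.
names = df["NAME"].str.normalize("NFC")

unicode_pattern = r"^(?:[^\W\d_]|\s)*$"  # any Unicode letter or whitespace
ascii_pattern = r"^[a-zA-Z\s]*$"         # ASCII letters or whitespace only

unicode_mask = names.str.contains(unicode_pattern, na=False, regex=True)
ascii_mask = names.str.contains(ascii_pattern, na=False, regex=True)

# Side-by-side view of which names each pattern accepts.
print(df.assign(unicode_ok=unicode_mask, ascii_ok=ascii_mask))
```

In this sample the accented and Cyrillic names pass only the Unicode pattern, while the values containing a digit or an apostrophe are rejected by both.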