I have a DataFrame (df) of journals with many different titles.
I want to extract only the journals whose titles are in the list below:
Blood, Cancer, Chest, Circulation, Diabetes, JAMA, Endocrinology, Gastroenterology, Gut, Medicine, Neurology, Pediatrics, Physical therapy, Radiology, Surgery, Geriatrics
Some journal titles contain the same words (Blood circulation, Cancer History, etc.). I do not want to select them.
Example:
Id Title
1 Blood
2 Blood
3 Blood purification
4 Blood transfusion
5 Cancer
6 Chest
7 Cancer History
8 Chest Analysis
I want to keep only the exact journal titles and create a new column "Influential", but I cannot find a way to do it with str.contains or str.match.
I am trying two approaches:
df.loc[df['Title'].str.contains("Blood", case = True, na = False), 'Influential'] = 'Blood'
df.loc[df['Title'].str.match("Blood", case = True, na = False), 'Influential'] = 'Blood'
Expected output with the exact title of the journal:
Id Title Influential
1 Blood Blood
2 Blood Blood
3 Blood purification NA
4 Blood transfusion NA
5 Cancer Cancer
6 Chest Chest
7 Cancer History NA
8 Chest Analysis NA
Should I do it somehow via regex? Thanks.
>Solution :
If you want to fill the Influential column with the value from the Title column whenever the title is an exact match of one of the entries in your list (lst below), you can use Series.isin together with numpy.where:
import numpy as np
import pandas as pd
df = pd.DataFrame({'Id':[1,2,3,4,5,6,7,8], 'Title': ['Blood','Blood', 'Blood purification', 'Blood transfusion', 'Cancer', 'Chest', 'Cancer History', 'Chest Analysis']})
lst = ['Blood', 'Chest', 'Cancer']
# isin() compares each whole title against the list, so partial matches are excluded
df['Influential'] = np.where(df['Title'].isin(lst), df['Title'], np.nan)
# >>> df
# Id Title Influential
# 0 1 Blood Blood
# 1 2 Blood Blood
# 2 3 Blood purification NaN
# 3 4 Blood transfusion NaN
# 4 5 Cancer Cancer
# 5 6 Chest Chest
# 6 7 Cancer History NaN
# 7 8 Chest Analysis NaN
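The same isin idea can be sketched without numpy by using Series.where, which keeps a value where the condition holds and leaves NaN elsewhere. The list name influential is only an illustrative choice; its entries are the sixteen titles from the question. Note that the comparison is case-sensitive and whitespace-sensitive, so this assumes the titles are stored exactly as spelled in the list.

```python
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8],
    'Title': ['Blood', 'Blood', 'Blood purification', 'Blood transfusion',
              'Cancer', 'Chest', 'Cancer History', 'Chest Analysis'],
})

# Full list of titles from the question (the variable name is illustrative)
influential = ['Blood', 'Cancer', 'Chest', 'Circulation', 'Diabetes', 'JAMA',
               'Endocrinology', 'Gastroenterology', 'Gut', 'Medicine',
               'Neurology', 'Pediatrics', 'Physical therapy', 'Radiology',
               'Surgery', 'Geriatrics']

# Keep the title where it is an exact member of the list, NaN otherwise
df['Influential'] = df['Title'].where(df['Title'].isin(influential))

# Optionally keep only the matching rows
selected = df[df['Title'].isin(influential)]
```

Filtering with the boolean mask, as in the last line, is the usual way to extract just the matching journals rather than only flagging them.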
If you only need to check a single title such as Blood, and want to set the Influential column to that word when the whole title text equals it, you can use DataFrame.apply:
df = pd.DataFrame({'Id':[1,2,3,4], 'Title': ['Blood','Blood', 'Blood purification', 'Blood transfusion']})
df['Influential'] = df.apply(lambda x: "Blood" if x['Title'] == 'Blood' else np.nan, axis=1)
# >>> df
# Id Title Influential
# 0 1 Blood Blood
# 1 2 Blood Blood
# 2 3 Blood purification NaN
# 3 4 Blood transfusion NaN
The lambda compares the whole Title value with Blood (see if x['Title'] == 'Blood'): on an exact match the Influential column value is set to Blood, otherwise to np.nan.
Or, just use numpy.where (also suggested in the comments), which is vectorized and avoids the row-by-row apply:
import numpy as np
#...
df['Influential'] = np.where(df['Title']=='Blood', df['Title'], np.nan)
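As for the regex question: str.contains matches the pattern anywhere in the string and str.match anchors it only at the start, so both also hit "Blood purification". If a regex is preferred, a minimal sketch using Series.str.fullmatch (available in pandas 1.1 and later), which requires the whole string to match, could look like this. The names pattern and mask are illustrative.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5, 6, 7, 8],
    'Title': ['Blood', 'Blood', 'Blood purification', 'Blood transfusion',
              'Cancer', 'Chest', 'Cancer History', 'Chest Analysis'],
})
lst = ['Blood', 'Chest', 'Cancer']

# Build one alternation; re.escape guards against regex metacharacters,
# and the non-capturing group keeps the alternatives together as one unit
pattern = '(?:' + '|'.join(re.escape(title) for title in lst) + ')'

# fullmatch is True only when the entire title matches the pattern
mask = df['Title'].str.fullmatch(pattern, na=False)
df['Influential'] = df['Title'].where(mask)

# For comparison: match only anchors at the start, so rows 3 and 4 also match
starts_with_blood = df['Title'].str.match('Blood', na=False)
```

For a plain list of fixed titles, isin is simpler and does the same job; the regex form is mainly useful when the titles need pattern features such as case-insensitive matching.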