I am trying to filter some rows from a DataFrame.
import numpy as np
import pandas as pd

students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])
I am trying to filter the records whose country starts with 'I'. When I run this:
print(df.loc[lambda x:np.char.startswith(x['Country'],'I')])
it says
string operation on non-string array
I even tried converting the column to string with
df.astype({'Country':str})
Please tell me what mistake I am making.
>Solution :
Use the str accessor:
>>> df[df['Country'].str.startswith('I')]
    Name  Age       City  Country
b   Riti   30      Delhi    India
c  Vikas   31     Mumbai    India
d  Neelu   32  Bangalore    India
# OR df[df['Country'].str[0] == 'I']
To learn more, read "Testing for strings that match or contain a pattern" in the pandas user guide.
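One thing to watch for: if the column contains missing values, str.startswith may return NaN for those rows, which can make boolean indexing raise an error. A minimal sketch, assuming a hypothetical row with a missing country (the df_na frame and the 'Anon' row are made up for illustration and are not in the question's data), passes na=False so missing values count as "no match":

```python
import numpy as np
import pandas as pd

# Hypothetical data: two rows from the question plus one made-up row
# whose Country is missing.
df_na = pd.DataFrame(
    [('Riti', 30, 'Delhi', 'India'),
     ('John', 16, 'New York', 'US'),
     ('Anon', 40, 'Unknown', np.nan)],
    columns=['Name', 'Age', 'City', 'Country'],
    index=['a', 'b', 'c'])

# na=False treats missing values as non-matching, so the mask is purely boolean.
mask = df_na['Country'].str.startswith('I', na=False)
result = df_na[mask]
print(result)
```

With complete data, as in the question, the plain call without na is enough.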
Update
To fix your code, use:
>>> df[df['Country'].apply(lambda x: np.char.startswith(x, 'I'))]
    Name  Age       City  Country
b   Riti   30      Delhi    India
c  Vikas   31     Mumbai    India
d  Neelu   32  Bangalore    India
but it's clearly not efficient: apply calls the NumPy function once per row instead of working on the whole column at once.
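As for why the original attempt failed: a pandas column of Python strings is typically not a NumPy string array (it usually has object dtype), and np.char functions reject such input. Also, df.astype({'Country': str}) returns a new DataFrame rather than modifying df, and its result is still not a NumPy string array. A sketch that keeps np.char but applies it to the whole column, assuming the column has no missing values, converts the column to a NumPy unicode array first (the names countries, mask and filtered are just illustrative):

```python
import numpy as np
import pandas as pd

students = [('jack', 34, 'Sydeny', 'Australia'),
            ('Riti', 30, 'Delhi', 'India'),
            ('Vikas', 31, 'Mumbai', 'India'),
            ('Neelu', 32, 'Bangalore', 'India'),
            ('John', 16, 'New York', 'US'),
            ('Mike', 17, 'las vegas', 'US')]
df = pd.DataFrame(students,
                  columns=['Name', 'Age', 'City', 'Country'],
                  index=['a', 'b', 'c', 'd', 'e', 'f'])

# Convert the column to a fixed-width NumPy unicode array,
# which is the kind of input np.char functions accept.
countries = df['Country'].to_numpy(dtype=str)

# One vectorized call over the whole array gives a boolean mask.
mask = np.char.startswith(countries, 'I')
filtered = df[mask]
print(filtered)
```

The str accessor is still the more idiomatic choice, since it needs no conversion step.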