Dropping rows from dataframe if they contain a string within a series

by MR

January 3, 2024

I’ve got a dataframe with a column called "text" in which each row holds a list of strings, e.g. [Joe, Biden, Is, President].

I’m trying to drop every row that contains the word "Joe" in the column "text". To do this I wrote:

dfl[~dfl['text'].str.contains("Joe", na=False)]

I thought this would work, but it’s just returning the full dataframe again.

I would also like to create a new dataframe with just the rows that contain "Joe" in the text column. Any help with that would be appreciated too!

To clarify, the table looks like:

index	label	score	text	ID
0	NEGATIVE	0.983319103717804	perrosloja,Expresoec,Es,del,partido,social,cristiano,la,lista,6,SenateGOP,SenateDems,JoeBiden,SenBobCasey,AsambleaEcuador,OEADDOT,ONUGeneve,ONUecuador,NoticiasONU,ilo,ONUes,Hay,2,bandos,de,mafias,izquierda,la,derecha,mafias,polc3adticas,ActualidadRT,TelemundoNews,CNNEE,soyfdelrincon,httpstcoYQnZsBKKdF	0
1	NEGATIVE	0.990364134311676	MolinaPvanya,JoeBiden,httpstcojAsJ08durF	1
2	NEGATIVE	0.8683468103408813	Iowa4Nikki,Whoes,Best,Person,PresidentnnDonald,Trump,Robert,F,Kennedy,Vivek,Ramaswamy,rest,presidential,candidates,except,Joe,Biden,good,candidates,due,respect,none,best,best,one,job,United,States,needse280a6	2
3	POSITIVE	0.999308705329895	amazing,JoeBiden,calls,Dick,f09fa4a3,httpstcopsV0uqG8aL	3
4	NEGATIVE	0.7860859036445618	ChrisDJackson,Whoes,Best,Person,PresidentnnDonald,Trump,Robert,F,Kennedy,Vivek,Ramaswamy,rest,presidential,candidates,except,Joe,Biden,good,candidates,due,respect,none,best,best,one,job,United,States,needse280a6	4
5	NEGATIVE	0.9982330799102783	PalBint,JoeBiden,much,blow,around,fake,White,House,probably,none	5
6	POSITIVE	0.842793345451355	thehill,e2809cJoeBiden,got,81million,votes,us,presidential,historye2809d,Even,Barack,Hussein,nnSo,wtf,Bolsheviks,afraid,actual,democracy,nf09fa4a3f09fa4a3f09fa4a3f09fa4a3f09fa4a3	6
7	NEGATIVE	0.998753547668457	tfbow,JimJordan,Weaponization,HouseGOP,JudiciaryGOP,HouseDemocrats,SenateGOP,SenateDems,MaineSenateGOP,2020,Election,Lawfully,CertifiablenJoe,Biden,Win,2020,ElectionnnThe,2022,AZ,Gubernatorial,Election,CertifiablenKatie,Hobbs,Win,2022,ElectionnnThe,Brunsons,Heroes,Others,Follow,LitigationnnDry,WeepyEye	7
8	NEGATIVE	0.9963979721069336	JoeBiden,Hows,youre,boss,httpstcogfuDElJdG1	8
9	NEGATIVE	0.9973702430725098	mitchellvii,Joe,Biden,committed,treason,intentionally,ensuring,stream,illegal,immigrants,remains,unhindered,least,take,ballot	9
10	NEGATIVE	0.9728578925132751	mirandadevine,DavidHo71155831,JoeBiden,BarackObama,AliMayorkas,SecBlinken,belong,prison	10

Here is the output of df.loc[0, 'text']:

['perrosloja',
'Expresoec',
'Es',
'del',
'partido',
'social',
'cristiano',
'la',
'lista',
'6',
'SenateGOP',
'SenateDems',
'JoeBiden',
'SenBobCasey',
'AsambleaEcuador',
'OEADDOT',
'ONUGeneve',
'ONUecuador',
'NoticiasONU',
'ilo',
'ONUes',
'Hay',
'2',
'bandos',
'de',
'mafias',
'izquierda',
'la',
'derecha',
'mafias',
'polc3adticas',
'ActualidadRT',
'TelemundoNews',
'CNNEE',
'soyfdelrincon',
'httpstcoYQnZsBKKdF']

So yes, I guess it is a list of strings.

>Solution :

IIUC, each row contains a list of words. You can try:

m = df.loc[df['text'].notna(), 'text'].map(' '.join).str.contains('Joe', case=False)

joe = df.loc[m[m].index]

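The code above joins the words of each row into a single space-separated string, then uses str.contains to look for 'Joe' (case-insensitive). Because this is a substring match, tokens such as 'JoeBiden' count as matches too.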
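The question also asks for the dataframe without the "Joe" rows, and the same mask can be inverted to get it. Below is a minimal sketch, assuming the small three-row sample frame used in this answer; the names `has_joe`, `joe` and `no_joe` are illustrative and not from the original post. As for why the original attempt returned every row: calling `.str.contains` directly on cells that hold lists most likely produces missing values rather than True/False, `na=False` turns those into False, and `~` then selects everything.

```python
import pandas as pd

# Sample frame (assumed): each cell of "text" is a list of words.
df = pd.DataFrame({
    "text": [
        ["Joe", "Biden", "Sent", "Spinning", "Federal", "Court"],
        ["Election", "Cycle"],
        ["Donald", "Trump", "running", "next", "year"],
    ]
})

# Join each list into one string, then test for the substring "joe"
# (case-insensitive). Rows with missing text are skipped by dropna().
has_joe = df["text"].dropna().map(" ".join).str.contains("Joe", case=False)

# Re-align the mask with the full index; skipped rows count as "no match".
has_joe = has_joe.reindex(df.index, fill_value=False)

joe = df[has_joe]      # rows that mention "Joe"
no_joe = df[~has_joe]  # every other row
```

Reindexing the mask keeps it the same length as `df`, so it can be used for both selections even when some rows have no text.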

>>> df['text'].map(' '.join)
0    Joe Biden Sent Spinning Federal Court
1                           Election Cycle
2           Donald Trump running next year
Name: text, dtype: object

>>> df['text'].map(' '.join).str.contains('Joe', case=False)
0     True
1    False
2    False
Name: text, dtype: bool

Output:

>>> joe
                                           text
0  [Joe, Biden, Sent, Spinning, Federal, Court]

Details:

Input:

>>> df
                                           text
0  [Joe, Biden, Sent, Spinning, Federal, Court]
1                             [Election, Cycle]
2          [Donald, Trump, running, next, year]
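If joining the words feels indirect, the lists can also be checked element by element. This is only a sketch of an alternative, not part of the answer above: `has_token` is a hypothetical helper, and its `exact` flag is an assumption added to show the difference between matching "Joe" inside "JoeBiden" and matching the standalone word only.

```python
import pandas as pd

def has_token(words, needle, exact=False):
    """Return True if any word in the list matches needle (case-insensitive)."""
    if not isinstance(words, list):  # treat missing cells as "no match"
        return False
    needle = needle.lower()
    if exact:
        return any(w.lower() == needle for w in words)
    return any(needle in w.lower() for w in words)

# Same sample frame as in the Input above (assumed).
df = pd.DataFrame({
    "text": [
        ["Joe", "Biden", "Sent", "Spinning", "Federal", "Court"],
        ["Election", "Cycle"],
        ["Donald", "Trump", "running", "next", "year"],
    ]
})

mask = df["text"].map(lambda words: has_token(words, "Joe"))
joe = df[mask]
no_joe = df[~mask]
```

With `exact=True` a row containing only 'JoeBiden' would not match, which may or may not be what you want for the tweet data in the question.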