I’ve got a dataframe with a column called "text" where each row holds a series of strings, e.g. [Joe, Biden, Is, President].
I’m trying to drop every row that contains the word "Joe" in the column "text". To do this I wrote:
dfl[~dfl['text'].str.contains("Joe", na=False)]
I thought this would work, but it’s just returning the full dataframe again.
I would also like to create a new dataframe with just the rows that contain "Joe" in the text column.
Any help with that would be appreciated too!
To clarify, the table looks like:
| index | label | score | text | ID |
|---|---|---|---|---|
| 0 | NEGATIVE | 0.983319103717804 | perrosloja,Expresoec,Es,del,partido,social,cristiano,la,lista,6,SenateGOP,SenateDems,JoeBiden,SenBobCasey,AsambleaEcuador,OEADDOT,ONUGeneve,ONUecuador,NoticiasONU,ilo,ONUes,Hay,2,bandos,de,mafias,izquierda,la,derecha,mafias,polc3adticas,ActualidadRT,TelemundoNews,CNNEE,soyfdelrincon,httpstcoYQnZsBKKdF | 0 |
| 1 | NEGATIVE | 0.990364134311676 | MolinaPvanya,JoeBiden,httpstcojAsJ08durF | 1 |
| 2 | NEGATIVE | 0.8683468103408813 | Iowa4Nikki,Whoes,Best,Person,PresidentnnDonald,Trump,Robert,F,Kennedy,Vivek,Ramaswamy,rest,presidential,candidates,except,Joe,Biden,good,candidates,due,respect,none,best,best,one,job,United,States,needse280a6 | 2 |
| 3 | POSITIVE | 0.999308705329895 | amazing,JoeBiden,calls,Dick,f09fa4a3,httpstcopsV0uqG8aL | 3 |
| 4 | NEGATIVE | 0.7860859036445618 | ChrisDJackson,Whoes,Best,Person,PresidentnnDonald,Trump,Robert,F,Kennedy,Vivek,Ramaswamy,rest,presidential,candidates,except,Joe,Biden,good,candidates,due,respect,none,best,best,one,job,United,States,needse280a6 | 4 |
| 5 | NEGATIVE | 0.9982330799102783 | PalBint,JoeBiden,much,blow,around,fake,White,House,probably,none | 5 |
| 6 | POSITIVE | 0.842793345451355 | thehill,e2809cJoeBiden,got,81million,votes,us,presidential,historye2809d,Even,Barack,Hussein,nnSo,wtf,Bolsheviks,afraid,actual,democracy,nf09fa4a3f09fa4a3f09fa4a3f09fa4a3f09fa4a3 | 6 |
| 7 | NEGATIVE | 0.998753547668457 | tfbow,JimJordan,Weaponization,HouseGOP,JudiciaryGOP,HouseDemocrats,SenateGOP,SenateDems,MaineSenateGOP,2020,Election,Lawfully,CertifiablenJoe,Biden,Win,2020,ElectionnnThe,2022,AZ,Gubernatorial,Election,CertifiablenKatie,Hobbs,Win,2022,ElectionnnThe,Brunsons,Heroes,Others,Follow,LitigationnnDry,WeepyEye | 7 |
| 8 | NEGATIVE | 0.9963979721069336 | JoeBiden,Hows,youre,boss,httpstcogfuDElJdG1 | 8 |
| 9 | NEGATIVE | 0.9973702430725098 | mitchellvii,Joe,Biden,committed,treason,intentionally,ensuring,stream,illegal,immigrants,remains,unhindered,least,take,ballot | 9 |
| 10 | NEGATIVE | 0.9728578925132751 | mirandadevine,DavidHo71155831,JoeBiden,BarackObama,AliMayorkas,SecBlinken,belong,prison | 10 |
Here is the output of df.loc[0, 'text']:
['perrosloja',
'Expresoec',
'Es',
'del',
'partido',
'social',
'cristiano',
'la',
'lista',
'6',
'SenateGOP',
'SenateDems',
'JoeBiden',
'SenBobCasey',
'AsambleaEcuador',
'OEADDOT',
'ONUGeneve',
'ONUecuador',
'NoticiasONU',
'ilo',
'ONUes',
'Hay',
'2',
'bandos',
'de',
'mafias',
'izquierda',
'la',
'derecha',
'mafias',
'polc3adticas',
'ActualidadRT',
'TelemundoNews',
'CNNEE',
'soyfdelrincon',
'httpstcoYQnZsBKKdF']
So I guess yes, it is a list of strings.
>Solution :
IIUC, each row contains a list of words. You can try:
m = df.loc[df['text'].notna(), 'text'].map(' '.join).str.contains('Joe', case=False)
joe = df.loc[m[m].index]
The code above concatenates all the words in each row into a single sentence, then uses str.contains to find 'Joe' (case-insensitive). Since this is a substring search, tokens such as 'JoeBiden' match as well.
>>> df['text'].map(' '.join)
0 Joe Biden Sent Spinning Federal Court
1 Election Cycle
2 Donald Trump running next year
Name: text, dtype: object
>>> df['text'].map(' '.join).str.contains('Joe', case=False)
0 True
1 False
2 False
Name: text, dtype: bool
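If you only want rows where 'Joe' appears as a token of its own (and not inside 'JoeBiden'), a membership test on each list may be closer to what you need. This is just a sketch on made-up rows; the names substring_mask and token_mask are mine, and it assumes every row is a list with no missing values:

```python
import pandas as pd

# Toy rows, loosely based on the data in the question
df = pd.DataFrame({
    "text": [
        ["JoeBiden", "Hows", "youre", "boss"],
        ["Joe", "Biden", "Sent", "Spinning", "Federal", "Court"],
        ["Election", "Cycle"],
    ]
})

# Substring search on the joined text: 'JoeBiden' counts as a hit
substring_mask = df["text"].map(" ".join).str.contains("Joe", case=False)

# Exact membership in the list: only the standalone token 'Joe' counts
token_mask = df["text"].map(lambda words: "Joe" in words)

print(substring_mask.tolist())
print(token_mask.tolist())
```

Note that the membership test is case-sensitive, so lower-case the tokens first if that matters for your data.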
Output:
>>> joe
text
0 [Joe, Biden, Sent, Spinning, Federal, Court]
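Since you asked for both the rows with 'Joe' and the dataframe without them, one way is to build a single boolean mask and invert it. A minimal sketch, assuming missing values may be present (the None row and the names joined, mask and no_joe are my additions for illustration):

```python
import pandas as pd

# Toy data based on the example in this answer, plus one missing row (assumption)
df = pd.DataFrame({
    "text": [
        ["Joe", "Biden", "Sent", "Spinning", "Federal", "Court"],
        ["Election", "Cycle"],
        ["Donald", "Trump", "running", "next", "year"],
        None,
    ]
})

# Join each list into one string; leave anything that is not a list untouched
joined = df["text"].map(
    lambda words: " ".join(words) if isinstance(words, list) else words
)

# na=False makes missing rows count as "does not contain Joe"
mask = joined.str.contains("Joe", case=False, na=False)

joe = df[mask]      # rows that mention Joe
no_joe = df[~mask]  # everything else, including the missing row

print(joe.index.tolist())
print(no_joe.index.tolist())
```

Because the mask has the same index as df, both selections stay aligned with the original rows.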
Details:
Input:
>>> df
text
0 [Joe, Biden, Sent, Spinning, Federal, Court]
1 [Election, Cycle]
2 [Donald, Trump, running, next, year]
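As for why the attempt in the question returned the full dataframe: as far as I can tell, the .str methods only work on string elements, so on a column of lists str.contains treats every row as missing, na=False turns those into False, and inverting an all-False mask keeps everything. A small sketch on the same toy input (the name attempt is mine):

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        ["Joe", "Biden", "Sent", "Spinning", "Federal", "Court"],
        ["Election", "Cycle"],
        ["Donald", "Trump", "running", "next", "year"],
    ]
})

# Each element is a list, not a string, so no row is reported as a match
attempt = df["text"].str.contains("Joe", na=False)
print(attempt.tolist())

# Inverting a mask with no True values selects every row
print(len(df[~attempt]) == len(df))
```

That is why joining the lists into strings first (or testing membership per list) is needed before filtering.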