Dropping rows from dataframe if they contain a string within a series

I’ve got a dataframe with a column called "text" in which each row holds a list of strings, e.g. [Joe, Biden, Is, President].

I’m trying to drop every row that contains the word "Joe" in the column "text". To do this I wrote:

dfl[~dfl['text'].str.contains("Joe", na=False)]

I thought this would work, but it’s just returning the full dataframe again.

I would also like to create a new dataframe with just the rows that contain "Joe" in the text column.
Any help with that would be appreciated too!

To clarify, the table looks like:

index label score text ID
0 NEGATIVE 0.983319103717804 perrosloja,Expresoec,Es,del,partido,social,cristiano,la,lista,6,SenateGOP,SenateDems,JoeBiden,SenBobCasey,AsambleaEcuador,OEADDOT,ONUGeneve,ONUecuador,NoticiasONU,ilo,ONUes,Hay,2,bandos,de,mafias,izquierda,la,derecha,mafias,polc3adticas,ActualidadRT,TelemundoNews,CNNEE,soyfdelrincon,httpstcoYQnZsBKKdF 0
1 NEGATIVE 0.990364134311676 MolinaPvanya,JoeBiden,httpstcojAsJ08durF 1
2 NEGATIVE 0.8683468103408813 Iowa4Nikki,Whoes,Best,Person,PresidentnnDonald,Trump,Robert,F,Kennedy,Vivek,Ramaswamy,rest,presidential,candidates,except,Joe,Biden,good,candidates,due,respect,none,best,best,one,job,United,States,needse280a6 2
3 POSITIVE 0.999308705329895 amazing,JoeBiden,calls,Dick,f09fa4a3,httpstcopsV0uqG8aL 3
4 NEGATIVE 0.7860859036445618 ChrisDJackson,Whoes,Best,Person,PresidentnnDonald,Trump,Robert,F,Kennedy,Vivek,Ramaswamy,rest,presidential,candidates,except,Joe,Biden,good,candidates,due,respect,none,best,best,one,job,United,States,needse280a6 4
5 NEGATIVE 0.9982330799102783 PalBint,JoeBiden,much,blow,around,fake,White,House,probably,none 5
6 POSITIVE 0.842793345451355 thehill,e2809cJoeBiden,got,81million,votes,us,presidential,historye2809d,Even,Barack,Hussein,nnSo,wtf,Bolsheviks,afraid,actual,democracy,nf09fa4a3f09fa4a3f09fa4a3f09fa4a3f09fa4a3 6
7 NEGATIVE 0.998753547668457 tfbow,JimJordan,Weaponization,HouseGOP,JudiciaryGOP,HouseDemocrats,SenateGOP,SenateDems,MaineSenateGOP,2020,Election,Lawfully,CertifiablenJoe,Biden,Win,2020,ElectionnnThe,2022,AZ,Gubernatorial,Election,CertifiablenKatie,Hobbs,Win,2022,ElectionnnThe,Brunsons,Heroes,Others,Follow,LitigationnnDry,WeepyEye 7
8 NEGATIVE 0.9963979721069336 JoeBiden,Hows,youre,boss,httpstcogfuDElJdG1 8
9 NEGATIVE 0.9973702430725098 mitchellvii,Joe,Biden,committed,treason,intentionally,ensuring,stream,illegal,immigrants,remains,unhindered,least,take,ballot 9
10 NEGATIVE 0.9728578925132751 mirandadevine,DavidHo71155831,JoeBiden,BarackObama,AliMayorkas,SecBlinken,belong,prison 10

Here is the output of df.loc[0, 'text']:

['perrosloja',
'Expresoec',
'Es',
'del',
'partido',
'social',
'cristiano',
'la',
'lista',
'6',
'SenateGOP',
'SenateDems',
'JoeBiden',
'SenBobCasey',
'AsambleaEcuador',
'OEADDOT',
'ONUGeneve',
'ONUecuador',
'NoticiasONU',
'ilo',
'ONUes',
'Hay',
'2',
'bandos',
'de',
'mafias',
'izquierda',
'la',
'derecha',
'mafias',
'polc3adticas',
'ActualidadRT',
'TelemundoNews',
'CNNEE',
'soyfdelrincon',
'httpstcoYQnZsBKKdF']

So I guess yes, it is a list of strings.

>Solution :

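IIUC, each row contains a list of words rather than a single string. That is why your attempt returns the full dataframe: .str.contains has no string to search in a list cell, so it yields NaN for every row, na=False turns those into False, and the negated mask keeps everything. You can try: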

m = df.loc[df['text'].notna(), 'text'].map(' '.join).str.contains('Joe', case=False)

joe = df.loc[m[m].index]

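The code above joins the words of each row into a single sentence, then uses str.contains to look for the substring 'Joe' (case-insensitive), so it also matches tokens such as 'JoeBiden'. m[m].index gives the index labels of the rows where the mask is True.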
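
The question asks for two dataframes: one with the "Joe" rows and one without them. Here is a minimal sketch of how the same mask could produce both. The sample data and the names joined, mask and not_joe are made up for illustration, and missing cells are assumed to count as "no match":

```python
import pandas as pd

# Hypothetical sample data: each cell of "text" holds a list of words,
# and one cell is missing.
df = pd.DataFrame({
    "text": [
        ["Joe", "Biden", "Sent", "Spinning", "Federal", "Court"],
        ["Election", "Cycle"],
        None,
        ["amazing", "JoeBiden", "calls"],
    ]
})

# Join each list into one string; leave missing cells untouched.
joined = df["text"].map(
    lambda words: " ".join(words) if isinstance(words, list) else words
)

# Case-insensitive substring match; missing cells are treated as "no match".
mask = joined.str.contains("Joe", case=False, na=False)

joe = df[mask]        # rows that mention "Joe" anywhere in a word
not_joe = df[~mask]   # every other row, including the missing one
```

Since the match is a case-insensitive substring search, a word like "joel" would also be selected. Whether that is acceptable depends on the data.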

>>> df['text'].map(' '.join)
0    Joe Biden Sent Spinning Federal Court
1                           Election Cycle
2           Donald Trump running next year
Name: text, dtype: object

>>> df['text'].map(' '.join).str.contains('Joe', case=False)
0     True
1    False
2    False
Name: text, dtype: bool
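
For comparison, here is a small sketch of what seems to happen when .str.contains is applied directly to the list-valued column, as in the question. The two-row dataframe is hypothetical:

```python
import pandas as pd

# Hypothetical sample data with list-valued cells.
df = pd.DataFrame({"text": [["Joe", "Biden"], ["Election", "Cycle"]]})

# .str.contains only searches string elements; list cells come back as NaN.
raw = df["text"].str.contains("Joe")

# na=False replaces those NaN values with False,
# so the negated mask is True for every row.
mask = df["text"].str.contains("Joe", na=False)
kept = df[~mask]  # same rows as df: nothing is dropped
```

This is why joining the lists into strings first (or testing list membership) is needed before filtering.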

Output:

>>> joe
                                           text
0  [Joe, Biden, Sent, Spinning, Federal, Court]

Details:

Input:

>>> df
                                           text
0  [Joe, Biden, Sent, Spinning, Federal, Court]
1                             [Election, Cycle]
2          [Donald, Trump, running, next, year]
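
If only the exact word "Joe" should count, and not tokens such as "JoeBiden", one possible variant is to test list membership instead of searching a joined string. This is a sketch of an alternative, not part of the answer above; has_word is a hypothetical helper and the fourth row is added for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        ["Joe", "Biden", "Sent", "Spinning", "Federal", "Court"],
        ["Election", "Cycle"],
        ["Donald", "Trump", "running", "next", "year"],
        ["amazing", "JoeBiden", "calls"],  # hypothetical extra row
    ]
})

def has_word(words, target="Joe"):
    """Hypothetical helper: True only if `target` is an element of the list."""
    return isinstance(words, list) and target in words

# Boolean mask built from exact, case-sensitive list membership.
exact = df["text"].map(has_word)

joe_exact = df[exact]   # rows whose list contains the word "Joe" itself
others = df[~exact]     # all remaining rows, including the "JoeBiden" one
```

The trade-off is that membership is exact and case-sensitive, while the joined-string approach matches any word containing "joe".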