Finding first intersections in each row pandas dataframe

June 20, 2022

I have a dataframe:

import pandas as pd
data =[[28, ['first'], 'apple edible', 23, 'apple is an edible fruit'],
 [28, ['first'], 'apple edible', 34, 'fruit produced by an apple tree'],
 [28, ['first'], 'apple edible', 39, 'the apple is a pome edible fruit'],
 [21, ['second'], 'green plants', 11, 'plants are green'],
 [21, ['second'], 'green plants', 7, 'plant these perennial green flowers']]
df = pd.DataFrame(data, columns=['day', 'group',  'bigram', 'count', 'sentence'])
+---+--------+------------+-----+-----------------------------------+
|day|group   |bigram      |count|sentence                           |
+---+--------+------------+-----+-----------------------------------+
|28 |[first] |apple edible|23   |apple is an edible fruit           |
|28 |[first] |apple edible|34   |fruit produced by an apple tree    |
|28 |[first] |apple edible|39   |the apple is a pome edible fruit   |
|21 |[second]|green plants|11   |plants are green                   |
|21 |[second]|green plants|7    |plant these perennial green flowers|
+---+--------+------------+-----+-----------------------------------+

I need to find the rows where the bigram intersects with the sentence, and mark only the first such row for each bigram as True.
Any later intersections for the same bigram should be marked False. Word order does not matter.

So I would like this result:

+---+--------+------------+-----+-----------------------------------+--------+
|day|group   |bigram      |count|sentence                           |        |
+---+--------+------------+-----+-----------------------------------+--------+
|28 |[first] |apple edible|23   |apple is an edible fruit           |True    |
|28 |[first] |apple edible|34   |fruit produced by an apple tree    |False   |
|28 |[first] |apple edible|39   |the apple is a pome edible fruit   |False   |
|21 |[second]|green plants|11   |plants are green                   |True    |
|21 |[second]|green plants|7    |plant these perennial green flowers|False   |
+---+--------+------------+-----+-----------------------------------+--------+

>Solution :

First test every row for an intersection: split the bigram and the sentence into words and use issubset to check that all bigram words occur in the sentence. Then keep only the first True per bigram:

df['new'] = [set(b.split()).issubset(a.split()) for a,b in zip(df['sentence'],df['bigram'])]
df['new'] = ~df.duplicated(['bigram','new']) & df['new']
print (df)
   day     group        bigram  count                             sentence  \
0   28   [first]  apple edible     23             apple is an edible fruit   
1   28   [first]  apple edible     34      fruit produced by an apple tree   
2   28   [first]  apple edible     39     the apple is a pome edible fruit   
3   21  [second]  green plants     11                     plants are green   
4   21  [second]  green plants      7  plant these perennial green flowers   

     new  
0   True  
1  False  
2  False  
3   True  
4  False
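
The two steps above can also be written out separately, which makes the intermediate result easier to inspect. The following is only a sketch of an equivalent approach, not the method from the answer: it replaces duplicated with a running count of matches per bigram, and the names match, running and first_match are illustrative.

```python
import pandas as pd

data = [[28, ['first'], 'apple edible', 23, 'apple is an edible fruit'],
        [28, ['first'], 'apple edible', 34, 'fruit produced by an apple tree'],
        [28, ['first'], 'apple edible', 39, 'the apple is a pome edible fruit'],
        [21, ['second'], 'green plants', 11, 'plants are green'],
        [21, ['second'], 'green plants', 7, 'plant these perennial green flowers']]
df = pd.DataFrame(data, columns=['day', 'group', 'bigram', 'count', 'sentence'])

# Step 1: True where every word of the bigram occurs in the sentence.
match = pd.Series(
    [set(b.split()).issubset(s.split()) for s, b in zip(df['sentence'], df['bigram'])],
    index=df.index,
)

# Step 2: count the matches seen so far within each bigram.
# A row is the first match when it matches and the running count is 1.
running = match.astype(int).groupby(df['bigram']).cumsum()
first_match = match & running.eq(1)

print(pd.DataFrame({'match': match, 'running': running, 'first_match': first_match}))
```

Here match marks rows 0, 2 and 3, and first_match keeps only rows 0 and 3, the same rows as the new column above.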

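If the words of a bigram can appear in swapped order (for example 'apple edible' and 'edible apple') and should still count as the same bigram when looking for the first intersection, compare frozensets of the words instead of the raw strings: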

df['new'] = ~df.assign(bigram=df['bigram'].apply(lambda x: frozenset(x.split()))).duplicated(['bigram','new']) & df['new']
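
To see what the frozenset version changes, here is a small sketch on hypothetical data (not the data from the question) in which two rows hold the same two words in a different order. The names by_string, by_words and key are illustrative.

```python
import pandas as pd

# Hypothetical data: rows 0 and 1 contain the same bigram words in swapped order.
df = pd.DataFrame({
    'bigram': ['apple edible', 'edible apple', 'green plants'],
    'sentence': ['apple is an edible fruit',
                 'the apple is a pome edible fruit',
                 'plants are green'],
})

# True where every word of the bigram occurs in the sentence.
df['new'] = [set(b.split()).issubset(s.split())
             for s, b in zip(df['sentence'], df['bigram'])]

# Comparing raw strings: 'apple edible' and 'edible apple' are different keys,
# so both rows are treated as a first match.
by_string = ~df.duplicated(['bigram', 'new']) & df['new']

# Comparing frozensets of words: both rows share one key,
# so only the first of them stays True.
key = df['bigram'].apply(lambda x: frozenset(x.split()))
by_words = ~df.assign(bigram=key).duplicated(['bigram', 'new']) & df['new']

print(pd.DataFrame({'by_string': by_string, 'by_words': by_words}))
```

frozenset is used rather than set because duplicated needs hashable values to compare rows.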

byMR
