Home Merge pandas dataframes while keeping duplicate entries

Questions

Merge pandas dataframes while keeping duplicate entries

August 25, 2023

I have 2 dataframes, A and B, which contain different kinds of data. Dataframe A consists of multiple different measurements of an object which where taken at the same time as well as the corresponding timestamp. So a single timestamp will have multiple entries. Dataframe B contains additional data with timestamps that overlap with A. Similar to A the timestamps here also contain multiple entries for the same timestamp but not the same amount as A. B also has more rows than A.

I am interested in assigning a specific value from B to A where the timestamps match, creating a new column for A. For the same timestamp that value is going to be consistent.

My first instinct was to simply use the merge function, however this creates a much larger dataframe caused by the duplicate timestamps. To do that efficiently I extracted only the timestamps from A and the timestamps as well as the value from B.

merge = pd.DataFrame(A['time']).merge(B[['time', 'value']], on='time', how='left')
A['value'] = merge['value']

Is there a different method that takes the duplicate timestamps into consideration and handles them in the way I need it to?

EDIT: Here is a simplified example of how the data could look like. In this case I would like to assign Value1 to all occurences of ‘20.08.2023 20:00’ and Value2 to ‘21.08.2023 21:00’ in a new column in A.

Simplified description of the data structure

>Solution :

My assumption is that you have two dataframes, A and B, in which A has the following format:

time	measurement_1	measurement_2
1	.5	.4
1	.3	4

and B the format below

time	value	value_2
1	medium	short
1	medium	tall
1	medium	short

The first thing I would look at from a data science perspective, I do not understand why your B will have anything but a single row assigned to your B timestamp (unless if the format is simply timestamp | value where the values have not been split into segregate columns) nor your dataframe A. But, my suggestion would be as follows:

C = A.merge(B[['time', 'value']].drop_duplicates(), on='time', how='left')

The result is the dataframe below:

time	measurement_1	measurement_2	value
1	0.5	0.4	medium
1	0.3	4	medium

dataframe

byMR

Published August 25, 2023

Add a comment

google sheet vlookup function analysis error

byMR

August 25, 2023

Questions

'CreateThread' crashes with more than 2 threads

byMR

August 25, 2023

Questions

Adding values from a JavaScript object as classes on an element

byMR

August 25, 2023

Questions

revere array object based on year and month

byMR

August 25, 2023

Questions

C# type casting isn't checked when a function takes a dynamic object as input parameter

byMR

August 25, 2023

Questions

Comparing matching values randomly picked from a list

byMR

August 25, 2023

Merge pandas dataframes while keeping duplicate entries

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Like this:

Leave a ReplyCancel reply

Read more

google sheet vlookup function analysis error

'CreateThread' crashes with more than 2 threads

Adding values from a JavaScript object as classes on an element

revere array object based on year and month

C# type casting isn't checked when a function takes a dynamic object as input parameter

Comparing matching values randomly picked from a list

Keep Up to Date with the Most Important News

Merge pandas dataframes while keeping duplicate entries

MEDevel.com: Open-source for Healthcare and Education

>Solution :

Share this:

Like this:

Leave a ReplyCancel reply

Keep Up to Date with the Most Important News

Read more

google sheet vlookup function analysis error

'CreateThread' crashes with more than 2 threads

Adding values from a JavaScript object as classes on an element

revere array object based on year and month

C# type casting isn't checked when a function takes a dynamic object as input parameter

Comparing matching values randomly picked from a list

Discover more from Dev solutions