I have 2 dataframes, A and B, which contain different kinds of data. Dataframe A consists of multiple different measurements of an object which where taken at the same time as well as the corresponding timestamp. So a single timestamp will have multiple entries. Dataframe B contains additional data with timestamps that overlap with A. Similar to A the timestamps here also contain multiple entries for the same timestamp but not the same amount as A. B also has more rows than A.
I am interested in assigning a specific value from B to A where the timestamps match, creating a new column for A. For the same timestamp that value is going to be consistent.
My first instinct was to simply use the merge function, however this creates a much larger dataframe caused by the duplicate timestamps. To do that efficiently I extracted only the timestamps from A and the timestamps as well as the value from B.
merge = pd.DataFrame(A['time']).merge(B[['time', 'value']], on='time', how='left') A['value'] = merge['value']
Is there a different method that takes the duplicate timestamps into consideration and handles them in the way I need it to?
EDIT: Here is a simplified example of how the data could look like. In this case I would like to assign Value1 to all occurences of ‘20.08.2023 20:00’ and Value2 to ‘21.08.2023 21:00’ in a new column in A.
My assumption is that you have two dataframes, A and B, in which A has the following format:
and B the format below
The first thing I would look at from a data science perspective, I do not understand why your B will have anything but a single row assigned to your B timestamp (unless if the format is simply timestamp | value where the values have not been split into segregate columns) nor your dataframe A. But, my suggestion would be as follows:
C = A.merge(B[['time', 'value']].drop_duplicates(), on='time', how='left')
The result is the dataframe below: