I have 2 dataframes, A and B, which contain different kinds of data. Dataframe A consists of multiple different measurements of an object which where taken at the same time as well as the corresponding timestamp. So a single timestamp will have multiple entries. Dataframe B contains additional data with timestamps that overlap with A. Similar to A the timestamps here also contain multiple entries for the same timestamp but not the same amount as A. B also has more rows than A.
I am interested in assigning a specific value from B to A where the timestamps match, creating a new column for A. For the same timestamp that value is going to be consistent.
My first instinct was to simply use the merge function, however this creates a much larger dataframe caused by the duplicate timestamps. To do that efficiently I extracted only the timestamps from A and the timestamps as well as the value from B.
merge = pd.DataFrame(A['time']).merge(B[['time', 'value']], on='time', how='left')
A['value'] = merge['value']
Is there a different method that takes the duplicate timestamps into consideration and handles them in the way I need it to?
EDIT: Here is a simplified example of how the data could look like. In this case I would like to assign Value1 to all occurences of ‘20.08.2023 20:00’ and Value2 to ‘21.08.2023 21:00’ in a new column in A.
Simplified description of the data structure
>Solution :
My assumption is that you have two dataframes, A and B, in which A has the following format:
time | measurement_1 | measurement_2 |
---|---|---|
1 | .5 | .4 |
1 | .3 | 4 |
and B the format below
time | value | value_2 |
---|---|---|
1 | medium | short |
1 | medium | tall |
1 | medium | short |
The first thing I would look at from a data science perspective, I do not understand why your B will have anything but a single row assigned to your B timestamp (unless if the format is simply timestamp | value where the values have not been split into segregate columns) nor your dataframe A. But, my suggestion would be as follows:
C = A.merge(B[['time', 'value']].drop_duplicates(), on='time', how='left')
The result is the dataframe below:
time | measurement_1 | measurement_2 | value |
---|---|---|---|
1 | 0.5 | 0.4 | medium |
1 | 0.3 | 4 | medium |