
Comparing two dataframes of different lengths

I have two dataframes of different lengths:

len(df1) = 2400
len(df2) = 100

df1 =>

colA  colB  colC
0     1     2   
3     4     5 
6     7     8  
.
.
.
2400 rows.

df2 (its row count is 1/24 of the row count of df1) =>

colD  colE  colF
10    11    12
13    14    15
.
.
.
100 rows

Currently I get the following error, which is expected since the lengths are different:

comparison –

df1['colB'] > df2['colD']

Error –

ValueError: ('Lengths must match to compare', (2400,), (100,))

Requirement ->

I want to perform this comparison so that each block of 24 consecutive rows in df1 is compared to one row in df2, which would avoid this error:

(row1…row24 in df1 compared with row1 in df2)

(row25..row48 in df1 compared with row2 in df2)

and so on. Is there a way to do that?

PS: The comparison is between two specific columns of these dataframes, as shown above: colB and colD.

One way I could think of is to copy each row of df2 24 times, so that df2 grows to 2400 rows. But I'm not sure how to do that either, since I'm new to dataframes and numpy.

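Solution:

You can repeat each row of df2 24 times and then do the comparison:

# repeat every row of df2 24 times, then renumber the index 0..2399
df2_repeated = df2.loc[df2.index.repeat(24)].reset_index(drop=True)
df1['colB'] > df2_repeated['colD']

This assumes df1 has the default index 0..2399, so that the labels of the two columns line up.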
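
A variant that may be easier to test is to repeat only the column being compared, as a plain NumPy array, so index labels do not matter. The sketch below is a minimal example under stated assumptions: the data in df1 and df2 is made up for illustration, and the names rows_per_group and col_d_repeated are hypothetical, not from the question.

```python
import numpy as np
import pandas as pd

# Made-up stand-in data (assumption): 2400 rows in df1, 100 rows in df2.
df1 = pd.DataFrame({"colB": np.arange(2400)})
df2 = pd.DataFrame({"colD": np.arange(100) * 24 + 12})

# Number of consecutive df1 rows that map to one df2 row (24 here).
rows_per_group = len(df1) // len(df2)

# Repeat each colD value 24 times: [d0]*24, [d1]*24, ...
# The result is a NumPy array of length 2400, so no index alignment is involved.
col_d_repeated = np.repeat(df2["colD"].to_numpy(), rows_per_group)

# Element-wise comparison; the result is a boolean Series with df1's index.
result = df1["colB"] > col_d_repeated
```

Because the right-hand side is an array rather than a Series, this works regardless of how df1 is indexed, as long as len(df1) is exactly rows_per_group * len(df2).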