

Create 'correlation matrix' for two lists to check if the values have something in common

December 3, 2021

Could someone please help me out with the following?

I have a DataFrame with two columns, products and webshops (shape n x 2, one row per product). I would like to obtain a binary n x n matrix with all products as the index and all products as the column names, where each cell contains 1 or 0 depending on whether the product in that row and the product in that column come from the same webshop.
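To make the setup concrete, here is a small sketch of the input and the desired output. The product names (p1 to p4), the webshop names and the column labels are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical input: one row per product, column 1 holds the webshop
df_title = pd.DataFrame({
    "product": ["p1", "p2", "p3", "p4"],
    "webshop": ["shopA", "shopB", "shopA", "shopC"],
})

# Desired n x n output: 1 where the row and column products share a webshop
products = df_title["product"]
expected = pd.DataFrame(
    [[1, 0, 1, 0],
     [0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 0, 0, 1]],
    index=products,
    columns=products,
)
print(expected)
```

The matrix is symmetric and its diagonal is all ones, which the answer below takes advantage of.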

The following code produces the result I want:

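```python
import numpy as np
import pandas as pd

# df_title: n x 2 DataFrame; column 0 = product, column 1 = webshop
n = len(df_title)
dist = np.empty((n, n), dtype=int)

for i in range(n):
    for j in range(n):
        # 1 if products i and j come from the same webshop, else 0
        dist[i][j] = df_title.values[i][1] == df_title.values[j][1]
df = pd.DataFrame(dist)
```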

However, this code is already quite slow for n = 1624, so I was wondering whether anyone has an idea for a faster algorithm.

Thanks!

Solution:

It seems you're only interested in the element at position 1 of every row (the webshop column), so storing that column in a temporary variable for easier lookup could help:

```python
lookup = df_title.values[:, 1]
```
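A minimal sketch of what lookup holds, assuming (as the original code implies) that the webshop sits in the second column of df_title. The data and column names are hypothetical; iloc[:, 1].to_numpy() is shown as an equivalent way to pull out the same column without first building a 2-D array of both columns:

```python
import pandas as pd

# Hypothetical two-column frame: column 0 = product, column 1 = webshop
df_title = pd.DataFrame({
    "product": ["p1", "p2", "p3"],
    "webshop": ["shopA", "shopB", "shopA"],
})

# Extract the webshop column once, as a 1-D NumPy array
lookup = df_title.values[:, 1]

# Equivalent: select the second column first, then convert it
lookup_alt = df_title.iloc[:, 1].to_numpy()
```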

Also, since you want to interpret the result as a boolean matrix, you should specify dtype=bool (1 byte per element) instead of int (typically 8 bytes per element), which cuts memory consumption by up to a factor of 8:

```python
dist = np.empty((len(df_title), len(df_title)), dtype=bool)
```
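For a sense of scale, this sketch compares the memory footprint of an n x n boolean array with a 64-bit integer array for n = 1624, the size from the question. It uses np.int64 explicitly, because the default int type can be 32-bit on some platforms (for example Windows with NumPy versions before 2.0):

```python
import numpy as np

n = 1624  # the size mentioned in the question

# Same shape, two different element types
as_bool = np.empty((n, n), dtype=bool)
as_int64 = np.empty((n, n), dtype=np.int64)

print(as_bool.itemsize, as_int64.itemsize)  # 1 8
print(as_bool.nbytes, as_int64.nbytes)      # 2637376 21099008
```

So the boolean version needs about 2.6 MB, versus about 21 MB for 64-bit integers.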

The matrix is symmetric about the diagonal, so you only need to compute half of it (the entries with j > i) and mirror them. Also, when i == j the entry is always True, since a product trivially shares its own webshop.

```python
import numpy as np

# column 1 holds the webshop of each product
lookup = df_title.values[:, 1]
n = len(df_title)
dist = np.empty((n, n), dtype=bool)

for i in range(n):
    # diagonal: a product always shares its own webshop
    dist[i, i] = True
    # only visit the upper triangle (j > i) and mirror each result
    for j in range(i + 1, n):
        dist[i, j] = dist[j, i] = lookup[i] == lookup[j]
```
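To check that the triangular loop fills every entry and gives the same matrix as a full double loop, here is a small self-contained comparison on hypothetical random webshop labels. The function names full_loop and half_loop are made up for this sketch; each one also counts how many comparisons it performs:

```python
import numpy as np

def full_loop(lookup):
    """Compare every (i, j) pair, as in the question."""
    n = len(lookup)
    dist = np.empty((n, n), dtype=bool)
    count = 0
    for i in range(n):
        for j in range(n):
            dist[i, j] = lookup[i] == lookup[j]
            count += 1
    return dist, count

def half_loop(lookup):
    """Set the diagonal directly, compute the upper triangle and mirror it."""
    n = len(lookup)
    dist = np.empty((n, n), dtype=bool)
    count = 0
    for i in range(n):
        dist[i, i] = True
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = lookup[i] == lookup[j]
            count += 1
    return dist, count

# Hypothetical webshop labels for 50 products
rng = np.random.default_rng(0)
lookup = rng.choice(np.array(["shopA", "shopB", "shopC"]), size=50)

full, full_count = full_loop(lookup)
half, half_count = half_loop(lookup)
print(np.array_equal(full, half))  # True
print(full_count, half_count)      # 2500 1225
```

For n = 1624 that is 2,637,376 comparisons for the full loop versus 1,317,876 for the half loop, so symmetry roughly halves the work, but the loop is still O(n²) in pure Python.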

Using NumPy broadcasting, you can replace all of that with a single line, which is typically orders of magnitude faster than the double for-loop because the comparisons run inside NumPy's compiled code rather than in the Python interpreter:

```python
lookup = df_title.values[:, 1]
# a (1, n) row compared with an (n, 1) column broadcasts to an (n, n) boolean matrix
dist = lookup[None, :] == lookup[:, None]
```
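Below is a self-contained sketch of the broadcasting approach on a small hypothetical DataFrame, including the last step from the question: turning the boolean matrix into a 0/1 DataFrame with the product names as index and columns. The column names and data are assumptions for illustration. np.equal.outer is shown as an equivalent spelling of the same comparison:

```python
import numpy as np
import pandas as pd

# Hypothetical input: column 0 = product, column 1 = webshop
df_title = pd.DataFrame({
    "product": ["p1", "p2", "p3", "p4"],
    "webshop": ["shopA", "shopB", "shopA", "shopC"],
})

lookup = df_title.values[:, 1]
# (1, n) compared with (n, 1) broadcasts to an (n, n) boolean matrix
dist = lookup[None, :] == lookup[:, None]

# Same result via the ufunc's outer method
dist_outer = np.equal.outer(lookup, lookup)

# Label rows/columns with product names and store 0/1 as in the question
products = df_title["product"]
df = pd.DataFrame(dist.astype(int), index=products, columns=products)
print(df)
```

The result still has n² entries, so for n = 1624 the boolean matrix needs about 2.6 MB; for much larger n, memory rather than speed becomes the limit.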


by MR

Published December 03, 2021

