Pass the values from a dataset of indexes and values to a sparse NumPy array

I want to build a sparse NumPy array from the indexes and values stored in a pandas DataFrame.

The DataFrame has the columns 'userIndex', 'movieIndex' and 'rating', with a million rows.

For example:

   movieIndex  userIndex  rating
0           0          4     2.5
1           2          2     3.0
2           1          1     4.0
3           2          0     4.0
4           4          2     3.0

This would be transformed into a NumPy array like this:

[[0 0 0 0 2.5],
[0 4.0 0 0 0],
[4.0 0 3.0 0 0],
[0 0 0 0 0],
[0 0 3.0 0 0]]

First, I create a zero-filled array of the correct size (nm movies by nu users):

Y = np.zeros([nm,nu])

And for now, I’m passing the information as:

for i in range(len(ratings)):
    # Look up the row once, then write its rating at [movie, user]
    row = ratings.iloc[i]
    Y[int(row.movieIndex), int(row.userIndex)] = row.rating

This works and runs in O(n), but it takes 3 minutes to finish.
I know that looping over a DataFrame with "for" is not a good idea and that I should use vectorized operations instead, but I can't find a way to make that work. Any ideas?

Solution:

Assigning all values at once with integer-array indexing should be much faster, since it avoids the Python-level loop:

Y[ratings["movieIndex"].values, ratings["userIndex"].values] = ratings["rating"].values
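
As a minimal sketch of the one-line assignment above, here it is applied to the five sample rows from the question and checked against the original loop. Two things here are assumptions and not part of the original post: nm and nu are computed as the largest index plus one (the question does not show how they are defined), and the index columns are cast with .astype(int) as a precaution in case they were loaded as floats, because NumPy only accepts integer arrays as indices.

```python
import numpy as np
import pandas as pd

# Sample rows from the question
ratings = pd.DataFrame({
    "movieIndex": [0, 2, 1, 2, 4],
    "userIndex": [4, 2, 1, 0, 2],
    "rating": [2.5, 3.0, 4.0, 4.0, 3.0],
})

# Assumption: the matrix size is the largest index in each column plus one
nm = int(ratings["movieIndex"].max()) + 1
nu = int(ratings["userIndex"].max()) + 1

# Cast to int in case the index columns were read in as floats
rows = ratings["movieIndex"].to_numpy().astype(int)
cols = ratings["userIndex"].to_numpy().astype(int)

# Vectorized fill: one assignment instead of a Python loop
Y = np.zeros((nm, nu))
Y[rows, cols] = ratings["rating"].to_numpy()

# Reference result built with an explicit loop, for comparison only
Y_loop = np.zeros((nm, nu))
for row in ratings.itertuples(index=False):
    Y_loop[int(row.movieIndex), int(row.userIndex)] = row.rating

assert np.array_equal(Y, Y_loop)
print(Y)
```

If the same (movieIndex, userIndex) pair appears more than once in the data, NumPy does not guarantee which rating ends up in the array, so duplicates are worth removing or aggregating first.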