
Return the indexes of all samples in Python

I am a beginner in Python and have this data frame that contains samples, values, and a cluster number for each sample:

import pandas as pd

df = pd.DataFrame({'samples': ['A', 'B', 'C', 'D', 'E'],
                   'values': [0.336663, 0.447101, 0.402529, 0.373014, 0.456226],
                   'cluster': [1, 0, 2, 0, 1]})
df

output:

  samples    values  cluster
0       A  0.336663        1
1       B  0.447101        0
2       C  0.402529        2
3       D  0.373014        0
4       E  0.456226        1

The following code returns the index of the max-value sample of each cluster. For example, in cluster 0, B has the max value among the cluster's samples (here B and D), so the code returns B's index, which is 1. Likewise, cluster 1 contains A and E, and E has the max value, so E's index (here 4) is returned, and so on.

max_value = []  # list to store the max value of each cluster
clust_max = []  # list to store the index of the max-value sample of each cluster

tmp = df['values']
clust_labels = df['cluster']
clusters = len(set(clust_labels))  # number of distinct clusters (assumes labels 0..k-1)

# loop over the cluster labels
for j in range(clusters):
    elems = [i for i, x in enumerate(clust_labels) if x == j]  # indexes of the samples in cluster j
    values = [tmp[elem] for elem in elems]  # values of those samples
    max_value_temp = max(values)  # max value in cluster j
    max_value.append(max_value_temp)  # store the max value
    max_ind = values.index(max_value_temp)  # position of the max value within elems
    clust_max.append(elems[max_ind])  # store the index of the max-value sample

print(clust_max)

output:

[1, 4, 2]
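For comparison, pandas can compute the same per-cluster result without the manual loop. This is a minimal sketch, assuming the same `df` as above: `groupby` plus `idxmax` returns the row label of each cluster's largest value, and unlike `range(clusters)` it does not assume the cluster labels are exactly 0, 1, ..., k-1.

```python
import pandas as pd

df = pd.DataFrame({'samples': ['A', 'B', 'C', 'D', 'E'],
                   'values': [0.336663, 0.447101, 0.402529, 0.373014, 0.456226],
                   'cluster': [1, 0, 2, 0, 1]})

# idxmax gives, for each cluster, the row label of its largest value;
# groupby sorts the cluster keys, so the result is ordered 0, 1, 2
clust_max = df.groupby('cluster')['values'].idxmax().tolist()

# the matching max values, in the same cluster order
max_value = df.groupby('cluster')['values'].max().tolist()

print(clust_max)  # [1, 4, 2]
```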

I want to update this code to return the indexes of all samples, not only the index of the max-value sample in each cluster.

The expected output:

[0, 1, 2, 3, 4]
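The expected output `[0, 1, 2, 3, 4]` is simply every row label of `df`. Below is a minimal sketch of two possible readings of the request, assuming the same data: the flat list of all indexes, and (as a hypothetical alternative) the indexes of all samples grouped by cluster, largest value first. The names `all_idx` and `by_cluster` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'samples': ['A', 'B', 'C', 'D', 'E'],
                   'values': [0.336663, 0.447101, 0.402529, 0.373014, 0.456226],
                   'cluster': [1, 0, 2, 0, 1]})

# every sample's index, in the original row order
all_idx = df.index.tolist()
print(all_idx)  # [0, 1, 2, 3, 4]

# alternative: sort by value (largest first), then collect each cluster's
# row labels; groupby keeps the sorted row order inside every group
by_cluster = {int(c): g.index.tolist()
              for c, g in df.sort_values('values', ascending=False).groupby('cluster')}
print(by_cluster)  # {0: [1, 3], 1: [4, 0], 2: [2]}
```

The first entry of each list in `by_cluster` is the same max-value index the original loop returns.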

Solution:

I don't really get why you are using Java-style logic to work with Python; probably, as you mentioned, you are still new to it. I didn't quite get what you expect from the output, so I did something according to what I understood.

import pandas as pd

dfc = pd.DataFrame({'samples': ['A', 'B', 'C', 'D', 'E'],
                    'values': [0.336663, 0.447101, 0.402529, 0.373014, 0.456226],
                    'cluster': [1, 0, 2, 0, 1]})

# get the max value of each cluster using groupby
dfmax = dfc.groupby('cluster')['values'].max().to_frame()

# add the index of each cluster's max-value row as a column using groupby and idxmax
dfmax['idx'] = dfc.groupby('cluster')['values'].idxmax()

# you can sort by two columns, in this case values and cluster (or vice versa if you prefer);
# sorting by cluster first keeps each cluster's rows together, much like a groupby
# the Java-style loop isn't needed in Python; pandas offers a more Pythonic way to do this
dfsorted = dfc.sort_values(['values', 'cluster'], ascending=False)
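To see which sample indexes the sort produces, here is a small sketch on the same data (the variable names `by_value` and `by_cluster` are illustrative). The order of the sort columns matters: putting `values` first ranks all samples globally, while putting `cluster` first keeps each cluster's samples together.

```python
import pandas as pd

dfc = pd.DataFrame({'samples': ['A', 'B', 'C', 'D', 'E'],
                    'values': [0.336663, 0.447101, 0.402529, 0.373014, 0.456226],
                    'cluster': [1, 0, 2, 0, 1]})

# values first: every sample ranked from largest to smallest value
by_value = dfc.sort_values(['values', 'cluster'], ascending=False).index.tolist()

# cluster first: each cluster's samples stay together, largest value first within a cluster
by_cluster = dfc.sort_values(['cluster', 'values'], ascending=False).index.tolist()

print(by_value)    # [4, 1, 2, 3, 0]
print(by_cluster)  # [2, 4, 0, 1, 3]
```

With `ascending=False` the clusters appear in the order 2, 1, 0; passing `ascending=[True, False]` lists cluster 0 first while still ranking values from largest to smallest inside each cluster.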