Follow

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use
Contact

return the location of non null values

I have a dataframe that looks like this:

    0   1   2   3   4   5
0   NaN NaN 7.0 NaN NaN NaN
1   NaN NaN 9.0 NaN NaN NaN
2   5.0 NaN 3.0 NaN 9.0 NaN
3   NaN NaN NaN NaN NaN NaN
4   NaN NaN NaN NaN NaN 1.0

I am trying to return the location of non null values.
For e.g. 7.0 is in the first row and second column (0-2)

expected = ["0-2", "1-2", "2-0", "2-2", "2-4", "4-5"]

Dataframe:

MEDevel.com: Open-source for Healthcare and Education

Collecting and validating open-source software for healthcare, education, enterprise, development, medical imaging, medical records, and digital pathology.

Visit Medevel

mylist=[[np.nan, np.nan, 7, np.nan, np.nan, np.nan],[np.nan, np.nan, 9, np.nan, np.nan, np.nan],[5, np.nan, 3, np.nan, 9, np.nan],[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],[np.nan, np.nan,np.nan, np.nan,np.nan, 1]]

df = pd.DataFrame(mylist)

Update:

I am getting duplicate rows in the list. For e.g. 34-35 is same as 35-34

out = ['34-35', '35-34',
 '41-42', '42-41',
 '46-47', '47-46',
 '59-63', '63-59',
 '75-76', '76-75',
 '87-88', '88-87']

I need to remove the duplicates and get the unique values like:

expected = ['34-35', '41-42', '46-47', '59-63', '75-76', '87-88']

>Solution :

Use list comprehension with DataFrame.stack:

out = [f'{i}-{c}' for i, c in df.stack().index]
print (out)
['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']

Or numpy.where for indices;

Solution if columns and index are default RangeIndex:

ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(ro, co)]
print (out)
['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']

If not, use indexing:

df = df.rename(index = lambda x: f'i{x}', columns = lambda x: f'c{x}')
print (df)
     c0  c1   c2  c3   c4   c5
i0  NaN NaN  7.0 NaN  NaN  NaN
i1  NaN NaN  9.0 NaN  NaN  NaN
i2  5.0 NaN  3.0 NaN  9.0  NaN
i3  NaN NaN  NaN NaN  NaN  NaN
i4  NaN NaN  NaN NaN  NaN  1.0

ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(df.index[ro], df.columns[co])]
print (out)
['i0-c2', 'i1-c2', 'i2-c0', 'i2-c2', 'i2-c4', 'i4-c5']

EDIT: If need remove sorted duplicates:

out = pd.unique(['-'.join(map(str, sorted(x))) for x in df.stack().index]).tolist()
print (out)
['0-2', '1-2', '2-2', '2-4', '4-5']

ro, co = np.where(df.notna())
out = pd.unique(['-'.join(map(str, sorted(x))) 
                 for x in zip(df.index[ro], df.columns[co])]).tolist()
print (out)
['0-2', '1-2', '2-2', '2-4', '4-5']
Add a comment

Leave a Reply

Keep Up to Date with the Most Important News

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Discover more from Dev solutions

Subscribe now to keep reading and get access to the full archive.

Continue reading