return the location of non null values

Advertisements

I have a dataframe that looks like this:

    0   1   2   3   4   5
0   NaN NaN 7.0 NaN NaN NaN
1   NaN NaN 9.0 NaN NaN NaN
2   5.0 NaN 3.0 NaN 9.0 NaN
3   NaN NaN NaN NaN NaN NaN
4   NaN NaN NaN NaN NaN 1.0

I am trying to return the location of non null values.
For e.g. 7.0 is in the first row and second column (0-2)

expected = ["0-2", "1-2", "2-0", "2-2", "2-4", "4-5"]

Dataframe:

mylist=[[np.nan, np.nan, 7, np.nan, np.nan, np.nan],[np.nan, np.nan, 9, np.nan, np.nan, np.nan],[5, np.nan, 3, np.nan, 9, np.nan],[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],[np.nan, np.nan,np.nan, np.nan,np.nan, 1]]

df = pd.DataFrame(mylist)

Update:

I am getting duplicate rows in the list. For e.g. 34-35 is same as 35-34

out = ['34-35', '35-34',
 '41-42', '42-41',
 '46-47', '47-46',
 '59-63', '63-59',
 '75-76', '76-75',
 '87-88', '88-87']

I need to remove the duplicates and get the unique values like:

expected = ['34-35', '41-42', '46-47', '59-63', '75-76', '87-88']

>Solution :

Use list comprehension with DataFrame.stack:

out = [f'{i}-{c}' for i, c in df.stack().index]
print (out)
['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']

Or numpy.where for indices;

Solution if columns and index are default RangeIndex:

ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(ro, co)]
print (out)
['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']

If not, use indexing:

df = df.rename(index = lambda x: f'i{x}', columns = lambda x: f'c{x}')
print (df)
     c0  c1   c2  c3   c4   c5
i0  NaN NaN  7.0 NaN  NaN  NaN
i1  NaN NaN  9.0 NaN  NaN  NaN
i2  5.0 NaN  3.0 NaN  9.0  NaN
i3  NaN NaN  NaN NaN  NaN  NaN
i4  NaN NaN  NaN NaN  NaN  1.0

ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(df.index[ro], df.columns[co])]
print (out)
['i0-c2', 'i1-c2', 'i2-c0', 'i2-c2', 'i2-c4', 'i4-c5']

EDIT: If need remove sorted duplicates:

out = pd.unique(['-'.join(map(str, sorted(x))) for x in df.stack().index]).tolist()
print (out)
['0-2', '1-2', '2-2', '2-4', '4-5']

ro, co = np.where(df.notna())
out = pd.unique(['-'.join(map(str, sorted(x))) 
                 for x in zip(df.index[ro], df.columns[co])]).tolist()
print (out)
['0-2', '1-2', '2-2', '2-4', '4-5']

Dev solutions

Solutions for development problems

return the location of non null values

>Solution :

Leave a ReplyCancel reply

>Solution :

Share this:

Leave a ReplyCancel reply

Discover more from Dev solutions