Advertisements
I have a dataframe that looks like this:
0 1 2 3 4 5
0 NaN NaN 7.0 NaN NaN NaN
1 NaN NaN 9.0 NaN NaN NaN
2 5.0 NaN 3.0 NaN 9.0 NaN
3 NaN NaN NaN NaN NaN NaN
4 NaN NaN NaN NaN NaN 1.0
I am trying to return the location of non null values.
For e.g. 7.0 is in the first row and second column (0-2)
expected = ["0-2", "1-2", "2-0", "2-2", "2-4", "4-5"]
Dataframe:
mylist=[[np.nan, np.nan, 7, np.nan, np.nan, np.nan],[np.nan, np.nan, 9, np.nan, np.nan, np.nan],[5, np.nan, 3, np.nan, 9, np.nan],[np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],[np.nan, np.nan,np.nan, np.nan,np.nan, 1]]
df = pd.DataFrame(mylist)
Update:
I am getting duplicate rows in the list. For e.g. 34-35 is same as 35-34
out = ['34-35', '35-34',
'41-42', '42-41',
'46-47', '47-46',
'59-63', '63-59',
'75-76', '76-75',
'87-88', '88-87']
I need to remove the duplicates and get the unique values like:
expected = ['34-35', '41-42', '46-47', '59-63', '75-76', '87-88']
>Solution :
Use list comprehension with DataFrame.stack
:
out = [f'{i}-{c}' for i, c in df.stack().index]
print (out)
['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']
Or numpy.where
for indices;
Solution if columns and index are default RangeIndex
:
ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(ro, co)]
print (out)
['0-2', '1-2', '2-0', '2-2', '2-4', '4-5']
If not, use indexing:
df = df.rename(index = lambda x: f'i{x}', columns = lambda x: f'c{x}')
print (df)
c0 c1 c2 c3 c4 c5
i0 NaN NaN 7.0 NaN NaN NaN
i1 NaN NaN 9.0 NaN NaN NaN
i2 5.0 NaN 3.0 NaN 9.0 NaN
i3 NaN NaN NaN NaN NaN NaN
i4 NaN NaN NaN NaN NaN 1.0
ro, co = np.where(df.notna())
out = [f'{i}-{c}' for i, c in zip(df.index[ro], df.columns[co])]
print (out)
['i0-c2', 'i1-c2', 'i2-c0', 'i2-c2', 'i2-c4', 'i4-c5']
EDIT: If need remove sorted duplicates:
out = pd.unique(['-'.join(map(str, sorted(x))) for x in df.stack().index]).tolist()
print (out)
['0-2', '1-2', '2-2', '2-4', '4-5']
ro, co = np.where(df.notna())
out = pd.unique(['-'.join(map(str, sorted(x)))
for x in zip(df.index[ro], df.columns[co])]).tolist()
print (out)
['0-2', '1-2', '2-2', '2-4', '4-5']