I have a dataset with the following columns and observations:
Source Target Label_Source Label_Target
E N 0.0 0.0
A B 1.0 1.0
A C 1.0 0.0
A D 1.0 0.0
A N 1.0 0.0
S G 0.0 0.0
S L 0.0 1.0
S C 0.0 0.0
Whoever built the dataset did not split it into an edgelist and node attributes, so I would now like to create these two separate datasets.
My idea is to select the unique nodes in the network and create a map between the nodes and their corresponding label values. Note that Label_Source is assigned to the source node and Label_Target is assigned to the target node. There is no overlap between the two in the network (at least, there should not be).
My expected output would be:
- edgelist (just by dropping the label columns):
Source Target
E N
A B
A C
A D
A N
S G
S L
S C
- nodelist with attributes:
Node Label
E 0
N 0
A 1
B 1
C 0
D 0
S 0
G 0
L 1
Can you please tell me how to build the nodelist with this mapping? I guess one option would be to select the distinct elements from both Source and Target and then, for each of them, look up its label in the Label_Source or Label_Target column.
>Solution :
Try:
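import pandas as pd

# Edge list: keep only the two endpoint columns.
edgelist = df[['Source', 'Target']]
# Node list: stack the (Source, Label_Source) and (Target, Label_Target) pairs, then de-duplicate.
nodelist = (pd.concat([pd.DataFrame(df.filter(like='Source').to_numpy()),
                       pd.DataFrame(df.filter(like='Target').to_numpy())])
              .rename(columns={0: 'Node', 1: 'Label'}).astype({'Label': int})
              .drop_duplicates().reset_index(drop=True))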
Output:
>>> edgelist
Source Target
0 E N
1 A B
2 A C
3 A D
4 A N
5 S G
6 S L
7 S C
>>> nodelist
Node Label
0 E 0
1 A 1
2 S 0
3 N 0
4 B 1
5 C 0
6 D 0
7 G 0
8 L 1
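If you prefer to name the columns explicitly instead of relying on filter(like=...), here is a minimal sketch of the same idea. It rebuilds the sample frame from the question so it runs on its own. The helper name build_nodelist and the conflict check are my own additions, not part of the answer above: I am assuming the "no overlap" remark in the question means that a node should never carry two different labels, so the sketch raises an error if that assumption is violated.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Source": ["E", "A", "A", "A", "A", "S", "S", "S"],
    "Target": ["N", "B", "C", "D", "N", "G", "L", "C"],
    "Label_Source": [0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "Label_Target": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
})


def build_nodelist(frame):
    """Hypothetical helper: one row per node with its label."""
    # Give both (node, label) pairs the same column names so they can be stacked.
    src = frame[["Source", "Label_Source"]].rename(
        columns={"Source": "Node", "Label_Source": "Label"})
    tgt = frame[["Target", "Label_Target"]].rename(
        columns={"Target": "Node", "Label_Target": "Label"})
    nodes = pd.concat([src, tgt], ignore_index=True).drop_duplicates()

    # Assumption: a node must not have two different labels.
    # After dropping exact duplicates, a repeated node name means a conflict.
    conflicts = nodes.loc[nodes["Node"].duplicated(keep=False), "Node"].unique()
    if len(conflicts) > 0:
        raise ValueError(f"Conflicting labels for nodes: {sorted(conflicts)}")

    return nodes.astype({"Label": int}).reset_index(drop=True)


edgelist = df[["Source", "Target"]]
nodelist = build_nodelist(df)

# Optional: the node -> label map mentioned in the question.
label_of = dict(zip(nodelist["Node"], nodelist["Label"]))
```

On the sample data this should give the same nine rows as the nodelist shown above, and label_of["L"] is 1. Whether raising on a conflict is the right behaviour depends on your data; you could just as well log the offending nodes and keep the first label.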