From a dataset, create an edgelist and a node attribute list

December 15, 2021

I have a dataset with the following columns and observations:

    Source  Target  Label_Source  Label_Target
    E       N       0.0           0.0
    A       B       1.0           1.0
    A       C       1.0           0.0
    A       D       1.0           0.0
    A       N       1.0           0.0
    S       G       0.0           0.0
    S       L       0.0           1.0
    S       C       0.0           0.0

Whoever built the dataset did not split it into an edgelist and node attributes, so I would like to create these two separate datasets.
My idea is to select the unique nodes in the network and create a map between each node and its label value. Note that Label_Source is assigned to the source node and Label_Target is assigned to the target node. There is no overlap between the two in the network (at least, there should not be).
My expected output would be:

edgelist (just by dropping the Labels columns):

Source Target
E N
A B
A C
A D
A N
S G
S L
S C

nodelist with attributes:

 Node    Label
 E          0
 N          0
 A          1
 B          1
 C          0
 D          0
 S          0
 G          0
 L          1

Can you please tell me how to build the nodelist from this mapping? I guess one option would be to select the distinct elements from both Source and Target and then, for each of them, look up its label in the Label_Source or Label_Target column.

Solution:

Try:

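import pandas as pd

# Edge list: keep only the two endpoint columns.
edgelist = df[['Source', 'Target']]
# Node list: stack (Source, Label_Source) on top of (Target, Label_Target),
# then drop repeated (node, label) pairs. Assumes the node column precedes its label column.
nodelist = pd.concat([pd.DataFrame(df.filter(like='Source').to_numpy()),
                      pd.DataFrame(df.filter(like='Target').to_numpy())]) \
             .rename(columns={0: 'Node', 1: 'Label'}).astype({'Label': int}) \
             .drop_duplicates().reset_index(drop=True)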

Output:

>>> edgelist
  Source Target
0      E      N
1      A      B
2      A      C
3      A      D
4      A      N
5      S      G
6      S      L
7      S      C

>>> nodelist
  Node  Label
0    E      0
1    A      1
2    S      0
3    N      0
4    B      1
5    C      0
6    D      0
7    G      0
8    L      1
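
The question mentions that a node's label should not differ between Label_Source and Label_Target. If you want to verify that rather than assume it, here is a minimal self-contained sketch. It rebuilds the sample data from the question, selects the columns by name instead of by substring, and raises an error if any node carries more than one label. The names `pairs`, `conflicts` and `node_to_label` are only illustrative, and raising a `ValueError` is just one possible way to handle a conflict.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Source": ["E", "A", "A", "A", "A", "S", "S", "S"],
    "Target": ["N", "B", "C", "D", "N", "G", "L", "C"],
    "Label_Source": [0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0],
    "Label_Target": [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0],
})

# Edge list: just the two endpoint columns.
edgelist = df[["Source", "Target"]]

# Stack the (node, label) pairs from both roles into one long table.
source_pairs = df[["Source", "Label_Source"]].rename(
    columns={"Source": "Node", "Label_Source": "Label"})
target_pairs = df[["Target", "Label_Target"]].rename(
    columns={"Target": "Node", "Label_Target": "Label"})
pairs = pd.concat([source_pairs, target_pairs], ignore_index=True)

# A node is inconsistent if it appears with more than one distinct label.
labels_per_node = pairs.groupby("Node")["Label"].nunique()
conflicts = labels_per_node[labels_per_node > 1].index.tolist()
if conflicts:
    raise ValueError(f"Nodes with conflicting labels: {conflicts}")

# One row per node, in order of first appearance.
nodelist = (pairs.drop_duplicates()
                 .astype({"Label": int})
                 .reset_index(drop=True))

# The node -> label map described in the question.
node_to_label = dict(zip(nodelist["Node"], nodelist["Label"]))
```

With the sample data there are no conflicts, so `nodelist` has the same nine rows as above and `node_to_label` can be used directly as a node attribute lookup.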