I have a dataset made up of several columns, like this:
patient, action, org:resource, DateTime
patient 0, First consult, Dr. Anna, 2017-01-02 11:40:11
patient 0, Blood test, Lab, 2017-01-02 12:47:33
patient 0, Physical test, Nurse Jesse, 2017-01-02 12:53:50
patient 0, Second consult, Dr. Anna, 2017-01-02 16:21:06
patient 0, Surgery, Dr. Charlie, 2017-01-05 13:23:09
patient 0, Final consult, Dr. Ben, 2017-01-09 08:29:28
patient 1, First consult, Dr. Anna, 2017-01-02 12:50:35
patient 1, Physical test, Nurse Jesse, 2017-01-02 13:59:14
patient 1, Blood test, Lab, 2017-01-02 14:20:19
patient 1, X-ray scan, Team 1, 2017-01-06 09:13:40
patient 1, Second consult, Dr. Anna, 2017-01-06 10:38:04
patient 1, Medicine, Pharmacy, 2017-01-06 11:47:36
I want to get a list containing every value of the "resource" column exactly once, i.e. just the labels without repetition.
Something like this:
resources = ['Dr. Anna', 'Lab', 'Nurse Jesse', 'Dr. Charlie', 'Dr. Ben', 'Team 1', 'Pharmacy']
How can I get it?
I was thinking about something like this:
import pandas as pd

fn = 'data2.csv'
events = pd.read_csv(fn)
# give the columns short names
events.columns = ['patient', 'action', 'resource', 'datetime']
resourcenums = [e for (i, e) in enumerate(events['resource'])]
But I know this is not the correct way: it just copies every value, duplicates included.
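To see why, here is a minimal sketch that runs the attempt above on the sample rows. It assumes the rows are pasted into a string (the name SAMPLE is hypothetical) and read through io.StringIO instead of data2.csv:

```python
import io

import pandas as pd

# Sample rows from the question (hypothetical stand-in for data2.csv),
# with a space after every comma as in the original file
SAMPLE = """patient, action, org:resource, DateTime
patient 0, First consult, Dr. Anna, 2017-01-02 11:40:11
patient 0, Blood test, Lab, 2017-01-02 12:47:33
patient 0, Physical test, Nurse Jesse, 2017-01-02 12:53:50
patient 0, Second consult, Dr. Anna, 2017-01-02 16:21:06
patient 0, Surgery, Dr. Charlie, 2017-01-05 13:23:09
patient 0, Final consult, Dr. Ben, 2017-01-09 08:29:28
patient 1, First consult, Dr. Anna, 2017-01-02 12:50:35
patient 1, Physical test, Nurse Jesse, 2017-01-02 13:59:14
patient 1, Blood test, Lab, 2017-01-02 14:20:19
patient 1, X-ray scan, Team 1, 2017-01-06 09:13:40
patient 1, Second consult, Dr. Anna, 2017-01-06 10:38:04
patient 1, Medicine, Pharmacy, 2017-01-06 11:47:36
"""

events = pd.read_csv(io.StringIO(SAMPLE))
events.columns = ['patient', 'action', 'resource', 'datetime']

# enumerate yields (index, value) pairs and only the value is kept,
# so this is just a copy of the whole column
resourcenums = [e for (i, e) in enumerate(events['resource'])]

print(len(resourcenums))      # 12: one entry per row, duplicates included
print(repr(resourcenums[0]))  # ' Dr. Anna': the space after the comma is kept
```

The comprehension returns all 12 rows, and because skipinitialspace is not set, every value also keeps the leading space from the file.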
Solution:
First add skipinitialspace=True to read_csv, so the spaces after each comma are dropped and the column can be selected by its original name org:resource. Then remove the duplicates while keeping the original order:
events = pd.read_csv(fn, skipinitialspace=True)
resources = list(dict.fromkeys(events['org:resource']))
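A self-contained sketch of this first option on the sample rows, again assuming they are held in a hypothetical SAMPLE string read via io.StringIO rather than data2.csv. It also shows what skipinitialspace changes in the header:

```python
import io

import pandas as pd

# Sample rows from the question (hypothetical stand-in for data2.csv)
SAMPLE = """patient, action, org:resource, DateTime
patient 0, First consult, Dr. Anna, 2017-01-02 11:40:11
patient 0, Blood test, Lab, 2017-01-02 12:47:33
patient 0, Physical test, Nurse Jesse, 2017-01-02 12:53:50
patient 0, Second consult, Dr. Anna, 2017-01-02 16:21:06
patient 0, Surgery, Dr. Charlie, 2017-01-05 13:23:09
patient 0, Final consult, Dr. Ben, 2017-01-09 08:29:28
patient 1, First consult, Dr. Anna, 2017-01-02 12:50:35
patient 1, Physical test, Nurse Jesse, 2017-01-02 13:59:14
patient 1, Blood test, Lab, 2017-01-02 14:20:19
patient 1, X-ray scan, Team 1, 2017-01-06 09:13:40
patient 1, Second consult, Dr. Anna, 2017-01-06 10:38:04
patient 1, Medicine, Pharmacy, 2017-01-06 11:47:36
"""

# Without skipinitialspace the header keeps the space after each comma
raw = pd.read_csv(io.StringIO(SAMPLE))
print(list(raw.columns))     # ['patient', ' action', ' org:resource', ' DateTime']

# skipinitialspace=True drops those spaces, in the header and in the values
events = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)
print(list(events.columns))  # ['patient', 'action', 'org:resource', 'DateTime']

# dict keys are unique and keep insertion order,
# so this removes duplicates while preserving first-appearance order
resources = list(dict.fromkeys(events['org:resource']))
print(resources)
# ['Dr. Anna', 'Lab', 'Nurse Jesse', 'Dr. Charlie', 'Dr. Ben', 'Team 1', 'Pharmacy']
```

dict.fromkeys relies on dicts keeping insertion order, which is guaranteed from Python 3.7 on.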
Or:
resources = list(pd.unique(events['org:resource']))
Or:
resources = list(events['org:resource'].drop_duplicates())
print(resources)
['Dr. Anna', 'Lab', 'Nurse Jesse', 'Dr. Charlie', 'Dr. Ben', 'Team 1', 'Pharmacy']
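A quick check, under the same assumption (a hypothetical SAMPLE string read via io.StringIO instead of data2.csv), that the three options return the same list. pd.unique gives back an array rather than a list, hence the list() around it; the via_* names are only for illustration:

```python
import io

import pandas as pd

# Sample rows from the question (hypothetical stand-in for data2.csv)
SAMPLE = """patient, action, org:resource, DateTime
patient 0, First consult, Dr. Anna, 2017-01-02 11:40:11
patient 0, Blood test, Lab, 2017-01-02 12:47:33
patient 0, Physical test, Nurse Jesse, 2017-01-02 12:53:50
patient 0, Second consult, Dr. Anna, 2017-01-02 16:21:06
patient 0, Surgery, Dr. Charlie, 2017-01-05 13:23:09
patient 0, Final consult, Dr. Ben, 2017-01-09 08:29:28
patient 1, First consult, Dr. Anna, 2017-01-02 12:50:35
patient 1, Physical test, Nurse Jesse, 2017-01-02 13:59:14
patient 1, Blood test, Lab, 2017-01-02 14:20:19
patient 1, X-ray scan, Team 1, 2017-01-06 09:13:40
patient 1, Second consult, Dr. Anna, 2017-01-06 10:38:04
patient 1, Medicine, Pharmacy, 2017-01-06 11:47:36
"""

events = pd.read_csv(io.StringIO(SAMPLE), skipinitialspace=True)
col = events['org:resource']

# 1. dict keys: unique and insertion-ordered
via_dict = list(dict.fromkeys(col))
# 2. pd.unique returns an array (not a list) in order of appearance
via_unique = list(pd.unique(col))
# 3. drop_duplicates keeps the first occurrence (keep='first' is the default)
via_drop = col.drop_duplicates().tolist()

print(via_dict == via_unique == via_drop)  # True
```

A plain set() would also remove the duplicates, but it does not keep the original order, which is why the order-preserving options are used here.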