I have an array:
array([nan, 'Stressful day', 'Drank coffee:Drank tea', 'Drank tea',
'Ate late:Drank coffee', 'Drank coffee:Drank tea:Worked out',
'Drank tea:Worked out', 'Drank coffee:Drank tea:Stressful day',
'Drank coffee', 'Drank coffee:Drank tea:Stressful day:Worked out',
'Drank coffee:Worked out', 'Ate late:Drank coffee:Drank tea',
'Ate late:Drank coffee:Drank tea:Worked out',
'Drank tea:Stressful day', 'Drank tea:Stressful day:Worked out',
'Drank coffee:Stressful day:Worked out',
'Drank coffee:Stressful day',
'Ate late:Drank coffee:Drank tea:Stressful day', 'Worked out',
'Ate late:Drank coffee:Worked out'], dtype=object)
These are the unique values from a column of a DataFrame.
As you can see, they are combinations of other values: for example, ‘Drank coffee:Drank tea’ is a combination of ‘Drank coffee’ and ‘Drank tea’. I want the unique individual elements from this array.
What’s the quickest way to create that list? Is there a built-in function in the Python libraries for this sort of thing?
Expected output:
array([nan, 'Stressful day', 'Drank coffee', 'Drank tea', 'Ate late',
'Worked out'], dtype=object)
>Solution :
Assuming the input array is named a, you could use str.extractall:
out = pd.Series(a).str.extractall('([^:]+)')[0].unique()
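Starting from the original Series s instead (Series.unique returns an array, which has no drop_duplicates method, so call drop_duplicates directly on the Series):
out = s.drop_duplicates().str.extractall('([^:]+)')[0].unique()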
Output (NaN is dropped, because extractall only returns rows that match):
array(['Stressful day', 'Drank coffee', 'Drank tea', 'Ate late',
'Worked out'], dtype=object)
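As a minimal, self-contained sketch of the extractall approach (the shortened array below is a stand-in for the full one in the question):

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the array in the question
a = np.array([np.nan, 'Stressful day', 'Drank coffee:Drank tea',
              'Ate late:Drank coffee', 'Drank tea:Worked out'], dtype=object)

# One row per match of "one or more characters that are not ':'";
# the result is a DataFrame with a single capture-group column, 0
parts = pd.Series(a).str.extractall('([^:]+)')

# Unique values in order of first appearance; the NaN row has no match
out = parts[0].unique()
print(out)
```

The regex treats every run of non-colon characters as one element, so this assumes ':' never occurs inside an element name.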
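Other options (possibly less efficient):
# pure Python: returns a set, so order is not preserved and NaN is skipped
out = set(x for s in a if isinstance(s, str) for x in s.split(':'))
# pandas: keeps NaN and the order of first appearance
out = pd.Series(a).str.split(':').explode().unique()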
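The two options above differ in how they treat NaN and ordering. A small sketch on a shortened stand-in array (the names as_set and exploded are only illustrative):

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the array in the question
a = np.array([np.nan, 'Stressful day', 'Drank coffee:Drank tea',
              'Ate late:Drank coffee'], dtype=object)

# Pure Python: the isinstance check skips NaN; a set has no defined order
as_set = {x for s in a if isinstance(s, str) for x in s.split(':')}

# pandas: split leaves NaN as NaN, explode gives one row per element,
# and unique keeps the order of first appearance
exploded = pd.Series(a).str.split(':').explode().unique()

print(as_set)
print(exploded)
```

Because the split/explode variant keeps NaN, it is the one that matches the expected output in the question, which lists nan first.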
Alternative interpretation
If you instead want to split each individual string into its parts:
out = pd.Series(a).str.split(':')
# or
out = pd.Series(a).str.findall('([^:]+)')
Output:
0 NaN
1 [Stressful day]
2 [Drank coffee, Drank tea]
3 [Drank tea]
4 [Ate late, Drank coffee]
5 [Drank coffee, Drank tea, Worked out]
6 [Drank tea, Worked out]
7 [Drank coffee, Drank tea, Stressful day]
8 [Drank coffee]
9 [Drank coffee, Drank tea, Stressful day, Worke...
10 [Drank coffee, Worked out]
11 [Ate late, Drank coffee, Drank tea]
12 [Ate late, Drank coffee, Drank tea, Worked out]
13 [Drank tea, Stressful day]
14 [Drank tea, Stressful day, Worked out]
15 [Drank coffee, Stressful day, Worked out]
16 [Drank coffee, Stressful day]
17 [Ate late, Drank coffee, Drank tea, Stressful ...
18 [Worked out]
19 [Ate late, Drank coffee, Worked out]
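For the data in the question, split and findall give the same per-row lists. They can differ when a string contains an empty segment, as in this small sketch (the 'a::b' input is a made-up edge case, not from the question):

```python
import numpy as np
import pandas as pd

# Shortened stand-in for the array in the question
s = pd.Series(np.array([np.nan, 'Drank coffee:Drank tea', 'Worked out'],
                       dtype=object))

by_split = s.str.split(':')            # NaN stays NaN
by_findall = s.str.findall('([^:]+)')  # NaN stays NaN

# Hypothetical edge case: two consecutive separators
edge = pd.Series(['a::b'])
split_edge = edge.str.split(':')[0]             # keeps the empty string
findall_edge = edge.str.findall('([^:]+)')[0]   # skips the empty segment

print(by_split)
print(split_edge, findall_edge)
```

If empty segments can occur in the real column, findall (or extractall) avoids getting '' as an element.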