Splitting the elements of a list by some separator in the same list

I have an array:

array([nan, 'Stressful day', 'Drank coffee:Drank tea', 'Drank tea',
       'Ate late:Drank coffee', 'Drank coffee:Drank tea:Worked out',
       'Drank tea:Worked out', 'Drank coffee:Drank tea:Stressful day',
       'Drank coffee', 'Drank coffee:Drank tea:Stressful day:Worked out',
       'Drank coffee:Worked out', 'Ate late:Drank coffee:Drank tea',
       'Ate late:Drank coffee:Drank tea:Worked out',
       'Drank tea:Stressful day', 'Drank tea:Stressful day:Worked out',
       'Drank coffee:Stressful day:Worked out',
       'Drank coffee:Stressful day',
       'Ate late:Drank coffee:Drank tea:Stressful day', 'Worked out',
       'Ate late:Drank coffee:Worked out'], dtype=object)

These are the unique values from one column of a DataFrame.

As you can see, some of them are combinations of other values joined by ':'. For example, 'Drank coffee:Drank tea' is a combination of 'Drank coffee' and 'Drank tea'. I want the unique individual elements from this list.

What's the quickest way to create that list? Is there a built-in function in the Python libraries for this sort of thing?

Expected output:

array([nan, 'Stressful day', 'Drank coffee', 'Drank tea', 'Ate late',
       'Worked out'], dtype=object)

Solution:

Assuming a is the input array (and pandas is imported as pd), you could use str.extractall:

out = pd.Series(a).str.extractall('([^:]+)')[0].unique()

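Starting from the original Series s (the DataFrame column), drop the duplicates first. Note that s.unique() returns a NumPy array, which has no drop_duplicates or .str accessor, so call drop_duplicates on the Series directly:

out = s.drop_duplicates().str.extractall('([^:]+)')[0].unique()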

Output (NaN is dropped, because extractall only returns rows for actual matches):

array(['Stressful day', 'Drank coffee', 'Drank tea', 'Ate late',
       'Worked out'], dtype=object)
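
To make the extractall approach easier to try, here is a minimal, self-contained sketch. It uses a shortened sample of the array from the question (the sample and the intermediate name matches are chosen only for illustration):

```python
import numpy as np
import pandas as pd

# Shortened sample of the question's array (illustrative subset)
a = np.array([np.nan, 'Stressful day', 'Drank coffee:Drank tea',
              'Ate late:Drank coffee', 'Drank tea:Worked out'], dtype=object)

# extractall returns one row per regex match; NaN entries yield no rows
matches = pd.Series(a).str.extractall('([^:]+)')

# Column 0 holds the single capture group; unique() keeps first-seen order
out = matches[0].unique()
print(out)
```

Since NaN produces no match, it does not appear in the result. If NaN has to be kept, as in the expected output of the question, a split-based approach is the simpler choice.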

Other options (maybe less efficient):

# plain Python: returns an unordered set and skips NaN
out = set(x for s in a if isinstance(s, str) for x in s.split(':'))

out = pd.Series(a).str.split(':').explode().unique()
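
The two options above differ in how they treat NaN and ordering. The following sketch, again on a shortened sample of the question's array, shows the difference (the names parts, with_nan, without_nan and as_set are illustrative only):

```python
import numpy as np
import pandas as pd

# Shortened sample of the question's array (illustrative subset)
a = np.array([np.nan, 'Stressful day', 'Drank coffee:Drank tea',
              'Ate late:Drank coffee', 'Drank tea:Worked out'], dtype=object)

# split turns each string into a list (NaN stays NaN);
# explode then produces one row per list element
parts = pd.Series(a).str.split(':').explode()

with_nan = parts.unique()              # NaN is kept, order of first appearance
without_nan = parts.dropna().unique()  # same values with NaN removed

# The set comprehension skips NaN and has no guaranteed order,
# so sort it if a stable order is needed
as_set = {x for s in a if isinstance(s, str) for x in s.split(':')}
print(with_nan)
print(sorted(as_set))
```

On this sample the split/explode variant keeps NaN as the first element, which is the shape of the expected output in the question.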

Alternative interpretation

If you want to split each individual string:

out = pd.Series(a).str.split(':')

# or
out = pd.Series(a).str.findall('([^:]+)')

Output:

0                                                   NaN
1                                       [Stressful day]
2                             [Drank coffee, Drank tea]
3                                           [Drank tea]
4                              [Ate late, Drank coffee]
5                 [Drank coffee, Drank tea, Worked out]
6                               [Drank tea, Worked out]
7              [Drank coffee, Drank tea, Stressful day]
8                                        [Drank coffee]
9     [Drank coffee, Drank tea, Stressful day, Worke...
10                           [Drank coffee, Worked out]
11                  [Ate late, Drank coffee, Drank tea]
12      [Ate late, Drank coffee, Drank tea, Worked out]
13                           [Drank tea, Stressful day]
14               [Drank tea, Stressful day, Worked out]
15            [Drank coffee, Stressful day, Worked out]
16                        [Drank coffee, Stressful day]
17    [Ate late, Drank coffee, Drank tea, Stressful ...
18                                         [Worked out]
19                 [Ate late, Drank coffee, Worked out]