How to ascribe the value count of a list item to a new column – pandas

Suppose I have a DataFrame df with a single column, column_dic, where each cell holds a list of dictionaries. Each dictionary has two keys, list_A and list_B, whose values are lists:

data = [{"list_A": [2.93, 4.18, 4.18, None, 1.57, 1.57, 3.92, 6.27, 2.09, 3.14, 0.42, 2.09],
         "list_B": [820, 3552, 7936, None, 2514, 4035, 6441, 15379, 2167, 6147, 3322, 1177]},
        {"list_A": [2.51, 3.58, 3.58, None, 1.34, 1.34, 3.36, 5.37, 1.79, 2.69, 0.36, 1.79],
         "list_B": [820, 3552, 7936, None, 2514, 4035, 6441, 15379, 2167, 6147, 3322, 1177]},
        {"list_A": [None, 5.94, 5.94, None, 2.23, 2.23, 5.57, 8.9, 2.97, 4.45, 0.59, 2.97],
         "list_B": [820, 3552, 7936, None, 2514, 4035, 6441, 15379, 2167, 6147, 3322, 1177]}]

import pandas as pd

# Create a one-row DataFrame whose "column_dic" cell holds the whole list of dictionaries
df = pd.DataFrame({"column_dic": [data]})

Now I want to add a column count_first_item containing, for each row, the number of non-null values among the first items ([0]) of the lists stored under "list_A".

The expected output is 2 (2.93 counts as 1, 2.51 counts as 1, None counts as 0).

Solution:

Use a list comprehension to get the first value of each list_A, test for non-missing values with pd.notna, and count the True values with sum:

df['count_first_item'] = [pd.notna([y['list_A'][0] for y in x]).sum() 
                          for x in df['column_dic']]
print (df)
                                          column_dic  count_first_item
0  [{'list_A': [2.93, 4.18, 4.18, None, 1.57, 1.5...                 2
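
The same idea can be written as a small named function, which leaves room to skip dictionaries where list_A is missing or empty. This is only a sketch: the helper name count_first_non_null is hypothetical, and the data is a shortened copy of the question's lists (first items unchanged).

```python
import pandas as pd

# Shortened copy of the question's data: only the first few items of each list are kept.
data = [
    {"list_A": [2.93, 4.18, None], "list_B": [820, 3552, None]},
    {"list_A": [2.51, 3.58, None], "list_B": [820, 3552, None]},
    {"list_A": [None, 5.94, None], "list_B": [820, 3552, None]},
]
df = pd.DataFrame({"column_dic": [data]})


def count_first_non_null(records, key="list_A"):
    """Count dictionaries whose first item under `key` is not missing.

    Hypothetical helper: dictionaries without `key`, or with an empty list,
    are skipped instead of raising an error.
    """
    return sum(bool(pd.notna(rec[key][0])) for rec in records if rec.get(key))


# Apply the helper to every cell of the column.
df["count_first_item"] = df["column_dic"].apply(count_first_non_null)
print(df["count_first_item"].tolist())
```

Skipping incomplete dictionaries is a design choice made for this sketch. If a missing list_A should be treated as an error, index rec[key][0] directly, as the list comprehension does.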

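Alternatively, use Series.explode to get one dictionary per row, select the list_A values with Series.str.get, take the first value of each list by indexing with str[0], and count the non-missing values per original row with groupby(level=0) and SeriesGroupBy.count: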

df['count_first_item'] = (df['column_dic'].explode().str.get('list_A').str[0]
                                          .groupby(level=0).count())
print (df)
                                          column_dic  count_first_item
0  [{'list_A': [2.93, 4.18, 4.18, None, 1.57, 1.5...                 2
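
To see what each step of that chain does, it can be split into intermediate variables. This is a sketch on a shortened copy of the question's data, and the variable names (exploded, list_a, first, counts) are chosen here for illustration.

```python
import pandas as pd

# Shortened copy of the question's data: only the first few items of each list are kept.
data = [
    {"list_A": [2.93, 4.18, None], "list_B": [820, 3552, None]},
    {"list_A": [2.51, 3.58, None], "list_B": [820, 3552, None]},
    {"list_A": [None, 5.94, None], "list_B": [820, 3552, None]},
]
df = pd.DataFrame({"column_dic": [data]})

# 1. One dictionary per row; the original row label (0) is repeated in the index.
exploded = df["column_dic"].explode()

# 2. Pull the list stored under "list_A" out of each dictionary.
list_a = exploded.str.get("list_A")

# 3. Take the first element of each list (2.93, 2.51 and a missing value).
first = list_a.str[0]

# 4. Group by the original row label and count the non-missing values.
counts = first.groupby(level=0).count()

# Assignment aligns on the index, so each original row receives its own count.
df["count_first_item"] = counts
print(counts)
```

Because explode keeps the original index, groupby(level=0) maps every exploded row back to the DataFrame row it came from, so the approach also works when df has more than one row.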