Correct display of data after aggregation

In continuation of my question

There is a table in CSV format:

A B
35480007 0695388
35480007 0695388
35407109 3324741
35407109 3324741
35250208 0695388
35250208 6104556
86730903 3360935
86730903 3360935

By applying the code for aggregation:

df.groupby("B")["A"].unique()

I get the result:

695388     [35480007, 35250208]
3324741              [35407109]
3360935              [86730903]
6104556              [35250208]

Could you please tell me how to apply a filter so that only those values that have a value greater than two are displayed, like this:

695388     [35480007, 35250208]

Also, how can I save the result to a file, for example as txt?

I apologize in advance if my question seemed incorrect. I am very weak in the pandas library.

thank you very much!

>Solution :

It took me a second to realize that what you mean is not a value greater than two, but rather a length greater than one (or greater than or equal to two).

With that said, you can use the apply method on your Series to check which entries satisfy this property:

grouped = df.groupby("B")["A"].unique()
has_multiple_elements = grouped.apply(lambda x: len(x)>1)

This applies a function to each entry in your grouped Series and returns the following:

695388      True
3324741    False
3360935    False
6104556    False
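
As a cross-check, the same True/False mask can also be built without a lambda by counting the distinct values in each group with nunique. This is only a minimal sketch: it assumes both columns were parsed as integers, and the DataFrame is typed in by hand from the question's sample so that the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, typed in by hand (assumed to be parsed as integers).
df = pd.DataFrame({
    "A": [35480007, 35480007, 35407109, 35407109, 35250208, 35250208, 86730903, 86730903],
    "B": [695388, 695388, 3324741, 3324741, 695388, 6104556, 3360935, 3360935],
})

# Count the distinct A values in each B group, then keep groups with more than one.
has_multiple_elements = df.groupby("B")["A"].nunique() > 1
print(has_multiple_elements)
```

The difference from the apply version is that nunique never builds the arrays of unique values, so it only helps for the mask; you still need unique() if you want to display the values themselves.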

Now all that’s left is to use these True/False boolean values to filter your series. Luckily, this is very simple.

result = grouped[has_multiple_elements]
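
Putting the steps together, here is a small self-contained sketch you can run to check the result. The DataFrame is built by hand from the sample in the question (assuming the columns are read as integers); in your case df would come from reading your CSV file.

```python
import pandas as pd

# Sample data from the question, typed in by hand (assumed to be parsed as integers).
df = pd.DataFrame({
    "A": [35480007, 35480007, 35407109, 35407109, 35250208, 35250208, 86730903, 86730903],
    "B": [695388, 695388, 3324741, 3324741, 695388, 6104556, 3360935, 3360935],
})

# One array of unique A values per B value.
grouped = df.groupby("B")["A"].unique()

# True where a group holds more than one unique A value.
has_multiple_elements = grouped.apply(lambda x: len(x) > 1)

# Boolean indexing keeps only the entries marked True.
result = grouped[has_multiple_elements]
print(result)
```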

As for the second part of your question, writing this to a file can be done using the to_csv function:

# I usually use tab separated files in case any commas appear in your data itself
result.to_csv('output.tsv', sep='\t')
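
Since you asked about a txt file specifically: each entry of the result is a NumPy array, and as far as I can tell to_csv writes such entries using their printed form (brackets included). If you would rather have plain numbers in the file, one option is to join each array into a string first. This is a sketch under a few assumptions: the file name output.txt and the comma used between values are arbitrary choices of mine, and the DataFrame is typed in by hand from your sample.

```python
import pandas as pd

# Sample data from the question, typed in by hand (assumed to be parsed as integers).
df = pd.DataFrame({
    "A": [35480007, 35480007, 35407109, 35407109, 35250208, 35250208, 86730903, 86730903],
    "B": [695388, 695388, 3324741, 3324741, 695388, 6104556, 3360935, 3360935],
})

grouped = df.groupby("B")["A"].unique()
result = grouped[grouped.apply(lambda x: len(x) > 1)]

# Turn each array into one comma-separated string so the file contains plain numbers.
as_text = result.apply(lambda arr: ",".join(str(v) for v in arr))

# "output.txt" is a hypothetical file name; a tab separates each B key from its A values.
as_text.to_csv("output.txt", sep="\t")
```

Because the separator is a tab, the commas inside each value do not clash with the column delimiter.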