How to delete specific values from a list-column in pandas

I’ve used POS tagging (on German text, so nouns are tagged "NN" and "NE") and now I am having trouble extracting the nouns into a new column of a pandas DataFrame.

Example:

import pandas as pd

data = {"tagged": [[("waffe", "Waffe", "NN"), ("haus", "Haus", "NN")],
                   [("groß", "groß", "ADJD"), ("bereich", "Bereich", "NN")]]}
df = pd.DataFrame(data=data)
df["nouns"] = df["tagged"].apply(lambda x: [word for word, tag in x if tag in ["NN", "NE"]])

This results in the following error message: "ValueError: too many values to unpack (expected 2)"


I think the code would work if I could drop the first value of each tagged tuple, but I cannot figure out how to do that.

Solution:

Because the tuples contain 3 values, unpack them into three variables — word1, word2, and tag:

df["nouns"] = df["tagged"].apply(
    lambda x: [word2 for word1, word2, tag in x if tag in ["NN", "NE"]]
)

Or use the same logic directly in a list comprehension:

df["nouns"] = [[word2 for word1, word2, tag in x if tag in ["NN", "NE"]]
               for x in df["tagged"]]

print (df)
                                         tagged          nouns
0        [(waffe, Waffe, NN), (haus, Haus, NN)]  [Waffe, Haus]
1  [(groß, groß, ADJD), (bereich, Bereich, NN)]      [Bereich]
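As a side note (an alternative sketch, not part of the original answer): Python convention uses `_` for an unpacked value you don't need, which avoids the unused `word1` variable while keeping the same three-way unpack:

```python
import pandas as pd

data = {"tagged": [[("waffe", "Waffe", "NN"), ("haus", "Haus", "NN")],
                   [("groß", "groß", "ADJD"), ("bereich", "Bereich", "NN")]]}
df = pd.DataFrame(data=data)

# "_" marks the first tuple element as intentionally unused;
# only the second element is collected when the tag is a noun tag
df["nouns"] = df["tagged"].apply(
    lambda tags: [word for _, word, tag in tags if tag in ["NN", "NE"]]
)

print(df["nouns"].tolist())  # [['Waffe', 'Haus'], ['Bereich']]
```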