How to delete specific values from a list-column in pandas

December 6, 2022

I’ve used POS tagging on German text (nouns are tagged "NN" and "NE"), and now I am having trouble extracting the nouns into a new column of the pandas DataFrame.

Example:

import pandas as pd

data = {"tagged": [[("waffe", "Waffe", "NN"), ("haus", "Haus", "NN")], [("groß", "groß", "ADJD"), ("bereich", "Bereich", "NN")]]}
df = pd.DataFrame(data=data)
print(df)
# Each tuple has three elements, so unpacking into (word, tag) fails here
df["nouns"] = df["tagged"].apply(lambda x: [word for word, tag in x if tag in ["NN", "NE"]])

This results in the following error: "ValueError: too many values to unpack (expected 2)"
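A minimal way to reproduce the error outside pandas, using one tuple from the example data: unpacking three values into two names raises ValueError. Newer Python versions may append the actual count to the message, so this sketch only relies on how the message starts.

```python
# One tagged token from the example: three values per tuple
entry = ("waffe", "Waffe", "NN")

try:
    word, tag = entry  # two targets for three values
except ValueError as exc:
    message = str(exc)

print(message)
```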

I think the code would work if I could delete the first value of each tagged tuple, but I cannot figure out how to do that.
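Deleting the first value is possible with tuple slicing. A minimal sketch that rebuilds the example DataFrame so it runs on its own; the intermediate column name `pairs` is only illustrative. `t[1:]` turns each three-value tuple into a two-value one, after which the original two-variable unpacking works.

```python
import pandas as pd

data = {"tagged": [[("waffe", "Waffe", "NN"), ("haus", "Haus", "NN")],
                   [("groß", "groß", "ADJD"), ("bereich", "Bereich", "NN")]]}
df = pd.DataFrame(data=data)

# Drop the first element of every tuple: ("waffe", "Waffe", "NN") -> ("Waffe", "NN")
df["pairs"] = df["tagged"].apply(lambda x: [t[1:] for t in x])

# Each tuple now has two values, so (word, tag) unpacking succeeds
df["nouns"] = df["pairs"].apply(lambda x: [word for word, tag in x if tag in ["NN", "NE"]])
print(df["nouns"].tolist())
```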

Solution:

Because each tuple has three values, unpack them into three variables (word1, word2 and tag) and keep word2:

df["nouns"] = df["tagged"].apply(
    lambda x: [word2 for word1, word2, tag in x if tag in ["NN", "NE"]]
)
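If some tuples could carry more than three values (an assumption; the example data always has exactly three), extended unpacking keeps the last two elements regardless of what comes before them. `NOUN_TAGS` is a hypothetical constant introduced here; it is a set only to make the membership test explicit.

```python
import pandas as pd

NOUN_TAGS = {"NN", "NE"}  # hypothetical name for the noun tags from the question

data = {"tagged": [[("waffe", "Waffe", "NN"), ("haus", "Haus", "NN")],
                   [("groß", "groß", "ADJD"), ("bereich", "Bereich", "NN")]]}
df = pd.DataFrame(data=data)

# `*_` absorbs every element before the last two, so this works for any
# tuple with at least two values; `word` is the second-to-last element
df["nouns"] = df["tagged"].apply(
    lambda x: [word for *_, word, tag in x if tag in NOUN_TAGS]
)
print(df["nouns"].tolist())
```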

Or use the same unpacking in a nested list comprehension instead of apply:

df["nouns"] = [[word2 for word1, word2, tag in x if tag in ["NN", "NE"]]
               for x in df["tagged"]]

print(df)
                                         tagged          nouns
0        [(waffe, Waffe, NN), (haus, Haus, NN)]  [Waffe, Haus]
1  [(groß, groß, ADJD), (bereich, Bereich, NN)]      [Bereich]
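A sketch of a different technique, not part of the answer above: explode the list column to one tuple per row, filter with the `.str` accessor (whose indexing also works on tuples), then regroup by the original row label. Rows without any noun would be missing from the grouped result, so they are filled back in with empty lists.

```python
import pandas as pd

data = {"tagged": [[("waffe", "Waffe", "NN"), ("haus", "Haus", "NN")],
                   [("groß", "groß", "ADJD"), ("bereich", "Bereich", "NN")]]}
df = pd.DataFrame(data=data)

# One row per tagged tuple; the original row label is repeated
s = df["tagged"].explode()

# .str[-1] picks the tag from each tuple; NaN (from empty lists) fails isin
is_noun = s.str[-1].isin(["NN", "NE"])

# Keep the second element of each noun tuple and collect it per original row
nouns = s[is_noun].str[1].groupby(level=0).agg(list)

# Rows with no nouns are absent from `nouns`; give them an empty list
df["nouns"] = [nouns.get(i, []) for i in df.index]
print(df["nouns"].tolist())
```

For a small frame the list comprehension is simpler; the exploded Series is mainly handy when you also want to inspect or filter the individual tagged tokens.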