I've used POS tagging (on German text, so nouns are tagged "NN" and "NE") and now I am having trouble extracting the nouns into a new column of the pandas DataFrame.
Example:
import pandas as pd
data = {"tagged": [[("waffe", "Waffe", "NN"), ("haus", "Haus", "NN")], [("groß", "groß", "ADJD"), ("bereich", "Bereich", "NN")]]}
df = pd.DataFrame(data=data)
df
df["nouns"] = df["tagged"].apply(lambda x: [word for word, tag in x if tag in ["NN", "NE"]])
This results in the following error message: "ValueError: too many values to unpack (expected 2)"
I think the code would work if I were able to delete the first value of each tagged word, but I cannot figure out how to do that.
>Solution :
Because each tuple contains 3 values, the loop has to unpack 3 variables, not 2. Unpack them into word1, word2 and tag, and keep only word2:
df["nouns"] = df["tagged"].apply(
    lambda x: [word2 for word1, word2, tag in x if tag in ["NN", "NE"]]
)
Or use the same unpacking in a nested list comprehension, without apply:
df["nouns"] = [[word2 for word1, word2, tag in x if tag in ["NN", "NE"]]
               for x in df["tagged"]]
print(df)
                                         tagged          nouns
0        [(waffe, Waffe, NN), (haus, Haus, NN)]  [Waffe, Haus]
1  [(groß, groß, ADJD), (bereich, Bereich, NN)]      [Bereich]
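If the goal is simply to ignore the first value of each tuple, as the question suggests, a variant that indexes from the end of the tuple may be a little more forgiving. The sketch below is only one possible approach: the names NOUN_TAGS and extract_nouns are made up for illustration, and it assumes the tag is always the last element and the word to keep is the one right before it.

```python
import pandas as pd

data = {"tagged": [[("waffe", "Waffe", "NN"), ("haus", "Haus", "NN")],
                   [("groß", "groß", "ADJD"), ("bereich", "Bereich", "NN")]]}
df = pd.DataFrame(data=data)

# Hypothetical helper names, not part of pandas or any tagger API.
NOUN_TAGS = {"NN", "NE"}

def extract_nouns(tagged):
    # Assumption: t[-1] is the POS tag and t[-2] is the word to keep,
    # so the same code handles (word, tag) and (token, word, tag) tuples.
    return [t[-2] for t in tagged if t[-1] in NOUN_TAGS]

df["nouns"] = df["tagged"].apply(extract_nouns)
print(df)
```

Indexing from the end avoids the unpacking error entirely, at the cost of being less explicit about what each position means. It would give wrong results if a tagger put the tag anywhere other than the last position.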