I need the "Term"s and the "DocID"s for which the "DocFreq" is greater than 5, stored as a dictionary: each Term is a key, and the comma-separated entries in its "DocID" become the individual items of a list for that key.
For example, I need something like:
{"Want to be with": ["doc100.txt", "doc8311.txt", ..., "doc123.txt"], "and has her own": ["doc100.txt", "doc9286.txt", ..., "doc23330.txt"], ...}
So far, I’ve got this:
df1 = df[(df['DocFreq'] > 5)][['Term','DocFreq','Ngram','DocID']]
But I can’t get the format I need. Calling df.to_dict() gives me a dictionary of dictionaries keyed by column name, which is not what I want.
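For reference, here is a minimal sketch of why that happens, using a small hypothetical DataFrame (the terms, file names, and Ngram values are made up, and DocID is assumed to hold the comma-separated file names as one string). With the default orient="dict", to_dict() returns {column -> {index label -> value}}, so the column names become the outer keys.

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({
    "Term": ["Want to be with", "and has her own", "rare phrase"],
    "DocFreq": [7, 6, 2],
    "Ngram": [4, 4, 2],
    "DocID": ["doc100.txt,doc8311.txt", "doc100.txt,doc9286.txt", "doc5.txt"],
})

# The filter from the question keeps the first two rows (index labels 0 and 1).
df1 = df[df["DocFreq"] > 5][["Term", "DocFreq", "Ngram", "DocID"]]

# Default orient="dict": outer keys are column names, inner keys are index labels.
nested = df1.to_dict()
print(list(nested))     # the four column names
print(nested["DocID"])  # index label -> raw comma-separated string
```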
Please help!!
Thank you!!
Solution:
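You are almost there. After filtering, set Term as the index, select the DocID column, and split it on the commas before calling to_dict(), so each value becomes a list instead of a single string.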
You may use:
dict1 = df.loc[df['DocFreq'] > 5].set_index('Term')['DocID'].str.split(',').to_dict()
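Below is a self-contained sketch with hypothetical sample data. It assumes DocID holds comma-separated file names, possibly with spaces after the commas; the helper name terms_to_docids and its min_freq parameter are made up for illustration. Unlike the to_dict() one-liner, it strips whitespace around each ID and concatenates the lists when the same Term appears on more than one row (to_dict() on a Series with a repeated index keeps only the last row for that key).

```python
import pandas as pd

# Hypothetical sample data standing in for the real DataFrame.
df = pd.DataFrame({
    "Term": ["Want to be with", "and has her own", "rare phrase"],
    "DocFreq": [7, 6, 2],
    "Ngram": [4, 4, 2],
    "DocID": [
        "doc100.txt,doc8311.txt,doc123.txt",
        "doc100.txt, doc9286.txt, doc23330.txt",  # spaces after commas
        "doc5.txt",
    ],
})


def terms_to_docids(frame: pd.DataFrame, min_freq: int = 5) -> dict[str, list[str]]:
    """Map each Term whose DocFreq exceeds min_freq to a list of its DocIDs."""
    subset = frame.loc[frame["DocFreq"] > min_freq, ["Term", "DocID"]]
    merged: dict[str, list[str]] = {}
    for term, raw in zip(subset["Term"], subset["DocID"]):
        # Split the comma-separated string; drop surrounding whitespace and empty items.
        ids = [part.strip() for part in raw.split(",") if part.strip()]
        # Concatenate lists for repeated Terms instead of overwriting them.
        merged.setdefault(term, []).extend(ids)
    return merged


result = terms_to_docids(df)
print(result)
```

The loop keeps the IDs in row order. If Term is known to be unique and the IDs contain no spaces, the one-line str.split(',') version is enough.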