How to convert a dataframe to a dictionary when a column contains comma-separated values

I have a CSV file with the columns Term, DocFreq, Ngram and DocID, where each DocID cell holds a comma-separated list of document names.

I need the "Term" and "DocID" values for the rows where "DocFreq" is greater than 5. I want to store them as a dictionary in which each Term is a key, and its value is a list made from the comma-separated "DocID"s.

For example, I need

{"Want to be with": ["doc100.txt", "doc8311.txt", ..., "doc123.txt"], "and has her own": ["doc100.txt", "doc9286.txt", ..., "doc23330.txt"], ...}

So far, I’ve got this:

df1 = df[(df['DocFreq'] > 5)][['Term','DocFreq','Ngram','DocID']]

But I can’t get the format I need. Calling df1.to_dict() gives me a dictionary of dictionaries keyed by column name, and I don’t want that.
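To see where the nested dictionaries come from, here is a minimal sketch on a hypothetical two-row frame (the values are made up; only the column names come from the question). DataFrame.to_dict defaults to orient="dict", which maps each column name to a {index: value} dictionary.

```python
import pandas as pd

# Hypothetical sample with the same column names as the question's CSV
df1 = pd.DataFrame({
    "Term": ["Want to be with", "and has her own"],
    "DocFreq": [7, 9],
    "Ngram": [4, 4],
    "DocID": ["doc100.txt,doc8311.txt", "doc100.txt,doc9286.txt"],
})

# Default orient="dict": {column name: {row index: value}}
nested = df1.to_dict()
print(nested["Term"])  # {0: 'Want to be with', 1: 'and has her own'}
```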

Please help!!
Thank you!!

Solution:

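You are almost there. Set Term as the index, select the DocID column, and split its comma-separated strings into lists before calling to_dict. Without the split, each value would be the whole comma-separated string rather than a list:

dict1 = df.loc[df['DocFreq'] > 5, ['Term', 'DocID']].set_index('Term')['DocID'].str.split(',').to_dict()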
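For a self-contained version of the whole pipeline, here is a sketch on hypothetical sample data standing in for the CSV (in practice df would come from pd.read_csv on the real file). It also assumes the DocID strings might contain spaces after the commas, so each file name is stripped after splitting.

```python
import pandas as pd

# Hypothetical sample data; the real frame would be read from the CSV
df = pd.DataFrame({
    "Term": ["Want to be with", "and has her own", "rare phrase"],
    "DocFreq": [7, 9, 2],
    "DocID": [
        "doc100.txt,doc8311.txt,doc123.txt",
        "doc100.txt, doc9286.txt, doc23330.txt",
        "doc5.txt",
    ],
})

# Keep only the rows whose DocFreq is greater than 5
filtered = df.loc[df["DocFreq"] > 5, ["Term", "DocID"]]

# Index by Term, split each DocID string on commas, then strip
# surrounding whitespace from every individual file name
term_to_docs = (
    filtered.set_index("Term")["DocID"]
    .str.split(",")
    .apply(lambda ids: [doc_id.strip() for doc_id in ids])
    .to_dict()
)

print(term_to_docs)
```

Dictionary keys must be unique, so if the same Term appears on more than one row, only the last of those rows is kept by to_dict.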
