Is there a way to index every word inside of a text file?

August 30, 2023

I am learning a foreign language and have exports of chats between my friends and me in my native language. I would like to parse or index each word from these text files and export them into a "word list" of sorts, with a frequency count next to each word, so I can then translate the words and learn them in my target language.

Example Input:

/mytextfile.txt => to the to the foo bar can to the foo the bar to the foo bar the to

Example of Output:

Word List:

Word – Frequency

the – 6
to – 5
bar – 3
foo – 3
can – 1

I would like to do this in PowerShell, but I am not 100% sure how to index the words. I understand "measure" and how to find/replace strings, but is there a way to just say "extract all words" with no filter, represent each distinct word once, and then put the count for that word next to it? I'm kind of stuck here. Any help, or libraries that have this function, would be appreciated. Maybe push each word into a table with a ForEach-Object over $_, then use some kind of selector to push each word to an output file, delete duplicates, and then count the frequency?

I have tried measure, find, replace, etc., but they all require you to specify the word or words to be found.

Solution:

This can be achieved in four steps: read the file content as a single string (Get-Content -Raw), split it on one or more whitespace characters (unary -split), group identical words to get each word's frequency (Group-Object), and sort by that frequency (Sort-Object). Finally, Select-Object with calculated properties gives the columns the proper names:

-split (Get-Content path\to\the\file.txt -Raw) |
    Group-Object -NoElement |
    Sort-Object Count -Descending |
    Select-Object @{ N='Word'; E='Name' }, @{ N='Frequency'; E='Count' }
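
Real chat exports usually contain punctuation, so splitting on whitespace alone would count "foo" and "foo," as different words. The following is a minimal sketch of one way to handle that, not a definitive implementation. The function name Get-WordFrequency is hypothetical, and the regex encodes an assumption about what counts as a "word": a run of Unicode letters, optionally with inner apostrophes. Adjust it for your language. Group-Object already compares strings case-insensitively by default; the text is lowercased here only so the displayed words are consistent.

```powershell
# Hypothetical helper (not a built-in cmdlet): counts how often each word occurs.
function Get-WordFrequency {
    param(
        [string]$Text
    )

    # Assumption: a "word" is a run of letters in any script,
    # optionally containing inner apostrophes (e.g. don't).
    $wordPattern = "\p{L}+(?:'\p{L}+)*"

    [regex]::Matches($Text.ToLowerInvariant(), $wordPattern) |
        ForEach-Object { $_.Value } |
        Group-Object -NoElement |
        # Most frequent first; ties are ordered alphabetically.
        Sort-Object @{ Expression = 'Count'; Descending = $true }, @{ Expression = 'Name'; Descending = $false } |
        Select-Object @{ N = 'Word'; E = 'Name' }, @{ N = 'Frequency'; E = 'Count' }
}
```

To write the word list to a file, pipe the result to Export-Csv (wordlist.csv is a placeholder name):

Get-WordFrequency -Text (Get-Content path\to\the\file.txt -Raw) |
    Export-Csv wordlist.csv -NoTypeInformation -Encoding UTF8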