Is there a way to index every word inside of a text file?

I am learning a foreign language and have exports of chats between my friends and me in my native language. I would like to parse each word from these text files and export them into a "word list" of sorts, with a frequency count next to each word, so I can then translate them and learn them in my target language.

Example Input:

/mytextfile.txt => to the to the foo bar can to the foo the bar to the foo bar the to


Example Output:

Word List:

Word    Frequency
the     6
to      5
bar     3
foo     3
can     1

I would like to do this in PowerShell, but I am not 100% sure how to index the words. I understand Measure-Object and how to find/replace strings, but is there a way to just say "extract all words" with no filter, list each distinct word once, and then put the count for that word next to it? I'm kind of stuck here. Any help, or a library that has this function, would be appreciated. Maybe I could push each word into a table with ForEach-Object and $_, then use some kind of selector to write each word to an output file, delete duplicates, and then count the frequency?

I have tried Measure-Object, find, replace, etc., but they all require you to specify the word or words to search for.
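The loop-and-tally idea from the question does work. Below is a minimal sketch of it, assuming the file contents are already in a string (here the example input above; in practice you would use Get-Content path\to\the\file.txt -Raw). The variable names are only illustrative. A PowerShell hashtable compares string keys case-insensitively by default, so "The" and "the" would share one entry.

```powershell
# Sketch of the "loop over each word and tally it" approach.
# Assumption: $text stands in for the file contents.
$text = 'to the to the foo bar can to the foo the bar to the foo bar the to'

# Hashtable keyed by word; each value is how many times that word has been seen.
$counts = @{}

# Unary -split breaks the string on runs of whitespace;
# ForEach-Object visits each word as $_.
-split $text | ForEach-Object {
    if ($counts.ContainsKey($_)) {
        $counts[$_]++
    }
    else {
        $counts[$_] = 1
    }
}

# Turn the hashtable into rows, most frequent word first.
$counts.GetEnumerator() |
    Sort-Object Value -Descending |
    Select-Object @{ N = 'Word'; E = { $_.Key } }, @{ N = 'Frequency'; E = { $_.Value } }
```

This does the "delete duplicates and count" step in one pass, because each distinct word only ever gets one key in the hashtable.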

Solution:

This can be done by reading the file content as a single string (Get-Content -Raw), splitting it on runs of one or more whitespace characters (the unary -split operator), grouping identical words to get each word's frequency (Group-Object), sorting by that frequency (Sort-Object), and finally using Select-Object with calculated properties to give the columns the proper names:

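# Read the whole file as one string, split it on whitespace, count each distinct word,
# sort by count (highest first), and rename the columns to Word and Frequency.
-split (Get-Content path\to\the\file.txt -Raw) |
    Group-Object -NoElement |
    Sort-Object Count -Descending |
    Select-Object @{ N='Word'; E='Name' }, @{ N='Frequency'; E='Count' }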
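One caveat for chat exports: splitting on whitespace alone leaves punctuation attached, so "bar," and "bar" are counted as different words. (Group-Object already compares strings case-insensitively unless -CaseSensitive is given.) The sketch below is one way to handle that, assuming a plain-text export. Get-WordFrequency is a hypothetical helper name, and the regular expression [^\p{L}\p{N}']+ (split on anything that is not a letter, digit, or apostrophe) is an assumption you may need to tune for your language. Lowercasing first means each word is always shown in the same form.

```powershell
# Hypothetical helper: counts words in a text file, ignoring case and punctuation.
function Get-WordFrequency {
    param(
        [Parameter(Mandatory)]
        [string] $Path
    )

    # Read the whole file as one string and lowercase it.
    $text = (Get-Content -Path $Path -Raw).ToLowerInvariant()

    # Split on runs of characters that are not letters, digits or apostrophes.
    # This can leave empty strings at the edges, so drop them.
    $words = $text -split "[^\p{L}\p{N}']+" | Where-Object { $_ }

    # Same grouping, sorting and column naming as the one-liner above.
    $words |
        Group-Object -NoElement |
        Sort-Object Count -Descending |
        Select-Object @{ N = 'Word'; E = 'Name' }, @{ N = 'Frequency'; E = 'Count' }
}
```

To produce the word list file the question asks for, pipe the result to Export-Csv, for example Get-WordFrequency -Path .\chat.txt | Export-Csv -Path .\wordlist.csv -NoTypeInformation -Encoding UTF8 (both paths are placeholders). The CSV opens in a spreadsheet, where you can add a translation column next to each word.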