I read words from a file and print the 30 most frequent ones, but some words are printed
twice, as you can see in the output.
#include <iostream>
#include <vector>
#include <map>
#include <iterator>
#include <fstream>
#include <algorithm>
#include <string>
#include <cctype>
using namespace std;
int main(){
    fstream fs, output;
    fs.open("/Users/brah79/Downloads/skola/c++/inlämningsuppgifter/labb4/L4_wc/hitchhikersguide.txt");
    output.open("/Users/brah79/Downloads/skola/c++/inlämningsuppgifter/labb4/labb4/output.txt");
    if(!fs.is_open() || !output.is_open()){
        cout << "could not open file" << endl;
    }
    map <string, int> mp;
    string word;
    while(fs >> word){
        // strip every non-letter character from the word
        for(int i = 0; i < word.length(); i++){
            if(!isalpha(word[i])){
                word.erase(i--, 1);
            }
        }
        if(word.empty()){
            continue;
        }
        mp[word]++;
    }
    // copy (count, word) pairs into a vector so they can be sorted by count
    vector<pair<int, string>> v;
    v.reserve(mp.size());
    for (const auto& p : mp){
        v.emplace_back(p.second, p.first);
    }
    sort(v.rbegin(), v.rend());
    cout << "These are the 30 most frequent words: " << endl;
    for(int i = 0; i < 30; i++){
        cout << v[i].second << " : " << v[i].first << " times" << endl;
    }
    // write the same list to the output file
    output << "These are the 30 most frequent words: " << endl;
    for(int i = 0; i < 30; i++){
        output << v[i].second << " : " << v[i].first << " times" << endl;
    }
    return 0;
}
Output:
the : 2230 times !!!
of : 1254 times
to : 1177 times
a : 1121 times
and : 1109 times
said : 680 times
it : 665 times
was : 605 times
in : 590 times
he : 546 times
that : 520 times
you : 495 times
I : 428 times
on : 349 times
Arthur : 332 times
his : 324 times
Ford : 314 times
The : 307 times !!!
at : 306 times
for : 284 times
is : 281 times
with : 273 times
had : 252 times
He : 242 times
this : 220 times
as : 207 times
Zaphod : 206 times
be : 188 times
all : 186 times
him : 182 times
"the" is printed twice. Also, "could not open file" is printed at the top even
though the input file was opened and its content is stored in the map.
>Solution :
This happens because your program counts words in a case-sensitive manner.
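In particular, "The" and "the" are treated as different words, so each gets its own count: "the" appears 2230 times, while "The" appears 307 times. The same applies to "he" and "He".
As for the "could not open file" message: the check also tests output, and a std::fstream opened with the default mode (in | out) does not create a missing file. If output.txt does not exist yet, that is likely what fails; opening it as a std::ofstream would create it.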
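One possible way to merge the two entries is to convert every word to lowercase before using it as a map key. The sketch below is only an illustration under that assumption: the helper names normalize, count_words and top_words are hypothetical and do not appear in the original program, and the helpers read from any std::istream rather than from the file path in the question.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: keep only letters and convert them to lowercase,
// so that "The", "THE" and "the," all become "the".
std::string normalize(const std::string& raw) {
    std::string out;
    for (unsigned char c : raw) {  // unsigned char avoids undefined behaviour in <cctype>
        if (std::isalpha(c)) {
            out += static_cast<char>(std::tolower(c));
        }
    }
    return out;
}

// Hypothetical helper: count normalized words read from any input stream.
std::map<std::string, int> count_words(std::istream& in) {
    std::map<std::string, int> counts;
    std::string word;
    while (in >> word) {
        const std::string key = normalize(word);
        if (!key.empty()) {
            ++counts[key];
        }
    }
    return counts;
}

// Hypothetical helper: return at most n (count, word) pairs, most frequent first.
std::vector<std::pair<int, std::string>> top_words(
        const std::map<std::string, int>& counts, std::size_t n) {
    std::vector<std::pair<int, std::string>> v;
    v.reserve(counts.size());
    for (const auto& p : counts) {
        v.emplace_back(p.second, p.first);
    }
    std::sort(v.rbegin(), v.rend());  // descending by count
    if (v.size() > n) {
        v.resize(n);  // also guards against texts with fewer than n distinct words
    }
    return v;
}
```

In the program from the question, this would amount to passing fs to count_words and printing the result of top_words(counts, 30). Note that lowercasing also folds proper nouns, so "Arthur" would be reported as "arthur"; whether that is acceptable depends on the assignment.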