I would like to count how often each word occurs in a text file in Python.
My text is this:
Aragon is an autonomous community in northeastern Spain. The capital of Aragon is Zaragoza, which is also the most populous city in the autonomous community. Covering an area of 47720 km2, the region's terrain ranges from permanent glaciers through verdant valleys, rich pastures and orchards to the arid steppe plains of the central lowlands. Aragon is home to many rivers, most notably the Ebro, Spain's largest river, which flows west to east throughout the region through the province of Zaragoza. It is also home to the highest mountains in the Pyrenees.
I tried the following code:
from collections import Counter
with open("data/aragon.txt", 'r') as file:
    wordcount = Counter(file.read().split())
for item in wordcount.items():
    print("{}\t{}".format(*item))
But the problem is that the output doesn't come out in any order. I would like the word with the highest count at the top and the one with the lowest count at the bottom. I also don't want entries with punctuation attached, like "Ebro," or "Spain.": no periods or commas, just the words.
How can I fix that?
>Solution :
Maybe you can use a regex to match only the words, and Counter's most_common() to print them from the highest count to the lowest:
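import re
from collections import Counter
with open("data/aragon.txt", 'r') as file:
    # \w+ matches runs of word characters, so commas and periods are dropped
    wordcount = Counter(re.findall(r'\w+', file.read()))
# most_common() yields (word, count) pairs from the highest count to the lowest
for item in wordcount.most_common():
    print("{}\t{}".format(*item))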
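
To try the same idea without the file, here is a minimal sketch that works on an in-memory string. The helper name count_words and the short sample sentence are made up for illustration; only Counter, re.findall and most_common() come from the answer above.

```python
import re
from collections import Counter


def count_words(text):
    """Count the words in text, ignoring surrounding punctuation.

    Hypothetical helper for illustration, not part of any library.
    """
    # \w+ matches runs of letters, digits and underscores,
    # so "Ebro." is counted as "Ebro" and "rivers," as "rivers".
    return Counter(re.findall(r"\w+", text))


# Assumed sample text, shortened from the paragraph in the question.
sample = "Aragon is home to many rivers, most notably the Ebro. Aragon is in Spain."
counts = count_words(sample)

# most_common() returns (word, count) pairs sorted from highest to lowest count.
for word, count in counts.most_common():
    print(f"{word}\t{count}")
```

Two things to be aware of with this pattern: \w+ splits "Spain's" into "Spain" and "s", and the count is case-sensitive, so "The" and "the" are separate entries unless you lowercase the text first (for example with text.lower()).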