Count words without periods and commas in Python

October 11, 2022

I would like to count how often each word appears in a text file in Python.

My text is this:

Aragon is an autonomous community in northeastern Spain. The capital of Aragon is Zaragoza, which is also the most populous city in the autonomous community. Covering an area of 47720 km2, the region's terrain ranges from permanent glaciers through verdant valleys, rich pastures and orchards to the arid steppe plains of the central lowlands. Aragon is home to many rivers, most notably the Ebro, Spain's largest river, which flows west to east throughout the region through the province of Zaragoza. It is also home to the highest mountains in the Pyrenees.

I tried the following code:

```python
from collections import Counter

with open("data/aragon.txt", "r") as file:
    wordcount = Counter(file.read().split())

for item in wordcount.items(): print("{}\t{}".format(*item))
```

The problem is that the output is not ordered. I would like the most frequent word at the top and the least frequent at the bottom. I also don't want tokens like "Ebro," or "Spain." with a period or comma attached, just the bare word.

How can I fix that?

> Solution:

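You can use a regular expression to match only the words, and `Counter.most_common()` to sort them from most to least frequent:

```python
import re
from collections import Counter

with open("data/aragon.txt", "r") as file:
    # \w+ matches runs of letters, digits and underscores, so "Ebro," becomes "Ebro"
    wordcount = Counter(re.findall(r"\w+", file.read()))
# most_common() with no argument returns every (word, count) pair, highest count first
for item in wordcount.most_common(): print("{}\t{}".format(*item))
```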
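One side effect of `\w+` is that it treats the apostrophe as a separator, so "region's" in the sample text is counted as "region" and "s". If that matters, a possible alternative (a sketch, not part of the answer above) is to keep the original whitespace split and strip punctuation only from the start and end of each token with `str.strip(string.punctuation)`. The function name `count_words` and the sample sentence are illustrative assumptions; note that `string.punctuation` covers ASCII punctuation only, so a curly apostrophe like ’ would not be stripped.

```python
import string
from collections import Counter


def count_words(text):
    """Count words after stripping leading/trailing ASCII punctuation.

    Unlike re.findall(r"\\w+", ...), this keeps internal apostrophes,
    so "region's" stays one token instead of becoming "region" and "s".
    """
    # Split on whitespace, then trim punctuation such as "," and "." from the ends
    words = (token.strip(string.punctuation) for token in text.split())
    # Skip tokens that were made only of punctuation (e.g. a lone "-")
    return Counter(word for word in words if word)


# Hypothetical sample taken from the question's text
sample = "Aragon is home to many rivers, most notably the Ebro, Spain's largest river."
for word, count in count_words(sample).most_common():
    print("{}\t{}".format(word, count))
```

To read from the file instead, pass `file.read()` to `count_words` inside the same `with open(...)` block used above.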