I came across an unexpected result, and I do not understand why it happens when I use collections.Counter.
I use Python 3.8.
    from collections import Counter

    counter = Counter()
    counter["تمباکو"] += 1
    print(counter.most_common())
According to the documentation, it should return (keyword, count) pairs.
When I write the output of counter.most_common() to a CSV file, the order of the data also changes:
    writer = csv.writer(f)
    writer.writerows(counter.most_common())
It writes the rows as (count, keyword) pairs,
but when you run:
it will output:
and it looks like everything is fine, because the keyword comes first.
Something is wrong and I do not understand it.
To elaborate on my comment:
It’s not Python, it’s your input.
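One way to check this without relying on how the terminal draws the text is to look at the types of each pair and at an escaped form of the keyword. The snippet below is only a sketch built on the keyword from the question; the variable names are mine:

```python
from collections import Counter

counter = Counter()
counter["تمباکو"] += 1

for keyword, count in counter.most_common():
    # The tuple order in memory is (keyword, count), however it is displayed.
    print(type(keyword).__name__, type(count).__name__)
    # ascii() escapes every non-ASCII character, so the line contains
    # no right-to-left text for the terminal to reorder.
    print(ascii(keyword), count)
```

If the first printed type is the string and the second is the integer, the data is in the documented order and only the rendering is swapped.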
Here’s a synthetic example that has a string including U+202E RIGHT-TO-LEFT OVERRIDE (which, humorously, affects the rendering on that linked page too).
    from collections import Counter

    s = "\u202Ehello"
    c = Counter()
    c[s] += 1
    for word, count in c.most_common():
        print(word, count)
When I run this, my terminal shows
since the 202E character overrides rendering order.
If I remove the 202E character, I get
A way to print strings that have such override characters in a "de-fanged" way is to use
repr() (with its own caveats, of course):
    for word, count in c.most_common():
        print(repr(word), count)
since the offending control character is escaped.
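If you would rather detect or drop such characters than escape them, something along these lines might work. It is a rough sketch, not a complete treatment: BIDI_CONTROLS, has_bidi_controls and strip_bidi_controls are names I made up, and the set only lists the explicit bidirectional formatting characters I am aware of.

```python
# Explicit bidirectional formatting characters: marks, embeddings,
# overrides, and isolates. The set and helper names are hypothetical.
BIDI_CONTROLS = {
    "\u061c", "\u200e", "\u200f",                      # ALM, LRM, RLM
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # LRE, RLE, PDF, LRO, RLO
    "\u2066", "\u2067", "\u2068", "\u2069",            # LRI, RLI, FSI, PDI
}


def has_bidi_controls(text):
    """Return True if text contains any character from BIDI_CONTROLS."""
    return any(ch in BIDI_CONTROLS for ch in text)


def strip_bidi_controls(text):
    """Return text with every character from BIDI_CONTROLS removed."""
    return "".join(ch for ch in text if ch not in BIDI_CONTROLS)


s = "\u202Ehello"
print(has_bidi_controls(s))
print(repr(strip_bidi_controls(s)))
```

Note that this only removes invisible control characters. Text that is right-to-left by nature, such as an Arabic-script keyword, will still be laid out by the terminal's bidirectional rendering, so the on-screen order can differ from the order in the tuple even after stripping.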