Checking for unique elements in a string vs. converting string to a set

June 8, 2023

I’ve seen the below "code1" used for checking for most frequently used characters in a string where each character is first checked for uniqueness rather than simply converting the string to a set (as in "code2"). Is the "code1" method preferred due to the slight performance improvement? And why is converting a string to a set more expensive than checking each element of the string if it already exists in a dictionary?

code1=

string = "This is an example string"
char_list = {}
for char in string:
    if char in char_list:
        char_list[char] += 1
    else:
        char_list[char] = 1

max_frequency = max(char_list.values())
chars_with_max_frequency = [key for key, value in char_list.items() if value == max_frequency]

code2=

string = "This is an example string"
char_list = {}
for char in set(string):
    char_list[char] = string.count(char)

max_frequency = max(char_list.values())
chars_with_max_frequency = [key for key, value in char_list.items() if value == max_frequency]

print("Method #1: Checking string for unique chars: %.4f" % timeit.timeit(code1,number=1000000))

print("Method #2: Converting string to set: %.4f" % timeit.timeit(code2,number=1000000))

Method #1: Checking string for unique chars: 2.6955

Method #2: Converting string to set: 3.4223

>Solution :

In the first method (code1), you are iterating through each character in the string and incrementing a counter for that character in a dictionary. The operation is O(n) because each character is being visited exactly once, where n is the length of the string.

In the second method (code2), after converting the string to a set (which eliminates duplicates and is an O(n) operation), you’re invoking the string.count(char) method. This method itself is an O(n) operation because it needs to traverse the entire string to count occurrences of the character. As this is done for each unique character, the total operation becomes O(n^2). In the worst-case scenario, this could be as large as the length of the string itself.

So, to conclude, the conversion of the string to a set is not the expensive operation here. Instead, the expensive operation in the second method is the repeated use of the string.count() function. This is why the first method is faster than the second one.