Skip to main content

Are two words anagrams?

·350 words·2 mins

When correcting a single solution takes considerable time.

During the exam of my introductory programming course, I asked students to formulate a procedure that determines if two words are anagrams, i.e., they agree in the frequency of their constituting characters but not their order. One of my students came up with an original procedure of which I could not immediately assess the validity.

The algorithm she wrote was:

if len(word2) > len(word1):
    return False
for c in word1:
    if word1.count(c) > word2.count(c):
        return False
return True

For me this was a novel formulation. And this is how I verified her solution.

The first if-statement

The first if-statement establishes that

|w2||w1|.|w_2| \leq |w_1|.

This is important, if the two words are to agree in their character frequencies, one cannot be larger than the other. The use of an inequality here is peculiar. Typical solutions of this form check for equality.

The loop

Next, is a loop that establishes that for each character σw1\sigma \in w_1 its less or equally frequent in w1w_1.

#(σw1)#(σw2).(1)\#(\sigma \in w_1) \leq \#(\sigma \in w_2).\qquad{(1)}

Again, we notice the use of an inequality where one would expect an equality.

Combining what we know

Without having explicitly checked for equality of character frequency and length, can we still be certain that we only reach the final return statement if the two words are in fact anagrams? Let us combine the two.

Taking the sum of both sides of eq. 1 over the characters in w1w_1, and knowing that we count the occurrence of characters in w1w_1, we obtain

|w1|=σw1#(σw1)σw1#(σw2)|w2|.|w_1| = \sum_{\sigma \in w_1} \#(\sigma \in w_1) \leq \sum_{\sigma \in w_1} \#(\sigma \in w_2) \leq |w_2|.

The last inequality stems from the fact that w2w_2 may contain characters not in w1w_1. We now have |w2||w1||w_2| \leq |w_1| and |w1||w2||w_1| \leq |w_2|, therefore |w1|=|w2||w_1| = |w_2|.

Finally, we note that the inequalities in eq. 1 cannot be strict, as this would entail that |w1|<|w2||w_1| < |w_2|.

We thus conclude that w1w_1 and w2w_2 are equally long with the same character frequencies. This is the definition of an anagram.

Well done!