Algorithm for finding string permutations

Sat 08 October 2016 by kroche in Algorithms

I'm a member of the Framingham State Computer Science club, which meets weekly. At the close of every meeting, we are posed two "interview questions" to answer. Usually, one of them is a logical puzzle, and the other is an algorithm problem.

This week, we were asked to find an algorithm which could determine whether a string sB is a permutation of a string sA, "permutation" being defined as a string of the same length containing the same characters, irrespective of the order in which they appear. For example, "god" is a permutation of "dog", and so is "dgo", but not "nod" since it contains "n", which does not appear in "dog". Character encoding is assumed to be ASCII.

The given solution was to form two arrays: array aA and array aB, each comprised of the characters of their respective strings, like so:

string sA = "dog"
string sB = "god"

array aA = ["d", "o", "g"]
array aB = ["g", "o", "d"]

Each character in aA is compared to each character in aB. If any mismatches are detected, the strings are determined not to be permutations:

for char in aA:
    if (char not in aB):
        return false
return true

I was struck by the inefficiency of this algorithm. Because it needs to compare each character in aA to each character in aB, it is O(n^2) complex, which I felt to be excessive for such a simple operation. I developed a more efficient algorithm that is O(n) complex which relies on the uniqueness of the values representing ASCII characters to produce a unique number which remains the same between permutations.

Originally, this ostensibly unique number was the sum of the decimal ASCII values of each character in sA:

sum = 0

for char in sA:
    sum += ord(char)

For the string "dog", this would be 100 + 111 + 103 = 314. This sum remains the same for any permutation thanks to the commutative property of addition ("god" = 103 + 111 + 100 = 314).

This algorithm is easily defeatable by subtracting an arbitrary number from any character in sA and adding it to any other character in sA. For example:

"dog" = 100 + 111 + 103 = 314
"god" = 103 + 111 + 100 = 314
"hoc" = 104 + 111 + 99 = 314

"hoc" is not a permutation of "dog" because it contains letters {"h", "c"} which are not present in "dog", yet the sum is the same.

I needed to find a mathematical operation that would retain the commutative property so as to produce the same result for permutations, but could not be algebraically fooled.

My solution is to sum the square of each value instead of summing the values themselves, as this retains the commutative property of addition but is not prone to collisions in the same way.

"dog" = 100^2 + 111^2 * 103^2 = 32930
"god" = 103^2 + 111^2 + 100^2 = 32930
"hoc" = 104^2 + 111^2 + 99^2 = 32938

Implementations of these algorithms can be found in this repo.


Tags: None