I changed the bool data type (which is really an int) to char, which both shrank the program and sped it up:

real 0m1.885s
user 0m1.649s
sys 0m0.168s

I think this is because the topmost loop, when processing the top row, basically just zooms down a given row of the compared array. With four chars packed in where one bool would be, more of the tests use data that is already available in the CPU's cache.
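
To put rough numbers on the cache argument, here is a minimal sketch in C. It is not taken from the program itself: the 64-byte cache line is an assumption (common, but it varies by CPU), and the names wide_flag, narrow_flag, flags_per_line and lines_per_row are made up for illustration.

```c
#include <stddef.h>

/* Assumption: 64-byte cache lines. Check your own CPU; this varies. */
#define CACHE_LINE_BYTES 64

/* Hypothetical stand-ins for the two flag types discussed above. */
typedef int  wide_flag;    /* a "bool" that is really an int */
typedef char narrow_flag;  /* the replacement type */

/* How many flags of the given size fit in one cache line. */
size_t flags_per_line(size_t flag_size)
{
    return CACHE_LINE_BYTES / flag_size;
}

/* How many cache lines a row of `count` flags occupies,
 * assuming the row starts on a cache-line boundary. */
size_t lines_per_row(size_t count, size_t flag_size)
{
    size_t bytes = count * flag_size;
    return (bytes + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES;  /* round up */
}
```

If int is 4 bytes, flags_per_line(sizeof(wide_flag)) gives 16 while flags_per_line(sizeof(narrow_flag)) gives 64, so a row of 1000 flags spans 63 cache lines as ints but only 16 as chars. Fewer lines to pull in per row is consistent with the speedup, though it does not prove that the cache is the whole story.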