A while back I found a deficient implementation of #match:. Blaine and I decided to write a nicer one. At first, we went over the basic idea of why #match: works. We found that the pattern can be broken into pieces, like this:
Then we decided to do something that sounded loudly inefficient. We decided we'd parse the pattern, break it up into two kinds of tokens, and then let another object match tokens to the string. The tokens would be either the star, or stuff that needs to be matched exactly.
Once we were done, we were surprised to see that on slower hardware, parsing and then matching was ~3 times faster than the deficient version. What was killer was that since each token knew its identity there was no need for, you know, aCharacter == $* and that kind of stuff. No ifTrue:/ifFalse:.
Then we ran the profiler and found that because we supported using ? inside the textual tokens to match any single character... read this:
about 50% of the time was spent in Character>>==.
Even though this could seem like a good thing, we were not happy. So we subclassed our abstract token class and created PoundToken (the others were StarToken and MatchToken). We changed the parser, blablabla, and even though the parser did more work, the thing went twice as fast!
Insanely fast, actually. It was only 3 times slower than VW's #match: implementation.
After we wrote the parsing version of #match:, I inlined the whole thing into a single method to see how it would compare to VW's code. It was about 2% slower, but the code was much more readable.
Lesson 1: it is much faster to avoid checking + ifTrue:/ifFalse: to begin with by creating a class --- even if the check is for identity.
Lesson 2: don't inline your algorithm 9 times before you get it right, and then leave the non-inlined version around.
Incidentally, yes: PoundToken and StarToken were singletons.