CrudVision - Lisa Seelye

September 23, 2007

Random Snippet: Soundex

Filed under: C, ruby, snippet — Lisa Seelye @ 16:17

This bit of code implements the Soundex algorithm, based on some C written by a friend of mine (Hi Jen!). She writes really compact C (read: hard to read), and it took me a long time to understand what her code did.

So here is the annotated version of the code. The Rule numbers in the comments correspond to the rule numbers in the Wikipedia article on Soundex.

RUBY:
  class String
    # Letter codes for A..Z, Rule 3 (0 means the letter is not coded)
    @@sem = [ 0,1,2,3,0,1,2,0,0,2,2,4,5,5,0,1,2,6,2,3,0,1,0,2,0,2 ]

    def to_soundex
      s = self.upcase
      chars = s[1 .. s.size - 1].split(//) # Split the rest of the word into characters to encode
      ret = s[0,1] # Rule 1: keep the first letter
      chars.each_index do |i|
        c = chars[i]
        next if ( c < 'A' or c > 'Z' ) # Not a letter
        next if i > 0 and c == chars[i - 1] # Rule 4, doubled letters only (compares letters, not codes)
        d = @@sem[ c.ord - 'A'.ord ] # Encode it (c[0] - 'A'[0] only works on Ruby 1.8)
        ret += d.to_s unless d == 0 # Rule 2: skip letters with no code
      end
      ret += "0000" # Rule 5: pad with zeroes
      ret[0..3] # Rule 5: keep the first four characters
    end
  end

It was fun to write and it works well.
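One caveat on Rule 4: the check in the snippet only skips a letter when it repeats the letter right before it, while the Wikipedia rule talks about adjacent letters that share the same code (with H and W not counting as separators). What follows is a minimal sketch of a variant that compares codes instead. The names `SOUNDEX_CODES` and `soundex_by_code` are made up for this illustration and are not from the original C or from any library, and the handling of non-letters (they are simply dropped) is an assumption.

```ruby
# Letter codes for A..Z, same table as in the snippet; 0 means "not coded".
SOUNDEX_CODES = [0,1,2,3,0,1,2,0,0,2,2,4,5,5,0,1,2,6,2,3,0,1,0,2,0,2].freeze

# Hypothetical helper for illustration only.
def soundex_by_code(word)
  letters = word.upcase.scan(/[A-Z]/)          # Assumption: ignore anything that is not A..Z
  return "" if letters.empty?
  ret  = letters.first                         # Rule 1: keep the first letter
  prev = SOUNDEX_CODES[letters.first.ord - 'A'.ord]
  letters.drop(1).each do |c|
    next if c == 'H' || c == 'W'               # H and W do not separate equal codes
    d = SOUNDEX_CODES[c.ord - 'A'.ord]
    ret += d.to_s unless d == 0 || d == prev   # Rules 2 and 4
    prev = d                                   # A vowel sets prev to 0, so a repeat after it counts again
  end
  (ret + "000")[0, 4]                          # Rule 5: pad with zeroes, keep four characters
end

puts soundex_by_code("Robert")   # R163
puts soundex_by_code("Jackson")  # J250
puts soundex_by_code("Ashcraft") # A261
```

The difference shows up with names like "Jackson": C, K and S all map to 2, so the code-comparing version gives J250, while comparing letters gives J222.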
