This post originated from an RSS feed registered with Java Buzz
by Joey Gibson.
Original Post: Kata 6
Feed Title: Joey Gibson's Blog
Feed URL: http://www.jobkabob.com/index.html
I took a swipe at implementing Dave
Thomas' Kata 6, an exercise dealing with anagrams. The goal is to
parse a list of roughly 45,000 words and find every word that is an
anagram of another word in the file. Dave claims there are 2,530 sets
of anagrams, but I only got 2,506. I'm not sure where the discrepancy
comes from, but here's my solution. I welcome any comments and critiques.
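# Read every line of the word list (one word per line)
words = IO.readlines("wordlist.txt")
# Map each sorted-letter signature to the words that share it
anagrams = Hash.new([])
words.each do |word|
  # chomp! returns nil when there is nothing to strip, so use the non-bang forms
  word = word.chomp.downcase
  # The sorted letters are identical for every anagram of a word
  base = word.chars.sort.join
  # |= builds a new array each time, so the shared default [] is never mutated
  anagrams[base] |= [word]
end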
# Find the anagrams by eliminating those with only one word
anagrams.reject! { |k, v| v.length == 1 }
# Sort the sets so the one with the most words comes first
values = anagrams.values.sort do |a, b|
  b.length <=> a.length
end
File.open("anagrams.txt", "w") do |file|
  values.each do |line|
    file.puts(line.join(", "))
  end
end
largest = anagrams.keys.max do |a, b|
  a.length <=> b.length
end
puts "Total: #{anagrams.length}"
puts "Largest set of anagrams: #{values[0].inspect}"
print "Longest anagram: #{anagrams[largest].inspect} "
puts "at #{largest.length} characters each"
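The grouping idea in the script above can be tried on a small in-memory list. This is only a sketch: the sample words and the `signature` helper are made up for illustration, and it assumes a Ruby recent enough to have `group_by` and `String#chars`.

```ruby
# Hypothetical sample list; the real input is wordlist.txt.
sample = %w[Enlist listen silent inlets google banana tinsel]

# Signature: the word's letters, lower-cased and sorted.
# Every anagram of a word produces the same signature.
def signature(word)
  word.downcase.chars.sort.join
end

# Normalize case, drop duplicates, then group the words by signature.
groups = sample.map(&:downcase).uniq.group_by { |w| signature(w) }

# A signature shared by only one word is not an anagram set.
anagram_sets = groups.values.select { |set| set.length > 1 }

anagram_sets.each { |set| puts set.join(", ") }
```

With this sample, only the five words built from the letters "eilnst" end up in a set; "google" and "banana" each have a signature to themselves and are dropped.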
Update: Of course, 10 seconds after uploading the
code, I see something I could change. Instead of sorting the anagram
sets in descending order by array length just to grab the first one,
I could have done the following:
longest = anagrams.to_a.max do |a, b|
  a[1].length <=> b[1].length
end
This scans the key/value pairs once and returns the one whose array is
largest, so no sort is needed. Each pair is a two-element array: the key
is at index 0 and the interesting array is at index 1.
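The same lookup could probably be written with `max_by`, which avoids the explicit comparison block. A minimal sketch, assuming a small hand-made hash in place of the real `anagrams` hash (the entries below are illustrative only):

```ruby
# Hypothetical data standing in for the anagrams hash built by the script.
anagrams = {
  "eilnst"   => %w[enlist listen silent],
  "opst"     => %w[opts post pots spot stop tops],
  "aeginrst" => %w[granites ingrates]
}

# max_by makes one pass and returns the [key, words] pair that scores highest.
# Largest set: score each pair by how many words it holds.
largest_key, largest_set = anagrams.max_by { |_key, words| words.length }

# Longest anagrams: score each pair by the length of its sorted-letter key.
longest_key, longest_set = anagrams.max_by { |key, _words| key.length }

puts "Largest set of anagrams: #{largest_set.inspect}"
puts "Longest anagram: #{longest_set.inspect} at #{longest_key.length} characters each"
```

Since `max_by` on a hash hands back the pair as a two-element array, it can be destructured straight into a key and a value instead of indexing with 0 and 1.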