Kuro5hin.org: technology and culture, from the trenches
create account | help/FAQ | contact | links | search | IRC | site news
[ Everything | Diaries | Technology | Science | Culture | Politics | Media | News | Internet | Op-Ed | Fiction | Meta | MLP ]
We need your support: buy an ad | premium membership

[P]
Programming Fun Challenge 4

By jacob in jacob's Diary
Tue Apr 23, 2002 at 10:44:51 AM EST
Tags: (all tags)

The new Programming Fun Challenge is out! This time, we'll be writing a program to play the word chains game: write a program that, given two words, can produce a list of words that connect the first to the second, each word differing from its neighbors by only one letter.


The Challenge

Programming Challenge 4 is to write a program in the language of your choice that, given a dictionary, can produce minimum-length word chains that connect a start word to an end word. A word chain connecting a start word to an end word is a list of words that satisfies four properties:

  1. It starts with the given start word.
  2. It ends with the given end word.
  3. All words in the list are valid dictionary words.
  4. If two words are adjacent to each other in the list, they differ from each other by exactly one letter.

As you might expect, the minimum-length word chain is the shortest word chain that connects the start word to the end word.

When your program starts, it should present a '?' (question-mark character) and a space as a prompt to the user. It should accept the following commands and perform the corresponding actions:

  • add-dictionary filename
    Adds all the words in filename to the current dictionary (the current dictionary starts out empty). filename is presumed to be a text file containing an arbitrarily-long alphabetical list of words (that is, sequences of alphanumeric characters) separated by newline.
    Prints "Added filename" on success, "Error adding filename" on failure.
  • connect word1 word2
    Prints a minimum-length word chain connecting word1 to word2 in the current dictionary, with words separated by a space, the `>' symbol, and another space. If no word chain exists that connects the two words, the program should print "No way to connect word1 to word2." (Note that it will always print this whenever either of the given words is not in the dictionary or if the two words are not of the same length.)
  • exit
    Quits the program.

After it finishes executing any command other than exit, your program should print another prompt on a new line.

For example, suppose I have a file "sample-dict" with the following contents:

boat
book
boot
coat
hook
look
rake
take

An interaction might look like this:

? add-dictionary sample-dict
Added sample-dict
? connect coat hook
coat > boat > boot > book > hook
? connect boot hook
boot > book > hook
? connect boat take
No way to connect boat to take.
? exit

Scoring

PFC-4 entries will be scored on three metrics:

  • correctness (5 points)
    Your program must work exactly as I have described it. I will design test cases and let you know if your program passes them.
  • elegance (5 points)
    Your program should be easy to read and understand. That means making it small, using appropriate function names, variable names, and comments, choosing lucid algorithms, and organizing source code appropriately.
  • algorithmic efficiency (5 points)
    Your program should aim to do as little work as it can in producing its results. Here I am not concerned with the microefficiency of each individual instruction so much as the overall efficiency of the algorithms you choose.

That's the challenge. To enter, follow the usual procedure and post your program as a reply to this diary (see tmoertel's How to Post Code to K5 for a way to preserve your formatting). After a week, I will judge all entries and announce the winners. If you'd like to be notified by e-mail when that announcement happens (and for future PFC announcements) sign up for the PFC mailing list by sending a blank e-mail to:
<pfc-announce-subscribe@moertel.com>

Now get coding!

Sponsors

Voxel dot net
o Managed Hosting
o VoxCAST Content Delivery
o Raw Infrastructure

Login

Poll
How tough is this PFC?
o Too tough 4%
o Too easy 16%
o A good challenge 80%

Votes: 25
Results | Other Polls

Related Links
o How to Post Code to K5
o jacob's Diary


Display: Sort:
Programming Fun Challenge 4 | 219 comments (219 topical, editorial, 0 hidden)
neat (5.00 / 1) (#1)
by hurstdog on Tue Apr 23, 2002 at 11:14:11 AM EST

Wish I could find a reason to spend time working on that instead of fixing bugs in scoop or working on my compiler :-) Little programming challenges like this make programming fun.



Anybody who is entering, post in this thread (5.00 / 1) (#3)
by tmoertel on Tue Apr 23, 2002 at 11:34:05 AM EST

That way jacob will know how many entries to expect.

P.S. I'm in! I'll produce a Haskell entry and maybe a Perl entry. I'm going to be bogged down for the next few days, though, and so my entries may take a while to show up.

--
My blog | LectroTest

[ Disagree? Reply. ]


Question (5.00 / 1) (#5)
by bob6 on Tue Apr 23, 2002 at 11:53:32 AM EST

What if there is several solutions ? Should the program report them all ?

Cheers.
Question (5.00 / 1) (#6)
by matthewg on Tue Apr 23, 2002 at 11:53:47 AM EST

For connect a b, will a and b always be the same length? If not, does removing or adding a letter count as a transition? Ferinstance, if we have "dog" and "dogs" in the dict, are those two words considered adjacent?

Trivial. (5.00 / 1) (#8)
by ambrosen on Tue Apr 23, 2002 at 11:58:05 AM EST

If you do it in a .pl file. Prolog that is. I may have a quick go.

--
Procrastination does not make you cool. Being cool makes you procrastinate. DesiredUsername.
Can we please have simplier input (4.00 / 1) (#12)
by Cal Bunny on Tue Apr 23, 2002 at 12:17:53 PM EST

A small request for future PFCs: can we please leave the input to simplier means. Half of my code is going to be for a lame command handler. These things are supposed to be fun (and simple), having to read and parse command input as well as do error detection does not fall into either of those categories for me.

Also, might I seggest changing the output format to match the input format. This would simplify things a little, and it would make the output much easier to read and automatically parse.

^cb^
Kudos to you for warping my fragile little mind. - communist

Clarification (4.00 / 1) (#13)
by Cal Bunny on Tue Apr 23, 2002 at 12:20:00 PM EST

What does it mean to differ by one letter? Am I understanding properly that insertions and deletions do not count?

^cb^
Kudos to you for warping my fragile little mind. - communist
Disjoint input dictionaries? (5.00 / 1) (#22)
by tmoertel on Tue Apr 23, 2002 at 12:51:58 PM EST

Can we assume that the sets of words in the input dictionaries are disjoint, or should we be prepared to handle duplicate words?

--
My blog | LectroTest

[ Disagree? Reply. ]


Python solution (5.00 / 3) (#25)
by MK77 on Tue Apr 23, 2002 at 01:04:42 PM EST

#!/usr/bin/env python

import copy
import sys

metawords = {}
words = {}

path_not_found = "path not found"

def link_metaword(metaword, word):
        if metaword not in metawords:
                metawords[metaword] = [ word ]
        else:
                for neighbor in metawords[metaword]:
                        words[neighbor].append(word)
                        words[word].append(neighbor)
                metawords[metaword].append(word)

def add_word(word):
        if word in words:
                return

        words[word] = []
        for i in range(len(word)):
                metaword = word[:i] + '_' + word[i + 1:]
                link_metaword(metaword, word)

def find_path(start, end):
        if start not in words:
                raise path_not_found
       
        paths = [ (start, [ start ]) ]
        visited = {}

        while paths:
                (word, path) = paths[0]
                paths = paths[1:]
                if word == end:
                        return path
               
                visited[word] = 1
                for neighbor in words[word]:
                        if neighbor not in visited:
                                paths.append((neighbor, copy.copy(path) + [ neighbor ]))

        raise path_not_found

def add_dict(filename):
        f = open(filename)
        dict_words = f.readlines()
        f.close()

        for word in dict_words:
                word = word[:-1]
                add_word(word)

def main_loop():
        while 1:
                print "? ",
                line = sys.stdin.readline()
                line = line[:-1]
                cmd_words = line.split(' ')

                cmd = cmd_words[0]
                if cmd == "exit":
                        return
                if cmd == "add-dictionary":
                        add_dict(cmd_words[1])
                        print "Added " + cmd_words[1]
                if cmd == "connect":
                        try:
                                path = find_path(cmd_words[1], cmd_words[2])
                                for i in range(len(path)):
                                        if i > 0:
                                                print ">",
                                        print path[i],
                                print "\n",
                        except path_not_found:
                                print "No way to connect " + cmd_words[1] + " to " + cmd_words[2] + "."

main_loop()



--
Mmm... rageahol
perl implementation (5.00 / 2) (#27)
by matthewg on Tue Apr 23, 2002 at 01:20:05 PM EST

Here's my first go at it.

#!/usr/bin/perl

use strict;
use warnings;
our(@dict, %adjacents, %seen);
sub find_adjacent($);
sub connect_words($$);
sub build_adjacents();

# To find the shortest path between two words,
# we first get a list of all words which are
# adjacent to the target word.  We keep doing this,
# making sure not to revisit any words, until we've
# exhausted all the possibilities.  Then we check the
# length of all paths which worked.

sub find_adjacent($) {
    my($a) = shift;
    my @ret = ();
    my @a = split(//, $a);

    foreach my $word(@dict) {
        next if $word eq $a;
        next if length($word) != length($a);
        my $diffs = 0;
        my @word = split(//, $word);
        foreach my $letter_a(@a) {
            my $letter_b = shift @word;
            $diffs++ if $letter_a ne $letter_b;
            last if $diffs > 1;
        }
        push @ret, $word if $diffs == 1;
    }
    return @ret;
}

sub connect_words($$) {
    my($a, $b) = @_;
    my @paths = ();

    return if $seen{$a}++; # Don't go backwards
    return unless @{$adjacents{$a}}; # Nothing is adjacent to $a?

    # This is the last step
    return [$a, $b] if grep { $_ eq $b } @{$adjacents{$a}};

    # Darn, not the last step - find all the ways to get there.
    foreach my $neighbor(@{$adjacents{$a}}) {
        push @paths, map { [$a, @$_] if $_ } connect_words($ neighbor, $b);
    }

    return @paths;
}

sub build_adjacents() { %adjacents = map { $_ => [find_adjacent($_)] } @dict; }

@dict = ();

while(1) {
    print "? ";
    my $input = <STDIN>;
    chomp $input;
    if($input =~ /^exit$/) {
        exit;
    } elsif($input =~ /^add-dictionary\s+(.+)/) {
        my $file = $1;
        if(open(FILE, "<", $file)) {
            my @lines = <FILE>;
            map { chomp $_ } @lines;
            push @dict, @lines;
            build_adjacents();
            print "Added $file\n";
        } else {
            print "Error adding $file\n";
        }
        close FILE;
    } elsif($input =~ /^connect\s+(\S+)\s+(\S+)$/) {
        if(!@dict) {
            print "You must specify a dictionary f irst!\n";
            next;
        }

        my($a, $b) = ($1, $2);
        %seen = ();
        my @paths = connect_words($a, $b);
        if(@paths) {
            my $best_path = undef;
            foreach my $path(@paths) {
                next if $best_path and @ $path > @$best_path;
                $best_path = $path;
            }
            print join(" > ", @$best_path), "\n";
        } else {
            print "No way to connect $a to $b.\n";
        }
    }
}


stuff (5.00 / 1) (#29)
by trhurler on Tue Apr 23, 2002 at 01:56:54 PM EST

I have a command handler ready for this. Frankly, I wish you'd keep this format for future contests, as I can then reuse the command handler with truly minimal modifications. Anyone bitching about having to write one is either too lazy for words or else incompetent; it didn't take me long, and mine is probably better than what you'll write anyway(better error checking, handles a few odd cases gracefully, etc.)

This isn't too easy, too hard, or anything like that. The only problem I see is that you've (almost certainly deliberately) chosen a problem that's trivial in your favorite language and requires significant work in many others. Since you seem to have this habit, I am going to assemble a small library of list handling code in C for use in your contests. Hopefully you won't count this against my program length, for two reasons. One, because list handling libraries exist out there, and I'm just the sort of bastard who'd RATHER rewrite it than search for it - in other words, a typical C programmer would not have to write this code to complete your program. Two, because as I said, you obviously biased the whole thing in favor of languages with particular elements in their provided runtimes.

For anyone who is having trouble thinking about this problem, once you have the right data structures and appropriate methods for operating on them, the loading of data is straightforward and the solution of the problem can be one big stupid recursion. There are almost certainly more optimal ways to do it than the one big stupid recursion, but that method will work, which I'm sure is why jacob chose this particular exercise. I'm thinking of implementing the one big stupid recursion and then trying to do something better for grins, if I have time.

And when I say "one big stupid recursion" I do not mean to imply that recursion is stupid. I mean to imply that the method of solution comprised by this particular recursion is something of a blunt instrument - a brute force approach of extraordinary magnitude. The sort of solution I always sought and then tried to beat in college programming exercises. (I always did beat it back then, but admittedly I had lots of free time, and the resulting programs were rarely what you'd call "straightforward" A well documented heuristic is still a wild ass guess, after all, and sometimes the fastest algorithm is not the one you'd want to teach in class.)

--
'God dammit, your posts make me hard.' --LilDebbie

Damn you :-) (5.00 / 1) (#30)
by kentm on Tue Apr 23, 2002 at 02:02:19 PM EST

I should never have clicked on your text ad. Now I want to work on this instead of my real work. I'm going to attempt a Java version for this.

efficiency question (4.00 / 1) (#34)
by Cal Bunny on Tue Apr 23, 2002 at 02:54:27 PM EST

How are you going to determine efficiency? Are you counting dictionary load time or just query time. Should we be efficient for a single query or for multiple queries (basically, how much preprocessing of the input should we be doing)?

^cb^
Kudos to you for warping my fragile little mind. - communist
One indexing method (5.00 / 1) (#36)
by KWillets on Tue Apr 23, 2002 at 03:15:32 PM EST

Try forming sets of words which differ by one character, by deleting one character, eg:

cat --> c_t
cut --> c_t
cot --> c_t
(These all map to the "c_t" set.)

Map each word of length k to k different sets, one for each character, i.e:

cat --> _at
cat --> c_t
cat --> ca_
Given a word it's easy to look up the k sets of words one hop away, and recurse.

I'm looking at more global ways of doing this. If anyone has stats on the sparsity of the links, etc., let me know.

C++/STL implementation (5.00 / 3) (#37)
by ucblockhead on Tue Apr 23, 2002 at 03:16:45 PM EST

Here's an implementation in C++, using the STL. (Sad that I have enough freetime to do this, but oh well.) It's pretty straightforward and unoptimized. #include <string>
#include <vector>
#include <iostream>
#include <fstream>
using namespace std;

class WordList;
class Word
{
public:
    Word(const string& aText) : myText(aText) {}

    inline const string& GetText() const { return myText; }

    static bool IsOkCnxn(const string& str1, const string& str2);
    void MakeConnections(const WordList& aWordList);
    void Connect(vector<string>& result, vector<string>&current, const string& aTarget) const;

private:
    vector<Word*> myTargets;
    string myText;
};

class WordList : public vector<Word*>
{
public:
    ~WordList();
    void AddWord(const string& str);
    void Connect(vector<string>& result, string aStart, string anEnd);
    const Word* GetWord(const string& aWord) const;
};

const Word* WordList::GetWord(const string& aWord) const
{
    for(vector<Word*>::const_iterator it = begin();it!=end();it++)
        if( *it && (*it)->GetText() == aWord )
            return *it;
    return NULL;
}

void WordList::Connect(vector<string>& result, string aStart, string anEnd)
{
    if( const Word* word = GetWord(aStart) ) {
        word->Connect(result, vector<string>(), anEnd);
    }
}

WordList::~WordList()
{
    for(vector<Word*>::iterator it = begin();it!=end();it++)
        delete *it;
}

void WordList::AddWord(const string& str)
{
    push_back(new Word(str));
}


bool Word::IsOkCnxn(const string& str1, const string& str2)
{
    if( str1.size() == str2.size() ) {
        int Diffs = 0;
        for(int i=0;Diffs < 2 && i < str2.size();i++)
            if( str1[i] != str2[i] )
             &n bsp;  if( ++Diffs > 1 )
             &n bsp;      return false;

        return (Diffs == 1);
    }
    return false;
}

void Word::MakeConnections(const WordList& aWordList)
{
    for(WordList::const_iterator it=aWordList.begin();it!=aWordList.end();it++) {
        if( IsOkCnxn((*it)->GetText(), myText) ) {
            myTargets.push_back(*it);
        }
    }
}

void Word::Connect(vector<string>& result, vector<string>&current, const string& aTarget) const
{
    current.push_back(myText);

    for(vector<Word*>::const_iterator it=myTargets.begin();it!=myTargets.end();it++) {

        if( aTarget != myText ) {

            bool isAlreadyInList=false;
            for(vector<string>::iterator jt=current.begin();jt!=current.end() && !isAlreadyInList;jt++)
             &n bsp;  if( *jt == (*it)->GetText() )
             &n bsp;      isAlreadyInList = true;

            if( !isAlreadyInList )    {
             &n bsp;  (*it)->Connect(result, current, aTarget);
            }
        }
        else if( aTarget == myText &&
             &n bsp;   (result.size() == 0 || current.size() < result.size() ) )
            result = current;
    }

    current.pop_back();
}

int main()
{
    WordList someWords;

    for(;;)
    {
        cout << "? ";

        string command;
        cin >> command;

        if( command == "exit" )
            return 0;
        else if( command == "add-dictionary") {

            string dict;
            cin >> dict;

            if( ifstream in(dict.c_str()) )
            {
             &n bsp;  string word;

             &n bsp;  while( !in.eof() ) {
             &n bsp;      in >> word;
             &n bsp;      if( !word.empty() )
             &n bsp;          someWords.AddWord(word);
             &n bsp;  }

             &n bsp;  for(WordList::iterator it=someWords.begin();it!=someWords.end();it++)    {
             &n bsp;      (*it)->MakeConnections(someWords);
             &n bsp;  }
            }
        }
        else if( command == "connect" ) {

            string src;
            cin >> src;
            string trgt;
            cin >> trgt;

            vector<string> result;
            someWords.Connect(result, src, trgt);

            if( result.size() == 0 )
             &n bsp;  cout << "No way to connect " << src << " and " << trgt << "\n";
            else {
             &n bsp;  for(vector<string>::iterator jt = result.begin();jt!=result.end();jt++) {

             &n bsp;      if( jt != result.begin() )
             &n bsp;          cout << " > ";
             &n bsp;      cout << *jt;
             &n bsp;  }
             &n bsp;  cout << "\n";
            }
        }
    }

    return 0;
}

-----------------------
This is k5. We're all tools - duxup
yet another perl (5.00 / 3) (#42)
by wintergreen on Tue Apr 23, 2002 at 03:39:36 PM EST

use strict;

my @DICT;

my $inp;
do {
    print "? ";
    $inp = <>;
    if( $inp =~ /^add-dictionary (.*)$/ ) {
        open FH, $1 || die "Could not open '$1' for reading\n";
        my @newwords = <FH>;
        chomp for @newwords;
        @DICT = (@DICT, @newwords);
    }
    elsif( $inp =~ /^connect (.*?) (.*)$/ ) {
        do_connect($1,$2);
    }
    elsif( $inp !~ /^exit/ ) {
        print "Invalid input.\n";
    }
} while( $inp !~ /^exit/ );

# branch and bound
sub do_connect {
    my( $w1, $w2 ) = @_;

    unless( length($w1) eq length($w2) ) {
        print "No way to connect $w1 to $w2.\n";
        return;
    }

    my @pathlist = ([$w1]);
    do {
        if( $pathlist[0][-1] eq $w2 ) {
            print join(" > ", @{$pathlist[0]}), "\n";
            return;
        }
        my @newpaths = add_path( shift @pathlist );
        push(@pathlist, @newpaths) if @newpaths;
        @pathlist = sort { scalar(@$a) <=> scalar(@$b) } @pathlist;
    } while( @pathlist );

    print "No way to connect $w1 to $w2.\n";
}

sub add_path {
    my $path = shift;
    my @newpaths;

    $" = '|';
    my $used = qr{^(@$path)$};
    my %difs;
    my @end = split / */, $path->[-1];
WORD:  foreach my $word (@DICT) {
        next if $word =~ $used;
        my @word = split / */, $word;
        for my $i ( 0..$#word ) {
           $difs{$word}++ if ($word[$i] ne $end[$i]);
            if( $difs{$word} > 1 ) {
            delete $difs{$word};
            next WORD;
            }
        }
    }
    foreach my $newword ( keys %difs ) {
        my @newpath = ( @$path, $newword );
        push @newpaths, \@newpath;
    }
    return @newpaths;
}


Is case significant for determining adjacency? (5.00 / 1) (#43)
by tmoertel on Tue Apr 23, 2002 at 03:46:42 PM EST

For example, are the words "A" and "a" considered to be identical or one step apart?

--
My blog | LectroTest

[ Disagree? Reply. ]


Question (5.00 / 1) (#46)
by trhurler on Tue Apr 23, 2002 at 04:49:46 PM EST

Are you only allowed to use connect commands with words in the dictionaries provided in advance? If this is not the case, then am I allowed to add words you put in connect commands to my internal dictionary?

--
'God dammit, your posts make me hard.' --LilDebbie

Test data (5.00 / 2) (#58)
by djotto on Wed Apr 24, 2002 at 06:39:48 AM EST

I just stripped 3000-odd four-letter words out of Webster's 1913. The file is available here



Ok, trying Haskell. (5.00 / 3) (#59)
by i on Wed Apr 24, 2002 at 06:42:46 AM EST

I'm still learning the idioms. Forgive me, tmoertel :)

================ ladder.hs =================
module Main where

import Set
import FiniteMap
import IO
import BFS

type SplitString = (String, String)
type Dict = FiniteMap SplitString [String]

splits :: String -> [SplitString]
splits [_] = [("", "")]
splits (x:xs) = ("",xs) : map addx (splits xs) where
  addx (u,v) = (x:u,v)

splitString :: String -> SplitString -> Dict -> Dict
splitString w u fm = addToFM_C (flip (++))  fm u [w]

addString :: String -> Dict -> Dict
addString w fm = foldr add1 fm (splits w) where
  add1 u fm = splitString w u fm

ioAddDict :: FilePath -> Dict -> IO Dict
ioAddDict path dict = do
  handle <- openFile path ReadMode
  dict' <- ioAddDict' handle dict
  hClose handle
  return dict'
  where
  ioAddDict' :: Handle -> Dict -> IO Dict
  ioAddDict' handle dict =
      do
          eof <- hIsEOF handle
          if eof
              then return dict
              else do
                  word <- hGetLine handle
                  let dict' = word `seq` dict `seq` addString word dict
                  dict'' <- ioAddDict' handle dict'
                  return dict''

tryAddDict :: FilePath -> Dict -> IO Dict
tryAddDict path dict = do
  res <- try (ioAddDict path dict)
  case res of
      (Left _) -> do
          putStrLn $ "Error adding " ++ path
          return dict
      (Right dict') -> return dict'
                                             
findPath :: String -> String -> Dict -> [String]
findPath from to dict = findChain (adjacents dict) from to

adjacents :: Dict -> String -> [String]
adjacents fm w = setToList $ foldl addToSet emptySet adj where
  adj = just $ map (lookupFM fm) (splits w)
  just [] = []
  just (Nothing:xs) = just xs
  just ((Just x):xs) = x ++ just xs


mainLoop :: Dict -> IO Dict
mainLoop dict = do
  putStr "? "
  hFlush stdout
  res <- try getLine
  case res of
      Left  _            -> return dict
      Right cmdline -> doCmd (words cmdline) dict

doCmd :: [String] -> Dict -> IO Dict
doCmd ['e':_] d = return d
doCmd ['a':_, w] d = tryAddDict w d >>= mainLoop
doCmd ['?':_, w] d =
      do
          print (adjacents d w)
          mainLoop d
doCmd [w1, w2] d = doCmd ["c", w1, w2] d
doCmd ['c':_, w1, w2] d = do
  let path = findPath w1 w2 d
  case path of
      [] -> putStrLn $ "No way to connect " ++ w1 ++ " to " ++ w2 ++ "."
      x    -> putStrLn $ showPath x
  mainLoop d
doCmd _ d = putStrLn "Bad command" >> mainLoop d

showPath :: [String] -> String
showPath [x] = x
showPath (x:xs) = x ++ " > " ++ showPath xs

main :: IO ()
main = do
  let dict = emptyFM
  mainLoop dict
  hFlush stdout
  return ()

==================== BFS.hs ==============
module BFS (findChain) where

import qualified Set as S
import qualified SimpleQueue as Q

-- Parented tree

data PTree a = PNil | PTree (PTree a) a [PTree a]

findChain :: Ord a => (a->[a]) -> a -> a -> [a]
findChain f from to = reverse $ first chain where
  chain = dropWhile notFound $ bfsPaths $ toPTree f from
  first (x:_) = x
  first [] = []
  notFound [] = True
  notFound xs = to /= head xs

toPTree :: (a->[a]) -> a -> PTree a
toPTree f a = toPTree' f PNil a where
  toPTree' f p a = tree where
      tree = PTree p a trees
      trees = map (toPTree' f tree) (f a)

toParent :: PTree a -> [a]
toParent PNil = []
toParent (PTree p a _) = a:toParent p

bfsPaths :: Ord a => PTree a -> [[a]]
bfsPaths PNil = []
bfsPaths tree@(PTree parent node branches) = map toParent paths where
  (paths,_) = bfs (Q.single tree) (S.emptySet)
  bfs :: Ord a => Q.Seq (PTree a) -> S.Set a -> ([PTree a], S.Set a)
  bfs queue set =
      case Q.lview queue of
          Q.Just2 (tree@(PTree p n bs)) queue' ->
              if S.elementOf n set
                  then bfs queue' set
                  else (tree : trees, set'') where
                      (trees, set'') = bfs (foldr (flip Q.snoc) queue' bs) set'
                      set' = S.addToSet set n
          Q.Just2 (PNil) queue' -> bfs queue' set
          Q.Nothing2 -> ([], set)

It is reasonably fast under ghc -O2, a bit slower under ghci, and crashes hugs :(

I may try to tidy it up later.

and we have a contradicton according to our assumptions and the factor theorem

My Haskell implementation (5.00 / 2) (#65)
by tmoertel on Wed Apr 24, 2002 at 11:47:00 AM EST

This one uses a number of optimizations to compute the connections quickly. For example, on a Celeron 433, it takes 2.6 seconds to compute the answer
masts > pasts > posts > hosts > hoots > hooks > books
when given the input
add-dictionary /usr/share/dict/words
connect masts books
exit
via
time ./tgm-word-connect < test1.tst

Here it is:

-- TGM 20020424
--
-- PFC 4 entry
--
-- Optimizations:
--
-- * Partitions the word database by word length so that only
--   words of the desired length need be searched.
-- * Builds DB partitions only when needed (i.e., lazily).
-- * Adds new dictionaries to DB via quick incremental merge.
-- * Uses BFS to find shortest path without having to consider
--   potentially longer solutions.
-- * Uses word "trace" (in DB) to rapidly compute BFS neighbors.

module Main (main) where

import List
import Array
import Maybe (fromJust)
import StrictFiniteMap
import IO
import EdisonPrelude
import qualified SimpleQueue as Q


newtype WordDatabase     =  WD (WordArray, WordToIndex, TraceToIndices)
type WordArray           =  Array Int String       -- 33    => "cat"
type WordToIndex         =  FiniteMap String Int   -- "cat" => 33
type TraceToIndices      =  FiniteMap String [Int] -- "c.t" => [33, 329, ...]
emptyDatabase            =  WD (array (0,-1) [], emptyFM, emptyFM)

type LPWordDatabase      =  FiniteMap Int WordDatabase
emptyLPDatabase          =  emptyFM :: LPWordDatabase


main                     :: IO ()
main                     =  commandLoop emptyLPDatabase

commandLoop              :: LPWordDatabase -> IO ()
commandLoop lpwdb        =  prompt >> getLine >>= \cmd -> case words cmd of
    ["exit"]             -> return ()
    ["connect", a, b]    -> putStrLn (connect lpwdb a b) >> commandLoop lpwdb
    [_,dictFile]         -> addDictFile dictFile >>= commandLoop
    _                    -> putStrLn ("Illegal command: " ++ cmd)
    where
    prompt               =  putStr "? " >> hFlush stdout
    addDictFile df       =  catch tryDict failDict
        where
        tryDict          =  do
                            contents <- readFile df
                            putStrLn ("Added " ++ df)
                            return (addDict lpwdb (words contents))
        failDict         =  const $ do
                            putStrLn ("Error adding " ++ df)
                            return lpwdb


connect                  :: LPWordDatabase -> String -> String -> String
connect lpwdb a b        =  connect1 db a b
    where
    db                   =  lookupWithDefaultFM lpwdb emptyDatabase (length a)

connect1                 :: WordDatabase -> String -> String -> String
connect1 (WD (warry, widx, ttis)) a b
                         =  case (getIdx a, getIdx b) of
                                (Nothing, _)         -> noPath
                                (_, Nothing)         -> noPath
                                (Just aix, Just bix) -> showPath (bfs bix aix)
    where

    getIdx               =  lookupFM widx

    noPath               =  "No way to connect " ++ a ++ " to " ++ b ++ "."
    showPath []          =  noPath
    showPath ps          =  concat . intersperse " > " . map (warry!) $ ps

    -- standard BFS: C = closed set, O = open set

    bfs start goal       =  bfs' emptyFM (Q.single (start, start)) goal
    bfs' c o goal        =  case Q.lview o of
        Nothing2         -> []
        Just2 (v,vp) o'  -> if v == goal then walkPath
                            else if elemFM v c then bfs' c o' goal -- closed
                            else bfs' c' o'' goal -- node is open => follow it
            where
            c'           =  addToFM c v vp
            o''          =  foldl' Q.snoc o' [(vc,v) | vc <- vcs, vc /= v]
            vcs          =  concatMap (lookupFMe ttis) (traces (warry ! v))
            walkPath     =  v : takeWhile (v /=) (walkPath' vp)
            walkPath' n  =  n : if np == n then [] else walkPath' np
                            where np = lookupFMe c n


addDict                  :: LPWordDatabase -> [String] -> LPWordDatabase
addDict lpwdb ws         =  plusFM_C mergeDB lpwdb lpwdb1
    where
    lpws                 =  keyedGroups length ws
    lpwdb1               =  mapFM (\_ xs -> addDict1 emptyDatabase xs) lpws
    mergeDB a (WD(bwarry,_,_)) = addDict1 a (elems bwarry)

addDict1                 :: WordDatabase -> [String] -> WordDatabase
addDict1 (WD (warry, widx, ttis)) ws
                         =  WD (warry', widx', ttis')
    where
    ws'                  =  elems warry ++ ws
    base                 =  sizeFM widx
    warry'               =  array (1, base + length ws) (zip [1..] ws')
    widx'                =  addListToFM widx (zip ws [base+1..])
    ttis'                =  addListToFM_C (++) ttis (concatMap w2ts ws)
        where
        w2ts w           =  map (flip (,) [lookupFMe widx' w]) (traces w)


-- UTILITIES

traces                   :: String -> [String]
traces []                =  []
traces (x:xs)            =  ('.':xs) : map (x:) (traces xs)

lookupFMe                :: Ord a => FiniteMap a b -> a -> b
lookupFMe                =  (fromJust.) . lookupFM

foldl' _ z []            = z
foldl' f z (x:xs)        = (foldl' f $! f z x) xs

keyedGroups           :: (Ord a, Ord key) =>
                             (a -> key) -> [a] -> FiniteMap key [a]
keyedGroups keyFn     =  addListToFM_C (flip (++)) emptyFM
                      .  map ( \x -> (keyFn x, [x]) )

--
My blog | LectroTest

[ Disagree? Reply. ]


Request (5.00 / 2) (#70)
by jacob on Wed Apr 24, 2002 at 12:13:38 PM EST

When you post a solution, please post brief instructions for how to run it. This is particularly important for languages with more than one interpreter/compiler: if you're using C++, for example, whether you compiled with gcc or VC++ or something else might make a difference. It's not a big deal -- I can figure it out if you don't tell me -- but things will run smoother if you just say up front.

By the way: what do you think of the text ad? I bought it with the intent of attracting new people to the contest, and it appears to have done so; is it worth continuing to buy them for future contests as well?

--
"it's not rocket science" right right insofar as rocket science is boring

--Iced_Up

As long as it's written... (5.00 / 2) (#71)
by kuran42 on Wed Apr 24, 2002 at 12:59:17 PM EST

My Python (2.2) version, straight boring brute force. If I can get my brain in gear I might try a GA solution.

import string, sys

class Words:
    def __init__(self):
        self.words = {}

    def hasWord(self, word):
        return word and self.words.has_key(len(word)) and word in self.words[len(word)][word[0]]

    def addDictionary(self, input):
        try:
            for i in map(string.lower, map(string.strip, open(input).readlines())):
                if not self.words.has_key(len(i)):
                    self.words[len(i)] = dict(map(lambda x: (x, []), string.lowercase))
                self.words[len(i)][i[0]].append(i)
        except:
            print 'Error adding %s' % input
        else:
            print 'Added %s' % input

    def connectWords(self, a, b):
        self.tried, self.destination = {a.lower() : 0}, b.lower()

        if not self.hasWord(a) or not self.hasWord(b) or len(a) != len(b):
            print 'No way to connect %s to %s.' % (a, b)
            return

        x = self.connectWork(a.lower())
        if len(x):
            print ' > '.join(x)
        else:
            print "No way to connect %s to %s." % (a, b)

    def connectWork(self, a):
        prospects = {}

        for i in range(len(a)):
            for j in string.lowercase:
                m = '%s%s%s' % (a[:i], j, a[i + 1:])
                if m == self.destination:
                    return [a, m]
                elif self.hasWord(m) and m not in self.tried and m != a:
                    prospects[m] = 0

        self.tried.update(prospects)

        if len(prospects):
            x = filter(None, map(self.connectWork, prospects.keys()))
            if x:
                y = map(len, x)
                z = min(y)
                if z:
                    return [a] + x[y.index(z)]
        return []

def mainLoop():
    words = Words()
    COMMANDS = {
        'add-dictionary': (1, words.addDictionary),
        'connect': (2, words.connectWords),
        'exit': (0, lambda: sys.exit(0))
    }

    while 1:
        x = raw_input('? ').split()
        if len(x):
            if x[0].lower() in COMMANDS:
                if COMMANDS[x[0]][0] != len(x) - 1:
                    print 'Incorrect argument count'
                else:
                    apply(COMMANDS[x[0].lower()][1], tuple(x[1:]))
            else:
                print 'Invalid command'

if __name__ == '__main__':
    mainLoop()


--
kuran42? genius? Nary a difference betwixt the two. -- Defect

Two things (5.00 / 1) (#78)
by trhurler on Wed Apr 24, 2002 at 03:10:24 PM EST

First, what should the program output if you put in "connect weasel weasel" assuming "weasel" is in the dictionary? I can think of three options, but I'm not sure which you want.

Second, my entry is progressing nicely. At this point, other than the above question and some debugging, the only bit that isn't finished is the actual interesting case, which admittedly is the minority of the code for this thing written in C:) (I've been doing this in my spare moments.)

--
'God dammit, your posts make me hard.' --LilDebbie

Incredibly familiar (5.00 / 2) (#84)
by X3nocide on Wed Apr 24, 2002 at 11:32:39 PM EST

In my Discrete Math homework I just worked on, one of the concepts dealt with was the n dimensional cube and a perculiar definition of it as a graph where each node represents a n digit sequence and is connected to every node that differs by 1 digit. The intersting thing about it is that the shortest path between two nodes is guarenteed to be at most n. Of course this could be a problem if there aren't enough connections.

Just thought I'd throw out an interesting idea to try if getting the challenge done isn't hard enough, and to see what other people think about optimal representation.

pwnguin.net

Prove I can't code (5.00 / 2) (#86)
by alge on Thu Apr 25, 2002 at 12:13:49 AM EST

(can't code but trying to learn) (=
Hmm. My algos are pretty damned stupid and I don't know ruby too well..
but here you are (:

#!/usr/bin/ruby

class WordConnect

    def initialize
        @done = false
        @dict = Array.new
        until @done
            read_input
        end
    end

    def read_input
        puts "? "
        line = gets
        case line.split[0]
        when "add-dictionary"
            if add_dict( line.split[1] )
                puts "Added dictionary"
            else
                puts "Error adding " + line.split[1]
            end
        when "connect"
            puts connect( line.split[1], line.split[2] )
        when "exit"
            @done = true
        end
    end

    def add_dict( filename )
        begin
            File.open( filename ) { |file|
                file.each_line { |line|
                    @dict << line.split[0]
                }
            }
        rescue
            return false
        end
        return true
    end

    def connect( start, stop )
        @could_not_connect = "Could not connect " + start + " and " + stop + "."
        unless start.length == stop.length and @dict.include?(start) and @dict.include?(stop)
            return @could_not_connect
        end
       
        # Remove all elements from our dictionary witch don't match
        # the length of start (or stop, they're equally long).
        array = @dict.dup
        @dict.each { |element|
            unless element.length == start.length
                array = array - element.to_a
            end
        }

        @maxlevel = 1;
        until @maxlevel > 10
            if connect_recurse(array.dup, start, stop, 0)
                return start + " > " + @ret
            end
            @maxlevel = @maxlevel + 1
        end
        @could_not_connect
    end

    def match?( word1, word2 )
        diff = 0
        0.step(word1.length - 1, 1) { |index|
            unless word1[index] == word2[index]
                diff = diff.next
            end
        }
        if diff < 2 then true else false end
    end
   
    # recursive method.
    # first we remove prune from array,
    # then we iterate over array until we match stop
    def connect_recurse( array, prune, stop, level )
        @ret = ""
        level = level.next
        if level > @maxlevel then return false end
        array.delete(prune)
        array.each { |element|
            if match?(prune, stop)
                @ret = stop
                return true
            end
            if match?(element, prune)
                if connect_recurse(array, element, stop, level)
                    @ret = element + " > " + @ret
                    return true
                end
            end
        }
        return false
    end
end

WordConnect.new


vi er ikke lenger elsket her

my implementation in C (5.00 / 3) (#87)
by svillee on Thu Apr 25, 2002 at 12:22:23 AM EST

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INPUT_LENGTH 4096
#define MAX_VERB_LENGTH 32
#define MAX_WORD_LENGTH 1024

typedef struct Word_s {
  struct Word_s  *next;
  struct Word_s  *left;
  struct Word_s  *right;
  int            balance;
  struct Links_s *links;
  char            name[1];
} Word_t;

typedef struct Link_s {
  struct Link_s *nextSameSource;
  Word_t        *destination;
} Link_t;

typedef struct Links_s {
  struct Links_s *next;
  Word_t        *source;
  Link_t        *firstLink;
} Links_t;

typedef struct Partial_s {
  struct Partial_s *next;
  Word_t          *word;
} Partial_t;

typedef enum {
  CHAIN_IMPOSSIBLE,
  CHAIN_TOO_SHORT,
  CHAIN_FOUND
} ChainResult_t;

static int      length;
static Word_t  *target1;
static Links_t *firstLinks;
static Word_t  *firstByLength[MAX_WORD_LENGTH+1];
static Word_t  *topByLength[MAX_WORD_LENGTH+1];
static char    input[MAX_INPUT_LENGTH+1];
static char    verb[MAX_VERB_LENGTH+1];
static char    word1[MAX_WORD_LENGTH+1];
static char    word2[MAX_WORD_LENGTH+1];

static Word_t *findWord(char *name)
{
  Word_t *wp;
  int m;

  length = strlen(name);
  wp = topByLength[length];
  while (wp != (Word_t *) 0) {
    m = strcmp(name, wp->name);
    if (m == 0)
      break;
    if (m < 0) {
      wp = wp->left;
    } else {
      wp = wp->right;
    }
  }
  return wp;
}

void insertNode(
  Word_t  *newNode,
  Word_t **where,
  int    *outerGrew)
{
  Word_t *nodeA;
  Word_t *nodeB;
  Word_t *nodeC;
  int innerGrew;

  nodeA = *where;
  if (nodeA == (Word_t *) 0) {
    /*  The tree was empty, so make the new node the only one.  */

    newNode->left = (Word_t *) 0;
    newNode->right = (Word_t *) 0;
    newNode->balance = 0;
    *where = newNode;
    *outerGrew = 1;
  } else if (strcmp(newNode->name, nodeA->name) < 0) {
    /*  Add the new node to the left subtree.  */

    insertNode(newNode, &nodeA->left, &innerGrew);
    if (!innerGrew) {
      *outerGrew = 0;
    } else if (nodeA->balance == 1) {
      /*  The right subtree height was 1 more, and now they are the same.  */

      nodeA->balance = 0;
      *outerGrew = 0;
    } else if (nodeA->balance == 0) {
      /*  The subtree heights were the same, and now the left subtree  */
      /*  height is 1 more.                                            */

      nodeA->balance = -1;
      *outerGrew = 1;
    } else {
      /*  The left subtree height was 1 more, and now it is 2 more,  */
      /*  so we must rebalance.  Examine the left subtree.          */

      nodeB = nodeA->left;
      nodeC = nodeB->right;
      if (nodeB->balance == -1) {
        /*      A              B      */
        /*    / \            / \    */
        /*    B  Z  --->  W  A    */
        /*  / \                / \  */
        /*  W  C              C  Z  */

        nodeA->left = nodeC;
        nodeA->balance = 0;
        nodeB->right = nodeA;
        nodeB->balance = 0;
        *where = nodeB;
      } else {
        /*      A              C      */
        /*    / \            /  \    */
        /*    B  Z  --->  B    A    */
        /*  / \            / \  / \  */
        /*  W  C          W  X Y  Z  */
        /*    / \                      */
        /*    X  Y                    */

        nodeB->right = nodeC->left;
        if (nodeC->balance == 1) {
          nodeB->balance = -1;
        } else {
          nodeB->balance = 0;
        }
        nodeA->left = nodeC->right;
        if (nodeC->balance == -1) {
          nodeA->balance = 1;
        } else {
          nodeA->balance = 0;
        }
        nodeC->left = nodeB;
        nodeC->right = nodeA;
        nodeC->balance = 0;
        *where = nodeC;
      }
      *outerGrew = 0;
    }
  } else {
    /*  Add the new node to the right subtree.  */

    insertNode(newNode, &nodeA->right, &innerGrew);
    if (!innerGrew) {
      *outerGrew = 0;
    } else if (nodeA->balance == -1) {
      /*  The left subtree height was 1 more, and now they are the same.  */

      nodeA->balance = 0;
      *outerGrew = 0;
    } else if (nodeA->balance == 0) {
      /*  The subtree heights were the same, and now the right subtree  */
      /*  height is 1 more.                                            */

      nodeA->balance = 1;
      *outerGrew = 1;
    } else {
      /*  The right subtree height was 1 more, and now it is 2 more,  */
      /*  so we must rebalance.  Examine the right subtree.          */

      nodeB = nodeA->right;
      nodeC = nodeB->left;
      if (nodeB->balance == 1) {
        /*      A              B      */
        /*    / \            / \    */
        /*    W  B  --->  A  Z    */
        /*      / \        / \      */
        /*      C  Z      W  C      */

        nodeA->right = nodeC;
        nodeA->balance = 0;
        nodeB->left = nodeA;
        nodeB->balance = 0;
        *where = nodeB;
      } else {
        /*      A              C      */
        /*    / \            /  \    */
        /*    W  B  --->  A    B    */
        /*      / \        / \  / \  */
        /*      C  Z      W  X Y  Z  */
        /*    / \                      */
        /*    X  Y                    */

        nodeB->left = nodeC->right;
        if (nodeC->balance == -1) {
          nodeB->balance = 1;
        } else {
          nodeB->balance = 0;
        }
        nodeA->right = nodeC->left;
        if (nodeC->balance == 1) {
          nodeA->balance = -1;
        } else {
          nodeA->balance = 0;
        }
        nodeC->left = nodeA;
        nodeC->right = nodeB;
        nodeC->balance = 0;
        *where = nodeC;
      }
      *outerGrew = 0;
    }
  }
}

static void doAddDictionary(void)
{
  FILE *fp;
  int grew;
  Links_t *links1;
  Links_t *links2;
  Link_t *link1;
  Link_t *link2;
  Word_t *wp;

  if (firstLinks != (Links_t *) 0) {
    /*  Discard any links calculated so far.  */

    links1 = firstLinks;
    while (links1 != (Links_t *) 0) {
      links2 = links1->next;
      link1 = links1->firstLink;
      while (link1 != (Link_t *) 0) {
        link2 = link1->nextSameSource;
        free(link1);
        link1 = link2;
      }
      links1->source->links = (Links_t *) 0;
      free(links1);
      links1 = links2;
    }
    firstLinks = (Links_t *) 0;
  }
  fp = fopen(word1, "r");
  if (fp == (FILE *) 0) {
    fprintf(stderr, "Error adding %s\n", word1);
    return;
  }
  while (fscanf(fp, "%1024s", word2) > 0) {
    /*  See if the word is already in the dictionary.  */

    if (findWord(word2) != (Word_t *) 0)
      continue;

    /*  Create the new word.  */

    wp = (Word_t *) malloc(sizeof (Word_t) + length - 1);
    if (wp == (Word_t *) 0) {
      fprintf(stderr, "out of memory for word\n");
      exit(-1);
    }
    wp->next = firstByLength[length];
    wp->links = (Links_t *) 0;
    strcpy(wp->name, word2);
    firstByLength[length] = wp;

    /*  Add it to the balanced tree.  */

    insertNode(wp, &topByLength[length], &grew);
  }
  fclose(fp);
  printf("Added %s\n", word1);
}

static void findLinks(Word_t *wp)
{
  char *sp;
  char *dp;
  Word_t *wp2;
  Link_t *link;
  Links_t *links;

  links = (Links_t *) malloc(sizeof (Links_t));
  if (links == (Links_t *) 0) {
    fprintf(stderr, "out of memory for links\n");
    exit(-1);
  }
  links->next = firstLinks;
  links->source = wp;
  links->firstLink = (Link_t *) 0;
  firstLinks = links;
  wp->links = links;

  /*  Look for any existing words that differ by one letter.  */

  wp2 = firstByLength[length];
  while (wp2 != (Word_t *) 0) {
    if (wp2 != wp) {
      sp = wp->name;
      dp = wp2->name;
      while (*sp == *dp) {
        ++sp;
        ++dp;
      }
      if (strcmp(sp + 1, dp + 1) == 0) {
        /*  They differ by one letter, so create a link.  */

        link = (Link_t *) malloc(sizeof (Link_t));
        if (link == (Link_t *) 0) {
          fprintf(stderr, "out of memory for link\n");
          exit(-1);
        }
        link->nextSameSource = links->firstLink;
        link->destination = wp2;
        links->firstLink = link;
      }
    }
    wp2 = wp2->next;
  }
}

static ChainResult_t chain(Partial_t *partial, int chainLength)
{
  int nextLength;
  ChainResult_t innerResult;
  ChainResult_t outerResult;
  Link_t *lp;
  Partial_t *ap;
  Partial_t nextPartial;

  nextLength = chainLength - 1;
  if (nextLength == 0) {
    lp = partial->word->links->firstLink;
    while (lp != (Link_t *) 0) {
      if (lp->destination == target1) {
        printf("%s", target1->name);
        ap = partial;
        while (ap != (Partial_t *) 0) {
          printf(" > %s", ap->word->name);
          ap = ap->next;
        }
        putchar('\n');
        return CHAIN_FOUND;
      }
      lp = lp->nextSameSource;
    }
    return CHAIN_TOO_SHORT;
  }
  outerResult = CHAIN_IMPOSSIBLE;
  nextPartial.next = partial;
  lp = partial->word->links->firstLink;
  while (lp != (Link_t *) 0) {
    nextPartial.word = lp->destination;
    if (nextPartial.word->links == (Links_t *) 0)
      findLinks(nextPartial.word);
    ap = partial;
    while (ap != (Partial_t *) 0) {
      if (ap->word == nextPartial.word)
        break;
      ap = ap->next;
    }
    if (ap == (Partial_t *) 0) {
      innerResult = chain(&nextPartial, nextLength);
      if (innerResult == CHAIN_FOUND)
        return CHAIN_FOUND;
      if (innerResult == CHAIN_TOO_SHORT)
        outerResult = CHAIN_TOO_SHORT;
    }
    lp = lp->nextSameSource;
  }
  return outerResult;
}

static void doConnect(void)
{
  int length1;
  int chainLength;
  ChainResult_t chainResult;
  Word_t *target2;
  Partial_t lastPartial;

  target1 = findWord(word1);
  length1 = length;
  target2 = findWord(word2);
  if ((target1 != (Word_t *) 0) &&
      (target2 != (Word_t *) 0) &&
      (length1 == length))
  {
    if (target1 == target2) {
      printf("%s\n", word1);
      return;
    }
    if (target2->links == (Links_t *) 0)
      findLinks(target2);
    if (target2->links->firstLink != (Link_t *) 0) {
      lastPartial.next = (Partial_t *) 0;
      lastPartial.word = target2;
      chainLength = 1;
      while (1) {
        chainResult = chain(&lastPartial, chainLength);
        if (chainResult == CHAIN_FOUND)
          return;
        if (chainResult == CHAIN_IMPOSSIBLE)
          break;
        ++chainLength;
      }
    }
  }
  printf("No way to connect %s to %s.\n", word1, word2);
}

int main(int argc, char **argv)
{
  int i;

  while (1) {
    printf("? ");
    input[0] = '\0';
    fgets(input, MAX_INPUT_LENGTH+1, stdin);
    verb[0] = '\0';
    word1[0] = '\0';
    word2[0] = '\0';
    sscanf(input, "%32s %1024s %1024s", verb, word1, word2);
    if (strcmp(verb, "add-dictionary") == 0) {
      doAddDictionary();
    } else if (strcmp(verb, "connect") == 0) {
      doConnect();
    } else if (strcmp(verb, "exit") == 0) {
      break;
    }
  }
  return 0;
}


My Slow Haskell Implementation (5.00 / 1) (#93)
by Logan on Thu Apr 25, 2002 at 03:40:19 AM EST

Here is my slow Haskell implementation. I believe it is somewhat slower than tmoertel's implementation. To compile, run
ghc -package data -O2 -o pfc4 pfc4.hs
My code could use a little commenting, and perhaps I will explain my method in a followup (although it is somewhat straight-forward).

module Main where

import FiniteMap
import IO
import List
import Maybe

type Dictionary            = FiniteMap String [(String, [String])]
type UberDictionary        = FiniteMap Int Dictionary

main                       :: IO ()
main                       = do parser (emptyFM)
                                return ()

prompt                     :: IO String
prompt                     = do putStr "? "
                                hFlush stdout
                                getLine

parser                     :: UberDictionary -> IO UberDictionary
parser s                   = do line <- prompt
                                runcom s line

runcom                     :: UberDictionary -> String -> IO UberDictionary
runcom udict line          | com == "add-dictionary" = do { udict' <- addDictionary udict args; parser udict' }
                           | com == "connect"        = do { udict' <- connect udict args; parser udict' }
                           | com == "exit"           = return udict
                           | otherwise               = do { putStrLn "Huh?"; parser udict }
                             where (com:args) = words line

addDictionary              :: UberDictionary -> [String] -> IO UberDictionary
addDictionary udict [file] = do h <- openFile file ReadMode
                                text <- hGetContents h
                                return (loadDict $ lines text)
addDictionary s _          = do { putStrLn "Huh?"; return s}

loadDict                   :: [String] -> UberDictionary
loadDict content           = foldl addWord (emptyFM) content

addWord                    :: UberDictionary -> String -> UberDictionary
addWord udict word         | elemFM len udict = addToFM udict len dict'
                           | otherwise        = addToFM udict len (addToFM (emptyFM) word [])
                             where len = length word
                                   (Just dict) = lookupFM udict len
                                   dict' = addToFM dict word []

connect                    :: UberDictionary -> [String] -> IO UberDictionary
connect udict [w1, w2]     | length w1 /= length w2 = failMessage
                           | not (elemFM w1 dict && elemFM w2 dict) = failMessage
                           | maybeCachedAnswer /= Nothing = do { putStrLn (output cachedAnswer); return udict }
                           | null answers = failMessage
                           | otherwise = do { putStrLn (output $ head answers); return udict' }
                             where failMessage = do { putStrLn ("No way to connect " ++ w1 ++ " to " ++ w2); return udict }
                                   dict = lookupWithDefaultFM udict (emptyFM) (length w1)
                                   (Just cache) = lookupFM dict w1
                                   maybeCachedAnswer = lookup w2 cache
                                   (Just cachedAnswer) = maybeCachedAnswer
                                   answers = findAnswer dict w2 [[w1]]
                                   output = concat . (intersperse " > ") . reverse
                                   udict' = addToFM udict (length w1) (addToFM dict w1 ((w2, (head answers)) : cache))
connect udict _            = do { putStrLn "Huh?"; return udict }

findAnswer                 :: Dictionary -> String -> [[String]] -> [[String]]
findAnswer _ _ []          = []
findAnswer dict word paths | (not . null) answers = answers
                           | otherwise = findAnswer dict' word newPaths
                             where frontier = map head paths
                                   dict' = delListFromFM dict frontier
                                   answers = filter (word `elem`) paths
                                   newPaths = concat $ map (getNeighbors dict) paths

getNeighbors               :: Dictionary -> [String] -> [[String]]
getNeighbors dict path     = map (:path) $ filter ((flip elemFM) dict) xforms
                             where (word:_) = path
                                   len = length word - 1
                                   xforms = [ setPos word x l | x <- [0..len], l <- ['a'..'z'] ]

setPos                     :: String -> Int -> Char -> String
setPos word x l            = prefix ++ (l : suffix)
                             where (prefix, (_:suffix)) = splitAt x word



A small test case. (5.00 / 1) (#95)
by i on Thu Apr 25, 2002 at 04:03:29 AM EST

add-dictionary WORD.LST
connect make love
connect bread wheat
connect resist futile
connect effaces cabaret

WORD.LST is linked somewhere in this diary.

and we have a contradicton according to our assumptions and the factor theorem

Enough is enough. (5.00 / 1) (#106)
by i on Thu Apr 25, 2002 at 01:41:58 PM EST

Here's the final version. It's still slower than tmoertel's, and has a stack-leak somewhere, but at least it runs my testcase in under 1 minute. (Changing FiniteMap to StrictFiniteMap fixes the space leak but makes it even slower; the inverse operation on tmoertel's entry both introduces a space leak and makes it slower. Strictness is so much fun.) So I'm leaving it as is. Enjoy!

--------------------- ladder.hs -------------

-- PFC4 entry: username = "i"
-- compilation (on Un*x): ghc -O2 ladder.hs -o ladder -package data
-- You probably need at least GHC5.0x
-- run: ladder [+RTS <runtime system switches>]
-- runtime switches help: ladder +RTS -\?
-- Fiddling with them may improve performance, but only marginally
--
-- with Hugs: hugs ladder.hs
-- with GHCi: ghci ladder.hs -package data

module Main where

import StrictFiniteMap (FiniteMap, addToFM, addToFM_C,
                                  emptyFM, lookupFM, plusFM_C,
                                  lookupWithDefaultFM)
import Maybe (catMaybes)
import IO
import qualified Set as S
import qualified SimpleQueue as Q

-- algorithms + data structures

splits :: String -> [String]
splits [] = []
splits (x:xs) = ('?':xs) : map (x:) (splits xs)

type LDict = FiniteMap String [String]
type Dict =  FiniteMap Int LDict

-- our dictionary maps word length ro the real dictionary
-- the real dictionary maps a splitword to a list of words
-- like this: "c?t" ==> ["cat", "cut"]

addWord :: Dict -> String -> Dict
addWord dict word = addToFM_C combiner dict (length word) unitDict where
  combiner :: LDict -> LDict -> LDict
  combiner = plusFM_C (++)
  unitDict = foldl' addWord' emptyFM (splits word)
  addWord' dict splitWord = addToFM dict splitWord [word]

-- we cannot find directly whether a word is in the dictionary!
-- so we have to look up its adjacents and find if it's
-- adjacent to itself! This is completely unnesseccary,
-- but challenge rules say so, amd we must comply.

-- Pattern quards would be handy here

findPath :: String -> String -> Dict -> [String]
findPath from to dict
  | len /= len' = []
  | not $ elem from (adjacents ldict from) = []
  | otherwise = findPath' from to ldict
  where
      ldict = lookupWithDefaultFM dict emptyFM len
      findPath' :: String -> String -> LDict -> [String]
      findPath' from to dict = findChain (adjacents dict) from to
      len = length from
      len' = length to

-- A list of words reachable from a given word in one step

adjacents :: LDict -> String -> [String]
adjacents dict word = maybes $ map (lookupFM dict) (splits word) where
  maybes = concat . catMaybes

-- handle user interaction

addDict :: FilePath -> Dict -> IO Dict
addDict path dict = catch addDict' report where
  addDict' = do
      handle <- openFile path ReadMode
      contents <- hGetContents handle
      let dict' = foldl' addWord dict (words contents)
      putStrLn $ "Added " ++ path
      return dict'
  report _ = putStrLn ("Error adding " ++ path) >> return dict

connect :: Dict -> String -> String -> IO ()
connect dict word1 word2 = do
  let path = findPath word1 word2 dict
  case path of
      [] -> putStrLn $ "No way to connect " ++ word1 ++ " to " ++ word2
      xs  -> putStrLn $ showPath xs where
          showPath = foldr1 (\a b -> a ++ " > " ++ b)

doCommands :: Dict -> IO Dict
doCommands dict = do
  putStr "? "
  hFlush stdout
  command <- getLine
  case (words command) of
      ['e':_] -> return dict
      ['a':_, path] -> addDict path dict >>= doCommands
      ['c':_, word1, word2] -> connect dict word1 word2 >> doCommands dict
      _ -> doCommands dict

main :: IO ()
main = catch (doCommands emptyFM >> return ()) (\_->return ())

-- This section implements the standard BFS algorithm

findChain :: Ord a => (a->[a]) -> a -> a -> [a]
findChain f from to = reverse $ first chain where
  chain = dropWhile notFound $ bfsPaths $ toPTree f from
  first (x:_) = x
  first [] = []
  notFound [] = True
  notFound xs = to /= head xs

-- Parented tree, so we can walk from a node back to root

data PTree a = PNil | PTree (PTree a) a [PTree a]

-- Functional tree to parented tree
toPTree :: (a->[a]) -> a -> PTree a
toPTree f a = toPTree' f PNil a where
  toPTree' f p a = tree where
      tree = PTree p a trees
      trees = map (toPTree' f tree) (f a)


-- The BFS itself
bfsPaths :: Ord a => PTree a -> [[a]]
bfsPaths PNil = []
bfsPaths tree@(PTree parent node branches) = map walkPath paths where
  (paths,_) = bfs (Q.single tree) (S.emptySet)
  bfs :: Ord a => Q.Seq (PTree a) -> S.Set a -> ([PTree a], S.Set a)
  bfs queue set =
      case Q.lview queue of
          Q.Just2 (tree@(PTree p n bs)) queue' ->
              if S.elementOf n set
                  then bfs queue' set
                  else (tree : trees, set'') where
                      (trees, set'') = bfs (foldl' Q.snoc queue' bs) set'
                      set' = S.addToSet set n
          Q.Just2 (PNil) queue' -> bfs queue' set
          Q.Nothing2 -> ([], set)

  walkPath :: PTree a -> [a]
  walkPath PNil = []
  walkPath (PTree p a _) = a:walkPath p

-- foldl' (why oh why isn't it standard?)
foldl'                    :: (a -> b -> a) -> a -> [b] -> a
foldl' f a []        = a
foldl' f a (x:xs) = (foldl' f $! f a x) xs


and we have a contradicton according to our assumptions and the factor theorem

faster but still slower than tmoertel's (5.00 / 1) (#110)
by svillee on Thu Apr 25, 2002 at 10:48:12 PM EST

/*  benchmark input:  */

/*  add-dictionary word.lst  */
/*  connect make love        */
/*  connect bread wheat      */
/*  connect resist futile    */
/*  connect effaces cabaret  */
/*  exit                     */

/*  time on a Celeron 700 MHz:  */

/*  67.72user 0.05system 1:09.80elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k  */
/*  0inputs+0outputs (111major+2280minor)pagefaults 0swaps                        */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INPUT_LENGTH 4096
#define MAX_VERB_LENGTH 32
#define MAX_WORD_LENGTH 1024

typedef struct Word_s {
  struct Word_s  *next;
  struct Word_s  *left;
  struct Word_s  *right;
  int             balance;
  struct Links_s *links;
  char            name[1];
} Word_t;

typedef struct Link_s {
  struct Link_s *nextSameSource;
  Word_t        *destination;

  /*  The next two fields are meaningful only for the current connect.  */

  struct Link_s *nextInOrbit;
  struct Link_s *sourceLink;
} Link_t;

typedef struct Links_s {
  struct Links_s *next;
  Word_t         *source;
  Link_t         *firstLink;
} Links_t;

static int      length;
static Links_t *firstLinks;
static Word_t  *firstByLength[MAX_WORD_LENGTH+1];
static Word_t  *topByLength[MAX_WORD_LENGTH+1];
static char     input[MAX_INPUT_LENGTH+1];
static char     verb[MAX_VERB_LENGTH+1];
static char     word1[MAX_WORD_LENGTH+1];
static char     word2[MAX_WORD_LENGTH+1];

static Word_t *findWord(char *name)
{
  Word_t *wp;
  int m;

  length = strlen(name);
  wp = topByLength[length];
  while (wp != (Word_t *) 0) {
    m = strcmp(name, wp->name);
    if (m == 0)
      break;
    if (m < 0) {
      wp = wp->left;
    } else {
      wp = wp->right;
    }
  }
  return wp;
}

void insertNode(
  Word_t  *newNode,
  Word_t **where,
  int     *outerGrew)
{
  Word_t *nodeA;
  Word_t *nodeB;
  Word_t *nodeC;
  int innerGrew;

  nodeA = *where;
  if (nodeA == (Word_t *) 0) {
    /*  The tree was empty, so make the new node the only one.  */

    newNode->left = (Word_t *) 0;
    newNode->right = (Word_t *) 0;
    newNode->balance = 0;
    *where = newNode;
    *outerGrew = 1;
  } else if (strcmp(newNode->name, nodeA->name) < 0) {
    /*  Add the new node to the left subtree.  */

    insertNode(newNode, &nodeA->left, &innerGrew);
    if (!innerGrew) {
      *outerGrew = 0;
    } else if (nodeA->balance == 1) {
      /*  The right subtree height was 1 more, and now they are the same.  */

      nodeA->balance = 0;
      *outerGrew = 0;
    } else if (nodeA->balance == 0) {
      /*  The subtree heights were the same, and now the left subtree  */
      /*  height is 1 more.                                            */

      nodeA->balance = -1;
      *outerGrew = 1;
    } else {
      /*  The left subtree height was 1 more, and now it is 2 more,  */
      /*  so we must rebalance.  Examine the left subtree.           */

      nodeB = nodeA->left;
      nodeC = nodeB->right;
      if (nodeB->balance == -1) {
        /*      A              B      */
        /*     / \            / \     */
        /*    B   Z   --->   W   A    */
        /*   / \                / \   */
        /*  W   C              C   Z  */

        nodeA->left = nodeC;
        nodeA->balance = 0;
        nodeB->right = nodeA;
        nodeB->balance = 0;
        *where = nodeB;
      } else {
        /*      A               C       */
        /*     / \            /   \     */
        /*    B   Z   --->   B     A    */
        /*   / \            / \   / \   */
        /*  W   C          W   X Y   Z  */
        /*     / \                      */
        /*    X   Y                     */

        nodeB->right = nodeC->left;
        if (nodeC->balance == 1) {
          nodeB->balance = -1;
        } else {
          nodeB->balance = 0;
        }
        nodeA->left = nodeC->right;
        if (nodeC->balance == -1) {
          nodeA->balance = 1;
        } else {
          nodeA->balance = 0;
        }
        nodeC->left = nodeB;
        nodeC->right = nodeA;
        nodeC->balance = 0;
        *where = nodeC;
      }
      *outerGrew = 0;
    }
  } else {
    /*  Add the new node to the right subtree.  */

    insertNode(newNode, &nodeA->right, &innerGrew);
    if (!innerGrew) {
      *outerGrew = 0;
    } else if (nodeA->balance == -1) {
      /*  The left subtree height was 1 more, and now they are the same.  */

      nodeA->balance = 0;
      *outerGrew = 0;
    } else if (nodeA->balance == 0) {
      /*  The subtree heights were the same, and now the right subtree  */
      /*  height is 1 more.                                             */

      nodeA->balance = 1;
      *outerGrew = 1;
    } else {
      /*  The right subtree height was 1 more, and now it is 2 more,  */
      /*  so we must rebalance.  Examine the right subtree.           */

      nodeB = nodeA->right;
      nodeC = nodeB->left;
      if (nodeB->balance == 1) {
        /*      A              B      */
        /*     / \            / \     */
        /*    W   B   --->   A   Z    */
        /*       / \        / \       */
        /*      C   Z      W   C      */

        nodeA->right = nodeC;
        nodeA->balance = 0;
        nodeB->left = nodeA;
        nodeB->balance = 0;
        *where = nodeB;
      } else {
        /*      A               C       */
        /*     / \            /   \     */
        /*    W   B   --->   A     B    */
        /*       / \        / \   / \   */
        /*      C   Z      W   X Y   Z  */
        /*     / \                      */
        /*    X   Y                     */

        nodeB->left = nodeC->right;
        if (nodeC->balance == -1) {
          nodeB->balance = 1;
        } else {
          nodeB->balance = 0;
        }
        nodeA->right = nodeC->left;
        if (nodeC->balance == 1) {
          nodeA->balance = -1;
        } else {
          nodeA->balance = 0;
        }
        nodeC->left = nodeA;
        nodeC->right = nodeB;
        nodeC->balance = 0;
        *where = nodeC;
      }
      *outerGrew = 0;
    }
  }
}

static void doAddDictionary(void)
{
  FILE *fp;
  int grew;
  Links_t *links1;
  Links_t *links2;
  Link_t *link1;
  Link_t *link2;
  Word_t *wp;

  if (firstLinks != (Links_t *) 0) {
    /*  Discard any links calculated so far.  */

    links1 = firstLinks;
    while (links1 != (Links_t *) 0) {
      links2 = links1->next;
      link1 = links1->firstLink;
      while (link1 != (Link_t *) 0) {
        link2 = link1->nextSameSource;
        free(link1);
        link1 = link2;
      }
      links1->source->links = (Links_t *) 0;
      free(links1);
      links1 = links2;
    }
    firstLinks = (Links_t *) 0;
  }
  fp = fopen(word1, "r");
  if (fp == (FILE *) 0) {
    fprintf(stderr, "Error adding %s\n", word1);
    return;
  }
  while (fscanf(fp, "%1024s", word2) > 0) {
    /*  See if the word is already in the dictionary.  */

    if (findWord(word2) != (Word_t *) 0)
      continue;

    /*  Create the new word.  */

    wp = (Word_t *) malloc(sizeof (Word_t) + length);
    if (wp == (Word_t *) 0) {
      fprintf(stderr, "out of memory for word\n");
      exit(-1);
    }
    wp->next = firstByLength[length];
    wp->links = (Links_t *) 0;
    strcpy(wp->name, word2);
    firstByLength[length] = wp;

    /*  Add it to the balanced tree.  */

    insertNode(wp, &topByLength[length], &grew);
  }
  fclose(fp);
  printf("Added %s\n", word1);
}

static void findLinks(Word_t *wp)
{
  char *sp;
  char *dp;
  Word_t *wp2;
  Link_t *link;
  Links_t *links;

  links = (Links_t *) malloc(sizeof (Links_t));
  if (links == (Links_t *) 0) {
    fprintf(stderr, "out of memory for links\n");
    exit(-1);
  }
  links->next = firstLinks;
  links->source = wp;
  links->firstLink = (Link_t *) 0;
  firstLinks = links;
  wp->links = links;

  /*  Look for any existing words that differ by one letter.  */

  wp2 = firstByLength[length];
  while (wp2 != (Word_t *) 0) {
    if (wp2 != wp) {
      sp = wp->name;
      dp = wp2->name;
      while (*sp == *dp) {
        ++sp;
        ++dp;
      }
      if (strcmp(sp + 1, dp + 1) == 0) {
        /*  They differ by one letter, so create a link.  */

        link = (Link_t *) malloc(sizeof (Link_t));
        if (link == (Link_t *) 0) {
          fprintf(stderr, "out of memory for link\n");
          exit(-1);
        }
        link->nextSameSource = links->firstLink;
        link->destination = wp2;
        link->nextInOrbit = (Link_t *) 0;
        link->sourceLink = (Link_t *) 0;
        links->firstLink = link;
      }
    }
    wp2 = wp2->next;
  }
}

static void doConnect(void)
{
  int length1;
  Word_t *target1;
  Word_t *target2;
  Word_t *wp;
  Links_t *links;
  Link_t *link;
  Link_t *link2;
  Link_t *firstInOrbit;
  Link_t endMark;

  target1 = findWord(word1);
  length1 = length;
  target2 = findWord(word2);
  if ((target1 != (Word_t *) 0) &&
      (target2 != (Word_t *) 0) &&
      (length1 == length))
  {
    if (target1 == target2) {
      printf("%s\n", word1);
      return;
    }

    /*  Reset any existing links.  */

    links = firstLinks;
    while (links != (Links_t *) 0) {
      link = links->firstLink;
      while (link != (Link_t *) 0) {
        link->nextInOrbit = (Link_t *) 0;
        link->sourceLink = (Link_t *) 0;
        link = link->nextSameSource;
      }
      links = links->next;
    }

    /*  Set up a dummy link as the chain end mark.  */

    endMark.nextSameSource = (Link_t *) 0;
    endMark.destination = target2;
    endMark.nextInOrbit = (Link_t *) 0;
    endMark.sourceLink = (Link_t *) 0;

    /*  Build successive orbits from target2 until target1  */
    /*  is reached or the outermost orbit is empty.         */

    firstInOrbit = &endMark;
    while (firstInOrbit != (Link_t *) 0) {
      link = firstInOrbit;
      firstInOrbit = (Link_t *) 0;
      while (link != (Link_t *) 0) {
        wp = link->destination;
        if (wp->links == (Links_t *) 0)
          findLinks(wp);
        link2 = wp->links->firstLink;
        while (link2 != (Link_t *) 0) {
          if (link2->sourceLink == (Link_t *) 0) {
            if (link2->destination == target1) {
              printf("%s", target1->name);
              link2 = link;
              while (link2 != (Link_t *) 0) {
                printf(" > %s", link2->destination->name);
                link2 = link2->sourceLink;
              }
              putchar('\n');
              return;
            }
            link2->sourceLink = link;
            link2->nextInOrbit = firstInOrbit;
            firstInOrbit = link2;
          }
          link2 = link2->nextSameSource;
        }
        link = link->nextInOrbit;
      }
    }
  }
  printf("No way to connect %s to %s.\n", word1, word2);
}

int main(int argc, char **argv)
{
  int i;

  while (1) {
    printf("? ");
    input[0] = '\0';
    fgets(input, MAX_INPUT_LENGTH+1, stdin);
    verb[0] = '\0';
    word1[0] = '\0';
    word2[0] = '\0';
    sscanf(input, "%32s %1024s %1024s", verb, word1, word2);
    if (strcmp(verb, "add-dictionary") == 0) {
      doAddDictionary();
    } else if (strcmp(verb, "connect") == 0) {
      doConnect();
    } else if (strcmp(verb, "exit") == 0) {
      break;
    }
  }
  return 0;
}


version 3, finally faster than tmoertel's!!! (5.00 / 1) (#111)
by svillee on Fri Apr 26, 2002 at 12:51:46 AM EST

/*  benchmark input:  */

/*  add-dictionary word.lst  */
/*  connect make love        */
/*  connect bread wheat      */
/*  connect resist futile    */
/*  connect effaces cabaret  */
/*  exit                     */

/*  time on a Celeron 700 MHz:  */

/*  3.51user 0.05system 0:03.69elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k  */
/*  0inputs+0outputs (111major+2098minor)pagefaults 0swaps                       */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INPUT_LENGTH 4096
#define MAX_VERB_LENGTH 32
#define MAX_WORD_LENGTH 1024

typedef struct Word_s {
  struct Word_s  *left;
  struct Word_s  *right;
  int             balance;
  struct Links_s *links;
  char            name[1];
} Word_t;

typedef struct Link_s {
  struct Link_s *nextSameSource;
  Word_t        *destination;

  /*  The next two fields are meaningful only for the current connect.  */

  struct Link_s *nextInOrbit;
  struct Link_s *sourceLink;
} Link_t;

typedef struct Links_s {
  struct Links_s *next;
  Word_t         *source;
  Link_t         *firstLink;
} Links_t;

static int      length;
static Links_t *firstLinks;
static Word_t  *topByLength[MAX_WORD_LENGTH+1];
static char     input[MAX_INPUT_LENGTH+1];
static char     verb[MAX_VERB_LENGTH+1];
static char     word1[MAX_WORD_LENGTH+1];
static char     word2[MAX_WORD_LENGTH+1];
static char     word3[MAX_WORD_LENGTH+1];

static Word_t *findWord(char *name)
{
  Word_t *wp;
  int m;

  length = strlen(name);
  wp = topByLength[length];
  while (wp != (Word_t *) 0) {
    m = strcmp(name, wp->name);
    if (m == 0)
      break;
    if (m < 0) {
      wp = wp->left;
    } else {
      wp = wp->right;
    }
  }
  return wp;
}

void insertNode(
  Word_t  *newNode,
  Word_t **where,
  int     *outerGrew)
{
  Word_t *nodeA;
  Word_t *nodeB;
  Word_t *nodeC;
  int innerGrew;

  nodeA = *where;
  if (nodeA == (Word_t *) 0) {
    /*  The tree was empty, so make the new node the only one.  */

    newNode->left = (Word_t *) 0;
    newNode->right = (Word_t *) 0;
    newNode->balance = 0;
    *where = newNode;
    *outerGrew = 1;
  } else if (strcmp(newNode->name, nodeA->name) < 0) {
    /*  Add the new node to the left subtree.  */

    insertNode(newNode, &nodeA->left, &innerGrew);
    if (!innerGrew) {
      *outerGrew = 0;
    } else if (nodeA->balance == 1) {
      /*  The right subtree height was 1 more, and now they are the same.  */

      nodeA->balance = 0;
      *outerGrew = 0;
    } else if (nodeA->balance == 0) {
      /*  The subtree heights were the same, and now the left subtree  */
      /*  height is 1 more.                                            */

      nodeA->balance = -1;
      *outerGrew = 1;
    } else {
      /*  The left subtree height was 1 more, and now it is 2 more,  */
      /*  so we must rebalance.  Examine the left subtree.           */

      nodeB = nodeA->left;
      nodeC = nodeB->right;
      if (nodeB->balance == -1) {
        /*      A              B      */
        /*     / \            / \     */
        /*    B   Z   --->   W   A    */
        /*   / \                / \   */
        /*  W   C              C   Z  */

        nodeA->left = nodeC;
        nodeA->balance = 0;
        nodeB->right = nodeA;
        nodeB->balance = 0;
        *where = nodeB;
      } else {
        /*      A               C       */
        /*     / \            /   \     */
        /*    B   Z   --->   B     A    */
        /*   / \            / \   / \   */
        /*  W   C          W   X Y   Z  */
        /*     / \                      */
        /*    X   Y                     */

        nodeB->right = nodeC->left;
        if (nodeC->balance == 1) {
          nodeB->balance = -1;
        } else {
          nodeB->balance = 0;
        }
        nodeA->left = nodeC->right;
        if (nodeC->balance == -1) {
          nodeA->balance = 1;
        } else {
          nodeA->balance = 0;
        }
        nodeC->left = nodeB;
        nodeC->right = nodeA;
        nodeC->balance = 0;
        *where = nodeC;
      }
      *outerGrew = 0;
    }
  } else {
    /*  Add the new node to the right subtree.  */

    insertNode(newNode, &nodeA->right, &innerGrew);
    if (!innerGrew) {
      *outerGrew = 0;
    } else if (nodeA->balance == -1) {
      /*  The left subtree height was 1 more, and now they are the same.  */

      nodeA->balance = 0;
      *outerGrew = 0;
    } else if (nodeA->balance == 0) {
      /*  The subtree heights were the same, and now the right subtree  */
      /*  height is 1 more.                                             */

      nodeA->balance = 1;
      *outerGrew = 1;
    } else {
      /*  The right subtree height was 1 more, and now it is 2 more,  */
      /*  so we must rebalance.  Examine the right subtree.           */

      nodeB = nodeA->right;
      nodeC = nodeB->left;
      if (nodeB->balance == 1) {
        /*      A              B      */
        /*     / \            / \     */
        /*    W   B   --->   A   Z    */
        /*       / \        / \       */
        /*      C   Z      W   C      */

        nodeA->right = nodeC;
        nodeA->balance = 0;
        nodeB->left = nodeA;
        nodeB->balance = 0;
        *where = nodeB;
      } else {
        /*      A               C       */
        /*     / \            /   \     */
        /*    W   B   --->   A     B    */
        /*       / \        / \   / \   */
        /*      C   Z      W   X Y   Z  */
        /*     / \                      */
        /*    X   Y                     */

        nodeB->left = nodeC->right;
        if (nodeC->balance == -1) {
          nodeB->balance = 1;
        } else {
          nodeB->balance = 0;
        }
        nodeA->right = nodeC->left;
        if (nodeC->balance == 1) {
          nodeA->balance = -1;
        } else {
          nodeA->balance = 0;
        }
        nodeC->left = nodeA;
        nodeC->right = nodeB;
        nodeC->balance = 0;
        *where = nodeC;
      }
      *outerGrew = 0;
    }
  }
}

static void doAddDictionary(void)
{
  FILE *fp;
  int grew;
  Links_t *links1;
  Links_t *links2;
  Link_t *link1;
  Link_t *link2;
  Word_t *wp;

  if (firstLinks != (Links_t *) 0) {
    /*  Discard any links calculated so far.  */

    links1 = firstLinks;
    while (links1 != (Links_t *) 0) {
      links2 = links1->next;
      link1 = links1->firstLink;
      while (link1 != (Link_t *) 0) {
        link2 = link1->nextSameSource;
        free(link1);
        link1 = link2;
      }
      links1->source->links = (Links_t *) 0;
      free(links1);
      links1 = links2;
    }
    firstLinks = (Links_t *) 0;
  }
  fp = fopen(word1, "r");
  if (fp == (FILE *) 0) {
    fprintf(stderr, "Error adding %s\n", word1);
    return;
  }
  while (fscanf(fp, "%1024s", word2) > 0) {
    /*  See if the word is already in the dictionary.  */

    if (findWord(word2) != (Word_t *) 0)
      continue;

    /*  Create the new word.  */

    wp = (Word_t *) malloc(sizeof (Word_t) + length);
    if (wp == (Word_t *) 0) {
      fprintf(stderr, "out of memory for word\n");
      exit(-1);
    }
    wp->links = (Links_t *) 0;
    strcpy(wp->name, word2);

    /*  Add it to the balanced tree.  */

    insertNode(wp, &topByLength[length], &grew);
  }
  fclose(fp);
  printf("Added %s\n", word1);
}

static void findLinks(Word_t *wp)
{
  int i;
  int m;
  char j;
  char k;
  char *cp;
  Word_t *wp2;
  Link_t *link;
  Links_t *links;

  links = (Links_t *) malloc(sizeof (Links_t));
  if (links == (Links_t *) 0) {
    fprintf(stderr, "out of memory for links\n");
    exit(-1);
  }
  links->next = firstLinks;
  links->source = wp;
  links->firstLink = (Link_t *) 0;
  firstLinks = links;
  wp->links = links;

  /*  Look for any existing words that differ by one letter.  */

  for (i = 0; i < length; ++i) {
    strcpy(word3, wp->name);
    cp = &word3[i];
    k = *cp;
    for (j = 'a'; j <= 'z'; ++j)
      if (j != k) {
        *cp = j;
        wp2 = topByLength[length];
        while (wp2 != (Word_t *) 0) {
          m = strcmp(word3, wp2->name);
          if (m == 0)
            break;
          if (m < 0) {
            wp2 = wp2->left;
          } else {
            wp2 = wp2->right;
          }
        }
        if (wp2 != (Word_t *) 0) {
          /*  They differ by one letter, so create a link.  */

          link = (Link_t *) malloc(sizeof (Link_t));
          if (link == (Link_t *) 0) {
            fprintf(stderr, "out of memory for link\n");
            exit(-1);
          }
          link->nextSameSource = links->firstLink;
          link->destination = wp2;
          link->nextInOrbit = (Link_t *) 0;
          link->sourceLink = (Link_t *) 0;
          links->firstLink = link;
        }
      }
  }
}

static void doConnect(void)
{
  int length1;
  Word_t *target1;
  Word_t *target2;
  Word_t *wp;
  Links_t *links;
  Link_t *link;
  Link_t *link2;
  Link_t *firstInOrbit;
  Link_t endMark;

  target1 = findWord(word1);
  length1 = length;
  target2 = findWord(word2);
  if ((target1 != (Word_t *) 0) &&
      (target2 != (Word_t *) 0) &&
      (length1 == length))
  {
    if (target1 == target2) {
      printf("%s\n", word1);
      return;
    }

    /*  Reset any existing links.  */

    links = firstLinks;
    while (links != (Links_t *) 0) {
      link = links->firstLink;
      while (link != (Link_t *) 0) {
        link->nextInOrbit = (Link_t *) 0;
        link->sourceLink = (Link_t *) 0;
        link = link->nextSameSource;
      }
      links = links->next;
    }

    /*  Set up a dummy link as the chain end mark.  */

    endMark.nextSameSource = (Link_t *) 0;
    endMark.destination = target2;
    endMark.nextInOrbit = (Link_t *) 0;
    endMark.sourceLink = (Link_t *) 0;

    /*  Build successive orbits from target2 until target1  */
    /*  is reached or the outermost orbit is empty.         */

    firstInOrbit = &endMark;
    while (firstInOrbit != (Link_t *) 0) {
      link = firstInOrbit;
      firstInOrbit = (Link_t *) 0;
      while (link != (Link_t *) 0) {
        wp = link->destination;
        if (wp->links == (Links_t *) 0)
          findLinks(wp);
        link2 = wp->links->firstLink;
        while (link2 != (Link_t *) 0) {
          if (link2->sourceLink == (Link_t *) 0) {
            if (link2->destination == target1) {
              printf("%s", target1->name);
              link2 = link;
              while (link2 != (Link_t *) 0) {
                printf(" > %s", link2->destination->name);
                link2 = link2->sourceLink;
              }
              putchar('\n');
              return;
            }
            link2->sourceLink = link;
            link2->nextInOrbit = firstInOrbit;
            firstInOrbit = link2;
          }
          link2 = link2->nextSameSource;
        }
        link = link->nextInOrbit;
      }
    }
  }
  printf("No way to connect %s to %s.\n", word1, word2);
}

int main(int argc, char **argv)
{
  int i;

  while (1) {
    printf("? ");
    input[0] = '\0';
    fgets(input, MAX_INPUT_LENGTH+1, stdin);
    verb[0] = '\0';
    word1[0] = '\0';
    word2[0] = '\0';
    sscanf(input, "%32s %1024s %1024s", verb, word1, word2);
    if (strcmp(verb, "add-dictionary") == 0) {
      doAddDictionary();
    } else if (strcmp(verb, "connect") == 0) {
      doConnect();
    } else if (strcmp(verb, "exit") == 0) {
      break;
    }
  }
  return 0;
}


time left? (5.00 / 1) (#113)
by codemonkey_uk on Fri Apr 26, 2002 at 05:30:03 AM EST

I only spotted this yesterday. How much time is left? I spend an hour or so last night getting a c++ / stl version working using the A* algorithm, but I've still got to do some testing, and bug fixing. Is it worth it? When's the "final" entry supposed to be in?
---
Thad
"The most savage controversies are those about matters as to which there is no good evidence either way." - Bertrand Russell
Entry: C++ (5.00 / 1) (#116)
by codemonkey_uk on Fri Apr 26, 2002 at 08:12:43 AM EST

My entry is implemented in Standard C++*. It is split into two files, which I will post seperatly as replys to this comment.

astar.h is a header file providing a template based implementation of the A* algorithm.

pfc4.cpp provides the nuts & bolts needed by astar.h to solve the problem.

Some example results follow:

? add-dictionary word
Added 173529 words from word to dictionary in 1.453 seconds.
? connect make love
make > lake > lave > love
Done in 0 seconds.
? connect bread wheat
bread > break > bleak > bleat > cleat > cheat > wheat
Done in 0.047 seconds.
? connect resist futile
resist > relist > relict > relics > relies > belies > bevies > levies > levees > levels > revels > refels > refell > refill > refile > retile > rutile > futile
Done in 3.937 seconds.
? connect effaces cabaret
effaces > effaced > enfaced > enlaced > unlaced > unladed > unfaded > unfaked > uncaked > uncakes > uncases > uneases > ureases > creases > cresses > tresses > trasses > brasses > brashes > braches > beaches > benches > bunches > buncoes > bunkoes > buckoes > buckles > huckles > heckles > deckles > deciles > defiles >refiles > reviles > reviled > reveled > raveled > ravened > havened > havered > wavered > watered > catered > capered > tapered > tabered > tabored > taboret > tabaret > cabaret
Done in 34 seconds.
? connect breads cheese
breads > breaks > creaks > creaky > creasy > crease > creese > cheese
Done in 0.109 seconds.
? connect driving tickets
driving > droving > proving > prosing > prising > peising > peining > paining > panning > panting > patting > matting > mattins > lattins > lattens > rattens > ratters > ranters > rankers > rackers > rackets > rickets > tickets
Done in 6.734 seconds.
? exit
* Tested on MSVC++ v6. Let me know if you have problems with other compilers.
---
Thad
"The most savage controversies are those about matters as to which there is no good evidence either way." - Bertrand Russell
My C Submission (5.00 / 1) (#126)
by Logan on Fri Apr 26, 2002 at 11:53:10 AM EST

Here is my C submission. It is far, far faster than my Haskell submission. I will post the code here, and post some timing information and description of my approach (which is quite simple) in a followup.

First, some simple build instructions. I compiled and tested on my own computer with the following command line:

gcc -g -O2 -Wall -o pfc4 pfc4.c
In theory, the compiler and flags should not matter. I've supposedly written some fairly portable code here.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Uncomment the following line to have timing information displayed */
#define TIMING

#ifdef TIMING
#include <sys/time.h>
#endif

#define MAX_LENGTH    64

/* dict[len] is a sorted array of (len+1)-letter words
 * ndict[len] is number of words in dict[len]
 * sdict[len] is number of bytes allocated for dict[len]
 */
char ***dict;
int  *ndict, *sdict;

/* Initialize dict, ndict, and sdict for first call of add_dictionary */
void new_dict(void)
{
    int i;

    dict = malloc(MAX_LENGTH * sizeof(char **));
    ndict = calloc(MAX_LENGTH, sizeof(int));
    sdict = malloc(MAX_LENGTH * sizeof(int));
    /* We start each dictionary with a size of 64 and let it grow */
    for(i = 0; i < MAX_LENGTH; i++)
        dict[i] = malloc((sdict[i] = 64) * sizeof(char *));
}

/* Free up memory used by previous dictionary when new dictionary is added */
void release_dict(void)
{
    int i, j;

    for(i = 0; i < MAX_LENGTH; i++) {
        for(j = 0; j < ndict[i]; j++)
            free(dict[i][j]);
    }
    memset(ndict, 0, MAX_LENGTH * sizeof(int));
}

/* Wrapper around strcmp to do some pointer dereferencing */
int cmp(const void *a, const void *b)
{
    return strcmp(*(char **)a, *(char **)b);
}

int add_dictionary(const char *filename)
{
    FILE *f = fopen(filename, "r");
    char line[256], *p;
    int len;

    if(!f)
        return 0;
    if(dict)
        release_dict();
    else
        new_dict();
    while(fgets(line, sizeof(line), f)) {
        if((p = strchr(line, '\n')))
            *p = 0;
        len = strlen(line) - 1;
        if(ndict[len] >= sdict[len]) // need to resize array
            dict[len] = realloc(dict[len], (sdict[len] <<= 1) * sizeof(char *));
        dict[len][ndict[len]++] = strcpy(malloc(len + 2), line);
    }
    for(len = 0; len < MAX_LENGTH; len++)
        qsort(dict[len], ndict[len], sizeof(char **), cmp);
    return 1;
}

/* Simple binary search */
int lookup(int len, const char *s1)
{
    int left = 0, right = ndict[len], mid;
   
    if(!right)
        return -1;
    while(left != right - 1) {
        mid = (left + right) >> 1;
        if(strcmp(s1, dict[len][mid]) < 0)
            right = mid;
        else
            left = mid;
    }
    if(!strcmp(s1, dict[len][left]))
        return left;
    return -1;
}

/* Does a breadth-first search, to see if y can be reached from x */
int connection(int len, int x, int y)
{
    int *pred, *queue, *path;
    int h = 0, t = 1, i, w, z, npath;
    char buf[256], oldc;
#ifdef TIMING
    struct timeval tvs, tve;

    gettimeofday(&tvs, 0);
#endif

    /* If the words are the same, avoid allocating all these buffers */
    if(x == y) {
        printf("%s\n", dict[len][x]);
        return 1;
    }
    pred = malloc(ndict[len] * sizeof(int));
    queue = malloc(ndict[len] * sizeof(int));
    memset(pred, -1, ndict[len] * sizeof(int));
    /* The queue, over the interval of indices [h, t), represents the frontier */
    queue[0] = x;
    while(h != t) {
        z = queue[h++ % ndict[len]];
        if(z == y)
            goto success;
        strcpy(buf, dict[len][z]);
        /* transform buf into all possible strings that differ from the original
         * by exactly one letter.  For all transformations that correspond to an
         * actual word that has not yet been visited or added to the frontier,
         * we note that the original word was its predecessor and append it to
         * the queue
         */
        for(i = 0; buf[i]; i++) {
            oldc = buf[i];
            for(buf[i] = 'a'; buf[i] <= 'z'; buf[i]++)
                if(oldc != buf[i] && (w = lookup(len, buf)) > 0 && pred[w] == -1) {
                    pred[w] = z;
                    if(w == y)
                        goto success;
                    queue[t++ % ndict[len]] = w;
                }
            buf[i] = oldc;
        }
    }
    /* We only reach this point if the search fails */
    free(pred);
    free(queue);
#ifdef TIMING
    gettimeofday(&tve, 0);
    tve.tv_usec -= tvs.tv_usec;
    tve.tv_sec -= tvs.tv_sec;
    while(tve.tv_usec < 0) {
        tve.tv_usec += 1000000;
        tve.tv_sec--;
    }
    printf("%d.%03d seconds\n", tve.tv_sec, tve.tv_usec / 1000);
#endif
    return 0;

success:

    /* We now build our path from y to x by following predecessors.  Then
     * we traverse it in reverse order for the output.
     */
    path = malloc(ndict[len] * sizeof(int));
    npath = 0;
    while(y != x)
        y = pred[path[npath++] = y];
    path[npath] = x;
    printf("%s", dict[len][x]);
    for(i = npath; i--;)
        printf(" > %s", dict[len][path[i]]);
    printf("\n");
#ifdef TIMING
    gettimeofday(&tve, 0);
    tve.tv_usec -= tvs.tv_usec;
    tve.tv_sec -= tvs.tv_sec;
    while(tve.tv_usec < 0) {
        tve.tv_usec += 1000000;
        tve.tv_sec--;
    }
    printf("%d.%03d seconds\n", tve.tv_sec, tve.tv_usec / 1000);
#endif
    free(pred);
    free(queue);
    free(path);
    return 1;
}

/* Frontend to connection */
void connect(const char *s1, const char *s2)
{
    int len = strlen(s1), x, y;

    if(len-- != strlen(s2) || (x = lookup(len, s1)) < 0 ||
       (y = lookup(len, s2)) < 0 || !connection(len, x, y))
        printf("No way to connect %s to %s\n", s1, s2);
}

int main(void)
{
    char line[256], command[64], arg1[64], arg2[64];
    int n;

    printf("? ");
    fflush(stdout);
    while(fgets(line, sizeof(line), stdin)) {
        n = sscanf(line, "%s %s %s\n", command, arg1, arg2);
        if(n > 1 && !strcmp(command, "add-dictionary")) {
            if(add_dictionary(arg1))
                printf("Added %s\n", arg1);
            else
                printf("Error adding %s\n", arg1);
        } else if(n > 2 && !strcmp(command, "connect"))
            connect(arg1, arg2);
        else if(!strcmp(command, "exit"))
            return 0;
        else
            printf("Huh?\n");
        printf("? ");
        fflush(stdout);
    }
    return 0;
}

Logan

question (5.00 / 1) (#128)
by ucblockhead on Fri Apr 26, 2002 at 01:40:01 PM EST

Can we assume that only one dictionary will be loaded at a time?
-----------------------
This is k5. We're all tools - duxup
For you mind warping pleasure. (4.00 / 1) (#136)
by Cal Bunny on Sat Apr 27, 2002 at 05:09:17 AM EST

I only had about 10 minutes to work on a solution, so this is what you would call a first draft. It completely recalculates the transition matrices each time a new dictionary is added (I will fix this to only recalculate what is necesary). To make it worse, it overcalculates the transition matrices (does twice the amount of work necessary), too. It uses a simple breath-first search (I plan to improve it to only keep track of the fringe and maybe move it to dual-*).

On my PII-233 w/128M running Win2000 Workstation it takes a while to load the files and calculate the adjacency matrices. After that, though most queries run in under a second.

/-------------- group by count, 1 equal sum cross-product --------------
w:()
add_dictionary:{w,:0:x;g::group w;a::transitions'g}
group:{x@((=n),,!0)(?n)?/:!1+|/n:#:'x}
transitions:{&:'1=+/''~x=\:/:x}
/-------------- intersection scan reverse union scan ----------------
connect:{`0:{(-#y)_,/x,\:y}[co[x;y];" > "],"\n"}
co:{A::a[#x]; G::g[#x]; G@h,|bfs[h:G?x;G?y]}
bfs:{y{*x@&(x:A x)_lin y}\1_|(~y _in)(?,/A@)\A x}
/--------------- command processor ------------------------------------
c:""; exit:{}; sp:{1_'(&x=*x)_ x:y,x}[;" "]
while[~"exit"~*c;`0:"? ";c:sp _ssr[;"-";"_"](0:`);(.*c). 1_ c]
\\


^cb^
Kudos to you for warping my fragile little mind. - communist
Quickie Prototype (5.00 / 1) (#137)
by KWillets on Sat Apr 27, 2002 at 02:55:53 PM EST

This is a working prototype, but my mouse (or the motherboard it's connected to) keeps failing, so I'll post quick before I lose it again. Sorry about the formatting. It's hard-coded with a testcase until I manage to download a dictionary.

That's perl, by the way.
===
my %dictionary = ( "cab" => 1, "cat" => 1, "cot" => 1, "can" => 1, "ban" => 1, "bun" => 1 );
my $startword = "cab";
my $endword = "bun";
my %unvisited = %dictionary;
my %frontier = ( $startword => "");
delete $unvisited{ $startword };
my %newfrontier;
my $src, $path;

print "start = $startword / end = $endword \n";
#
# Breadth-first search
# The first visit to a word is minimal distance; later visits can be ignored
# Here each word is deleted as soon as it is visited
# on each iteration, frontier set is used to generate next frontier set
# join method is to generate candidates from src string and test in hash
# each hit is moved into new frontier set with path from src
# stop when frontier reaches endword, or frontier is empty
#
while( %frontier && ! exists ( $frontier{ $endword } )) {
while( ($src, $path) = each %frontier ) {
   for( my $i = 0; $i < length($src); $i++ ) {
    for ( $c = ord "a", my $tst = $src; $c <= ord "z"; $c++ ) {
      substr( $tst, $i, 1 ) = chr $c;
      if( delete $unvisited{ $tst } ) {
       $newfrontier{ $tst } = "$path\n$src" ;
      }
    }
   }
}
%frontier = %newfrontier;
%newfrontier = ();
}

print "DONE:\n $frontier{ $endword }\n$endword\n\n";



Another interesting test case (none / 0) (#141)
by tmoertel on Sat Apr 27, 2002 at 04:48:25 PM EST

Here's a variant on i's test case. This one attempts to measure how efficiently an implementation handles reverse and repeated queries, i.e., how smart it is at caching.
add-dictionary WORD.LST
connect make love     
connect bread wheat   
connect resist futile 
connect effaces cabaret
connect make love     
connect bread wheat   
connect resist futile 
connect effaces cabaret
connect love make     
connect wheat bread   
connect futile resist 
connect cabaret effaces
connect love make     
connect wheat bread   
connect futile resist 
connect cabaret effaces
connect make love     
connect bread wheat   
connect resist futile 
connect effaces cabaret
connect love make     
connect wheat bread   
connect futile resist 
connect cabaret effaces
connect make love     
connect bread wheat   
connect resist futile 
connect effaces cabaret
connect love make     
connect wheat bread   
connect futile resist 
connect cabaret effaces
connect wood fire
connect climb rocks
connect stones chisel
connect amperes voltage
connect wood fire
connect climb rocks
connect stones chisel
connect amperes voltage
connect fire wood
connect rocks climb
connect chisel stones
connect voltage amperes
connect fire wood
connect rocks climb
connect chisel stones
connect voltage amperes
connect wood fire
connect climb rocks
connect stones chisel
connect amperes voltage
connect fire wood
connect rocks climb
connect chisel stones
connect voltage amperes
connect wood fire
connect climb rocks
connect stones chisel
connect amperes voltage
connect fire wood
connect rocks climb
connect chisel stones
connect voltage amperes
exit
Note that the new test case asks for sixteen times as many connections as the original, and that there is no connection between "chisel" and "stones" and no connection between "amperes" and "voltage".

As a basis for comparison, on a Celeron 433 my implementation took 28.1 user seconds for i's orignal test case and 63.5 user seconds for 16-times variant. Each measurement was taken via bash's time built-in command, as follows:

time ./tgm-word-connect < bench16x.tst
and I cited the "user" line from the results:
real    1m4.435s
user    1m3.480s
sys    0m0.960s

Anyone care to try the test and post the results? I'm especially curious about how the C/C++ and Perl implementations fare.

--
My blog | LectroTest

[ Disagree? Reply. ]


A supplemental dict: 6926 words not in WORD.LST (none / 0) (#155)
by tmoertel on Sun Apr 28, 2002 at 01:50:05 PM EST

Here are 6926 additional words, none of which are in the WORD.LST I linked to earlier:
WORD2.LST.bz2
Here's an example of it in action:
$ ./tgm-word-connect
? add-dictionary WORD.LST
Added WORD.LST
? connect aba ode
aba > ala > ale > ole > ode
? add-dictionary WORD2.LST
Added WORD2.LST
? connect aba ode
aba > abe > obe > ode
? exit

--
My blog | LectroTest

[ Disagree? Reply. ]


Ok I think I finally got it right. (none / 0) (#156)
by i on Sun Apr 28, 2002 at 02:49:33 PM EST

Runs in constant (log actually but who cares) space, and speed is on par with tmoertel's, so I consider my job done :)

Thanks to wonders of laziness (tying a knot) and strictness (foldl').

-- PFC4 entry
-- Username: 'i'
-- Version: 0.3
-- Compilation (on Un*x): ghc -O2 ladder.hs -o ladder -package data
-- You probably need at least GHC5.0x
-- Run: ladder [+RTS <runtime system switches>]
-- Runtime switches help: ladder +RTS -\?
-- Fiddling with them may improve performance, but only marginally
--
-- Interpreting
-- With Hugs (SLOW): hugs ladder.hs -h20000000
-- With GHCi: ghci ladder.hs -package data
-- then type `main' at the command prompt

module Main where

import FiniteMap hiding (addListToFM, addListToFM_C)
import Maybe
import IO
import IOExts
import qualified SimpleQueue as Q
import qualified Set as S
import List

-- graph data structure
-- each vertex has a label and a set (list) of adjacent vertices
data Vertex = Vertex String [Vertex]

-- useful instances
instance Eq Vertex where
  (==) (Vertex x _) (Vertex y _) = x == y

instance Ord Vertex where
  compare (Vertex x _) (Vertex y _) = compare x y
  (<=) (Vertex x _) (Vertex y _) = x <= y

-- This is a secondary data structure: maps splits to list of words
-- like c?t => [cat,cut]
type SplitMap = FiniteMap String [String]

-- This is the primary data structure
-- maps word to vertex of the graph
type WordMap  = FiniteMap String Vertex

-- all words in a SmallWordDB are of equal length
type SmallWordDB = (SplitMap, WordMap)

-- this maps word length to a SmallWordDB
type WordDB          = FiniteMap Int SmallWordDB

--
-- Main logic and algorithms
--

-- addToWordDB adds list of words to the database
addToWordDB :: WordDB -> [String] -> WordDB
addToWordDB db ws = plusFM_C combineSmallWordDBs db db' where
  db' :: WordDB
  db' = mapFM (\_->listToSmallWordDB) (partitionWords ws)

-- utility for addToWordDB
-- combines two graps into a single graph
combineSmallWordDBs :: SmallWordDB -> SmallWordDB -> SmallWordDB
combineSmallWordDBs (sm1, wm1) (sm2, wm2) = (sm, wm) where
  sm = plusFM_C (++) sm1 sm2
  wm' = plusFM_C ev wm1 wm2
  wm = rekeyWordMap wm wm' sm
  ev (Vertex x _) _ = Vertex x []

-- utility for addToWordDB
-- partitionWords partitions the input list by length
-- the output is a map from length to list of words
partitionWords :: [String] -> FiniteMap Int [String]
partitionWords words = addListToFM_C (flip (++)) emptyFM words' where
  words' = map (\x->(length x, [x])) words

-- utility for addToWordDB
-- build a graph
-- given a list of words, build a map from words to vertices
listToSmallWordDB :: [String] -> SmallWordDB
listToSmallWordDB ws = (splitMap, realWordMap) where
  dummyWM =  addListToFM emptyFM (map (\x->(x, Vertex x [])) ws)
  splitMap = addListToFM_C (flip (++)) emptyFM (splitws ws)
  realWordMap = rekeyWordMap realWordMap dummyWM splitMap
  splitws ws = concatMap splits' ws
  splits' w = zip (splits w) (repeat [w])

-- tying the knot!
-- build a map from words to vertices
-- where vertex is a word and a list of adjacent vertices
-- this definition is recursive, so we can walk the graph
-- directly, without having to look up e.g. edges from a
-- vertex
rekeyWordMap :: WordMap -> WordMap -> SplitMap -> WordMap
rekeyWordMap newWM oldWM sm = mapFM rekeyWord oldWM where
  rekeyWord :: String -> Vertex -> Vertex
  rekeyWord x _ = Vertex x (map head $ group $ sort $ vertices x)
  vertices x = catMaybes $ map (lookupFM newWM) (adjs x)
  adjs :: String -> [String]
  adjs x = adjacents sm x

-- given a word and a map from splits to lists of words,
-- return all words adjacent to given one
adjacents :: SplitMap -> String -> [String]
adjacents sdb word = concatMaybes $ map (lookupFM sdb) (splits word) where
  concatMaybes = concat . catMaybes


-- return all splits of a word: cat => [?at, c?t, ca?]
splits :: String -> [String]
splits [] = []
splits (x:xs) = ('?':xs) : map (x:) (splits xs)

-- implementation of add-dictionary
addDict :: FilePath -> WordDB -> IO WordDB
addDict path db = catch addDict' report where
  addDict' = do
          handle <- openFile path ReadMode
          contents <- hGetContents handle
          let db' = addToWordDB db (words contents)
          putStrLn $ "Added " ++ path
          return db'
  report _ = putStrLn ("Error adding " ++ path) >> return db

-- implementation of connect
connect :: WordDB -> String -> String -> IO ()
connect db word1 word2 = do
  let path = findPath word1 word2 db
  case path of
          [] -> putStrLn $ "No way to connect " ++ word1 ++ " to " ++ word2
          xs  -> putStrLn $ showPath xs where
                  showPath = foldr1 (\a b -> a ++ " > " ++ b)

-- findPath is a utility for connect
findPath :: String -> String -> WordDB -> [String]
findPath from to db
  | len /= len' = []
  | isNothing from' = []
  | isNothing to' = []
  | otherwise = mapv $ findChain (\(Vertex x vs) -> vs) from'' to''
  where
      (sm, wm) = lookupWithDefaultFM db (emptyFM,emptyFM) len
      len = length from
      len' = length to
      from' = lookupFM wm from
      to' = lookupFM wm to
      from'' = fromJust from'
      to'' = fromJust to'
      mapv = map (\(Vertex x _) -> x)

-- findChain is a utility for findPath
findChain :: Ord a => (a->[a]) -> a -> a -> [a]
findChain f from to = reverse $ first chain where
  chain = dropWhile notFound $ bfsPaths $ toPTree f from
  first (x:_) = x
  first [] = []
  notFound [] = True
  notFound xs = to /= head xs

-- Main loop
doCommands :: WordDB -> IO WordDB
doCommands db = do
  putStr "? "
  hFlush stdout
  command <- getLine
  case (words command) of
          ['e':_] -> return db
          ['a':_, path] -> addDict path db >>= doCommands
          ['c':_, word1, word2] -> connect db word1 word2 >> doCommands db
          _ -> doCommands db

main :: IO ()
main = catch (doCommands emptyFM >> return ()) (\_->return ())

--------------------------------------------------------------

--
-- In the ideal world this would be in a separate module
--

-- Parented tree, so we can walk from a node back to root
data PTree a = PNil | PTree (PTree a) a [PTree a]

-- Functional tree to parented tree
toPTree :: (a->[a]) -> a -> PTree a
toPTree f a = toPTree' f PNil a where
  toPTree' f p a = tree where
          tree = PTree p a trees
          trees = map (toPTree' f tree) (f a)

-- The BFS itself
bfsPaths :: Ord a => PTree a -> [[a]]
bfsPaths PNil = []
bfsPaths tree@(PTree parent node branches) = map walkPath paths where
  (paths,_) = bfs (Q.single tree) (S.emptySet)
  bfs :: Ord a => Q.Seq (PTree a) -> S.Set a -> ([PTree a], S.Set a)
  bfs queue set =
          case Q.lview queue of
                  Q.Nothing2 -> ([], set)
                  Q.Just2 (tree@(PTree p n bs)) queue' ->
                          if S.elementOf n set
                                  then bfs queue' set
                                  else (tree : trees, set'') where
                                          (trees, set'') = bfs (foldl' Q.snoc queue' bs) set'
                                          set' = S.addToSet set n
                  Q.Just2 (PNil) queue' -> bfs queue' set

  walkPath :: PTree a -> [a]
  walkPath PNil = []
  walkPath (PTree p a _) = a:walkPath p

--
-- Misc utilities
--

-- strict variant of addListToFM, addListToFM_C
-- replace those found in the standard FiniteMap
addListToFM :: Ord a => FiniteMap a b -> [(a,b)] -> FiniteMap a b
addListToFM fm pairs = addListToFM_C (\old new -> new) fm pairs

addListToFM_C :: Ord a => (b -> b -> b)
                                          -> FiniteMap a b
                                          -> [(a,b)]
                                          -> FiniteMap a b
addListToFM_C combiner fm key_elt_pairs
  = foldl' add fm key_elt_pairs  -- foldl adds from the left
  where
      add fmap (key,elt) = addToFM_C combiner fmap key elt

-- strict variant of foldl
foldl'                    :: (a -> b -> a) -> a -> [b] -> a
foldl' f a []        = a
foldl' f a (x:xs) = (foldl' f $! f a x) xs


and we have a contradicton according to our assumptions and the factor theorem

Optimizations (none / 0) (#157)
by ucblockhead on Sun Apr 28, 2002 at 05:25:22 PM EST

I found a cool optimization I suspect people may have missed. The version I am working on (and will post as soon as I clean it up) used lots of data structures with strings as keys. String compares are really expensive, though. So I created a separate structure that mapped integers to strings (which works as long as you have less than 2 billion words or so :-) ). I then use these integer handles in all my other data structures.

This doubled the program's speed. It worked well because very little of the algorithm cares about whether or not it is dealing with words. Only the part that checks how "different" two words are, really. And that can be cached.

(I suspect that this would be even easier in C, as you could simply used the character pointer as the "handle".)
-----------------------
This is k5. We're all tools - duxup

Idea for Optimizing for Large Number of Searches (4.00 / 1) (#159)
by Cal Bunny on Sun Apr 28, 2002 at 06:05:44 PM EST

If you think that there will be a large number of searches that walk through most of the graph, I had an interesting idea. Of course a more asymptotically optimal solution would be a chart search, but a simplier solution that requires very little memory would be a variant of an interated deepening depth first search.

In an iterated deepening DFS, you run a DFS search with a maximum depth of 1, if you do not find the goal, you run a new DFS with a depth of 2, and this continues until you find the goal state. An ID-DFS search takes advantage large branching factors. If each node connects to just 2 deeper nodes, then half of your work is done on the last expansion. This is a good solution for simple problems and instances where you havd a hard cutoff. A DFS can be substantially faster than a BFS: it will consume only memory linear to the length of the path, you do not need to record an open list, and you do not need to explicitely record the current shortest path since it is sitting on the stack (if you want to make an incredibly dumb DFS you don't even need to record a closed set, either).

On dictionary load you could run Kruskal's all-pairs shortest path, that is O(n^3) -- for comparison, Dijkstra's single-source shortest path is O(n^2). To solve a query, you lookup what the depth of the shortest path is, then run a DFS search with the correct cutoff. To help direct the DFS you can order the list of potential expansions by the number of characters both strings have in common. On a sparse graph it is easy to see that with a good next exapansion strategy you coule get results close to linear in the length of the string and certainly no worse than O(word_length*avg_braching_factor/2).

Since determining all-pairs shortest path is stronger than the transitive closure, all queries where there is not answer are known immediately.

One of the best preformance gains for ID-DFS is to remember the results of the previous expansions. If you were to remember the results of every expansion you turn into BFS/A*. (Well, you don't turn into anything, but the algorithm expands the same nodes that a BFS/A* would expand).

^cb^

My submission (none / 0) (#170)
by Hopfrog on Mon Apr 29, 2002 at 11:47:26 AM EST

Hell, here is me submission. It doesn't work (apart from on that small collection of words in the sample), and is incredibly mind-boggling slow - about 30 minutes for large dics.

I had to go to military camp, so this is the most I could do.

C++ and STL, uses some windows specific stuff that can be defined away, will compile with msvc++6. I use structs the C++ way, not the C way. If you enable time checking, you need to link the winmm.lib file in.

I made everything inline to make it a bit neater, not for any idealogical reasons.

//

#include "stdafx.h"
#include <iostream>
#include <string>
#include <list>
#include <fstream>
#include <crtdbg.h>
#include <windows.h>
#include <mmsystem.h>

using namespace std;

#define STR_ADD_DIC "add-dictionary"
#define STR_CONNECT "connect"
#define MAX_NODE    2048
#define SAFE_DELETE(x) if(x) { delete x; x = NULL; }
#define SIMILARITY_DEVIANCE 2
#define MEASURE_TIME

#pragma warning (disable : 4786)

struct DictionaryWord
{
            string strWord;
            bool   bInChain;
};

struct WordTree
{
            WordTree() : nSubTreeCount(0), iterWord(NULL)
            {
            }

            int AddSubTree(string& strTreeWord, list<DictionaryWord>::iterator iter)
            {

                        arrSubTree[nSubTreeCount] = new WordTree();
                        arrSubTree[nSubTreeCount]->strWord = strTreeWord;
                        arrSubTree[nSubTreeCount]->iterWord = iter;
                        nSubTreeCount++;
                        return (nSubTreeCount - 1);
            };

            void DestroyTree()
            {
                        // We loop through all trees we created with "new" and delete them

                        for (int i=0;i<nSubTreeCount;i++)
                        {
                                   arrSubTree[i]->DestroyTree();
                                   SAFE_DELETE(arrSubTree[i]);
                        }

                        if (iterWord != NULL)
                            (*iterWord).bInChain = false;
            };

            string strWord;
            list<DictionaryWord>::iterator iterWord;
            int nSubTreeCount;
            WordTree* arrSubTree[MAX_NODE];
};

class Dictionary
{
public:
            bool LoadDictionary(const string& strFilepath)
            {
                        // Load the dictionary into memory.
                        // Words are stored in a list

                        string str;                   

                        ifstream file;
                        file.open(strFilepath.c_str());

                        if (!file.is_open())
                                   return false;

                        while (file.good())
                        {
                                   char szLine[120] = {'\0'};
                                   file.getline(szLine, 120);                                  

                                   if (strlen(szLine) > 0)
                                   {
                                               DictionaryWord wrd;
                                               wrd.strWord = szLine;
                                               wrd.bInChain = false;
                                               m_wordList.push_back(wrd);
                                   }
                        }

                        file.close();
                       return true;
            }

            bool GetSimilarWords(WordTree& tree, bool bVerifyWord)
            {

                       // bVerifyWord, if set to true, makes us check if
                        // the word exists at all in the dictionary. You need
                        // set it to true only once, as all other words definately
                        // exist in the dictionary.
                        bool bWordFound = false;
                        bool bChildFound = false;
                        int nIndex = 0;

                        list<DictionaryWord>::iterator iter;
                        for (iter=m_wordList.begin(); iter != m_wordList.end(); ++iter)
                        {         
                                   if ((*iter).bInChain)
                                               continue;

                                   if (bVerifyWord && !bWordFound && tree.strWord == (*iter).strWord) // lazy eval
                                   {
                                               tree.iterWord = iter;
                                               bWordFound = true;
                                   }
                                   else
                                   {           
                                    int n = AreSimilar(tree.strWord, (*iter).strWord);

                                               if (n == 1)
                                               {
                                                           bChildFound = true;
                                                           tree.AddSubTree((*iter).strWord, iter);
                                               }
                                   }

                                   nIndex++;
                        }

                       
                        if (bChildFound == false || bVerifyWord && bWordFound == false)
                                   return false;

                        return true;
            };

protected:

            int AreSimilar(string& str1, string& str2)
            {

                        // if the same, return 0, if similar, return 1, else
                        // return -1

                       
                        int nFail  = 0;
                        int nSize1 = str1.size();
                        int nSize2 = str2.size();
                        if (nSize1 != nSize2)
                                   return -1;

                        for (int i=0;i<nSize1;i++)
                        {

                                   if (str1.at(i) != str2.at(i))
                                   {
                                               nFail++;
                                   }

                                   if (nFail == SIMILARITY_DEVIANCE) // failed twice.
                                               return -1;
                        }

                        return nFail; //1 -> similar, 0 -> equal
            }

private:
            list<DictionaryWord> m_wordList;
};

class CommandParser
{

public:
            typedef enum {cmdUnkown = 0, cmdAddDictionary, cmdConnect, cmdExit } Command;

           
            Command ParseCommand(string strInput)
            {
                        if (strInput.compare(0, sizeof(STR_ADD_DIC) - 1, STR_ADD_DIC) == 0)
                        {
                                   return cmdAddDictionary;
                        }
                        else if (strInput.compare(0, sizeof(STR_CONNECT) - 1, STR_CONNECT) == 0)
                        {
                                   return cmdConnect;
                        }
                        else if (strInput == "exit")
                        {
                                   return cmdExit;
                        }
                       
                        return cmdUnkown;
            }

            bool ParseAddDictionary(string strInput, string& strFilepath)
            {
                        // We have a command, which idealy would look like
                        // "add-dictionary c:\dic.txt".
                       
                        if (strInput.size() <= sizeof(STR_ADD_DIC))
                                   return false;

                        strFilepath = strInput.substr(sizeof(STR_ADD_DIC));

                        // We don't perform a check for multiple characters
                        // in this case, because spaces are allowed in paths.
                       
                        return true;
            }

           
            bool ParseConnect(string strInput, string& strStartWord, string& strEndWord)
            {
                        // We have a command, which idealy would look like
                        // "connect coat hook". However, it might be typed wrongly
                        // so we also check that
                        if (strInput.size() <= sizeof(STR_CONNECT))
                        {
                                   // The user typed "connect". This would cause
                                   // an access violation if we do the next call,
                                   // so we return here.
                                   return false;
                        }
                       
                        // remove "connect"
                        strInput = strInput.substr(sizeof(STR_CONNECT));
                       
                       
                        // Find the space char separating the first word
                        // from the second
                        string::size_type nPos;
                        nPos = strInput.find(' ');
                        if (nPos == string::npos)
                        {
                                   // If there is no space, we only have one word,
                                   // so we indicate an error
                                   return false;
                        }
                       
                        // We retrieve the two words
                        strStartWord = strInput.substr(0, nPos);
                        strEndWord  = strInput.substr(nPos + 1);
                       
                        // We check if there are more than two parameters
                        // if so, we indicate an error.
                        if (strEndWord.find(' ') != string::npos)
                        {
                                   return false;
                        }
                        return true;
            }
};
class WordChain : public Dictionary
{
public:
            static bool ConnectWords(string strStartWord, string strEndWord, string& strChain, Dictionary& dic)
            {
                        bool bChained = false;
                        strChain = strStartWord + " > ";
                        WordTree tree;
                        tree.strWord = strStartWord;
                       
                        WordTree* pLink = GetNextLinksInChain(dic, tree, strEndWord, bChained, true);
                        while (pLink && !bChained)
                        {
                                   strChain += pLink->strWord + " > ";
                                   pLink = GetNextLinksInChain(dic, *pLink, strEndWord, bChained, false);
                        }
                       
                        if (bChained && pLink)
                        {
                                   strChain += pLink->strWord + "\n";
                        }
                       
                        tree.DestroyTree();
                        return bChained;
            }
            static WordTree * GetNextLinksInChain(Dictionary& dic, WordTree& tree, string& strEndWord, bool& bComplete, bool bVerifyWord)
            {
                        WordTree * pResult = NULL;
                        if (dic.GetSimilarWords(tree, bVerifyWord))
                        {
                                   (*tree.iterWord).bInChain = true;
                                  
                                   for (int i=0;i<tree.nSubTreeCount;i++)
                                   {
                                               if (tree.arrSubTree[i]->strWord == strEndWord)
                                               {
                                                           bComplete = true;
                                                           return tree.arrSubTree[i];
                                               }
                                               if (dic.GetSimilarWords(*tree.arrSubTree[i], false))
                                               {
                                                           // tree.strWord is definately in our chain
                                                           // We hide it so that future lookups do not use
                                                           // it
                                                           (*tree.iterWord).bInChain = true;
                                                           pResult = tree.arrSubTree[i];
                                               }
                                   }
                        }
                       
                        return pResult;
            }
};
int main(int argc, char* argv[])
{
            char szInput[200] = {'\0'};
            WordChain dic;
            CommandParser cmd;
            while (true)
            {
                        //////
                        // Display prompt and get command
                        cout << "?";
                        cin.get(szInput, 200, '\n');
                        cin.get();
                       
                        ////
                        // Parse command
                        CommandParser::Command cmdType = cmd.ParseCommand(szInput);
                        switch (cmdType)
                        {
                        case CommandParser::cmdUnkown:
                                   {         
                                               cout << "\"" << szInput << "\": This command does not exist. Use \"exit\" to exit.\n";
                                   }
                                   break;
                                   /////////////////////////////////////
                        case CommandParser::cmdAddDictionary:
                                   {
                                               string strFilepath;
                                               if (cmd.ParseAddDictionary(szInput, strFilepath) == true)
                                               {
                                                           if (dic.LoadDictionary(strFilepath))

                                                           {
                                                                       cout << "Added " << strFilepath << "\n";
                                                           }
                                                           else
                                                                       cout << "Error adding " << strFilepath << "\n";
                                               }
                                               else
                                                           cout << "Invalid parameters. Use \"add-dictionary filename\"\n";                                 
                                   }
                                   break;
                                   /////////////////////////////////////    
                        case CommandParser::cmdConnect:
                                   {
                                               string strStartWord, strEndWord, strChain;
                                               if (cmd.ParseConnect(szInput, strStartWord, strEndWord) == true)
                                               {
#if defined(MEASURE_TIME)
                                                           DWORD dwStart = timeGetTime();
#endif
                                                           if (dic.ConnectWords(strStartWord, strEndWord, strChain, dic) == false)
                                                           {
                                                                       cout << "No way to connect " << strStartWord << " to " << strEndWord << "\n";
                                                           }
                                                           else
                                                           {
                                                                       cout << strChain;
                                                           }
#if defined(MEASURE_TIME)
                                                           DWORD dwEnd = timeGetTime();
                                                           cout << "Completed in " << dwEnd - dwStart << " milliseconds\n";
#endif
                                               }
                                               else
                                                           cout << "Invalid parameters. Use \"connect word1 word2\"\n";
                                    }
                                   break;
                                   //////////////////////////////////
                       
                        case CommandParser::cmdExit:
                                   {
                                               return 0;
                                   }
                                   break;
                                   //////////////////////////////////
                        }
            }
            return 0;
}



Final version (none / 0) (#171)
by ucblockhead on Mon Apr 29, 2002 at 12:46:12 PM EST

Damn you to hell! I started out thinking "Hey, I'll just throw together a braindead version". Next thing you know, the hours I put in... Anyway, here's a nonbraindead C++/STL version:

#include <string>
#include <vector>
#include <map>
#include <set>
#include <queue>
#include <iostream>
#include <fstream>
#include <algorithm>
#include <functional>
using namespace std;

// Algorithm:
//
// There are two important data structures here.  The first is a set of all
// of the minimal paths between two words that the program knows about (Dictionary::myPaths).
// The second is a priority queue of paths to combine (Dictionary::myWorkQueue).
//
// The basic algorithm goes like this:
//
// Create a path representing the starting word.  (A one word path.)
// Create a path for each of that word's neighbors (i.e. those that are only one letter
// different.)
// Search for all paths where the end of one is a neighbor of the other.  O(N^2)
//      Put each such pair of paths in the work queue.
//
// While we have items in our work queue
//        Take the pair of paths out of the work queue that is the best guess and combine them.
//        If this is the target
//            Report success
//        If no other path with these endpoints exists [1]
//            Add it to our path list
//            For each path in our path list
//                Search for all paths that are a neighbor of its endpoint.  O(N)
//                    Put each such pair in the word queue.
//
// For this to work, the scoring mechanism has to ensure that shortest paths will be combined first.
// The score for a potential combination is the shortest path from the target start to the target end
// that it could be a part of.  It is the lengths of both paths plus the number of differences
// between each endpoint and the start and end targets.
//
// The following optimizations improve performance:
//
// Ties in scores are broken by the distance to the target.
// Strings aren't used as keys in the main data structures.  Instead, handles are used so
// that all keys are integers.
// The neighbors for a given word are only calculated once, and cached.
// The list of known paths is never thrown out, so subsequent searches have more data to work with.
//
// [1] I don't do this when I put things in the work queue because it is relatively expensive
//     and because the scoring mechanism means that these combinations are less likely to be tested.
//
// Notes:
// One danger of never throwing out path data is that I suspect that it could potentially slow
// down significantly after large numbers of connects.
// Much of the algorithm was designed so that searches didn't need to start from the beginning,
// and an earlier version searched from both ends, hoping to meet in the middle.  This turned
// out to be slower in practice.
// An earlier version also added all sublists of a path and reversed versions of all paths.  This
// also turned out to be slower in practice.
// It should be obvious that I trade memory for speed at every opportunity.  This is almost
// certainly a memory hog.
// I suspect I could get a little more speed by adding a third metric to the score, prefering
// paths to words with lots of neighbors.

typedef int WORDHDL;
typedef set<WORDHDL> WordSet;

// This class stores strings, and returns unique integer handles to those strings.
class WordHdls
{
public:
    WORDHDL AddHandle(const string& aWord)
    {
        map<string,WORDHDL>::iterator it = myWordsLookup.find(aWord);
        if( it != myWordsLookup.end() )
            return it->second;

        myWordsLookup.insert(map<string,WORDHDL>::value_type(aWord,myWords.size()));
        myWords.push_back(aWord);
    }

    WORDHDL GetHandle(const string& aWord) const
    {
        map<string,WORDHDL>::const_iterator it = myWordsLookup.find(aWord);
        if( it != myWordsLookup.end() )
            return it->second;
        return -1;
    }

    inline const string& GetWord(WORDHDL hdl) const
    {
        return myWords[hdl];
    }

    // This returns the set of all handles that are neighbors
    // to the given handle.  Requests are cached, as this is
    // relatively expansive.
    const WordSet& GetNeighbors(WORDHDL hdl)
    {
        map<WORDHDL, WordSet>::iterator it = myNeighborSets.find(hdl);
        if( it != myNeighborSets.end() )
            return it->second;
       
        string aWord = GetWord(hdl);

        WordSet neighbors;
        for(unsigned int i=0;i<aWord.size();i++)
            for(char ch='a';ch<='z';ch++)
                if( aWord[i] != ch ) {
                    string str = aWord;
                    str[i] = ch;
                    if( myWordsLookup.find(str) != myWordsLookup.end() ) {
                        neighbors.insert(GetHandle(str));
                    }
                }


        myNeighborSets.insert(map<WORDHDL,WordSet>::value_type(hdl, neighbors));
        return myNeighborSets[hdl];
    }

    inline int Distance(WORDHDL hdl1,WORDHDL hdl2)
    {
        return Distance(myWords[hdl1],myWords[hdl2]);
    }

    vector<string> GetPath(const vector<WORDHDL>& someHdls)
    {
        vector<string> rc;
        for(vector<WORDHDL>::const_iterator it = someHdls.begin();it!=someHdls.end();it++)
            rc.push_back(GetWord(*it));

        return rc;
    }
    inline void Clear()
    {
        myNeighborSets.clear();
    }
private:
    int Distance(const string& str1, const string& str2)
    {
        int Diffs = 0;
        for(unsigned int i=0;i < str2.size();i++)
            if( str1[i] != str2[i] )
                ++Diffs;

        return Diffs;
    }

    vector<string>  myWords;
    map<string, WORDHDL> myWordsLookup;
    map<WORDHDL, WordSet> myNeighborSets;
};

// A path between two words.
class Link
{
public:
    inline Link(const vector<WORDHDL>& aPath) : myPath(aPath) {}
    inline Link(const WORDHDL& aWord) { myPath.push_back(aWord); }

    inline unsigned int GetLength() const { return myPath.size(); }

    inline const vector<WORDHDL>& GetPathStrings() const { return myPath; }
    inline WORDHDL GetStart() const { return *myPath.begin(); }
    inline WORDHDL GetEnd() const { return *myPath.rbegin(); }

    // Note that we rely on the caller to ensure this is valid.  This is
    // for performance reasons.
    Link* Combine(const Link* Src) const
    {
        vector<WORDHDL> lhs = myPath;
        const vector<WORDHDL>& rhs = Src->myPath;

         lhs.insert(lhs.end(), rhs.begin(),rhs.end());
         return new Link(lhs);
    }

private:
    vector<WORDHDL> myPath;
};

class Dictionary
{
public:
    ~Dictionary() { Clear(); }
    inline void AddWord(const string& aWord) { myWords.insert(myWordHdls.AddHandle(aWord)); }

    // Add any paths that can be combined with this one to the work queue
    void AddScored(Link* aLnkPtr, WORDHDL anEnd)
    {
        unsigned int aScore = aLnkPtr->GetLength();

        WordSet neighbors = myWordHdls.GetNeighbors(aLnkPtr->GetEnd());
        for(WordSet::iterator it=neighbors.begin();it!=neighbors.end();it++) {
            for(multimap<WORDHDL, Link*>::iterator jt = myStarts.lower_bound(*it);
                jt != myStarts.end() && jt->first == *it;
                jt++) {

                Link* anotherLnkPtr=jt->second;

                unsigned int aTargetScore = 0;

                int aDist = myWordHdls.Distance(anotherLnkPtr->GetEnd(), anEnd);
                aTargetScore = aScore + anotherLnkPtr->GetLength();
                aTargetScore += aDist;
                myWorkQueue.insert(WorkQueue::value_type(
                    Score(aTargetScore, aDist),
                    LnkPr(aLnkPtr,anotherLnkPtr) ) );
            }
        }
    }

    inline void AddLink(Link* aLnkPtr, WORDHDL anEnd)
    {
        myPaths.insert(PrLnkMap::value_type(WordPr(aLnkPtr->GetStart(),aLnkPtr->GetEnd() ),aLnkPtr));
        myStarts.insert(multimap<WORDHDL, Link*>::value_type(aLnkPtr->GetStart(), aLnkPtr));

        AddScored(aLnkPtr, anEnd);
    }

    void AddLink(WORDHDL aWord)
    {
        if( 0 == GetPath(aWord,aWord) ) {
            Link* aLnkPtr = new Link(aWord);
            myPaths.insert(PrLnkMap::value_type(WordPr(aWord,aWord),aLnkPtr));
              myStarts.insert(multimap<WORDHDL, Link*>::value_type(aLnkPtr->GetStart(), aLnkPtr));
        }
    }

    void AddNeighbors(Link* aLnkPtr)
    {
        WordSet neighbors = myWordHdls.GetNeighbors(aLnkPtr->GetEnd());
        for(WordSet::iterator it=neighbors.begin();it!=neighbors.end();it++) {
            if( 0 == GetPath(*it,*it) ) {
                Link* anotherLnkPtr = new Link(*it);
                myPaths.insert(PrLnkMap::value_type(WordPr(*it,*it),anotherLnkPtr));
                myStarts.insert(multimap<WORDHDL, Link*>::value_type(anotherLnkPtr->GetStart(), anotherLnkPtr));
            }
        }

    }
    Link* GetPath(WORDHDL aStart, WORDHDL anEnd)
    {
        PrLnkMap::const_iterator it = myPaths.find(WordPr(aStart,anEnd));

        if( it == myPaths.end() )
            return NULL;
        else
            return it->second;
    }


    vector<string> Connect(const string& aStartStr, const string& anEndStr)
    {
        WORDHDL aStart = myWordHdls.GetHandle(aStartStr);
        WORDHDL anEnd = myWordHdls.GetHandle(anEndStr);
       
        // Unknown words already fail.
        if( aStart == -1 || anEnd == -1 )
            return vector<string>();

        // If we've already run into this path, return.  This will only happen
        // on searches with the previous start (though not necessarily with the
        // previous end)
        if( Link* aLnkPtr = GetPath(aStart,anEnd) ) {
            return myWordHdls.GetPath(aLnkPtr->GetPathStrings());
        }

        AddLink(aStart);
        AddNeighbors(GetPath(aStart,aStart));

        myWorkQueue.clear();

        for(PrLnkMap::iterator it=myPaths.begin();it!=myPaths.end();it++) {
            Link* aLnkPtr = it->second;

            if( aLnkPtr->GetStart() == aStart )    {
                AddScored(aLnkPtr, anEnd);
            }
        }

        while(!myWorkQueue.empty())
        {
            Link* aLeftLnkPtr = myWorkQueue.begin()->second.first;
            Link* aRightLnkPtr = myWorkQueue.begin()->second.second;

            myWorkQueue.erase(myWorkQueue.begin());

            Link *aResultLnkPtr = aLeftLnkPtr->Combine(aRightLnkPtr);

            if( myPaths.find(WordPr(aResultLnkPtr->GetStart(),aResultLnkPtr->GetEnd())) == myPaths.end() ) {
                AddNeighbors(aResultLnkPtr);
                AddLink(aResultLnkPtr, anEnd);

                if( aResultLnkPtr->GetStart() == aStart && aResultLnkPtr->GetEnd() == anEnd )
                    return myWordHdls.GetPath(aResultLnkPtr->GetPathStrings());
            }
        }

        return vector<string>();
    }

    void Clear()
    {
        for(PrLnkMap::iterator it=myPaths.begin();it!=myPaths.end();it++)
            delete it->second;

        myPaths.clear();
        myStarts.clear();
        myWordHdls.Clear();
    }
private:
    typedef pair<WORDHDL,WORDHDL> WordPr;
    typedef map<WordPr, Link*> PrLnkMap;

    typedef pair<unsigned int, unsigned int> Score;
    struct Less : public less<Score>
    {
      bool operator()(const Score& x, const Score& y) const
      {
          if( x.first == y.first )
              return x.second < y.second;
          else
              return x.first < y.first;
      }
    };
    typedef pair<Link*,Link*> LnkPr;
    typedef multimap< Score, LnkPr, Less > WorkQueue;


    PrLnkMap myPaths;
    WorkQueue myWorkQueue;

    WordSet myWords;
    multimap<WORDHDL, Link*> myStarts;

    WordHdls myWordHdls;
};

int main()
{
    vector<Dictionary> aDict;

    for(;;)
    {
        cout << "? ";

        string command;
        cin >> command;

        if( command == "exit" )
            return 0;
        else if( command == "add-dictionary") {

            // We have no choice but to dump everything we know if more
            // words are added.
            for(unsigned int i=0;i<aDict.size();i++)
                aDict[i].Clear();

            string dict;
            cin >> dict;

            ifstream in(dict.c_str());
            if( in )
            {
                string word;

                while( !in.eof() ) {
                    in >> word;
                    if( !word.empty() ) {
                        // g++ doesn't seem to do vector correctly.  reserve and resize don't work.
                        while( aDict.size() < word.size() )
                            aDict.push_back(Dictionary());
                        aDict[word.size()-1].AddWord(word);
                    }
                }
            }
        }
        else if( command == "connect" ) {
            string src;
            cin >> src;
            string trgt;
            cin >> trgt;

            vector<string> result;
            if( src.size() == trgt.size() )
                result = aDict[src.size()-1].Connect(src, trgt);

            if( result.size() == 0 )
                cout << "No way to connect " << src << " and " << trgt << "\n";
            else {
                for(vector<string>::iterator jt = result.begin();jt!=result.end();jt++) {

                    if( jt != result.begin() )
                        cout << " > ";
                    cout << *jt;
                }
                cout << "\n";
            }
        }
    }

    return 0;
}

On a 1 Ghz Athlon running Linux, the G++ version gives the following on the testlist defined elsewhere: real 9.43, user 9.13, sys 0.26.
-----------------------
This is k5. We're all tools - duxup

Sadly, (none / 0) (#174)
by trhurler on Mon Apr 29, 2002 at 02:17:09 PM EST

I am not going to be done in time. I haven't really worked on it since last Wednesday due to real life issues, and I don't have time now. However, if/when I finish it, I'll let you know just for comparison's sake. (I could have a stupid slow version done today, but that's pointless, as it is no challenge and I'd have to totally rewrite everything that isn't already written anyway.)

--
'God dammit, your posts make me hard.' --LilDebbie

my revised version (none / 0) (#180)
by wintergreen on Tue Apr 30, 2002 at 02:56:15 AM EST

It's slow, not at all clever, but it works. Kudos to jacob and all who participated, I know I'll be spending some time studying the other solutions.

use strict;

my @DICT;

my $inp;
do {
    print "? ";
    $inp = <>;
    if( $inp =~ /^add-dictionary (.*)$/ ) {
        if(open(FH, $1)) {
            my @newwords = <FH>;
            chomp for @newwords;
            push @DICT, @newwords;
            print "Added $1\n";
        }
        else {
            print "Error adding $1\n";
        }
    }
    elsif( $inp =~ /^connect (.*?) (.*)$/ ) {
        do_connect($1,$2);
    }
    elsif( $inp !~ /^exit$/ ) {
        print "Invalid input.\n";
    }
} while( $inp !~ /^exit$/ );

# branch and bound
sub do_connect {
    my( $w1, $w2 ) = @_;

    unless( length($w1) eq length($w2) ) {
        print "No way to connect $w1 to $w2.\n";
        return;
    }

    my @pathlist = ([$w1]);
    do {
        if( $pathlist[0][-1] eq $w2 ) {
            print join(" > ", @{$pathlist[0]}), "\n";
            return;
        }
        my $newpaths = add_path( shift @pathlist );
        push(@pathlist, @$newpaths) if @$newpaths;
        @pathlist = sort { scalar(@$a) <=> scalar(@$b) } @pathlist;
    } while( @pathlist );

    print "No way to connect $w1 to $w2.\n";
}

sub add_path {
    my $path = shift;
    my @newpaths;

    $" = '|';
    my $used = qr{^(@$path)$};
    my %difs;
    my @end = split / */, $path->[-1];
WORD: foreach my $word (@DICT) {
        next if length($word) != length($path->[-1]);
        next if $word =~ $used;
        my @word = split / */, $word;
        for my $i ( 0..$#word ) {
            $difs{$word}++ if ($word[$i] ne $end[$i]);
            if( $difs{$word} > 1 ) {
                delete $difs{$word};
                next WORD;
            }
        }
    }
    foreach my $newword ( keys %difs ) {
        push @newpaths, [ @$path, $newword ];
    }
    return \@newpaths;
}



When Tuesday (none / 0) (#183)
by Cal Bunny on Tue Apr 30, 2002 at 03:07:24 AM EST

When would you like the entries to be on Tuesday?

^cb^
Kudos to you for warping my fragile little mind. - communist
C++ version of my submission (none / 0) (#184)
by tmoertel on Tue Apr 30, 2002 at 03:11:42 AM EST

I finally got around to re-coding my submission into C++. It processes i's benchmark in 6.9 seconds and the 16X benchmark in about 22.4 seconds on a P2/233 (laptop). Here's the code:

// TGM 20020429
//
// PFC 4 entry -- a C++ translation of my Haskell entry
//
// $Id: tgm.cpp,v 1.5 2002/04/30 07:00:26 thor Exp $
//
// compile: g++ -o tgm -O2 tgm.cpp

#include <stdio.h>     // for perror
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/mman.h>  // can you see what what's coming?

#include <iostream>
#include <vector>
#include <string>
#include <map>
#include <queue>
#include <fstream>

// configurables

static const int INIT_LENGTH_PARTITIONS =  50;

// adapters

struct ltstr
{ bool operator()(const char* s1, const char* s2) const
    { return strcmp(s1, s2) < 0; }
};

// types

typedef map<string, vector<int> > TraceToIndices;
typedef map<const char*, int, ltstr> WordToIndex;
typedef vector<const char*> WordVector;

class WordDatabase
{
    public:

    WordDatabase() : _pending(0) { }

    void connect(const string& from, const string& to);
    void process_pending_words();

    void push_back(const char* s)   { _words.push_back(s); _pending++; }
    int size()                      { return _words.size(); }

    protected:

    WordVector       _words;         // words we know
    WordToIndex      _wti;           // word -> index into _words
    TraceToIndices   _tti;           // trace -> vector<int> of word indices
    int              _pending;       // number of words pending processing
};

typedef vector<WordDatabase> LPWordDatabase;
LPWordDatabase lpwdb(INIT_LENGTH_PARTITIONS);



void WordDatabase::connect(const string& from_str, const string& to_str)
{
    process_pending_words();

    WordToIndex::iterator from_i = _wti.find(from_str.c_str());
    WordToIndex::iterator to_i   = _wti.find(to_str.c_str());

    if (from_i != _wti.end() && to_i != _wti.end()) {

        // do BFS to find path (backwards to avoid later reversal)

        int goal   = from_i->second;
        int start  = to_i->second;

        queue<pair<int,int> > open;
        map<int, int> closed;
        open.push(make_pair<int,int>(start, start));

        // iterate until the open set is empty

        while (!open.empty()) {

            // examine the next node in the open set

            pair<int,int> np_pair = open.front(); open.pop();
            int node = np_pair.first;
            if (closed.find(node) != closed.end())
                continue;  // it has already been closed

            // close this node and remeber its parent for back-tracing

            int parent = np_pair.second;
            closed[node] = parent;

            // is the current node the goal?

            if (node == goal) {
                cout << _words[node];
                while (node != parent) {
                    node = parent;
                    parent = closed[parent];
                    cout << " > " << _words[node];                   
                }
                cout << endl;
                return; // !!! early exit !!!
            }

            // add the node's neighbors to the search

            string trace(_words[node]);
            for (int j = 0; j < trace.size(); j++) {
                char save = trace[j];
                trace[j] = '.';
                vector<int>& trace_neighbors = _tti[trace];
                for (int tni = 0; tni < trace_neighbors.size(); tni++) {
                    int neighbor = trace_neighbors[tni];
                    if (neighbor != node)
                        open.push(make_pair<int,int>(neighbor, node));
                }
                trace[j] = save;
            }
        }
    }

    cout << "No way to connect " << from_str
         << " to " << to_str << "." << endl;
}

void WordDatabase::process_pending_words()
{
    // update word-to-index and trace-to-indices mappings with any
    // words added since the last time they were rebuilt

    for (int i = _words.size() - _pending; i < _words.size(); i++) {
        _wti[_words[i]] = i;
        string trace(_words[i]);
        for (int j = 0; j < trace.size(); j++) {
            char save = trace[j];
            trace[j] = '.';
            _tti[trace].push_back(i);
            trace[j] = save;
        }
    }
    _pending = 0;
}

static void do_add_database();
static void do_connect();


// MAIN

int main(int, char**)
{
    string cmd;

    while (cin) {
        cout << "? ";
        cin >> cmd;
        if (cmd[0] == 'a') do_add_database();
        else if (cmd[0] == 'c') do_connect();
        else break;
    }
}


static off_t file_size(const char* filename)
{
    struct stat st;
    off_t len = 0;
    if (stat(filename, &st) < 0)
        perror("stat in file_size");
    else
        len = st.st_size;
    return len;
}

static void do_add_database()
{
    // the name of the game here is speeeeeeed

    string filename;
    cin >> filename;

    char *dict;

    // open the dictionary file

    int fd = open(filename.c_str(), O_RDONLY);

    if (fd >= 0) {

        // try to mmap it *all* into memmory (private)

        off_t len = file_size(filename.c_str());
        if (len && (dict = (char*) mmap(0, len, PROT_READ|PROT_WRITE,
                                        MAP_PRIVATE, fd, 0))) {

            // now that the dictionary is mmaped, convert all the
            // words within it into c_strings (in place!) and add them
            // into the appropriate length-partitioned word list

            char* dict_end = dict + len;
            while (dict < dict_end) {  // dict points to start of a word
                char *p;
                for (p = dict; p < dict_end && *p != '\n'; p++)
                    ; // find newline at end of word
                if (p < dict_end) {
                    *p = '\0';
                    int wordlen = p - dict;
                    if (wordlen >= lpwdb.size())
                        lpwdb.resize(wordlen+1);
                    lpwdb[wordlen].push_back(dict);
                }
                dict = p + 1;
            }
            cout << "Added " << filename << endl;
        }
        close(fd); // leaves mapping open (we have pointers into it)
        return; // !!! EARLY EXIT !!!
    }

    cout << "Error adding " << filename << endl;
}

static void do_connect()
{
    string from, to;
    cin >> from >> to;
    int len = from.length();
    if (len < lpwdb.size())
        lpwdb[len].connect(from, to);
    else
        cout << "No way to connect " << from << " to " << to << "." << endl;
}

--
My blog | LectroTest

[ Disagree? Reply. ]


Python 2.2 (none / 0) (#185)
by djotto on Tue Apr 30, 2002 at 06:04:07 AM EST

I'm not happy with this, but I've run out of time to test and tweak. I suspect that all I'm going to prove is that simple entries win out over complex ones.

I gave up trying to persuade Scoop to post, and I'm not about to install Perl, so my entry is available here.

It does a BFS simultaneously from both ends, and looks for them to join up in the middle. Checking for the boundaries to join in the middle is expensive, so it only does it occasionally.

Small wordlists are parsed into dictionaries where word => [list of neighbours]. (This pre-processing was dog-slow until I stole the x=y, y=x solution from MK77's elegant entry (this is how Python should be done)). Once the wordlist grows over a certain size, the pre-processing is abandoned in favour of run-time processing.



(Spoiler?) Some data observations (none / 0) (#205)
by KWillets on Tue Apr 30, 2002 at 09:16:09 PM EST

Hope this isn't a spoiler, but I thought some insight into the structure of the data might be interesting.

I was trying to some pre-indexing of adjacent words, building lists of "metawords" as one person called them, and I noticed a few things about WORD.LST.  

For one thing, about half the words have no connections to other words at all.  The longer ones in particular are often unlinked to any other word.    These could be deleted with little loss in functionality.  A large number of words only have one or two neighbors and often these groups are isolated from the rest of the dictionary.

My indexing was consuming a large amount of time trying words which had no connections, so I stopped indexing words longer than seven characters.  The longer words seemed to have a much lower hit rate for connections.  This step at least made the process terminate in a reasonable time, a few minutes for the perl version.  

Since there were so many singleton sets, I saved memory by only saving sets which had two or more entries (the only useful ones).  So far the method I use to find these sets is to scan the dictionary, generate all variants from the first word found, test them in a hash of the dictionary, and save them in a list as found.  That way I know right away if the list has more than one element, and I can save it or discard it once the list is done.

Another single-pass method might be to create singleton entries the first time a metaword is found, but delete them when it is obvious that no further matches in the (sorted) dictionary exist.  For instance, if you're looking for all variants of "dissolve_" (dissolves, dissolver, dissolved), you're finished once you get to "distemper" or some later entry, since the ordering guarantees no further prefix matches.  I'm thinking over ways to do this generate-and-cleanup without tons of code.

These are the perl subs for building the index; I haven't actually used the index for anything yet.

The output in %ix is "metaword" => ( index1, index2...)  where index1..n are array indices in @dictionary

eg:

"announce." => ( index of "announcer", "announces" in @dictionary )

The #'s can be removed to print out the index entries; the output might be interesting for those wanting to explore the data.
-----------------------

my @dictionary ;
my %dictix;

sub adddictionary {
    my $fname = shift;
    open DICT, "< $fname" || croak("Cannot open $fname\n");
    while( <DICT> ) {
    s/[^a-z]//g ;
    push( @dictionary, $_ );
    $dictix{ $_ } = $#dictionary ;  # reference by array index
    }
    close DICT;
    indexdictionary();
}

my %ix;

sub indexdictionary {
    my $wrd, $x;
    for(my $j = 0;  $j < $#dictionary; $j++ ) {
    my $word = @dictionary[ $j ];
    next if length( $word ) > 7;
    for( my $i = length( $word ) -1; $i >= 0;  $i-- ) {
        my $w2 = $word ;
        substr( $w2, $i, 1  ) =  ".";
        if( defined $ix{ $w2 } ) {
        } else {
#        print "\n$w2 ";
        my @hits;
        for ( $c = ord substr( $w2, $i, 1 ), my $tst = $w2; $c <= ord "z"; $c++ ) {
            substr( $tst, $i, 1 ) = chr $c;
            if( exists $dictix{ $tst } ) {
#            print "$tst ";
            push @hits, $dictix{ $tst };
            }
        }
        if( $#hits > 0 ) {
#            print "                  ****";
            $ix{ $w2 } = @hits;
        }
        }
    }
    }
}

Update (5.00 / 1) (#213)
by jacob on Thu May 02, 2002 at 04:01:31 PM EST

I have now commenced scoring each program. As I have a busy weekend, scores will likely not be announced until Wednesday or Thursday.

I'd like to thank everyone for their interest and enthusiasm. It has been really fun to see the amazing quality of work people have put in to this challenge. I'm really impressed with all of you who put energy into it.

--
"it's not rocket science" right right insofar as rocket science is boring

--Iced_Up

too late for judging, but super fast (none / 0) (#216)
by svillee on Thu May 09, 2002 at 10:09:38 PM EST

/*  based on tmoertel's algorithm, reimplemented in C  */

/*  time on a Celeron 700 MHz:  */

/*  i's test:             2.16 seconds  */
/*  tmoertel's 16X test:  2.81 seconds  */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_INPUT_LENGTH 4096
#define MAX_VERB_LENGTH 32
#define MAX_WORD_LENGTH 1024

/*  A chain string may represent either a word or a trace.  Any code  */
/*  working with this structure either knows statically which one it  */
/*  is or doesn't care.                                               */

typedef struct ChainString_s {
  struct ChainString_s *left;
  struct ChainString_s *right;
  int                   balance;
  struct ChainString_s *nextPending;  /*  unused for trace  */

  /*  For a word, link is a pointer to an array of trace links.  */
  /*  For a trace, link is the head of a list of word links.     */

  struct Link_s *link;

  char data[1];  /*  variable length  */
} ChainString_t;

/*  A link is a record that a certain word matches a certain trace.  */

typedef struct Link_s {
  struct Link_s *nextSameTrace;
  ChainString_t *word;
  ChainString_t *trace;
  struct Link_s *nextActive;

  /*  The next two fields are meaningful only for the current connect.  */
  /*  If neighbor is not null, it means there is a chain from this      */
  /*  word to the target word, wherein this word and the specified      */
  /*  neighbor both match this trace.                                   */

  struct Link_s *nextInOrbit;
  struct Link_s *neighbor;
} Link_t;

static int            length;
static Link_t        *firstActive;
static ChainString_t *topWordByLength[MAX_WORD_LENGTH+1];
static ChainString_t *topTraceByLength[MAX_WORD_LENGTH+1];
static ChainString_t *firstPendingByLength[MAX_WORD_LENGTH+1];
static char           input[MAX_INPUT_LENGTH+1];
static char           verb[MAX_VERB_LENGTH+1];
static char           word1[MAX_WORD_LENGTH+1];
static char           word2[MAX_WORD_LENGTH+1];
static char           word3[MAX_WORD_LENGTH+1];

static ChainString_t *findWord(char *data)
{
  ChainString_t *wp;
  int m;

  wp = topWordByLength[length];
  while (wp != (ChainString_t *) 0) {
    m = strcmp(data, wp->data);
    if (m == 0)
      break;
    if (m < 0) {
      wp = wp->left;
    } else {
      wp = wp->right;
    }
  }
  return wp;
}

void insertNode(
  ChainString_t  *newNode,
  ChainString_t **where,
  int            *outerGrew)
{
  ChainString_t *nodeA;
  ChainString_t *nodeB;
  ChainString_t *nodeC;
  int innerGrew;

  nodeA = *where;
  if (nodeA == (ChainString_t *) 0) {
    /*  The tree was empty, so make the new node the only one.  */

    newNode->left = (ChainString_t *) 0;
    newNode->right = (ChainString_t *) 0;
    newNode->balance = 0;
    *where = newNode;
    *outerGrew = 1;
  } else if (strcmp(newNode->data, nodeA->data) < 0) {
    /*  Add the new node to the left subtree.  */

    insertNode(newNode, &nodeA->left, &innerGrew);
    if (!innerGrew) {
      *outerGrew = 0;
    } else if (nodeA->balance == 1) {
      /*  The right subtree height was 1 more, and now they are the same.  */

      nodeA->balance = 0;
      *outerGrew = 0;
    } else if (nodeA->balance == 0) {
      /*  The subtree heights were the same, and now the left subtree  */
      /*  height is 1 more.                                            */

      nodeA->balance = -1;
      *outerGrew = 1;
    } else {
      /*  The left subtree height was 1 more, and now it is 2 more,  */
      /*  so we must rebalance.  Examine the left subtree.           */

      nodeB = nodeA->left;
      nodeC = nodeB->right;
      if (nodeB->balance == -1) {
        /*      A              B      */
        /*     / \            / \     */
        /*    B   Z   --->   W   A    */
        /*   / \                / \   */
        /*  W   C              C   Z  */

        nodeA->left = nodeC;
        nodeA->balance = 0;
        nodeB->right = nodeA;
        nodeB->balance = 0;
        *where = nodeB;
      } else {
        /*      A               C       */
        /*     / \            /   \     */
        /*    B   Z   --->   B     A    */
        /*   / \            / \   / \   */
        /*  W   C          W   X Y   Z  */
        /*     / \                      */
        /*    X   Y                     */

        nodeB->right = nodeC->left;
        if (nodeC->balance == 1) {
          nodeB->balance = -1;
        } else {
          nodeB->balance = 0;
        }
        nodeA->left = nodeC->right;
        if (nodeC->balance == -1) {
          nodeA->balance = 1;
        } else {
          nodeA->balance = 0;
        }
        nodeC->left = nodeB;
        nodeC->right = nodeA;
        nodeC->balance = 0;
        *where = nodeC;
      }
      *outerGrew = 0;
    }
  } else {
    /*  Add the new node to the right subtree.  */

    insertNode(newNode, &nodeA->right, &innerGrew);
    if (!innerGrew) {
      *outerGrew = 0;
    } else if (nodeA->balance == -1) {
      /*  The left subtree height was 1 more, and now they are the same.  */

      nodeA->balance = 0;
      *outerGrew = 0;
    } else if (nodeA->balance == 0) {
      /*  The subtree heights were the same, and now the right subtree  */
      /*  height is 1 more.                                             */

      nodeA->balance = 1;
      *outerGrew = 1;
    } else {
      /*  The right subtree height was 1 more, and now it is 2 more,  */
      /*  so we must rebalance.  Examine the right subtree.           */

      nodeB = nodeA->right;
      nodeC = nodeB->left;
      if (nodeB->balance == 1) {
        /*      A              B      */
        /*     / \            / \     */
        /*    W   B   --->   A   Z    */
        /*       / \        / \       */
        /*      C   Z      W   C      */

        nodeA->right = nodeC;
        nodeA->balance = 0;
        nodeB->left = nodeA;
        nodeB->balance = 0;
        *where = nodeB;
      } else {
        /*      A               C       */
        /*     / \            /   \     */
        /*    W   B   --->   A     B    */
        /*       / \        / \   / \   */
        /*      C   Z      W   X Y   Z  */
        /*     / \                      */
        /*    X   Y                     */

        nodeB->left = nodeC->right;
        if (nodeC->balance == -1) {
          nodeB->balance = 1;
        } else {
          nodeB->balance = 0;
        }
        nodeA->right = nodeC->left;
        if (nodeC->balance == 1) {
          nodeA->balance = -1;
        } else {
          nodeA->balance = 0;
        }
        nodeC->left = nodeA;
        nodeC->right = nodeB;
        nodeC->balance = 0;
        *where = nodeC;
      }
      *outerGrew = 0;
    }
  }
}

static void doAddDictionary(void)
{
  FILE *fp;
  int grew;
  int n;
  ChainString_t *wp;
  Link_t *lp;

  fp = fopen(word1, "r");
  if (fp == (FILE *) 0) {
    fprintf(stderr, "Error adding %s\n", word1);
    return;
  }
  while (fscanf(fp, "%1024s", word2) > 0) {
    /*  See if the word is already in the dictionary.  */

    length = strlen(word2);
    if (findWord(word2) != (ChainString_t *) 0)
      continue;

    /*  Allocate a block to hold the array of links  */
    /*  followed by the word itself.                 */

    n = length * sizeof (Link_t);
    lp = (Link_t *) malloc(n + sizeof (ChainString_t) + length);
    if (lp == (Link_t *) 0) {
      fprintf(stderr, "out of memory for word\n");
      exit(-1);
    }
    wp = (ChainString_t *) (((char *) lp) + n);
    wp->link = lp;
    strcpy(wp->data, word2);

    /*  Add it to the balanced tree of words.  */

    insertNode(wp, &topWordByLength[length], &grew);

    /*  Add it to the list of words that need traces calculated.  */

    wp->nextPending = firstPendingByLength[length];
    firstPendingByLength[length] = wp;
  }
  fclose(fp);
  printf("Added %s\n", word1);
}

static void calculatePendingTraces(void)
{
  ChainString_t *wp;
  ChainString_t *tp;
  Link_t *lp;
  int grew;
  int m;
  int n;
  char saveC;

  wp = firstPendingByLength[length];
  if (wp != (ChainString_t *) 0) {
    do {
      lp = wp->link;
      strcpy(word3, wp->data);

      /*  Calculate traces by substituting dot ('.')  */
      /*  for each letter of the word.                */

      for (n = 0; n < length; ++n) {
        saveC = word3[n];
        word3[n] = '.';

        /*  See if the trace already exists.  */

        tp = topTraceByLength[length];
        while (tp != (ChainString_t *) 0) {
          m = strcmp(word3, tp->data);
          if (m == 0)
            break;
          if (m < 0) {
            tp = tp->left;
          } else {
            tp = tp->right;
          }
        }
        if (tp == (ChainString_t *) 0) {
          /*  Allocate a new trace.  */

          tp = (ChainString_t *) malloc(sizeof (ChainString_t) + length);
          if (tp == (ChainString_t *) 0) {
            fprintf(stderr, "out of memory for trace\n");
            exit(-1);
          }
          tp->link = (Link_t *) 0;
          strcpy(tp->data, word3);

          /*  Add it to the balanced tree of traces.  */

          insertNode(tp, &topTraceByLength[length], &grew);
        }

        /*  Fill in this link.  */

        lp->word = wp;
        lp->trace = tp;
        lp->neighbor = (Link_t *) 0;
        lp->nextSameTrace = tp->link;
        tp->link = lp;
        ++lp;
        word3[n] = saveC;
      }
      wp = wp->nextPending;
    } while (wp != (ChainString_t *) 0);
    firstPendingByLength[length] = (ChainString_t *) 0;
  }
}

static void doConnect(void)
{
  ChainString_t *target1;
  ChainString_t *target2;
  ChainString_t *wp;
  int n;
  Link_t *lp;
  Link_t *lp2;
  Link_t *lp3;
  Link_t *firstInOrbit;
  Link_t endMark;

  length = strlen(word1);
  if (strlen(word2) == length) {
    target1 = findWord(word1);
    target2 = findWord(word2);
    if ((target1 != (ChainString_t *) 0) && (target2 != (ChainString_t *) 0)) {
      if (target1 == target2) {
        printf("%s\n", word1);
        return;
      }
      calculatePendingTraces();

      /*  Reset any active links.  */

      lp = firstActive;
      if (lp != (Link_t *) 0) {
        do {
          lp->neighbor = (Link_t *) 0;
          lp = lp->nextActive;
        } while (lp != (Link_t *) 0);
        firstActive = (Link_t *) 0;
      }

      /*  Set up a dummy link as the chain end mark.  */

      endMark.word = target2;
      endMark.nextInOrbit = (Link_t *) 0;
      endMark.neighbor = (Link_t *) 0;

      /*  Build successive orbits from target2 until target1  */
      /*  is reached or the outermost orbit is empty.         */

      firstInOrbit = &endMark;
      while (firstInOrbit != (Link_t *) 0) {
        lp = firstInOrbit;
        firstInOrbit = (Link_t *) 0;
        while (lp != (Link_t *) 0) {
          wp = lp->word;
          lp2 = wp->link;
          for (n = 0; n < length; ++n) {
            if (lp2 != lp) {
              lp3 = lp2->trace->link;
              while (lp3 != (Link_t *) 0) {
                if ((lp3 != lp2) && (lp3->neighbor == (Link_t *) 0)) {
                  if (lp3->word == target1) {
                    printf("%s", target1->data);
                    lp2 = lp;
                    while (lp2 != (Link_t *) 0) {
                      printf(" > %s", lp2->word->data);
                      lp2 = lp2->neighbor;
                    }
                    putchar('\n');
                    return;
                  }
                  lp3->neighbor = lp;
                  lp3->nextInOrbit = firstInOrbit;
                  firstInOrbit = lp3;
                  lp3->nextActive = firstActive;
                  firstActive = lp3;
                }
                lp3 = lp3->nextSameTrace;
              }
            }
            ++lp2;
          }
          lp = lp->nextInOrbit;
        }
      }
    }
  }
  printf("No way to connect %s to %s.\n", word1, word2);
}

int main(int argc, char **argv)
{
  int i;

  while (1) {
    printf("? ");
    input[0] = '\0';
    fgets(input, MAX_INPUT_LENGTH+1, stdin);
    verb[0] = '\0';
    word1[0] = '\0';
    word2[0] = '\0';
    sscanf(input, "%32s %1024s %1024s", verb, word1, word2);
    if (strcmp(verb, "add-dictionary") == 0) {
      doAddDictionary();
    } else if (strcmp(verb, "connect") == 0) {
      doConnect();
    } else if (strcmp(verb, "exit") == 0) {
      break;
    }
  }
  return 0;
}


Partial results (none / 0) (#218)
by jacob on Wed May 15, 2002 at 03:57:06 AM EST

are here.

--
"it's not rocket science" right right insofar as rocket science is boring

--Iced_Up

Final results (5.00 / 1) (#219)
by jacob on Tue May 21, 2002 at 11:42:38 PM EST

are here. Thanks to everyone who entered!

--
"it's not rocket science" right right insofar as rocket science is boring

--Iced_Up

Programming Fun Challenge 4 | 219 comments (219 topical, 0 editorial, 0 hidden)
Display: Sort:

kuro5hin.org

[XML]
All trademarks and copyrights on this page are owned by their respective companies. The Rest © 2000 - Present Kuro5hin.org Inc.
See our legalese page for copyright policies. Please also read our Privacy Policy.
Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux, The Scoop Engine that runs this site is freely available, under the terms of the GPL.
Need some help? Email help@kuro5hin.org.
My heart's the long stairs.

Powered by Scoop create account | help/FAQ | mission | links | search | IRC | YOU choose the stories!