Programming Fun Challenge 4 || kuro5hin.org


	create account \| help/FAQ \| contact \| links \| search \| IRC \| site news

Everything

We need your support: buy an ad | premium membership

Programming Fun Challenge 4

By jacob in jacob's Diary
Tue Apr 23, 2002 at 10:44:51 AM EST
Tags: (all tags)

The new Programming Fun Challenge is out! This time, we'll be writing a program to play the word chains game: write a program that, given two words, can produce a list of words that connect the first to the second, each word differing from its neighbors by only one letter.

The Challenge
Programming Challenge 4 is to write a program in the language of your choice that, given a dictionary, can produce minimum-length word chains that connect a start word to an end word. A word chain connecting a start word to an end word is a list of words that satisfies four properties:

It starts with the given start word.

It ends with the given end word.

All words in the list are valid dictionary words.

If two words are adjacent to each other in the list, they differ from each other by exactly one letter.

As you might expect, the minimum-length word chain is the shortest word chain that connects the start word to the end word.

When your program starts, it should present a '?' (question-mark character) and a space as a prompt to the user. It should accept the following commands and perform the corresponding actions:

add-dictionary filename
Adds all the words in filename to the current dictionary (the current dictionary starts out empty). filename is presumed to be a text file containing an arbitrarily-long alphabetical list of words (that is, sequences of alphanumeric characters) separated by newline.
Prints "Added filename" on success, "Error adding filename" on failure.

connect word1 word2
Prints a minimum-length word chain connecting word1 to word2 in the current dictionary, with words separated by a space, the `>' symbol, and another space. If no word chain exists that connects the two words, the program should print "No way to connect word1 to word2." (Note that it will always print this whenever either of the given words is not in the dictionary or if the two words are not of the same length.)

exit
Quits the program.

After it finishes executing any command other than exit, your program should print another prompt on a new line.

For example, suppose I have a file "sample-dict" with the following contents:
boat book boot coat hook look rake take

An interaction might look like this:

? add-dictionary sample-dict Added sample-dict ? connect coat hook coat > boat > boot > book > hook ? connect boot hook boot > book > hook ? connect boat take No way to connect boat to take. ? exit
Scoring
PFC-4 entries will be scored on three metrics:

correctness (5 points)
Your program must work exactly as I have described it. I will design test cases and let you know if your program passes them.

elegance (5 points)
Your program should be easy to read and understand. That means making it small, using appropriate function names, variable names, and comments, choosing lucid algorithms, and organizing source code appropriately.

algorithmic efficiency (5 points)
Your program should aim to do as little work as it can in producing its results. Here I am not concerned with the microefficiency of each individual instruction so much as the overall efficiency of the algorithms you choose.

That's the challenge. To enter, follow the usual procedure and post your program as a reply to this diary (see tmoertel's How to Post Code to K5 for a way to preserve your formatting). After a week, I will judge all entries and announce the winners. If you'd like to be notified by e-mail when that announcement happens (and for future PFC announcements) sign up for the PFC mailing list by sending a blank e-mail to:
<pfc-announce-subscribe@moertel.com>
Now get coding!

Sponsors

Managed Hosting
VoxCAST Content Delivery
Raw Infrastructure

Poll

How tough is this PFC?

Too tough 4%

Too easy 16%

A good challenge 80%

Votes: 25
Results | Other Polls

Related Links

How to Post Code to K5
jacob's Diary

Programming Fun Challenge 4 | 219 comments (219 topical, editorial, 0 hidden)

neat (5.00 / 1) (#1)
by hurstdog on Tue Apr 23, 2002 at 11:14:11 AM EST

Wish I could find a reason to spend time working on that instead of fixing bugs in scoop or working on my compiler :-) Little programming challenges like this make programming fun.

If it helps you out any ... by jacob, 04/23/2002 11:17:06 AM EST (5.00 / 1)

Anybody who is entering, post in this thread (5.00 / 1) (#3)
by tmoertel on Tue Apr 23, 2002 at 11:34:05 AM EST

That way jacob will know how many entries to expect.

P.S. I'm in! I'll produce a Haskell entry and maybe a Perl entry. I'm going to be bogged down for the next few days, though, and so my entries may take a while to show up.

--
My blog | LectroTest

[ Disagree? Reply. ]

I'm Entering by matthewg, 04/23/2002 11:41:00 AM EST (5.00 / 1)
maybe by Cal Bunny, 04/23/2002 12:27:13 PM EST (4.00 / 1)
Looking forward to Haskell version by mahlen, 04/23/2002 11:47:33 PM EST (5.00 / 1)
I'm in... by Logan, 04/24/2002 12:05:59 PM EST (5.00 / 1)

Question (5.00 / 1) (#5)
by bob6 on Tue Apr 23, 2002 at 11:53:32 AM EST

What if there is several solutions ? Should the program report them all ?

Cheers.

No by jacob, 04/23/2002 11:55:47 AM EST (5.00 / 1)

Question (5.00 / 1) (#6)
by matthewg on Tue Apr 23, 2002 at 11:53:47 AM EST

For connect a b, will a and b always be the same length? If not, does removing or adding a letter count as a transition? Ferinstance, if we have "dog" and "dogs" in the dict, are those two words considered adjacent?

connect and different-length words by jacob, 04/23/2002 11:59:49 AM EST (5.00 / 1)

Trivial. (5.00 / 1) (#8)
by ambrosen on Tue Apr 23, 2002 at 11:58:05 AM EST

If you do it in a .pl file. Prolog that is. I may have a quick go.

--
Procrastination does not make you cool. Being cool makes you procrastinate. DesiredUsername.

i was expecting that by jacob, 04/23/2002 12:01:12 PM EST (5.00 / 1)
Full backtracking by bob6, 04/23/2002 12:04:28 PM EST (5.00 / 1)

entire search space by Cal Bunny, 04/23/2002 12:33:55 PM EST (4.00 / 1)

Now you mention it. by ambrosen, 04/23/2002 01:29:52 PM EST (5.00 / 1)

Can we please have simplier input (4.00 / 1) (#12)
by Cal Bunny on Tue Apr 23, 2002 at 12:17:53 PM EST

A small request for future PFCs: can we please leave the input to simplier means. Half of my code is going to be for a lame command handler. These things are supposed to be fun (and simple), having to read and parse command input as well as do error detection does not fall into either of those categories for me.

Also, might I seggest changing the output format to match the input format. This would simplify things a little, and it would make the output much easier to read and automatically parse.

^cb^
Kudos to you for warping my fragile little mind. - communist

Input by jacob, 04/23/2002 12:25:30 PM EST (5.00 / 1)

command line by Cal Bunny, 04/23/2002 12:31:07 PM EST (4.00 / 1)

Command-line = no good by jacob, 04/23/2002 12:35:37 PM EST (5.00 / 1)

more command line by Cal Bunny, 04/23/2002 12:36:38 PM EST (4.00 / 1)

Unfortunately by jacob, 04/23/2002 12:39:35 PM EST (5.00 / 2)

Ask nicely by Cal Bunny, 04/23/2002 12:53:34 PM EST (4.00 / 1)

Information sharing by jacob, 04/23/2002 01:18:24 PM EST (5.00 / 1)

what sort of testing? by Delirium, 04/24/2002 11:31:15 AM EST (5.00 / 1)

Ugh by jacob, 04/24/2002 11:56:09 AM EST (5.00 / 1)

Clarification (4.00 / 1) (#13)
by Cal Bunny on Tue Apr 23, 2002 at 12:20:00 PM EST

What does it mean to differ by one letter? Am I understanding properly that insertions and deletions do not count?

^cb^
Kudos to you for warping my fragile little mind. - communist

Sorry, nevermind (nt) by Cal Bunny, 04/23/2002 12:22:17 PM EST (4.00 / 1)

Disjoint input dictionaries? (5.00 / 1) (#22)
by tmoertel on Tue Apr 23, 2002 at 12:51:58 PM EST

Can we assume that the sets of words in the input dictionaries are disjoint, or should we be prepared to handle duplicate words?

--
My blog | LectroTest

[ Disagree? Reply. ]

Assume disjoint by jacob, 04/23/2002 12:56:36 PM EST (5.00 / 1)

Python solution (5.00 / 3) (#25)
by MK77 on Tue Apr 23, 2002 at 01:04:42 PM EST

#!/usr/bin/env python import copy import sys metawords = {} words = {} path_not_found = "path not found" def link_metaword(metaword, word): if metaword not in metawords: metawords[metaword] = [ word ] else: for neighbor in metawords[metaword]: words[neighbor].append(word) words[word].append(neighbor) metawords[metaword].append(word) def add_word(word): if word in words: return words[word] = [] for i in range(len(word)): metaword = word[:i] + '_' + word[i + 1:] link_metaword(metaword, word) def find_path(start, end): if start not in words: raise path_not_found paths = [ (start, [ start ]) ] visited = {} while paths: (word, path) = paths[0] paths = paths[1:] if word == end: return path visited[word] = 1 for neighbor in words[word]: if neighbor not in visited: paths.append((neighbor, copy.copy(path) + [ neighbor ])) raise path_not_found def add_dict(filename): f = open(filename) dict_words = f.readlines() f.close() for word in dict_words: word = word[:-1] add_word(word) def main_loop(): while 1: print "? ", line = sys.stdin.readline() line = line[:-1] cmd_words = line.split(' ') cmd = cmd_words[0] if cmd == "exit": return if cmd == "add-dictionary": add_dict(cmd_words[1]) print "Added " + cmd_words[1] if cmd == "connect": try: path = find_path(cmd_words[1], cmd_words[2]) for i in range(len(path)): if i > 0: print ">", print path[i], print "\n", except path_not_found: print "No way to connect " + cmd_words[1] + " to " + cmd_words[2] + "." main_loop()

--
Mmm... rageahol

Couple of fixes by MK77, 04/24/2002 10:10:48 AM EST (5.00 / 3)

Small bugfix by MK77, 04/24/2002 10:59:42 AM EST (5.00 / 1)

perl implementation (5.00 / 2) (#27)
by matthewg on Tue Apr 23, 2002 at 01:20:05 PM EST

Here's my first go at it.

#!/usr/bin/perl use strict; use warnings; our(@dict, %adjacents, %seen); sub find_adjacent($); sub connect_words($$); sub build_adjacents(); # To find the shortest path between two words, # we first get a list of all words which are # adjacent to the target word. We keep doing this, # making sure not to revisit any words, until we've # exhausted all the possibilities. Then we check the # length of all paths which worked. sub find_adjacent($) { my($a) = shift; my @ret = (); my @a = split(//, $a); foreach my $word(@dict) { next if $word eq $a; next if length($word) != length($a); my $diffs = 0; my @word = split(//, $word); foreach my $letter_a(@a) { my $letter_b = shift @word; $diffs++ if $letter_a ne $letter_b; last if $diffs > 1; } push @ret, $word if $diffs == 1; } return @ret; } sub connect_words($$) { my($a, $b) = @_; my @paths = (); return if $seen{$a}++; # Don't go backwards return unless @{$adjacents{$a}}; # Nothing is adjacent to $a? # This is the last step return [$a, $b] if grep { $_ eq $b } @{$adjacents{$a}}; # Darn, not the last step - find all the ways to get there. foreach my $neighbor(@{$adjacents{$a}}) { push @paths, map { [$a, @$_] if $_ } connect_words($ neighbor, $b); } return @paths; } sub build_adjacents() { %adjacents = map { $_ => [find_adjacent($_)] } @dict; } @dict = (); while(1) { print "? "; my $input = <STDIN>; chomp $input; if($input =~ /^exit$/) { exit; } elsif($input =~ /^add-dictionary\s+(.+)/) { my $file = $1; if(open(FILE, "<", $file)) { my @lines = <FILE>; map { chomp $_ } @lines; push @dict, @lines; build_adjacents(); print "Added $file\n"; } else { print "Error adding $file\n"; } close FILE; } elsif($input =~ /^connect\s+(\S+)\s+(\S+)$/) { if(!@dict) { print "You must specify a dictionary f irst!\n"; next; } my($a, $b) = ($1, $2); %seen = (); my @paths = connect_words($a, $b); if(@paths) { my $best_path = undef; foreach my $path(@paths) { next if $best_path and @ $path > @$best_path; $best_path = $path; } print join(" > ", @$best_path), "\n"; } else { print "No way to connect $a to $b.\n"; } } }

Memoize by matthewg, 04/23/2002 02:52:03 PM EST (5.00 / 2)

stuff (5.00 / 1) (#29)
by trhurler on Tue Apr 23, 2002 at 01:56:54 PM EST

I have a command handler ready for this. Frankly, I wish you'd keep this format for future contests, as I can then reuse the command handler with truly minimal modifications. Anyone bitching about having to write one is either too lazy for words or else incompetent; it didn't take me long, and mine is probably better than what you'll write anyway(better error checking, handles a few odd cases gracefully, etc.)

This isn't too easy, too hard, or anything like that. The only problem I see is that you've (almost certainly deliberately) chosen a problem that's trivial in your favorite language and requires significant work in many others. Since you seem to have this habit, I am going to assemble a small library of list handling code in C for use in your contests. Hopefully you won't count this against my program length, for two reasons. One, because list handling libraries exist out there, and I'm just the sort of bastard who'd RATHER rewrite it than search for it - in other words, a typical C programmer would not have to write this code to complete your program. Two, because as I said, you obviously biased the whole thing in favor of languages with particular elements in their provided runtimes.

For anyone who is having trouble thinking about this problem, once you have the right data structures and appropriate methods for operating on them, the loading of data is straightforward and the solution of the problem can be one big stupid recursion. There are almost certainly more optimal ways to do it than the one big stupid recursion, but that method will work, which I'm sure is why jacob chose this particular exercise. I'm thinking of implementing the one big stupid recursion and then trying to do something better for grins, if I have time.

And when I say "one big stupid recursion" I do not mean to imply that recursion is stupid. I mean to imply that the method of solution comprised by this particular recursion is something of a blunt instrument - a brute force approach of extraordinary magnitude. The sort of solution I always sought and then tried to beat in college programming exercises. (I always did beat it back then, but admittedly I had lots of free time, and the resulting programs were rarely what you'd call "straightforward" A well documented heuristic is still a wild ass guess, after all, and sometimes the fastest algorithm is not the one you'd want to teach in class.)

--
'God dammit, your posts make me hard.' --LilDebbie

Why I chose the problem by jacob, 04/23/2002 02:32:09 PM EST (5.00 / 2)

Rice eh? by spoonz, 04/23/2002 04:57:55 PM EST (5.00 / 1)

WFR! by jacob, 04/23/2002 05:01:21 PM EST (5.00 / 1)

RRF! by spoonz, 04/23/2002 05:18:51 PM EST (5.00 / 1)

Steve and Mike by jacob, 04/23/2002 05:23:17 PM EST (5.00 / 1)

PS by jacob, 04/23/2002 02:52:17 PM EST (5.00 / 2)

Damn you :-) (5.00 / 1) (#30)
by kentm on Tue Apr 23, 2002 at 02:02:19 PM EST

I should never have clicked on your text ad. Now I want to work on this instead of my real work. I'm going to attempt a Java version for this.

efficiency question (4.00 / 1) (#34)
by Cal Bunny on Tue Apr 23, 2002 at 02:54:27 PM EST

How are you going to determine efficiency? Are you counting dictionary load time or just query time. Should we be efficient for a single query or for multiple queries (basically, how much preprocessing of the input should we be doing)?

^cb^
Kudos to you for warping my fragile little mind. - communist

Efficiency by jacob, 04/23/2002 03:03:17 PM EST (5.00 / 1)

One indexing method (5.00 / 1) (#36)
by KWillets on Tue Apr 23, 2002 at 03:15:32 PM EST

Try forming sets of words which differ by one character, by deleting one character, eg:

cat --> c_t cut --> c_t cot --> c_t (These all map to the "c_t" set.)

Map each word of length k to k different sets, one for each character, i.e:

cat --> _at cat --> c_t cat --> ca_ Given a word it's easy to look up the k sets of words one hop away, and recurse.

I'm looking at more global ways of doing this. If anyone has stats on the sparsity of the links, etc., let me know.

I don't have stats, but by jacob, 04/23/2002 03:17:13 PM EST (5.00 / 1)
On posting hints / spoilers by tmoertel, 04/23/2002 03:39:34 PM EST (5.00 / 1)

OK by KWillets, 04/23/2002 06:53:32 PM EST (5.00 / 1)

Code is okay by tmoertel, 04/23/2002 08:31:27 PM EST (5.00 / 1)

C++/STL implementation (5.00 / 3) (#37)
by ucblockhead on Tue Apr 23, 2002 at 03:16:45 PM EST

Here's an implementation in C++, using the STL. (Sad that I have enough freetime to do this, but oh well.) It's pretty straightforward and unoptimized. #include <string> #include <vector> #include <iostream> #include <fstream> using namespace std; class WordList; class Word { public: Word(const string& aText) : myText(aText) {} inline const string& GetText() const { return myText; } static bool IsOkCnxn(const string& str1, const string& str2); void MakeConnections(const WordList& aWordList); void Connect(vector<string>& result, vector<string>&current, const string& aTarget) const; private: vector<Word*> myTargets; string myText; }; class WordList : public vector<Word*> { public: ~WordList(); void AddWord(const string& str); void Connect(vector<string>& result, string aStart, string anEnd); const Word* GetWord(const string& aWord) const; }; const Word* WordList::GetWord(const string& aWord) const { for(vector<Word*>::const_iterator it = begin();it!=end();it++) if( *it && (*it)->GetText() == aWord ) return *it; return NULL; } void WordList::Connect(vector<string>& result, string aStart, string anEnd) { if( const Word* word = GetWord(aStart) ) { word->Connect(result, vector<string>(), anEnd); } } WordList::~WordList() { for(vector<Word*>::iterator it = begin();it!=end();it++) delete *it; } void WordList::AddWord(const string& str) { push_back(new Word(str)); } bool Word::IsOkCnxn(const string& str1, const string& str2) { if( str1.size() == str2.size() ) { int Diffs = 0; for(int i=0;Diffs < 2 && i < str2.size();i++) if( str1[i] != str2[i] ) &n bsp; if( ++Diffs > 1 ) &n bsp; return false; return (Diffs == 1); } return false; } void Word::MakeConnections(const WordList& aWordList) { for(WordList::const_iterator it=aWordList.begin();it!=aWordList.end();it++) { if( IsOkCnxn((*it)->GetText(), myText) ) { myTargets.push_back(*it); } } } void Word::Connect(vector<string>& result, vector<string>&current, const string& aTarget) const { current.push_back(myText); for(vector<Word*>::const_iterator it=myTargets.begin();it!=myTargets.end();it++) { if( aTarget != myText ) { bool isAlreadyInList=false; for(vector<string>::iterator jt=current.begin();jt!=current.end() && !isAlreadyInList;jt++) &n bsp; if( *jt == (*it)->GetText() ) &n bsp; isAlreadyInList = true; if( !isAlreadyInList ) { &n bsp; (*it)->Connect(result, current, aTarget); } } else if( aTarget == myText && &n bsp; (result.size() == 0 || current.size() < result.size() ) ) result = current; } current.pop_back(); } int main() { WordList someWords; for(;;) { cout << "? "; string command; cin >> command; if( command == "exit" ) return 0; else if( command == "add-dictionary") { string dict; cin >> dict; if( ifstream in(dict.c_str()) ) { &n bsp; string word; &n bsp; while( !in.eof() ) { &n bsp; in >> word; &n bsp; if( !word.empty() ) &n bsp; someWords.AddWord(word); &n bsp; } &n bsp; for(WordList::iterator it=someWords.begin();it!=someWords.end();it++) { &n bsp; (*it)->MakeConnections(someWords); &n bsp; } } } else if( command == "connect" ) { string src; cin >> src; string trgt; cin >> trgt; vector<string> result; someWords.Connect(result, src, trgt); if( result.size() == 0 ) &n bsp; cout << "No way to connect " << src << " and " << trgt << "\n"; else { &n bsp; for(vector<string>::iterator jt = result.begin();jt!=result.end();jt++) { &n bsp; if( jt != result.begin() ) &n bsp; cout << " > "; &n bsp; cout << *jt; &n bsp; } &n bsp; cout << "\n"; } } } return 0; }
-----------------------
This is k5. We're all tools - duxup

hmmm.... by ucblockhead, 04/23/2002 03:19:12 PM EST (5.00 / 1)

Scoop is breaking nbsps (link to fix enclosed) by tmoertel, 04/23/2002 03:33:19 PM EST (5.00 / 1)

can't leave well enough alone. by ucblockhead, 04/23/2002 04:54:21 PM EST (5.00 / 3)

Playing around by ucblockhead, 04/24/2002 01:23:05 PM EST (5.00 / 1)
Compiler? by jacob, 04/25/2002 11:44:27 AM EST (5.00 / 1)

compiler by ucblockhead, 04/25/2002 12:10:31 PM EST (5.00 / 1)

Another error by jacob, 04/25/2002 12:21:38 PM EST (5.00 / 1)

sorry...sigh... by ucblockhead, 04/25/2002 12:45:37 PM EST (5.00 / 1)

That fixed it by jacob, 04/25/2002 01:02:00 PM EST (5.00 / 1)

yet another perl (5.00 / 3) (#42)
by wintergreen on Tue Apr 23, 2002 at 03:39:36 PM EST

use strict; my @DICT; my $inp; do { print "? "; $inp = <>; if( $inp =~ /^add-dictionary (.*)$/ ) { open FH, $1 || die "Could not open '$1' for reading\n"; my @newwords = <FH>; chomp for @newwords; @DICT = (@DICT, @newwords); } elsif( $inp =~ /^connect (.*?) (.*)$/ ) { do_connect($1,$2); } elsif( $inp !~ /^exit/ ) { print "Invalid input.\n"; } } while( $inp !~ /^exit/ ); # branch and bound sub do_connect { my( $w1, $w2 ) = @_; unless( length($w1) eq length($w2) ) { print "No way to connect $w1 to $w2.\n"; return; } my @pathlist = ([$w1]); do { if( $pathlist[0][-1] eq $w2 ) { print join(" > ", @{$pathlist[0]}), "\n"; return; } my @newpaths = add_path( shift @pathlist ); push(@pathlist, @newpaths) if @newpaths; @pathlist = sort { scalar(@$a) <=> scalar(@$b) } @pathlist; } while( @pathlist ); print "No way to connect $w1 to $w2.\n"; } sub add_path { my $path = shift; my @newpaths; $" = '|'; my $used = qr{^(@$path)$}; my %difs; my @end = split / */, $path->[-1]; WORD: foreach my $word (@DICT) { next if $word =~ $used; my @word = split / */, $word; for my $i ( 0..$#word ) { $difs{$word}++ if ($word[$i] ne $end[$i]); if( $difs{$word} > 1 ) { delete $difs{$word}; next WORD; } } } foreach my $newword ( keys %difs ) { my @newpath = ( @$path, $newword ); push @newpaths, \@newpath; } return @newpaths; }

plan to repost by wintergreen, 04/24/2002 02:00:31 AM EST (5.00 / 1)

see my revised version by wintergreen, 04/30/2002 03:01:01 AM EST (none / 0)

Is case significant for determining adjacency? (5.00 / 1) (#43)
by tmoertel on Tue Apr 23, 2002 at 03:46:42 PM EST

For example, are the words "A" and "a" considered to be identical or one step apart?

--
My blog | LectroTest

[ Disagree? Reply. ]

Whichever is easier for you by jacob, 04/23/2002 03:47:53 PM EST (5.00 / 1)

I should have said ... by jacob, 04/23/2002 03:58:10 PM EST (5.00 / 1)

Question (5.00 / 1) (#46)
by trhurler on Tue Apr 23, 2002 at 04:49:46 PM EST

Are you only allowed to use connect commands with words in the dictionaries provided in advance? If this is not the case, then am I allowed to add words you put in connect commands to my internal dictionary?

--
'God dammit, your posts make me hard.' --LilDebbie

only words in the dictionary by jacob, 04/23/2002 04:54:03 PM EST (5.00 / 1)
Nevermind by trhurler, 04/23/2002 04:55:42 PM EST (5.00 / 1)

Test data (5.00 / 2) (#58)
by djotto on Wed Apr 24, 2002 at 06:39:48 AM EST

I just stripped 3000-odd four-letter words out of Webster's 1913. The file is available here

By the by ... by jacob, 04/24/2002 10:48:01 AM EST (5.00 / 1)

A good dictionary by tmoertel, 04/24/2002 12:01:14 PM EST (5.00 / 1)

Can you post this zipped? by Cal Bunny, 04/24/2002 01:49:50 PM EST (4.00 / 1)

Here by kuran42, 04/24/2002 02:01:44 PM EST (5.00 / 1)
bz2 in windows by jonathan_ingram, 04/28/2002 09:08:51 AM EST (none / 0)

Ok, trying Haskell. (5.00 / 3) (#59)
by i on Wed Apr 24, 2002 at 06:42:46 AM EST

I'm still learning the idioms. Forgive me, tmoertel :)

================ ladder.hs =================
module Main where import Set import FiniteMap import IO import BFS type SplitString = (String, String) type Dict = FiniteMap SplitString [String] splits :: String -> [SplitString] splits [_] = [("", "")] splits (x:xs) = ("",xs) : map addx (splits xs) where addx (u,v) = (x:u,v) splitString :: String -> SplitString -> Dict -> Dict splitString w u fm = addToFM_C (flip (++)) fm u [w] addString :: String -> Dict -> Dict addString w fm = foldr add1 fm (splits w) where add1 u fm = splitString w u fm ioAddDict :: FilePath -> Dict -> IO Dict ioAddDict path dict = do handle <- openFile path ReadMode dict' <- ioAddDict' handle dict hClose handle return dict' where ioAddDict' :: Handle -> Dict -> IO Dict ioAddDict' handle dict = do eof <- hIsEOF handle if eof then return dict else do word <- hGetLine handle let dict' = word `seq` dict `seq` addString word dict dict'' <- ioAddDict' handle dict' return dict'' tryAddDict :: FilePath -> Dict -> IO Dict tryAddDict path dict = do res <- try (ioAddDict path dict) case res of (Left _) -> do putStrLn $ "Error adding " ++ path return dict (Right dict') -> return dict' findPath :: String -> String -> Dict -> [String] findPath from to dict = findChain (adjacents dict) from to adjacents :: Dict -> String -> [String] adjacents fm w = setToList $ foldl addToSet emptySet adj where adj = just $ map (lookupFM fm) (splits w) just [] = [] just (Nothing:xs) = just xs just ((Just x):xs) = x ++ just xs mainLoop :: Dict -> IO Dict mainLoop dict = do putStr "? " hFlush stdout res <- try getLine case res of Left _ -> return dict Right cmdline -> doCmd (words cmdline) dict doCmd :: [String] -> Dict -> IO Dict doCmd ['e':_] d = return d doCmd ['a':_, w] d = tryAddDict w d >>= mainLoop doCmd ['?':_, w] d = do print (adjacents d w) mainLoop d doCmd [w1, w2] d = doCmd ["c", w1, w2] d doCmd ['c':_, w1, w2] d = do let path = findPath w1 w2 d case path of [] -> putStrLn $ "No way to connect " ++ w1 ++ " to " ++ w2 ++ "." x -> putStrLn $ showPath x mainLoop d doCmd _ d = putStrLn "Bad command" >> mainLoop d showPath :: [String] -> String showPath [x] = x showPath (x:xs) = x ++ " > " ++ showPath xs main :: IO () main = do let dict = emptyFM mainLoop dict hFlush stdout return ()

==================== BFS.hs ==============
module BFS (findChain) where import qualified Set as S import qualified SimpleQueue as Q -- Parented tree data PTree a = PNil | PTree (PTree a) a [PTree a] findChain :: Ord a => (a->[a]) -> a -> a -> [a] findChain f from to = reverse $ first chain where chain = dropWhile notFound $ bfsPaths $ toPTree f from first (x:_) = x first [] = [] notFound [] = True notFound xs = to /= head xs toPTree :: (a->[a]) -> a -> PTree a toPTree f a = toPTree' f PNil a where toPTree' f p a = tree where tree = PTree p a trees trees = map (toPTree' f tree) (f a) toParent :: PTree a -> [a] toParent PNil = [] toParent (PTree p a _) = a:toParent p bfsPaths :: Ord a => PTree a -> [[a]] bfsPaths PNil = [] bfsPaths tree@(PTree parent node branches) = map toParent paths where (paths,_) = bfs (Q.single tree) (S.emptySet) bfs :: Ord a => Q.Seq (PTree a) -> S.Set a -> ([PTree a], S.Set a) bfs queue set = case Q.lview queue of Q.Just2 (tree@(PTree p n bs)) queue' -> if S.elementOf n set then bfs queue' set else (tree : trees, set'') where (trees, set'') = bfs (foldr (flip Q.snoc) queue' bs) set' set' = S.addToSet set n Q.Just2 (PNil) queue' -> bfs queue' set Q.Nothing2 -> ([], set)

It is reasonably fast under ghc -O2, a bit slower under ghci, and crashes hugs :(

I may try to tidy it up later.
^{^{—
and we have a contradicton according to our assumptions and the factor theorem}}

Looks good to me! by tmoertel, 04/24/2002 10:50:00 AM EST (5.00 / 1)

Sure first query is slow. by i, 04/24/2002 01:06:11 PM EST (5.00 / 1)

My Haskell implementation (5.00 / 2) (#65)
by tmoertel on Wed Apr 24, 2002 at 11:47:00 AM EST

This one uses a number of optimizations to compute the connections quickly. For example, on a Celeron 433, it takes 2.6 seconds to compute the answer

masts > pasts > posts > hosts > hoots > hooks > books

when given the input

add-dictionary /usr/share/dict/words connect masts books exit

via

time ./tgm-word-connect < test1.tst

Here it is:

-- TGM 20020424 -- -- PFC 4 entry -- -- Optimizations: -- -- * Partitions the word database by word length so that only -- words of the desired length need be searched. -- * Builds DB partitions only when needed (i.e., lazily). -- * Adds new dictionaries to DB via quick incremental merge. -- * Uses BFS to find shortest path without having to consider -- potentially longer solutions. -- * Uses word "trace" (in DB) to rapidly compute BFS neighbors. module Main (main) where import List import Array import Maybe (fromJust) import StrictFiniteMap import IO import EdisonPrelude import qualified SimpleQueue as Q newtype WordDatabase = WD (WordArray, WordToIndex, TraceToIndices) type WordArray = Array Int String -- 33 => "cat" type WordToIndex = FiniteMap String Int -- "cat" => 33 type TraceToIndices = FiniteMap String [Int] -- "c.t" => [33, 329, ...] emptyDatabase = WD (array (0,-1) [], emptyFM, emptyFM) type LPWordDatabase = FiniteMap Int WordDatabase emptyLPDatabase = emptyFM :: LPWordDatabase main :: IO () main = commandLoop emptyLPDatabase commandLoop :: LPWordDatabase -> IO () commandLoop lpwdb = prompt >> getLine >>= \cmd -> case words cmd of ["exit"] -> return () ["connect", a, b] -> putStrLn (connect lpwdb a b) >> commandLoop lpwdb [_,dictFile] -> addDictFile dictFile >>= commandLoop _ -> putStrLn ("Illegal command: " ++ cmd) where prompt = putStr "? " >> hFlush stdout addDictFile df = catch tryDict failDict where tryDict = do contents <- readFile df putStrLn ("Added " ++ df) return (addDict lpwdb (words contents)) failDict = const $ do putStrLn ("Error adding " ++ df) return lpwdb connect :: LPWordDatabase -> String -> String -> String connect lpwdb a b = connect1 db a b where db = lookupWithDefaultFM lpwdb emptyDatabase (length a) connect1 :: WordDatabase -> String -> String -> String connect1 (WD (warry, widx, ttis)) a b = case (getIdx a, getIdx b) of (Nothing, _) -> noPath (_, Nothing) -> noPath (Just aix, Just bix) -> showPath (bfs bix aix) where getIdx = lookupFM widx noPath = "No way to connect " ++ a ++ " to " ++ b ++ "." showPath [] = noPath showPath ps = concat . intersperse " > " . map (warry!) $ ps -- standard BFS: C = closed set, O = open set bfs start goal = bfs' emptyFM (Q.single (start, start)) goal bfs' c o goal = case Q.lview o of Nothing2 -> [] Just2 (v,vp) o' -> if v == goal then walkPath else if elemFM v c then bfs' c o' goal -- closed else bfs' c' o'' goal -- node is open => follow it where c' = addToFM c v vp o'' = foldl' Q.snoc o' [(vc,v) | vc <- vcs, vc /= v] vcs = concatMap (lookupFMe ttis) (traces (warry ! v)) walkPath = v : takeWhile (v /=) (walkPath' vp) walkPath' n = n : if np == n then [] else walkPath' np where np = lookupFMe c n addDict :: LPWordDatabase -> [String] -> LPWordDatabase addDict lpwdb ws = plusFM_C mergeDB lpwdb lpwdb1 where lpws = keyedGroups length ws lpwdb1 = mapFM (\_ xs -> addDict1 emptyDatabase xs) lpws mergeDB a (WD(bwarry,_,_)) = addDict1 a (elems bwarry) addDict1 :: WordDatabase -> [String] -> WordDatabase addDict1 (WD (warry, widx, ttis)) ws = WD (warry', widx', ttis') where ws' = elems warry ++ ws base = sizeFM widx warry' = array (1, base + length ws) (zip [1..] ws') widx' = addListToFM widx (zip ws [base+1..]) ttis' = addListToFM_C (++) ttis (concatMap w2ts ws) where w2ts w = map (flip (,) [lookupFMe widx' w]) (traces w) -- UTILITIES traces :: String -> [String] traces [] = [] traces (x:xs) = ('.':xs) : map (x:) (traces xs) lookupFMe :: Ord a => FiniteMap a b -> a -> b lookupFMe = (fromJust.) . lookupFM foldl' _ z [] = z foldl' f z (x:xs) = (foldl' f $! f z x) xs keyedGroups :: (Ord a, Ord key) => (a -> key) -> [a] -> FiniteMap key [a] keyedGroups keyFn = addListToFM_C (flip (++)) emptyFM . map ( \x -> (keyFn x, [x]) )

--
My blog | LectroTest

[ Disagree? Reply. ]

Compiler? by jacob, 04/24/2002 12:09:08 PM EST (5.00 / 1)

Details by tmoertel, 04/24/2002 05:53:32 PM EST (5.00 / 1)

I wonder why your solution is so much faster... by i, 04/24/2002 01:34:59 PM EST (5.00 / 1)

Yep, it's because of partition by word length. by i, 04/24/2002 02:21:35 PM EST (5.00 / 1)

Thinking of it again... by i, 04/26/2002 11:42:20 AM EST (5.00 / 1)

Request (5.00 / 2) (#70)
by jacob on Wed Apr 24, 2002 at 12:13:38 PM EST

When you post a solution, please post brief instructions for how to run it. This is particularly important for languages with more than one interpreter/compiler: if you're using C++, for example, whether you compiled with gcc or VC++ or something else might make a difference. It's not a big deal -- I can figure it out if you don't tell me -- but things will run smoother if you just say up front.

By the way: what do you think of the text ad? I bought it with the intent of attracting new people to the contest, and it appears to have done so; is it worth continuing to buy them for future contests as well?

--
"it's not rocket science" right right insofar as rocket science is boring

--Iced_Up

Yes, do so by Hopfrog, 04/24/2002 03:34:31 PM EST (5.00 / 1)

We'll be here all week by jacob, 04/24/2002 03:39:03 PM EST (5.00 / 1)

As long as it's written... (5.00 / 2) (#71)
by kuran42 on Wed Apr 24, 2002 at 12:59:17 PM EST

My Python (2.2) version, straight boring brute force. If I can get my brain in gear I might try a GA solution.

import string, sys class Words: def __init__(self): self.words = {} def hasWord(self, word): return word and self.words.has_key(len(word)) and word in self.words[len(word)][word[0]] def addDictionary(self, input): try: for i in map(string.lower, map(string.strip, open(input).readlines())): if not self.words.has_key(len(i)): self.words[len(i)] = dict(map(lambda x: (x, []), string.lowercase)) self.words[len(i)][i[0]].append(i) except: print 'Error adding %s' % input else: print 'Added %s' % input def connectWords(self, a, b): self.tried, self.destination = {a.lower() : 0}, b.lower() if not self.hasWord(a) or not self.hasWord(b) or len(a) != len(b): print 'No way to connect %s to %s.' % (a, b) return x = self.connectWork(a.lower()) if len(x): print ' > '.join(x) else: print "No way to connect %s to %s." % (a, b) def connectWork(self, a): prospects = {} for i in range(len(a)): for j in string.lowercase: m = '%s%s%s' % (a[:i], j, a[i + 1:]) if m == self.destination: return [a, m] elif self.hasWord(m) and m not in self.tried and m != a: prospects[m] = 0 self.tried.update(prospects) if len(prospects): x = filter(None, map(self.connectWork, prospects.keys())) if x: y = map(len, x) z = min(y) if z: return [a] + x[y.index(z)] return [] def mainLoop(): words = Words() COMMANDS = { 'add-dictionary': (1, words.addDictionary), 'connect': (2, words.connectWords), 'exit': (0, lambda: sys.exit(0)) } while 1: x = raw_input('? ').split() if len(x): if x[0].lower() in COMMANDS: if COMMANDS[x[0]][0] != len(x) - 1: print 'Incorrect argument count' else: apply(COMMANDS[x[0].lower()][1], tuple(x[1:])) else: print 'Invalid command' if __name__ == '__main__': mainLoop()

--
kuran42? genius? Nary a difference betwixt the two. -- Defect

Two things (5.00 / 1) (#78)
by trhurler on Wed Apr 24, 2002 at 03:10:24 PM EST

First, what should the program output if you put in "connect weasel weasel" assuming "weasel" is in the dictionary? I can think of three options, but I'm not sure which you want.

Second, my entry is progressing nicely. At this point, other than the above question and some debugging, the only bit that isn't finished is the actual interesting case, which admittedly is the minority of the code for this thing written in C:) (I've been doing this in my spare moments.)

--
'God dammit, your posts make me hard.' --LilDebbie

'weasel' by jacob, 04/24/2002 03:13:37 PM EST (5.00 / 2)

(assuming 'weasel' is in the dictionary) (nt) by jacob, 04/24/2002 03:14:44 PM EST (5.00 / 1)

Incredibly familiar (5.00 / 2) (#84)
by X3nocide on Wed Apr 24, 2002 at 11:32:39 PM EST

In my Discrete Math homework I just worked on, one of the concepts dealt with was the n dimensional cube and a perculiar definition of it as a graph where each node represents a n digit sequence and is connected to every node that differs by 1 digit. The intersting thing about it is that the shortest path between two nodes is guarenteed to be at most n. Of course this could be a problem if there aren't enough connections.

Just thought I'd throw out an interesting idea to try if getting the challenge done isn't hard enough, and to see what other people think about optimal representation.

pwnguin.net

Hypercube routing by KWillets, 04/25/2002 12:09:46 AM EST (5.00 / 2)
Binary Gray Codes by Logan, 04/25/2002 03:25:10 AM EST (5.00 / 1)
N by djotto, 04/25/2002 03:28:46 AM EST (5.00 / 1)

Prove I can't code (5.00 / 2) (#86)
by alge on Thu Apr 25, 2002 at 12:13:49 AM EST

(can't code but trying to learn) (=
Hmm. My algos are pretty damned stupid and I don't know ruby too well..
but here you are (:
#!/usr/bin/ruby class WordConnect def initialize @done = false @dict = Array.new until @done read_input end end def read_input puts "? " line = gets case line.split[0] when "add-dictionary" if add_dict( line.split[1] ) puts "Added dictionary" else puts "Error adding " + line.split[1] end when "connect" puts connect( line.split[1], line.split[2] ) when "exit" @done = true end end def add_dict( filename ) begin File.open( filename ) { |file| file.each_line { |line| @dict << line.split[0] } } rescue return false end return true end def connect( start, stop ) @could_not_connect = "Could not connect " + start + " and " + stop + "." unless start.length == stop.length and @dict.include?(start) and @dict.include?(stop) return @could_not_connect end # Remove all elements from our dictionary witch don't match # the length of start (or stop, they're equally long). array = @dict.dup @dict.each { |element| unless element.length == start.length array = array - element.to_a end } @maxlevel = 1; until @maxlevel > 10 if connect_recurse(array.dup, start, stop, 0) return start + " > " + @ret end @maxlevel = @maxlevel + 1 end @could_not_connect end def match?( word1, word2 ) diff = 0 0.step(word1.length - 1, 1) { |index| unless word1[index] == word2[index] diff = diff.next end } if diff < 2 then true else false end end # recursive method. # first we remove prune from array, # then we iterate over array until we match stop def connect_recurse( array, prune, stop, level ) @ret = "" level = level.next if level > @maxlevel then return false end array.delete(prune) array.each { |element| if match?(prune, stop) @ret = stop return true end if match?(element, prune) if connect_recurse(array, element, stop, level) @ret = element + " > " + @ret return true end end } return false end end WordConnect.new
_{vi er ikke lenger elsket her}

damn, by alge, 04/25/2002 12:23:00 AM EST (5.00 / 1)

my implementation in C (5.00 / 3) (#87)
by svillee on Thu Apr 25, 2002 at 12:22:23 AM EST

#include <stdio.h> #include <stdlib.h> #include <string.h> #define MAX_INPUT_LENGTH 4096 #define MAX_VERB_LENGTH 32 #define MAX_WORD_LENGTH 1024 typedef struct Word_s { struct Word_s *next; struct Word_s *left; struct Word_s *right; int balance; struct Links_s *links; char name[1]; } Word_t; typedef struct Link_s { struct Link_s *nextSameSource; Word_t *destination; } Link_t; typedef struct Links_s { struct Links_s *next; Word_t *source; Link_t *firstLink; } Links_t; typedef struct Partial_s { struct Partial_s *next; Word_t *word; } Partial_t; typedef enum { CHAIN_IMPOSSIBLE, CHAIN_TOO_SHORT, CHAIN_FOUND } ChainResult_t; static int length; static Word_t *target1; static Links_t *firstLinks; static Word_t *firstByLength[MAX_WORD_LENGTH+1]; static Word_t *topByLength[MAX_WORD_LENGTH+1]; static char input[MAX_INPUT_LENGTH+1]; static char verb[MAX_VERB_LENGTH+1]; static char word1[MAX_WORD_LENGTH+1]; static char word2[MAX_WORD_LENGTH+1]; static Word_t *findWord(char *name) { Word_t *wp; int m; length = strlen(name); wp = topByLength[length]; while (wp != (Word_t *) 0) { m = strcmp(name, wp->name); if (m == 0) break; if (m < 0) { wp = wp->left; } else { wp = wp->right; } } return wp; } void insertNode( Word_t *newNode, Word_t **where, int *outerGrew) { Word_t *nodeA; Word_t *nodeB; Word_t *nodeC; int innerGrew; nodeA = *where; if (nodeA == (Word_t *) 0) { /* The tree was empty, so make the new node the only one. */ newNode->left = (Word_t *) 0; newNode->right = (Word_t *) 0; newNode->balance = 0; *where = newNode; *outerGrew = 1; } else if (strcmp(newNode->name, nodeA->name) < 0) { /* Add the new node to the left subtree. */ insertNode(newNode, &nodeA->left, &innerGrew); if (!innerGrew) { *outerGrew = 0; } else if (nodeA->balance == 1) { /* The right subtree height was 1 more, and now they are the same. */ nodeA->balance = 0; *outerGrew = 0; } else if (nodeA->balance == 0) { /* The subtree heights were the same, and now the left subtree */ /* height is 1 more. */ nodeA->balance = -1; *outerGrew = 1; } else { /* The left subtree height was 1 more, and now it is 2 more, */ /* so we must rebalance. Examine the left subtree. */ nodeB = nodeA->left; nodeC = nodeB->right; if (nodeB->balance == -1) { /* A B */ /* / \ / \ */ /* B Z ---> W A */ /* / \ / \ */ /* W C C Z */ nodeA->left = nodeC; nodeA->balance = 0; nodeB->right = nodeA; nodeB->balance = 0; *where = nodeB; } else { /* A C */ /* / \ / \ */ /* B Z ---> B A */ /* / \ / \ / \ */ /* W C W X Y Z */ /* / \ */ /* X Y */ nodeB->right = nodeC->left; if (nodeC->balance == 1) { nodeB->balance = -1; } else { nodeB->balance = 0; } nodeA->left = nodeC->right; if (nodeC->balance == -1) { nodeA->balance = 1; } else { nodeA->balance = 0; } nodeC->left = nodeB; nodeC->right = nodeA; nodeC->balance = 0; *where = nodeC; } *outerGrew = 0; } } else { /* Add the new node to the right subtree. */ insertNode(newNode, &nodeA->right, &innerGrew); if (!innerGrew) { *outerGrew = 0; } else if (nodeA->balance == -1) { /* The left subtree height was 1 more, and now they are the same. */ nodeA->balance = 0; *outerGrew = 0; } else if (nodeA->balance == 0) { /* The subtree heights were the same, and now the right subtree */ /* height is 1 more. */ nodeA->balance = 1; *outerGrew = 1; } else { /* The right subtree height was 1 more, and now it is 2 more, */ /* so we must rebalance. Examine the right subtree. */ nodeB = nodeA->right; nodeC = nodeB->left; if (nodeB->balance == 1) { /* A B */ /* / \ / \ */ /* W B ---> A Z */ /* / \ / \ */ /* C Z W C */ nodeA->right = nodeC; nodeA->balance = 0; nodeB->left = nodeA; nodeB->balance = 0; *where = nodeB; } else { /* A C */ /* / \ / \ */ /* W B ---> A B */ /* / \ / \ / \ */ /* C Z W X Y Z */ /* / \ */ /* X Y */ nodeB->left = nodeC->right; if (nodeC->balance == -1) { nodeB->balance = 1; } else { nodeB->balance = 0; } nodeA->right = nodeC->left; if (nodeC->balance == 1) { nodeA->balance = -1; } else { nodeA->balance = 0; } nodeC->left = nodeA; nodeC->right = nodeB; nodeC->balance = 0; *where = nodeC; } *outerGrew = 0; } } } static void doAddDictionary(void) { FILE *fp; int grew; Links_t *links1; Links_t *links2; Link_t *link1; Link_t *link2; Word_t *wp; if (firstLinks != (Links_t *) 0) { /* Discard any links calculated so far. */ links1 = firstLinks; while (links1 != (Links_t *) 0) { links2 = links1->next; link1 = links1->firstLink; while (link1 != (Link_t *) 0) { link2 = link1->nextSameSource; free(link1); link1 = link2; } links1->source->links = (Links_t *) 0; free(links1); links1 = links2; } firstLinks = (Links_t *) 0; } fp = fopen(word1, "r"); if (fp == (FILE *) 0) { fprintf(stderr, "Error adding %s\n", word1); return; } while (fscanf(fp, "%1024s", word2) > 0) { /* See if the word is already in the dictionary. */ if (findWord(word2) != (Word_t *) 0) continue; /* Create the new word. */ wp = (Word_t *) malloc(sizeof (Word_t) + length - 1); if (wp == (Word_t *) 0) { fprintf(stderr, "out of memory for word\n"); exit(-1); } wp->next = firstByLength[length]; wp->links = (Links_t *) 0; strcpy(wp->name, word2); firstByLength[length] = wp; /* Add it to the balanced tree. */ insertNode(wp, &topByLength[length], &grew); } fclose(fp); printf("Added %s\n", word1); } static void findLinks(Word_t *wp) { char *sp; char *dp; Word_t *wp2; Link_t *link; Links_t *links; links = (Links_t *) malloc(sizeof (Links_t)); if (links == (Links_t *) 0) { fprintf(stderr, "out of memory for links\n"); exit(-1); } links->next = firstLinks; links->source = wp; links->firstLink = (Link_t *) 0; firstLinks = links; wp->links = links; /* Look for any existing words that differ by one letter. */ wp2 = firstByLength[length]; while (wp2 != (Word_t *) 0) { if (wp2 != wp) { sp = wp->name; dp = wp2->name; while (*sp == *dp) { ++sp; ++dp; } if (strcmp(sp + 1, dp + 1) == 0) { /* They differ by one letter, so create a link. */ link = (Link_t *) malloc(sizeof (Link_t)); if (link == (Link_t *) 0) { fprintf(stderr, "out of memory for link\n"); exit(-1); } link->nextSameSource = links->firstLink; link->destination = wp2; links->firstLink = link; } } wp2 = wp2->next; } } static ChainResult_t chain(Partial_t *partial, int chainLength) { int nextLength; ChainResult_t innerResult; ChainResult_t outerResult; Link_t *lp; Partial_t *ap; Partial_t nextPartial; nextLength = chainLength - 1; if (nextLength == 0) { lp = partial->word->links->firstLink; while (lp != (Link_t *) 0) { if (lp->destination == target1) { printf("%s", target1->name); ap = partial; while (ap != (Partial_t *) 0) { printf(" > %s", ap->word->name); ap = ap->next; } putchar('\n'); return CHAIN_FOUND; } lp = lp->nextSameSource; } return CHAIN_TOO_SHORT; } outerResult = CHAIN_IMPOSSIBLE; nextPartial.next = partial; lp = partial->word->links->firstLink; while (lp != (Link_t *) 0) { nextPartial.word = lp->destination; if (nextPartial.word->links == (Links_t *) 0) findLinks(nextPartial.word); ap = partial; while (ap != (Partial_t *) 0) { if (ap->word == nextPartial.word) break; ap = ap->next; } if (ap == (Partial_t *) 0) { innerResult = chain(&nextPartial, nextLength); if (innerResult == CHAIN_FOUND) return CHAIN_FOUND; if (innerResult == CHAIN_TOO_SHORT) outerResult = CHAIN_TOO_SHORT; } lp = lp->nextSameSource; } return outerResult; } static void doConnect(void) { int length1; int chainLength; ChainResult_t chainResult; Word_t *target2; Partial_t lastPartial; target1 = findWord(word1); length1 = length; target2 = findWord(word2); if ((target1 != (Word_t *) 0) && (target2 != (Word_t *) 0) && (length1 == length)) { if (target1 == target2) { printf("%s\n", word1); return; } if (target2->links == (Links_t *) 0) findLinks(target2); if (target2->links->firstLink != (Link_t *) 0) { lastPartial.next = (Partial_t *) 0; lastPartial.word = target2; chainLength = 1; while (1) { chainResult = chain(&lastPartial, chainLength); if (chainResult == CHAIN_FOUND) return; if (chainResult == CHAIN_IMPOSSIBLE) break; ++chainLength; } } } printf("No way to connect %s to %s.\n", word1, word2); } int main(int argc, char **argv) { int i; while (1) { printf("? "); input[0] = '\0'; fgets(input, MAX_INPUT_LENGTH+1, stdin); verb[0] = '\0'; word1[0] = '\0'; word2[0] = '\0'; sscanf(input, "%32s %1024s %1024s", verb, word1, word2); if (strcmp(verb, "add-dictionary") == 0) { doAddDictionary(); } else if (strcmp(verb, "connect") == 0) { doConnect(); } else if (strcmp(verb, "exit") == 0) { break; } } return 0; }

ASCII art comments by KWillets, 04/25/2002 03:14:25 AM EST (5.00 / 1)

I think that it is a red-black (nt) by Cal Bunny, 04/25/2002 03:28:39 AM EST (4.00 / 1)

Ignore that I'm on crack. Some height/AVL tree? by Cal Bunny, 04/25/2002 03:52:34 AM EST (4.00 / 1)

quick bug fix by svillee, 04/25/2002 07:43:52 AM EST (5.00 / 1)

B-Tree != AVL Tree, as far as I know by Cal Bunny, 04/25/2002 05:33:30 PM EST (4.00 / 1)

yes on B-trees, not yet sold on skip lists by svillee, 04/25/2002 08:11:38 PM EST (5.00 / 1)

Tabs? by tmoertel, 04/25/2002 11:27:28 AM EST (5.00 / 1)

my bad by svillee, 04/25/2002 07:52:53 PM EST (5.00 / 1)

My Slow Haskell Implementation (5.00 / 1) (#93)
by Logan on Thu Apr 25, 2002 at 03:40:19 AM EST

Here is my slow Haskell implementation. I believe it is somewhat slower than tmoertel's implementation. To compile, run

ghc -package data -O2 -o pfc4 pfc4.hs

My code could use a little commenting, and perhaps I will explain my method in a followup (although it is somewhat straight-forward).

module Main where import FiniteMap import IO import List import Maybe type Dictionary = FiniteMap String [(String, [String])] type UberDictionary = FiniteMap Int Dictionary main :: IO () main = do parser (emptyFM) return () prompt :: IO String prompt = do putStr "? " hFlush stdout getLine parser :: UberDictionary -> IO UberDictionary parser s = do line <- prompt runcom s line runcom :: UberDictionary -> String -> IO UberDictionary runcom udict line | com == "add-dictionary" = do { udict' <- addDictionary udict args; parser udict' } | com == "connect" = do { udict' <- connect udict args; parser udict' } | com == "exit" = return udict | otherwise = do { putStrLn "Huh?"; parser udict } where (com:args) = words line addDictionary :: UberDictionary -> [String] -> IO UberDictionary addDictionary udict [file] = do h <- openFile file ReadMode text <- hGetContents h return (loadDict $ lines text) addDictionary s _ = do { putStrLn "Huh?"; return s} loadDict :: [String] -> UberDictionary loadDict content = foldl addWord (emptyFM) content addWord :: UberDictionary -> String -> UberDictionary addWord udict word | elemFM len udict = addToFM udict len dict' | otherwise = addToFM udict len (addToFM (emptyFM) word []) where len = length word (Just dict) = lookupFM udict len dict' = addToFM dict word [] connect :: UberDictionary -> [String] -> IO UberDictionary connect udict [w1, w2] | length w1 /= length w2 = failMessage | not (elemFM w1 dict && elemFM w2 dict) = failMessage | maybeCachedAnswer /= Nothing = do { putStrLn (output cachedAnswer); return udict } | null answers = failMessage | otherwise = do { putStrLn (output $ head answers); return udict' } where failMessage = do { putStrLn ("No way to connect " ++ w1 ++ " to " ++ w2); return udict } dict = lookupWithDefaultFM udict (emptyFM) (length w1) (Just cache) = lookupFM dict w1 maybeCachedAnswer = lookup w2 cache (Just cachedAnswer) = maybeCachedAnswer answers = findAnswer dict w2 [[w1]] output = concat . (intersperse " > ") . reverse udict' = addToFM udict (length w1) (addToFM dict w1 ((w2, (head answers)) : cache)) connect udict _ = do { putStrLn "Huh?"; return udict } findAnswer :: Dictionary -> String -> [[String]] -> [[String]] findAnswer _ _ [] = [] findAnswer dict word paths | (not . null) answers = answers | otherwise = findAnswer dict' word newPaths where frontier = map head paths dict' = delListFromFM dict frontier answers = filter (word `elem`) paths newPaths = concat $ map (getNeighbors dict) paths getNeighbors :: Dictionary -> [String] -> [[String]] getNeighbors dict path = map (:path) $ filter ((flip elemFM) dict) xforms where (word:_) = path len = length word - 1 xforms = [ setPos word x l | x <- [0..len], l <- ['a'..'z'] ] setPos :: String -> Int -> Char -> String setPos word x l = prefix ++ (l : suffix) where (prefix, (_:suffix)) = splitAt x word

A small test case. (5.00 / 1) (#95)
by i on Thu Apr 25, 2002 at 04:03:29 AM EST

add-dictionary WORD.LST

connect make love

connect bread wheat

connect resist futile

connect effaces cabaret

WORD.LST is linked somewhere in this diary.
^{^{—
and we have a contradicton according to our assumptions and the factor theorem}}

My answers (and a Q regarding the spec) by tmoertel, 04/25/2002 11:53:37 AM EST (5.00 / 1)

Sorry, forgot the exit. by i, 04/25/2002 12:10:38 PM EST (5.00 / 1)

Correct by jacob, 04/25/2002 12:16:56 PM EST (5.00 / 1)

Enough is enough. (5.00 / 1) (#106)
by i on Thu Apr 25, 2002 at 01:41:58 PM EST

Here's the final version. It's still slower than tmoertel's, and has a stack-leak somewhere, but at least it runs my testcase in under 1 minute. (Changing FiniteMap to StrictFiniteMap fixes the space leak but makes it even slower; the inverse operation on tmoertel's entry both introduces a space leak and makes it slower. Strictness is so much fun.) So I'm leaving it as is. Enjoy!

--------------------- ladder.hs -------------

-- PFC4 entry: username = "i" -- compilation (on Un*x): ghc -O2 ladder.hs -o ladder -package data -- You probably need at least GHC5.0x -- run: ladder [+RTS <runtime system switches>] -- runtime switches help: ladder +RTS -\? -- Fiddling with them may improve performance, but only marginally -- -- with Hugs: hugs ladder.hs -- with GHCi: ghci ladder.hs -package data module Main where import StrictFiniteMap (FiniteMap, addToFM, addToFM_C, emptyFM, lookupFM, plusFM_C, lookupWithDefaultFM) import Maybe (catMaybes) import IO import qualified Set as S import qualified SimpleQueue as Q -- algorithms + data structures splits :: String -> [String] splits [] = [] splits (x:xs) = ('?':xs) : map (x:) (splits xs) type LDict = FiniteMap String [String] type Dict = FiniteMap Int LDict -- our dictionary maps word length ro the real dictionary -- the real dictionary maps a splitword to a list of words -- like this: "c?t" ==> ["cat", "cut"] addWord :: Dict -> String -> Dict addWord dict word = addToFM_C combiner dict (length word) unitDict where combiner :: LDict -> LDict -> LDict combiner = plusFM_C (++) unitDict = foldl' addWord' emptyFM (splits word) addWord' dict splitWord = addToFM dict splitWord [word] -- we cannot find directly whether a word is in the dictionary! -- so we have to look up its adjacents and find if it's -- adjacent to itself! This is completely unnesseccary, -- but challenge rules say so, amd we must comply. -- Pattern quards would be handy here findPath :: String -> String -> Dict -> [String] findPath from to dict | len /= len' = [] | not $ elem from (adjacents ldict from) = [] | otherwise = findPath' from to ldict where ldict = lookupWithDefaultFM dict emptyFM len findPath' :: String -> String -> LDict -> [String] findPath' from to dict = findChain (adjacents dict) from to len = length from len' = length to -- A list of words reachable from a given word in one step adjacents :: LDict -> String -> [String] adjacents dict word = maybes $ map (lookupFM dict) (splits word) where maybes = concat . catMaybes -- handle user interaction addDict :: FilePath -> Dict -> IO Dict addDict path dict = catch addDict' report where addDict' = do handle <- openFile path ReadMode contents <- hGetContents handle let dict' = foldl' addWord dict (words contents) putStrLn $ "Added " ++ path return dict' report _ = putStrLn ("Error adding " ++ path) >> return dict connect :: Dict -> String -> String -> IO () connect dict word1 word2 = do let path = findPath word1 word2 dict case path of [] -> putStrLn $ "No way to connect " ++ word1 ++ " to " ++ word2 xs -> putStrLn $ showPath xs where showPath = foldr1 (\a b -> a ++ " > " ++ b) doCommands :: Dict -> IO Dict doCommands dict = do putStr "? " hFlush stdout command <- getLine case (words command) of ['e':_] -> return dict ['a':_, path] -> addDict path dict >>= doCommands ['c':_, word1, word2] -> connect dict word1 word2 >> doCommands dict _ -> doCommands dict main :: IO () main = catch (doCommands emptyFM >> return ()) (\_->return ()) -- This section implements the standard BFS algorithm findChain :: Ord a => (a->[a]) -> a -> a -> [a] findChain f from to = reverse $ first chain where chain = dropWhile notFound $ bfsPaths $ toPTree f from first (x:_) = x first [] = [] notFound [] = True notFound xs = to /= head xs -- Parented tree, so we can walk from a node back to root data PTree a = PNil | PTree (PTree a) a [PTree a] -- Functional tree to parented tree toPTree :: (a->[a]) -> a -> PTree a toPTree f a = toPTree' f PNil a where toPTree' f p a = tree where tree = PTree p a trees trees = map (toPTree' f tree) (f a) -- The BFS itself bfsPaths :: Ord a => PTree a -> [[a]] bfsPaths PNil = [] bfsPaths tree@(PTree parent node branches) = map walkPath paths where (paths,_) = bfs (Q.single tree) (S.emptySet) bfs :: Ord a => Q.Seq (PTree a) -> S.Set a -> ([PTree a], S.Set a) bfs queue set = case Q.lview queue of Q.Just2 (tree@(PTree p n bs)) queue' -> if S.elementOf n set then bfs queue' set else (tree : trees, set'') where (trees, set'') = bfs (foldl' Q.snoc queue' bs) set' set' = S.addToSet set n Q.Just2 (PNil) queue' -> bfs queue' set Q.Nothing2 -> ([], set) walkPath :: PTree a -> [a] walkPath PNil = [] walkPath (PTree p a _) = a:walkPath p -- foldl' (why oh why isn't it standard?) foldl' :: (a -> b -> a) -> a -> [b] -> a foldl' f a [] = a foldl' f a (x:xs) = (foldl' f $! f a x) xs
^{^{—
and we have a contradicton according to our assumptions and the factor theorem}}

faster but still slower than tmoertel's (5.00 / 1) (#110)
by svillee on Thu Apr 25, 2002 at 10:48:12 PM EST

/* benchmark input: */ /* add-dictionary word.lst */ /* connect make love */ /* connect bread wheat */ /* connect resist futile */ /* connect effaces cabaret */ /* exit */ /* time on a Celeron 700 MHz: */ /* 67.72user 0.05system 1:09.80elapsed 97%CPU (0avgtext+0avgdata 0maxresident)k */ /* 0inputs+0outputs (111major+2280minor)pagefaults 0swaps */ #include <stdio.h> #include <stdlib.h> #include <string.h> #define MAX_INPUT_LENGTH 4096 #define MAX_VERB_LENGTH 32 #define MAX_WORD_LENGTH 1024 typedef struct Word_s { struct Word_s *next; struct Word_s *left; struct Word_s *right; int balance; struct Links_s *links; char name[1]; } Word_t; typedef struct Link_s { struct Link_s *nextSameSource; Word_t *destination; /* The next two fields are meaningful only for the current connect. */ struct Link_s *nextInOrbit; struct Link_s *sourceLink; } Link_t; typedef struct Links_s { struct Links_s *next; Word_t *source; Link_t *firstLink; } Links_t; static int length; static Links_t *firstLinks; static Word_t *firstByLength[MAX_WORD_LENGTH+1]; static Word_t *topByLength[MAX_WORD_LENGTH+1]; static char input[MAX_INPUT_LENGTH+1]; static char verb[MAX_VERB_LENGTH+1]; static char word1[MAX_WORD_LENGTH+1]; static char word2[MAX_WORD_LENGTH+1]; static Word_t *findWord(char *name) { Word_t *wp; int m; length = strlen(name); wp = topByLength[length]; while (wp != (Word_t *) 0) { m = strcmp(name, wp->name); if (m == 0) break; if (m < 0) { wp = wp->left; } else { wp = wp->right; } } return wp; } void insertNode( Word_t *newNode, Word_t **where, int *outerGrew) { Word_t *nodeA; Word_t *nodeB; Word_t *nodeC; int innerGrew; nodeA = *where; if (nodeA == (Word_t *) 0) { /* The tree was empty, so make the new node the only one. */ newNode->left = (Word_t *) 0; newNode->right = (Word_t *) 0; newNode->balance = 0; *where = newNode; *outerGrew = 1; } else if (strcmp(newNode->name, nodeA->name) < 0) { /* Add the new node to the left subtree. */ insertNode(newNode, &nodeA->left, &innerGrew); if (!innerGrew) { *outerGrew = 0; } else if (nodeA->balance == 1) { /* The right subtree height was 1 more, and now they are the same. */ nodeA->balance = 0; *outerGrew = 0; } else if (nodeA->balance == 0) { /* The subtree heights were the same, and now the left subtree */ /* height is 1 more. */ nodeA->balance = -1; *outerGrew = 1; } else { /* The left subtree height was 1 more, and now it is 2 more, */ /* so we must rebalance. Examine the left subtree. */ nodeB = nodeA->left; nodeC = nodeB->right; if (nodeB->balance == -1) { /* A B */ /* / \ / \ */ /* B Z ---> W A */ /* / \ / \ */ /* W C C Z */ nodeA->left = nodeC; nodeA->balance = 0; nodeB->right = nodeA; nodeB->balance = 0; *where = nodeB; } else { /* A C */ /* / \ / \ */ /* B Z ---> B A */ /* / \ / \ / \ */ /* W C W X Y Z */ /* / \ */ /* X Y */ nodeB->right = nodeC->left; if (nodeC->balance == 1) { nodeB->balance = -1; } else { nodeB->balance = 0; } nodeA->left = nodeC->right; if (nodeC->balance == -1) { nodeA->balance = 1; } else { nodeA->balance = 0; } nodeC->left = nodeB; nodeC->right = nodeA; nodeC->balance = 0; *where = nodeC; } *outerGrew = 0; } } else { /* Add the new node to the right subtree. */ insertNode(newNode, &nodeA->right, &innerGrew); if (!innerGrew) { *outerGrew = 0; } else if (nodeA->balance == -1) { /* The left subtree height was 1 more, and now they are the same. */ nodeA->balance = 0; *outerGrew = 0; } else if (nodeA->balance == 0) { /* The subtree heights were the same, and now the right subtree */ /* height is 1 more. */ nodeA->balance = 1; *outerGrew = 1; } else { /* The right subtree height was 1 more, and now it is 2 more, */ /* so we must rebalance. Examine the right subtree. */ nodeB = nodeA->right; nodeC = nodeB->left; if (nodeB->balance == 1) { /* A B */ /* / \ / \ */ /* W B ---> A Z */ /* / \ / \ */ /* C Z W C */ nodeA->right = nodeC; nodeA->balance = 0; nodeB->left = nodeA; nodeB->balance = 0; *where = nodeB; } else { /* A C */ /* / \ / \ */ /* W B ---> A B */ /* / \ / \ / \ */ /* C Z W X Y Z */ /* / \ */ /* X Y */ nodeB->left = nodeC->right; if (nodeC->balance == -1) { nodeB->balance = 1; } else { nodeB->balance = 0; } nodeA->right = nodeC->left; if (nodeC->balance == 1) { nodeA->balance = -1; } else { nodeA->balance = 0; } nodeC->left = nodeA; nodeC->right = nodeB; nodeC->balance = 0; *where = nodeC; } *outerGrew = 0; } } } static void doAddDictionary(void) { FILE *fp; int grew; Links_t *links1; Links_t *links2; Link_t *link1; Link_t *link2; Word_t *wp; if (firstLinks != (Links_t *) 0) { /* Discard any links calculated so far. */ links1 = firstLinks; while (links1 != (Links_t *) 0) { links2 = links1->next; link1 = links1->firstLink; while (link1 != (Link_t *) 0) { link2 = link1->nextSameSource; free(link1); link1 = link2; } links1->source->links = (Links_t *) 0; free(links1); links1 = links2; } firstLinks = (Links_t *) 0; } fp = fopen(word1, "r"); if (fp == (FILE *) 0) { fprintf(stderr, "Error adding %s\n", word1); return; } while (fscanf(fp, "%1024s", word2) > 0) { /* See if the word is already in the dictionary. */ if (findWord(word2) != (Word_t *) 0) continue; /* Create the new word. */ wp = (Word_t *) malloc(sizeof (Word_t) + length); if (wp == (Word_t *) 0) { fprintf(stderr, "out of memory for word\n"); exit(-1); } wp->next = firstByLength[length]; wp->links = (Links_t *) 0; strcpy(wp->name, word2); firstByLength[length] = wp; /* Add it to the balanced tree. */ insertNode(wp, &topByLength[length], &grew); } fclose(fp); printf("Added %s\n", word1); } static void findLinks(Word_t *wp) { char *sp; char *dp; Word_t *wp2; Link_t *link; Links_t *links; links = (Links_t *) malloc(sizeof (Links_t)); if (links == (Links_t *) 0) { fprintf(stderr, "out of memory for links\n"); exit(-1); } links->next = firstLinks; links->source = wp; links->firstLink = (Link_t *) 0; firstLinks = links; wp->links = links; /* Look for any existing words that differ by one letter. */ wp2 = firstByLength[length]; while (wp2 != (Word_t *) 0) { if (wp2 != wp) { sp = wp->name; dp = wp2->name; while (*sp == *dp) { ++sp; ++dp; } if (strcmp(sp + 1, dp + 1) == 0) { /* They differ by one letter, so create a link. */ link = (Link_t *) malloc(sizeof (Link_t)); if (link == (Link_t *) 0) { fprintf(stderr, "out of memory for link\n"); exit(-1); } link->nextSameSource = links->firstLink; link->destination = wp2; link->nextInOrbit = (Link_t *) 0; link->sourceLink = (Link_t *) 0; links->firstLink = link; } } wp2 = wp2->next; } } static void doConnect(void) { int length1; Word_t *target1; Word_t *target2; Word_t *wp; Links_t *links; Link_t *link; Link_t *link2; Link_t *firstInOrbit; Link_t endMark; target1 = findWord(word1); length1 = length; target2 = findWord(word2); if ((target1 != (Word_t *) 0) && (target2 != (Word_t *) 0) && (length1 == length)) { if (target1 == target2) { printf("%s\n", word1); return; } /* Reset any existing links. */ links = firstLinks; while (links != (Links_t *) 0) { link = links->firstLink; while (link != (Link_t *) 0) { link->nextInOrbit = (Link_t *) 0; link->sourceLink = (Link_t *) 0; link = link->nextSameSource; } links = links->next; } /* Set up a dummy link as the chain end mark. */ endMark.nextSameSource = (Link_t *) 0; endMark.destination = target2; endMark.nextInOrbit = (Link_t *) 0; endMark.sourceLink = (Link_t *) 0; /* Build successive orbits from target2 until target1 */ /* is reached or the outermost orbit is empty. */ firstInOrbit = &endMark; while (firstInOrbit != (Link_t *) 0) { link = firstInOrbit; firstInOrbit = (Link_t *) 0; while (link != (Link_t *) 0) { wp = link->destination; if (wp->links == (Links_t *) 0) findLinks(wp); link2 = wp->links->firstLink; while (link2 != (Link_t *) 0) { if (link2->sourceLink == (Link_t *) 0) { if (link2->destination == target1) { printf("%s", target1->name); link2 = link; while (link2 != (Link_t *) 0) { printf(" > %s", link2->destination->name); link2 = link2->sourceLink; } putchar('\n'); return; } link2->sourceLink = link; link2->nextInOrbit = firstInOrbit; firstInOrbit = link2; } link2 = link2->nextSameSource; } link = link->nextInOrbit; } } } printf("No way to connect %s to %s.\n", word1, word2); } int main(int argc, char **argv) { int i; while (1) { printf("? "); input[0] = '\0'; fgets(input, MAX_INPUT_LENGTH+1, stdin); verb[0] = '\0'; word1[0] = '\0'; word2[0] = '\0'; sscanf(input, "%32s %1024s %1024s", verb, word1, word2); if (strcmp(verb, "add-dictionary") == 0) { doAddDictionary(); } else if (strcmp(verb, "connect") == 0) { doConnect(); } else if (strcmp(verb, "exit") == 0) { break; } } return 0; }

version 3, finally faster than tmoertel's!!! (5.00 / 1) (#111)
by svillee on Fri Apr 26, 2002 at 12:51:46 AM EST

/* benchmark input: */ /* add-dictionary word.lst */ /* connect make love */ /* connect bread wheat */ /* connect resist futile */ /* connect effaces cabaret */ /* exit */ /* time on a Celeron 700 MHz: */ /* 3.51user 0.05system 0:03.69elapsed 96%CPU (0avgtext+0avgdata 0maxresident)k */ /* 0inputs+0outputs (111major+2098minor)pagefaults 0swaps */ #include <stdio.h> #include <stdlib.h> #include <string.h> #define MAX_INPUT_LENGTH 4096 #define MAX_VERB_LENGTH 32 #define MAX_WORD_LENGTH 1024 typedef struct Word_s { struct Word_s *left; struct Word_s *right; int balance; struct Links_s *links; char name[1]; } Word_t; typedef struct Link_s { struct Link_s *nextSameSource; Word_t *destination; /* The next two fields are meaningful only for the current connect. */ struct Link_s *nextInOrbit; struct Link_s *sourceLink; } Link_t; typedef struct Links_s { struct Links_s *next; Word_t *source; Link_t *firstLink; } Links_t; static int length; static Links_t *firstLinks; static Word_t *topByLength[MAX_WORD_LENGTH+1]; static char input[MAX_INPUT_LENGTH+1]; static char verb[MAX_VERB_LENGTH+1]; static char word1[MAX_WORD_LENGTH+1]; static char word2[MAX_WORD_LENGTH+1]; static char word3[MAX_WORD_LENGTH+1]; static Word_t *findWord(char *name) { Word_t *wp; int m; length = strlen(name); wp = topByLength[length]; while (wp != (Word_t *) 0) { m = strcmp(name, wp->name); if (m == 0) break; if (m < 0) { wp = wp->left; } else { wp = wp->right; } } return wp; } void insertNode( Word_t *newNode, Word_t **where, int *outerGrew) { Word_t *nodeA; Word_t *nodeB; Word_t *nodeC; int innerGrew; nodeA = *where; if (nodeA == (Word_t *) 0) { /* The tree was empty, so make the new node the only one. */ newNode->left = (Word_t *) 0; newNode->right = (Word_t *) 0; newNode->balance = 0; *where = newNode; *outerGrew = 1; } else if (strcmp(newNode->name, nodeA->name) < 0) { /* Add the new node to the left subtree. */ insertNode(newNode, &nodeA->left, &innerGrew); if (!innerGrew) { *outerGrew = 0; } else if (nodeA->balance == 1) { /* The right subtree height was 1 more, and now they are the same. */ nodeA->balance = 0; *outerGrew = 0; } else if (nodeA->balance == 0) { /* The subtree heights were the same, and now the left subtree */ /* height is 1 more. */ nodeA->balance = -1; *outerGrew = 1; } else { /* The left subtree height was 1 more, and now it is 2 more, */ /* so we must rebalance. Examine the left subtree. */ nodeB = nodeA->left; nodeC = nodeB->right; if (nodeB->balance == -1) { /* A B */ /* / \ / \ */ /* B Z ---> W A */ /* / \ / \ */ /* W C C Z */ nodeA->left = nodeC; nodeA->balance = 0; nodeB->right = nodeA; nodeB->balance = 0; *where = nodeB; } else { /* A C */ /* / \ / \ */ /* B Z ---> B A */ /* / \ / \ / \ */ /* W C W X Y Z */ /* / \ */ /* X Y */ nodeB->right = nodeC->left; if (nodeC->balance == 1) { nodeB->balance = -1; } else { nodeB->balance = 0; } nodeA->left = nodeC->right; if (nodeC->balance == -1) { nodeA->balance = 1; } else { nodeA->balance = 0; } nodeC->left = nodeB; nodeC->right = nodeA; nodeC->balance = 0; *where = nodeC; } *outerGrew = 0; } } else { /* Add the new node to the right subtree. */ insertNode(newNode, &nodeA->right, &innerGrew); if (!innerGrew) { *outerGrew = 0; } else if (nodeA->balance == -1) { /* The left subtree height was 1 more, and now they are the same. */ nodeA->balance = 0; *outerGrew = 0; } else if (nodeA->balance == 0) { /* The subtree heights were the same, and now the right subtree */ /* height is 1 more. */ nodeA->balance = 1; *outerGrew = 1; } else { /* The right subtree height was 1 more, and now it is 2 more, */ /* so we must rebalance. Examine the right subtree. */ nodeB = nodeA->right; nodeC = nodeB->left; if (nodeB->balance == 1) { /* A B */ /* / \ / \ */ /* W B ---> A Z */ /* / \ / \ */ /* C Z W C */ nodeA->right = nodeC; nodeA->balance = 0; nodeB->left = nodeA; nodeB->balance = 0; *where = nodeB; } else { /* A C */ /* / \ / \ */ /* W B ---> A B */ /* / \ / \ / \ */ /* C Z W X Y Z */ /* / \ */ /* X Y */ nodeB->left = nodeC->right; if (nodeC->balance == -1) { nodeB->balance = 1; } else { nodeB->balance = 0; } nodeA->right = nodeC->left; if (nodeC->balance == 1) { nodeA->balance = -1; } else { nodeA->balance = 0; } nodeC->left = nodeA; nodeC->right = nodeB; nodeC->balance = 0; *where = nodeC; } *outerGrew = 0; } } } static void doAddDictionary(void) { FILE *fp; int grew; Links_t *links1; Links_t *links2; Link_t *link1; Link_t *link2; Word_t *wp; if (firstLinks != (Links_t *) 0) { /* Discard any links calculated so far. */ links1 = firstLinks; while (links1 != (Links_t *) 0) { links2 = links1->next; link1 = links1->firstLink; while (link1 != (Link_t *) 0) { link2 = link1->nextSameSource; free(link1); link1 = link2; } links1->source->links = (Links_t *) 0; free(links1); links1 = links2; } firstLinks = (Links_t *) 0; } fp = fopen(word1, "r"); if (fp == (FILE *) 0) { fprintf(stderr, "Error adding %s\n", word1); return; } while (fscanf(fp, "%1024s", word2) > 0) { /* See if the word is already in the dictionary. */ if (findWord(word2) != (Word_t *) 0) continue; /* Create the new word. */ wp = (Word_t *) malloc(sizeof (Word_t) + length); if (wp == (Word_t *) 0) { fprintf(stderr, "out of memory for word\n"); exit(-1); } wp->links = (Links_t *) 0; strcpy(wp->name, word2); /* Add it to the balanced tree. */ insertNode(wp, &topByLength[length], &grew); } fclose(fp); printf("Added %s\n", word1); } static void findLinks(Word_t *wp) { int i; int m; char j; char k; char *cp; Word_t *wp2; Link_t *link; Links_t *links; links = (Links_t *) malloc(sizeof (Links_t)); if (links == (Links_t *) 0) { fprintf(stderr, "out of memory for links\n"); exit(-1); } links->next = firstLinks; links->source = wp; links->firstLink = (Link_t *) 0; firstLinks = links; wp->links = links; /* Look for any existing words that differ by one letter. */ for (i = 0; i < length; ++i) { strcpy(word3, wp->name); cp = &word3[i]; k = *cp; for (j = 'a'; j <= 'z'; ++j) if (j != k) { *cp = j; wp2 = topByLength[length]; while (wp2 != (Word_t *) 0) { m = strcmp(word3, wp2->name); if (m == 0) break; if (m < 0) { wp2 = wp2->left; } else { wp2 = wp2->right; } } if (wp2 != (Word_t *) 0) { /* They differ by one letter, so create a link. */ link = (Link_t *) malloc(sizeof (Link_t)); if (link == (Link_t *) 0) { fprintf(stderr, "out of memory for link\n"); exit(-1); } link->nextSameSource = links->firstLink; link->destination = wp2; link->nextInOrbit = (Link_t *) 0; link->sourceLink = (Link_t *) 0; links->firstLink = link; } } } } static void doConnect(void) { int length1; Word_t *target1; Word_t *target2; Word_t *wp; Links_t *links; Link_t *link; Link_t *link2; Link_t *firstInOrbit; Link_t endMark; target1 = findWord(word1); length1 = length; target2 = findWord(word2); if ((target1 != (Word_t *) 0) && (target2 != (Word_t *) 0) && (length1 == length)) { if (target1 == target2) { printf("%s\n", word1); return; } /* Reset any existing links. */ links = firstLinks; while (links != (Links_t *) 0) { link = links->firstLink; while (link != (Link_t *) 0) { link->nextInOrbit = (Link_t *) 0; link->sourceLink = (Link_t *) 0; link = link->nextSameSource; } links = links->next; } /* Set up a dummy link as the chain end mark. */ endMark.nextSameSource = (Link_t *) 0; endMark.destination = target2; endMark.nextInOrbit = (Link_t *) 0; endMark.sourceLink = (Link_t *) 0; /* Build successive orbits from target2 until target1 */ /* is reached or the outermost orbit is empty. */ firstInOrbit = &endMark; while (firstInOrbit != (Link_t *) 0) { link = firstInOrbit; firstInOrbit = (Link_t *) 0; while (link != (Link_t *) 0) { wp = link->destination; if (wp->links == (Links_t *) 0) findLinks(wp); link2 = wp->links->firstLink; while (link2 != (Link_t *) 0) { if (link2->sourceLink == (Link_t *) 0) { if (link2->destination == target1) { printf("%s", target1->name); link2 = link; while (link2 != (Link_t *) 0) { printf(" > %s", link2->destination->name); link2 = link2->sourceLink; } putchar('\n'); return; } link2->sourceLink = link; link2->nextInOrbit = firstInOrbit; firstInOrbit = link2; } link2 = link2->nextSameSource; } link = link->nextInOrbit; } } } printf("No way to connect %s to %s.\n", word1, word2); } int main(int argc, char **argv) { int i; while (1) { printf("? "); input[0] = '\0'; fgets(input, MAX_INPUT_LENGTH+1, stdin); verb[0] = '\0'; word1[0] = '\0'; word2[0] = '\0'; sscanf(input, "%32s %1024s %1024s", verb, word1, word2); if (strcmp(verb, "add-dictionary") == 0) { doAddDictionary(); } else if (strcmp(verb, "connect") == 0) { doConnect(); } else if (strcmp(verb, "exit") == 0) { break; } } return 0; }

How did you go from 70sec -> 4sec? by Cal Bunny, 04/26/2002 01:04:23 AM EST (4.00 / 1)

summary of optimization by svillee, 04/26/2002 07:30:38 AM EST (5.00 / 1)

time left? (5.00 / 1) (#113)
by codemonkey_uk on Fri Apr 26, 2002 at 05:30:03 AM EST

I only spotted this yesterday. How much time is left? I spend an hour or so last night getting a c++ / stl version working using the A* algorithm, but I've still got to do some testing, and bug fixing. Is it worth it? When's the "final" entry supposed to be in?

---
Thad

"The most savage controversies are those about matters as to which there is no good evidence either way." - Bertrand Russell

By tuesday, lots of time (nt) by obsidian head, 04/26/2002 07:05:25 AM EST (5.00 / 2)

Entry: C++ (5.00 / 1) (#116)
by codemonkey_uk on Fri Apr 26, 2002 at 08:12:43 AM EST

My entry is implemented in Standard C++^*. It is split into two files, which I will post seperatly as replys to this comment.

astar.h is a header file providing a template based implementation of the A* algorithm.

pfc4.cpp provides the nuts & bolts needed by astar.h to solve the problem.

Some example results follow:

? add-dictionary word Added 173529 words from word to dictionary in 1.453 seconds. ? connect make love make > lake > lave > love Done in 0 seconds. ? connect bread wheat bread > break > bleak > bleat > cleat > cheat > wheat Done in 0.047 seconds. ? connect resist futile resist > relist > relict > relics > relies > belies > bevies > levies > levees > levels > revels > refels > refell > refill > refile > retile > rutile > futile Done in 3.937 seconds. ? connect effaces cabaret effaces > effaced > enfaced > enlaced > unlaced > unladed > unfaded > unfaked > uncaked > uncakes > uncases > uneases > ureases > creases > cresses > tresses > trasses > brasses > brashes > braches > beaches > benches > bunches > buncoes > bunkoes > buckoes > buckles > huckles > heckles > deckles > deciles > defiles >refiles > reviles > reviled > reveled > raveled > ravened > havened > havered > wavered > watered > catered > capered > tapered > tabered > tabored > taboret > tabaret > cabaret Done in 34 seconds. ? connect breads cheese breads > breaks > creaks > creaky > creasy > crease > creese > cheese Done in 0.109 seconds. ? connect driving tickets driving > droving > proving > prosing > prising > peising > peining > paining > panning > panting > patting > matting > mattins > lattins > lattens > rattens > ratters > ranters > rankers > rackers > rackets > rickets > tickets Done in 6.734 seconds. ? exit

^* Tested on MSVC++ v6. Let me know if you have problems with other compilers.

---
Thad

"The most savage controversies are those about matters as to which there is no good evidence either way." - Bertrand Russell

astar.h by codemonkey_uk, 04/26/2002 08:48:21 AM EST (5.00 / 1)

fuckin' scoop by codemonkey_uk, 04/26/2002 08:57:57 AM EST (5.00 / 1)

astar.h (2nd attempt) by codemonkey_uk, 04/26/2002 09:01:55 AM EST (5.00 / 1)

a* code by Cal Bunny, 04/27/2002 03:40:25 PM EST (4.00 / 1)

What modifications? by codemonkey_uk, 04/28/2002 12:08:04 PM EST (none / 0)

pfc4.cpp by codemonkey_uk, 04/26/2002 09:03:50 AM EST (5.00 / 1)

Stick to the spec for output -- automated tester by tmoertel, 04/26/2002 10:30:08 AM EST (5.00 / 1)

Okay by codemonkey_uk, 04/26/2002 10:42:35 AM EST (5.00 / 1)

Updated formatting script by obsidian head, 04/26/2002 09:59:30 AM EST (5.00 / 1)

Thanks, but by codemonkey_uk, 04/26/2002 10:11:16 AM EST (5.00 / 1)
Another k5 code formatter by codemonkey_uk, 04/29/2002 09:31:34 AM EST (none / 0)

g++ doesn't like it at all by jacob, 04/26/2002 02:41:41 PM EST (5.00 / 1)

astar.h has been tested on g++ ... by codemonkey_uk, 04/26/2002 06:42:29 PM EST (5.00 / 1)

g++ by ucblockhead, 04/26/2002 09:25:43 PM EST (5.00 / 1)

pfc4.cpp v2 by codemonkey_uk, 04/29/2002 07:11:12 AM EST (none / 0)

buggy by codemonkey_uk, 04/29/2002 09:30:09 AM EST (none / 0)

pfc4.cpp v3 by codemonkey_uk, 04/29/2002 09:28:20 AM EST (none / 0)

My C Submission (5.00 / 1) (#126)
by Logan on Fri Apr 26, 2002 at 11:53:10 AM EST

Here is my C submission. It is far, far faster than my Haskell submission. I will post the code here, and post some timing information and description of my approach (which is quite simple) in a followup.

First, some simple build instructions. I compiled and tested on my own computer with the following command line:

gcc -g -O2 -Wall -o pfc4 pfc4.c

In theory, the compiler and flags should not matter. I've supposedly written some fairly portable code here.

#include <stdio.h> #include <stdlib.h> #include <string.h> /* Uncomment the following line to have timing information displayed */ #define TIMING #ifdef TIMING #include <sys/time.h> #endif #define MAX_LENGTH 64 /* dict[len] is a sorted array of (len+1)-letter words * ndict[len] is number of words in dict[len] * sdict[len] is number of bytes allocated for dict[len] */ char ***dict; int *ndict, *sdict; /* Initialize dict, ndict, and sdict for first call of add_dictionary */ void new_dict(void) { int i; dict = malloc(MAX_LENGTH * sizeof(char **)); ndict = calloc(MAX_LENGTH, sizeof(int)); sdict = malloc(MAX_LENGTH * sizeof(int)); /* We start each dictionary with a size of 64 and let it grow */ for(i = 0; i < MAX_LENGTH; i++) dict[i] = malloc((sdict[i] = 64) * sizeof(char *)); } /* Free up memory used by previous dictionary when new dictionary is added */ void release_dict(void) { int i, j; for(i = 0; i < MAX_LENGTH; i++) { for(j = 0; j < ndict[i]; j++) free(dict[i][j]); } memset(ndict, 0, MAX_LENGTH * sizeof(int)); } /* Wrapper around strcmp to do some pointer dereferencing */ int cmp(const void *a, const void *b) { return strcmp(*(char **)a, *(char **)b); } int add_dictionary(const char *filename) { FILE *f = fopen(filename, "r"); char line[256], *p; int len; if(!f) return 0; if(dict) release_dict(); else new_dict(); while(fgets(line, sizeof(line), f)) { if((p = strchr(line, '\n'))) *p = 0; len = strlen(line) - 1; if(ndict[len] >= sdict[len]) // need to resize array dict[len] = realloc(dict[len], (sdict[len] <<= 1) * sizeof(char *)); dict[len][ndict[len]++] = strcpy(malloc(len + 2), line); } for(len = 0; len < MAX_LENGTH; len++) qsort(dict[len], ndict[len], sizeof(char **), cmp); return 1; } /* Simple binary search */ int lookup(int len, const char *s1) { int left = 0, right = ndict[len], mid; if(!right) return -1; while(left != right - 1) { mid = (left + right) >> 1; if(strcmp(s1, dict[len][mid]) < 0) right = mid; else left = mid; } if(!strcmp(s1, dict[len][left])) return left; return -1; } /* Does a breadth-first search, to see if y can be reached from x */ int connection(int len, int x, int y) { int *pred, *queue, *path; int h = 0, t = 1, i, w, z, npath; char buf[256], oldc; #ifdef TIMING struct timeval tvs, tve; gettimeofday(&tvs, 0); #endif /* If the words are the same, avoid allocating all these buffers */ if(x == y) { printf("%s\n", dict[len][x]); return 1; } pred = malloc(ndict[len] * sizeof(int)); queue = malloc(ndict[len] * sizeof(int)); memset(pred, -1, ndict[len] * sizeof(int)); /* The queue, over the interval of indices [h, t), represents the frontier */ queue[0] = x; while(h != t) { z = queue[h++ % ndict[len]]; if(z == y) goto success; strcpy(buf, dict[len][z]); /* transform buf into all possible strings that differ from the original * by exactly one letter. For all transformations that correspond to an * actual word that has not yet been visited or added to the frontier, * we note that the original word was its predecessor and append it to * the queue */ for(i = 0; buf[i]; i++) { oldc = buf[i]; for(buf[i] = 'a'; buf[i] <= 'z'; buf[i]++) if(oldc != buf[i] && (w = lookup(len, buf)) > 0 && pred[w] == -1) { pred[w] = z; if(w == y) goto success; queue[t++ % ndict[len]] = w; } buf[i] = oldc; } } /* We only reach this point if the search fails */ free(pred); free(queue); #ifdef TIMING gettimeofday(&tve, 0); tve.tv_usec -= tvs.tv_usec; tve.tv_sec -= tvs.tv_sec; while(tve.tv_usec < 0) { tve.tv_usec += 1000000; tve.tv_sec--; } printf("%d.%03d seconds\n", tve.tv_sec, tve.tv_usec / 1000); #endif return 0; success: /* We now build our path from y to x by following predecessors. Then * we traverse it in reverse order for the output. */ path = malloc(ndict[len] * sizeof(int)); npath = 0; while(y != x) y = pred[path[npath++] = y]; path[npath] = x; printf("%s", dict[len][x]); for(i = npath; i--;) printf(" > %s", dict[len][path[i]]); printf("\n"); #ifdef TIMING gettimeofday(&tve, 0); tve.tv_usec -= tvs.tv_usec; tve.tv_sec -= tvs.tv_sec; while(tve.tv_usec < 0) { tve.tv_usec += 1000000; tve.tv_sec--; } printf("%d.%03d seconds\n", tve.tv_sec, tve.tv_usec / 1000); #endif free(pred); free(queue); free(path); return 1; } /* Frontend to connection */ void connect(const char *s1, const char *s2) { int len = strlen(s1), x, y; if(len-- != strlen(s2) || (x = lookup(len, s1)) < 0 || (y = lookup(len, s2)) < 0 || !connection(len, x, y)) printf("No way to connect %s to %s\n", s1, s2); } int main(void) { char line[256], command[64], arg1[64], arg2[64]; int n; printf("? "); fflush(stdout); while(fgets(line, sizeof(line), stdin)) { n = sscanf(line, "%s %s %s\n", command, arg1, arg2); if(n > 1 && !strcmp(command, "add-dictionary")) { if(add_dictionary(arg1)) printf("Added %s\n", arg1); else printf("Error adding %s\n", arg1); } else if(n > 2 && !strcmp(command, "connect")) connect(arg1, arg2); else if(!strcmp(command, "exit")) return 0; else printf("Huh?\n"); printf("? "); fflush(stdout); } return 0; }

Logan

Analysis and stuff by Logan, 04/26/2002 12:19:18 PM EST (5.00 / 1)

I used something similar by KWillets, 04/28/2002 11:27:55 PM EST (none / 0)

One appropriate way to hash the words by Cal Bunny, 04/29/2002 03:17:48 AM EST (none / 0)

Agreed on most points by KWillets, 04/29/2002 03:40:12 PM EST (none / 0)

Bugfix by Logan, 04/26/2002 04:04:16 PM EST (5.00 / 1)

question (5.00 / 1) (#128)
by ucblockhead on Fri Apr 26, 2002 at 01:40:01 PM EST

Can we assume that only one dictionary will be loaded at a time?
-----------------------
This is k5. We're all tools - duxup

I'm not sure exactly what you mean by jacob, 04/26/2002 02:01:40 PM EST (5.00 / 1)

damn by ucblockhead, 04/26/2002 02:18:34 PM EST (5.00 / 1)

Don't start over! by Cal Bunny, 04/27/2002 03:25:28 PM EST (4.00 / 1)

starting over by ucblockhead, 04/27/2002 03:53:59 PM EST (5.00 / 1)

chart search by Cal Bunny, 04/27/2002 05:17:07 PM EST (4.00 / 1)

You are joking, right? by Cal Bunny, 04/26/2002 04:04:50 PM EST (4.00 / 1)

For you mind warping pleasure. (4.00 / 1) (#136)
by Cal Bunny on Sat Apr 27, 2002 at 05:09:17 AM EST

I only had about 10 minutes to work on a solution, so this is what you would call a first draft. It completely recalculates the transition matrices each time a new dictionary is added (I will fix this to only recalculate what is necesary). To make it worse, it overcalculates the transition matrices (does twice the amount of work necessary), too. It uses a simple breath-first search (I plan to improve it to only keep track of the fringe and maybe move it to dual-*).

On my PII-233 w/128M running Win2000 Workstation it takes a while to load the files and calculate the adjacency matrices. After that, though most queries run in under a second.

/-------------- group by count, 1 equal sum cross-product -------------- w:() add_dictionary:{w,:0:x;g::group w;a::transitions'g} group:{x@((=n),,!0)(?n)?/:!1+|/n:#:'x} transitions:{&:'1=+/''~x=\:/:x} /-------------- intersection scan reverse union scan ---------------- connect:{`0:{(-#y)_,/x,\:y}[co[x;y];" > "],"\n"} co:{A::a[#x]; G::g[#x]; G@h,|bfs[h:G?x;G?y]} bfs:{y{*x@&(x:A x)_lin y}\1_|(~y _in)(?,/A@)\A x} /--------------- command processor ------------------------------------ c:""; exit:{}; sp:{1_'(&x=*x)_ x:y,x}[;" "] while[~"exit"~*c;`0:"? ";c:sp _ssr[;"-";"_"](0:`);(.*c). 1_ c] \\

^cb^
Kudos to you for warping my fragile little mind. - communist

Um, what kind of C is that? by KWillets, 04/28/2002 08:09:42 PM EST (none / 0)

It is not C by Cal Bunny, 04/29/2002 03:26:41 AM EST (none / 0)

Ah, good by KWillets, 04/29/2002 03:46:10 PM EST (none / 0)

scarily by Cal Bunny, 04/30/2002 03:04:52 AM EST (none / 0)

Program malfunction by caine, 05/02/2002 10:24:57 AM EST (none / 0)

Ooops. wrong version. Thank You. by Cal Bunny, 05/02/2002 02:13:44 PM EST (none / 0)

Scoop did something awful to your code by tmoertel, 05/02/2002 03:52:16 PM EST (none / 0)

Quickie Prototype (5.00 / 1) (#137)
by KWillets on Sat Apr 27, 2002 at 02:55:53 PM EST

This is a working prototype, but my mouse (or the motherboard it's connected to) keeps failing, so I'll post quick before I lose it again. Sorry about the formatting. It's hard-coded with a testcase until I manage to download a dictionary.

That's perl, by the way.
===
my %dictionary = ( "cab" => 1, "cat" => 1, "cot" => 1, "can" => 1, "ban" => 1, "bun" => 1 );
my $startword = "cab";
my $endword = "bun";
my %unvisited = %dictionary;
my %frontier = ( $startword => "");
delete $unvisited{ $startword };
my %newfrontier;
my $src, $path;

print "start = $startword / end = $endword \n";
#
# Breadth-first search
# The first visit to a word is minimal distance; later visits can be ignored
# Here each word is deleted as soon as it is visited
# on each iteration, frontier set is used to generate next frontier set
# join method is to generate candidates from src string and test in hash
# each hit is moved into new frontier set with path from src
# stop when frontier reaches endword, or frontier is empty
#
while( %frontier && ! exists ( $frontier{ $endword } )) {
while( ($src, $path) = each %frontier ) {
   for( my $i = 0; $i < length($src); $i++ ) {
    for ( $c = ord "a", my $tst = $src; $c <= ord "z"; $c++ ) {
      substr( $tst, $i, 1 ) = chr $c;
      if( delete $unvisited{ $tst } ) {
       $newfrontier{ $tst } = "$path\n$src" ;
      }
    }
   }
}
%frontier = %newfrontier;
%newfrontier = ();
}

print "DONE:\n $frontier{ $endword }\n$endword\n\n";

Working version by KWillets, 04/29/2002 03:54:09 PM EST (none / 0)

Another interesting test case (none / 0) (#141)
by tmoertel on Sat Apr 27, 2002 at 04:48:25 PM EST

Here's a variant on i's test case. This one attempts to measure how efficiently an implementation handles reverse and repeated queries, i.e., how smart it is at caching.

add-dictionary WORD.LST connect make love connect bread wheat connect resist futile connect effaces cabaret connect make love connect bread wheat connect resist futile connect effaces cabaret connect love make connect wheat bread connect futile resist connect cabaret effaces connect love make connect wheat bread connect futile resist connect cabaret effaces connect make love connect bread wheat connect resist futile connect effaces cabaret connect love make connect wheat bread connect futile resist connect cabaret effaces connect make love connect bread wheat connect resist futile connect effaces cabaret connect love make connect wheat bread connect futile resist connect cabaret effaces connect wood fire connect climb rocks connect stones chisel connect amperes voltage connect wood fire connect climb rocks connect stones chisel connect amperes voltage connect fire wood connect rocks climb connect chisel stones connect voltage amperes connect fire wood connect rocks climb connect chisel stones connect voltage amperes connect wood fire connect climb rocks connect stones chisel connect amperes voltage connect fire wood connect rocks climb connect chisel stones connect voltage amperes connect wood fire connect climb rocks connect stones chisel connect amperes voltage connect fire wood connect rocks climb connect chisel stones connect voltage amperes exit

Note that the new test case asks for sixteen times as many connections as the original, and that there is no connection between "chisel" and "stones" and no connection between "amperes" and "voltage".

As a basis for comparison, on a Celeron 433 my implementation took 28.1 user seconds for i's orignal test case and 63.5 user seconds for 16-times variant. Each measurement was taken via bash's time built-in command, as follows:

time ./tgm-word-connect < bench16x.tst

and I cited the "user" line from the results:

real 1m4.435s user 1m3.480s sys 0m0.960s

Anyone care to try the test and post the results? I'm especially curious about how the C/C++ and Perl implementations fare.

--
My blog | LectroTest

[ Disagree? Reply. ]

My time... by Logan, 04/27/2002 09:33:53 PM EST (none / 0)
Repeated searches by Cal Bunny, 04/28/2002 03:09:47 AM EST (4.00 / 1)

Good point by tmoertel, 04/28/2002 11:52:34 AM EST (none / 0)

caching endpoints by ucblockhead, 04/28/2002 12:29:47 PM EST (none / 0)

not cheating by jacob, 04/28/2002 12:43:57 PM EST (5.00 / 1)
No, your design seems reasonable by tmoertel, 04/28/2002 01:02:40 PM EST (none / 0)

Speed.. by geekmug, 04/30/2002 07:19:19 PM EST (none / 0)

Mine does this by Cal Bunny, 04/30/2002 07:44:51 PM EST (none / 0)

177.39 user :( by i, 04/28/2002 03:42:03 AM EST (5.00 / 1)
my timing by svillee, 04/28/2002 10:02:55 AM EST (none / 0)
Memory Usage vs CPU usage? by codemonkey_uk, 04/28/2002 12:14:36 PM EST (none / 0)

Good point by jacob, 04/28/2002 12:32:24 PM EST (none / 0)

Time by codemonkey_uk, 04/29/2002 09:40:14 AM EST (none / 0)

A supplemental dict: 6926 words not in WORD.LST (none / 0) (#155)
by tmoertel on Sun Apr 28, 2002 at 01:50:05 PM EST

Here are 6926 additional words, none of which are in the WORD.LST I linked to earlier:

WORD2.LST.bz2

Here's an example of it in action:

$ ./tgm-word-connect ? add-dictionary WORD.LST Added WORD.LST ? connect aba ode aba > ala > ale > ole > ode ? add-dictionary WORD2.LST Added WORD2.LST ? connect aba ode aba > abe > obe > ode ? exit

--
My blog | LectroTest

[ Disagree? Reply. ]

Ok I think I finally got it right. (none / 0) (#156)
by i on Sun Apr 28, 2002 at 02:49:33 PM EST

Runs in constant (log actually but who cares) space, and speed is on par with tmoertel's, so I consider my job done :)

Thanks to wonders of laziness (tying a knot) and strictness (foldl').

-- PFC4 entry -- Username: 'i' -- Version: 0.3 -- Compilation (on Un*x): ghc -O2 ladder.hs -o ladder -package data -- You probably need at least GHC5.0x -- Run: ladder [+RTS <runtime system switches>] -- Runtime switches help: ladder +RTS -\? -- Fiddling with them may improve performance, but only marginally -- -- Interpreting -- With Hugs (SLOW): hugs ladder.hs -h20000000 -- With GHCi: ghci ladder.hs -package data -- then type `main' at the command prompt module Main where import FiniteMap hiding (addListToFM, addListToFM_C) import Maybe import IO import IOExts import qualified SimpleQueue as Q import qualified Set as S import List -- graph data structure -- each vertex has a label and a set (list) of adjacent vertices data Vertex = Vertex String [Vertex] -- useful instances instance Eq Vertex where (==) (Vertex x _) (Vertex y _) = x == y instance Ord Vertex where compare (Vertex x _) (Vertex y _) = compare x y (<=) (Vertex x _) (Vertex y _) = x <= y -- This is a secondary data structure: maps splits to list of words -- like c?t => [cat,cut] type SplitMap = FiniteMap String [String] -- This is the primary data structure -- maps word to vertex of the graph type WordMap = FiniteMap String Vertex -- all words in a SmallWordDB are of equal length type SmallWordDB = (SplitMap, WordMap) -- this maps word length to a SmallWordDB type WordDB = FiniteMap Int SmallWordDB -- -- Main logic and algorithms -- -- addToWordDB adds list of words to the database addToWordDB :: WordDB -> [String] -> WordDB addToWordDB db ws = plusFM_C combineSmallWordDBs db db' where db' :: WordDB db' = mapFM (\_->listToSmallWordDB) (partitionWords ws) -- utility for addToWordDB -- combines two graps into a single graph combineSmallWordDBs :: SmallWordDB -> SmallWordDB -> SmallWordDB combineSmallWordDBs (sm1, wm1) (sm2, wm2) = (sm, wm) where sm = plusFM_C (++) sm1 sm2 wm' = plusFM_C ev wm1 wm2 wm = rekeyWordMap wm wm' sm ev (Vertex x _) _ = Vertex x [] -- utility for addToWordDB -- partitionWords partitions the input list by length -- the output is a map from length to list of words partitionWords :: [String] -> FiniteMap Int [String] partitionWords words = addListToFM_C (flip (++)) emptyFM words' where words' = map (\x->(length x, [x])) words -- utility for addToWordDB -- build a graph -- given a list of words, build a map from words to vertices listToSmallWordDB :: [String] -> SmallWordDB listToSmallWordDB ws = (splitMap, realWordMap) where dummyWM = addListToFM emptyFM (map (\x->(x, Vertex x [])) ws) splitMap = addListToFM_C (flip (++)) emptyFM (splitws ws) realWordMap = rekeyWordMap realWordMap dummyWM splitMap splitws ws = concatMap splits' ws splits' w = zip (splits w) (repeat [w]) -- tying the knot! -- build a map from words to vertices -- where vertex is a word and a list of adjacent vertices -- this definition is recursive, so we can walk the graph -- directly, without having to look up e.g. edges from a -- vertex rekeyWordMap :: WordMap -> WordMap -> SplitMap -> WordMap rekeyWordMap newWM oldWM sm = mapFM rekeyWord oldWM where rekeyWord :: String -> Vertex -> Vertex rekeyWord x _ = Vertex x (map head $ group $ sort $ vertices x) vertices x = catMaybes $ map (lookupFM newWM) (adjs x) adjs :: String -> [String] adjs x = adjacents sm x -- given a word and a map from splits to lists of words, -- return all words adjacent to given one adjacents :: SplitMap -> String -> [String] adjacents sdb word = concatMaybes $ map (lookupFM sdb) (splits word) where concatMaybes = concat . catMaybes -- return all splits of a word: cat => [?at, c?t, ca?] splits :: String -> [String] splits [] = [] splits (x:xs) = ('?':xs) : map (x:) (splits xs) -- implementation of add-dictionary addDict :: FilePath -> WordDB -> IO WordDB addDict path db = catch addDict' report where addDict' = do handle <- openFile path ReadMode contents <- hGetContents handle let db' = addToWordDB db (words contents) putStrLn $ "Added " ++ path return db' report _ = putStrLn ("Error adding " ++ path) >> return db -- implementation of connect connect :: WordDB -> String -> String -> IO () connect db word1 word2 = do let path = findPath word1 word2 db case path of [] -> putStrLn $ "No way to connect " ++ word1 ++ " to " ++ word2 xs -> putStrLn $ showPath xs where showPath = foldr1 (\a b -> a ++ " > " ++ b) -- findPath is a utility for connect findPath :: String -> String -> WordDB -> [String] findPath from to db | len /= len' = [] | isNothing from' = [] | isNothing to' = [] | otherwise = mapv $ findChain (\(Vertex x vs) -> vs) from'' to'' where (sm, wm) = lookupWithDefaultFM db (emptyFM,emptyFM) len len = length from len' = length to from' = lookupFM wm from to' = lookupFM wm to from'' = fromJust from' to'' = fromJust to' mapv = map (\(Vertex x _) -> x) -- findChain is a utility for findPath findChain :: Ord a => (a->[a]) -> a -> a -> [a] findChain f from to = reverse $ first chain where chain = dropWhile notFound $ bfsPaths $ toPTree f from first (x:_) = x first [] = [] notFound [] = True notFound xs = to /= head xs -- Main loop doCommands :: WordDB -> IO WordDB doCommands db = do putStr "? " hFlush stdout command <- getLine case (words command) of ['e':_] -> return db ['a':_, path] -> addDict path db >>= doCommands ['c':_, word1, word2] -> connect db word1 word2 >> doCommands db _ -> doCommands db main :: IO () main = catch (doCommands emptyFM >> return ()) (\_->return ()) -------------------------------------------------------------- -- -- In the ideal world this would be in a separate module -- -- Parented tree, so we can walk from a node back to root data PTree a = PNil | PTree (PTree a) a [PTree a] -- Functional tree to parented tree toPTree :: (a->[a]) -> a -> PTree a toPTree f a = toPTree' f PNil a where toPTree' f p a = tree where tree = PTree p a trees trees = map (toPTree' f tree) (f a) -- The BFS itself bfsPaths :: Ord a => PTree a -> [[a]] bfsPaths PNil = [] bfsPaths tree@(PTree parent node branches) = map walkPath paths where (paths,_) = bfs (Q.single tree) (S.emptySet) bfs :: Ord a => Q.Seq (PTree a) -> S.Set a -> ([PTree a], S.Set a) bfs queue set = case Q.lview queue of Q.Nothing2 -> ([], set) Q.Just2 (tree@(PTree p n bs)) queue' -> if S.elementOf n set then bfs queue' set else (tree : trees, set'') where (trees, set'') = bfs (foldl' Q.snoc queue' bs) set' set' = S.addToSet set n Q.Just2 (PNil) queue' -> bfs queue' set walkPath :: PTree a -> [a] walkPath PNil = [] walkPath (PTree p a _) = a:walkPath p -- -- Misc utilities -- -- strict variant of addListToFM, addListToFM_C -- replace those found in the standard FiniteMap addListToFM :: Ord a => FiniteMap a b -> [(a,b)] -> FiniteMap a b addListToFM fm pairs = addListToFM_C (\old new -> new) fm pairs addListToFM_C :: Ord a => (b -> b -> b) -> FiniteMap a b -> [(a,b)] -> FiniteMap a b addListToFM_C combiner fm key_elt_pairs = foldl' add fm key_elt_pairs -- foldl adds from the left where add fmap (key,elt) = addToFM_C combiner fmap key elt -- strict variant of foldl foldl' :: (a -> b -> a) -> a -> [b] -> a foldl' f a [] = a foldl' f a (x:xs) = (foldl' f $! f a x) xs
^{^{—
and we have a contradicton according to our assumptions and the factor theorem}}

Optimizations (none / 0) (#157)
by ucblockhead on Sun Apr 28, 2002 at 05:25:22 PM EST

I found a cool optimization I suspect people may have missed. The version I am working on (and will post as soon as I clean it up) used lots of data structures with strings as keys. String compares are really expensive, though. So I created a separate structure that mapped integers to strings (which works as long as you have less than 2 billion words or so :-) ). I then use these integer handles in all my other data structures.

This doubled the program's speed. It worked well because very little of the algorithm cares about whether or not it is dealing with words. Only the part that checks how "different" two words are, really. And that can be cached.

(I suspect that this would be even easier in C, as you could simply used the character pointer as the "handle".)
-----------------------
This is k5. We're all tools - duxup

Yes, look at mine (*laugh*) by Cal Bunny, 04/28/2002 05:32:42 PM EST (4.50 / 2)

Idea for Optimizing for Large Number of Searches (4.00 / 1) (#159)
by Cal Bunny on Sun Apr 28, 2002 at 06:05:44 PM EST

If you think that there will be a large number of searches that walk through most of the graph, I had an interesting idea. Of course a more asymptotically optimal solution would be a chart search, but a simplier solution that requires very little memory would be a variant of an interated deepening depth first search.

In an iterated deepening DFS, you run a DFS search with a maximum depth of 1, if you do not find the goal, you run a new DFS with a depth of 2, and this continues until you find the goal state. An ID-DFS search takes advantage large branching factors. If each node connects to just 2 deeper nodes, then half of your work is done on the last expansion. This is a good solution for simple problems and instances where you havd a hard cutoff. A DFS can be substantially faster than a BFS: it will consume only memory linear to the length of the path, you do not need to record an open list, and you do not need to explicitely record the current shortest path since it is sitting on the stack (if you want to make an incredibly dumb DFS you don't even need to record a closed set, either).

On dictionary load you could run Kruskal's all-pairs shortest path, that is O(n^3) -- for comparison, Dijkstra's single-source shortest path is O(n^2). To solve a query, you lookup what the depth of the shortest path is, then run a DFS search with the correct cutoff. To help direct the DFS you can order the list of potential expansions by the number of characters both strings have in common. On a sparse graph it is easy to see that with a good next exapansion strategy you coule get results close to linear in the length of the string and certainly no worse than O(word_length*avg_braching_factor/2).

Since determining all-pairs shortest path is stronger than the transitive closure, all queries where there is not answer are known immediately.

One of the best preformance gains for ID-DFS is to remember the results of the previous expansions. If you were to remember the results of every expansion you turn into BFS/A*. (Well, you don't turn into anything, but the algorithm expands the same nodes that a BFS/A* would expand).

^cb^

My algorithm notes by KWillets, 04/28/2002 08:06:45 PM EST (none / 0)
On the other hand . . . by tmoertel, 04/29/2002 01:29:49 PM EST (none / 0)

nice work by Cal Bunny, 04/29/2002 02:38:40 PM EST (4.00 / 1)

Further analysis by tmoertel, 04/29/2002 03:50:17 PM EST (none / 0)

My submission (none / 0) (#170)
by Hopfrog on Mon Apr 29, 2002 at 11:47:26 AM EST

Hell, here is me submission. It doesn't work (apart from on that small collection of words in the sample), and is incredibly mind-boggling slow - about 30 minutes for large dics.

I had to go to military camp, so this is the most I could do.

C++ and STL, uses some windows specific stuff that can be defined away, will compile with msvc++6. I use structs the C++ way, not the C way. If you enable time checking, you need to link the winmm.lib file in.

I made everything inline to make it a bit neater, not for any idealogical reasons.

//


#include "stdafx.h"

  #include <iostream>

  #include <string>

  #include <list>

  #include <fstream>

  #include <crtdbg.h>

  #include <windows.h>

  #include <mmsystem.h>
using namespace std;
#define STR_ADD_DIC "add-dictionary"

  #define STR_CONNECT "connect"

  #define MAX_NODE    2048

  #define SAFE_DELETE(x) if(x) { delete x; x = NULL; }

  #define SIMILARITY_DEVIANCE 2

  #define MEASURE_TIME
#pragma warning (disable : 4786)
struct DictionaryWord

  {

              string strWord;

              bool   bInChain;

  };
struct WordTree

  {

              WordTree() : nSubTreeCount(0), iterWord(NULL)

              {

              }
            int AddSubTree(string& strTreeWord, list<DictionaryWord>::iterator 
  iter)

              {
                        arrSubTree[nSubTreeCount] = new WordTree();

                          arrSubTree[nSubTreeCount]->strWord = strTreeWord;

                          arrSubTree[nSubTreeCount]->iterWord = iter;

                          nSubTreeCount++;

                          return (nSubTreeCount - 1);

              };
            void DestroyTree()

              {

                          // We loop through all trees we created with "new" 
  and delete them
                        for (int i=0;i<nSubTreeCount;i++)

                          {

                                     arrSubTree[i]->DestroyTree();

                                     SAFE_DELETE(arrSubTree[i]);

                          }
                        if (iterWord != NULL)

                              (*iterWord).bInChain = false;

              };
            string strWord;

              list<DictionaryWord>::iterator iterWord;

              int nSubTreeCount;

              WordTree* arrSubTree[MAX_NODE];

  };
class Dictionary

  {

  public:

              bool LoadDictionary(const string& strFilepath)

              {

                          // Load the dictionary into memory.

                          // Words are stored in a list
                        string str;                    
                        ifstream file;

                          file.open(strFilepath.c_str());
                        if (!file.is_open())

                                     return false;
                        while (file.good())

                          {

                                     char szLine[120] = {'\0'};

                                     file.getline(szLine, 120);                                   

                                   if (strlen(szLine) > 0)

                                     {

                                                 DictionaryWord wrd;

                                                 wrd.strWord = szLine;

                                                 wrd.bInChain = false;

                                                 m_wordList.push_back(wrd);

                                     }

                          }
                        file.close();

                         return true;

              }
            bool GetSimilarWords(WordTree& tree, bool bVerifyWord)

              {
                       // bVerifyWord, if set to true, makes us check if

                          // the word exists at all in the dictionary. You need

                          // set it to true only once, as all other words definately

                          // exist in the dictionary.

                          bool bWordFound = false; 

                          bool bChildFound = false;

                          int nIndex = 0;


                         list<DictionaryWord>::iterator iter;

                          for (iter=m_wordList.begin(); iter != m_wordList.end(); 
  ++iter)

                          {          

                                     if ((*iter).bInChain)

                                                 continue;
                                   if (bVerifyWord && !bWordFound && 
  tree.strWord == (*iter).strWord) // lazy eval

                                     {

                                                 tree.iterWord = iter;

                                                 bWordFound = true;

                                     }

                                     else

                                     {           

                                      int n = AreSimilar(tree.strWord, (*iter).strWord);
                                               if (n == 1)

                                                 {

                                                             bChildFound = true;

                                                             tree.AddSubTree((*iter).strWord, 
  iter);

                                                 }

                                     }
                                   nIndex++;

                          }
                        

                          if (bChildFound == false || bVerifyWord && bWordFound 
  == false)

                                     return false;
                        return true;

              };
protected:
            int AreSimilar(string& str1, string& str2)

              {
                        // if the same, return 0, if similar, return 1, else

                          // return -1
                        

                          int nFail  = 0;

                          int nSize1 = str1.size();

                          int nSize2 = str2.size();

                          if (nSize1 != nSize2)

                                     return -1;

  

                          for (int i=0;i<nSize1;i++)

                          {
                                   if (str1.at(i) != str2.at(i))

                                     {

                                                 nFail++;

                                     }
                                   if (nFail == SIMILARITY_DEVIANCE) // failed 
  twice.

                                                 return -1;

                          }
                        return nFail; //1 -> similar, 0 -> equal

              }
private:

              list<DictionaryWord> m_wordList;

  };
class CommandParser

  {
public:

              typedef enum {cmdUnkown = 0, cmdAddDictionary, cmdConnect, cmdExit 
  } Command;
            

              Command ParseCommand(string strInput)

              {

                          if (strInput.compare(0, sizeof(STR_ADD_DIC) - 1, STR_ADD_DIC) 
  == 0)

                          {

                                     return cmdAddDictionary;

                          }

                          else if (strInput.compare(0, sizeof(STR_CONNECT) - 1, 
  STR_CONNECT) == 0)

                          {

                                     return cmdConnect;

                          }

                          else if (strInput == "exit")

                          {

                                     return cmdExit;

                          }

                          

                          return cmdUnkown;

              }
            bool ParseAddDictionary(string strInput, string& strFilepath)

              {

                          // We have a command, which idealy would look like

                          // "add-dictionary c:\dic.txt".

                          

                          if (strInput.size() <= sizeof(STR_ADD_DIC))

                                     return false;
                        strFilepath = strInput.substr(sizeof(STR_ADD_DIC));


                        // We don't perform a check for multiple characters

                          // in this case, because spaces are allowed in paths.

                          

                          return true;

              }
            

              bool ParseConnect(string strInput, string& strStartWord, string& 
  strEndWord)

              {

                          // We have a command, which idealy would look like

                          // "connect coat hook". However, it might 
  be typed wrongly

                          // so we also check that

                          if (strInput.size() <= sizeof(STR_CONNECT))

                          {

                                     // The user typed "connect". This 
  would cause

                                     // an access violation if we do the next 
  call,

                                     // so we return here.

                                     return false;

                          }

                          

                          // remove "connect"

                          strInput = strInput.substr(sizeof(STR_CONNECT));

                          

                          

                          // Find the space char separating the first word

                          // from the second

                          string::size_type nPos;

                          nPos = strInput.find(' ');

                          if (nPos == string::npos)

                          {

                                     // If there is no space, we only have one 
  word,

                                     // so we indicate an error

                                     return false;

                          }

                          

                          // We retrieve the two words

                          strStartWord = strInput.substr(0, nPos);

                          strEndWord  = strInput.substr(nPos + 1);

                          

                          // We check if there are more than two parameters

                          // if so, we indicate an error.

                          if (strEndWord.find(' ') != string::npos)

                          {

                                     return false;

                          }

                          return true;

              }

  };

  class WordChain : public Dictionary

  {

  public:

              static bool ConnectWords(string strStartWord, string strEndWord, 
  string& strChain, Dictionary& dic)

              {

                          bool bChained = false;

                          strChain = strStartWord + " > ";

                          WordTree tree;

                          tree.strWord = strStartWord;

                          

                          WordTree* pLink = GetNextLinksInChain(dic, tree, strEndWord, 
  bChained, true);

                          while (pLink && !bChained)

                          {

                                     strChain += pLink->strWord + " > 
  ";

                                     pLink = GetNextLinksInChain(dic, *pLink, 
  strEndWord, bChained, false);

                          }

                          

                          if (bChained && pLink)

                          {

                                     strChain += pLink->strWord + "\n";

                          }

                          

                          tree.DestroyTree();

                          return bChained;

              }

              static WordTree * GetNextLinksInChain(Dictionary& dic, WordTree& 
  tree, string& strEndWord, bool& bComplete, bool bVerifyWord)

              {

                          WordTree * pResult = NULL;

                          if (dic.GetSimilarWords(tree, bVerifyWord))

                          {

                                     (*tree.iterWord).bInChain = true;

                                     

                                     for (int i=0;i<tree.nSubTreeCount;i++)

                                     {

                                                 if (tree.arrSubTree[i]->strWord 
  == strEndWord)

                                                 {

                                                             bComplete = true;

                                                             return tree.arrSubTree[i];

                                                 }

                                                 if (dic.GetSimilarWords(*tree.arrSubTree[i], 
  false))

                                                 {

                                                             // tree.strWord is 
  definately in our chain

                                                             // We hide it so 
  that future lookups do not use

                                                             // it

                                                             (*tree.iterWord).bInChain 
  = true;

                                                             pResult = tree.arrSubTree[i];

                                                 }

                                     }

                          }

                          

                          return pResult;

              }

  };

  int main(int argc, char* argv[])

  {

              char szInput[200] = {'\0'};

              WordChain dic;

              CommandParser cmd;

              while (true)

              {

                          //////

                          // Display prompt and get command

                          cout << "?";

                          cin.get(szInput, 200, '\n'); 

                          cin.get();

                          

                          ////

                          // Parse command

                          CommandParser::Command cmdType = cmd.ParseCommand(szInput);

                          switch (cmdType)

                          {

                          case CommandParser::cmdUnkown:

                                     {          

                                                 cout << "\"" 
  << szInput << "\": This command does not exist. Use \"exit\" 
  to exit.\n";

                                     }

                                     break;

                                     /////////////////////////////////////

                          case CommandParser::cmdAddDictionary:

                                     {

                                                 string strFilepath;

                                                 if (cmd.ParseAddDictionary(szInput, 
  strFilepath) == true)

                                                 {

                                                             if (dic.LoadDictionary(strFilepath)) 

                                                           {

                                                                         cout 
  << "Added " << strFilepath << "\n";

                                                             }

                                                             else

                                                                         cout 
  << "Error adding " << strFilepath << "\n";

                                                 }

                                                 else

                                                             cout << "Invalid 
  parameters. Use \"add-dictionary filename\"\n";                      
              

                                     }

                                     break;

                                     /////////////////////////////////////    

                          case CommandParser::cmdConnect:

                                     {

                                                 string strStartWord, strEndWord, 
  strChain;

                                                 if (cmd.ParseConnect(szInput, 
  strStartWord, strEndWord) == true)

                                                 {

  #if defined(MEASURE_TIME)

                                                             DWORD dwStart = timeGetTime();

  #endif

                                                             if (dic.ConnectWords(strStartWord, 
  strEndWord, strChain, dic) == false)

                                                             {

                                                                         cout 
  << "No way to connect " << strStartWord << " 
  to " << strEndWord << "\n";

                                                             }

                                                             else

                                                             {

                                                                         cout 
  << strChain;

                                                             }

  #if defined(MEASURE_TIME)

                                                             DWORD dwEnd = timeGetTime();

                                                             cout << "Completed 
  in " << dwEnd - dwStart << " milliseconds\n";

  #endif

                                                 }

                                                 else

                                                             cout << "Invalid 
  parameters. Use \"connect word1 word2\"\n";

                                      }

                                     break;

                                     //////////////////////////////////

                          

                          case CommandParser::cmdExit:

                                     {

                                                 return 0;

                                     }

                                     break;

                                     //////////////////////////////////

                          }

              }

              return 0;

  }

Final version (none / 0) (#171)
by ucblockhead on Mon Apr 29, 2002 at 12:46:12 PM EST

Damn you to hell! I started out thinking "Hey, I'll just throw together a braindead version". Next thing you know, the hours I put in... Anyway, here's a nonbraindead C++/STL version:

#include <string> #include <vector> #include <map> #include <set> #include <queue> #include <iostream> #include <fstream> #include <algorithm> #include <functional> using namespace std; // Algorithm: // // There are two important data structures here. The first is a set of all // of the minimal paths between two words that the program knows about (Dictionary::myPaths). // The second is a priority queue of paths to combine (Dictionary::myWorkQueue). // // The basic algorithm goes like this: // // Create a path representing the starting word. (A one word path.) // Create a path for each of that word's neighbors (i.e. those that are only one letter // different.) // Search for all paths where the end of one is a neighbor of the other. O(N^2) // Put each such pair of paths in the work queue. // // While we have items in our work queue // Take the pair of paths out of the work queue that is the best guess and combine them. // If this is the target // Report success // If no other path with these endpoints exists [1] // Add it to our path list // For each path in our path list // Search for all paths that are a neighbor of its endpoint. O(N) // Put each such pair in the word queue. // // For this to work, the scoring mechanism has to ensure that shortest paths will be combined first. // The score for a potential combination is the shortest path from the target start to the target end // that it could be a part of. It is the lengths of both paths plus the number of differences // between each endpoint and the start and end targets. // // The following optimizations improve performance: // // Ties in scores are broken by the distance to the target. // Strings aren't used as keys in the main data structures. Instead, handles are used so // that all keys are integers. // The neighbors for a given word are only calculated once, and cached. // The list of known paths is never thrown out, so subsequent searches have more data to work with. // // [1] I don't do this when I put things in the work queue because it is relatively expensive // and because the scoring mechanism means that these combinations are less likely to be tested. // // Notes: // One danger of never throwing out path data is that I suspect that it could potentially slow // down significantly after large numbers of connects. // Much of the algorithm was designed so that searches didn't need to start from the beginning, // and an earlier version searched from both ends, hoping to meet in the middle. This turned // out to be slower in practice. // An earlier version also added all sublists of a path and reversed versions of all paths. This // also turned out to be slower in practice. // It should be obvious that I trade memory for speed at every opportunity. This is almost // certainly a memory hog. // I suspect I could get a little more speed by adding a third metric to the score, prefering // paths to words with lots of neighbors. typedef int WORDHDL; typedef set<WORDHDL> WordSet; // This class stores strings, and returns unique integer handles to those strings. class WordHdls { public: WORDHDL AddHandle(const string& aWord) { map<string,WORDHDL>::iterator it = myWordsLookup.find(aWord); if( it != myWordsLookup.end() ) return it->second; myWordsLookup.insert(map<string,WORDHDL>::value_type(aWord,myWords.size())); myWords.push_back(aWord); } WORDHDL GetHandle(const string& aWord) const { map<string,WORDHDL>::const_iterator it = myWordsLookup.find(aWord); if( it != myWordsLookup.end() ) return it->second; return -1; } inline const string& GetWord(WORDHDL hdl) const { return myWords[hdl]; } // This returns the set of all handles that are neighbors // to the given handle. Requests are cached, as this is // relatively expansive. const WordSet& GetNeighbors(WORDHDL hdl) { map<WORDHDL, WordSet>::iterator it = myNeighborSets.find(hdl); if( it != myNeighborSets.end() ) return it->second; string aWord = GetWord(hdl); WordSet neighbors; for(unsigned int i=0;i<aWord.size();i++) for(char ch='a';ch<='z';ch++) if( aWord[i] != ch ) { string str = aWord; str[i] = ch; if( myWordsLookup.find(str) != myWordsLookup.end() ) { neighbors.insert(GetHandle(str)); } } myNeighborSets.insert(map<WORDHDL,WordSet>::value_type(hdl, neighbors)); return myNeighborSets[hdl]; } inline int Distance(WORDHDL hdl1,WORDHDL hdl2) { return Distance(myWords[hdl1],myWords[hdl2]); } vector<string> GetPath(const vector<WORDHDL>& someHdls) { vector<string> rc; for(vector<WORDHDL>::const_iterator it = someHdls.begin();it!=someHdls.end();it++) rc.push_back(GetWord(*it)); return rc; } inline void Clear() { myNeighborSets.clear(); } private: int Distance(const string& str1, const string& str2) { int Diffs = 0; for(unsigned int i=0;i < str2.size();i++) if( str1[i] != str2[i] ) ++Diffs; return Diffs; } vector<string> myWords; map<string, WORDHDL> myWordsLookup; map<WORDHDL, WordSet> myNeighborSets; }; // A path between two words. class Link { public: inline Link(const vector<WORDHDL>& aPath) : myPath(aPath) {} inline Link(const WORDHDL& aWord) { myPath.push_back(aWord); } inline unsigned int GetLength() const { return myPath.size(); } inline const vector<WORDHDL>& GetPathStrings() const { return myPath; } inline WORDHDL GetStart() const { return *myPath.begin(); } inline WORDHDL GetEnd() const { return *myPath.rbegin(); } // Note that we rely on the caller to ensure this is valid. This is // for performance reasons. Link* Combine(const Link* Src) const { vector<WORDHDL> lhs = myPath; const vector<WORDHDL>& rhs = Src->myPath; lhs.insert(lhs.end(), rhs.begin(),rhs.end()); return new Link(lhs); } private: vector<WORDHDL> myPath; }; class Dictionary { public: ~Dictionary() { Clear(); } inline void AddWord(const string& aWord) { myWords.insert(myWordHdls.AddHandle(aWord)); } // Add any paths that can be combined with this one to the work queue void AddScored(Link* aLnkPtr, WORDHDL anEnd) { unsigned int aScore = aLnkPtr->GetLength(); WordSet neighbors = myWordHdls.GetNeighbors(aLnkPtr->GetEnd()); for(WordSet::iterator it=neighbors.begin();it!=neighbors.end();it++) { for(multimap<WORDHDL, Link*>::iterator jt = myStarts.lower_bound(*it); jt != myStarts.end() && jt->first == *it; jt++) { Link* anotherLnkPtr=jt->second; unsigned int aTargetScore = 0; int aDist = myWordHdls.Distance(anotherLnkPtr->GetEnd(), anEnd); aTargetScore = aScore + anotherLnkPtr->GetLength(); aTargetScore += aDist; myWorkQueue.insert(WorkQueue::value_type( Score(aTargetScore, aDist), LnkPr(aLnkPtr,anotherLnkPtr) ) ); } } } inline void AddLink(Link* aLnkPtr, WORDHDL anEnd) { myPaths.insert(PrLnkMap::value_type(WordPr(aLnkPtr->GetStart(),aLnkPtr->GetEnd() ),aLnkPtr)); myStarts.insert(multimap<WORDHDL, Link*>::value_type(aLnkPtr->GetStart(), aLnkPtr)); AddScored(aLnkPtr, anEnd); } void AddLink(WORDHDL aWord) { if( 0 == GetPath(aWord,aWord) ) { Link* aLnkPtr = new Link(aWord); myPaths.insert(PrLnkMap::value_type(WordPr(aWord,aWord),aLnkPtr)); myStarts.insert(multimap<WORDHDL, Link*>::value_type(aLnkPtr->GetStart(), aLnkPtr)); } } void AddNeighbors(Link* aLnkPtr) { WordSet neighbors = myWordHdls.GetNeighbors(aLnkPtr->GetEnd()); for(WordSet::iterator it=neighbors.begin();it!=neighbors.end();it++) { if( 0 == GetPath(*it,*it) ) { Link* anotherLnkPtr = new Link(*it); myPaths.insert(PrLnkMap::value_type(WordPr(*it,*it),anotherLnkPtr)); myStarts.insert(multimap<WORDHDL, Link*>::value_type(anotherLnkPtr->GetStart(), anotherLnkPtr)); } } } Link* GetPath(WORDHDL aStart, WORDHDL anEnd) { PrLnkMap::const_iterator it = myPaths.find(WordPr(aStart,anEnd)); if( it == myPaths.end() ) return NULL; else return it->second; } vector<string> Connect(const string& aStartStr, const string& anEndStr) { WORDHDL aStart = myWordHdls.GetHandle(aStartStr); WORDHDL anEnd = myWordHdls.GetHandle(anEndStr); // Unknown words already fail. if( aStart == -1 || anEnd == -1 ) return vector<string>(); // If we've already run into this path, return. This will only happen // on searches with the previous start (though not necessarily with the // previous end) if( Link* aLnkPtr = GetPath(aStart,anEnd) ) { return myWordHdls.GetPath(aLnkPtr->GetPathStrings()); } AddLink(aStart); AddNeighbors(GetPath(aStart,aStart)); myWorkQueue.clear(); for(PrLnkMap::iterator it=myPaths.begin();it!=myPaths.end();it++) { Link* aLnkPtr = it->second; if( aLnkPtr->GetStart() == aStart ) { AddScored(aLnkPtr, anEnd); } } while(!myWorkQueue.empty()) { Link* aLeftLnkPtr = myWorkQueue.begin()->second.first; Link* aRightLnkPtr = myWorkQueue.begin()->second.second; myWorkQueue.erase(myWorkQueue.begin()); Link *aResultLnkPtr = aLeftLnkPtr->Combine(aRightLnkPtr); if( myPaths.find(WordPr(aResultLnkPtr->GetStart(),aResultLnkPtr->GetEnd())) == myPaths.end() ) { AddNeighbors(aResultLnkPtr); AddLink(aResultLnkPtr, anEnd); if( aResultLnkPtr->GetStart() == aStart && aResultLnkPtr->GetEnd() == anEnd ) return myWordHdls.GetPath(aResultLnkPtr->GetPathStrings()); } } return vector<string>(); } void Clear() { for(PrLnkMap::iterator it=myPaths.begin();it!=myPaths.end();it++) delete it->second; myPaths.clear(); myStarts.clear(); myWordHdls.Clear(); } private: typedef pair<WORDHDL,WORDHDL> WordPr; typedef map<WordPr, Link*> PrLnkMap; typedef pair<unsigned int, unsigned int> Score; struct Less : public less<Score> { bool operator()(const Score& x, const Score& y) const { if( x.first == y.first ) return x.second < y.second; else return x.first < y.first; } }; typedef pair<Link*,Link*> LnkPr; typedef multimap< Score, LnkPr, Less > WorkQueue; PrLnkMap myPaths; WorkQueue myWorkQueue; WordSet myWords; multimap<WORDHDL, Link*> myStarts; WordHdls myWordHdls; }; int main() { vector<Dictionary> aDict; for(;;) { cout << "? "; string command; cin >> command; if( command == "exit" ) return 0; else if( command == "add-dictionary") { // We have no choice but to dump everything we know if more // words are added. for(unsigned int i=0;i<aDict.size();i++) aDict[i].Clear(); string dict; cin >> dict; ifstream in(dict.c_str()); if( in ) { string word; while( !in.eof() ) { in >> word; if( !word.empty() ) { // g++ doesn't seem to do vector correctly. reserve and resize don't work. while( aDict.size() < word.size() ) aDict.push_back(Dictionary()); aDict[word.size()-1].AddWord(word); } } } } else if( command == "connect" ) { string src; cin >> src; string trgt; cin >> trgt; vector<string> result; if( src.size() == trgt.size() ) result = aDict[src.size()-1].Connect(src, trgt); if( result.size() == 0 ) cout << "No way to connect " << src << " and " << trgt << "\n"; else { for(vector<string>::iterator jt = result.begin();jt!=result.end();jt++) { if( jt != result.begin() ) cout << " > "; cout << *jt; } cout << "\n"; } } } return 0; }

On a 1 Ghz Athlon running Linux, the G++ version gives the following on the testlist defined elsewhere: real 9.43, user 9.13, sys 0.26.
-----------------------
This is k5. We're all tools - duxup

Nice by codemonkey_uk, 04/29/2002 01:09:44 PM EST (none / 0)

Sadly, (none / 0) (#174)
by trhurler on Mon Apr 29, 2002 at 02:17:09 PM EST

I am not going to be done in time. I haven't really worked on it since last Wednesday due to real life issues, and I don't have time now. However, if/when I finish it, I'll let you know just for comparison's sake. (I could have a stupid slow version done today, but that's pointless, as it is no challenge and I'd have to totally rewrite everything that isn't already written anyway.)

--
'God dammit, your posts make me hard.' --LilDebbie

my revised version (none / 0) (#180)
by wintergreen on Tue Apr 30, 2002 at 02:56:15 AM EST

It's slow, not at all clever, but it works. Kudos to jacob and all who participated, I know I'll be spending some time studying the other solutions.

use strict; my @DICT; my $inp; do { print "? "; $inp = <>; if( $inp =~ /^add-dictionary (.*)$/ ) { if(open(FH, $1)) { my @newwords = <FH>; chomp for @newwords; push @DICT, @newwords; print "Added $1\n"; } else { print "Error adding $1\n"; } } elsif( $inp =~ /^connect (.*?) (.*)$/ ) { do_connect($1,$2); } elsif( $inp !~ /^exit$/ ) { print "Invalid input.\n"; } } while( $inp !~ /^exit$/ ); # branch and bound sub do_connect { my( $w1, $w2 ) = @_; unless( length($w1) eq length($w2) ) { print "No way to connect $w1 to $w2.\n"; return; } my @pathlist = ([$w1]); do { if( $pathlist[0][-1] eq $w2 ) { print join(" > ", @{$pathlist[0]}), "\n"; return; } my $newpaths = add_path( shift @pathlist ); push(@pathlist, @$newpaths) if @$newpaths; @pathlist = sort { scalar(@$a) <=> scalar(@$b) } @pathlist; } while( @pathlist ); print "No way to connect $w1 to $w2.\n"; } sub add_path { my $path = shift; my @newpaths; $" = '|'; my $used = qr{^(@$path)$}; my %difs; my @end = split / */, $path->[-1]; WORD: foreach my $word (@DICT) { next if length($word) != length($path->[-1]); next if $word =~ $used; my @word = split / */, $word; for my $i ( 0..$#word ) { $difs{$word}++ if ($word[$i] ne $end[$i]); if( $difs{$word} > 1 ) { delete $difs{$word}; next WORD; } } } foreach my $newword ( keys %difs ) { push @newpaths, [ @$path, $newword ]; } return \@newpaths; }

When Tuesday (none / 0) (#183)
by Cal Bunny on Tue Apr 30, 2002 at 03:07:24 AM EST

When would you like the entries to be on Tuesday?

^cb^
Kudos to you for warping my fragile little mind. - communist

Any time by jacob, 04/30/2002 10:11:20 AM EST (none / 0)

*ahem* - excuse me... by codemonkey_uk, 04/30/2002 10:34:14 AM EST (none / 0)

Oops! Sorry for the late response by jacob, 04/30/2002 10:51:44 AM EST (none / 0)

Please upgrade your compiler by codemonkey_uk, 04/30/2002 10:59:16 AM EST (none / 0)

Fixed one problem by jacob, 04/30/2002 11:13:40 AM EST (none / 0)

Suggested fixes by codemonkey_uk, 04/30/2002 11:27:17 AM EST (none / 0)

more fun by jacob, 04/30/2002 11:37:48 AM EST (none / 0)

Okay... by codemonkey_uk, 04/30/2002 11:53:17 AM EST (none / 0)

That worked by jacob, 04/30/2002 12:14:51 PM EST (5.00 / 1)

Hazar! by codemonkey_uk, 04/30/2002 12:16:09 PM EST (none / 0)

C++ version of my submission (none / 0) (#184)
by tmoertel on Tue Apr 30, 2002 at 03:11:42 AM EST

I finally got around to re-coding my submission into C++. It processes i's benchmark in 6.9 seconds and the 16X benchmark in about 22.4 seconds on a P2/233 (laptop). Here's the code:

// TGM 20020429 // // PFC 4 entry -- a C++ translation of my Haskell entry // // $Id: tgm.cpp,v 1.5 2002/04/30 07:00:26 thor Exp $ // // compile: g++ -o tgm -O2 tgm.cpp #include <stdio.h> // for perror #include <unistd.h> #include <fcntl.h> #include <sys/types.h> #include <sys/stat.h> #include <sys/mman.h> // can you see what what's coming? #include <iostream> #include <vector> #include <string> #include <map> #include <queue> #include <fstream> // configurables static const int INIT_LENGTH_PARTITIONS = 50; // adapters struct ltstr { bool operator()(const char* s1, const char* s2) const { return strcmp(s1, s2) < 0; } }; // types typedef map<string, vector<int> > TraceToIndices; typedef map<const char*, int, ltstr> WordToIndex; typedef vector<const char*> WordVector; class WordDatabase { public: WordDatabase() : _pending(0) { } void connect(const string& from, const string& to); void process_pending_words(); void push_back(const char* s) { _words.push_back(s); _pending++; } int size() { return _words.size(); } protected: WordVector _words; // words we know WordToIndex _wti; // word -> index into _words TraceToIndices _tti; // trace -> vector<int> of word indices int _pending; // number of words pending processing }; typedef vector<WordDatabase> LPWordDatabase; LPWordDatabase lpwdb(INIT_LENGTH_PARTITIONS); void WordDatabase::connect(const string& from_str, const string& to_str) { process_pending_words(); WordToIndex::iterator from_i = _wti.find(from_str.c_str()); WordToIndex::iterator to_i = _wti.find(to_str.c_str()); if (from_i != _wti.end() && to_i != _wti.end()) { // do BFS to find path (backwards to avoid later reversal) int goal = from_i->second; int start = to_i->second; queue<pair<int,int> > open; map<int, int> closed; open.push(make_pair<int,int>(start, start)); // iterate until the open set is empty while (!open.empty()) { // examine the next node in the open set pair<int,int> np_pair = open.front(); open.pop(); int node = np_pair.first; if (closed.find(node) != closed.end()) continue; // it has already been closed // close this node and remeber its parent for back-tracing int parent = np_pair.second; closed[node] = parent; // is the current node the goal? if (node == goal) { cout << _words[node]; while (node != parent) { node = parent; parent = closed[parent]; cout << " > " << _words[node]; } cout << endl; return; // !!! early exit !!! } // add the node's neighbors to the search string trace(_words[node]); for (int j = 0; j < trace.size(); j++) { char save = trace[j]; trace[j] = '.'; vector<int>& trace_neighbors = _tti[trace]; for (int tni = 0; tni < trace_neighbors.size(); tni++) { int neighbor = trace_neighbors[tni]; if (neighbor != node) open.push(make_pair<int,int>(neighbor, node)); } trace[j] = save; } } } cout << "No way to connect " << from_str << " to " << to_str << "." << endl; } void WordDatabase::process_pending_words() { // update word-to-index and trace-to-indices mappings with any // words added since the last time they were rebuilt for (int i = _words.size() - _pending; i < _words.size(); i++) { _wti[_words[i]] = i; string trace(_words[i]); for (int j = 0; j < trace.size(); j++) { char save = trace[j]; trace[j] = '.'; _tti[trace].push_back(i); trace[j] = save; } } _pending = 0; } static void do_add_database(); static void do_connect(); // MAIN int main(int, char**) { string cmd; while (cin) { cout << "? "; cin >> cmd; if (cmd[0] == 'a') do_add_database(); else if (cmd[0] == 'c') do_connect(); else break; } } static off_t file_size(const char* filename) { struct stat st; off_t len = 0; if (stat(filename, &st) < 0) perror("stat in file_size"); else len = st.st_size; return len; } static void do_add_database() { // the name of the game here is speeeeeeed string filename; cin >> filename; char *dict; // open the dictionary file int fd = open(filename.c_str(), O_RDONLY); if (fd >= 0) { // try to mmap it *all* into memmory (private) off_t len = file_size(filename.c_str()); if (len && (dict = (char*) mmap(0, len, PROT_READ|PROT_WRITE, MAP_PRIVATE, fd, 0))) { // now that the dictionary is mmaped, convert all the // words within it into c_strings (in place!) and add them // into the appropriate length-partitioned word list char* dict_end = dict + len; while (dict < dict_end) { // dict points to start of a word char *p; for (p = dict; p < dict_end && *p != '\n'; p++) ; // find newline at end of word if (p < dict_end) { *p = '\0'; int wordlen = p - dict; if (wordlen >= lpwdb.size()) lpwdb.resize(wordlen+1); lpwdb[wordlen].push_back(dict); } dict = p + 1; } cout << "Added " << filename << endl; } close(fd); // leaves mapping open (we have pointers into it) return; // !!! EARLY EXIT !!! } cout << "Error adding " << filename << endl; } static void do_connect() { string from, to; cin >> from >> to; int len = from.length(); if (len < lpwdb.size()) lpwdb[len].connect(from, to); else cout << "No way to connect " << from << " to " << to << "." << endl; }

--
My blog | LectroTest

[ Disagree? Reply. ]

Newer, faster, tastier version by tmoertel, 04/30/2002 09:43:37 PM EST (none / 0)

coding time C++ versus Haskell by svillee, 05/01/2002 08:19:28 PM EST (none / 0)

33% and coding time by tmoertel, 05/02/2002 12:28:00 AM EST (none / 0)

Python 2.2 (none / 0) (#185)
by djotto on Tue Apr 30, 2002 at 06:04:07 AM EST

I'm not happy with this, but I've run out of time to test and tweak. I suspect that all I'm going to prove is that simple entries win out over complex ones.

I gave up trying to persuade Scoop to post, and I'm not about to install Perl, so my entry is available here.

It does a BFS simultaneously from both ends, and looks for them to join up in the middle. Checking for the boundaries to join in the middle is expensive, so it only does it occasionally.

Small wordlists are parsed into dictionaries where word => [list of neighbours]. (This pre-processing was dog-slow until I stole the x=y, y=x solution from MK77's elegant entry (this is how Python should be done)). Once the wordlist grows over a certain size, the pre-processing is abandoned in favour of run-time processing.

dual search by Cal Bunny, 04/30/2002 06:12:04 AM EST (none / 0)

Performance by djotto, 04/30/2002 07:12:54 AM EST (none / 0)

Updated by djotto, 04/30/2002 07:47:55 AM EST (none / 0)
Posting to k5 with python by MK77, 04/30/2002 09:12:43 AM EST (none / 0)

Thanks by djotto, 04/30/2002 10:13:14 AM EST (none / 0)

syntax highlighting by MK77, 04/30/2002 10:21:19 AM EST (none / 0)

Colours by djotto, 04/30/2002 11:00:33 AM EST (none / 0)

Syntax Highlighted Source Code by Logan, 04/30/2002 09:20:31 PM EST (none / 0)

(Spoiler?) Some data observations (none / 0) (#205)
by KWillets on Tue Apr 30, 2002 at 09:16:09 PM EST

Hope this isn't a spoiler, but I thought some insight into the structure of the data might be interesting.

I was trying to some pre-indexing of adjacent words, building lists of "metawords" as one person called them, and I noticed a few things about WORD.LST.

For one thing, about half the words have no connections to other words at all. The longer ones in particular are often unlinked to any other word. These could be deleted with little loss in functionality. A large number of words only have one or two neighbors and often these groups are isolated from the rest of the dictionary.

My indexing was consuming a large amount of time trying words which had no connections, so I stopped indexing words longer than seven characters. The longer words seemed to have a much lower hit rate for connections. This step at least made the process terminate in a reasonable time, a few minutes for the perl version.

Since there were so many singleton sets, I saved memory by only saving sets which had two or more entries (the only useful ones). So far the method I use to find these sets is to scan the dictionary, generate all variants from the first word found, test them in a hash of the dictionary, and save them in a list as found. That way I know right away if the list has more than one element, and I can save it or discard it once the list is done.

Another single-pass method might be to create singleton entries the first time a metaword is found, but delete them when it is obvious that no further matches in the (sorted) dictionary exist. For instance, if you're looking for all variants of "dissolve_" (dissolves, dissolver, dissolved), you're finished once you get to "distemper" or some later entry, since the ordering guarantees no further prefix matches. I'm thinking over ways to do this generate-and-cleanup without tons of code.

These are the perl subs for building the index; I haven't actually used the index for anything yet.

The output in %ix is "metaword" => ( index1, index2...) where index1..n are array indices in @dictionary

eg:

"announce." => ( index of "announcer", "announces" in @dictionary )

The #'s can be removed to print out the index entries; the output might be interesting for those wanting to explore the data.
-----------------------

my @dictionary ;
my %dictix;

sub adddictionary {
my $fname = shift;
open DICT, "< $fname" || croak("Cannot open $fname\n");
while( <DICT> ) {
    s/[^a-z]//g ;
    push( @dictionary, $_ );
    $dictix{ $_ } = $#dictionary ; # reference by array index
}
close DICT;
indexdictionary();
}

my %ix;

sub indexdictionary {
my $wrd, $x;
for(my $j = 0; $j < $#dictionary; $j++ ) {
    my $word = @dictionary[ $j ];
    next if length( $word ) > 7;
    for( my $i = length( $word ) -1; $i >= 0; $i-- ) {
    my $w2 = $word ;
    substr( $w2, $i, 1 ) = ".";
    if( defined $ix{ $w2 } ) {
    } else {
#        print "\n$w2 ";
        my @hits;
        for ( $c = ord substr( $w2, $i, 1 ), my $tst = $w2; $c <= ord "z"; $c++ ) {
        substr( $tst, $i, 1 ) = chr $c;
        if( exists $dictix{ $tst } ) {
#            print "$tst ";
            push @hits, $dictix{ $tst };
        }
        }
        if( $#hits > 0 ) {
#        print " ****";
        $ix{ $w2 } = @hits;
        }
    }
    }
}
}

Update (5.00 / 1) (#213)
by jacob on Thu May 02, 2002 at 04:01:31 PM EST

I have now commenced scoring each program. As I have a busy weekend, scores will likely not be announced until Wednesday or Thursday.

I'd like to thank everyone for their interest and enthusiasm. It has been really fun to see the amazing quality of work people have put in to this challenge. I'm really impressed with all of you who put energy into it.

--
"it's not rocket science" right right insofar as rocket science is boring

--Iced_Up

Thursday by codemonkey_uk, 05/09/2002 01:11:07 PM EST (none / 0)

Working on it now by jacob, 05/09/2002 01:26:56 PM EST (5.00 / 1)
I'll take "Things that will upset cm_uk" by jacob, 05/10/2002 02:28:49 AM EST (5.00 / 1)

too late for judging, but super fast (none / 0) (#216)
by svillee on Thu May 09, 2002 at 10:09:38 PM EST

/* based on tmoertel's algorithm, reimplemented in C */ /* time on a Celeron 700 MHz: */ /* i's test: 2.16 seconds */ /* tmoertel's 16X test: 2.81 seconds */ #include <stdio.h> #include <stdlib.h> #include <string.h> #define MAX_INPUT_LENGTH 4096 #define MAX_VERB_LENGTH 32 #define MAX_WORD_LENGTH 1024 /* A chain string may represent either a word or a trace. Any code */ /* working with this structure either knows statically which one it */ /* is or doesn't care. */ typedef struct ChainString_s { struct ChainString_s *left; struct ChainString_s *right; int balance; struct ChainString_s *nextPending; /* unused for trace */ /* For a word, link is a pointer to an array of trace links. */ /* For a trace, link is the head of a list of word links. */ struct Link_s *link; char data[1]; /* variable length */ } ChainString_t; /* A link is a record that a certain word matches a certain trace. */ typedef struct Link_s { struct Link_s *nextSameTrace; ChainString_t *word; ChainString_t *trace; struct Link_s *nextActive; /* The next two fields are meaningful only for the current connect. */ /* If neighbor is not null, it means there is a chain from this */ /* word to the target word, wherein this word and the specified */ /* neighbor both match this trace. */ struct Link_s *nextInOrbit; struct Link_s *neighbor; } Link_t; static int length; static Link_t *firstActive; static ChainString_t *topWordByLength[MAX_WORD_LENGTH+1]; static ChainString_t *topTraceByLength[MAX_WORD_LENGTH+1]; static ChainString_t *firstPendingByLength[MAX_WORD_LENGTH+1]; static char input[MAX_INPUT_LENGTH+1]; static char verb[MAX_VERB_LENGTH+1]; static char word1[MAX_WORD_LENGTH+1]; static char word2[MAX_WORD_LENGTH+1]; static char word3[MAX_WORD_LENGTH+1]; static ChainString_t *findWord(char *data) { ChainString_t *wp; int m; wp = topWordByLength[length]; while (wp != (ChainString_t *) 0) { m = strcmp(data, wp->data); if (m == 0) break; if (m < 0) { wp = wp->left; } else { wp = wp->right; } } return wp; } void insertNode( ChainString_t *newNode, ChainString_t **where, int *outerGrew) { ChainString_t *nodeA; ChainString_t *nodeB; ChainString_t *nodeC; int innerGrew; nodeA = *where; if (nodeA == (ChainString_t *) 0) { /* The tree was empty, so make the new node the only one. */ newNode->left = (ChainString_t *) 0; newNode->right = (ChainString_t *) 0; newNode->balance = 0; *where = newNode; *outerGrew = 1; } else if (strcmp(newNode->data, nodeA->data) < 0) { /* Add the new node to the left subtree. */ insertNode(newNode, &nodeA->left, &innerGrew); if (!innerGrew) { *outerGrew = 0; } else if (nodeA->balance == 1) { /* The right subtree height was 1 more, and now they are the same. */ nodeA->balance = 0; *outerGrew = 0; } else if (nodeA->balance == 0) { /* The subtree heights were the same, and now the left subtree */ /* height is 1 more. */ nodeA->balance = -1; *outerGrew = 1; } else { /* The left subtree height was 1 more, and now it is 2 more, */ /* so we must rebalance. Examine the left subtree. */ nodeB = nodeA->left; nodeC = nodeB->right; if (nodeB->balance == -1) { /* A B */ /* / \ / \ */ /* B Z ---> W A */ /* / \ / \ */ /* W C C Z */ nodeA->left = nodeC; nodeA->balance = 0; nodeB->right = nodeA; nodeB->balance = 0; *where = nodeB; } else { /* A C */ /* / \ / \ */ /* B Z ---> B A */ /* / \ / \ / \ */ /* W C W X Y Z */ /* / \ */ /* X Y */ nodeB->right = nodeC->left; if (nodeC->balance == 1) { nodeB->balance = -1; } else { nodeB->balance = 0; } nodeA->left = nodeC->right; if (nodeC->balance == -1) { nodeA->balance = 1; } else { nodeA->balance = 0; } nodeC->left = nodeB; nodeC->right = nodeA; nodeC->balance = 0; *where = nodeC; } *outerGrew = 0; } } else { /* Add the new node to the right subtree. */ insertNode(newNode, &nodeA->right, &innerGrew); if (!innerGrew) { *outerGrew = 0; } else if (nodeA->balance == -1) { /* The left subtree height was 1 more, and now they are the same. */ nodeA->balance = 0; *outerGrew = 0; } else if (nodeA->balance == 0) { /* The subtree heights were the same, and now the right subtree */ /* height is 1 more. */ nodeA->balance = 1; *outerGrew = 1; } else { /* The right subtree height was 1 more, and now it is 2 more, */ /* so we must rebalance. Examine the right subtree. */ nodeB = nodeA->right; nodeC = nodeB->left; if (nodeB->balance == 1) { /* A B */ /* / \ / \ */ /* W B ---> A Z */ /* / \ / \ */ /* C Z W C */ nodeA->right = nodeC; nodeA->balance = 0; nodeB->left = nodeA; nodeB->balance = 0; *where = nodeB; } else { /* A C */ /* / \ / \ */ /* W B ---> A B */ /* / \ / \ / \ */ /* C Z W X Y Z */ /* / \ */ /* X Y */ nodeB->left = nodeC->right; if (nodeC->balance == -1) { nodeB->balance = 1; } else { nodeB->balance = 0; } nodeA->right = nodeC->left; if (nodeC->balance == 1) { nodeA->balance = -1; } else { nodeA->balance = 0; } nodeC->left = nodeA; nodeC->right = nodeB; nodeC->balance = 0; *where = nodeC; } *outerGrew = 0; } } } static void doAddDictionary(void) { FILE *fp; int grew; int n; ChainString_t *wp; Link_t *lp; fp = fopen(word1, "r"); if (fp == (FILE *) 0) { fprintf(stderr, "Error adding %s\n", word1); return; } while (fscanf(fp, "%1024s", word2) > 0) { /* See if the word is already in the dictionary. */ length = strlen(word2); if (findWord(word2) != (ChainString_t *) 0) continue; /* Allocate a block to hold the array of links */ /* followed by the word itself. */ n = length * sizeof (Link_t); lp = (Link_t *) malloc(n + sizeof (ChainString_t) + length); if (lp == (Link_t *) 0) { fprintf(stderr, "out of memory for word\n"); exit(-1); } wp = (ChainString_t *) (((char *) lp) + n); wp->link = lp; strcpy(wp->data, word2); /* Add it to the balanced tree of words. */ insertNode(wp, &topWordByLength[length], &grew); /* Add it to the list of words that need traces calculated. */ wp->nextPending = firstPendingByLength[length]; firstPendingByLength[length] = wp; } fclose(fp); printf("Added %s\n", word1); } static void calculatePendingTraces(void) { ChainString_t *wp; ChainString_t *tp; Link_t *lp; int grew; int m; int n; char saveC; wp = firstPendingByLength[length]; if (wp != (ChainString_t *) 0) { do { lp = wp->link; strcpy(word3, wp->data); /* Calculate traces by substituting dot ('.') */ /* for each letter of the word. */ for (n = 0; n < length; ++n) { saveC = word3[n]; word3[n] = '.'; /* See if the trace already exists. */ tp = topTraceByLength[length]; while (tp != (ChainString_t *) 0) { m = strcmp(word3, tp->data); if (m == 0) break; if (m < 0) { tp = tp->left; } else { tp = tp->right; } } if (tp == (ChainString_t *) 0) { /* Allocate a new trace. */ tp = (ChainString_t *) malloc(sizeof (ChainString_t) + length); if (tp == (ChainString_t *) 0) { fprintf(stderr, "out of memory for trace\n"); exit(-1); } tp->link = (Link_t *) 0; strcpy(tp->data, word3); /* Add it to the balanced tree of traces. */ insertNode(tp, &topTraceByLength[length], &grew); } /* Fill in this link. */ lp->word = wp; lp->trace = tp; lp->neighbor = (Link_t *) 0; lp->nextSameTrace = tp->link; tp->link = lp; ++lp; word3[n] = saveC; } wp = wp->nextPending; } while (wp != (ChainString_t *) 0); firstPendingByLength[length] = (ChainString_t *) 0; } } static void doConnect(void) { ChainString_t *target1; ChainString_t *target2; ChainString_t *wp; int n; Link_t *lp; Link_t *lp2; Link_t *lp3; Link_t *firstInOrbit; Link_t endMark; length = strlen(word1); if (strlen(word2) == length) { target1 = findWord(word1); target2 = findWord(word2); if ((target1 != (ChainString_t *) 0) && (target2 != (ChainString_t *) 0)) { if (target1 == target2) { printf("%s\n", word1); return; } calculatePendingTraces(); /* Reset any active links. */ lp = firstActive; if (lp != (Link_t *) 0) { do { lp->neighbor = (Link_t *) 0; lp = lp->nextActive; } while (lp != (Link_t *) 0); firstActive = (Link_t *) 0; } /* Set up a dummy link as the chain end mark. */ endMark.word = target2; endMark.nextInOrbit = (Link_t *) 0; endMark.neighbor = (Link_t *) 0; /* Build successive orbits from target2 until target1 */ /* is reached or the outermost orbit is empty. */ firstInOrbit = &endMark; while (firstInOrbit != (Link_t *) 0) { lp = firstInOrbit; firstInOrbit = (Link_t *) 0; while (lp != (Link_t *) 0) { wp = lp->word; lp2 = wp->link; for (n = 0; n < length; ++n) { if (lp2 != lp) { lp3 = lp2->trace->link; while (lp3 != (Link_t *) 0) { if ((lp3 != lp2) && (lp3->neighbor == (Link_t *) 0)) { if (lp3->word == target1) { printf("%s", target1->data); lp2 = lp; while (lp2 != (Link_t *) 0) { printf(" > %s", lp2->word->data); lp2 = lp2->neighbor; } putchar('\n'); return; } lp3->neighbor = lp; lp3->nextInOrbit = firstInOrbit; firstInOrbit = lp3; lp3->nextActive = firstActive; firstActive = lp3; } lp3 = lp3->nextSameTrace; } } ++lp2; } lp = lp->nextInOrbit; } } } } printf("No way to connect %s to %s.\n", word1, word2); } int main(int argc, char **argv) { int i; while (1) { printf("? "); input[0] = '\0'; fgets(input, MAX_INPUT_LENGTH+1, stdin); verb[0] = '\0'; word1[0] = '\0'; word2[0] = '\0'; sscanf(input, "%32s %1024s %1024s", verb, word1, word2); if (strcmp(verb, "add-dictionary") == 0) { doAddDictionary(); } else if (strcmp(verb, "connect") == 0) { doConnect(); } else if (strcmp(verb, "exit") == 0) { break; } } return 0; }

Partial results (none / 0) (#218)
by jacob on Wed May 15, 2002 at 03:57:06 AM EST

are here.

--
"it's not rocket science" right right insofar as rocket science is boring

--Iced_Up

Final results (5.00 / 1) (#219)
by jacob on Tue May 21, 2002 at 11:42:38 PM EST

are here. Thanks to everyone who entered!

--
"it's not rocket science" right right insofar as rocket science is boring

--Iced_Up

Programming Fun Challenge 4 | 219 comments (219 topical, 0 editorial, 0 hidden)

All trademarks and copyrights on this page are owned by their respective companies. The Rest © 2000 - Present Kuro5hin.org Inc.
See our legalese page for copyright policies. Please also read our Privacy Policy.
Kuro5hin.org is powered by Free Software, including Apache, Perl, and Linux, The Scoop Engine that runs this site is freely available, under the terms of the GPL.
Need some help? Email help@kuro5hin.org.
My heart's the long stairs.