The Prolog approach
Prolog is, essentially, a query language for databases, like
SQL. However, unlike SQL, which is a limited query language
for relational databases, (tables of rows and columns, much like a
spreadsheet) Prolog is a query language for matching complicated
patterns against a database of simple facts.
Thus, all Prolog programs consist of three parts: a list of
facts, a list of pattern matching rules (sometimes
also called predicates in Prolog jargon) and a list of
queries. (Also sometimes called goals in Prolog
jargon.)
Prolog facts
Facts, in Prolog, are pre-defined patterns that get stored in
Prolog's internal database, usually in a manner to make searching
them more efficient.
There are, essentially, three types of values in Prolog:
- Numbers.
Ex: 1
, -2
, 0.5
,
-0.776
.
- Symbols, which are, for all intents and purposes,
immutable strings of lower-case letters without special characters or
spaces. (We'll explain why symbols must be lower-case later.)
Ex: hello
, world
,
this_is_a_symbol
, atom3
.
- Linked lists of symbols or numbers. Lists are untyped.
Ex: [hello, cruel, world]
,
[1, 2, 3]
,
[1, hello, 2, world]
, []
.
Modern, practical Prolog implementations define many more useful
datatypes; however, this is implementation-dependent and doesn't
really matter as far as this article is concerned. Look up your
Prolog implementation's manual if you are interested.
Facts, or pre-defined patterns,
are written using the standard notation of functions or procedures in
other programming languages. The symbol before the opening
parenthesis is the pattern's name, with a list of comma-separated
values inside the parentheses.
Ex: f(hello)
,
greeting_message(hello, world)
,
g([hello, world])
,
fac(3, 6)
.
Note that the following is illegal in Prolog: f()
.
Patterns without arguments are written without parentheses, like this:
f
.
Also note that pattern arguments can have any of Prolog's datatypes,
thus symbol, number and list arguments are allowed.
Thus, to define a fact in Prolog, all you need to do is to write a
Prolog program that lists pre-defined patterns
with a period (full-stop) after each entry.
Example of a Prolog program that defines several facts:
hello.
world.
f(hello, world).
g([hello, world]).
standard_greeting([hello, world], 2).
This simple program inserts five pre-defined patterns into Prolog's
internal database. (hello
and world
, two
patterns without arguments; f(hello, world)
, a pattern
f
with two arguments; g([hello, world])
, a
pattern g
with one argument, a list; and
standard_greeting([hello, world], 2)
, a pattern
standard_greeting
with 2 arguments, a list and a number)
When several patterns are defined with the same name and the same
number of arguments, Prolog will run through them, one after another
in a top-to-down fashion, when trying to match them. (You can think of
this as a short-circuited "OR" of pattern matching rules.)
Defining pattern-matching rules
Defining pattern-matching rules in Prolog is equally simple:
f(hello, world) :- g([hello, world]).
Whenever Prolog sees the special symbol :-
, Prolog
creates a new pattern-matching rule. The basic meaning of
:-
is very simple: to match whatever is to left of
the :-
, the part to the right must be matched. This
allows to "decompose" complex patterns into smaller, more manageable ones.
To make the task practical, Prolog defines many operators that help
us in the task of composing pattern-matching rules. Some of the more
important and useful are:
,
: A comma denotes sequential matching of patterns;
this is equivalent to a "short-circuited AND" in many imperative
programming languages. (C
's &&
, for
example.)
Ex:f :- a, b, c.
This means that to match the
pattern f
, we need to match, in order, patterns
a
, b
and c
.
;
: A semi-colon denotes choice; this is equivalent
to a "short-circuited OR" in many imperative programming
languages. (C
's ||
, for example.)
Ex:f :- p; q; r.
This means that to match the
pattern f
, either p
must be matched, or, if
Prolog fails to match p
, try to match q
; if
matching q
fails, finally try matching r
.
Note that the semi-colon is essentially equivalent to listing patterns
on separate lines; thus,
f :- (p; q).
is equivalent to
f :- p.
f :- q.
->
: An arrow denotes a conditional pattern rule, in
other words, an "if-then-else" rule.
Ex:f :- (g -> h; i).
This code means that Prolog
first tries to match pattern g
; if the pattern can be
matched, try to match the pattern h
. If g
cannot be matched, try to match the pattern i
.
Note that the construct must be enclosed in parentheses, due to
strangeness in Prolog's syntax rules.
\+
: This is equivalent, in a sense, to the negation
operator in many programming languages. (Like the C
!
.)
Ex:f :- \+ g.
This code means that the pattern
f
matches whenever the pattern g
cannot be
matched.
Variables
This is all very easy to understand, but is, unfortunately, useless in
real-world applications. What we lack are
variables. Variables in Prolog are symbols that begin with a
capital letter; for example: Var
, A
,
Q
, MATCH_ME
.
Whenever Prolog comes upon a variable, it starts searching in its
internal database of facts and pattern-matching rules for a
substitution such that substituting a value for the variable matches
some fact.
A simple example will illustrate the concept better. Consider the
following Prolog program:
i_know(hello).
i_know(world).
is_phrase(A, B) :- i_know(A), i_know(B).
is_greeting(A, B) :- is_phrase(A, B), A = hello.
This program defines two facts (i_know(hello)
and
i_know(world)
) and two pattern-matching rules.
is_phrase(A, B) :- i_know(A), i_know(B).
This rule,
which is named is_phrase
and which accepts two arguments,
tries to find substitutions for the two variables it uses
(A
and B
) such that the pattern on the
right side of the :-
matches against Prolog's internal
fact database.
In this particular case, Prolog will find the following substitutions:
A=hello, B=hello
A=hello, B=world
A=world, B=world
A=world, B=hello
is_greeting(A, B) :- is_phrase(A, B), A = hello.
Again,
a pattern of two arguments. As before, Prolog will try to find
substitutions for the two variables A
and B
such that the pattern rule matches.
The new concept here is the operator =
. This operator is
Prolog's equivalent of variable assignment.
P=Q
means that Prolog will find
substitutions for variables such that the two arbitrary patterns to the
left and to the right of the =
are exactly equal. Note
that in this operator Prolog doesn't care whether or not the two
patterns can be matched against its internal database; what matters
is that the two patterns become equal after =
finished
its work. The =
operator is commutative; thus,
A=B
and B=A
mean the same thing. If such a
substitution cannot be found, the =
operator will fail
to match. For example, hello=world
will always fail to match.
Thus, after executing i_know(A)=i_know(foo)
A
will be substituted with foo
even though
i_know(foo)
does not match against Prolog's internal
database. (By the way, this procedure is often called
unification in Prolog jargon; thus, A=hello
means that A
will be unified with hello
.)
Finally, we can figure out what the pattern is_greeting(A,
B)
does. Here, Prolog searches for substitutions for
A
and B
such that a match against the known
facts is found.
Expanding all the pattern-matching rules, Prolog will find the
following substitutions:
A=hello, B=hello
A=hello, B=world
As you can see, using just a few basic Prolog operators and Prolog's
advanced search engine, we can build pattern-matching rules of
arbitrary complexity. It is here that Prolog's power really shines
though: Prolog allows to build very complicated matching rules very
easily. Thus, Prolog is designed for use in applications where we have just
a few very simple facts, but a lot of very complex search rules.
Contrast this with SQL, which has been designed for the opposite
situation: a great amount of very complex data with very simple, very
basic search rules.
Programming in Prolog
While the basic Prolog engine is enough to perform arbitrarily
complex searches, it is not enough to use Prolog as a general-purpose
programming language.
At the very least, what we miss is a way to do basic input/output, a
way for handling arithmetic and a way for doing loops.
In true Prolog spirit, the designers of the language decided to not
complicate the language with unnecessary constructs and unnecessary
syntax, but instead to write simple, basic hacks that integrate very
well with the basic Prolog query language. (Contrast this with
Oracle's PL/SQL, for those that know it.)
Input and output in Prolog is done with special pattern rules that
always match, and produce output or return input as a side effect.
Since there are a great number of implementation-specific input and
output functions, I will describe two of the most basic, to give the
general idea. If you want to learn more, consult the manual of your
Prolog implementation.
Outputting a value is very simple: the pattern-matching rule
write(A)
will output its argument. This pattern-matching
rule always matches. For example, write_greeting(A, B) :-
is_greeting(A, B), write(A), nl, write(B), nl.
will output two
words on the screen, provided that the two words are a greeting. (The
nl
pattern simply outputs a newline character.)
Basic input is equally simple: the pattern-matching rule
read(A)
will ask the user to input a value, using Prolog
syntax, substitute the typed value for A
and match
successfully. For example, this simple pattern-matching rule of zero
arguments will simply output whatever value the user typed:
echo :- read(A), write(A), nl.
For arithmetic, Prolog uses a special operator called is
.
This operator is just like the =
operator, except that
on the right side must be an arithmetic expression, not a pattern.
For example, A is B + 1
will substitute a value for
A
that is equal to B + 1
. Due to the
special syntax of the is
command, it is not
commutative. Thus, B + 1 is A
will give an error. This
means that there is always a variable to the left of the is
.
Note, however, that the is
operator is still nothing but
a special pattern-matching rule; so, for example, A is A +
1
is an invalid expression that will likely give an error,
since no number can be substituted such that the number becomes equal
to itself incremented by one. If this makes little sense, try making
some simple substitutions in your head: 5 is 5 + 1
violates
the basic rules of arithmetic.
Loops in Prolog are very simple; like other well-known programming
languages, looping is done by writing recursive pattern-matching rules.
An example: infinite_loop(A) :- write(A), nl,
infinite_loop(A).
This rule will run an infinite loop that
prints its argument infinitely many times. Using recursive
applications of pattern-matching rules, any looping construct can be
expressed. Recursion is equivalent to looping, as any programmer of
functional languages knows.
Prolog queries
Queries, or goals, are a way for the user to interact with
Prolog's internal database and pattern-matching rules. Almost all
Prolog implementations include an interactive shell where the user
can type arbitrary patterns; Prolog then tries to match the given
pattern, outputting whatever variable substitutions are needed to
match the pattern. This provides an easy and powerful way to interact
with Prolog in real-time, typing in search queries and getting back
results immediately.
Note, however, that the majority of Prolog implementations do not allow
defining new patterns interactively; a Prolog program must be written
using a separate editor and loaded into the interactive Prolog shell.
Query syntax, like almost everything in Prolog, is elegantly simple.
Here is an example of an interaction, based on the previously shown program:
Ciao-Prolog 1.9 #44: Mon Dec 30 16:47:15 2002
?- ensure_loaded('i:/ivan/foo.pl').
yes
?- f(hello).
yes
?- f(foo).
no
?- is_phrase(hello, world).
yes
?- is_phrase(hello, _).
yes
?- is_greeting(A, B).
A = hello,
B = hello ? ;
A = hello,
B = world ? ;
no
?- is_greeting(_, B).
B = hello ? ;
B = world ? ;
no
?- hello.
yes
?- foo.
{ERROR: user:foo/0 - undefined predicate}
no
?-
The ?-
is the standard Prolog prompt; it means that the user is
invited to type in a Prolog pattern, ending with a period, as all Prolog
statements. Note, also, that Prolog returns either a yes
or a
no
after each query; a yes
means that Prolog was
able to match the query against its internal database; a no
means,
respectively, that Prolog was unable to find a match. (Notice that trying to
use an undefined pattern foo
caused a Prolog error.)
When Prolog encounters variables in the query (like A
and
B
in this example) Prolog prompts us before returning each
found value for the variable, one by one. Finally, there is a special
variable called _
, which means that we do not care about
the values of this variable and do not want them printed.
An extended example
To solidify your understanding of the abstract underpinnings of
Prolog, here are a few very simple programs in Prolog. (In Prolog, the %
character denotes comments.)
Calculating the factorial:
% fac(N,R) : R is the factorial of N.
fac(0,1).
fac(N,R) :- P is N-1, fac(P,Q), R is N*Q.
Using this pattern is simple: typing the query fac(5,R).
, for
example, gives the result: R = 120
.
When playing around a little, though, many deficiencies start becoming
apparent, including unwanted infinite loops and the fact that
fac(N,120)
doesn't give the expected result. The reason for this
is the fact that Prolog is very ill-suited for numerical computation;
arithmetic in Prolog is a hack, and doesn't integrate well into the Prolog
search engine too well.
Addendum: Here is a tail-recursive version of the factorial pattern.
% fac2(N,A,R) : R is the factorial of N; A is the "accumulator" value.
fac2(0,R,R).
fac2(N,Acc,R) :- P is N-1, Q is N*Acc, fac2(P,Q,R).
Usage: fac2(5,1,R).
Reversing a list:
% cat(A, B, C) : C is the concatenation of lists A and B.
cat([], R, R).
cat([H|T1], Z, [H|T2]) :- cat(T1, Z, T2).
% rev(A, B) : B is the list A when reversed.
rev([], []).
rev([H|T], Q) :- rev(T,P), cat(P,[H],Q).
(Here we first encounter the special Prolog syntax for handling lists; the
special system pattern [H|T]
matches any list whose first
element is H
with T
being the rest of the list
without the first element.)
Using this pattern is simple: rev([1, 2, 3],R).
In other words,
find all such R
that rev([1, 2, 3],R)
matches the
Prolog pattern-match rule database. Notice, however, that
rev(R,[1,2,3])
gives us the same result! This is obvious
if you think a little bit about the nature of Prolog's pattern-matching
engine.
As an example, here is a second, simpler version of rev
that
doesn't rely on concatenating lists:
% rev2(A, B, C) : C is the reversal of the list A, C is the "accumulator"
% for growing lists.
rev2([], R, R).
rev2([H|T], Z, R) :- rev2(T, [H|Z], R).
Here, the second parameter of the pattern-matching rule is used as an
"accumulator" for holding intermediate values. Using this rule is simple:
rev2([1, 2, 3], [], R)
. Look at how smart the Prolog
pattern-matching is when it processes lists and other symbolic data:
not only does rev2(A, [], [1, 2, 3])
produce the expected
results, but rev2([1, 2, 3], B, [3, 2, 1])
produces
B = []
, and rev2([1, 2, 3], A, [3, 2])
returns
us a pattern-match failure.
Addendum
Wait one second, Prolog is a logic-based language!
No, it is not. Prolog has several system patterns that were (very) loosely
inspired by formal logic; these were designed to ease the use of Prolog for
those people that are already familiar with formal logic. However, if you
persist with your delusion that Prolog is a language for "handling logic
predicates", you will get bitten sooner or later! (And probably sooner
than later.)
Standard Prolog
Prolog is a very established, industrial-strength, popular language. As
such, there is a very clear and formal ISO standard for Prolog interpreters
and compilers. You can view the standard for ISO Prolog
here,
for example.
Strange language
Prolog programmers and implementors simply love using non-standard,
confusing and sometimes plain wrong language. Do not be afraid of this
peculiar trait; when encountering strange or confusing terms, be aware
that 90% of the time a very simple and down-to-earth concept is hiding
behind it.
The "cut" operator
The so-called "cut" operator (written as !
) is a
pre-defined operator in Prolog that allows the programmer to subtly tweak
the search strategy used by the Prolog matching engine; while this operator
allows some neat programming tricks and optimisation techniques, I do not
advise anyone ever to use this operator. It is a dirty, very confusing and
dangerous hack that will inevitably make your program impossible to read and
introduce subtle bugs. In short, avoid the "cut" like the plague.
Open-Source Prolog
Prolog is a programming language with a great deal of choice as far as
implementations are concerned; there are lots and lots of good, high-quality
implementations of Prolog that are Open-Source.
Try it yourself
You can try interacting with a Prolog shell without installing a Prolog
implementation here.
(The link points to tuProlog,
an Open-Source Prolog written on top of the JVM and embeddable into standard
Java applications.)
For fun
Just in case you are wondering how Prolog can be used in the "real world", take a look at this simple rouguelike game written entirely in Prolog: Caves of Golorp.
Though of course, Prolog is very useful for non-toy applications as well. Browsing through the site of any commercial Prolog vendor, (and there are lots of them) you will inevitably stumble upon a page listing many serious, industrial-grade applications in Prolog.
Warning to Americans
Prolog is decidedly a European language; not only was it originally invented
by a Frenchman, the majority of Prolog implementations are developed and
used in Europe; and even in a purely academic setting, Prolog is much more popular
as a teaching language in European universities when compared to
higher education in the U.S.
Despite this, do not be afraid; Prolog is a very useful tool in its own
right. Good luck.