Perl & Linguistics
Larry Wall is the author of Perl.
He has a background in linguistics,
and brings an interesting perspective to language design.
Subject: Re: Linguistics and Perl?
From: lwall@netlabs.com (Larry Wall)
Date: 1995/07/27
Organization: NetLabs, Inc., Los Altos, California.
Newsgroups: comp.lang.perl.misc
Thomas Dunbar <tdunbar@gserver.grads.vt.edu> wrote:
: In Larry Wall's slides at a VHLL meeting, there are some
: very interesting allusions to linguistic features/considerations
: in Perl (esp the "Natural Language Concepts" slide). is this
: expanded upon anywhere? especially related to Perl but also
: wrt programming languages in general?
Not really, but I can expand on it a little right here.
Learn it once, use it many times
You learn a natural language once and use it many times. The lesson
for a language designer is that a language should be optimized for
expressive power rather than for ease of learning. It's easy to
learn to drive a golf cart, but it's hard to express yourself in
one.
Learn as you go
You don't learn a natural language even once, in the sense that you
never stop learning it. Nobody has ever learned any natural
language completely. Unfortunately, in the interests of
orthogonality, many computer languages are designed so that every
degree of freedom (dimension) is available everywhere. This has
its good points if you understand the whole language, but can lead
to confusion if you don't. You'd like to ignore some of the
dimensions to begin with. You'd like to be able to talk baby talk
and be understood. It's okay if a language is difficult to learn,
as long as you don't have to learn it all at once.
Many acceptable levels of competence
This is more of a sociological feature, compared to "learn as you
go", which is a psychological feature. People don't mind if you
speak a subset of a natural language, especially if you are a child
or a foreigner. (Except in Paris, of course.) If a language is
designed so that you can "learn as you go", then the expectation is
that everyone is learning, and that's okay.
Multiple ways to say the same thing
This one is more of an anthropological feature. People not
only learn as they go, but come from different backgrounds,
and will learn a different subset of the language first. It's
Officially Okay in the Perl realm to program in the subset of
Perl corresponding to sed, or awk, or C, or shell, or BASIC, or
Lisp, or Python. Or FORTRAN, even. Just because Perl is the
melting pot of computer languages doesn't mean you have to stir.
No shame in borrowing
In English (and other languages not suffering an identity crisis),
people don't mind swiping ideas from other languages and making
them part of the language. Efforts to maintain the "purity" of a
language (whether natural or artificial) only succeed in
establishing an elite class of people who know the shibboleths.
Ordinary folks know better, even if they don't know what
"shibboleth" means.
Indeterminate dimensionality
Scientists like to be able to locate things by giving a "vector",
that is, a list of coordinates in a space of known dimensionality.
This is one of the reasons they like orthogonality--it means the
various components of the vector are independent of each other.
Unfortunately, the real world is not usually set up to work that
way. Most problems, including linguistics problems, are a matter
of "getting from here to there", and the geography in-between has a
heavy influence on which solutions are practical. Problems tend to
be solved at several levels. A typical journey might involve your
legs, your car, an escalator, a moving sidewalk, a jet, maybe some
more moving sidewalks or a tram, another jet, a taxi, and an
elevator. At each of these levels, there aren't many "right
angles", and the whole thing is a bit fractal in nature. In terms
of language, you say something that gets close to what you want to
say, and then you start refining it around the edges, just as you
would first plan your itinerary between major airports, and only
later worry about how to get to and from the airport.
People thrive on ambiguity, as long as it is quickly resolved.
Generally, within a natural language, ambiguity is resolved rapidly
using recently spoken words and topics. Pronouns like "it" refer
to things that are close by, syntactically speaking. Perl is full
of little ambiguities that people never even notice because they're
resolved so rapidly. For instance, many terms and operators in
Perl begin with identical characters. Perl resolves them based on
whether it's expecting to see a term or an operator, just as a
person would. If you say 1 & 2, it knows that the & is a bitwise
AND, but if you say &foo, it knows that you're calling subroutine
foo.
In contrast, many strongly typed languages have "distant"
ambiguity. C++ is one of the worst in this respect, because you
can look at a + b and have no idea at all what the + is doing, let
alone where it's defined. We send people to graduate school to
learn to resolve distant ambiguities.
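As a minimal sketch of the local ambiguity described above (the subroutine foo and the variable names are made up for illustration), the same & character is read as an operator or as the start of a term depending on what Perl expects at that point:

```perl
use strict;
use warnings;

# Hypothetical subroutine, only here so that &foo has something to call.
sub foo { return "called foo" }

# Perl has just seen the term 1, so it expects an operator:
# here & is bitwise AND (binary 01 AND 10 gives 0).
my $bits = 1 & 2;

# Perl expects a term after "=", so here & starts a subroutine call.
my $result = &foo;

print "$bits\n";      # prints 0
print "$result\n";    # prints called foo
```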
Punctuation by prosody and inflection
Natural language is naturally punctuated by the pitches, stresses
and pauses we use to indicate how words are related. So-called
"body language" also comes into play here. Some of this
punctuation is written in English, but much of it is not--or is
only approximated. The trend in recent electronic communications
has been to invent various forms of punctuation. :-)
Some computer language designers seem to think that punctuation is
evil; I doubt their English teachers would agree.
Disambiguation by number, case and word order
Part of the reason a language can get away with certain local
ambiguities is that other ambiguities are suppressed by various
mechanisms. English uses number and word order, with vestiges of a
case system in the pronouns: "The man looked at the men, and they
looked back at him." It's perfectly clear in that sentence who is
doing what to whom. Similarly, Perl has number markers on its
nouns; that is, $dog is one pooch, and @dog is (potentially) many.
So $ and @ are a little like "this" and "these" in English. Perl
also uses word order: sub use means something quite different
from use sub. Perl doesn't do much with case distinctions,
unlike the shells, which make use-vs-mention distinctions using a $
prefix. Though I guess if you allow that, you could count Perl
quotes as a form of case marker. On a slightly more abstruse
level, Perl 5's \ operator is a sort of case marker or preposition
indicating mention rather than use. But as with most computer
languages, prepositional notions are usually expressed by position
within an argument list. (Though it's certainly possible to
write calls using named parameters in Perl, and keys of hashes
sometimes function as prepositions.)
move $rook from => $qr_pos, to => "kb3";
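One way such a call could be supported is sketched below. The subroutine name move_piece, the piece name, and the starting square are assumptions made for illustration; they are not part of the post. The named arguments arrive as a flat list and are read into a hash, so the keys "from" and "to" do the work of prepositions:

```perl
use strict;
use warnings;

# Hypothetical helper: the first argument is the piece, the rest are
# name => value pairs collected into %arg.
sub move_piece {
    my ($piece, %arg) = @_;
    return "$piece moves from $arg{from} to $arg{to}";
}

my $qr_pos = "a1";    # made-up starting square

print move_piece("rook", from => $qr_pos, to => "kb3"), "\n";

# Because the arguments are looked up by name, their order doesn't matter.
print move_piece("rook", to => "kb3", from => $qr_pos), "\n";
```

Both calls print the same line, which is the point of naming the arguments rather than relying on position.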
Topicalization
With regard to topicalization, I should point out that this
sentence starts with one. A topicalizer simply introduces the
subject you're intending to talk about. There are several
syntactic forms in English, the simplest one of which is simply a
noun: "Carrots, I hate 'em." Pascal has a "with" clause that
functions as a topicalizer. Topicalizers can sometimes give a list
of topics, at which point you see words like "for BLAH and BLAH, do
BLAH". In Perl, there are various things that work as topicalizers.
You can say
foreach (@dog) { print $_ }
This can even be used singularly:
for ($some_long_name) { s/foo/bar/g; tr/a-z/A-Z/; print; }
Pattern matches (and indeed any conditionals) tend to function as
topicalizers in Perl:
/^Subject: (.*)/ and print $1;
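The three forms above can be put together in a small runnable sketch. The header lines and variable contents are invented sample data, and results are collected into variables rather than printed inside the loops so they can be inspected afterwards:

```perl
use strict;
use warnings;

# Made-up sample data standing in for lines of a message header.
my @header = ("From: someone", "Subject: Linguistics and Perl");
my @subjects;

# foreach makes each line the topic ($_) in turn; the pattern match
# then hands $1 to whatever follows "and".
foreach (@header) {
    /^Subject: (.*)/ and push @subjects, $1;
}

# for over a single item simply makes that item the topic of the block,
# so s/// and tr/// operate on it without naming it again.
my $some_long_name = "foo fighters";
for ($some_long_name) {
    s/foo/bar/g;
    tr/a-z/A-Z/;
}

print "@subjects\n";          # prints Linguistics and Perl
print "$some_long_name\n";    # prints BAR FIGHTERS
```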
Discourse structure
Discourse structure is how an utterance longer than a sentence is
put together. Different languages and cultures have different
rules for how to tell a joke or a story, for instance, or how to
write a book about Perl. Some computer languages have rather fixed
rules for larger structures. COBOL and Pascal come to mind. Perl
tends to be pretty free about what order you put your statements,
except that it's rather Aristotelian in requiring you to provide
an explicit beginning and end for larger structures, using curlies.
But you could almost claim that #!/usr/bin/perl corresponds to
"Once upon a time", while __END__ means "And they lived happily
ever after."
Pronominalization
We all know about pronouns and their uses. There are a number of
pronouns in Perl: $_ means "it", and @_ tends to mean "them". (But
$1, $2 etc. are also pronominal references back to antecedent
substrings in the last pattern match, which we mentioned can function
as topicalizers.) Within a foreach loop or a grep, $_ is not just
a copy of the item in question, but an alias for it. Similarly, @_
is a list of references to the function's arguments, and the
arguments can be modified by changing elements of @_.
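A short sketch of that aliasing behaviour follows; the names @dog, $name and shout are made up for illustration. Note that the elements of @_ behave as aliases to the caller's variables, not as references of the kind the \ operator creates, so no dereferencing is needed:

```perl
use strict;
use warnings;

my @dog = ("rex", "fido");

# $_ is an alias for each element, so assigning to it changes @dog itself.
foreach (@dog) {
    $_ = ucfirst($_);
}

# $_[0] is an alias for the caller's first argument, so assigning to it
# changes the caller's variable in place.
sub shout {
    $_[0] = uc $_[0];
    return;
}

my $name = "spot";
shout($name);

print "@dog\n";     # prints Rex Fido
print "$name\n";    # prints SPOT
```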
No theoretical axes to grind
Natural languages are used by people who for the most part don't
give a rip how elegant the design of their language is. Except for
a few writers striving to make a point in the most efficient way
possible, ordinary folks scatter all sorts of redundancy throughout
their communication to make sure of being understood. They use
whatever words come to hand to get their point across, and work at
it till they beat the thing to death. Normally this ain't a
problem. They're quite willing to learn a new word occasionally if
they see that it will be useful, but unlike lawyers or computer
scientists, they feel little need to define lots of new words
before they say what they want to say.
In terms of computer languages, this argues for predefining the
commonly used concepts so that people don't feel the need to make
so many definitions. Quite a few Perl scripts contain no
definitions at all. I dare you to find a C++ program without a
definition.
Style not enforced except by peer pressure
We do not all have to write like Faulkner, or program like
Dijkstra. I will gladly tell people what my programming style is,
and I will even tell them where I think their own style is unclear
or makes me jump through mental hoops. But I do this as a fellow
programmer, not as the Perl god. Some language designers hope to
enforce style through various typographical means such as forcing
(more or less) one statement per line. This is all very well for
poetry, but I don't think I want to force everyone to write poetry
in Perl. Such stylistic limits should be self-imposed, or at most
policed by consensus among your buddies.
Cooperative design
Nobody designs a natural language by themselves, unless their name
happens to be Tolkien. We all contribute to the design of our
language by our borrowing and our coinages, by copying what we
think is cool and eschewing what we think is obfuscational. The
best artificial languages are collaborations--even with a language
like Perl where one person seems to be in charge of it. Most of
Perl's good ideas were not original with me. Some of them came
from other languages, and some of them were suggestions made by
various folks as we went along. If you consider the language to
include the various cultural trappings (libraries, bin directories)
that go along with the language, then even languages like C, or
Ada, or C++, or even the Unix shells are collaborations by many,
many people. Perl is no exception to this.
"Inevitable" Divergence
Because a language is designed by many people, any language
inevitably diverges into dialects. It may be possible to delay
this, but for any living language the forces of divergence are
nearly always stronger than the forces of convergence. POSIX tried
to unify System V and BSD, and as soon as they squeezed things
together in that dimension, the number of Unix variants exploded in
several other dimensions. The lesson for a language designer is to
build in explicit mechanisms so that it's easy to identify which
variant of the language is being dealt with. Perl 5 has an
explicit extension mechanism for which you specify, using "use"
clauses, which kinds of special semantics or "dialects" you're
going to be relying on. Perl 4 didn't have this, and there was
considerably more pressure to put various things into the language
that didn't belong in the core language. Hopefully now we can
stabilize "basic" Perl so that there is less need to invent
oraperl, sybperl, isqlperl, etc.
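As a small illustration of a "use" clause selecting special semantics, the sketch below uses the standard integer pragma, chosen here only as an example. The pragma is lexically scoped, so the changed meaning of division applies only inside the block that asks for it:

```perl
use strict;
use warnings;

# Ordinary Perl arithmetic: division yields a fractional result.
my $plain = 7 / 2;

my $whole;
{
    # Within this block only, arithmetic is done on integers.
    use integer;
    $whole = 7 / 2;
}

print "$plain $whole\n";    # prints 3.5 3
```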
Hope you find this useful.
Larry
Steven W. McDougall / resume / swmcd@theworld.com / 1997 November 27