Learning Histories LO4586

John Conover (john@johncon.johncon.com)
Wed, 3 Jan 1996 17:28:40 -0800

Replying to LO4560 --

GaltJohn22@aol.com writes:
> HyperText is, in a deliberate sense, NOT hierarchical. It is rather a
> "Sparse Matrix" where depth and breadth are supposedly indeterminate and
> irrelevant over time. That is, it can grow in all directions, or a few, or
> several at once, with such growth neither planned nor managed.

HyperText has the disadvantage that it requires the data's structural
links to be defined, when the documents are stored in an information
retrieval system, in anticipation of the queries that will be asked in
the future. In this sense, it is more like a table of contents than an
index. It is a one-way street: once you structure the data, it is very
difficult to change the structure (and the way you look at the data,
BTW). A good example of this is the yellow pages in a telephone book.
Finding something by subject is very fast and expedient; however, given
the phone number of a business, finding the subject that corresponds to
that phone number degenerates into an exhaustive search. The Unix man
pages are another good example: if you know the command you are looking
for, finding its page is very easy; if you don't, it is very difficult.
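The yellow-pages asymmetry can be sketched in a few lines. This is a minimal illustration, assuming a directory stored as a mapping from subject to (business, phone) listings; the listings and the helper names `by_subject` and `subject_of_phone` are hypothetical. The query the structure was built for is a single lookup, while the reverse query has no entry point and must inspect every listing.

```python
from typing import Dict, List, Optional, Tuple

Listing = Tuple[str, str]  # (business name, phone number)

# Hypothetical directory: structured by subject when it is "stored".
yellow_pages: Dict[str, List[Listing]] = {
    "plumbers": [("Ace Plumbing", "555-0101"), ("Drain Pros", "555-0102")],
    "bakeries": [("Corner Bakery", "555-0201")],
    "florists": [("Bloom & Co", "555-0301"), ("Petal Pushers", "555-0302")],
}


def by_subject(subject: str) -> List[Listing]:
    """Fast path: the query the structure anticipated (one dictionary lookup)."""
    return yellow_pages.get(subject, [])


def subject_of_phone(phone: str) -> Tuple[Optional[str], int]:
    """Slow path: nothing is keyed by phone number, so every listing is
    examined in turn. Returns the matching subject (or None) and the
    number of listings inspected along the way."""
    inspected = 0
    for subject, listings in yellow_pages.items():
        for _name, number in listings:
            inspected += 1
            if number == phone:
                return subject, inspected
    return None, inspected
```

Supporting the reverse query efficiently would mean building and maintaining a second structure keyed by phone number, which is the same a priori decision made once more, for one more anticipated question.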

John

BTW, Hypertext is a rather new name for what was called the Memex, as
proposed by Vannevar Bush (Bush, V. (1941) "Memorandum regarding Memex,"
[Vannevar Bush Papers, Library of Congress], Box 50, General
Correspondence File, Eric Hodgins), and the issues involved in
structuring the data a priori (technically known as "content data") were
known and understood at that time. A better alternative is probably to
order the documents by relevance at search time, as opposed to
structuring them when they are stored. The technical name for data
provided in such a manner is "context data."
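Relevance ordering at search time can be illustrated with a minimal sketch, assuming a plain TF-IDF score (term frequency multiplied by the log of inverse document frequency). This is a generic textbook weighting chosen for illustration; it is not necessarily what Wais, Lq-text, Qt, or Rel actually use, and the function names `tokenize` and `rank` and the sample documents are hypothetical. The documents are stored as plain text with no links, and all ranking work happens when the query arrives.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on non-alphanumeric characters."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rank(documents: Dict[str, str], query: str) -> List[Tuple[str, float]]:
    """Order documents by a simple TF-IDF relevance score for `query`.
    Nothing about the query was anticipated when the documents were stored."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    n_docs = len(counts)
    terms = set(tokenize(query))
    # Document frequency: how many documents contain each query term.
    df = {t: sum(1 for c in counts.values() if t in c) for t in terms}
    scored = []
    for doc_id, c in counts.items():
        score = 0.0
        for t in terms:
            if df[t]:
                # Terms found in every document get idf = 0 and add nothing.
                score += c[t] * math.log(n_docs / df[t])
        if score > 0:
            scored.append((doc_id, score))
    # Highest score first; ties broken by document id for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored


if __name__ == "__main__":
    docs = {
        "memex": "Bush proposed the memex, a desk for storing and linking documents.",
        "yellow": "The yellow pages list businesses by subject with a phone number.",
        "man": "Unix man pages are easy to use if you know the command name.",
    }
    for doc_id, score in rank(docs, "pages command"):
        print(f"{doc_id}: {score:.3f}")
```

With the sample documents, the query "pages command" ranks "man" above "yellow", because "command" is rarer across the collection than "pages". Asking "which subject has this phone number?" is just another query against the same unstructured store, so no second structure has to be designed in advance.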

Just in case you were curious ...

References:

James M. Nyce and Paul Kahn, "From Memex to Hypertext: Vannevar
Bush and the Mind's Machine", Academic Press, New York, New York,
1991.

William B. Frakes and Ricardo Baeza-Yates, "Information
Retrieval", Prentice-Hall, Englewood Cliffs, New Jersey, 1992.

(Note: The sources for many of the algorithms presented in
Frakes are available from ftp.vt.edu:/pub/reuse/ircode.tar.Z
via anonymous ftp.)

Charles T. Meadow, "Text Information Retrieval Systems", Academic
Press, San Diego, California, 1992.

Carol Tenopir and Jung Soon Ro, "Full Text Databases", Greenwood
Press, New York, New York, 1990.

Susan Jones, "Text and Context", Springer-Verlag, London, England,
1991.

Freely available information retrieval programs that support relevance
ordering of documents, obtainable via anonymous ftp on the Internet
(these ftp addresses are for the program source code):

Wais, think.com:/wais/wais-8-b5.1.tar.Z.

Lq-text, cs.toronto.edu:/pub/lq-text1.10.tar.Z.

Qt, ftp.uu.net:/usenet/comp.sources/unix/volume27.

Rel, ftp.uu.net:/usenet/comp.sources/unix/Volume 28, Issue 212.

-- 

John Conover, 631 Lamont Ct., Campbell, CA., 95008, USA. VOX 408.370.2688, FAX 408.379.9602 john@johncon.com