Département de littérature comparée
Université de Montréal
CP 6128, Succursale Centre-ville
Montréal, Québec H3C 3J7
Designing a viable electronic scholarly journal is a complex endeavour. It meshes a series of objectives that are largely cognitive, institutional and social with technical means which, in principle, have been designed with no such considerations in mind. Thus, the use of Unix servers and of TCP/IP protocols affects the way one can design an electronic scholarly journal. At the same time, the traditions of scholarly publishing, some of which date back to the 17th century, the need to archive documents that are viewed as durable bricks in the enduring building of knowledge, and the demand of scholars, acting as authors, to reap the career benefits of their efforts: none of these can be easily dismissed. In fact, none of them should be.
The document that follows will keep on evolving as long as technology changes, solutions are invented and newer needs are identified. Broadly, it relies on experience dating back to about 1990; more narrowly, it reflects the work done since a contract was signed between the Université de Montréal and Industry Canada to design a state-of-the-art electronic scholarly journal that could act as a model for similar endeavours in Canada and beyond. It is based on all that has been learned in the design, evolution and redesign of Surfaces over this whole period.
The basic philosophy of electronic scholarly journals rests on two premises:
1. All the needs of the user base, readers as well as authors, that a normal paper-and-print journal would address should be met as well.
2. All the changes that the new medium is bound to bring about in the very dynamics of scholarly communication should be anticipated to the widest extent possible. Unwanted effects should be avoided as much as possible, but consequences that appear positive should be supported gently. Designing a new medium necessarily involves redesigning the communication system itself over the medium or long range.
With regard to the first point, paper journals claim to fulfill three needs:
Printing articles does help disseminate ideas over a wide territory, sometimes the whole planet. However, time has become of the essence, and printing papers, in general, is turning out to be too slow and cumbersome a process for the communication and information needs of scholars and scientists alike. In the latter case (scientists), this is so patently true that e-mail, pre-prints and various other forms of communication (fax, phone) support the actual communication of results at the research front. The Los Alamos preprint server for the high-energy physics community, designed by Paul Ginsparg, is but the most famous example of a series of initiatives that already cover a fairly large number of specialties and even disciplines.
Journals quickly lose much of their usefulness if they are not carefully stored and just as carefully indexed in good, well-designed indices and bibliographies. In other words, if it is true that the editing work put into any given article adds value to it, as private publishers are quick to underscore, the more discreet work of the librarians and of the bibliographers adds even more value to these publications over the long run. In the case of bibliographies, learned societies and librarians themselves have most often shouldered this task.
Organizing and indexing journal articles in a print-and-paper form are difficult, exacting and costly exercises. The space needed is enormous and keeps growing, leading libraries to various strategies ranging from building new libraries - an option ever more unrealistic in the current economic scene and financial state of public institutions - to the selective disposal, including destruction, of older collections deemed to be of little use. The costs involved are also enormous.
Creating and maintaining a collection requires constantly classifying physical volumes, making inventories, identifying lost or stolen documents, etc. This too is very costly. Also, because paper journals are potentially manipulated by many people, the various issues are often bound into volumes, another important expense. Catalogs have to be maintained and updated, and they only cover the bare essentials: essentially journal holdings. Individual articles are rarely indexed in a library catalog nowadays because it is too demanding a task.
Bibliographies give access to the individual authors and articles of learned journals. However, with few exceptions, they run several months, if not years, behind actual publication dates, constantly leaving the potentially most important section of learned publishing beyond the reach of most readers. Moreover, they are always selective, trying to cover the "more significant" journals either from a global or a local perspective. But what appears significant to someone's eye can leave somebody else completely indifferent. Moreover, the relative "babelization" of scholarly studies (i.e. specialties growing ever narrower) means that the more esoteric the language used, the less probable it is that it will be indexed outside its own linguistic ambit, if at all.
In short, the paper and print medium does not lend itself gracefully to archival functions; rather, one has to use a fair amount of brute (and costly) force to pummel it, so to speak, into shape and place. By contrast, electronic texts are never removed from their "place" by a user; they are merely copied. They can be indexed, mostly automatically, down to the level of words and they can be catalogued just as easily and in a variety of fashions, all simultaneously present. Through hypertextual links, they can be presented to the end user in a variety of ways that the fixed, hierarchical structure of print could not easily emulate, if at all.
One of the main functions of scholarly printed journals is to provide professional, institutional and disciplinary visibility, as well as recognition and prestige, to scientific authors. Complex systems of "pecking orders" are based on the ranking of journals and one's relationship to them. The quality of the editorial board counts for much, of course, but the typography, the quality of the paper used, the quality of the illustrations, etc. all play their role. In short, a correlation of sorts exists between the professional touch of a learned journal and the authority it commands on the intellectual market. One can indeed make the argument, without pushing it too far, that a truly excellent journal regularly garnering papers from well-established authors will tend to secure a larger number of institutional and individual subscriptions; in turn, this advantage will allow it to improve its physical appearance.
The prestige question is presently among the most difficult to solve for electronic journals, as they tend to be newcomers in the field or journals that need to mutate because of financial trouble. In either case, one may suspect that they do not yet belong to the select group of elite publications. As a result, their electronic form can connote second-class status. Counter-measures exist that go some way toward alleviating this potential stigma, but it will take time and continuity of effort to see this crucial point solved satisfactorily. Recent decisions by research councils, such as the MRC in Canada, to grant equal recognition to articles published in electronic refereed journals and in traditional print journals are a step in the right direction, but many more will have to be made before electronic scholarly publishing attains full maturity.
The question of legitimacy and authority just sketched is important because it clearly underscores an oft-forgotten fact -- namely that the design of an electronic journal does not merely rest on using some good technical tools. While these cannot be dismissed, to be sure, they alone will not ensure success. A good understanding of several distinct ethos also has to be kept in mind: that of the scholar or researcher, that of the librarians and that of the presses. Traditionally, in the print world, the author has had to deal mainly with the publisher, but very little with the librarian, and the latter's relationship to the publisher has been on the whole purely commercial: librarians bought subscriptions that publishers sold directly or through brokerage houses such as Faxon. With the advent of electronic publishing, a good deal of institutional shuffling lies in the wings. To wit:
Librarians may discover they had better take on themselves the task of distributing scholarly journals to one another and to their various constituencies.
Authors, as a result, may find themselves dealing a lot more closely with librarians-turned-servers.
Presses and particularly university presses may find themselves negotiating new agreements with their libraries to take advantage of complementary skills within this digitized world that modifies so many things. In this regard, it is interesting to note that the Muse Project from Johns Hopkins involved a close association between the library and the JHU Press. OCLC, of course, has its roots within the library community. Already, several signs point to the real possibility of this transformation of the libraries' role in the future.
The possible realignment of the libraries, university presses and scholars raises a new question: what role should commercial publishers occupy? Converting journals to the digitized media may be accompanied by a shift in the power centres of publishing. In short, taking advantage of the deep changes brought about by the advent of the digital world and of a global network such as the Internet, scholars and researchers, allied with their colleagues in the libraries and the university presses, may try to regain a greater measure of control over the production and dissemination of scholarly materials.
On the financial or economic front, essentially no regular financial aid exists to support electronic publication on a stable basis. Programs that do make room for electronic journals (such as SSHRC's in Canada) generally do so inside a budget devoted to all learned periodicals. In effect, this pits the electronic newcomers against the established journals, and it does so in the general context of rapidly diminishing resources. Needless to add, this creates a sense of exacerbated competition in which electronic journals stand little chance of receiving support from the adjudicating committees. Jury members, moreover, generally come from paper publishing and look upon electronic publishing with great anxiety. It will be interesting to see how policies for the funding of electronic scholarly journals evolve in the near future.
The present financial situation of our governments is not conducive to the creation of new programs for electronic journals; on the contrary, granting agencies tend to look for ways of reducing their outlays because their own budgets are being gradually reduced to a fraction of what they used to be. As for editors of existing paper journals, only the direst economic pressure brings them, reluctantly, to examine the electronic perspective. Meanwhile, no commercial house has exhibited a viable economic model for electronic publishing that could draw a consensus across the industry. Certainly Elsevier has not succeeded with its Tulip project. OCLC ran straight into the unexpected development of HTML publishing with the World Wide Web, and its Guidon project had to be scuttled. Only the Muse project at Johns Hopkins University is showing some promise by being innovative in the way journals are packaged. In effect, Muse offers bulk subscriptions to all of its journals at a rate that appears very attractive compared to traditional rates for paper journals.
Another dimension of the financial question must be broached here. It provides another strong incentive in favour of taking back the control of scholarly publishing within research circles, such as universities, public laboratories and scientific associations. The price of scholarly journals has increased by 140% over ten years, while that of scholarly monographs has increased by only 40%. This discrepancy, although it cannot be ascribed to technical factors, can nevertheless be explained relatively easily. A small piece of news was published in the Parisian newspaper Le Monde on July 20th, 1995. Its general thrust was supported by an article that appeared in Forbes in December of the same year. According to both of these articles, Reed-Elsevier maintains a profit margin well above 30% with its scientific periodicals. Let us remember that Elsevier alone, before it joined forces with Reed, already controlled over 1100 titles, or about three times what Canada publishes. As the average price of a science journal subscription stands around $800 US, and even $900 in physics, it is clear that fewer and fewer institutions can afford to maintain adequate collections for their researchers. But as libraries cut their subscriptions, they sensibly maintain the most prestigious, most used journals, precisely those that large commercial companies tend to acquire. The paradox is that the present economic crisis has so far benefited the large commercial publishers, while the range of documentation that libraries can place at researchers' disposal keeps shrinking. Yet scientists and scholars attach so much importance to the added prestige accruing to those publishing in the right journal that they compete for access while forgetting the effects on their university or library budgets. As for third-world countries, the situation is much graver, as they find themselves effectively cut off from the journals that count in today's science.
Another important point must also be kept in mind while designing an electronic journal. To the extent that the "medium is the message", as Marshall McLuhan argued nearly three decades ago, we can expect that the use of electronic publishing will achieve results in excess of those pursued, and that some of these results will not be anticipated and will bring about consequences that may not be deemed desirable by the community of users. One only has to think about Gutenberg's invention to note that he would have been shocked to learn that texts such as those of the Marquis de Sade would one day enjoy the consecration of status provided by print. He probably would have been surprised, to say the least, at the thought that letters between individuals could one day lead to the design of periodicals as a way to mechanize a process that had become too slow and limited to fulfill the growing needs of the Republic of Letters.
With electronic scholarly journals, we cannot hope simply to substitute a new technology for an older one and do so without modifying many dimensions of one's relationship to knowledge. The increased speed, the added flexibility in retrieval techniques, the very lability of the digitized text all factor in to produce a situation that has the potential of being radically different from the one we are used to.
For example, the digitized text can be easily and quickly cut and pasted into other texts. It can be modified in almost imperceptible ways. In short, there is no reason to treat it any longer as a hard, permanently chiseled document. On the contrary, it exhibits fluid characteristics that have to be analyzed carefully. Some look positive, others negative. On the positive side, the text more closely espouses the very dynamics of human-to-human interaction. It no longer limits itself to recording statements and multiplying them, but becomes part of the recording process, exactly as when a deposition is taken down in writing by a clerk. The sense of presence is enhanced; the intellectual dynamics become more immediate. In the end, the textual or documentary object appears transformed. All this may sound fine and largely innocuous, but scholars and researchers stake their careers on well-defined products, that is to say, objects endowed with stability relative to time and place. Print, by physically tying text to paper and multiplying the result hundreds of times, managed to harden words. In going digital, texts lose this physical stability, this guarantee of permanence. Again, this may look somewhat inconsequential until we stop and think that the whole idea of the author depends on it.
There is more. Printed books and articles are physical products imposing, or at least reinforcing, a batch-like pacing of production on the intellectual process. Scientists do not publish their research results a paragraph at a time, and they no longer publish books either; they publish articles. But the length and general characteristics of the article are deeply enmeshed with its physical, print-based nature.
In designing an electronic scholarly journal, it is important to begin with a close facsimile of traditional journals so as to ease the acceptance process in the research community. Not doing so exposes the electronic publisher to fierce resistance and rejection. Authors, for example, need to be reassured that a reference version of their article exists and that it cannot be tampered with. Readers want to be able to cite parts of an article in the familiar way and insist on having volumes, numbers and pages, even though all of these units are tied to the print world.
In conclusion, designing an electronic scholarly journal is an urgent task that cuts across a number of distinct, yet inter-related issues. What is at stake ultimately is nothing less than redesigning the communication system of the research communities, a process fraught with many difficulties, many competing interests, many uncertainties. In particular, too narrowly conceived a quest for short-term gains may quickly lead to difficult medium- or long-term problems. If, for example, governments, to save money, simply (not to say simplistically) cut down the amount of support to scholarly journals, they may see the best journals of the country migrate abroad, to Elsevier for example, and then be sold back here at a price multiplied by a factor of 3, 5 or more.
A little like the choices of technologies that have structured the building and growth of the Internet, the technical choices made with regard to the design and growth of electronic scholarly journals will affect the ultimate evolution of the communication system used by researchers. The visibility of researchers, the gatekeeping functions editors play, the prestige of scientific publications: all hinge on the way in which the transition to the digitized medium is conceived and implemented.
Something like an Internet philosophy has emerged through its short history. A spirit of sharing and of open design has constantly accompanied the growth of the Internet. Implementation has always preceded standardization, and technical decisions made by large companies or governments have repeatedly been opposed with success. For example, when no free Unix was made available for Intel's new 386 and 486 chips, the Internet community rallied around a young Finnish computer scientist, Linus Torvalds, and Linux was conceived and implemented in a surprisingly short time.
These lessons should not be lost upon the scholarly communities, particularly when designing scholarly journals. For this reason, we believe that, to the extent possible, the following, general, principles should be followed:
1. Use public, stable, already widely adopted standards rather than proprietary ones.
2. Use software that is as open and as cheap as possible.
3. The scholarly electronic publishing community should strive to produce a library of open, free, accessible software tools that would allow any research community, including the poorest, to publish its results well.
4. Publishing solutions should be as platform-independent as is possible.
5. To the extent possible, publishing solutions should follow the evolution of technology gracefully, so as to ensure continued accessibility in the years and even decades to come: scientific and scholarly papers must remain accessible for as long as possible, preferably several decades.
6. The question of available bandwidth should never be forgotten, particularly if one looks for worldwide accessibility. In Africa and several Latin American and Asian countries, bandwidth is severely limited; yet we should be conscious of the informational needs of the scholars from these countries and accommodate them to the best of our abilities.
There are probably more recommendations that could be added here, but the point is to impart a spirit of sharing and openness rather than prescribing the way to do things all the way to the last detail. As a matter of fact, what follows is proposed as a viable solution to grow a journal, and any reader is encouraged to pick and choose what corresponds to his or her own project.
In designing an electronic journal, one must always keep in mind that the overarching criterion to be respected is documentary quality first and foremost, not the use of the latest, jazziest kind of technology. It is extremely easy to be mesmerized by technical details. A good paper written on an old typewriter by an author from a country where computers are not widely available is much more valuable than a so-so paper printed on the best paper with the sharpest laser printer. Sven Birkerts, a rather successful author to judge by the public reaction to his Gutenberg Elegies, tells us that he still writes by hand or on a typewriter because he feels it helps him to know that changing words will be difficult and will consume much time. He needs to feel that he commits himself to the very words he puts on paper, and writing in this fashion forces him to think up his next sentence before he begins to write it. Writing with a word processor, on the other hand, allows for so much more flexibility that some people have been heard to complain that it is hard to come to any kind of closure on a text with these new tools. Everything can be so easily reshaped, reworked, transformed. All this to say that insisting on documents already in some digitized form may well be desirable from the editor's standpoint, but that it does not necessarily correspond to the best interest of a journal bent on quality.
The kinds of computers people use differ from discipline to discipline and from institution to institution. In the humanities and the social sciences, the great majority of users nowadays tend to use the "Wintel" combination, that is, the presently popular combination of an Intel-based computer with some flavour of Microsoft Windows as operating system. A significant minority remain attached to the Macintosh from Apple. In the hard sciences, the same two basic options remain present, but with a strong third voice -- namely some flavour of the Unix operating system. Compared to the situation of several years ago, however, a general movement toward convergence has emerged, aided by the fact that operating systems have tended to migrate more and more from machine to machine and across various kinds of CPUs.
We shall limit ourselves here mainly to the case of humanities and social science journals, although the case of the hard sciences would not differ all that significantly from what we say here. How do we translate the received documents into some suitable digitized format?
Basic equipment and software. Funneling various forms of documents into one single format.
We recommend having access to both a "Wintel" machine and a Macintosh, if at all possible. If not, it is probably preferable nowadays to buy the former rather than the latter, but nothing more will be said on this count, as religious wars have started for less than that. A physical link should be set up between the two machines, and translation software such as MacLink Plus should be used. In this particular case, we seem to depend on a proprietary solution, as no open solution exists, but pointing to this situation may stimulate groups of users to come up with free, open solutions. Of course, and to the extent possible, the MacLink Plus software should be kept as current as possible so as to translate the versions of various word processors that keep changing (because the only way to stimulate the sale of software has been through a largely artificial policy of frantic updates). MacLink Plus appears to the Surfaces team as the best converter available, and wherever we have met problems, we have always had good support from the company.
With this simple setup of two (cheap) computers and MacLink Plus, it is possible to receive practically any kind of digitized document and to convert it into one particular format that can be used as a pre-publishing format for text correction and basic formatting.
Few documents are received only on paper these days, and their numbers keep on decreasing, but, should one arrive, it must be scanned and treated by OCR software. This requires further equipment, of course, and a beginning journal may well decide to forgo this possibility. OCR scanning works better with typewritten documents than with print because it was originally designed for the conversion of older office documents. Scanning typewritten documents, while certainly not the kind of thing one wishes to do all the time, turns out to be quite manageable on occasion, and vetting the resulting text is less onerous than one would be led to believe at first sight. Of course, if the typewritten document includes hand-made corrections, then the situation becomes hopeless and manual keying-in of the text becomes necessary. Obviously, this is the exact kind of situation that must be avoided.
Our experience is that OmniPage for the Macintosh is a solid, quite reliable workhorse for this kind of endeavour. We therefore recommend completing the setup described above (two different computers with MacLink Plus installed) with a good-quality scanner, preferably one endowed with colour capability since it will be used for graphics as well, and with OmniPage as the OCR software.
In the case of the hard sciences, it would probably be very useful to have a Unix capability, either by dedicating a machine to this operating system or by using Linux on a Pentium-based Wintel machine. In the latter case, the machine would effectively cover two bases: some flavour of Windows plus Linux, a flavour of Unix. We suggest Linux as it is a readily available Unix operating system and because it is essentially free or sold on a CD-ROM for a paltry sum. However, using Linux does require a good knowledge of the Unix operating system - a demanding requirement.
The conversion system should converge toward one basic format used to distribute papers to referees and then to move on to fully formatted papers. In this regard, we recommend MS-Word, as it has come to occupy a commanding position in the world of word processors for personal computers. Other word processors are undoubtedly better than MS-Word, but they generally falter because they are never completely and transparently compatible with the market leader. There is no need to fight the powers that be when designing an electronic scholarly journal; or rather, if there are reasons to do so, one might as well pick one's battles and battlefields carefully.
MS-Word is widely used in many countries and, as a result, using it does simplify the task of making incoming documents converge into one single type of file format. Moreover, RTF (Rich Text Format) provides a convenient means of translating many files into MS-Word. If your authors can save their files in the RTF format, ask them to do so; it may save a few steps upon receiving the file. RTF yields a pure ASCII file. This means these files can travel through the SMTP format of electronic mail over the Internet without any problem. They can also be digested directly by any recent flavour of MS-Word, be it for a "Wintel" machine or for a Macintosh. Recent versions of WordPerfect also sport an RTF capability.
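To make the point concrete, here is a made-up RTF fragment (the control words \rtf1, \ansi, \b and \par, and the hexadecimal escape \'e9, are standard RTF). Note that even the accented character travels as nothing but printable, strict-ASCII characters:

{\rtf1\ansi
{\b A bold heading}\par
An accented word such as caf\'e9 survives the trip intact.\par
}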
In the scientific disciplines, other standards prevail that must be taken into account. Chief among those is LaTeX (pronounced "la-tech"). As LaTeX was optimized to fit the requirements of mathematical notation in particular, and as the scientific communities rely heavily on it, it is probably best to use this format as the reference format in those fields. In the case of Surfaces, based as it is in the humanities, it would have made no sense at all to use LaTeX, because this format is essentially unknown in the humanities and most of the social sciences.
When Surfaces began publishing in 1991, the electronic publishing scene was, to say the least, very murky indeed. This explains the extremely pragmatic solution used at the time, based on the recognition that most people in the humanities used either MS-DOS machines (Windows was not yet very credible back in 1991) or Macintosh computers. In the Macintosh world, the situation was very simple: MS-Word dominated the scene without any real competition so that all other word-processors had had to add translation modules that minimally could read MS-Word documents correctly. On the MS-DOS side of things, the situation was a little more complex. At the time, WordPerfect dominated the scene and MS-Word was the upstart. As a result we decided to use WordPerfect as the standard for MS-DOS machines.
Other commercial solutions exist that aim at working across platforms, such as Postscript and Acrobat, both from Adobe. Because they are both proprietary, we have consistently resisted using them, but other reasons weighed against their use as well. Postscript, for example, was designed to solve incompatibility problems between computers and printers. As a result, the programmers did not pay much attention to bandwidth restrictions, which are not much of a problem when a good cable links a computer with its printer: generated files can be allowed to grow to a fairly large size. The situation is quite different, however, when networks with limited bandwidth are involved in the process.
For analogous reasons, we have also eschewed Acrobat. Acrobat can be an attractive solution for a commercial publisher who wants to retain control over its documents so as to sell them more easily. However, it too generates fairly large files, almost as large as Postscript files. Moreover, Acrobat files, (and Postscript files) behave very much like page images. As a result, quoting from a Postscript or Acrobat file cannot be done through a simple cut and paste process. Finally, Acrobat has not achieved the popularity of Postscript. As a result, it could be dropped by the parent company much more easily than Postscript ever could. For all of these reasons, these solutions do not appear suitable for electronic scholarly publishing.
ASCII was then the solution most commonly used by the few electronic journals then in existence (for example Psycoloquy, PACS Review, etc.). Yet we decided to provide that option only as a "last recourse", stressing that it was not the preferred form of access to Surfaces. Two reasons militated strongly against strict ASCII, and we now believe we were right in acting as we did.
1. ASCII does not allow for diacritics unless extended ASCII is used. However, extended ASCII, based on 8-bit coding (unlike strict ASCII, which requires only 7-bit coding), does not fit within the basic SMTP protocol of e-mail over the Internet. As a result, the only safe way to transmit documents over the Internet is indeed strict ASCII, which means that extended ASCII has to be encoded so as to appear only as strict ASCII (in MIME's quoted-printable encoding, for example, an "é" travels as the three strict-ASCII characters =E9).
2. Articles are meant to be read, but modes of reading are many and varied. In the humanities, slow or deep reading, pencil in hand, remains crucial, so that we can expect readers to print their articles on paper. If they print, there is no reason to forfeit more than five centuries of experience in this form of reading ergonomics, otherwise known as the typographic art. In other words, printers and publishers have painstakingly developed wide varieties of fonts not only to enhance the visual elegance of documents, but their legibility as well. Good typography also provides an aura of professionalism that reflects positively on a publication's legitimacy and prestige; the importance of appearance does not vanish with the change of medium.
Notwithstanding what has just been said, ASCII files should be maintained all the same. Some users, because of their equipment, cannot handle anything but ASCII texts on the networks; others do not know how to handle anything else. The decision is not offered without a degree of ambivalence, as it runs contrary to the basic philosophy of electronic publishing presented here. The typography is awful. The formatting is minimal. Languages such as French, Spanish, Portuguese and German lose their diacritical signs. Yet it provides easy access to the text and its notes in a way that is truly platform- and software-independent. It also does so while sparing bandwidth - a point that may be crucial in some parts of the world. Practically all computers can interpret ASCII correctly, as can all word processors.
To summarize: the received articles, once they have gone through evaluation and revisions, are formatted with any proprietary software suitable for this purpose. In this regard, common, high-level word processors such as MS-Word or WordPerfect are probably sufficient. Some may prefer software more oriented toward page composition, such as QuarkXPress or PageMaker. This is largely a matter of taste and of the strategy chosen to lead the text up to on-line publishing. If one uses the normal World Wide Web, many tools exist that allow moving from these proprietary formats to the current level of HTML. However, we will argue that a better, more rational route exists, involving the use of a standardized SGML DTD, and we will address this question of choice again a bit later. Once this document is ready, the end of each page is marked within the text and it becomes the reference digitized copy.
This apparently inconsequential step is actually quite important because the displays provided by various machines or pieces of software do not necessarily maintain a coherent description of a given "page". Lines may gradually drift from one page to the next, leading to some ambiguity as to the place of a given phrase or sentence. When citing, scholars need a firm "location" and they are used to a reference system that is no finer than the page unit. For this reason we have come to distinguish "pages" marked within the text and used for citation purposes, from folios that are numbered automatically by the word-processing software according to the size of the physical sheet of paper used (A4 in Europe, for example).
The pagination/foliation distinction that was just drawn is sometimes criticized as keeping too close to the print journal model. However, it is not so much the print journal that is preserved here as the users' habits. To the largest extent possible, we feel that the user should feel at ease, at home so to speak, with the new medium. Moving from a paper volume on a shelf to a digitized volume on a server is enough of a jolt for most academics these days. Mastering on-line skills is not particularly exciting to them, and it can be quite daunting. Besides, there is no real need to make a strange environment even stranger. Asking academics to change habits that were drilled into them years ago, and telling them they must henceforth quote by line, or even by character number, is asking for trouble. The primary end of an electronic publisher is not to perturb cultures gratuitously; on the contrary, his or her main objective is to ease users into a world that, in any case, will be vastly different from the old.
This is in effect the heart of this whole document. Here we propose a path to scholarly publishing that can reconcile the accessibility of the web with the stability and the quality of tools that scholarly publishing needs to ensure a degree of durability to its documents.
Although invented in 1989 at CERN, the hypertextual, distributed system for virtually gathering documents named the WWW (World Wide Web) did not really take off until computer scientists at the NCSA (National Center for Supercomputing Applications) developed a viewer capable of composing multimedia screens from WWW-encoded documents. The marriage of HTML encoding with Mosaic, the multimedia browser, turned out to be explosive. Among other results, it quickly displaced Gopher, which had been another success story since 1991. It also imposed itself as the favoured mode of (self-)publishing on the Net.
Let us recall that the WWW works with documents that are tagged in a special way. This tagging is based on a series of rules known as HTML (HyperText Markup Language). HTML actually is nothing but a simple DTD (Document Type Definition) organized according to meta-rules of tagging known as SGML (Standard Generalized Markup Language). When people say they know HTML but not SGML, they show how little they really know about HTML.
The very success of HTML led many users to express frustration with the limitations of a DTD that had been originally designed for simplicity's sake rather than sophistication. The need for order led to the creation of a WWW Consortium in charge of leading and controlling the evolution of HTML. However, the ambitions of companies such as Netscape and Microsoft led them to try and attract users to their side through the use of non-standard extensions to HTML. The end result was a difficult agreement that led to accelerating the pace at which HTML evolved. In less than three years, we have already moved from version 1.0 to 3.2.
The very rapid pace at which HTML evolves reflects the intensely competitive context of web development. It transposes to the world of web publishing the philosophy of frequent updates that has so well served the software industry. It forces web servers to update their presentation constantly for fear of being outdistanced by a competitor. However, it does not serve the needs of a community far more sedate and stable in its ethos - namely that of scholars and scientists. Such people innovate with their minds, not with their appearances!
On the one hand, the emergence of HTML was greeted by enthusiasm in many quarters since it bore the promise of a unified solution to the old question of publishing on the Net in a way that would be truly platform independent. The burden of dealing with various platforms was moved to server and browsing software. Documents could be encoded once and for all in HTML. However, the success of the HTML solution was immediately marred by its instability - a somewhat paradoxical situation which is rarely pointed out.
For scholarly journals, HTML's instability means that every so often retrospective markup must be done to bring the older volumes back up to snuff. The burden of doing so is obviously unbearable. But this is not all. Because HTML is a relatively simple DTD, the print appearance of an HTML document is not entirely satisfactory either. To be sure, it is much better than pure ASCII, but it is not as good as what can be achieved with a simple, proprietary word processor. Once again, we refer back to a vision that is central to this whole undertaking - namely that the design of electronic scholarly journals should not neglect the possibility of printing. Furthermore, the printed version of an electronic journal should try to approach the quality of a moderately well-produced printed journal. We are not calling here for perfection and fancy results, simply for a pleasant, solid, professional-looking layout with a choice of attractive fonts.
At the same time, the success of WWW precludes eschewing it altogether. In other words, a way had to be found to take advantage of its wide accessibility without falling prey to its potentially crippling frenzy.
This leads us to three recommendations:
Scholarly journals should use a much more powerful DTD than HTML as their pivotal publishing tool.
This more powerful DTD should be the ISO 12083.
Simultaneously, we recommend publishing the HTML version out of the more powerful SGML DTD, possibly on the fly and on demand, through a program that could be updated with each change in HTML.
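To make the third recommendation concrete, here is a minimal, purely illustrative sketch of the pivot idea (the elements <titlegrp>, <title> and <p> belong to the ISO 12083 article DTD; the HTML mapping shown is only one possible rendering among many):

<!-- ISO 12083 source, stored once -->
<titlegrp><title>Designing an Electronic Journal</title></titlegrp>
<p>First paragraph of the article...</p>

<!-- HTML generated from it, possibly on the fly -->
<H1>Designing an Electronic Journal</H1>
<P>First paragraph of the article...</P>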
Before proceeding further, it is necessary to justify the choice of ISO 12083 a little. Many powerful DTDs exist, some proprietary, some open (like that of the "Text Encoding Initiative" (TEI), for example, which is optimized for the editing of literary documents). As its title indicates, the chosen DTD has been granted the status of a standard by the International Standards Organization. It has been adopted by various organizations such as the Association of American Publishers, the American Physical Society, Électricité de France, etc. In short, it is a solid standard that will not disappear tomorrow, that is endorsed and used by a growing number of powerful organizations, including many from the print world, and that will not evolve wildly either.
1. Steps toward SGML publishing with the ISO 12083 DTD.
The section that begins here is the one requiring the greatest attention, as it involves a few technical details. We have decided to bite the bullet and offer a preview of what SGML publishing entails, both to show that it does require a bit of time and some technical upscaling, and to show that, at the same time, it is not all that complex. The point of this section is that if you understand what HTML publishing entails, then moving up to SGML is simple. If you do not understand what HTML publishing entails, then you cannot do serious HTML publishing anyway. While it is very simple to put up a web page by copying some other person's recipe, serious scholarly publishing does require a bit of preparation, even in HTML.
A fuller SGML DTD is not going to be all that much more complicated to handle than HTML, especially if we approach this task in a collaborative fashion, drawing on all the bits and pieces of skills that are spread all over our campuses and university presses. To be sure, building a complete chain of production exclusively based on SGML and using SGML-oriented tools could prove taxing, not so much because of SGML, but rather because these are tools designed for a small category of specialists. As a result, concern for ease of use and intuitive handling has not been as strong as with products destined to reach a wide audience. However, tools exist that permit creating good SGML documents with relatively simple means. Here is the path we recommend following as a first approach to the problem.
1. Once files have been transformed into a MS-Word format, the texts are analyzed as to their structure and styles within MS-Word are designed to correspond to the kind of analysis done on the texts. To do so, it is useful to refer to the on-line help within MS-Word.
As an example, here is a series of styles that could be useful for scholarly articles and that could be easily adapted to any particular application:
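(The list below is merely illustrative, since the appropriate set of styles will vary from one journal to the next; any similar, complete set will do.)

Title
Author
Affiliation
Abstract
Epigraph
Paragraph
Quotation (block quote)
Section heading
Footnote text
Bibliography entry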
This simple example clarifies what is meant both by document analysis and by style. To each style is linked a definition of margins, of indentation, of justification, of font name and size and its characteristics (bold, italics, etc.) and so on. In effect, by calling the name, we call a formatting environment.
Structuring a document in this fashion is quite important because it provides a rational organization of the document that can be automatically converted later into the corresponding DTD. It is in fact a crucial first step which, by the way, would also be very useful for good HTML conversion.
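By way of illustration, such styles might later map onto tags as follows (a hypothetical correspondence; the exact targets depend on the DTD adaptation described below):

Word style "Title"     -> ISO 12083 <title>  -> HTML <H1>
Word style "Paragraph" -> ISO 12083 <p>      -> HTML <P>
Word style "Quotation" -> ISO 12083 <bq>     -> HTML <BLOCKQUOTE>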
2. The next stage involves building a template for the SGML DTD. Here is an example of such a template for an article.
<!DOCTYPE article PUBLIC "ISO 12083:1993//DTD Article//EN">

EMPHASIS: <emph type="n"> text </emph>, where n is one of:
1 - bold
2 - italics
3 - bold and italics
4 - underlined
5 - non-proportional
6 - small caps

REFERENCE TO A CITATION: <citation> ... </citation>

REFERENCE TO A FOOTNOTE: <footnote> <p> ... </p> </footnote>

REFERENCE TO A NOTE: <note> <p> ... </p> </note>
All this may look daunting at first, but those familiar with HTML will recognize the beginning and end tags, < >, </ > that bound a particular bit of text. Indented tags refer to a tag embedded within another tag, and so on. Most tags are self-explanatory and it will not be difficult to see how such tags correspond to the logical categories that form the basis for the Styles within MS-Word.
3. Adapting ISO 12083 to one's own needs.
One more obstacle must be overcome before proceeding. ISO 12083 is a general DTD, and one has to apply it to one's particular case, as has been done above. It is possible to modify ISO 12083 to make it conform to one's needs, but this path is not recommended, as it amounts to destroying the norm and building one's own. It is better to stick with the norm, but to list modifications within the document type declaration subset that is found immediately after the formal public identifier, between the [ ]. In this fashion, the DTD will be adapted to one's special needs, yet conform to the general ISO 12083 norm.
As an example, we provide the DTD subset designed for Surfaces articles:
<!DOCTYPE article PUBLIC "ISO 12083:1993//DTD Article//EN" [

<!-- The document type declaration subset that follows contains the modifications needed to adapt the ISO 12083:1993 DTD to the journal Surfaces. These declarations were written by Guylaine Beaudry (firstname.lastname@example.org) in July 1996. -->

<!ENTITY % a.id "id ID #IMPLIED" -- ID attribute definition -->

<!-- These modifications describe bibliographic references more precisely -->
<!ENTITY % bib "(author|title|corpauth|msn|sertitle|location|date|pages|subject|othinfo|editor|publishr|pubplace|volid|confname|confdate|URL|series|emph|#PCDATA)*" -- bibliographic, date is the publication date -- >
<!ELEMENT (editor|publishr|pubplace|volid|confdate|series|URL) - - (#PCDATA) >
<!ENTITY % m.bib "(no?, title*, (%bib;)*)" -- bibliographic entry -- >

<!-- This modification integrates the "pages" element into the bibliography -->
<!ELEMENT biblist - o (head?, citation, pages*)* >

<!-- This modification identifies epigraphs -->
<!ATTLIST bq type NAME #IMPLIED>

<!-- This modification integrates separators -->
<!ATTLIST section %a.id;
SEPA NAME #IMPLIED
SDABDY NAMES #FIXED "title h2"
SDAPART NAMES #FIXED "title h3" >

<!-- FIGURE -->
<!NOTATION GIF PUBLIC "+//ISBN 0-7923-9432-1::Graphic Notation//NOTATION CompuServe Graphic Interchange Format//EN">
<!ENTITY surfaces SYSTEM "../../surfaces.gif" NDATA GIF -- Surfaces logo -->

<!-- Standard ISO 8879 character entity sets -->
<!ENTITY % ISOlat1 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 1//EN" >
<!ENTITY % ISOlat2 PUBLIC "ISO 8879:1986//ENTITIES Added Latin 2//EN" >
<!ENTITY % ISOgrk1 PUBLIC "ISO 8879:1986//ENTITIES Greek Letters//EN" >
<!ENTITY % ISOnum PUBLIC "ISO 8879:1986//ENTITIES Numeric and Special Graphic//EN" >
<!ENTITY % ISOpub PUBLIC "ISO 8879:1986//ENTITIES Publishing//EN" >

]>
Once more, the notation may look a bit daunting, especially to those who have never even looked at HTML coding. However, pick out the lines beginning with <!--: these comments indicate the function of each addition, such as describing bibliographic references more precisely, identifying epigraphs, etc. The point of the exercise is to show that it is feasible and can be imitated quite simply by copying the model, studying it a bit, studying a bit of SGML and adapting whatever needs to be adapted to one's own particular situation. With the development of several journals in SGML ISO 12083, potential users will have access to a variety of adaptations that should help them design their own according to the particular needs of a particular journal.
Note in passing that SGML solves the language question once and for all. One only has to add the character sets one needs to have them available. For example, the Greek alphabet can be included without having to paste in images of Greek words, and, as a result, these Greek texts can later be searched as character strings, just like any other character strings. In effect, publishing in SGML immediately opens up publishing possibilities well beyond ISO Latin 1, the current standard level of HTML publishing. This means that most alphabetic languages are included without difficulty, in particular Greek and Cyrillic.
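For example, to make Greek letters available, one only needs to declare and invoke the corresponding ISO entity set in the document type declaration subset. A minimal sketch (&agr; and &bgr; are the standard ISO 8879 entity names for alpha and beta):

<!ENTITY % ISOgrk1 PUBLIC "ISO 8879:1986//ENTITIES Greek Letters//EN">
%ISOgrk1;

<p>The opposition between &agr; and &bgr; remains searchable as ordinary text.</p>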
At this stage of the development of SGML publishing, let us summarize the main steps that must be followed.
1. Analyze the documents to be published and translate the various parts of these documents into Styles within MS-Word.
2. Rework the same elements using the parameters of ISO 12083.
3. Adapt ISO 12083 without losing the norm, by using a document type declaration subset. This step is required because ISO 12083, for unknown reasons, is a bit poor on descriptors of bibliographic references.
4. Translate the word processing styles into the descriptive elements of ISO 12083.
We have not yet described this particular step, and it is obviously a very important one. The solutions adopted appear to depend very much on the scale of the operation, as the costs involved can vary greatly. For a couple of journals, we recommend saving the structured MS-Word document in RTF (Rich Text Format). The RTF version is then transformed into ISO 12083 through a series of macros written in WordBasic, the programming language of MS-Word. The best way to proceed is to record the particular manoeuvres needed to insert the relevant markup tags and to label the recording accordingly. Once this is done, one can examine the programming text itself by selecting the recorded macro and clicking on Edit in the MS-Word menu. The WordBasic programming sequence can then be read directly. From there, it is relatively easy to refine these macros by adding some loops where needed and by examining the conditions under which this or that is to be executed.
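As a toy illustration of this recorded-then-edited approach (a sketch only, not one of the actual Surfaces macros; a real conversion chains many such replacements, with the style-dependent ones best obtained by recording them first):

Sub MAIN
' Start from the top of the document.
StartOfDocument
' Wrap every paragraph in <p>...</p> tags by replacing each
' paragraph mark (^p) with a closing tag, the mark itself, and an opening tag.
EditReplace .Find = "^p", .Replace = "</p>^p<p>", .ReplaceAll
End Sub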
For larger projects involving a good number of journals, there exist tools that do the conversion from structured documents to an SGML template, such as FastTag by Avalanche Labs (related to Interleaf). However, this software is pricey ($6,000 CDN) and it does require programming routines to move from one format to the next. FastTag's vendor claims that the time devoted to markup can be reduced by 95%, but this would have to be tested. This said, the State of Oklahoma is using FastTag to manage its legal documents in SGML and to convert various proprietary formats into SGML. The point here is not to push FastTag for larger projects, but simply to point out that some private vendors have already identified this niche as a potentially interesting market. This means that other tools will appear and prices will come down. However, a better route for university presses would be to band together and develop an open, free conversion tool that could swallow a variety of structured documents and translate them easily into SGML templates. A distributed, collaborative project of that nature, similar in scope to the Linux project for Unix, would ensure the existence of powerful and cheap (in fact free) tools. We will later suggest a similar strategy for browsers, not to undercut the playground of private companies, but rather to provide a set of basic tools that could, on the contrary, help define a market where instruments with bells and whistles could later be developed. Such strategies have been witnessed before, for instance with StuffIt and Eudora, and could be repeated here.
Once all of this is done, it is time to dirty one's hands with a few sample texts and prepare a stock of well chosen verbal expletives to cushion all possible feelings of frustration... :-)
We recommend you take three to five texts that together include all of the characteristics needed to describe the widest variety of the documents you will be dealing with. In this way, you will be able to test your adaptation of the DTD to your needs.
You then proceed with the markup of your documents using the appropriate macros, and you finally save everything in text format (i.e. ASCII), adding the extension .sgm to the file name.
Once the markup process is completed, it is important to check whether this process has been done correctly and to do so, one uses a parser to validate the resulting tagged document. In the case of Surfaces, we have used SP, a public parser designed by James Clark and working in a DOS environment. SP and its documentation can be found at: http://www.jclark.com/sp.html. To validate a file named adams.sgm, you place it with the DTD in the same directory as SP and you use the following command:
nsgmls.exe -fadams.err adams.sgm
nsgmls.exe calls up SP. -fadams.err requests SP to write a file named adams.err describing the mistakes found in the markup process. adams.sgm, of course, is the name of the file being checked. The mistakes identified by the parser must be corrected, but it will first be necessary to check whether they come from modifications you have made here and there, or whether they have anything to do with the way in which you have adapted the DTD.
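The error file contains one line per problem. A typical line looks roughly like this (an illustrative example, not actual output from a Surfaces article):

nsgmls:adams.sgm:127:10:E: document type does not allow element "P" here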
Now comes the time of rewards! After all these steps, you reach the point when you can visualize the result. Visualizing a correctly organized SGML document can be done on-line with an SGML browser. Presently, few exist, but more are appearing all the time. We have used Panorama Pro from Softquad. We are just a few inches away from the result.
Panorama Pro offers you the possibility of adjusting various visual elements with a style sheet that determines the appearance, size, colour and a few other characteristics of the documents to be visualized. The style sheet is a file with a .ssh extension. Likewise, you will edit a navigational tool that will appear in a small window to the left of the main document window. The navigational tool is a file bearing the extension .nav. Most of the time, the navigational tool will hold a table of contents of the articles published, and it can include a list of sections and sub-sections that will vary from one journal to another according to its interest in this or that element of its visual presentation. Note in passing that several navigational tools can be associated with a single document, for example to offer separate access to graphic elements or to tables.
It is to be expected that SGML browsers will all offer analogous functions, but in varying fashion. In the case of Panorama Pro, the user is referred to the manual. Style sheets and navigational tools are well explained in it.
It is now time to look at the result and check whether everything does appear as intended. This phase of visual inspection is very important as the product inspected is the one that will be encountered by your readers. The professional image of your publication will be very much dependent upon the care with which you will do this. Note in passing that this step would have to be done in HTML as well, for the same reasons. In effect, each article must be proofread through Panorama or a similar SGML browser. In the print world, this step roughly corresponds to checking the galley proofs.
2. Producing HTML from SGML files.
One of the great advantages of publishing with SGML is that it makes it possible to produce HTML versions very simply. All one needs to do, if one uses Panorama, is to organize a new style sheet and then save the text presented on the screen in its ASCII format. This trick was invented by Guy Teasdale, a librarian at Laval University, and we want to thank him for allowing us to use it and publish it.
With Panorama Pro, a character string can be added before and after every element of the document (see section 13.9 of the Panorama Pro manual). This capability is used to add HTML markup before and after the elements that were already identified within SGML. For example, the tag <TITLE> found within the element <TITLEGRP> could be preceded by the tag <H1> and followed by </H1>. Furthermore, recall that you can assign a style related to other elements, etc. In this fashion, it is quite simple to build interesting HTML documents.
Once the style sheet has been edited, each text is recovered from within Panorama Pro and saved in ASCII form with the extension .htm. As for the current specifications of HTML, one should follow those found at the following URL: http://www.w3.org/pub/WWW/MarkUp/.
Ultimately, the best solution would be to design a translator that could easily be updated as HTML evolves. No HTML text would be produced and stored ahead of time; only when a request was made would the HTML version be produced on the fly. This would save both storage space and production time, and it would greatly rationalize the translation from ISO 12083 to HTML. Once again, software already exists to do this, such as DynaWeb from EBT, but its cost limits it to undertakings that can justify such tools through volume production. Again, an open, collaborative project between universities would be very useful and would simplify the universities' task of publishing electronically on a rationalized basis.
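As a rough sketch of what such on-the-fly production could look like, a small CGI script on the server would suffice in principle. The converter named sgml2html below is entirely hypothetical and stands in for whatever translator is adopted:

#!/bin/sh
# Hypothetical sketch only: translate a stored ISO 12083 article
# to HTML at the moment of the request. A real script would first
# verify that the requested file name is legitimate.
echo "Content-type: text/html"
echo ""
sgml2html /usr/local/journal/sgml/"$QUERY_STRING"

The reader's browser would then receive ordinary HTML, while only the SGML originals would ever be stored on the server.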
3. Configuring the web server for SGML.
Unless you control or own your own server, the system administrator of your server must be involved before placing your documents on-line.
To configure the NCSA Web httpd server for SGML, you must edit the mime.types file. Normally, this file is found at /usr/local/lib/httpd/mime.types.
Locate the existing text types in the file and, among them, insert the following line:
text/x-sgml sgml sgm
To configure the CERN Web httpd server for SGML, you must edit the configuration file. Normally this is the httpd.conf file located in the server directory.
Add the following two lines:
AddType .sgm text/x-sgml 8bit
AddType .sgml text/x-sgml 8bit
These details can also be found in section 9.1 of the Panorama Pro manual.
4. Editing the catalog and entityrc files.
If you want all the files involved in the reading of an SGML file to be linked together, you can proceed in two ways.
a. The first way makes use of the document declaration subset once more (the part located between [ and ] at the beginning of the document). In it, you declare the names of the .nav and .ssh files and where they are located. Although easy, this approach may pose problems over the long run: if you decide to add or remove one of these files, you must then remove this declaration from all the texts concerned, which can be a fairly tedious and time-consuming task.
b. The second way is preferable. It involves using the catalog and entityrc files.
Instructions for declaring these files are found in section 11.3.1 of the Panorama Pro manual. However, be careful, because there is a mistake in this section. What you find in it is:
<?STYLESHEET "style sheet name" "style sheet system identifier" >
It should be:
<?STYLESPEC "style sheet name" "style sheet system identifier" >
For the navigator, declare it as follows:
<?NAVIGATOR "navigator name" "navigator system identifier">
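For example, with the file names used for Surfaces (described further below), and with display names of our own choosing, the two declarations would read:

<?STYLESPEC "Surfaces" "surfaces.ssh">
<?NAVIGATOR "Surfaces" "surfaces.nav">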
Here is how the second solution works:
The catalog file links the formal public identifier located at the beginning of the document with the name and location of the file containing the DTD. The entityrc file declares the style sheets and navigators by naming and locating them. These two files must be placed in the same directory as the .sgm files that contain the published texts.
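For example, the catalog file could contain a single line, in the standard SGML Open catalog syntax, associating the public identifier with the DTD file (both names below are hypothetical):

PUBLIC "-//Surfaces//DTD ISO 12083 Article//EN" "surfaces.dtd"

We do not reproduce the entityrc syntax here; it is specific to Panorama and spelled out in the manual sections cited below.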
Imagine a reader clicking on a link from within the Netscape browser. The .sgm file will be downloaded and placed in the Netscape cache. Panorama will then instruct Netscape to go and fetch the catalog file to identify and locate the DTD. This DTD will then be downloaded into the Netscape cache. The same process will accompany the downloading of the entityrc file and of the .nav and .ssh files. With these files, Panorama will be able to display the documents as intended, i.e. with the intended layout, typography, etc. All this takes but a few seconds and is done automatically and transparently as soon as Panorama is linked to Netscape as a plug-in. Ways to edit the entityrc and catalog files are detailed in sections 11.1 and 11.2 of the Panorama manual.
viii. Setting up the directory structure of the files to be published.
Now that all the files are duly tagged and ready, you must create the tree structure of the journal. In the case of Surfaces, the basic structure is as follows:
The files surfaces.nav and surfaces.ssh, as well as the DTD, are located within the sgml directory. The .sgm files (i.e. the articles themselves), as well as the catalog and entityrc files, are found in the sub-directories bearing volume numbers within the general sgml directory. As for the .htm files (the articles in HTML), they are located in the volume-number sub-directories within the general Surfaces directory.
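Schematically, this description corresponds to a tree of the following shape (the volume numbers are illustrative and the DTD file name is hypothetical):

Surfaces/
    vol5/            (articles in HTML: .htm files)
    vol6/
    sgml/
        surfaces.dtd
        surfaces.nav
        surfaces.ssh
        vol5/        (articles in SGML: .sgm files, plus catalog and entityrc)
        vol6/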
5. Uploading the catalog, entityrc files, and the article files.
The catalog and entityrc files must be uploaded and copied into each of the directories that will harbor .sgm files. Then all the .sgm and .htm files are uploaded into their appropriate directories.
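With a standard ftp client, the upload of a given volume might look like the following sketch (the host and directory names are hypothetical):

ftp press.umontreal.ca
cd Surfaces/sgml/vol6
put catalog
put entityrc
mput *.sgm

The .htm files are transferred in the same way into the corresponding volume directory under Surfaces.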
All that remains to be done then is to test all the links within the site and you are home.
As in the case of texts, graphic documents should preferably be stored in open, non-proprietary formats. We will review here the two main formats that can be used to store graphic documents: GIF and JPEG.
GIF, or Graphics Interchange Format, was originally conceived by CompuServe to allow for the transfer of graphic elements, particularly photos, independently of the platforms used. Although a proprietary format, it has acquired, de facto, the trappings of an open standard. It is an 8-bit coding protocol, meaning that each pixel of the image can take any of 256 values (2^8). This is also its limitation, as GIF effectively limits all graphics to 256 colours. For most practical purposes, however, this is already quite satisfactory. A proprietary compression scheme (which belongs to the Unisys corporation) brings the size of each image down to about half of what it would otherwise be.
More recently, a new format, this time completely public, has surfaced: JPEG (Joint Photographic Experts Group). Unlike GIF, it can support up to 24-bit colour, i.e. over 16 million colours. Its compression scheme is not the usual LZW scheme (of the kind found in Zip and StuffIt software, for example). Rather, it relies on identifying and removing information that has low priority from the standpoint of the eye (so-called "lossy" compression).
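The difference in raw size is easy to work out: an uncompressed 640 by 480 image occupies 640 x 480 = 307,200 bytes at 8 bits (one byte) per pixel, but three times as much, roughly 900 kilobytes, at 24 bits per pixel; hence the importance of an efficient compression scheme for full-colour images.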
JPEG has a number of advantages over the GIF format. Once again, it is entirely in the public domain. JPEG is better suited to colour photographs, and JPEG files are generally smaller than GIF files. On the other hand, it is a little slower to decompress and, at higher compression levels, quality begins to deteriorate noticeably.
Converting images from the GIF format to the JPEG format is remarkably simple thanks to a very good piece of shareware called GraphicConverter, readily available on all good sites harbouring Macintosh software, such as Info-Mac. It is of German origin (its author is Thorsten Lemke) and it costs US$35.00 to register and own legally.
Attempts at creating texts with integrated photographs have been carried out easily enough, but the results have turned out to be essentially uninteresting. With the advent of HTML and SGML forms of publishing, the question of integrated documents has been entirely transformed.
The question of animated sequences is another point that we can approach only theoretically at this stage, as no author has yet offered an article calling upon this particular kind of capability.
Essentially three standards presently compete for animated sequences: QuickTime, which emerged from the Apple Macintosh community but has gone cross-platform to a fair extent; AVI, which finds its support in the MS-Windows community, whose great importance no one can deny; and finally MPEG, which was designed as an animated extension of JPEG.
SGML can easily integrate sound and animation files. However, such files belong more to the future developments of scholarly journals than to their present needs. As we develop expertise in these areas, we will share it with the community.
The web site for Surfaces has been structured along these lines, and it can be consulted at two different URLs.
The importance of mirrors cannot be overestimated. They fulfill at least three functions.
1. Bandwidth is a precious resource, and in some parts of the world it can be very rare and costly indeed; in other parts, bottlenecks exist. For example, good bandwidth is generally available in the USA, and increasingly in Canada as well. Likewise, bandwidth is improving all the time inside Europe. However, until very recently, all the connections between Europe and North America transited through very narrow pipes indeed. Measurements done between Paris and Montreal with the "traceroute" technique (see the note following this list) have yielded disastrous results: nearly half a second between the computers in the RENATER offices in Paris and the computer presently harboring Surfaces at the Université de Montréal. Jacques Prévost, technical director of RENATER, commented upon this result in the following manner: such delays pretty well limit communications to the e-mail level. And indeed, trying to access Surfaces through W3 with a graphics-capable browser rapidly turned into an exercise in fortitude. Establishing mirrors contributes savings in bandwidth and much better access times locally.
2. The existence of mirrors also provides increased safety for an electronic publication. Not all sites can fail at the same time, and not all sites will succumb simultaneously to a wave of destructive hackers. The head site of Info-Mac was raided a few months ago and destroyed; however, thanks to the existence of half a dozen mirrors around the world, Info-Mac was back in business in less than 24 hours. The same logic applies to any electronic journal.
3. A more hidden advantage comes with mirror sites. Sites change equipment at rates that vary from place to place, which means that the digitized documents find themselves stored on a wide variety of platforms, ranging from relatively old to state-of-the-art. As sites modernize, Surfaces' articles find themselves located on new and different media; thus, the digital foundation of the documentation stored here and there is regularly refreshed through the uneven evolution of sites across countries and continents. At the same time, older equipment gives access to types of media that may have been abandoned on newer machines. This provides a measure of temporal flexibility for documents that could otherwise rapidly become legible only with near state-of-the-art equipment.
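As noted under point 1, such delay measurements can be reproduced by anyone with the standard Unix traceroute utility; the host name below is purely illustrative:

traceroute surfaces.umontreal.ca

The round-trip times reported for each hop along the route show where the delays accumulate.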
i. We will carefully distinguish here between controlling a text to ensure a better economic return on the investment and guaranteeing the integrity of a text.
ii. One of the great worries of authors when they publish their papers electronically relates to the integrity of their text. To be sure, site security ensures that anyone taking a text from the site itself can be assured of having the real thing and not some (potentially malicious) manipulation of it. For this reason, we have always recommended that our users refer back to an original copy from the Surfaces site itself when citing any part of a text in another text. Should any problem arise even with this method, we keep off-line copies of all of our articles against which the on-line files can be proofed, should the need arise.
Although in a real sense the question of text integrity is largely solved by the approach above, there is also something like a dimension of trust or of faith at stake here, something more psychological than rational. The user of an electronic article must feel as confident with his file as the owner of a printed article can feel with the material pages of the print world.
In a first analysis of the problem, we had imagined creating an algorithm based on a checksum of some kind that would be embedded in the text; a small piece of software was then supposed to allow for a quick and easy check that no one had tampered with the file at hand. However, as is often the case in such circumstances, events overtook us, and a much more secure and, at the same time, more general solution has emerged, which is the one retained at Surfaces. It relies on public-key encryption; it uses the famous software PGP (Pretty Good Privacy) that Phil Zimmermann released a while back on the network, a decision that brought him a fair amount of trouble until the US Government dropped all charges against him last January.
The idea is the following. The publisher of the journal, and only he or she, owns a private and a public key. If a text is encrypted with the private key, only the public key can decipher it; as a result, the ability to decipher such a document amounts to the certainty that it is authentic. To tamper with public-key encryption would take enormous machine resources, enormous talent and at least fifteen years of arduous work. In short, it is nearly inviolable.
Consequently, all the ASCII versions of the articles of Surfaces are in the process of being encrypted. The public key of Surfaces will be placed on its site and on the mirror sites as well. As a result, anyone feeling distrust towards the system will be able to pull out the encrypted file and decrypt it herself, knowing that in such a scheme neither the text nor the key can be tampered with in any way.
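As a minimal sketch of the procedure, assuming the command-line conventions of PGP 2.x (the file name is illustrative, and the exact flags vary from one version of PGP to another), the journal signs each ASCII article with its secret key, and any reader can then check the result against the published public key:

pgp -sta adams.txt     (signs adams.txt with the secret key, producing adams.asc)
pgp adams.asc          (checks the signature against the public key)

In PGP's own vocabulary, "encrypting with the private key" amounts to signing the file; the verification step fails visibly if so much as one character of the text has been altered.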
Expectations are that this system will rarely be used, but it will stand as a visible guarantee of the integrity of digitized texts. As such, it will affect the public perception of e-journals because of the high symbolic value of such a strategy. As PGP is available in versions for practically all the major kinds of computers, this solution also solves a portability problem which the checksum scheme outlined above would have had to address.
Designing a search engine requires being able to use the cgi-bin technology. However, this technology is ticklish, as it can create security holes that system administrators dread like the plague. As a result, at a number of sites, such as the Université de Montréal, cgi-bin is essentially off limits. Before embarking on the design of a search engine, therefore, check with your system administrators what the local procedures are for the use of cgi-bin.
SGML is very useful for designing a search engine, as the tagging allows one to search not only by words, but also by document functions: one can look for a word but limit the search to titles, for example. Several tools exist to index full SGML texts, but again, the task is quite different depending on whether one wants to index a single journal or many. In the latter case, it is almost unavoidable that index files will be created to speed up the search process. We are in the process of implementing SGrep for Surfaces, but this is a solution suited only to individual journals: SGrep does not build index files and, as a search engine, would become very slow as soon as the bank of articles grows beyond a few dozen texts.
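As an illustration of the kind of query this permits, here is a sketch using SGrep's region syntax (the search word is invented for the example, and details of the syntax may vary between versions):

sgrep '("<TITLE>" .. "</TITLE>") containing "melancholy"' vol6/*.sgm

The expression retrieves only those title regions, delimited by their SGML tags, that contain the requested word, which is precisely a search restricted by document function.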
The main point to remember from all of this is that, at the present stage of history, the best route to follow for publishing an electronic journal is the SGML route (ISO 12083), followed by some form of publishing in HTML, either pre-organized (as we do with Surfaces) or on the fly (as could be done with DynaWeb, for example, or other, similarly costly pieces of software). We have provided some details about the main points to follow in this regard, but many details remain to be worked out beyond what has been presented here. Ways to accelerate refereeing could be integrated through the systematic use of closed electronic forums. The capability to publish in more than one language is another promise held out by electronic publishing. Hypertextual structuring, both among articles from the same journal and between articles from various electronic journals, will contribute greatly to revealing the cumulative, but also the polemical, dimensions of research. All of these dimensions are being studied and developed around Surfaces and elsewhere, and as we come up with credible solutions, we will add them to the present document. The point is to make this document a growing thing, nourished by the whole community. So let the advice fly, and the criticisms too. The point is not ego building, but building a tool that the community of scholars and scientists will find useful whenever they meet the problem of designing a new electronic scholarly journal.