• THE ARTFL PROJECT
  • PHILOLOGIC USER MANUAL
  • SUBSCRIPTION INFORMATION
  • UNIVERSITY OF CHICAGO
  • ATILF - CNRS

ARTFL Encyclopédie Project - Robert Morrissey, General Editor

  • search form
  • user manual
  • contributors
  • supplément

Table of Contents

  • Editor's Introduction
    • Version Française
    • Encyclopédie Corrections
    • Facts and Numbers
  • Front Matter
  • The Encyclopédistes
  • Research & Archival Materials
  • The "18th" Volume
  • Encyclopédie User Manual
  • Encyclopédie Collaborations
  • Citing the Encyclopédie
  • Development Team
  • Contact Us

Editor's Introduction

Introduction to the ARTFL Encyclopédie
                               - Robert Morrissey

On this, the 10th anniversary of the ARTFL Encyclopédie Project, I am very pleased to present a new version of the digital Encyclopédie and to announce its release to the general public. Undertaking an electronic edition of the Encyclopédie represented a daunting task. Its structure is very complex; the typographical conventions used for textual elements - from article headwords to classifications and cross-references - varied to a significant degree from volume to volume; the relationship between articles and the plate images is in no way clear or systematic. All this notwithstanding, the computer offered a host of new possibilities both for making the work accessible to the scholarly community and for navigating within the work itself. In addition, the digital medium allowed us to think in terms of a "living edition" that could be corrected, developed and improved over time. Our initial choice was to make the work accessible as quickly as possible and progressively to correct it. In order to compensate for the errors introduced during the original data capture process, we chose to make page images of the volumes available for comparison and verification. As we undertook to correct the text, we also strove to improve the search and retrieval capacities. All too often our users limit themselves to simple word and phrase searches, yet these do not always yield the most fruitful results. Using our new search and reporting features can significantly improve the user's ability to move through what Diderot himself described as the "tortuous labyrinth" that is the Encyclopédie. Looking at frequency of occurrence by article or collocation tables, for example, can provide more useful paths into the Encyclopédie than simple word searches alone.

While we have steadily made improvements over the years, this new version marks an important stage in the unfolding development of the electronic edition of Diderot and d'Alembert's monumental work. The author attributions have been verified and corrected; new searching functions have been introduced; new research and archival materials have been made available. This version includes not only the four-volume Supplement to the Encyclopédie, but also the proofs of censored articles and legal documents bound together in the so-called "18th volume." For the first time, our user community will be able to participate in the correction and improvement of the edition by using our "report error" link to inform us of errors they encounter. All these factors have contributed to our decision to make most elements of this site available not just to the scholarly community of ARTFL subscribers, but to the public at large. In the following paragraphs, I will briefly describe the evolution of ARTFL's digital Encyclopédie.

In the Beginning: Choosing an Edition

From the outset of the Encyclopédie project there were several important editorial decisions that greatly affected the initial construction and dissemination of the database. First, there was the choice of the edition. There were many editions of the Encyclopédie in various formats. We chose the first printing of the Paris edition - see our comparison of Encyclopédie editions. Richard Schwab then kindly agreed to assess the microfiche version produced by IDC (Leiden, The Netherlands) and confirmed that it reproduces a good copy of the first edition - it was from these microfiches that our contractor performed the data entry of the Encyclopédie. We were aware that many typographical errors had been introduced into the text during the data capture procedure. Unfortunately, due to the size of the Encyclopédie and its great semantic diversity, it was impossible to correct these errors by any normal spell-checking procedure. Additionally, given the fact that all identifications of textual elements - articles, authors, cross-references, etc. - were made using automated procedures based on typographical patterns, we were aware that many problems - unidentified or misidentified articles, missing author attributions, incomplete information about grammatical and knowledge categories, malfunctioning cross-references, etc. - would need to be addressed through a large-scale corrections project. These reservations aside, we thought it best to release a largely uncorrected version of the database and to work on progressively integrating both text and metadata corrections as they were made. For more on our corrections effort, see the Encyclopédie Corrections page.

Author Attributions

One of the most complex problems we encountered in establishing this edition was in properly attributing authors to their respective articles. In the beginning, we simply tried to identify authors automatically using the authorial marks that occur in the text - e.g., (*) for Diderot, (S) for Rousseau, (O) for d'Alembert, etc. - an approach which, while mostly successful, still left many articles unattributed. Articles with multiple authors, unsigned articles, and articles by authors with no authorial mark all posed significant problems for our automatic recognizers. To address these issues we consulted the Schwab Inventory 1) to identify unsigned articles whose authorship was attributed by Schwab and 2) to supply any authorship information that was missing from our metadata (see below). The more than 1,500 author attributions to unsigned articles that resulted from this process are indicated by the number "5" after an author's name, e.g., Holbach5, Saint-Lambert5, Voltaire5, etc. For Diderot's articles, we have followed the Hermann edition of Diderot's complete works (Lough and Proust, eds.) in establishing the "Diderot," "Diderot2," and "Diderot3" designations. We have also verified d'Alembert's articles using outside expertise - for more, see our Author Attributions page.

Database Corrections
 
The new version of the Encyclopédie database (Revision 2.8, 11/2008) contains more than 500,000 modifications made to the original 1998 source files; these corrections were made using a variety of approaches, both automatic and by hand. Over the past few years we have also worked to improve and correct the Encyclopédie metadata - Article Headwords, Author Attributions, Classes of Knowledge, etc. We are aware, however, that many small textual errors - artifacts of the original data capture project - still remain in the database. In an effort to track these errors down, users can now use the "Report Error" link at the top right-hand corner of the results page to report errors directly to the ARTFL Project. These errors will be collected and the corrections applied on a quarterly basis.

Text Corrections - We have corrected errors in the text using a two-step process. First, we completed an automatic recognition/correction process that fixed most of the high-frequency errors, many of which were the result of the long-s character in 18th-century typography, which was frequently confused with "f" (e.g., semme for femme, etc.). Other commonly misrecognized characters included "er" for "cr" (deseription for description), "e" for "c" (done for donc), and "c" for "e" (cst for est). Then, using our own spell-checking mechanism, we identified possible remaining errors in the text, which we then compared to the Encyclopédie page images and hand-corrected. From 1999 to 2006 this process yielded over 450,000 corrections to the database. See our Text Corrections page.
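
As an illustration only, the two-step process can be sketched in a few lines of Python. This is not the code ARTFL used: the tiny lexicon, the confusion table, and the function names are all assumptions made for the example. Step one undoes a single known character confusion when exactly one candidate is a known word; step two lists whatever is still unrecognized so that it can be checked by hand against the page images.

```python
# Hypothetical lexicon; a real run would need a large period-appropriate word list.
LEXICON = {"femme", "description", "donc", "est", "la"}

# Character confusions named in the text, as (misread, intended) pairs.
CONFUSIONS = [("s", "f"), ("f", "s"), ("er", "cr"), ("e", "c"), ("c", "e")]


def candidates(word):
    """Yield every string made by undoing one confusion at one position."""
    for wrong, right in CONFUSIONS:
        start = word.find(wrong)
        while start != -1:
            yield word[:start] + right + word[start + len(wrong):]
            start = word.find(wrong, start + 1)


def correct(word):
    """Step 1: return a fix only when exactly one candidate is in the lexicon."""
    if word in LEXICON:
        return word
    found = {c for c in candidates(word) if c in LEXICON}
    return found.pop() if len(found) == 1 else word


def review(words):
    """Step 2: apply automatic fixes and list what still needs a human check."""
    fixed = [correct(w) for w in words]
    flagged = [w for w in fixed if w not in LEXICON]
    return fixed, flagged
```

Refusing to correct when several candidates are valid words is a deliberately cautious choice for the sketch: ambiguous cases are left for the manual pass rather than guessed.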

Metadata Corrections - Over the past two years we have systematically checked and corrected the Encyclopédie metadata - Article Titles, Classes of Knowledge, Authorship, etc. - verifying our original metadata against Richard Schwab's Inventory of Diderot's Encyclopédie. Any discrepancies in article title, author attributions, class of knowledge, etc. were then checked against the page images of the Encyclopédie and corrected or added where appropriate. To date more than 8,000 additions and countless corrections have been made - for more, see our Metadata Corrections page.

Further Corrections - We are aware that many textual errors still exist and invite users to submit any error they encounter using the "Report Error" link at the top of result pages. Moving forward, we will begin to address the remaining structural errors - mis-recognized headwords, etc. - that we have collected over the years. We will also look to correct the Greek characters (now translated automatically from Betacode to Unicode without verification) and to think about the very complex issue of establishing links from plate references in the text to the appropriate plate volumes. In addition, there is the issue of mathematical formulae and various tables. While the text in these tables is searchable, the best way to view these elements is by consulting the page images. For the moment we see no coherent way to represent the mathematical formulae in the digitized text. While this may change as technology evolves, presently these formulae are represented only on the page images.

Cross References (Renvois) - The system of cross-references in the Encyclopédie represented one of the thorniest issues we encountered while establishing this digital edition. From the very outset, it was clear that the renvois were in no way systematically distributed in the original text - i.e., authors would often include a cross-reference to an article that had yet to be written (and perhaps never would be written), resulting in many renvois that lead to non-existent articles or articles with different headwords. We attempted to identify the renvois automatically using typographic conventions ("Voy. ART" at the end of an article, for example), leading to some misrecognized links (e.g., author names and other information at the end of articles that appear as renvois), which can be corrected. Corrections to cross-references (i.e., misrecognized or misspelled renvois) can be submitted using the "Report Error" link at the top of the search results page.
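
To make the idea of recognition by typographic convention concrete, here is a minimal Python sketch. It is not the recognizer used for the database: the pattern (an abbreviation "Voy." or "Voyez" followed by a headword in capitals), the list of accented capitals, and the function name are assumptions for illustration.

```python
import re

# Assumed set of capitals that can appear in a headword.
CAPS = "A-ZÀÂÄÇÉÈÊËÎÏÔÖÙÛÜ"

# Assumed convention: "Voy." or "Voyez", then one or more words in capitals.
RENVOI = re.compile(
    rf"\bVoy(?:ez|\.)\s+([{CAPS}][{CAPS}'-]*(?:\s+[{CAPS}][{CAPS}'-]*)*)"
)


def find_renvois(text, headwords):
    """Return (target, resolved) pairs; resolved is False for dangling links."""
    return [(m.group(1), m.group(1) in headwords) for m in RENVOI.finditer(text)]
```

A purely typographic rule of this kind finds targets that have no corresponding article, and it will also pick up other capitalized text that happens to follow the abbreviation, which is the sort of misrecognition described above.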

For more see our Encyclopédie Corrections page.

New Search and Reporting Capabilities

The Encyclopédie database uses a modified version of the ARTFL Project's full-text search and retrieval engine, PhiloLogic. With this new version come several new search and reporting features such as collocation tables, frequency by headword reports, and a sortable keyword in context (KWIC) function.

While word and phrase searching remains the backbone of the PhiloLogic interface, making use of these new reporting features can offer alternate ways in which to navigate the sometimes overwhelming number of word/phrase occurrences that are returned. These reports are especially important to students working on the Encyclopédie and can provide them with more varied paths into this highly complex work.

The frequency by article report indicates the number of occurrences by article title in descending order of frequency with a link to the article and a link to the occurrences found within that article. For example, if you search for "Newton" you will notice that 45 of the 783 occurrences of "Newton" occur in the article "Wolstrope" - this may seem inconsequential until one realizes all of the biographical information about Newton is found in this article about his home town, a fact which may have eluded some users looking for an article about Newton with a different title. 
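
The logic of such a report can be sketched in Python as follows. This is an illustration, not PhiloLogic's implementation: the function name and the representation of the corpus as a dictionary mapping headwords to article text are assumptions.

```python
import re
from collections import Counter


def frequency_by_article(articles, term):
    """Count whole-word, case-insensitive hits of `term` per article headword."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    counts = Counter()
    for headword, text in articles.items():
        hits = len(pattern.findall(text))
        if hits:
            counts[headword] = hits
    # most_common() returns (headword, hits) pairs in descending order of hits.
    return counts.most_common()
```

Ranking by article rather than listing raw occurrences is what brings an unexpected headword to the top of the results.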

Additionally, the context and relational aspect of search terms can be examined globally using the collocation table and keyword in context (KWIC) reports. Collocation tables provide users with a simple way of seeing the words with which the search terms most often co-occur, and the sortable KWIC reports allow users to sort their line-by-line results alphabetically, either to the right or left of the highlighted keyword - both reports can help users move away from examining single word occurrences and towards a broader understanding of term usage over the entire Encyclopédie.  
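
Both reports can be approximated in a short Python sketch. It is offered as an illustration under simplifying assumptions, not as PhiloLogic's code: the function names, the window sizes, and the naive tokenizer are hypothetical, and no stopword filtering is applied.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def collocates(tokens, term, span=5):
    """Count words occurring within `span` tokens on either side of `term`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == term:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def kwic(tokens, term, span=3, sort_side="right"):
    """Return (left, keyword, right) rows sorted by the chosen context side."""
    rows = [
        (" ".join(tokens[max(0, i - span):i]), tok, " ".join(tokens[i + 1:i + 1 + span]))
        for i, tok in enumerate(tokens)
        if tok == term
    ]
    key = (lambda r: r[2]) if sort_side == "right" else (lambda r: r[0])
    return sorted(rows, key=key)
```

For simplicity the left-hand sort here orders rows by the whole left context string; a fuller version would sort on the word nearest the keyword first.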

For a full description of these and other available search capabilities, see the Encyclopédie User Manual. 

    Research and Archival Materials

    In support of the Digital Encyclopédie, the ARTFL Project has begun to build an archive of eighteenth-century documents relating to the production and reception of the work, as well as several chronologies and publication histories. These include several of Diderot's letters from his internment at Vincennes, documents pertaining to the controversial publication history of the Encyclopédie, and a high-resolution version of the Encyclopedic "Arbre généalogique." In bringing these documents together in one central location, we hope to provide our users with convenient access to extensive information that will enrich their research within the work itself. We are constantly looking for new resources to enhance our site and we invite scholars to Contact Us with ideas and materials they would like to contribute.

    The "18th" Volume: A New Resource

    Working with the University of Virginia's Small Special Collections Library, we are pleased to offer, for the first time, online access to Douglas Gordon's famous "18th Volume" of the Encyclopédie. This extra volume, which includes some of the earliest title pages and prefatory material of the Encyclopédie project, also includes some 284 pages of corrected article proofs, comprising 46 articles submitted by Diderot which were presumably censored or altered by the publisher Le Breton before the final printing. The existence of these proofs, along with the collected legal documents pertaining to Luneau de Boisjermain's lawsuit against the Encyclopédie's publishers, has led many to believe that this volume may have belonged to Le Breton's personal collection. We have included both the transcribed text (with indications of what was censored, added, etc.) of the censored articles as well as image links to the page proofs; from the page image interface users can also browse the entire volume. The extent of the censorship varies greatly among the 46 articles, from excised words and phrases to whole paragraphs (see "SARRASINS") and even entire articles, such as  Jaucourt's "TOLERANCE." See the 18th Volume page. 
      

    Future Research and Development

    The Digital Encyclopédie has been at the forefront of ARTFL's current research into data mining and machine learning techniques, serving as a test-bed from which to experiment with new techniques designed to explore large-scale digital collections. These approaches can help us better understand the rich classification scheme of the Encyclopédie as well as the dialogic construction of its content, connected to articles and outside sources through a complex system of cross-references and intertextual relations. Using Bayesian classifiers, much like the ones at work in your everyday spam filter, we have leveraged the classification scheme of the Encyclopédie (which in today's computer science and information retrieval terminology would be called its "ontology") to predict the classification of the 22,000 articles that were originally left with no class of knowledge, either by the editors or due to errors in the data entry. We expect to integrate these machine-generated classes into the database in the near future, as well as the reclassification of existing classes that the computer suggests may better represent the content of an article. More recently, we have used a variety of text similarity measures (such as the Vector Space Model and K-Nearest Neighbor) to detect the presence of "borrowed" articles from two of the Encyclopédie's Jesuit predecessors - the Dictionnaire de Trévoux and Louis Moréri's Grand dictionnaire historique. Our newest experiment uses sequence alignment algorithms borrowed from bioinformatics in an effort to find discrete text sequences, from several words to entire articles, that occur in the Encyclopédie and earlier works such as Montesquieu's De l'esprit des lois. It is our hope that by expanding these techniques we can come to a better understanding of the intertextual nature of the Encyclopédie, gauging not only to what extent its authors used previous sources, but also how the philosophes were themselves received and appropriated in the decades following the Encyclopédie's publication. For more on this ongoing research see the ARTFL-PhiloMine bibliography.
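
For readers unfamiliar with Bayesian classification, the following Python sketch shows the general technique in its simplest form: a multinomial naive Bayes classifier with add-one smoothing, trained on articles that already carry a class of knowledge and then asked to predict a class for an unclassified one. It is a toy illustration, not the ARTFL-PhiloMine code; the class and method names are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # articles seen per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()                       # every word seen in training

    def train(self, tokens, label):
        """Record one already-classified article."""
        self.class_docs[label] += 1
        self.word_counts[label].update(tokens)
        self.vocab.update(tokens)

    def predict(self, tokens):
        """Return the class with the highest log-probability for the tokens."""
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                if tok in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

In use, one would call train() once per classified article with its tokens and class of knowledge, then predict() on the tokens of each unclassified article. Working with log-probabilities avoids numerical underflow on long articles.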

    Ongoing Collaborations

    Collaborations have been an important part of the Encyclopédie Project's development, and we continue to welcome any opportunity for further collaborative enterprises in the future. Our most successful collaborations have all contributed to the various elements outlined above - bringing us new resources (University of Virginia and the "18th Volume"); translations and classifications (University of Michigan); contributions to our research and archival material, corrections and editorial advice (CNRS); and collaborative research and development (Stanford University). The collaborative atmosphere of this "living edition" will only increase in importance as this edition of the Encyclopédie reaches a far greater audience. All users are encouraged to think about ways to improve this resource, whether simply by alerting us to errors using the "Report Error" link, or through a more engaged reflection on its development. For more, see our Encyclopédie Collaborations page.

    Acknowledgements

    None of this would have been possible without the collaboration of a remarkable group of young humanist scholars with considerable technical capabilities. The unique makeup of this team has allowed us to strike a balance between technical innovation, textual improvement, and editorial judgment. I would like to express my enduring gratitude to the entire Development Team for all of their work. 

    The ARTFL Project
    Department of Romance Languages & Literatures
    University of Chicago
    1115 East 58th Street Chicago, IL 60637
    tel: 773-702-8488 | email: artfl[dot]project[at]gmail[dot]com