Editor's Introduction
Introduction to the ARTFL Encyclopédie
The Autumn 2017 release under PhiloLogic4 offers many new features, functionalities and improvements.
- The powerful new faceted search and browse capabilities offered by PhiloLogic4 allow users to make better use of the organizational structure of the Encyclopédie -- classes of knowledge, authors, headwords, volumes, and the like. They also make it possible to explore the interesting alternatives offered by algorithmically or machine-generated classes. Collocation searches generate word clouds or word lists in which any word can be clicked to obtain its concordances immediately.
This release also contains:
- A beautiful new set of high-resolution plate images, shown as clickable thumbnails that lead to larger images viewable in much greater detail than was previously possible. We would like to thank the University of Chicago Library for providing these images.
- Biographies of the encyclopédistes directly accessible by simply clicking the name of the author of any given article. This information is drawn directly from Frank and Serena Kafker’s The Encyclopédists as Individuals: A Biographical Dictionary of Authors of the Encyclopédie made available to us as part of our collaborative relationship with the Voltaire Foundation of the University of Oxford.
- Improvements such as some new author attributions, various corrections and better cross-referencing functionality.
Project Overview
Undertaking an electronic edition of the Encyclopédie represented a daunting task. Its structure is very complex; the typographical conventions used for textual elements - from article headwords to classifications and cross-references - varied to a significant degree from volume to volume; the relationship between articles and the plate images is in no way clear or systematic. All this notwithstanding, the computer offered a host of new possibilities both for making the work accessible to the scholarly community and for navigating within the work itself. In addition, the digital medium allowed us to think in terms of a "living edition" that could be corrected, developed and improved over time. Our initial choice was to make the work accessible as quickly as possible and progressively to correct it. In order to compensate for the errors introduced during the original data capture process, we chose to make page images of the volumes available for comparison and verification. As we undertook to correct the text, we also strove to improve the search and retrieval capacities. All too often our users limit themselves to simple word and phrase searches, yet these do not always yield the most fruitful results. Using our new search and reporting features can significantly improve the user's ability to move through what Diderot himself described as the "tortuous labyrinth" that is the Encyclopédie. Looking at frequency of occurrence by article or collocation tables, for example, can provide more useful paths into the Encyclopédie than simple word searches alone.
From the outset of the Encyclopédie project there were several important editorial decisions that greatly affected the initial construction and dissemination of the database. First, there was the choice of the edition. There were many editions of the Encyclopédie in various formats. We chose the first printing of the Paris edition - see our comparison of Encyclopédie editions. Richard Schwab then kindly agreed to assess the microfiche version produced by IDC (Leiden, The Netherlands) and confirmed that it reproduces a good copy of the first edition - it was from these microfiches that our contractor performed the data entry of the Encyclopédie. We were aware that many typographical errors had been introduced into the text during the data capture procedure. Unfortunately, due to the size of the Encyclopédie and its great semantic diversity, it was impossible to correct these errors by any normal spell-checking procedure. Additionally, given the fact that all identifications of textual elements - articles, authors, cross-references, etc. - were made using automated procedures based on typographical patterns, we were aware that many problems - unidentified or misidentified articles, missing author attributions, incomplete information about grammatical and knowledge categories, malfunctioning cross-references, etc. - would need to be addressed through a large-scale corrections project. These reservations aside, we thought it best to release a largely uncorrected version of the database and to work on progressively integrating both text and metadata corrections as they were made. For more on our corrections effort, see the Encyclopédie Corrections page.
Database Corrections
The new version of the Encyclopédie database (Revision 3.5, 5/2013) contains more than 650,000 modifications made to the original 1998 source files. These corrections were made using a variety of approaches, both automatic and by hand. Over the past few years we have also worked to improve and correct the Encyclopédie metadata - Article Headwords, Author Attributions, Classes of Knowledge, etc. We are aware, however, that many small textual errors - artifacts of the original data capture project - still remain in the database. To help track these errors down, users can now use the "Report Error" link at the top right-hand corner of the results page to report errors directly to the ARTFL Project. These reports will be collected and the corrections applied on a quarterly basis.
In November of 2009 we began the process of converting the text of the Encyclopédie into standard Unicode (UTF-8) using a light TEI-XML encoding scheme. This move is significant in two ways: First, we can coherently represent and associate an article’s metadata (author, classifications, part of speech, etc.) with the article itself, i.e., in a TEI-XML header for each article entry, rather than storing them in external databases as we have done in the past. This will additionally allow us to manipulate the metadata in the future, adding machine classifications, similar article lists, a notes section, or any other relevant information on an article-specific basis. Secondly, the move to the Unicode standard has finally made correction of the Greek passages in the Encyclopédie possible - see our Greek Corrections page.
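As a rough illustration of what keeping metadata with the article itself can look like, the sketch below reads a simplified, TEI-like entry. The element names, attribute names and sample values are assumptions made for this example; they are not the project's actual encoding scheme.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified TEI-like entry; the real ARTFL schema may differ.
SAMPLE_ARTICLE = """
<div type="article">
  <header>
    <headword>EXAMPLE</headword>
    <author>Anonymous</author>
    <class source="editor">Grammaire</class>
    <class source="machine">Grammaire</class>
    <pos>s. m.</pos>
  </header>
  <body>Texte de l'article.</body>
</div>
"""


def read_metadata(xml_text):
    """Return the article's metadata as a dict, read from its own header."""
    root = ET.fromstring(xml_text.strip())
    header = root.find("header")
    return {
        "headword": header.findtext("headword"),
        "author": header.findtext("author"),
        "pos": header.findtext("pos"),
        # Several classifications can coexist, keyed by their source.
        "classes": {c.get("source"): c.text for c in header.findall("class")},
    }


metadata = read_metadata(SAMPLE_ARTICLE)
print(metadata["headword"], metadata["classes"])
```

Because the header travels with the entry, adding a further field (a machine classification, say, or a list of similar articles) would only mean adding another element to that header, with no external database to update.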
We have also corrected many, if not all, of the structural problems that have long been an issue with the database - missing articles or mis-recognized headwords, badly formatted front matter (e.g., the "Avertissements"), etc. - resulting in more than 300 new articles and sub-articles. We have also merged the various data "chunks" that made up the volumes (essentially seven 1MB sections for each volume) into 28 discrete TEI files, thus eliminating the long-standing "overlap" problem that prevented users from moving from one page to the next when the page break fell between two volume parts.
New Search and Reporting Capabilities
The Encyclopédie database uses a modified version of the ARTFL Project's full-text search and retrieval engine, PhiloLogic. With this new version come several new search and reporting features such as collocation tables, frequency by headword reports, and a sortable keyword in context (KWIC) function.
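For readers unfamiliar with these report types, the following is a minimal sketch of what a KWIC listing and a collocation table compute. It is not PhiloLogic's implementation; the function names, the window width and the sample sentence are assumptions made for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    rows = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


def collocates(tokens, keyword, width=3):
    """Count the words found within `width` tokens of the keyword."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            window = tokens[max(0, i - width):i] + tokens[i + 1:i + 1 + width]
            counts.update(window)
    return counts


# Invented sample sentence, not a quotation from the Encyclopédie.
text = "la raison guide la philosophie et la raison éclaire les arts"
tokens = tokenize(text)

# A "sortable" KWIC: here the rows are sorted by their right-hand context.
for left, key, right in sorted(kwic(tokens, "raison"), key=lambda r: r[2]):
    print(f"{left:>25} | {key} | {right}")

print(collocates(tokens, "raison").most_common(3))
```

A production system would typically also filter out very common function words before ranking collocates, which this sketch does not attempt.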
Research and Archival Materials
The "18th" Volume: A New Resource
Most recently, using a combination of Bayesian and k-NN (k-Nearest Neighbor) classifiers -- much like the ones at work in your everyday spam filter -- we have leveraged the classification scheme of the Encyclopédie (which in today's computer science and information retrieval terminology would be called its "ontology") to predict the classification of the 13,000 (15,000 with plate legends) articles that the editors left with no class of knowledge. The same process was then used to "reclassify" the 61,000 remaining articles, those with classes of knowledge originally assigned by the Encyclopédie’s editors but which, for our purposes, were hidden from our classifiers. The resulting machine-generated classes for all 74,000 articles have been added to the metadata of each article for search and display purposes. A little over 73% of the classified articles came back with their original classes, an astounding feat considering the size and complexity of the Encyclopédie's ontology. The remaining 27% of articles have thus been assigned "new" classes that may, or may not, represent the content of their articles better. They will most certainly, we hope, generate a fair amount of debate and dialogue amongst our users. To that end, we are exploring ways in which users could comment on or evaluate the machine-generated labels as well as the "Similar Article" lists outlined below.
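The general idea can be sketched with a small multinomial naive Bayes classifier. This is a toy illustration only: the class below, the four invented "articles" and the leave-one-out check are assumptions for the example, and they do not reproduce the classifiers, features or data used by the project.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for words, label in zip(documents, labels):
            self.word_counts[label].update(words)
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, words):
        total_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocabulary)
            score = math.log(n_docs / total_docs)
            for word in words:
                if word in self.vocabulary:  # ignore words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: (article words, class assigned by the editors).
classified = [
    ("vaisseau voile mer port".split(), "Marine"),
    ("mer navire ancre voile".split(), "Marine"),
    ("verbe nom phrase syntaxe".split(), "Grammaire"),
    ("nom adjectif verbe genre".split(), "Grammaire"),
]
docs, labels = zip(*classified)

# 1. Predict a class for an article the editors left unclassified.
model = NaiveBayes().fit(docs, labels)
print(model.predict("ancre port mer".split()))

# 2. "Reclassify" each classified article with its own label hidden.
hits = 0
for i, (words, label) in enumerate(classified):
    rest = classified[:i] + classified[i + 1:]
    rest_docs, rest_labels = zip(*rest)
    hits += NaiveBayes().fit(rest_docs, rest_labels).predict(words) == label
agreement = hits / len(classified)
print(f"agreement with the editors' classes: {agreement:.0%}")
```

On real data the agreement rate is the interesting figure: articles where the predicted class differs from the editors' class are the ones worth a closer look.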
Using the same Vector Space and k-NN similarity approach from above, we have identified the 50 most "similar" articles for nearly 40,000 of the Encyclopédie’s entries (those with 60 or more words). Users are thus able to consult a select number of articles related (via the k-NN calculations) to the article they are reading, as well as a list of shared features (word stems) between any two "similar" articles. This will perhaps allow users to discover related themes, authors, articles, etc. independently of word or metadata searching; in effect "navigating" through the Encyclopédie via similar articles rather than the traditional Google-style point-and-click method of searching. With this same notion of navigation versus searching we are also experimenting with ways of representing the system of cross-references (the renvois mentioned above) independent of the text in which they occur. This level of abstraction can offer a new perspective on how articles are related via the network of links that connect them to each other, a network that according to Diderot was the most "philosophic" of the editors' organizational schemes for the Encyclopédie.
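A minimal sketch of the vector-space idea follows: each article becomes a term-frequency vector, similarity is the cosine of the angle between two vectors, and the nearest neighbors are returned along with the terms they share. The function names and the four toy entries are assumptions for illustration; this sketch uses raw words rather than word stems and applies no term weighting.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def most_similar(target, articles, k=2):
    """Return the k nearest articles as (score, headword, shared terms)."""
    vectors = {name: Counter(words) for name, words in articles.items()}
    scored = []
    for name, vec in vectors.items():
        if name == target:
            continue
        shared = sorted(vectors[target].keys() & vec.keys())
        scored.append((cosine(vectors[target], vec), name, shared))
    return sorted(scored, reverse=True)[:k]


# Invented toy entries, not actual article text.
articles = {
    "ANCRE": "fer vaisseau mer fond".split(),
    "VOILE": "toile vaisseau mer vent".split(),
    "VERBE": "mot phrase action temps".split(),
    "PORT": "mer vaisseau abri commerce".split(),
}

neighbors = most_similar("ANCRE", articles, k=2)
for score, name, shared in neighbors:
    print(f"{name}: {score:.2f}, shared terms: {shared}")
```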
Our newest experiment uses sequence alignment algorithms borrowed from bioinformatics in an effort to find discrete text sequences, from several words to entire articles, that occur both in the Encyclopédie and in earlier works such as Montesquieu's De l'esprit des lois. It is our hope that by expanding these techniques we can come to a better understanding of the intertextual nature of the Encyclopédie, gauging not only to what extent its authors used previous sources, but also how the philosophes were themselves received and appropriated in the decades following the Encyclopédie's publication. For more on this and other ongoing research see the ARTFL-PhiloMine bibliography and the ARTFL Research Blog.
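To give a sense of what finding shared text sequences involves, the sketch below uses Python's standard difflib module as a simple stand-in. It is not the alignment algorithm used in the project: it only finds exact word-for-word runs, whereas alignment methods typically tolerate small variations. The function name, the minimum length and the two short sample strings are assumptions for illustration.

```python
from difflib import SequenceMatcher


def shared_passages(source_text, target_text, min_words=4):
    """Return word sequences of at least `min_words` found in both texts."""
    a = source_text.lower().split()
    b = target_text.lower().split()
    # autojunk=False keeps frequent words from being discarded as "junk".
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    passages = []
    for block in matcher.get_matching_blocks():
        if block.size >= min_words:
            passages.append(" ".join(a[block.a:block.a + block.size]))
    return passages


# Short sample strings used purely for illustration.
earlier = "il faut que par la disposition des choses le pouvoir arrête le pouvoir"
later = "on sait que par la disposition des choses le pouvoir doit être limité"

print(shared_passages(earlier, later))
```

The `min_words` threshold matters in practice: set too low, it reports common phrases that any two French texts share; set too high, it misses short borrowings.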
Collaborations have been an important part of the Encyclopédie Project's development, and we continue to welcome any opportunity for further collaborative enterprises in the future. Our most successful collaborations have all contributed to the various elements outlined above - bringing us new resources (University of Virginia and the "18th Volume"); translations and classifications (University of Michigan); contributions to our research and archival material, corrections and editorial advice (CNRS); and collaborative research and development (Stanford University). The collaborative atmosphere of this "living edition" will only increase in importance as this edition of the Encyclopédie reaches a far greater audience. All users are encouraged to think about ways to improve this resource, whether simply by alerting us to errors using the "Report Error" link, or through a more engaged reflection on its development. For more, see our Encyclopédie Collaborations page.