Editor's Introduction
Introduction to the ARTFL Encyclopédie
From the outset of the Encyclopédie project, several important editorial decisions greatly affected the initial construction and dissemination of the database. The first was the choice of edition. The Encyclopédie appeared in many editions and formats; we chose the first printing of the Paris edition (see our comparison of Encyclopédie editions). Richard Schwab then kindly agreed to examine the microfiche version produced by IDC (Leiden, The Netherlands) and confirmed that it reproduces a good copy of the first edition; our contractor performed the data entry of the Encyclopédie from these microfiches. We were aware that many typographical errors had been introduced into the text during data capture. Unfortunately, given the size of the Encyclopédie and its great semantic diversity, it was impossible to correct these errors with any normal spell-checking procedure. Moreover, because all textual elements (articles, authors, cross-references, etc.) were identified by automated procedures based on typographical patterns, we knew that many problems, such as unidentified or misidentified articles, missing author attributions, incomplete information about grammatical and knowledge categories, and malfunctioning cross-references, would need to be addressed through a large-scale corrections project. These reservations aside, we thought it best to release a largely uncorrected version of the database and to integrate text and metadata corrections progressively as they were made. For more on our corrections effort, see the Encyclopédie Corrections page.
Database Corrections
The new version of the Encyclopédie database (Revision 3.0, 3/2011) contains more than 550,000 modifications to the original 1998 source files, made using a variety of approaches, both automatic and manual. Over the past few years we have also worked to improve and correct the Encyclopédie metadata: Article Headwords, Author Attributions, Classes of Knowledge, etc. We are aware, however, that many small textual errors, artifacts of the original data capture project, still remain in the database. To help track these errors down, users can now use the "Report Error" link at the top right-hand corner of the results page to report errors directly to the ARTFL Project. Reported errors will be collected and applied on a quarterly basis.
New Search and Reporting Capabilities
The Encyclopédie database uses a modified version of PhiloLogic, the ARTFL Project's full-text search and retrieval engine. This new version brings several new search and reporting features, such as collocation tables, frequency-by-headword reports, and a sortable keyword-in-context (KWIC) function.
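For readers unfamiliar with KWIC output, the following Python sketch shows what a sortable concordance does in principle. It is not PhiloLogic's implementation; the function name `kwic`, the regex tokenizer, and the `width` and `sort_by` parameters are illustrative assumptions.

```python
import re

def kwic(text, keyword, width=3, sort_by="right"):
    """Build keyword-in-context lines as (left, keyword, right) tuples.

    Hypothetical helper: tokenizes with a simple Unicode-aware regex, keeps
    `width` words of context on each side, and sorts by "right" context,
    "left" context, or keeps text order when sort_by is None.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token == target:
            left = tokens[max(0, i - width):i]
            right = tokens[i + 1:i + 1 + width]
            lines.append((" ".join(left), token, " ".join(right)))
    if sort_by == "right":
        # plain code-point order on the words that follow the keyword
        lines.sort(key=lambda line: line[2])
    elif sort_by == "left":
        # compare the nearest preceding word first by reversing the left context
        lines.sort(key=lambda line: line[0].split()[::-1])
    return lines

for left, kw, right in kwic("la raison guide l'homme; la raison éclaire les arts", "raison", width=2):
    print(f"{left:>12} [{kw}] {right}")
```

Sorting on the right context groups together the phrases that follow the keyword; sorting on the reversed left context does the same for what precedes it. The sort uses plain Unicode code-point order, not French collation.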
Research and Archival Materials
The "18th" Volume: A New Resource
Most recently, using a combination of Bayesian and k-NN (k-Nearest Neighbor) classifiers, much like the ones at work in your everyday spam filter, we have leveraged the classification scheme of the Encyclopédie (which in today's computer science and information retrieval terminology would be called its "ontology") to predict the classification of the 13,000 articles (15,000 with plate legends) that were originally left with no class of knowledge, either by the editors or because of errors in data entry. We then applied the same process to "reclassify" the remaining 61,000 articles: those whose classes of knowledge had been assigned by the Encyclopédie's editors but were, for our purposes, hidden from our classifiers. The resulting machine-generated classes for all 74,000 articles have been added to the metadata of each article for search and display purposes. A little over 73% of the classified articles came back with their original classes, an astounding result considering the size and complexity of the Encyclopédie's ontology. The remaining articles, just under 27%, have been assigned "new" classes that may or may not represent their content better. They will certainly, we hope, generate a fair amount of debate and dialogue among our users. To that end, we are exploring ways in which users could comment on or evaluate the machine-generated labels as well as the "Similar Article" lists outlined below.
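As an illustration of the Bayesian half of that approach, here is a minimal multinomial naive Bayes classifier in Python. It is a generic textbook sketch, not the ARTFL classifier: the class name `NaiveBayes`, the add-one smoothing, the lowercase regex tokenizer, and the toy training articles and class labels are all assumptions made for the example.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"\w+", text.lower())

class NaiveBayes:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class (priors)
        self.word_counts = defaultdict(Counter)    # class -> word frequencies
        self.vocab = set()
        for doc, label in zip(docs, labels):
            words = tokenize(doc)
            self.word_counts[label].update(words)
            self.vocab.update(words)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best, best_score = None, float("-inf")
        for c in self.class_counts:
            # log prior plus the smoothed log likelihood of each known token
            score = math.log(self.class_counts[c] / n_docs)
            for w in tokenize(doc):
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best, best_score = c, score
        return best

# Toy training set with hypothetical labels, for illustration only.
docs = ["loi tribunal juge contrat", "juge loi appel tribunal",
        "plante fleur racine feuille", "fleur feuille tige plante"]
labels = ["Jurisprudence", "Jurisprudence", "Botanique", "Botanique"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("le juge applique la loi"))
```

Training on articles whose classes are known and then calling `predict` on an unclassified article mirrors, in miniature, the procedure described above; hiding an article's original class and comparing the prediction against it is one way an agreement rate can be measured.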
Using the same vector space and k-NN similarity approach described above, we have identified the 50 most "similar" articles for nearly 40,000 of the Encyclopédie's entries (those with 60 or more words). Users are thus able to consult a select number of articles related (via the k-NN calculations) to the article they are reading, as well as a list of shared features (word stems) between any two "similar" articles. This may allow users to discover related themes, authors, articles, etc. independently of word or metadata searching, in effect "navigating" through the Encyclopédie via similar articles rather than through the traditional Google-style point-and-click method of searching. In the same spirit of navigation rather than searching, we are also experimenting with ways of representing the system of cross-references (the renvois mentioned above) independently of the text in which they occur. This level of abstraction can offer a new perspective on how articles are related via the network of links that connect them to each other, a network that, according to Diderot, was the most "philosophic" of the editors' organizational schemes for the Encyclopédie.
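The similarity ranking can be sketched as cosine similarity between term-count vectors, keeping the top k neighbors along with the features two articles share. This Python sketch is an illustration under stated assumptions, not the project's code: it uses raw word counts with no stemming or weighting (the project compared word stems), and the names `vectorize`, `cosine`, and `most_similar` are hypothetical. The `k=50` and `min_words=60` defaults echo the figures given above.

```python
import math
import re
from collections import Counter

def vectorize(text):
    # Raw term counts; stemming is deliberately omitted in this sketch.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(x * x for x in a.values())) * math.sqrt(sum(x * x for x in b.values()))
    return dot / norm if norm else 0.0

def most_similar(articles, headword, k=50, min_words=60):
    """Rank the k articles closest to `headword` by cosine similarity.

    `articles` maps headword -> text; texts shorter than `min_words` tokens
    are skipped (so `headword` itself must meet the threshold).
    Returns (other_headword, score, shared_features) tuples, best first.
    """
    vectors = {h: vectorize(t) for h, t in articles.items()}
    vectors = {h: v for h, v in vectors.items() if sum(v.values()) >= min_words}
    query = vectors[headword]
    scored = []
    for other, vec in vectors.items():
        if other != headword:
            shared = sorted(query.keys() & vec.keys())  # features common to both
            scored.append((other, cosine(query, vec), shared))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]

articles = {
    "ROSE": "la rose est une fleur dont la tige porte des épines",
    "TULIPE": "la tulipe est une fleur à tige droite",
    "CONTRAT": "le contrat est une convention entre deux parties",
}
for other, score, shared in most_similar(articles, "ROSE", k=2, min_words=1):
    print(other, round(score, 3), shared)
```

A k-NN classifier of the kind mentioned earlier can reuse the same ranking: the classes of an article's nearest neighbors vote on its predicted class.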
Our newest experiment uses sequence alignment algorithms borrowed from bioinformatics to find discrete text sequences, from several words to entire articles, that occur both in the Encyclopédie and in earlier works such as Montesquieu's De l'esprit des lois. We hope that by extending these techniques we can come to a better understanding of the intertextual nature of the Encyclopédie, gauging not only the extent to which its authors used earlier sources, but also how the philosophes were themselves received and appropriated in the decades following the Encyclopédie's publication. For more on this and other ongoing research, see the ARTFL-PhiloMine bibliography and the ARTFL Research Blog.
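The text does not say which alignment algorithm is used. As one classic example from bioinformatics, the Python sketch below applies Smith-Waterman local alignment to word tokens instead of nucleotides. The scoring values (match +2, mismatch -1, gap -1) and the function name `local_align` are arbitrary assumptions, and the token lists are toy data.

```python
def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman local alignment over two token lists (hypothetical helper).

    Returns (score, (a_start, a_end), (b_start, b_end)) for the best-scoring
    shared passage, with end indices exclusive.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores never drop below zero, which is what makes the alignment local.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero cell marks the passage start.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])

a = ["x", "les", "lois", "sont", "les", "rapports", "y"]
b = ["z", "z", "les", "lois", "sont", "les", "rapports", "w"]
score, (a0, a1), (b0, b1) = local_align(a, b)
print(score, a[a0:a1], b[b0:b1])
```

Because the dynamic-programming table grows with the product of the two texts' lengths, aligning every pair of long works this way would be slow; a practical system would likely need some way to select candidate passages first.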
Collaboration has been an important part of the Encyclopédie Project's development, and we continue to welcome opportunities for further collaborative enterprises. Our most successful collaborations have all contributed to the elements outlined above, bringing us new resources (University of Virginia and the "18th Volume"); translations and classifications (University of Michigan); contributions to our research and archival material, corrections, and editorial advice (CNRS); and collaborative research and development (Stanford University). The collaborative atmosphere of this "living edition" will only grow in importance as this edition of the Encyclopédie reaches a far greater audience. All users are encouraged to think about ways to improve this resource, whether simply by alerting us to errors using the "Report Error" link or through more engaged reflection on its development. For more, see our Encyclopédie Collaborations page.