Sequence Alignment/PAIR: Pairwise Alignment for Intertextual Relations
In 2009, ARTFL celebrated an open source software release of PAIR (Pairwise Alignment for Intertextual Relations) with an alpha version of PhiloLine available for download at Google Code. PAIR is designed as a powerful search tool to help scholars tackle and better understand the widespread problem of literary text reuse.
While PAIR was developed in response to the fairly specific phenomenon of similar passages across literary works, the sequence analysis techniques employed in PAIR were developed in widely disparate fields, such as bioinformatics and computer science, with applications ranging from genome sequencing to plagiarism detection. PAIR generates a set of overlapping word sequence shingles for every text in a corpus, then stores and indexes that information to be analyzed against shingles from other texts.
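The shingling and indexing steps described above can be illustrated with a minimal Python sketch. This is not PAIR's own code (the release is distributed as Perl); the function names, the three-word shingle size, and the simple lowercase-and-split tokenization are all assumptions made for illustration.

```python
from collections import defaultdict

def shingles(text, n=3):
    """Return the overlapping n-word shingles of a text, in order.

    Tokenization here (lowercase, split on whitespace) is an assumption;
    a real system would likely normalize text more carefully.
    """
    words = text.lower().split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def build_index(corpus, n=3):
    """Map each shingle to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for sh in shingles(text, n):
            index[sh].add(doc_id)
    return index

def shared_shingles(index, doc_a, doc_b):
    """Return the shingles that occur in both documents."""
    return {sh for sh, docs in index.items() if doc_a in docs and doc_b in docs}

# Hypothetical two-document corpus.
corpus = {
    "a": "the quick brown fox jumps over the lazy dog",
    "b": "he wrote that the quick brown fox jumps away",
}
index = build_index(corpus)
common = shared_shingles(index, "a", "b")
```

In this toy corpus, `common` holds the three shingles covering the shared run "the quick brown fox jumps". Consecutive shared shingles like these are what would be merged into a single candidate passage.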
Common shingles across texts indicate many different types of textual borrowings, from direct citations to more ambiguous and unattributed usages of a passage. Using the search form below, the user can quickly identify similar passages shared between the Encyclopédie and the 3,500+ works included in the ARTFL-FRANTEXT database. (Note: ARTFL-FRANTEXT is a subscription database, and as such full-text results cannot be displayed. For more information on ARTFL subscription services, please visit our Subscription Details page.)
Interested parties are encouraged to consult the release site for more documentation, including technical details, PhiloLine source downloads, and a freestanding Perl module.
Find similar passages between Diderot and d'Alembert's Encyclopédie and the ARTFL-FRANTEXT database. By selecting a "Match Size" parameter, the user can further narrow the search results to look for shared passages of specific lengths.