The research curated on the Dragoman Renaissance Research Platform has been more than thirteen years in the making. Most of the archival and secondary research was conducted using various bibliographic management software packages, including ProCite, EndNote, and Zotero. The resulting data was spread across five separate schemas, with related and duplicate information in three languages. Other source material came from unstructured data (such as documents).
From the first step to the last, we repeatedly revised a data model that attempts to describe the ontology of the research that had been conducted and how its objects are related (see Figure 1 and Figure 2). At the same time, we worked to bring the databases into a shared, simple schema, which took the form of a series of spreadsheets. To produce the spreadsheets, we used XSLT stylesheets and OpenRefine to split concatenated fields, normalize spellings (for example, into English place names), and otherwise clean the data and put it into a structured format. Natalie and her students also manually divided and enriched information that could not be updated programmatically. During this work we faced numerous issues related to text encoding and partial dates. The data represented in these spreadsheets is rich but acknowledged to be incomplete, both because of the nature of archival research and because of the remaining work that Natalie would like to complete or collaborate on in order to get a richer picture of the people, texts, and institutions that are her objects of study.
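The kind of cleanup described above can be sketched in a few lines of Python. This is only an illustration, not the project's XSLT or OpenRefine workflow. The separator, the small place-name table, and the `YYYY[-MM[-DD]]` date pattern are all assumptions made for the example.

```python
import re

# Hypothetical lookup table: the project's actual normalization list is not
# given in the text, so these entries are illustrative only.
PLACE_NAMES = {"venezia": "Venice", "venedig": "Venice", "wien": "Vienna"}

# Assumed date layout: a four-digit year, optionally followed by month and day.
DATE_RE = re.compile(r"^(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?$")


def split_field(value, sep=";"):
    """Split a concatenated cell into trimmed, non-empty parts."""
    return [part.strip() for part in value.split(sep) if part.strip()]


def normalize_place(name):
    """Map a known variant spelling to its English form; keep unknown names."""
    cleaned = name.strip()
    return PLACE_NAMES.get(cleaned.lower(), cleaned)


def parse_partial_date(text):
    """Return (year, month, day) with None for missing parts.

    Returns None when the text does not fit the assumed pattern, so that
    such values can be set aside for manual review.
    """
    match = DATE_RE.match(text.strip())
    if not match:
        return None
    year, month, day = match.groups()
    return (int(year), int(month) if month else None, int(day) if day else None)
```

Keeping missing date parts as `None` rather than padding them with a default month or day avoids implying a precision the archival source does not have.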
Spreadsheet columns were mapped to the open metadata schemas MODS and MADS, and XML files were generated programmatically from the spreadsheets and ingested into a Fedora repository, using Islandora, Drupal, and Solr for indexing and display. As part of this work, relationships were recorded between key objects in the database. For example, dragomans were related to the documents they translated, documents to their folios, persons to their relatives, and citations to documents and persons. These relationships are complex and work on them is ongoing; in future phases of the project, we aim to make them easier to author and edit.
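As a rough illustration of mapping a spreadsheet row to MODS, the sketch below builds a minimal record with Python's standard library. The column names (`title`, `translator`, `date`) and the choice of MODS elements are assumptions for the example. The project's actual column-to-element mapping is not reproduced here.

```python
import xml.etree.ElementTree as ET

MODS_NS = "http://www.loc.gov/mods/v3"
ET.register_namespace("mods", MODS_NS)


def row_to_mods(row):
    """Serialize one spreadsheet row (a dict) as a minimal MODS record.

    The keys 'title', 'translator', and 'date' are hypothetical column names.
    """

    def el(parent, tag, text=None, **attrs):
        # Create a namespaced child element, optionally with text content.
        child = ET.SubElement(parent, f"{{{MODS_NS}}}{tag}", attrs)
        if text is not None:
            child.text = text
        return child

    mods = ET.Element(f"{{{MODS_NS}}}mods")
    title_info = el(mods, "titleInfo")
    el(title_info, "title", row["title"])
    name = el(mods, "name", type="personal")
    el(name, "namePart", row["translator"])
    origin = el(mods, "originInfo")
    el(origin, "dateCreated", row["date"])
    return ET.tostring(mods, encoding="unicode")
```

Rows read with `csv.DictReader` can be passed to `row_to_mods` one at a time, which gives one XML document per spreadsheet row.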
Finally, the spreadsheets were also loaded into Palladio, a visualization platform, to create a series of interactive maps and timelines that illustrate key features of the dataset. These are currently static on the main website; in the future, they will be generated dynamically from the evolving dataset in the Fedora repository.
At present, work continues on a data dictionary that identifies major taxonomies and their relationships to our existing objects (see Figure 2).
Figure 1. Diagram of Dragomans Data Model
Figure 2. Emerging Taxonomy structure related to forms
At present, the server has not been optimized and performs slowly, so we have created a demo video to illustrate key functions as well as the interactive visualizations. The video is divided into the following sections:
Searching and indexing
The metadata of the objects in the repository is indexed by Solr, a server-based Java application built on the open-source Apache Lucene search library. Fields in the metadata are indexed and made searchable through Islandora’s search interface. In the search results, you can further refine a search by faceting on categories and by sorting. Search results can be exported as CSV.
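For readers unfamiliar with how faceting, filtering, sorting, and CSV export look at the Solr level, here is a small sketch that assembles a query URL from standard Solr request parameters (`q`, `fq`, `facet`, `facet.field`, `sort`, `rows`, `wt`). The base URL and the field name used in the usage note are hypothetical. Islandora normally issues such queries on the user's behalf, so this shows the underlying mechanism and is not a description of the platform's own code.

```python
from urllib.parse import urlencode


def build_solr_query(base_url, text, facet_fields=(), filters=(),
                     sort=None, fmt="json", rows=20):
    """Build a Solr /select URL with optional facets, filters, and sorting."""
    params = [("q", text), ("rows", str(rows)), ("wt", fmt)]
    if facet_fields:
        # Ask Solr to count matching documents per value of each facet field.
        params.append(("facet", "true"))
        params.extend(("facet.field", field) for field in facet_fields)
    # Each filter query narrows the result set without affecting scoring.
    params.extend(("fq", query) for query in filters)
    if sort:
        params.append(("sort", sort))
    return f"{base_url}/select?{urlencode(params)}"
```

For example, `build_solr_query("http://localhost:8080/solr", "dragoman", facet_fields=["mods_genre_ms"], fmt="csv")` would request matching records as CSV with facet counts on an assumed genre field.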
Browsing objects in the repository
Objects in the repository can be organized into collections and browsed through the Islandora interface. Solr’s indexing allows custom displays of objects to be built from search queries and filters. These displays include both the objects themselves and their relationships to other objects in the repository.
Bibliographies can be created from citation objects in the repository. Citations can be imported using Islandora's Scholar module, which includes an RIS importer and a suite of tools to manage citations. Google Scholar is integrated to retrieve citations. Using the official Citation Style Language (CSL) styles repository on GitHub, citations can be formatted and exported in over 7,500 free citation styles.
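To give a sense of what an RIS import involves, the following sketch reads RIS-formatted text into simple Python dictionaries. It is an independent illustration of the tagged RIS layout (a two-character tag, two spaces, a hyphen, then the value, with `ER` closing each record) and is not the Scholar module's importer.

```python
import re

# An RIS line: two-character tag, two spaces, a hyphen, then the value.
RIS_LINE = re.compile(r"^([A-Z][A-Z0-9])  - ?(.*)$")


def parse_ris(text):
    """Parse RIS text into a list of records mapping tag -> list of values."""
    records, current = [], {}
    for line in text.splitlines():
        match = RIS_LINE.match(line.rstrip())
        if not match:
            continue  # skip blank lines and anything that is not a tagged line
        tag, value = match.groups()
        if tag == "ER":
            # 'ER' marks the end of a reference.
            if current:
                records.append(current)
            current = {}
        else:
            # Tags such as AU (author) may repeat, so values are kept in lists.
            current.setdefault(tag, []).append(value.strip())
    return records
```

Storing every tag as a list keeps repeated fields, such as multiple authors, without special cases.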
Visualizations in Palladio
Palladio is a web-based platform for the visualization of complex, multi-dimensional data. The toolset provides powerful authoring tools that support multiple values and hierarchies in data. Data can be displayed as maps, timelines, lists, and galleries. Palladio supports the import of JSON, spreadsheet (such as CSV), and SPARQL query-based data. Palladio was designed by the Humanities + Design Research Lab at Stanford University. The demo shows version 0.9.0, released in January 2015. For more information about Palladio, visit http://palladio.designhumanities.org.