Proposal DHBENELUX CONFERENCE 2016
A pilot study in automatically mapping ‘disciplines’ represented in Harry Mulisch’s The Discovery of Heaven (1992)
Leon van Wissen, Marieke van Erp, Ben Peperkamp, VU University, Amsterdam
The field of literary studies has a longstanding tradition of detailed analysis of literary works. This results in fine-grained but usually small-scale studies. The advent of computational methods makes it possible to scale up the subject of analysis and start, for instance, comparing oeuvres. Before doing so, however, it is important to evaluate the precision and impact of such computational methods. To this end we carried out a small ‘experimental’ pilot study in which we tried to map ‘disciplines’ automatically in the prologue of Harry Mulisch’s The Discovery of Heaven [1] using DBpedia Spotlight [2].
This novel is considered by many to be Mulisch’s ‘masterpiece’ [3]. It is a sizeable work (nearly 1,000 pages, containing ~270,000 words) and contains many references to disciplines such as the natural sciences, theology, the humanities and politics. The novel embodies and uses ‘encyclopedic knowledge’: it strives to capture ideas and opinions and shows a variety of means to interpret the world [4, 5].
The goal of this small pilot is twofold:
- Map the scientific disciplines (here seen as a set of practices within scientific communities, regarding domains of research and accepted theories and practices), represented in Harry Mulisch’s The Discovery of Heaven.
- Assess the added value of computational resources and semantic web tools such as DBpedia and DBpedia Spotlight in complementing traditional literary analysis.
We wrote a computer program that
- takes a (Dutch) text file, scans it for words and word combinations that have a referent on Wikipedia, and classifies these terms into scientific disciplines;
- outputs a list of terms with their discipline and a network graph, visualizing clusters of knowledge domains that are represented in the text.
The first step is done by feeding chunks of text into DBpedia Spotlight, which automatically extracts so-called Named Entities, for example person names such as ‘Julius Caesar’ or location names such as ‘Cuba’. The program then tries to match each entity to one or more disciplines from a predefined list of disciplines found in DBpedia, such as ‘Physics’ or ‘Biology’. It does so by traversing DBpedia’s hierarchical structure of semantic data and listing every hierarchical parent of a Named Entity’s subject. Finally, the program combines this information into a network graph.
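The steps described above can be illustrated with a minimal Python sketch. It is not the program used in the pilot: the function names, the chunk size, the depth limit and the toy `PARENTS` hierarchy are all assumptions made for illustration, and the Spotlight response is represented as an already decoded JSON dictionary (with `Resources`, `@URI` and `@surfaceForm` fields) rather than fetched from the service.

```python
from collections import deque

# Predefined target disciplines (illustrative subset; the actual list is not given here).
DISCIPLINES = {"Physics", "Biology", "Chemistry"}

# Toy stand-in for DBpedia's hierarchy: node -> parent nodes.
# These entries are invented; in practice they would be looked up in DBpedia.
PARENTS = {
    "Amino_acid": ["Category:Amino_acids"],
    "Category:Amino_acids": ["Category:Biomolecules", "Category:Organic_acids"],
    "Category:Biomolecules": ["Biology"],
    "Category:Organic_acids": ["Category:Organic_compounds"],
    "Category:Organic_compounds": ["Chemistry"],
    "Julius_Caesar": ["Category:Roman_generals"],
    "Category:Roman_generals": ["Category:Ancient_Rome"],
}


def split_into_chunks(text, max_chars=500):
    """Group paragraphs into chunks of at most max_chars characters.

    A single paragraph longer than max_chars is kept whole.
    """
    chunks, current = [], ""
    for paragraph in text.split("\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        if current and len(current) + len(paragraph) + 1 > max_chars:
            chunks.append(current)
            current = paragraph
        else:
            current = f"{current} {paragraph}".strip()
    if current:
        chunks.append(current)
    return chunks


def extract_entities(spotlight_response):
    """Return (surface form, resource name) pairs from a Spotlight-style JSON dict."""
    pairs = []
    for resource in spotlight_response.get("Resources", []):
        name = resource["@URI"].rsplit("/", 1)[-1]
        pairs.append((resource["@surfaceForm"], name))
    return pairs


def find_disciplines(entity, parents=PARENTS, targets=DISCIPLINES, max_depth=5):
    """Walk breadth-first up the hierarchy and collect every target discipline reached."""
    found, seen = set(), {entity}
    queue = deque([(entity, 0)])
    while queue:
        node, depth = queue.popleft()
        if node in targets:
            found.add(node)
            continue
        if depth == max_depth:
            continue
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append((parent, depth + 1))
    return found
```

In this sketch the depth limit keeps the walk from wandering into very general categories, and the `seen` set guards against cycles in the hierarchy; both are design assumptions rather than documented properties of the pilot program.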
The resulting graph can be viewed at http://kyoto.let.vu.nl/~vanerp/TheDiscoveryOfHeaven/. The first results show that entities can be linked to categories of disciplines, which should give insight into the way in which knowledge is represented and distributed in the novel. The program is able to mark cross-disciplinary entities found in the text, so that a term like ‘amino acid’ is linked both to ‘Biology’ and to ‘Chemistry’. More details will be presented at the conference.
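As a hedged illustration of how term-to-discipline links could be turned into graph data, the sketch below builds an edge list, marks cross-disciplinary terms, and counts how often two disciplines share a term. The function name `build_graph` and the sample terms other than ‘amino acid’ are hypothetical and do not come from the pilot's results.

```python
from collections import Counter
from itertools import combinations


def build_graph(term_disciplines):
    """Derive graph data from a mapping of term -> set of disciplines.

    Returns the term-discipline edges, the terms linked to more than one
    discipline, and a count of terms shared by each pair of disciplines.
    """
    edges = sorted(
        (term, discipline)
        for term, disciplines in term_disciplines.items()
        for discipline in disciplines
    )
    cross_disciplinary = sorted(
        term for term, disciplines in term_disciplines.items() if len(disciplines) > 1
    )
    shared = Counter()
    for disciplines in term_disciplines.values():
        for pair in combinations(sorted(disciplines), 2):
            shared[pair] += 1
    return edges, cross_disciplinary, shared


# Hypothetical input: only 'amino acid' is taken from the text above.
terms = {
    "amino acid": {"Biology", "Chemistry"},
    "enzyme": {"Biology", "Chemistry"},
    "quark": {"Physics"},
}
edges, cross_disciplinary, shared = build_graph(terms)
```

The shared-term counts can serve as edge weights between disciplines, which is one possible way to make clusters of knowledge domains visible in a network visualization.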
[1] Mulisch, Harry. De Ontdekking van de Hemel. 20th ed. Amsterdam: De Bezige Bij, 2006.
[2] Daiber, Joachim, Max Jakob, Chris Hokamp, and Pablo N. Mendes. ‘Improving Efficiency and Accuracy in Multilingual Entity Extraction’. Proceedings of the 9th International Conference on Semantic Systems (I-Semantics). Graz, Austria, 4–6 September 2013.
[3] Brems, Hugo. Altijd weer Vogels die Nesten beginnen: Geschiedenis van de Nederlandse Literatuur, 1945-2005. B. Bakker, 2006. 556-557.
[4] Mendelson, Edward. ‘Encyclopedic narrative: from Dante to Pynchon’. Comparative literature 91.6, 1976. 1267-1275.
[5] Van Ewijk, Petrus. ‘Encyclopedia, network, hypertext, database: The continuing relevance of encyclopedic narrative and encyclopedic novel as generic designations’. Genre 44.2, 2011. 205-222.