Mining and Modeling Text
Digitalisation has made extensive text and data resources increasingly available. The project addresses the challenge of efficiently incorporating these digitised resources into humanities research. To maximise the benefits of digitalisation, the academic community requires innovative processes that permit automatic information extraction and promote the subsequent generation of knowledge.
Against this background, the MiMoText project deals with the automatic extraction, structuring and networking of specialist information from text and data collections, as well as the use of such information networks to answer questions in the humanities. The first application of the project is in the history of German and French literature, but the transferability of the methods to other disciplines was intended from the outset. MiMoText takes into account different types of texts, from lightly structured texts (e.g. bibliographical indexes) to non-fiction texts in the humanities (e.g. literary history) to literary texts (e.g. novels).
As a central goal, the project develops interdisciplinary solutions that combine conceptual, humanities-focused, informatics, legal and infrastructural questions and procedures.
As part of the legal support for the project, legal topics that arise in the course of the project work are identified. These are then prepared in generalised form as handouts. The handouts are published in the IRDT's PAPERSERIES and aim, among other things, to present the legal framework for the use of text and data mining in the humanities beyond the context of the project.
In 2021, the MiMoText project was selected for presentation at the virtual annual conference of the Digital Humanities in the German-speaking world (vDHd2021), which had the overarching theme of "Experiments". On 24 March 2021, insights into the MiMoText project were provided at six stations in an interactive, virtual format.
As an introduction to the project presentation, the respective project leaders presented the individual sub-projects in six videos. These impulse videos were intended both to introduce the sub-projects and to ease participants into the dialogue at the project stations in the virtual space. The videos presented the sub-projects "bibliography", "corpus of novels", "secondary literature", "modelling", "law" and "infrastructure" in depth, using concrete examples and a project pilot to illustrate the sub-projects and their approaches. A virtual room (wonder.me) made it possible to hold discussions at six stations between which participants could move freely. This created a realistic and interactive communication platform.
The law sub-project and its mode of interdisciplinary cooperation with the digital humanities were presented at Station 5. The impulse video first explained this cooperation along the iterative procedural steps involved in using text and data mining in the humanities. It then discussed the example of the use of scientific editions in text analyses and the ancillary copyright under § 70 of the Copyright Act (UrhG). In this way, it showed how legal topics that are also relevant for the digital humanities beyond the project context are identified from the ongoing project work.
The IRDT contributes its legal expertise to the project. Prof. Dr. Raue and Ms. Erler-Fridgen from the Institute are involved in the project.