Learning how to do Stylometry with style at the European Summer University in Digital Humanities 2018 in Leipzig

I was honored to attend the European Summer University in Digital Humanities “Culture and Technology” at the University of Leipzig in July 2018 as a CLARIN-D Fellow. Over the two weeks, around a hundred students and teachers from all over the world attended and taught workshops on various topics in Digital Humanities, ranging from Project Management to Reflected Text Analysis and from XML-TEI to Computer Vision. I participated in the workshop on stylometry, a method for studying the similarities and differences between (literary) texts that is often used for authorship attribution, that is, to answer questions such as who Elena Ferrante is and whether Robert Galbraith is actually J.K. Rowling.

In the workshop, I learned how to use the stylo package in R and became familiar with concepts such as Delta and Manhattan distance, as well as functions such as oppose() and rolling.classify(). Although stylometry is based on statistics and the stylo package is written in the programming language R, the workshop is also suitable for beginners: the stylo package has a graphical user interface just like other desktop software, and the teachers explained the mathematical equations behind the statistical analyses.
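To give a feel for what a measure like Delta actually computes, here is a minimal Python sketch of classic (Burrows's) Delta. It is not the stylo package's code (stylo is written in R) and rests on simplifying assumptions: the texts are already tokenized into word lists, only the n most frequent words of the whole corpus are used, and each word's relative frequency is converted into a z-score across the texts. Delta between two texts is then the Manhattan distance between their z-score vectors divided by the number of words. The function names `burrows_delta` and `relative_freqs` and the toy texts are hypothetical, chosen only for illustration.

```python
from collections import Counter
import statistics


def relative_freqs(tokens, vocab):
    """Relative frequency of each vocabulary word in one tokenized text."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[w] / total for w in vocab]


def burrows_delta(texts, n_mfw=100):
    """Pairwise classic (Burrows's) Delta for a dict of name -> token list.

    Hypothetical helper for illustration; not the stylo package's code.
    """
    # 1. Pick the n most frequent words across the whole corpus.
    corpus_counts = Counter()
    for tokens in texts.values():
        corpus_counts.update(tokens)
    vocab = [w for w, _ in corpus_counts.most_common(n_mfw)]

    # 2. Relative frequencies of those words in each text.
    freqs = {name: relative_freqs(toks, vocab) for name, toks in texts.items()}

    # 3. z-score each word across texts (population std dev is an assumption;
    #    R's sd() would use the sample version).
    columns = list(zip(*freqs.values()))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.pstdev(col) for col in columns]
    z = {
        name: [(f - m) / s if s > 0 else 0.0 for f, m, s in zip(fs, means, sds)]
        for name, fs in freqs.items()
    }

    # 4. Delta = Manhattan distance between z-score vectors / number of words.
    names = list(texts)
    deltas = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            manhattan = sum(abs(x - y) for x, y in zip(z[a], z[b]))
            deltas[(a, b)] = manhattan / len(vocab)
    return deltas


if __name__ == "__main__":
    texts = {
        "A": "the cat sat on the mat".split(),
        "B": "the cat sat on the mat".split(),
        "C": "a dog ran in a park and a cat".split(),
    }
    for pair, d in burrows_delta(texts, n_mfw=3).items():
        print(pair, round(d, 3))
```

In this toy run, the identical texts A and B get a Delta of 0.0, while both are equally distant from C. Lower Delta values indicate more similar use of frequent words, which is the intuition behind using Delta for authorship attribution.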

The workshop, like the whole Summer University, was very intensive, meaning that there was much to learn in a short amount of time, but also that I returned home with skills that will allow me to do my own stylometric analyses. Additionally, the workshop included a small yet important side session on generating data visualizations with software such as Gephi and Jmol. It was great to learn about suitable tools for visualizing stylometric results effectively and aesthetically, especially since two talks at the Summer University focused on data visualization and the social program included a visit to the Places & Spaces: Mapping Science exhibit, which was full of beautiful and interesting visualizations of research findings.

Besides data visualization, the Summer University lectures covered topics such as the history of the Text Encoding Initiative (TEI) and the story of the founding father of Digital Humanities, Father Busa, offering a broader picture of the foundations of Digital Humanities. Similarly, the social program provided food for thought by beautifully reflecting the theme of the Summer University, “Culture and Technology”, with excursions to places such as the Museum of the Printing Arts and a former yarn-spinning factory now converted into artists’ studios and galleries.

The Summer University was the perfect opportunity for me to take a first step into the realm of Digital Humanities. It gave me the tools to start my own stylometry project, in which I will seek to establish whether stylometry can be used to detect the mediating source language of an indirect translation, that is, a translation made from a translation (e.g., whether stylometry can detect French as the mediating language of a Finnish translation of a Greek novel translated indirectly, Greek>French>Finnish). I hope this first Digital Humanities project will be followed by others: I am eager to learn more after the inspiring start this summer, made possible by the generous CLARIN-D Fellowship.

When culture and technology met at ESUDH 2018: some of the printing presses at the Museum of the Printing Arts that the Summer University participants got to visit.


The Bibliotheca Albertina, where some of the Summer University lectures took place.

