Seminar in Computational Linguistics
- Date: 2017-11-10, 13:15 – 15:00
- Location: Engelska parken
- Speakers: Lasse Mårtensson and Fredrik Wahlberg
- Contact person: Ali Basirat
- Phone: 0184717006
Predicting the production dates of medieval Swedish charters
Over the last two years, we have worked on predicting the production years of a collection of medieval charters. The collection, called “Svenskt diplomatarium” (SDHK), consists of more than 43,000 records, of which about 11,000 have been photographed and published on the website of the Swedish National Archives. We have developed several image-based approaches to estimating the production dates of these charters, using both hand-crafted and learned image features. In addition, we hypothesized that changes in pronunciation during the studied era should affect spelling, and decided to try our hand at computational linguistics. By creating a feature vector for each charter based on n-grams from the transcriptions (5,300 charters), we could improve on the earlier image-based dating. The different feature spaces were connected to a timeline using Gaussian process regression.
At the seminar, we will discuss the need for digital paleography and briefly describe the image-based methods, the charter collection, and our n-gram-based approach. We would then like to end with a discussion and brainstorming session on potential ways of continuing this line of research on dating and finding historical characteristics in text.
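To give a concrete feel for the text-based approach mentioned in the abstract, here is a minimal sketch of turning a transcription into an n-gram feature vector and relating such vectors to a year with Gaussian process regression. It is not the speakers' implementation. The abstract does not say which n-grams, kernel, or hyperparameters were used, so the character trigrams, the RBF kernel, the noise level, and all function names (`char_ngrams`, `to_vector`, `gp_predict`, and so on) are assumptions made purely for illustration.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams in a lowercased transcription.
    (Character n-grams are an assumption; the abstract only says "n-grams".)"""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def to_vector(counts, vocabulary):
    """Relative n-gram frequencies, in a fixed vocabulary order."""
    total = sum(counts.values()) or 1
    return [counts[g] / total for g in vocabulary]


def rbf_kernel(x, y, length_scale=1.0):
    """Squared-exponential kernel (an assumed, common default choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(X_train, y_train, x_new, noise=1e-3, length_scale=1.0):
    """Posterior mean of a GP with constant prior mean (the training average)."""
    mean_y = sum(y_train) / len(y_train)
    K = [[rbf_kernel(a, b, length_scale) + (noise if i == j else 0.0)
          for j, b in enumerate(X_train)]
         for i, a in enumerate(X_train)]
    alpha = solve(K, [y - mean_y for y in y_train])
    k_star = [rbf_kernel(x_new, a, length_scale) for a in X_train]
    return mean_y + sum(k * a for k, a in zip(k_star, alpha))


if __name__ == "__main__":
    # Invented placeholder strings and years, NOT real charter data.
    texts = ["abab abab abab", "abba abba abba"]
    years = [1300.0, 1400.0]
    counts = [char_ngrams(t) for t in texts]
    vocabulary = sorted(set().union(*counts))
    X = [to_vector(c, vocabulary) for c in counts]
    query = to_vector(char_ngrams("abab abba"), vocabulary)
    print(gp_predict(X, years, query, length_scale=0.2))
```

In practice one would more likely use an established library for the regression step and also report the predictive variance, which gives each estimated date an uncertainty interval. The hand-written solver here only keeps the sketch self-contained.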