Content analysis of biographical data supported by computational linguistics

Our web application "Textual Analysis Migration" is a first example of how facts about emigration can be explored. The facts are extracted from large corpora by computational-linguistic processing within the CLARIN infrastructure. Importantly, the results can be visualized in several ways (table view, geographic map view, and personalized view), and each result keeps a reference to its position in the original text. Users can therefore assess the quality of individual facts at any time and actively report corrections.

Especially relevant for

  • Historians
  • Political scientists
  • Literary scholars

Starting point:

German texts that contain migration information, for example an article mentioning that someone has moved to another country.

Task:

Representation of migration flows from and to a country.

Solution:

A web interface for content analysis of biographical data supported by computational linguistics (textual Migration Analysis).
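To make the task more concrete, the following is a minimal sketch of how a migration fact could be stored together with a reference to its position in the original text, and how such facts could be aggregated into flows from and to one country. It is not the data model of the web application: the class `MigrationFact`, its field names, the helper functions, and the example sentence about a made-up person are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationFact:
    """One extracted fact; all field names are illustrative assumptions."""
    person: str
    origin: str        # country the person left
    destination: str   # country the person moved to
    source_id: str     # e.g. the title of the source article
    start: int         # character offset where the supporting passage begins
    end: int           # character offset where it ends (exclusive)


def flows_for(country, facts):
    """Count emigration from and immigration to `country`, per partner country."""
    emigration = Counter(f.destination for f in facts if f.origin == country)
    immigration = Counter(f.origin for f in facts if f.destination == country)
    return emigration, immigration


def passage(fact, text):
    """Return the text span that supports the fact, so a reader can check it."""
    return text[fact.start:fact.end]


# Invented example text; the offsets are computed rather than typed by hand.
article = "Anna Muster wurde in Wien geboren. 1938 emigrierte sie in die USA."
start = article.index("1938")
facts = [
    MigrationFact("Anna Muster", "Austria", "United States",
                  "Anna Muster", start, len(article)),
]

emigration, immigration = flows_for("Austria", facts)
print(emigration)                   # destinations of emigration from Austria
print(passage(facts[0], article))   # the sentence the fact was taken from
```

Keeping the character offsets with every fact is what would let a table row link back to the exact passage, so that a doubtful entry can be checked against its source.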

Related CLARIN-D tools and services

Short guide to using the content analysis of biographical data supported by computational linguistics (textual Migration Analysis)

  1. Go to this website:
    • Selective Display: random, basic, UN data
    • clicking on a country selects that country
    • migration routes are displayed as arrows
    • a table shows aggregate information for the selected country, with emigration in blue and immigration in red
  2. Selection of other sources:
    • click on the left tab ('Official Data Set (United Nations 2010)'):
      • Choose: German Wikipedia Edition (02/2011)
        • Data relating to the automatic extraction from Wikipedia in the German edition, February 2011
        • Additional column of the table: "Details" allows investigation of all text passages recognized
        • Errors can be reported through the system; the reports are incorporated into new versions of the tool
        • The link leads to the Wikipedia article in the current version
      • Choose: ÖBL (Austrian Biographical Dictionary 1815-1950)
        • Additional column of the table: "Details" allows investigation of the source articles
        • Errors can be reported through the system; the reports are incorporated into new versions of the tool
      • Choose: French Wikipedia-Edition (02/2011)
        • Data relating to the automatic extraction from Wikipedia in the French edition, February 2011
        • Additional column of the table: "Details" allows investigation of all text passages recognized
        • Errors can be reported through the system; the reports are incorporated into new versions of the tool
        • The link leads to the Wikipedia article in the current version
  3. Export the result
    • click on "export" next to the title of the table
    • the table content can be downloaded
  4. Migration analysis on your own data:
    • click on "Text Input" in the top menu
    • type in your own text, or insert it using copy and paste
    • migration patterns extracted from the text are displayed
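A downloaded table from step 3 can be processed further outside the web application. The sketch below sums emigration and immigration counts for one country with Python's standard `csv` module. The column names `origin`, `destination`, and `count`, the comma delimiter, and the sample rows are assumptions made for illustration; the actual export may be laid out differently and should be checked first.

```python
import csv
import io
from collections import defaultdict

# Stand-in for a downloaded export file; header names and rows are invented.
SAMPLE_EXPORT = """origin,destination,count
Austria,United States,3
Austria,Germany,5
Germany,Austria,2
"""


def summarize(csv_file, country):
    """Sum outgoing and incoming counts for one country from an export table."""
    totals = defaultdict(int)
    for row in csv.DictReader(csv_file):
        n = int(row["count"])
        if row["origin"] == country:
            totals["emigration"] += n
        if row["destination"] == country:
            totals["immigration"] += n
    return dict(totals)


summary = summarize(io.StringIO(SAMPLE_EXPORT), "Austria")
print(summary)  # {'emigration': 8, 'immigration': 2}
```

For a real file, the `io.StringIO` object would be replaced by `open(path, newline="", encoding="utf-8")`, and the column names adjusted to match the header row of the export.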

Additional information

A technical and methodological description can be found in:

  • Blessing, Andre; Kuhn, Jonas (2014): Textual Emigration Analysis (TEA). In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14). European Language Resources Association (ELRA), Reykjavik, Iceland.

The web application allows the visualization and exploration of migration processes. Different data sources can be included. In the current version the following sources are supported:

  • Official Data Set (United Nations 2010)
  • German Wikipedia-Edition (02/2011)
  • ÖBL (Austrian Biographical Dictionary 1815-1950)
  • French Wikipedia-Edition (02/2011)

The application also offers the possibility to extract migration information from your own texts.

  • Handling the user interface
  • geographic exploration of the UN Migration dataset
  • geographic exploration of the German Wikipedia Migration dataset
    • qualitative and quantitative analysis through tables and linked textual sources
  • geographic exploration of the ÖBL Migration dataset
    • qualitative and quantitative analysis through tables and linked textual sources
    • reporting annotation errors
  • geographic exploration of the French Wikipedia Migration dataset
    • qualitative and quantitative analysis through tables and linked textual sources
  • export of the complete migration dataset for one country to a CSV file
  • automatic extraction of migration patterns from your own texts
 

Tutorial - Video