CLARIN-D for Content Analysis in Social Sciences
CLARIN-D supports content analysis methods in the Social Sciences by providing services to explore language resources, analyse large amounts of written text, and archive corpora and research results. Working Group 7 »Content Analysis in Social Sciences« is a network of scholars within CLARIN-D that aims to support the Social Sciences by providing a digital research infrastructure.
Data for research
The Virtual Language Observatory (VLO), a dedicated search engine for language-based research data, provides access to many resources for research, e.g. news corpora in German. Parliamentary records of the German Federal State Parliaments and the Lower House of Parliament are provided by the F7 Curation Project PolMine. Furthermore, the WG supports the creation and provision of sources for new corpora, such as the retro-digitised collections of German newspapers from recent decades.
A dedicated full-text search, the Federated Content Search (FCS) of CLARIN-D, allows researchers to search the full text of many of the CLARIN community's resources. This makes it possible to find examples of how terms are used. The search results and the source documents can be extracted and saved as a corpus for further analysis.
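As a rough illustration of what such a full-text query can look like at the protocol level, the following sketch builds a search URL. It assumes that the endpoint speaks SRU with CQL queries, the protocol on which the CLARIN FCS specification is based. The endpoint address `https://fcs.example.org/sru` and the function name `build_search_url` are hypothetical placeholders, not actual CLARIN-D services, and the sketch only constructs the URL without sending a request.

```python
from urllib.parse import urlencode


def build_search_url(endpoint, term, max_records=10):
    """Build an SRU 'searchRetrieve' URL for a single search term.

    Assumption: the endpoint accepts SRU 1.2 parameters and a CQL query.
    """
    # Escape backslashes and double quotes so the term is a valid quoted CQL string.
    escaped = term.replace("\\", "\\\\").replace('"', '\\"')
    params = {
        "operation": "searchRetrieve",
        "version": "1.2",
        "query": f'"{escaped}"',
        "maximumRecords": max_records,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint, used for illustration only.
url = build_search_url("https://fcs.example.org/sru", "Energiewende")
print(url)
```

In practice most researchers would use the FCS web interface rather than building URLs by hand; the sketch only shows that a term search reduces to a small set of request parameters.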
Software for research projects
CLARIN-D provides software and web services for the preparation and analysis of language data. These include WebAnno for the manual and semi-automatic annotation of texts and WebLicht for the automatic annotation of texts with a variety of tools. The tools can be combined according to the needs and preferences of the user.
Political scientists may be interested in polmineR, an R package that provides access to the texts of German plenary sessions and to tools for analysing them.
Providing your own research data
Apart from tools for the analysis of language data, the CLARIN network allows researchers to archive their own research data and make it available to the research community for reuse. In cooperation with a CLARIN centre, the data can be prepared so that it is precisely described. One tool for describing data is the CMDI-Maker, which creates metadata descriptions that make the data easy for the research community to find and access. The data can also be archived and provided via a CLARIN centre. In that case, a deposit contract protects both the data owners and the archiving centre, and it specifies the licence under which third parties may reuse the data. A tool for creating a data management plan supports projects from the planning stage onwards.
Contacts in the disciplines
Within CLARIN-D, the disciplines are organised in Working Groups (WGs). WG7 aims to connect the often isolated corpus-linguistic research efforts in the Social Sciences and to make existing applications known to more users.
To this end, the activities of WG7 pursue three fundamental aims:
- Firstly, WG7 wants to create formal and informal opportunities for networking and exchange, so that young researchers and members can learn about corpus-linguistic analysis in the Social Sciences and easily identify possible partners.
- Secondly, experience from existing research should be linked more closely to the best practices of the community. This is intended to foster an intensive discussion about quality standards for corpus-linguistic methods.
- Thirdly, the objectives of content analysis methods in the Social Sciences should be identified, and the resulting demand for technical development should be communicated to the computational linguistics community.
Members
- Eva Barlösius and Axel Philipps (University of Hannover)
- Andreas Blätte (University of Duisburg-Essen)
- Sebastian Haunss (University of Bremen)
- Jeannette Hofmann (WZB Berlin)
- Christian Rauh (WZB Berlin)
- Bernd Schlipphak (University of Münster)
Heads and contacts
- Cathleen Kantner (University of Stuttgart)
- Gary S. Schaal (University of the Bundeswehr Hamburg)
Resources from the Discipline for the Discipline
During the implementation phase of CLARIN-D, the WG identified important resources and tools, which have been developed and prepared for reuse. These small projects are called curation projects within CLARIN-D.
Of special interest to political scientists with programming experience is the software polmineR, an R package that provides access to the texts of German plenary records and to tools for analysing them.
Within the project ePol, the Leipzig Corpus Miner (LCM) was developed. LCM is a text-mining infrastructure for the analysis of large document collections (the ePol corpus covers 3.5 million newspaper articles). Its functions include full-text search, frequency and co-occurrence analysis, topic models, and the classification of text paragraphs. A virtual machine is available as a free download. The follow-up project iLCM consolidates the analysis software and adds multiple new functions.
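To make the terms frequency analysis and co-occurrence analysis more concrete, here is a minimal sketch on three invented example sentences. It does not reproduce how LCM implements these functions. The tokenisation rule, the choice of the document as the co-occurrence window, and all function names are assumptions made for illustration.

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase the text and split it into word tokens (simplistic rule)."""
    return re.findall(r"\w+", text.lower())


def term_frequencies(docs):
    """Count how often each term occurs across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


def cooccurrences(docs):
    """Count in how many documents each unordered pair of distinct terms appears together."""
    pairs = Counter()
    for doc in docs:
        # Sorting makes every pair appear in one canonical (alphabetical) order.
        terms = sorted(set(tokenize(doc)))
        pairs.update(combinations(terms, 2))
    return pairs


# Invented toy documents, not taken from any CLARIN-D corpus.
docs = [
    "The parliament debates the energy policy",
    "Energy policy divides the parliament",
    "The newspaper reports on the debate",
]

freq = term_frequencies(docs)
cooc = cooccurrences(docs)
print(freq["the"], cooc[("energy", "policy")])
```

Real analyses would typically add stop-word removal, lemmatisation, and a statistical significance measure for co-occurrences; the raw counts here only show the underlying idea.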