WebLicht services

Your NLP tool can easily be integrated into WebLicht tool chains. Wrap it in a RESTful web service that accepts TCF input and/or produces TCF output, and you are done!
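As a rough illustration of such a wrapper, the following Python sketch reads the text layer of a posted TCF document, runs a toy tokenizer over it, and returns the document with a tokens layer added. It is a minimal sketch under stated assumptions, not the way WebLicht services are implemented: the namespaces and the text/tcf+xml content type are taken from commonly published TCF 0.4 examples and should be checked against the current specification, and the names add_tokens_layer, TcfHandler and serve are hypothetical.

```python
import re
import xml.etree.ElementTree as ET
from http.server import BaseHTTPRequestHandler, HTTPServer

# Assumption: namespaces as seen in published TCF 0.4 examples.
DSPIN_NS = "http://www.dspin.de/data"
MD_NS = "http://www.dspin.de/data/metadata"
TC_NS = "http://www.dspin.de/data/textcorpus"
ET.register_namespace("", DSPIN_NS)
ET.register_namespace("md", MD_NS)
ET.register_namespace("tc", TC_NS)


def add_tokens_layer(tcf_bytes):
    """Return the TCF document with a tokens layer appended (toy tokenizer)."""
    root = ET.fromstring(tcf_bytes)
    corpus = root.find(f"{{{TC_NS}}}TextCorpus")
    if corpus is None:
        raise ValueError("no TextCorpus element found")
    text = corpus.findtext(f"{{{TC_NS}}}text")
    if text is None:
        raise ValueError("no text layer found")
    # This sketch does not check whether a tokens layer already exists.
    tokens = ET.SubElement(corpus, f"{{{TC_NS}}}tokens")
    for i, word in enumerate(re.findall(r"\w+|[^\w\s]", text), start=1):
        token = ET.SubElement(tokens, f"{{{TC_NS}}}token", {"ID": f"t{i}"})
        token.text = word
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


class TcfHandler(BaseHTTPRequestHandler):
    """Accepts TCF via POST and answers with the annotated TCF."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            body = add_tokens_layer(self.rfile.read(length))
            status, ctype = 200, "text/tcf+xml"  # assumed TCF media type
        except (ET.ParseError, ValueError) as err:
            body = str(err).encode("utf-8")
            status, ctype = 400, "text/plain; charset=utf-8"
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Run the sketch service locally until interrupted."""
    HTTPServer(("localhost", port), TcfHandler).serve_forever()
```

Calling serve() starts the sketch on localhost; a TCF document can then be sent to it with any HTTP client using a POST request. A real service would replace the toy tokenizer with the wrapped NLP tool.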




TCF format

The TCF format (Text Corpus Format) is used by WebLicht services as a machine-readable format for representing and exchanging linguistically annotated texts. It enables interoperability of linguistic tools. The WebLicht developer manual provides detailed information on the TCF format and its background.

  • Downloads:
    RELAX NG schema specifications for:
    • the latest TCF schema and document example with text corpus annotation layers, TCF specifications

    Since version 0.4.5, TCF specifications have been hosted on GitHub. Please check there for the latest updates and more information.

  • Previous versions, change log:

    - TCF 0.4, 01 Feb 2016 (in MetaData: xsi:schemaLocation allowed on parents of the CMD element)

    - TCF 0.4, 11 Feb 2015 (in TextCorpus: textSource layer added)

    - TCF 0.4, 16 May 2014 (in TextCorpus: textstructure layer extended with character offset information in textspan elements)

    - TCF 0.4, 08 May 2014 (in MetaData: incorporated CMD chain definitions from http://catalog.clarin.eu/ds/ComponentRegistry/rest/registry/profiles/clarin.eu:cr1:p_1320657629623/xsd)

    - TCF 0.4, 09 Dec 2013 (in TextCorpus: wsd (word senses) layer added; in Lexicon: entries layer replaces lemmas layer, syllabification, cooccurrences and synonyms layers added, minor modifications to other layers)

    - TCF 0.4, 26 Apr 2013 (in external data namedentitymodel layer added)

    - TCF 0.4, 14 Mar 2013 (in textstructure layer textspan element changed to represent a tree structure: textspan can optionally contain a value or nest other textspan elements)

    - TCF 0.4, 20 Dec 2012 (TextCorpus lang attribute is made required; in references layer value of target attribute of reference element is changed from xsd:IDREF to xsd:IDREFS)

    - TCF 0.4, 26 Nov 2012 (textstructure layer extended)

    - TCF 0.4, 31 July 2012 (external data section added, geo layer extended, discourseconnectives layer added, minor change in matches layer)

    - TCF 0.4, 25 Jun 2012 (coreferences layer is changed into references and extended to allow relations between references; parsing layer extended to allow edge labels; textstructure, orthography, geo layers added)

    - TCF 0.4, 30 Mar 2012 (change in coreferences layer; change in matches layer; change in most layers to allow for empty layers, in case a tool is run but no annotations are found)

    - TCF 0.4, 27 Jan 2012 (change in dependency layer: govIDs made optional)

    - TCF 0.4, 08 Dec 2011

    - TCF 0.4, 29 Sep 2011

    - XML Schema and RELAX NG schema specifications for TCF 0.3, 03 Jul 2009

  • Validation:

    - Online validator for TCF

Before you develop a WebLicht service, check the latest TCF specification and decide which annotation layers your service will consume and which layers it will produce. If your service produces an annotation layer that is not part of the latest TCF specification, contact the Clarin-D development group (info AT d-spin DOT org). New linguistic annotation layers can be integrated into TCF, and the specification will be updated accordingly.
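To make the idea of consumed and produced layers concrete, the following Python sketch lists the layers present in a TCF document and reports which of the layers a service requires are missing. It is only an illustration under assumptions: the TextCorpus namespace is the one used in published TCF 0.4 examples, the layer names in the usage note are examples, and the function names present_layers and missing_layers are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: TextCorpus namespace as seen in published TCF 0.4 examples.
TC_NS = "http://www.dspin.de/data/textcorpus"


def present_layers(tcf_bytes):
    """Return the local names of all layers found under TextCorpus, in document order."""
    root = ET.fromstring(tcf_bytes)
    corpus = root.find(f"{{{TC_NS}}}TextCorpus")
    if corpus is None:
        return []
    # Strip the "{namespace}" part that ElementTree puts in front of each tag.
    return [child.tag.split("}", 1)[-1] for child in corpus]


def missing_layers(tcf_bytes, required):
    """Return the required layer names that the document does not contain."""
    present = set(present_layers(tcf_bytes))
    return [name for name in required if name not in present]
```

A service that consumes, for example, the tokens and sentences layers could call missing_layers(document, ["tokens", "sentences"]) on its input and reject the request if the result is not empty.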

After you develop a WebLicht service, you can use the online validator to check whether your service output complies with the corresponding TCF schema. Compliance with the TCF schema ensures that your service is interoperable with other WebLicht services and tools.




Working with TCF documents


Create/read/write

We offer a library that binds TCF documents to Java objects, which makes working with TCF data on the client and server sides easier. You may want to use this library if you plan to integrate your tool into WebLicht tool chains using Java.

  • Tutorials:

    - An up-to-date tutorial showing how to use the WLFXB library to read, create and write TCF documents can be found in the TCF section of the WebLicht developer manual

    - Old tutorial: TCFXB 0.3 for TCF 0.3 shows how to use the TCF 0.3 to Java object binding library to consume and produce TCF 0.3 documents using Java

  • Downloads:

    - The WLFXB library for binding TCF 0.4 to Java objects can be found in the Clarin EU repository. The repository also contains the WLFXB test case sources, which include examples of how to read layers from and write layers into TCF 0.4 documents using WLFXB

    - Library for binding TCF 0.3 to Java objects, TCFXB 0.3, and its API



Visualize

Here you can find applications for viewing the results of NLP tools contained in TCF documents in a user-friendly graphical interface.

  • Tutorials:

    - The TIEWER TCF 0.3 tutorial shows how to view and edit linguistic data in TCF 0.3 documents with the Tiewer desktop application

  • Downloads:

    - TIEWER for TCF 0.3: pilot desktop application for viewing and editing linguistic data in TCF 0.3

  • Online:

    - visual 0.3: web application for viewing linguistic data in TCF 0.3

    - visual 0.4: web application for viewing linguistic data in TCF 0.4