The GeCeG manual pages explain the corpus annotation schemes in detail. There are two main sections. The first covers part-of-speech tagging. Specifically, it deals with general principles of the morphological annotation, word classes and subcategories, affixation and other word-formation processes, and inflectional features and syncretism. The second section covers the syntactic annotation of the corpus. It provides information on general principles and terminology of syntactic labelling, detailed explanations and examples of the grammatical functions used, the treatment of displaced constituents, guidelines for disfluencies, and the additional material made available for every token. In addition, there is an index page for quick navigation. It is a register of keywords that link directly to the sections where the respective concepts are discussed.

Comparison with Other Corpora

The core annotation properties of the GeCeG are similar to those of other corpora of the CorpusSearch family. For example, like other CorpusSearch corpora, it parses sentences as relatively flat trees whose leaves are pairs of part-of-speech labels and actual word forms, and it represents displaced constituents through the use of numerical indices. However, there are also fundamental differences between the annotation schemes of the GeCeG and other CorpusSearch corpora. These differences affect both general design decisions and the parses of specific structures. The manual includes specially designated paragraphs in strategic places that point out the most important divergences between the GeCeG and two CorpusSearch corpora, the York Corpus of Old English (YCOE) and the Penn-Parsed Corpus of Middle English (PPCME). These remarks are introduced with a symbol that shows an exclamation mark framed by a red triangle.

The purpose of these comparisons is twofold. Firstly, they are meant to help scholars familiar with the YCOE or PPCME to acquaint themselves quickly with the GeCeG annotation scheme, to avoid pitfalls and to use the GeCeG productively. Secondly, they are intended to facilitate the composition of search queries that find syntactically analogous structures in early German and other languages, such as medieval English, despite the divergences in corpus annotation.
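The shared representation described in this section (flat labelled trees, leaves that pair a part-of-speech label with a word form, and numerical indices linking displaced constituents) can be illustrated with a minimal Python sketch. It assumes a generic Penn-style labelled bracketing. The sample clause, its labels (CP-REL, WNP-1, *T*-1 and so on) and all function names are illustrative assumptions and are not taken from the GeCeG, whose actual label inventory is defined in the manual.

```python
import re

# Hypothetical Penn-style bracketing; NOT an actual GeCeG parse.
SAMPLE = "(CP-REL (WNP-1 (WPRO who)) (IP-SUB (NP-SBJ *T*-1) (VBD came)))"

INDEX = re.compile(r"-(\d+)$")  # trailing numerical index, e.g. "-1"


def tokenize(text):
    """Split a bracketed string into '(' , ')' and label/word tokens."""
    return re.findall(r"\(|\)|[^\s()]+", text)


def parse(tokens):
    """Parse one bracketed tree into nested (label, children) tuples."""
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a word form or trace
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return node()


def leaves(tree):
    """Yield (label, word) pairs for every leaf of the tree."""
    label, children = tree
    if len(children) == 1 and isinstance(children[0], str):
        yield (label, children[0])
    else:
        for child in children:
            if isinstance(child, tuple):
                yield from leaves(child)


def coindexed(tree, table=None):
    """Group node labels and traces by the numerical index they carry."""
    table = {} if table is None else table
    label, children = tree
    match = INDEX.search(label)
    if match:
        table.setdefault(match.group(1), []).append(label)
    for child in children:
        if isinstance(child, tuple):
            coindexed(child, table)
        else:
            match = INDEX.search(child)
            if match:
                table.setdefault(match.group(1), []).append(child)
    return table


TREE = parse(tokenize(SAMPLE))
print(list(leaves(TREE)))
print(coindexed(TREE))
```

In this sketch the displaced constituent and its gap are recovered simply by matching the shared index, which is the general idea behind index-based search queries. Real queries against the corpus would use the labels and conventions documented in the manual.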


The terminology of the corpus annotation is based on mainstream generative syntax. For example, it uses terms like "subject," "complement," "adjunct," "gapping," or "control." The technical language is related primarily to representational ("filler-gap") frameworks but also to derivational ("movement") ones. The terminology was chosen on account of its prevalence among linguists of different schools and persuasions, and because it can conveniently describe a very wide range of syntactic structures. However, while the primitive terms of the technical vocabulary have clearly influenced some basic design principles of the GeCeG, its annotation schemes are theory-neutral enough not to force any practical commitments on researchers. In particular, the corpus allows researchers to collect data for any kind of morphosyntactic study, including traditionalist, non-generative analyses.


The analyses of inflectional endings in the annotation are based on the paradigms provided by the standard grammar of Old High German, Braune’s Althochdeutsche Grammatik. Where appropriate, the manual references the book by page and paragraph number. The following edition was used:

Braune, Wilhelm & Hans Eggers (1987) Althochdeutsche Grammatik. 14th ed. Tübingen: Max Niemeyer.

Furthermore, information on lexis was obtained from Schützeichel’s dictionary (for instance, for the English glosses or the resolution of gender syncretism). The following edition was used:

Schützeichel, Rudolf (2006) Althochdeutsches Wörterbuch: überarbeitet und um die Glossen erweitert. 6th ed. Tübingen: Max Niemeyer.


