The corpus provides all annotation layers previously available in NKJP1M: the morphosyntactic layer, syntactic groups, and named entities. On top of these, a layer of quantifying expressions has been added.
The basic layer of the corpus is the morphosyntactic layer, in which the text is tokenized and each token is interpreted morphologically by its lemma, grammatical class, and values of grammatical categories. The syntactic groups layer indexes single- or multi-token units with a specific syntactic classification: nominal (NG), numeral (NumG), adjectival (AdjG), prepositional (PrepNG), prepositional-adjectival (PrepAdjG), prepositional-numeral (PrepNumG), and adverbial (AdvG) groups, as well as subordinate clauses (CG) and interrogative subordinate clauses (KG). Syntactic groups can be queried using the <g /> tag with an optional group type specification, e.g. <g="NumG" />. The named entity layer can be queried using the <ne /> tag. The NE classification is described, among other places, in section 3.8 of the Korpusomat user manual (available in Polish only).
The quantificational layer consists of single- or multi-token units classified according to their semantic properties. A quantifier or quantifying expression is a word or phrase expressing a number or quantity that can be formally described as a relation between two sets. Well-known examples of quantifying expressions come from logic, whether Aristotle's syllogistic or first-order logic, such as each, some, and no.
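The description of a quantifier as a relation between two sets can be sketched in a few lines of Python. This is only an illustration under the standard set-theoretic reading of these words; the function names and the sample sets are hypothetical and are not part of the corpus or of the search engine.

```python
# Quantifiers read as relations between a restrictor set A and a scope set B.
# Illustrative sketch only: names and sample data are hypothetical.

def each(a: set, b: set) -> bool:
    """'each A is B' holds when A is a subset of B."""
    return a <= b

def some(a: set, b: set) -> bool:
    """'some A is B' holds when A and B share at least one element."""
    return bool(a & b)

def no(a: set, b: set) -> bool:
    """'no A is B' holds when A and B share no element."""
    return not (a & b)

students = {"Ala", "Ola", "Jan"}
readers = {"Ala", "Ola", "Jan", "Ewa"}
smokers = {"Ewa"}

print(each(students, readers))  # True
print(some(students, smokers))  # False
print(no(students, smokers))    # True
```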
Semantic properties of quantifiers described in the corpus:
Quantifying expressions can be queried using the <q /> tag with an optional specification of the properties described above, given in the same order. The values of those properties must be separated by colons. For example:
will return instances of a D-type proportional quantifier that is left non-monotonic, right upward monotonic, and of the positive comparative type (that is, non-comparative). Most hits in this case will be instances of the quantifier most or similar expressions.
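The monotonicity profile attributed to most can be checked by brute force over a small universe. This is a minimal sketch, assuming the standard reading of most (more elements of A are in B than outside it) and the usual definitions of left and right monotonicity from generalized quantifier theory; all function names are hypothetical and unrelated to the corpus tooling.

```python
from itertools import combinations

def most(a: frozenset, b: frozenset) -> bool:
    """'most A are B': more elements of A lie inside B than outside it."""
    return len(a & b) > len(a - b)

def subsets(universe):
    """All subsets of a small universe, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def right_upward(q, universe) -> bool:
    """Q(A, B) and B <= B2 implies Q(A, B2)."""
    s = subsets(universe)
    return all(q(a, b2) for a in s for b in s if q(a, b)
               for b2 in s if b <= b2)

def left_upward(q, universe) -> bool:
    """Q(A, B) and A <= A2 implies Q(A2, B)."""
    s = subsets(universe)
    return all(q(a2, b) for a in s for b in s if q(a, b)
               for a2 in s if a <= a2)

def left_downward(q, universe) -> bool:
    """Q(A, B) and A2 <= A implies Q(A2, B)."""
    s = subsets(universe)
    return all(q(a2, b) for a in s for b in s if q(a, b)
               for a2 in s if a2 <= a)

universe = {1, 2, 3}
print(right_upward(most, universe))   # True
print(left_upward(most, universe))    # False
print(left_downward(most, universe))  # False
```

Under these assumptions, most is monotonic in neither direction on its left argument and upward monotonic on its right argument, which matches the profile given in the text.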
Regular expressions can be used to leave some properties unspecified in a query. For example, the query:
will return instances of D-quantifiers regardless of their remaining properties. Another example:
will return instances of existential quantifiers.
Queries can also use any of the relations between units indexed in the various corpus layers. These relations are described in the MTAS search engine specification. For example, it is possible to query for all instances of quantifiers containing a token lemmatized as żaden (no):
<q /> containing [base="żaden"]
or instances of quantifiers consisting of tokens that form a numeral group in the syntactic layer:
<q /> fullyalignedwith <g="NumG" />