Results

Task 1 (A)

System name    LAS (F1 score)  MLAS (F1 score)  BLEX (F1 score)
COMBO          86.11           76.18            79.86
IMS            83.82           69.27            60.88
Poleval2k18    77.7            61.21            70.01
Drewutnia      27.39           18.12            25.24

Task 1 (B)

System name    ELAS (F1 score)  SLAS (F1 score)
IMS            81.9             65.98
COMBO          80.66            77.3
Poleval2k18    66.73            67.84

Notes:

The predicted analyses are evaluated with the script poleval2018_cykle.py, a modified version of the evaluation script prepared for the CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. The most important modification is the addition of two measures, ELAS and SLAS, for evaluating dependency graphs with enhanced edges and semantic labels, respectively. The second modification is motivated by the fact that some of the participating systems predicted trees with cycles, multiple root nodes, etc. We decided not to reject such submissions, but to score the malformed trees with 0 instead.
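For illustration only, a validity check behind the "score malformed trees with 0" rule could look like the Python sketch below. This is not the actual logic of poleval2018_cykle.py; it is a minimal sketch that assumes CoNLL-U token lines already split into columns (ID in column 0, HEAD in column 6) and reports whether they encode a single-rooted, acyclic tree.

    def is_valid_tree(token_rows):
        """Return True if the HEAD column encodes a single-rooted, acyclic tree."""
        # Keep plain token IDs; skip multiword ranges ("1-2") and empty nodes ("1.1").
        heads = {int(row[0]): int(row[6]) for row in token_rows if row[0].isdigit()}
        roots = [tid for tid, head in heads.items() if head == 0]
        if len(roots) != 1:
            return False                     # multiple roots, or no root at all
        for tid in heads:
            seen, node = set(), tid
            while node != 0:                 # walk towards the root from each token
                if node in seen or node not in heads:
                    return False             # cycle or dangling HEAD reference
                seen.add(node)
                node = heads[node]
        return True

Submissions containing trees that fail such a check were not rejected; the affected trees simply received a score of 0.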

There are two main metrics: LAS in Task 1 (A) and ELAS in Task 1 (B). The systems are ranked according to these scores.

The gold standard data is available below:

PDBUD_A_test.conllu.gz

PDBUD_B_test.conllu.gz

Task 2

The table below shows the evaluation results. It contains system names, the initials of the submitting authors, and micro-averaged F1 scores computed for overlap and exact matches. The final scores were computed as weighted means: 0.8 * overlap + 0.2 * exact (a quick sanity check of this formula is sketched after the table). The rationale is to (1) combine both measures while (2) giving a strong premium to overlap matches. Fortunately, overlap and exact matches produce very similar orderings of the competing systems even without the combined weighted score shown in the Final column.

The gold answers can be found here (brat format). The evaluation script can be downloaded here.

System                                                Author    Exact   Overlap  Final
Per group LSTM-CRF with Contextual String Embeddings  [Ł. B.]   0.826   0.877    0.866
PolDeepNer                                            [M. M.]   0.822   0.859    0.851
Liner2                                                [M. M.]   0.778   0.818    0.810
OPI_Z3                                                [S. D.]   0.749   0.805    0.793
joint                                                 [M. L.]   0.748   0.789    0.780
disjoint                                              [M. L.]   0.747   0.788    0.779
via_ner                                               [P. P.]   0.692   0.773    0.756
kner_sep                                              [K. W.]   0.700   0.742    0.733
Poleval2k18                                           [M. Z.]   0.623   0.743    0.719
KNER                                                  [K. W.]   0.681   0.719    0.711
simple_ner                                            [P. Ż.]   0.569   0.653    0.636
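
As a sanity check, the Final column can be reproduced from the Exact and Overlap columns with the weighted mean given above. A minimal Python sketch, using the PolDeepNer and Liner2 rows as examples:

    def final_score(exact, overlap):
        # Weighted mean defined above: 0.8 * overlap + 0.2 * exact.
        return 0.8 * overlap + 0.2 * exact

    print(f"{final_score(0.822, 0.859):.4f}")  # 0.8516, reported as 0.851 (PolDeepNer)
    print(f"{final_score(0.778, 0.818):.4f}")  # 0.8100, reported as 0.810 (Liner2)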

None of the systems was declared to use any named-entity-annotated corpus other than the official 1M NKJP, so we present all competing systems in a single table.

Thank you for participating, and congratulations to everyone, especially the winners!

UPDATE: the annotation guidelines for the test set can be found here.

Task 3

System name           Perplexity
ULMFiT-SP-PL (model)  117.6705
AGHUJ (model)         146.7082
PocoLM Order 6        208.6297