Results
Task 1 (A)
System name | LAS (F1 score) | MLAS (F1 score) | BLEX (F1 score) |
---|---|---|---|
COMBO | 86.11 | 76.18 | 79.86 |
IMS | 83.82 | 69.27 | 60.88 |
Poleval2k18 | 77.7 | 61.21 | 70.01 |
Drewutnia | 27.39 | 18.12 | 25.24 |
Task 1 (B)
System name | ELAS (F1 score) | SLAS (F1 score) |
---|---|---|
IMS | 81.9 | 65.98 |
COMBO | 80.66 | 77.3 |
Poleval2k18 | 66.73 | 67.84 |
Notes:
The predicted analyses are evaluated with the script poleval2018_cykle.py, a modified version of the evaluation script prepared for the CoNLL 2018 shared task on Multilingual Parsing from Raw Text to Universal Dependencies. The most important modification is the addition of two measures, ELAS and SLAS, for evaluating dependency graphs with enhanced edges and semantic labels, respectively. The second modification is motivated by the fact that some of the participating systems predicted trees with cycles, multiple root nodes, etc. We decided not to reject such submissions, but to assign the incorrect trees a score of 0.
There are two main metrics: LAS in Task 1(A) and ELAS in Task 1(B). The systems are ranked according to these scores.
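The following is a minimal, hypothetical sketch (not the code of poleval2018_cykle.py) of the validity check implied by the notes above: a predicted dependency structure counts as a well-formed tree only if exactly one token attaches to the artificial root and no cycles occur; otherwise it is scored 0.

```python
def is_valid_tree(heads):
    """heads[i] is the head of token i+1 (CoNLL-U indexing); 0 marks the root."""
    n = len(heads)
    # Exactly one token may attach to the artificial root node 0.
    if sum(1 for h in heads if h == 0) != 1:
        return False
    # Follow head pointers upwards from every token; a cycle never reaches 0.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen or not 1 <= node <= n:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


print(is_valid_tree([2, 0, 2]))     # True: a well-formed tree
print(is_valid_tree([0, 3, 2, 2]))  # False: tokens 2 and 3 form a cycle
print(is_valid_tree([0, 0, 2]))     # False: two tokens attached to the root
```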
The gold standard data is available below:
Task 2
The table below shows the results of the evaluation. It contains system names, initials of the submitting authors, and micro-averaged F1 scores computed for overlap and exact matches. We computed the final scores as weighted means: 0.8 * overlap + 0.2 * exact. The reasons for this are (1) combining both measures while (2) giving a strong premium to overlap matches. Reassuringly, overlap and exact matches produce very similar orderings of the competing systems, even without the combined weighted score shown in the Final column.
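As a small illustration of that combination, here is a sketch assuming the micro-averaged exact and overlap F1 scores are already computed; the example input reproduces the Liner2 row of the table below, everything else is illustrative.

```python
def final_score(overlap_f1, exact_f1, w_overlap=0.8, w_exact=0.2):
    """Weighted mean giving a stronger premium to overlap matches."""
    return w_overlap * overlap_f1 + w_exact * exact_f1


# 0.81, i.e. the Liner2 Final score of 0.810
print(round(final_score(overlap_f1=0.818, exact_f1=0.778), 3))
```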
The gold-standard answers may be found here (brat format). The evaluation script can be downloaded here.
System | Author | Exact | Overlap | Final |
---|---|---|---|---|
Per group LSTM-CRF with Contextual String Embeddings | [Ł. B.] | 0.826 | 0.877 | 0.866 |
PolDeepNer | [M. M.] | 0.822 | 0.859 | 0.851 |
Liner2 | [M. M.] | 0.778 | 0.818 | 0.810 |
OPI_Z3 | [S. D.] | 0.749 | 0.805 | 0.793 |
joint | [M. L.] | 0.748 | 0.789 | 0.780 |
disjoint | [M. L.] | 0.747 | 0.788 | 0.779 |
via_ner | [P. P.] | 0.692 | 0.773 | 0.756 |
kner_sep | [K. W.] | 0.7 | 0.742 | 0.733 |
Poleval2k18 | [M. Z.] | 0.623 | 0.743 | 0.719 |
KNER | [K. W.] | 0.681 | 0.719 | 0.711 |
simple_ner | [P. Ż.] | 0.569 | 0.653 | 0.636 |
None of the systems was declared to use a named-entity-annotated corpus other than the official 1M NKJP, so we present all competing systems in a single table.
Thank you for participating, and congratulations to everyone, especially the winners!
UPDATE: annotation guidelines for the test set can be found here.
Task 3
System name | Perplexity |
---|---|
ULMFiT-SP-PL (model) | 117.6705 |
AGHUJ (model) | 146.7082 |
PocoLM Order 6 | 208.6297 |