## Results

**Task 1 (A)**

System name | LAS (F1 score) | MLAS (F1 score) | BLEX (F1 score) |
---|---|---|---|
COMBO | 86.11 | 76.18 | 79.86 |
IMS | 83.82 | 69.27 | 60.88 |
Poleval2k18 | 77.7 | 61.21 | 70.01 |
Drewutnia | 27.39 | 18.12 | 25.24 |

**Task 1 (B)**

System name | ELAS (F1 score) | SLAS (F1 score) |
---|---|---|
IMS | 81.9 | 65.98 |
COMBO | 80.66 | 77.3 |
Poleval2k18 | 66.73 | 67.84 |

Notes:

The predicted analyses are evaluated with the script `poleval2018_cykle.py`, a modified version of the evaluation script prepared for the CoNLL 2018 Shared Task on Multilingual Parsing from Raw Text to Universal Dependencies. The most important modification adds two measures, ELAS and SLAS, which evaluate dependency graphs with enhanced edges and with semantic labels, respectively. The second modification addresses the fact that some participating systems predicted trees with cycles, multiple root nodes and similar defects: rather than rejecting such submissions, we score each incorrect tree as 0.
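
The zero-scoring rule can be sketched as follows. This is not the code of `poleval2018_cykle.py`; it is a minimal illustration that assumes gold tokenization (so LAS reduces to the fraction of words with the correct head and label), represents each sentence as a list of `(head, deprel)` pairs using the CoNLL-U convention that head `0` is the artificial root, and interprets "score with 0" as counting every word of an invalid tree as incorrect. The helpers `is_valid_tree` and `las` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# One word = (head, deprel); head is 1-based, 0 marks the artificial root.
Sentence = List[Tuple[int, str]]


def is_valid_tree(heads: Sequence[int]) -> bool:
    """Return True if the head pointers form a single-rooted tree."""
    n = len(heads)
    # Every head must be the artificial root (0) or an existing word.
    if any(h < 0 or h > n for h in heads):
        return False
    # Exactly one word may attach to the artificial root.
    if sum(1 for h in heads if h == 0) != 1:
        return False
    # Walk up from every word; revisiting a node means there is a cycle.
    for start in range(1, n + 1):
        seen = set()
        node = start
        while node != 0:
            if node in seen:
                return False
            seen.add(node)
            node = heads[node - 1]
    return True


def las(gold: List[Sentence], pred: List[Sentence]) -> float:
    """LAS under gold tokenization; words of invalid predicted trees score 0."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted files must align sentence by sentence")
    correct = total = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("this sketch assumes gold tokenization")
        total += len(g_sent)
        if not is_valid_tree([head for head, _ in p_sent]):
            continue  # assumption: an invalid tree contributes no correct words
        # Full labels are compared here for simplicity.
        correct += sum(1 for g, p in zip(g_sent, p_sent) if g == p)
    return correct / total if total else 0.0


gold = [[(2, "nsubj"), (0, "root"), (2, "obj")]]
good = [[(2, "nsubj"), (0, "root"), (2, "iobj")]]  # one wrong label
cyclic = [[(2, "nsubj"), (1, "root"), (0, "obj")]]  # words 1 and 2 form a cycle
print(las(gold, good))    # 2 of 3 words correct
print(las(gold, cyclic))  # invalid tree, scored 0
```

Checking validity per sentence keeps one malformed tree from invalidating a whole submission, which matches the decision described above not to reject such files outright.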

There are two main metrics: LAS in Task 1(A) and ELAS in Task 1(B). The systems are ranked according to these scores.
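
ELAS compares enhanced dependency graphs, in which a word may have several heads (the CoNLL-U DEPS column, e.g. `2:nsubj|4:nsubj:xsubj`). A common way to score such graphs, and the one assumed in this sketch, is labeled F1 over `(dependent, head, label)` triples. The exact matching rules in `poleval2018_cykle.py` may differ, SLAS is not covered, and empty nodes (decimal IDs such as `3.1`) are not handled. The functions `parse_deps` and `edge_f1` are hypothetical.

```python
from typing import Set, Tuple

# An enhanced edge: (dependent id, head id, label); head 0 is the root.
Edge = Tuple[int, int, str]


def parse_deps(word_id: int, deps: str) -> Set[Edge]:
    """Turn a CoNLL-U DEPS value such as '2:nsubj|4:nsubj:xsubj' into edges."""
    if deps == "_":
        return set()
    edges = set()
    for item in deps.split("|"):
        head, label = item.split(":", 1)  # labels may contain further colons
        edges.add((word_id, int(head), label))
    return edges


def edge_f1(gold: Set[Edge], pred: Set[Edge]) -> float:
    """Labeled F1 over enhanced edges (assumed ELAS-style matching)."""
    correct = len(gold & pred)
    if correct == 0:
        # Two empty graphs agree perfectly; otherwise nothing matched.
        return 1.0 if not gold and not pred else 0.0
    precision = correct / len(pred)
    recall = correct / len(gold)
    return 2 * precision * recall / (precision + recall)


gold = parse_deps(1, "2:nsubj|4:nsubj:xsubj") | parse_deps(2, "0:root")
pred = parse_deps(1, "2:nsubj") | parse_deps(2, "0:root")
print(edge_f1(gold, pred))  # precision 1.0, recall 2/3
```

Unlike LAS, where every word has exactly one head and precision equals recall under gold tokenization, an enhanced graph may have more or fewer edges than the gold one, which is why F1 rather than plain accuracy is used here.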

The gold standard data is available for download.

**Task 2**

The table below shows the evaluation results: system names, initials of the submitting authors, and micro-averaged F1 scores computed for exact and overlap matches. We computed the final scores as weighted means: 0.8 * overlap + 0.2 * exact. The reasons for this are (1) to combine both measures while (2) giving a strong premium to overlap matches. Fortunately, overlap and exact matches produce very similar orderings of the competing systems, even without the combined weighted score in the Final column.
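
The weighting and the claim about similar orderings can be checked from the table itself. This sketch copies the Exact and Overlap values from the table (already rounded to three decimals, so recomputed finals may differ from the Final column in the third decimal place), applies the 0.8/0.2 weighting, and counts how many system pairs the Exact and Overlap rankings order differently. The helper names are illustrative only.

```python
from itertools import combinations

# (system, exact F1, overlap F1), copied from the Task 2 table
RESULTS = [
    ("Per group LSTM-CRF with Contextual String Embeddings", 0.826, 0.877),
    ("PolDeepNer", 0.822, 0.859),
    ("Liner2", 0.778, 0.818),
    ("OPI_Z3", 0.749, 0.805),
    ("joint", 0.748, 0.789),
    ("disjoint", 0.747, 0.788),
    ("via_ner", 0.692, 0.773),
    ("kner_sep", 0.7, 0.742),
    ("Poleval2k18", 0.623, 0.743),
    ("KNER", 0.681, 0.719),
    ("simple_ner", 0.569, 0.653),
]


def final_score(exact: float, overlap: float) -> float:
    """Weighted mean used for the Final column: 0.8 * overlap + 0.2 * exact."""
    return 0.8 * overlap + 0.2 * exact


def ranking(key) -> list:
    """System names sorted from best to worst by the given score function."""
    return [name for name, *_ in sorted(RESULTS, key=key, reverse=True)]


def discordant_pairs(order_a: list, order_b: list) -> int:
    """Count system pairs that the two rankings put in opposite order."""
    pos_a = {name: i for i, name in enumerate(order_a)}
    pos_b = {name: i for i, name in enumerate(order_b)}
    return sum(
        1
        for x, y in combinations(order_a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


by_exact = ranking(lambda r: r[1])
by_overlap = ranking(lambda r: r[2])
by_final = ranking(lambda r: final_score(r[1], r[2]))

for name, exact, overlap in RESULTS:
    print(f"{name:55s} {final_score(exact, overlap):.4f}")
print("exact vs overlap discordant pairs:", discordant_pairs(by_exact, by_overlap))
```

With these values the Final ordering matches the order of the table, and the Exact and Overlap rankings disagree on 3 of the 55 system pairs (via_ner/kner_sep, kner_sep/Poleval2k18 and Poleval2k18/KNER), all in the lower half of the table.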

Gold-standard answers may be found here (brat format). The evaluation script can be downloaded here.

System | Author | Exact | Overlap | Final |
---|---|---|---|---|
Per group LSTM-CRF with Contextual String Embeddings | [Ł. B.] | 0.826 | 0.877 | 0.866 |
PolDeepNer | [M. M.] | 0.822 | 0.859 | 0.851 |
Liner2 | [M. M.] | 0.778 | 0.818 | 0.810 |
OPI_Z3 | [S. D.] | 0.749 | 0.805 | 0.793 |
joint | [M. L.] | 0.748 | 0.789 | 0.780 |
disjoint | [M. L.] | 0.747 | 0.788 | 0.779 |
via_ner | [P. P.] | 0.692 | 0.773 | 0.756 |
kner_sep | [K. W.] | 0.7 | 0.742 | 0.733 |
Poleval2k18 | [M. Z.] | 0.623 | 0.743 | 0.719 |
KNER | [K. W.] | 0.681 | 0.719 | 0.711 |
simple_ner | [P. Ż.] | 0.569 | 0.653 | 0.636 |

None of the systems was declared as using any named-entity-annotated corpus other than the official 1M NKJP, so all competing systems are presented in a single table.

Thank you for participating, and congratulations to everyone, especially the winners!

UPDATE: annotation guidelines for the test set can be found here.

**Task 3**

System name | Perplexity |
---|---|
ULMFiT-SP-PL (model) | 117.6705 |
AGHUJ (model) | 146.7082 |
PocoLM Order 6 | 208.6297 |
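
Task 3 systems are compared by perplexity, where lower is better. The sketch below shows the standard definition, PP = exp(-(1/N) Σ log p(w_i | context)), assuming the natural-log probability of each token is already available. The tokenization and normalization used in the PolEval evaluation are not stated here, so this is a definition of the metric rather than a reproduction of the scoring; `perplexity` is a hypothetical helper.

```python
import math
from typing import Sequence


def perplexity(token_log_probs: Sequence[float]) -> float:
    """exp of the average negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


# A model that is uniformly unsure between 4 options at every step
# has a perplexity of (approximately, in floating point) 4.
print(perplexity([math.log(0.25)] * 10))
```

Perplexities are only directly comparable when computed over the same token sequence, since the average in the exponent is taken per token.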