Friday, 08 April 2022 10:20

Check out MultiParaCrawl 9, including Ukrainian parallel corpora

A new version of the MultiParaCrawl corpus series, including 36 parallel corpora for Ukrainian, has been released. 

The MultiParaCrawl corpus is built from the parallel corpora that the ParaCrawl project collected from web crawls. These corpora were further processed into a multi-parallel corpus by pivoting via English, which yields additional language pairs that do not include English.
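As a rough illustration of what pivoting via English means, the sketch below assumes that each English-centric corpus is a list of (English sentence, other-language sentence) pairs and joins two such corpora on exactly matching English sentences. The function name `pivot_via_english` and the data layout are hypothetical, and the actual MultiParaCrawl processing may apply further matching and filtering steps not shown here.

```python
from collections import defaultdict


def pivot_via_english(en_x, en_y):
    """Hypothetical sketch: derive X-Y sentence pairs from two English-centric
    bitexts by joining them on identical English sentences."""
    # Index the second corpus by its English side.
    en_to_y = defaultdict(list)
    for en, y in en_y:
        en_to_y[en].append(y)

    pairs = []
    seen = set()
    for en, x in en_x:
        # Every Y sentence aligned to the same English sentence becomes a candidate.
        for y in en_to_y.get(en, []):
            if (x, y) not in seen:  # skip duplicate X-Y pairs
                seen.add((x, y))
                pairs.append((x, y))
    return pairs


# Toy usage with made-up English-Ukrainian and English-German pairs.
en_uk = [("Good morning.", "Доброго ранку."), ("Thank you.", "Дякую.")]
en_de = [("Thank you.", "Danke."), ("See you soon.", "Bis bald.")]
print(pivot_via_english(en_uk, en_de))  # [('Дякую.', 'Danke.')]
```

Only "Thank you." occurs on the English side of both toy corpora, so it is the only sentence that produces a Ukrainian-German pair.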

The recently added bonus language pair, English-Ukrainian, was included in this release of the MultiParaCrawl corpus. As a result, 36 parallel corpora pairing Ukrainian with all co-official European languages and several others have also been released.

MultiParaCrawl 9 includes 705 parallel corpora covering 41 languages.

This version is derived from the original ParaCrawl release, adjusted for redistribution via the OPUS corpus collection. We thank OPUS for this service. To download data files for all language pairs in different formats and with different kinds of annotation (where available), please go to https://opus.nlpl.eu/MultiParaCrawl.php

The data is released under the Creative Commons CC0 license ("no rights reserved").

Last modified on Friday, 08 April 2022 10:42