Third Party Releases

JParaCrawl

JParaCrawl is the largest publicly available English-Japanese parallel corpus, created by NTT Communication Science Laboratories. It was built largely by crawling the web and automatically aligning parallel sentences. For more details, see the paper.

To download the corpus, please visit the JParaCrawl website.


Acknowledgements from JParaCrawl

We used Bitextor, created by the ParaCrawl project. We gratefully acknowledge the ParaCrawl project for releasing the software and for fruitful discussions. We would also like to thank Hisashi Itoh and Takumi Asai for their technical support.