• The NArabizi Treebank: a treebank for North-African Dialectal Arabic written in Arabizi (Arabic transliterated into Latin script).
  • The Cr#pBank (aka the French Social Media Bank): a French treebank made of data from Facebook(c), Twitter(c), Doctissimo (a medical forum) and JeuxVideos.com (a video game forum). First annotated Facebook data set. Download v0.9.1 (constituent trees only). (Dependency version out soon. Seriously, it's done, we just need to publish some results on arXiv first and then it's out. I swear.)
  • The Sequoia TreeBank: the first free treebank for French, containing data from Wikipedia, the Est Républicain newspaper, Europarl and biomedical texts. v6 (v4 for comparison with the results of Candito and Seddah, 2012)
  • The Deep Sequoia Treebank: an updated version with an extra annotation layer in deeper syntax. Done in collaboration with the Nancy Semagramme team. Website
  • The Deep FTB: an instance of the French Treebank (used for the SPMRL Shared Tasks) with the same deep syntax annotation layer as the Deep Sequoia. (Contact me or Marie Candito for the data set; you need a free licence for the original FTB.)


  • The French QuestionBank: the first treebank made of questions for a language other than English. The first 1800 sentences, out of 2600, are aligned with the English QuestionBank (Judge et al., 2006). Available in constituency, surface dependency and deep dependency versions. Web page
  • The SPMRL 2013/2014 Data Set: a set of treebanks for 9 morphologically rich languages (Arabic, Basque, French, German, Hebrew, Hungarian, Korean, Polish, Swedish) that were used for the two SPMRL Shared Tasks (2013, 2014) and are still used today (especially for constituent parsing evaluation). They were the first shared tasks to propose an end-to-end evaluation with non-gold tokens. Overview paper, shared task page (data set still available). Bonus: both constituency and dependency trees are aligned at the token level. Also contains a few LCFRS treebanks (constituency trees with crossing branches).