Ok. This version reads our lattice format, and has two command-line flags for the main class:

    -externalTokIdsFilename
    -tokIdsOnLeafs

If the first one is present, the list of token indices is written to the specified file name ('-' will write to stdout). See the description of the format at:
http://dokufarm.phil.hhu.de/spmrl2013/doku.php?id=shared_task_description

If the second one is present, token indices are added to the leaves as shown below:

    (TOP (S (NP (SYN_CDT (CDT EFRWT-1)) (NP (SYN_NN (NN ANFIM-2)))) (VP (SYN_VB (BN MGIEIM-3))) (PP (SYN_IN (PREPOSITION M-4)) (NP (SYN_NNP (NNP TAILND-4)))) (PP (SYN_IN (PREPOSITION L-5)) (NP (SYN_NNP (NNP IFRAL-5)))) (SBAR (SYN_IN (TEMP KF-6)) (S (S (NP (SYN_PRP (PRP HM-6))) (VP (SYN_VB (BN NRFMIM-7))) (NP (SYN_MOD (ADVERB K-8)) (NP (SYN_NN (NN MTNDBIM-8))))) (SYN_yyCM (yyCM yyCM-9)) (SYN_CC (CC AK-10)) (ADVP (SYN_RB (RB LMEFH-11))) (VP (SYN_VB (BN MFMFIM-12))) (NP (NP (SYN_NN (NN EWBDIM-13))) (ADJP (SYN_JJ (JJ FKIRIM-14))) (ADJP (SYN_JJ (JJ ZWLIM-15)))))) (SYN_yyDOT (yyDOT yyDOT-16))) )

Both flags can be specified together.

Note: the parser does not look at the coarse-tag, lemma and morph-feats fields of the lattice. To make use of information from the lemma, cpos or morph-feats fields, enrich the form and pos fields with it before calling the parser.

Note also that the GrammarTrainer class strips everything after the first '-', '=' or '^' in the non-terminals (including pos-tags) prior to training. This can be disabled with the "-noNormalize" flag to GrammarTrainer (but then you probably want to edit the trees yourself to have a reasonable set of non-terminals). If this behavior is not disabled, and the lattice includes tags with '-', '=' or '^' in them, you probably want to strip the extra material from them as well, using the -stripTags flag to the main class.

Example usage:

    java -jar blatt.jar -lattice -gr grammarFile -externalTokIdsFilename tokenization.out < latticeFile

Yoav Goldberg (yoav.goldberg@gmail.com)
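For readers post-processing the output, here is a minimal Python sketch (not part of the parser) of two things described above: reading the token indices that -tokIdsOnLeafs appends to the leaves, and mimicking the label normalization that strips everything after the first '-', '=' or '^'. The function names are hypothetical, and the regular expression assumes each leaf looks like "(TAG FORM-N)" as in the sample tree. How GrammarTrainer treats edge cases, such as a label that begins with '-', is not stated here, so this sketch simply returns whatever precedes the first such character.

```python
import re

# Assumed leaf shape, taken from the sample tree: "(TAG FORM-N)",
# where N is the token index appended by -tokIdsOnLeafs.
LEAF = re.compile(r"\(([^\s()]+) ([^\s()]+)-(\d+)\)")


def leaves_with_token_ids(tree):
    """Return (tag, form, token_index) for every leaf in a bracketed tree."""
    return [(tag, form, int(idx)) for tag, form, idx in LEAF.findall(tree)]


def group_by_token(leaves):
    """Group leaf forms by token index; several leaves may share one token."""
    groups = {}
    for _tag, form, idx in leaves:
        groups.setdefault(idx, []).append(form)
    return groups


def normalize_label(label):
    """Keep only the text before the first '-', '=' or '^' (a sketch of the
    normalization described above, not the GrammarTrainer code itself)."""
    return re.split(r"[-=^]", label, maxsplit=1)[0]


if __name__ == "__main__":
    fragment = ("(PP (SYN_IN (PREPOSITION M-4)) (NP (SYN_NNP (NNP TAILND-4)))) "
                "(PP (SYN_IN (PREPOSITION L-5)) (NP (SYN_NNP (NNP IFRAL-5))))")
    print(group_by_token(leaves_with_token_ids(fragment)))
    print(normalize_label("NP-SBJ"))
```

In the sample tree, the leaves M-4 and TAILND-4 carry the same index, so grouping by index recovers which leaves came from the same input token.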