This version reads our lattice format, and has two command-line flags
for the main class:
-externalTokIdsFilename
-tokIdsOnLeafs

If -externalTokIdsFilename is given, the list of token indices is
written to the specified file name ('-' writes to stdout).

See the description of the format at:

http://dokufarm.phil.hhu.de/spmrl2013/doku.php?id=shared_task_description


If -tokIdsOnLeafs is given, token indices are appended to the
leaves, as shown below:

(TOP (S (NP (SYN_CDT (CDT EFRWT-1)) (NP (SYN_NN (NN ANFIM-2)))) (VP
(SYN_VB (BN MGIEIM-3))) (PP (SYN_IN (PREPOSITION M-4)) (NP (SYN_NNP
(NNP TAILND-4)))) (PP (SYN_IN (PREPOSITION L-5)) (NP (SYN_NNP (NNP
IFRAL-5)))) (SBAR (SYN_IN (TEMP KF-6)) (S (S (NP (SYN_PRP (PRP HM-6)))
(VP (SYN_VB (BN NRFMIM-7))) (NP (SYN_MOD (ADVERB K-8)) (NP (SYN_NN (NN
MTNDBIM-8))))) (SYN_yyCM (yyCM yyCM-9)) (SYN_CC (CC AK-10)) (ADVP
(SYN_RB (RB LMEFH-11))) (VP (SYN_VB (BN MFMFIM-12))) (NP (NP (SYN_NN
(NN EWBDIM-13))) (ADJP (SYN_JJ (JJ FKIRIM-14))) (ADJP (SYN_JJ (JJ
ZWLIM-15)))))) (SYN_yyDOT (yyDOT yyDOT-16))) )
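
To read the indices back out of such a tree, something like the
following Python sketch may help. It is not part of the parser. It
assumes that every leaf has the shape FORM-INDEX, with the index after
the last '-', as in the output above; the function name and the regular
expression are made up for this illustration.

```python
import re

# Matches a pre-terminal such as "(NN ANFIM-2)": a tag followed by a
# single leaf, with no nested parentheses. \s+ also covers line wraps.
LEAF = re.compile(r"\(([^\s()]+)\s+([^\s()]+)\)")


def leaves_with_token_ids(tree):
    """Return (tag, form, token_index) triples, in order, for each leaf."""
    triples = []
    for tag, leaf in LEAF.findall(tree):
        # Assumption: the token index is whatever follows the last '-'.
        form, _, tok_id = leaf.rpartition("-")
        triples.append((tag, form, int(tok_id)))
    return triples


if __name__ == "__main__":
    fragment = "(PP (SYN_IN (PREPOSITION M-4)) (NP (SYN_NNP (NNP TAILND-4))))"
    for tag, form, tok_id in leaves_with_token_ids(fragment):
        print(tok_id, form, tag)
```

Leaves that share an index (M and TAILND above, both with index 4) come
from the same input token.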


Both flags can be specified together.

Note: the parser does not look at the coarse-tag (cpos), lemma and
morph-feats fields of the lattice. Users who want the parser to use
information from those fields should enrich the form and pos fields
with it before calling the parser.

Note also that the GrammarTrainer class strips the first '-', '=' or
'^' and everything after it from the non-terminals (including
pos-tags) prior to training.
This can be disabled with the "-noNormalize" flag to GrammarTrainer
(but then you probably want to edit the trees yourself to have a
reasonable set of non-terminals).
If this behavior is not disabled, and the lattice includes tags with
'-', '=' or '^' in them, you probably want to strip the extra material
from them as well, using the -stripTags flag to the main class.
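
As an illustration of the stripping rule described above, here is a
small Python sketch. It follows the rule as worded in this note and is
not the GrammarTrainer code, which may treat edge cases differently
(for example a label that begins with one of the three characters). It
assumes the separator itself is dropped along with what follows it. The
function name and the sample labels are hypothetical.

```python
import re

# The three separator characters named in the note above.
_SEPARATORS = re.compile(r"[-=^]")


def normalize_label(label):
    """Keep only the part of a label before the first '-', '=' or '^'."""
    return _SEPARATORS.split(label, maxsplit=1)[0]


if __name__ == "__main__":
    # Hypothetical labels, chosen only to exercise each separator.
    for label in ["NP-SBJ", "S=2", "NN^g", "SYN_NN", "VP"]:
        print(label, "->", normalize_label(label))
```

Running the lattice tags through a function like this shows what
training saw for each tag, which is useful when deciding whether
-stripTags is needed.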

Example usage:
java -jar blatt.jar -lattice -gr grammarFile -externalTokIdsFilename \
    tokenization.out < latticeFile

Yoav Goldberg (yoav.goldberg@gmail.com)