Kyoto University Participation to WAT 2016


Fabien Cromieres, Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
fabien@pa.jst.jp, chu@pa.jst.jp, nakazawa@pa.jst.jp, kuro@i.kyoto-u.ac.jp

Example (Ja -> En):
- Src: 本フローセンサーの型式と基本構成,規格を図示,紹介。
- Ref: Shown here are type and basic configuration and standards of this flow with some diagrams.
- EBMT: This flow sensor type and the basic composition, standard is illustrated, and introduced.
- NMT: This paper introduces the type, basic configuration, and standards of this flow sensor.

EBMT vs NMT:
- EBMT: less fluent
- NMT: more under/over-translation issues

KyotoNMT
- Essentially an implementation of (Bahdanau et al., 2015).
- Implemented in Python with the Chainer library.

[Figure: attention-based encoder-decoder. A bidirectional LSTM encoder reads the source sentence (e.g. 私 は 学生 です, "I am a student") through word embeddings of size 620 into hidden states of size 1000. The decoder LSTM (size 1000) combines the attention-weighted encoding of the input, its previous state, and the embedding of the previously generated word, and predicts the new word through a maxout layer of size 500 and a softmax over a 30,000-word target vocabulary.]

We mostly used the network sizes of the original paper, as shown in the figure above. Depending on the experiment, we changed (see the Results tables for details):
- multi-layer LSTMs
- larger source and target vocabulary sizes

Regularization:
- weight decay
- dropout
- early stopping
- random noise on the previous word embedding

Training algorithm: ADAM

Beam search: normalizing the loss by length

Ensembling: ensembling of several models, or self-ensembling

KyotoEBMT
- Example-based Machine Translation.
- Tree-to-tree: uses dependency trees for both the source and the target side.
- The important detail: translation fragments retrieved from examples are combined.

Example:
- Input: ウイスキー は オオムギ から 製造 さ れる
- Output: whisky is produced from barley
- Combined example fragments:
  - 水素 は 現在 天然ガス や 石油 から 製造 さ れる -> hydrogen is produced from natural gas and petroleum at present
  - ウイスキー を 調査 した -> We investigated whisky
  - オオムギ -> barley

Segmentation:
- automatic segmentation via JUMAN or KyotoMorph
- or subword units with
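As a rough illustration of the attention step in this architecture, here is a minimal NumPy sketch using the sizes stated above (embeddings of 620, hidden states of 1000). All function names, weight shapes, and initializations are invented for the example; this is not the actual KyotoNMT/Chainer code.

```python
import numpy as np

# Sizes from the poster: embedding 620, LSTM state 1000,
# maxout 500, target vocabulary 30000 (illustrative only).
E, H, M, V = 620, 1000, 500, 30000

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attention_context(enc_states, dec_state, W_a, U_a, v_a):
    """Bahdanau-style additive attention: score each encoder state
    against the current decoder state, then return the weighted
    average of the encoder states (the context vector)."""
    scores = np.tanh(enc_states @ W_a + dec_state @ U_a) @ v_a  # shape (T,)
    weights = softmax(scores)                                   # sum to 1
    return weights @ enc_states                                 # shape (2H,)

T = 4  # source length, e.g. 私 / は / 学生 / です
enc_states = rng.standard_normal((T, 2 * H))  # fwd+bwd encoder states
dec_state = rng.standard_normal(H)            # previous decoder state

# Hypothetical attention parameters:
W_a = rng.standard_normal((2 * H, H)) * 0.01
U_a = rng.standard_normal((H, H)) * 0.01
v_a = rng.standard_normal(H) * 0.01

context = attention_context(enc_states, dec_state, W_a, U_a, v_a)
print(context.shape)  # (2000,)
```

At each decoder step, a context vector like this is concatenated with the previous state and the previous word embedding before the maxout and softmax layers produce the next word.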
BPE.

Configurations and results

Ja -> En configurations:
System | # layers | Source vocabulary | Target vocabulary | Ensembling
NMT 1  | 2        | 200k (JUMAN)      | 52k (BPE)         | -
NMT 2  | 1        | 30k (JUMAN)       | 30k (words)       | x4

Ja -> En results:
System | BLEU  | AM-FM | Pairwise    | JPO Adequacy
EBMT   | 21.22 | 59.52 | -           | -
NMT 1  | 24.71 | 56.27 | 47.0 (3/9)  | 3.89 (1/3)
NMT 2  | 26.22 | 55.85 | 44.25 (4/9) | -

En -> Ja configurations:
System | # layers | Source vocabulary | Target vocabulary | Ensembling
NMT 1  | 2        | 52k (BPE)         | 52k (BPE)         | -

En -> Ja results:
System | BLEU  | AM-FM | Pairwise     | JPO Adequacy
EBMT   | 31.03 | 74.75 | -            | -
NMT 1  | 36.19 | 73.87 | 55.25 (1/10) | 4.02 (1/4)

Ja -> Zh configurations:
System | # layers | Source vocabulary | Target vocabulary | Ensembling
NMT 1  | 2        | 30k (JUMAN)       | 30k (KyotoMorph)  | -

Ja -> Zh results:
System | BLEU  | AM-FM | Pairwise    | JPO Adequacy
EBMT   | 30.27 | 76.42 | 30.75 (3/5) | -
NMT 1  | 31.98 | 76.33 | 58.75 (1/5) | 3.88 (1/3)

Zh -> Ja configurations:
System | # layers | Source vocabulary | Target vocabulary | Ensembling
NMT 1  | 2        | 30k (KyotoMorph)  | 30k (JUMAN)       | x2
NMT 2  | 2        | 200k (KyotoMorph) | 50k (JUMAN)       | -

Zh -> Ja results:
System | BLEU  | AM-FM | Pairwise    | JPO Adequacy
EBMT   | 36.63 | 76.71 | -           | -
NMT 1  | 46.04 | 78.59 | 63.75 (1/9) | 3.94 (1/3)
NMT 2  | 44.29 | 78.44 | 56.00 (2/9) | -

Random noise on the previous word embedding:
- In the hope of reducing cascading errors at translation time, we add noise to the target word embedding at training time.
- This works well, but it may just be a regularization effect.

During our experiments, we found that using these settings appropriately had a significant impact on the final results.

Training was done on an NVIDIA Titan X (Maxwell): from 2 days for a single-layer model on ASPEC Ja -> Zh to 2 weeks for a multi-layer model on ASPEC Ja -> En.

Code available (GPL):
- KyotoEBMT: http://lotus.kuee.kyoto-u.ac.jp/~john/kyotoebmt.html
- KyotoNMT: https://github.com/fabiencro/knmt

Conclusion and Future Work
- Very good results with Neural Machine Translation, especially for Zh -> Ja.
- Long training times mean that we could not test every combination of settings for each language pair.
- Some possible future improvements: adding more linguistic aspects; adding newly proposed mechanisms (copy mechanism, etc.).
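The two decoding settings mentioned above, ensembling and normalizing the loss by length, can be sketched in a few lines. The toy word distributions below are invented for illustration; this is not the actual KyotoNMT code.

```python
import math

def ensemble_probs(model_probs):
    """Ensembling: average the next-word distributions of several models."""
    n = len(model_probs)
    vocab = model_probs[0].keys()
    return {w: sum(p[w] for p in model_probs) / n for w in vocab}

def normalized_loss(word_log_probs):
    """Negative log-probability of a hypothesis divided by its length,
    so that beam search does not unduly favor short outputs."""
    return -sum(word_log_probs) / len(word_log_probs)

# Two toy models disagreeing about the next word:
p1 = {"barley": 0.7, "wheat": 0.3}
p2 = {"barley": 0.5, "wheat": 0.5}
avg = ensemble_probs([p1, p2])
print(round(avg["barley"], 2))  # 0.6

# With length normalization, a hypothesis with a better per-word
# probability wins even though it is longer:
short_hyp = [math.log(0.5)] * 2   # 2 words, p = 0.5 each
long_hyp = [math.log(0.6)] * 4    # 4 words, p = 0.6 each
print(normalized_loss(short_hyp) > normalized_loss(long_hyp))  # True
```

Self-ensembling works the same way, except that the averaged distributions come from different checkpoints of a single training run rather than from independently trained models.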