Kyoto University Participation to WAT 2016


Fabien Cromieres, Chenhui Chu, Toshiaki Nakazawa, Sadao Kurohashi
fabien@pa.jst.jp, chu@pa.jst.jp, nakazawa@pa.jst.jp, kuro@i.kyoto-u.ac.jp

Example (Ja -> En):
- Src: 本フローセンサーの型式と基本構成,規格を図示,紹介。
- Ref: Shown here are type and basic configuration and standards of this flow with some diagrams.
- EBMT: This flow sensor type and the basic composition, standard is illustrated, and introduced.
- NMT: This paper introduces the type, basic configuration, and standards of this flow sensor.

EBMT vs NMT:
- EBMT: less fluent
- NMT: more under/over-translation issues

KyotoNMT
- Essentially an implementation of (Bahdanau et al., 2015).
- Implemented in Python with the Chainer library.

[Figure: attention-based encoder-decoder. A bidirectional LSTM encoder reads the source sentence (e.g. 私 は 学生 です, "I am a student") through word embeddings of size 620 into hidden states of size 1000. The decoder LSTM (size 1000) combines the attention-weighted encoding of the input, its previous state, and the embedding of the previously generated word, and predicts the new word through a maxout layer of size 500 and a softmax over a 30,000-word target vocabulary.]

We mostly used the network sizes of the original paper, as shown in the figure above. Depending on the experiment, we changed (see the Results tables for details):
- multi-layer LSTMs
- larger source and target vocabulary sizes

Regularization:
- weight decay
- dropout
- early stopping
- random noise on the previous word embedding

Training algorithm: ADAM

Beam search: normalizing the loss by length

Ensembling: ensembling of several models, or self-ensembling

KyotoEBMT
- Example-based Machine Translation.
- Tree-to-tree: uses dependency trees for both the source and the target side.
- The important detail: translation fragments retrieved from examples are combined.

Example:
- Input: ウイスキー は オオムギ から 製造 さ れる
- Output: whisky is produced from barley
- Combined example fragments:
  - 水素 は 現在 天然ガス や 石油 から 製造 さ れる -> hydrogen is produced from natural gas and petroleum at present
  - ウイスキー を 調査 した -> We investigated whisky
  - オオムギ -> barley

Segmentation:
- automatic segmentation via JUMAN or KyotoMorph
- or subword units with
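As a rough illustration of the attention step in this architecture, here is a minimal NumPy sketch using the sizes stated above (embeddings of 620, hidden states of 1000). All function names, weight shapes, and initializations are invented for the example; this is not the actual KyotoNMT/Chainer code.

```python
import numpy as np

# Sizes from the poster: embedding 620, LSTM state 1000,
# maxout 500, target vocabulary 30000 (illustrative only).
E, H, M, V = 620, 1000, 500, 30000

rng = np.random.default_rng(0)

def softmax(x):
    x = x - x.max()
    e = np.exp(x)
    return e / e.sum()

def attention_context(enc_states, dec_state, W_a, U_a, v_a):
    """Bahdanau-style additive attention: score each encoder state
    against the current decoder state, then return the weighted
    average of the encoder states (the context vector)."""
    scores = np.tanh(enc_states @ W_a + dec_state @ U_a) @ v_a  # shape (T,)
    weights = softmax(scores)                                   # sum to 1
    return weights @ enc_states                                 # shape (2H,)

T = 4  # source length, e.g. 私 / は / 学生 / です
enc_states = rng.standard_normal((T, 2 * H))  # fwd+bwd encoder states
dec_state = rng.standard_normal(H)            # previous decoder state

# Hypothetical attention parameters:
W_a = rng.standard_normal((2 * H, H)) * 0.01
U_a = rng.standard_normal((H, H)) * 0.01
v_a = rng.standard_normal(H) * 0.01

context = attention_context(enc_states, dec_state, W_a, U_a, v_a)
print(context.shape)  # (2000,)
```

At each decoder step, a context vector like this is concatenated with the previous state and the previous word embedding before the maxout and softmax layers produce the next word.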
BPE.

Configurations and results

Ja -> En configurations:
System | # layers | Source vocabulary | Target vocabulary | Ensembling
NMT 1  | 2        | 200k (JUMAN)      | 52k (BPE)         | -
NMT 2  | 1        | 30k (JUMAN)       | 30k (words)       | x4

Ja -> En results:
System | BLEU  | AM-FM | Pairwise    | JPO Adequacy
EBMT   | 21.22 | 59.52 | -           | -
NMT 1  | 24.71 | 56.27 | 47.0 (3/9)  | 3.89 (1/3)
NMT 2  | 26.22 | 55.85 | 44.25 (4/9) | -

En -> Ja configurations:
System | # layers | Source vocabulary | Target vocabulary | Ensembling
NMT 1  | 2        | 52k (BPE)         | 52k (BPE)         | -

En -> Ja results:
System | BLEU  | AM-FM | Pairwise     | JPO Adequacy
EBMT   | 31.03 | 74.75 | -            | -
NMT 1  | 36.19 | 73.87 | 55.25 (1/10) | 4.02 (1/4)

Ja -> Zh configurations:
System | # layers | Source vocabulary | Target vocabulary | Ensembling
NMT 1  | 2        | 30k (JUMAN)       | 30k (KyotoMorph)  | -

Ja -> Zh results:
System | BLEU  | AM-FM | Pairwise    | JPO Adequacy
EBMT   | 30.27 | 76.42 | 30.75 (3/5) | -
NMT 1  | 31.98 | 76.33 | 58.75 (1/5) | 3.88 (1/3)

Zh -> Ja configurations:
System | # layers | Source vocabulary | Target vocabulary | Ensembling
NMT 1  | 2        | 30k (KyotoMorph)  | 30k (JUMAN)       | x2
NMT 2  | 2        | 200k (KyotoMorph) | 50k (JUMAN)       | -

Zh -> Ja results:
System | BLEU  | AM-FM | Pairwise    | JPO Adequacy
EBMT   | 36.63 | 76.71 | -           | -
NMT 1  | 46.04 | 78.59 | 63.75 (1/9) | 3.94 (1/3)
NMT 2  | 44.29 | 78.44 | 56.00 (2/9) | -

Random noise on the previous word embedding:
- In the hope of reducing cascading errors at translation time, we add noise to the target word embedding at training time.
- This works well, but it may just be a regularization effect.

During our experiments, we found that using these settings appropriately had a significant impact on the final results.

Training was done on an NVIDIA Titan X (Maxwell): from 2 days for a single-layer model on ASPEC Ja -> Zh to 2 weeks for a multi-layer model on ASPEC Ja -> En.

Code available (GPL):
- KyotoEBMT: http://lotus.kuee.kyoto-u.ac.jp/~john/kyotoebmt.html
- KyotoNMT: https://github.com/fabiencro/knmt

Conclusion and Future Work
- Very good results with Neural Machine Translation, especially for Zh -> Ja.
- Long training times mean that we could not test every combination of settings for each language pair.
- Some possible future improvements: adding more linguistic aspects; adding newly proposed mechanisms (copy mechanism, etc.).
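The two decoding settings mentioned above, ensembling and normalizing the loss by length, can be sketched in a few lines. The toy word distributions below are invented for illustration; this is not the actual KyotoNMT code.

```python
import math

def ensemble_probs(model_probs):
    """Ensembling: average the next-word distributions of several models."""
    n = len(model_probs)
    vocab = model_probs[0].keys()
    return {w: sum(p[w] for p in model_probs) / n for w in vocab}

def normalized_loss(word_log_probs):
    """Negative log-probability of a hypothesis divided by its length,
    so that beam search does not unduly favor short outputs."""
    return -sum(word_log_probs) / len(word_log_probs)

# Two toy models disagreeing about the next word:
p1 = {"barley": 0.7, "wheat": 0.3}
p2 = {"barley": 0.5, "wheat": 0.5}
avg = ensemble_probs([p1, p2])
print(round(avg["barley"], 2))  # 0.6

# With length normalization, a hypothesis with a better per-word
# probability wins even though it is longer:
short_hyp = [math.log(0.5)] * 2   # 2 words, p = 0.5 each
long_hyp = [math.log(0.6)] * 4    # 4 words, p = 0.6 each
print(normalized_loss(short_hyp) > normalized_loss(long_hyp))  # True
```

Self-ensembling works the same way, except that the averaged distributions come from different checkpoints of a single training run rather than from independently trained models.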