Gideon Maillette de Buy Wenniger

Assistant Professor at the Open University of the Netherlands, Department of Computer Science
Guest Researcher in the Bernoulli Institute - Artificial Intelligence

Postdoctoral researcher in the Bernoulli Institute - Artificial Intelligence
Supervised by Prof. Lambert Schomaker
Postdoctoral researcher in the ADAPT center at Dublin City University
Marie Skłodowska-Curie EDGE Fellow in the ADAPT center at Dublin City University
Supervised by Prof. Andy Way
PhD in Computer Science from the University of Amsterdam
M.Sc. in Artificial Intelligence, Intelligent Systems

Visiting/Postal Address:

University of Groningen
Faculty of Science and Engineering
Artificial Intelligence - Bernoulli Institute
Nijenborgh 9, Room 0309
9747 AG Groningen
The Netherlands

Office: Room 0309 (third floor)

About me:

I received my Master's diploma (cum laude) from the University of Amsterdam in November 2008, after completing an innovative Master's thesis project at EPFL in Lausanne. During my thesis research in Lausanne I worked on gesture recognition with Hidden Markov Models. Back in Amsterdam I completed another extensive research project, on obstacle and free space recognition for robot navigation. I published this research together with Dr. Arnoud Visser and Tijn Schmits and presented it in September 2009 at the ECMR conference in Croatia. Starting in May 2009 I worked in Germany at the Rheinische Friedrich-Wilhelms-Universität Bonn. We participated in the RoboCup@Home competition of the international RoboCup, and in our team I was responsible for the visual methods for people detection and people recognition. After the RoboCup I worked on visual methods for object recognition using visual features (SIFT) and support vector machines. On 1 June 2010 I started my work on Statistical Machine Translation at the Institute for Logic, Language and Computation (ILLC) with Dr. Khalil Sima'an, in his project "Machine Translation When Exact Pattern Match Fails", funded by the NWO Exact Sciences Free Competition. In June 2016 I defended my PhD thesis, titled "Aligning the Foundations of Hierarchical Statistical Machine Translation". Following my PhD, I worked as a postdoctoral researcher at the ILLC until October 2016. In November 2016 I started a postdoc with Prof. Andy Way in ADAPT, at Dublin City University, working on hierarchical statistical machine translation and neural machine translation. In 2017, I obtained a Marie Skłodowska-Curie EDGE Grant for my project BAIT: Bilingual Association in Neural Machine Translation [ EDGE project ], and in May 2017 I started working on this project. Over the last year I invested in becoming a deep learning expert and an expert in PyTorch programming, which allows me to implement deep learning models, when necessary from the ground up. This investment is now starting to pay off, opening up new opportunities for multi-modal deep learning for neural machine translation and handwritten text recognition.

Research Interests and Current Work:

My research interests include machine translation (including syntax, morphology and semantics), handwritten text recognition, deep learning, computer vision, scholarly document processing and general machine learning. My current work focuses on developing new models and techniques for scholarly document processing and (neural) handwritten text recognition. I have a special interest in applying multi-modal techniques to take models in both fields to the next level. More information about my EDGE project can be found at [ EDGE project ].

My Master thesis project consisted of two parts. In the first part I automatically analyzed the structure of Hidden Markov Models (HMMs), and used it to automatically segment gesture sequences into the underlying primitive gestures. In the second part of my project I developed a technique to automatically merge or compress gesture models (HMMs).
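
As a rough illustration of what segmenting a sequence with an HMM can look like, the sketch below decodes the most likely state path with the standard Viterbi algorithm and collapses it into runs, treating each hidden state as one primitive gesture. This is not the method from the thesis, which analyzes the structure of the HMMs themselves; the toy two-state model, its probabilities, and the function names `viterbi` and `segment` are assumptions made for this example.

```python
import math


def viterbi(obs, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for a discrete HMM."""
    n_states = len(start_p)

    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # score[s] = best log-probability of any path ending in state s
    score = [log(start_p[s]) + log(emit_p[s][obs[0]]) for s in range(n_states)]
    back = []  # back[t][s] = best predecessor of state s at step t + 1
    for o in obs[1:]:
        prev, score, ptr = score, [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log(trans_p[r][s]))
            ptr.append(best)
            score.append(prev[best] + log(trans_p[best][s]) + log(emit_p[s][o]))
        back.append(ptr)

    # Follow the back-pointers from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    return path[::-1]


def segment(path):
    """Collapse a state path into (state, start, end) runs, end exclusive."""
    segments, start = [], 0
    for i in range(1, len(path) + 1):
        if i == len(path) or path[i] != path[start]:
            segments.append((path[start], start, i))
            start = i
    return segments


# Toy model (assumed): two primitive gestures, two observation symbols.
start_p = [0.9, 0.1]
trans_p = [[0.9, 0.1], [0.1, 0.9]]
emit_p = [[0.9, 0.1], [0.1, 0.9]]
observations = [0, 0, 0, 1, 1, 1]

print(segment(viterbi(observations, start_p, trans_p, emit_p)))
# [(0, 0, 3), (1, 3, 6)]
```

Each tuple gives a state and the half-open range of time steps it covers, so the example sequence is split into two primitive segments.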

Publications:

Pieter Floris Jacobs, Gideon Maillette de Buy Wenniger, Marco Wiering, Lambert Schomaker. 2021. Active learning for reducing labeling effort in text classification tasks. Presented at the Joint International Scientific Conferences on AI BNAIC/BENELEARN 2021. [ Joint International Scientific Conferences on AI BNAIC/BENELEARN 2021. ] [ download paper ] New
Gideon Maillette de Buy Wenniger, Thomas van Dongen, Eleri Aedmaa, Herbert Teun Kruitbosch, Edwin A. Valentijn and Lambert Schomaker. 2020. Structure-Tags Improve Text Classification for Scholarly Document Quality Prediction. First Workshop on Scholarly Document Processing (SDP 2020), at EMNLP 2020. pages 158--167. To be presented at the [ First Workshop on Scholarly Document Processing (SDP 2020), at EMNLP 2020. ] [ download paper ]
Thomas van Dongen, Gideon Maillette de Buy Wenniger and Lambert Schomaker. 2020. SChuBERT: Scholarly Document Chunks with BERT-encoding boost Citation Count Prediction. First Workshop on Scholarly Document Processing (SDP 2020), at EMNLP 2020. pages 148--157. To be presented at the [ First Workshop on Scholarly Document Processing (SDP 2020), at EMNLP 2020. ] [ download paper ]
Santanu Bhattacharjee, Rejwanul Haque, Gideon Maillette de Buy Wenniger and Andy Way. 2020. Investigating Query Expansion and Coreference Resolution in Question Answering on BERT. In: Métais E., Meziane F., Horacek H., Cimiano P. (eds) Natural Language Processing and Information Systems. Lecture Notes in Computer Science, vol 12089. Springer. Presented at the [ International Conference on Applications of Natural Language to Information Systems (NLDB 2020) ] [ download paper ]
Gideon Maillette de Buy Wenniger, Lambert Schomaker and Andy Way. 2019. "No Padding Please: Efficient Neural Handwriting Recognition" 2019 International Conference on Document Analysis and Recognition (ICDAR). Sydney, Australia. pages 355--362. doi: 10.1109/ICDAR.2019.00064. Presented at the [ International Conference on Document Analysis and Recognition (ICDAR 2019) ] . [ Download paper ] [ Download arXiv pre-publication version ]
Alberto Poncelas, Gideon Maillette de Buy Wenniger and Andy Way. 2019. "Adaptation of Machine Translation Models with Back-translated Data using Transductive Data Selection Methods" The 20th International Conference on Intelligent Text Processing and Computational Linguistics, CICLing 2019, France.
Alberto Poncelas, Gideon Maillette de Buy Wenniger and Andy Way. 2018. "Data Selection with Feature Decay Algorithms Using an Approximated Target Side." The 15th International Workshop on Spoken Language Translation 2018, Belgium. [ Download paper ]
Alberto Poncelas, Gideon Maillette de Buy Wenniger and Andy Way. 2018. "Feature Decay Algorithms for Neural Machine Translation." The 21st Annual Conference of The European Association for Machine Translation. Alicante, Spain. [ Download paper ]
Alberto Poncelas, Dimitar Shterionov, Andy Way, Gideon Maillette de Buy Wenniger and Peyman Passban. 2018. "Investigating Backtranslation in Neural Machine Translation." The 21st Annual Conference of The European Association for Machine Translation. Alicante, Spain. [ Download paper ]
Gideon Maillette de Buy Wenniger, Khalil Sima'an and Andy Way. 2017. "Elastic-substitution decoding for Hierarchical SMT: efficiency, richer search and double labels." MT Summit. pages 201--215. September 2017 [ Download paper ] [ Bibtex ] [ Presentation ] [ Code ]
Alberto Poncelas, Gideon Maillette de Buy Wenniger and Andy Way. 2017. "Applying N-gram Alignment Entropy to Improve Feature Decay Algorithms." Prague Bulletin of Mathematical Linguistics 108:245--256. [ Download paper ]
Gideon Maillette de Buy Wenniger and Khalil Sima'an. "Labeling hierarchical phrase-based models without linguistic resources". Machine Translation. pages 1-41. January 2016. [ Download paper ] [ Bibtex ] [ Code ]
Gideon Maillette de Buy Wenniger and Khalil Sima'an. "Bilingual Markov Reordering Labels for Hierarchical SMT". Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-8). pages 11-21. October 2014. [ Download paper ] [ Bibtex ] [ Code ]
Gideon Maillette de Buy Wenniger and Khalil Sima'an. "Visualization, Search and Analysis of Hierarchical Translation Equivalence in Machine Translation Data". The Prague Bulletin of Mathematical Linguistics. Number 101, pages 43-54. April 2014. [ Download paper ] [ Bibtex ] [ Code ]
Gideon Maillette de Buy Wenniger and Khalil Sima'an. "Hierarchical Alignment Decomposition Labels for Hiero Grammar Rules". Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-7). June 2013. [ Download paper ] [ Bibtex ] [ Code ]
Gideon Maillette de Buy Wenniger and Khalil Sima'an. "A Formal Characterization of Parsing Word Alignments by Synchronous Grammars with Empirical Evidence to the ITG Hypothesis". Seventh Workshop on Syntax, Semantics and Structure in Statistical Translation (SSST-7). June 2013. [ Download paper ] [ Bibtex ]
Gideon Maillette de Buy Wenniger, Maxim Khalilov and Khalil Sima'an. "A Toolkit for Visualizing the Coherence of Tree-based Reordering with Word-Alignments". The Prague Bulletin of Mathematical Linguistics. Number 94, pages 97-106. September 2010. [ Download paper ]

[ Bibtex ] [ Original source code ] [ Source code embedded in hatparsing repository ]

Gideon Maillette de Buy Wenniger, Tijn Schmits and Arnoud Visser, "Identifying Free Space in a Robot Bird-Eye View", Proceedings of the 4th European Conference on Mobile Robots (ECMR 2009), p. 13-18, Ivan Petrović, Achim J. Lilienthal Eds., Mlini/Dubrovnik, Croatia, September 2009. [ Download paper ] [ Bibtex ]

PhD thesis:

Gideon Maillette de Buy Wenniger. "Aligning the Foundations of Hierarchical Statistical Machine Translation". PhD thesis. June 2016. [ Download thesis ]

Diploma thesis:

Gideon Maillette de Buy Wenniger. Hidden Markov Model structure analysis and simplification for Gesture Recognition. Master Thesis. University of Amsterdam, 2010.
[ Download thesis ]

Teaching:

Machine Learning course 2017 (CA684), Dublin City University, led by Prof. Qun Liu [ Qun Liu's website at researchgate ]

Elements of Language Processing and Learning 2012/13 [ Go to ELPL 2012 Lab site ]

Introductory (guest) lecture Machine Translation (22-2-2012) [ Download the slides ]

Elements of Language Processing and Learning 2011/12 [ Go to ELPL 2011 Lab site ]
Elements of Language Processing and Learning 2010/11 [ Go to ELPL 2010 site ]

Software:

Multi-Hare. Software belonging to: "No Padding Please: Efficient Neural Handwriting Recognition." [ Go to project page ]
For an overview of theory and functionality, see also our poster presented at ICDAR 2019 in Sydney [ Download ICDAR 2019 poster ]

Labeled Translation. The software belonging to the work on hierarchical translation with labels derived from hierarchical alignment trees, as described in the papers "Elastic-substitution decoding for Hierarchical SMT: efficiency, richer search and double labels", "Labeling hierarchical phrase-based models without linguistic resources" and "Bilingual Markov Reordering Labels for Hierarchical SMT" (see publications). [ Go to project page ]

HAT Parsing. Visualization, Search and Analysis of Hierarchical Translation Equivalence in Machine Translation Data. [ Go to project page ]
For an overview of theory and functionality, see also our poster presented at MT Marathon 2013 in Prague [ Download MTMarathon 2013 poster ]

Tree Alignment Visualizer. Tools for the visualization of the coherence between aligned sentence pairs and their (source) constituency parses. (Source code now part of "HAT Parsing" project) [ Go to project page ]

Selected older reports from Bachelor and Master:

Gideon Maillette de Buy Wenniger and Attila Houtkooper. GOAP - Goal Oriented Action Planning. Master Project (Game Programming : AI for Quake3 Arena). University of Amsterdam, 2008.
[ Download report ]

Sophie Arnoult, Gideon Maillette de Buy Wenniger and Andrea Schuch. Statistical Machine Translation. Master Project (IBM Models). University of Amsterdam, 2007.
[ Download report ]

Attila Houtkooper and Gideon Maillette de Buy Wenniger. Multi-Agent Quest Learning. Bachelor Thesis. University of Amsterdam, 2005.
[ Download bachelor thesis (Dutch) ]

Attila Houtkooper, Gideon Maillette de Buy Wenniger, Mehmet Oktener and Peter van Kees. Interapy. Second year AI Bachelor project and internship. University of Amsterdam, 2004.
[ Download report (Dutch) ]

Recent Developments:

11-11-2021

We have a new paper on Active Learning, which my student Pieter Jacobs presented today at BNAIC/BENELEARN 2021 in Luxembourg.
[ BNAIC/BENELEARN 2021 ]
[ Our paper on arXiv ]

26-12-2020

Journalist Koen Marée of the Dutch newspaper Dagblad van het Noorden consulted me as a deep learning expert on the topic of deep fakes. My comments on how we might defend against deep fakes can be found in the resulting article:
[ Article deep fakes 26-12-2020, Dagblad van het Noorden ]
‘Science has been working on the technology for much longer, says Gideon Maillette de Buy Wenniger. At the RUG (University of Groningen) institute for artificial intelligence he does research on deep learning. "As early as 1940, people were looking at how computers might imitate the human brain. Since 2006 we speak of a new wave of deep learning. Both the algorithms and the capacity of computers to perform computations are improving very quickly."
In Lubach's broadcast, the emphasis was mainly on the 'scary' side of the technology. Deepfakes give free rein to fraudsters who can pose as someone else to swindle people out of money, to the creation of fake porn that can damage someone's name, or even to influencing politics by having a deepfaked president make crazy statements.
Maillette de Buy Wenniger struggles with that too. He obtained his own PhD in the field of automatic translation, a field that went through a revolution in the last few years due to the transition to deep learning, and that made giant strides in the quality of generated translations. "Apart from the fact that it does pose a threat to the jobs of people in the translation industry, the wide availability of high-quality, real-time automatic translation is a development with many positive sides. The problem is that the underlying deep learning technology is much more general and can be applied freely. That brings risks with it.
...
"An alternative is therefore an approach that focuses more on the source of the video. You could add a 'hash' to a video or image file. That is a special code that records what the original file looked like, and that is immediately published on the internet." In that case it is possible, when in doubt, to check whether, for example, a statement by a politician is real or deepfaked.’
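
To make the hashing idea from the article concrete, here is a minimal sketch of computing a fingerprint for a file's contents and later checking a copy against the published value. The article does not name a hash function; SHA-256 from Python's standard `hashlib` module is an assumption here, and the function names `fingerprint` and `is_unmodified` are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, published_digest: str) -> bool:
    """Check whether the bytes still match a previously published digest."""
    return fingerprint(data) == published_digest


# Placeholder bytes standing in for the contents of a video file.
original = b"original video bytes"
published = fingerprint(original)  # this value would be published at release time

tampered = b"original video bytes, altered"
print(is_unmodified(original, published))  # True
print(is_unmodified(tampered, published))  # False
```

A cryptographic hash changes when any byte changes, so a check like this also flags harmless re-encodings of the same video; it verifies the exact file, not its visual content.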

30-1-2020

Presented our work on "Predicting the number of citations of scientific articles with shallow and deep model" at CLIN 2020.
https://clin30.sites.uu.nl/programme/detailed/

23-9-2019

Presented our work "No Padding Please: Efficient Neural Handwriting Recognition" at ICDAR 2019.

1-3-2019

Our new paper "No Padding Please: Efficient Neural Handwriting Recognition", which proposes new methods for efficient neural handwriting recognition with multi-dimensional long short-term memories (MDLSTMs), is now on arXiv. This work also involves an efficient reimplementation of MDLSTMs from scratch in PyTorch, and a large number of experiments and comparisons against literature results on the popular multi-writer IAM (handwriting) database.

22-2-2019

Our paper "Adaptation of Machine Translation Models with Back-translated Data using Transductive Data Selection Methods" got accepted at CICLing 2019.

21-2-2019

Presented the continued work on handwriting recognition with minimal padding in an invited talk for the research team led by Prof. Dr. Ing. Rozenn DAHYOT, at Trinity College Dublin.

31-1-2019

Presented our work on handwriting recognition with minimal padding at CLIN 2019 in Groningen.
[ CLIN 2019 website ]

30-4-2018

Two of our papers got accepted at EAMT 2018 [ EAMT 2018 ].

6-8-2010

Made a fix for loading the optimizer state for Adam in opennmt_py, for the OpenNMT neural machine translation project [ OpenNMT main website ]
[ Issue and fix in the opennmt_py open neural machine translation repository ].

21-9-2017

Presented new paper "Elastic-substitution decoding for Hierarchical SMT: efficiency, richer search and double labels" at MT Summit, in Nagoya, Japan. [ MT Summit 2017 ].

17-7-2017 -- 21-7-2017

Attended the International Summer School on Deep Learning 2017 in Bilbao, Spain [ DeepLearn 2017 ].

6-8-2010 - Added support for m-n alignments to the tool

Software:

Based on software developed by Federico Sangati and in close collaboration with him, I developed an extension to his Tree visualization tool to allow the simultaneous visualization of source parse trees and the associated word alignments for SMT.

Tree Alignment Violations

Recently a feature was added that allows visualizing the alignment constraint violations, assuming a reordering model that only allows the children of each node to be permuted. Once a given source node n and its descendant terminals "claim" a certain range in the target sentence, any source word outside the subtree rooted at n that aligns within the same range causes a crossing of alignments and hence an alignment violation. Alignment violations are indicated in pink: the offending words are drawn in pink and connected by striped alignment lines for clarity. Furthermore, the words that cause alignment violations with a certain subtree are listed behind the root node of this subtree. We are still thinking about how to optimize the visualization for clarity and to avoid overlap with parts of the tree.
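
The violation check described above can be sketched in a few lines. This is an illustration and not the tool's actual code: the `Node` class, the function names, and the choice to take the claimed range as the span from the smallest to the largest aligned target position are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[int] = None  # source word index, set for leaves only


def leaves(node):
    """Return the set of source word indices under this node."""
    if node.word is not None:
        return {node.word}
    result = set()
    for child in node.children:
        result |= leaves(child)
    return result


def violations(node, alignment):
    """Source words outside the subtree that align inside its claimed range.

    `alignment` is a list of (source_index, target_index) pairs.
    """
    inside = leaves(node)
    targets = [t for s, t in alignment if s in inside]
    if not targets:
        return set()  # an unaligned subtree claims no range
    low, high = min(targets), max(targets)
    return {s for s, t in alignment if s not in inside and low <= t <= high}


# Toy example (assumed): the NP covers source words 0 and 1 and claims
# target range 0..2, but source word 2 aligns to target position 1.
np_node = Node("NP", [Node("w0", word=0), Node("w1", word=1)])
root = Node("S", [np_node, Node("w2", word=2)])
alignment = [(0, 0), (1, 2), (2, 1)]

print(violations(np_node, alignment))  # {2}
print(violations(root, alignment))     # set()
```

The root can never be violated, since no source word lies outside its subtree.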
The original alignment:

The alignment with visualization of constraint violations turned on:

23-2-2010 As a next step I implemented a method to perform and visualize the reordering of the tree by means of child node permutations, bringing the source words as close as possible to the order of the aligned target words under the constraint that only permutations of a node's children may be performed. At first I restricted the permutations to non-violated nodes (those whose "claimed" alignment span is not aligned to by source words outside the subtree rooted at that node). However, as can be seen from the example below, any reordering that is allowed under the child node permutation constraint and improves the order should probably just be done.
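
As a code-level illustration of the child-permutation constraint, here is a minimal sketch that reorders every node, violated or not. It is not the tool's implementation: representing the tree as nested lists of source word indices, sorting children by the mean target position of their aligned words, and the function names are all assumptions made for this example.

```python
def source_words(tree):
    """Return the source word indices under a tree, in left-to-right order."""
    if isinstance(tree, int):
        return [tree]
    return [w for child in tree for w in source_words(child)]


def mean_target(tree, alignment):
    """Average target position of the words under a tree (inf if unaligned)."""
    positions = [t for s in source_words(tree) for t in alignment.get(s, [])]
    return sum(positions) / len(positions) if positions else float("inf")


def reorder(tree, alignment):
    """Recursively permute the children of every node towards target order."""
    if isinstance(tree, int):
        return tree
    children = [reorder(child, alignment) for child in tree]
    # sorted() is stable, so children with equal keys keep their source order.
    return sorted(children, key=lambda child: mean_target(child, alignment))


# Toy example (assumed): leaves are source word indices; the alignment maps
# each source index to a list of target positions, which allows m-n links.
tree = [[0, 1], 2, [3, 4]]
alignment = {0: [1], 1: [0], 2: [4], 3: [2], 4: [3]}

reordered = reorder(tree, alignment)
print(reordered)                # [[1, 0], [3, 4], 2]
print(source_words(reordered))  # [1, 0, 3, 4, 2]
```

Because only siblings are swapped, the words under any node stay contiguous. Unaligned children get an infinite key and therefore move to the end of their sibling list, which is one of several possible design choices.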

28-7-2010 In addition to adding colors to better emphasize the reordered nodes, I added batch processing of the sentences for reordering. Furthermore, I added a config file to the system, which makes it possible to load everything automatically without cumbersome manual file selection (which really gets annoying after some time...). The result is shown below.

- Put the tool on a Google Code project

[ Download Alignment Visualizer Package ]

University of Amsterdam, Science Faculty Institute for Logic Language and Computation