Google MultiModel: a potentially significant advance for artificial intelligence (AI)

Bruce Boyes, 22 Jun 2017


Deep learning has seen great success across many fields, for example in speech recognition, image classification, and translation. However, the design and tuning effort needs to be repeated for each new task, limiting the impact of deep learning. The current approach is also very different from the general nature of the human brain, which can learn many different tasks and benefits from transfer learning.

In response, a newly published Google study¹ asks, “Can we create a unified deep learning model to solve tasks across multiple domains?”

A step towards answering this question positively has been taken with the “MultiModel” architecture, a single deep-learning model that can simultaneously learn multiple tasks from various domains. Specifically, MultiModel was built using TensorFlow and trained simultaneously on eight tasks: ImageNet, multiple translation tasks, image captioning, speech recognition, and English parsing.

The results were as follows:

MultiModel learns all of the tasks and achieves good performance. This performance is not state-of-the-art at present, but is above many task-specific models studied in the recent past. The model is expected to come closer to state-of-the-art with more tuning.

Two key insights make MultiModel work and are the main contributions of the study: (1) small modality-specific sub-networks convert inputs into a unified representation and convert outputs back from it, and (2) computational blocks of different kinds are crucial for good results on various problems. (The sub-networks are needed because the input data, such as images, sound waves and text, comes in widely different sizes and dimensions and must be converted into a joint representation space before training.)

Adding computational blocks doesn’t hurt performance, even on tasks they were not designed for. In fact, both attention and mixture-of-experts layers slightly improve performance of MultiModel on ImageNet, the task that needs them the least.

MultiModel performs similarly to the same model trained on a single task when the task is large, and better, sometimes significantly, on tasks where less data is available, such as parsing.

Mixing different computational blocks is in fact a good way to improve performance across many different tasks.

The key to success comes from designing a multi-modal architecture in which as many parameters as possible are shared and from using computational blocks from different domains together.
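The idea of small modality-specific sub-networks feeding a shared set of parameters can be illustrated with a minimal sketch. This is not the MultiModel code: the class names (ModalityNet, SharedBody), the width D_MODEL, the toy inputs and the single linear layers are all assumptions chosen for illustration, and the output side (converting back from the unified representation) is left out for brevity.

```python
import random

D_MODEL = 4  # width of the unified representation (hypothetical value)


def make_matrix(rows, cols, rng):
    """Build a small random weight matrix as a list of rows."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class ModalityNet:
    """Hypothetical per-modality sub-network: maps each input item,
    whatever its width, to a vector of D_MODEL numbers."""

    def __init__(self, input_width, rng):
        self.weights = make_matrix(D_MODEL, input_width, rng)

    def encode(self, items):
        return [matvec(self.weights, item) for item in items]


class SharedBody:
    """Hypothetical shared block: its parameters are reused by every task
    and it only ever sees the unified representation."""

    def __init__(self, rng):
        self.weights = make_matrix(D_MODEL, D_MODEL, rng)

    def forward(self, sequence):
        outputs = []
        for vec in sequence:
            hidden = matvec(self.weights, vec)
            # Residual connection around a ReLU-activated linear layer.
            outputs.append([v + max(0.0, h) for v, h in zip(vec, hidden)])
        return outputs


rng = random.Random(0)
modality_nets = {
    "image": ModalityNet(input_width=3, rng=rng),  # e.g. RGB pixel values
    "audio": ModalityNet(input_width=1, rng=rng),  # e.g. waveform samples
}
body = SharedBody(rng)  # one set of weights for all modalities


def run(modality, items):
    """Encode with the modality's own sub-network, then apply the shared body."""
    unified = modality_nets[modality].encode(items)
    return body.forward(unified)


image_out = run("image", [[0.2, 0.5, 0.1], [0.9, 0.3, 0.4]])
audio_out = run("audio", [[0.1], [-0.3], [0.7]])
print(len(image_out), len(image_out[0]))  # 2 4
print(len(audio_out), len(audio_out[0]))  # 3 4
```

Inputs of different widths and lengths end up as sequences of vectors with the same width, so the same shared weights can process both. Only the small ModalityNet objects differ per modality.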

To enable other people to experiment with the code, it is being made available on the TensorFlow GitHub site.

Article sources: CIO Dive, VentureBeat.

Header image source: Adapted from Google by Carlos Luna, which is licensed under CC BY 2.0.

Reference:

1. Kaiser, L., Gomez, A.N., Shazeer, N., Vaswani, A., Parmar, N., Jones, L., and Uszkoreit, J. (2017). One Model To Learn Them All. arXiv:1706.05137.


Also published on Medium.
