Google MultiModel: a potentially significant advance for artificial intelligence (AI)

Deep learning has seen great success across many fields, for example in speech recognition, image classification, and translation. However, the design and tuning effort needs to be repeated for each new task, limiting the impact of deep learning. The current approach is also very different from the general nature of the human brain, which can learn many different tasks and benefits from transfer learning.

In response, a newly published Google study1 asks, “Can we create a unified deep learning model to solve tasks across multiple domains?”

A step towards answering this question in the affirmative has been taken with the introduction of the “MultiModel” architecture, a single deep-learning model that can simultaneously learn multiple tasks from various domains. Specifically, MultiModel was built using TensorFlow and trained simultaneously on eight tasks spanning multiple domains: ImageNet image classification, multiple translation tasks, image captioning, speech recognition, and English sentence parsing.

The results were as follows:

  • MultiModel learns all of the tasks and achieves good performance. This performance is not state-of-the-art at present, but is above many task-specific models studied in the recent past. The model is expected to come closer to state-of-the-art with more tuning.
  • Two key insights are crucial to making MultiModel work, and are the main contributions of the study: (1) small modality-specific sub-networks convert inputs into a unified representation and back from it, and (2) computational blocks of different kinds are crucial for good results on various problems. (To allow training on input data of widely different sizes and dimensions, such as images, sound waves, and text, sub-networks are needed to convert these inputs into a joint representation space.)
  • Adding computational blocks doesn’t hurt performance, even on tasks they were not designed for. In fact, both attention and mixture-of-experts layers slightly improve performance of MultiModel on ImageNet, the task that needs them the least.
  • MultiModel performs similarly to task-specific models on large tasks, and better, sometimes significantly so, on tasks where less data is available, such as parsing.
  • Mixing different computational blocks is in fact a good way to improve performance across many different tasks.
  • The key to success comes from designing a multi-modal architecture in which as many parameters as possible are shared and from using computational blocks from different domains together.
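The core idea behind the first insight above can be sketched in a few lines of code: each modality has its own small input sub-network that projects raw data of very different shapes into one fixed-size joint representation space, after which a single shared body (with shared parameters) handles every task. The sketch below is illustrative only; the dimensions, layer structure, and names are hypothetical assumptions, not the actual architecture from the paper.

```python
import numpy as np

# Illustrative sketch (not the paper's architecture): modality-specific
# "modality nets" map inputs of different sizes into one shared,
# fixed-width representation space; a single shared body then
# processes all modalities with the same parameters.

D_MODEL = 16  # width of the joint representation space (hypothetical)

rng = np.random.default_rng(0)

def make_layer(in_dim, out_dim):
    """A tiny linear + ReLU layer with fixed random weights."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.1
    def layer(x):
        return np.maximum(x @ W, 0.0)
    return layer

# Modality-specific sub-networks: different input dims, same output dim.
image_net = make_layer(64, D_MODEL)    # e.g. a flattened image patch
text_net  = make_layer(32, D_MODEL)    # e.g. a token embedding
audio_net = make_layer(128, D_MODEL)   # e.g. a spectrogram frame

# Shared body: identical parameters regardless of input modality.
shared_body = make_layer(D_MODEL, D_MODEL)

def encode(modality_net, raw_input):
    """Convert modality-specific input into the joint space, then
    run it through the shared body."""
    return shared_body(modality_net(raw_input))

img_repr = encode(image_net, rng.standard_normal(64))
txt_repr = encode(text_net, rng.standard_normal(32))
aud_repr = encode(audio_net, rng.standard_normal(128))

# All three modalities land in the same 16-dimensional space.
print(img_repr.shape, txt_repr.shape, aud_repr.shape)
```

Because every modality is mapped into the same space, the shared body's parameters are reused across all tasks, which is what lets tasks with little data (such as parsing) benefit from the others.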

To enable other people to experiment with the code, it is being made available on the TensorFlow GitHub site.

Article sources: CIO Dive, VentureBeat.

Header image source: Adapted from Google by Carlos Luna, which is licensed under CC BY 2.0.

Reference:

  1. Kaiser, L., Gomez, A. N., Shazeer, N., Vaswani, A., Parmar, N., Jones, L., and Uszkoreit, J. (2017). One Model To Learn Them All. arXiv:1706.05137

Also published on Medium.

Bruce Boyes

Bruce Boyes (www.bruceboyes.info) is editor, lead writer, and a director of the award-winning RealKM Magazine (www.realkm.com) and currently also teaches in the University of NSW (UNSW) Foundation Studies program in China. He has expertise and experience in a wide range of areas including knowledge management (KM), environmental management, program and project management, writing and editing, stakeholder engagement, communications, and research. Bruce holds a Master of Environmental Management with Distinction and a Certificate of Technology (Electronics). With a demonstrated ability to identify and implement innovative solutions to social and ecological complexity, Bruce's many career highlights include establishing RealKM Magazine as an award-winning resource for knowledge managers, using agile and knowledge management approaches to oversee the implementation of an award-winning $77.4 million river recovery program in western Sydney on time and under budget, leading a knowledge strategy process for Australia's 56 natural resource management (NRM) regional organisations, pioneering collaborative learning and governance approaches to support communities to sustainably manage landscapes and catchments, and initiating and teaching two new knowledge management subjects at Shanxi University in China.
