Google has unveiled its latest large language model, called Gemini, marking the beginning of a new era of AI. Gemini comes in different versions, including a lighter model called Gemini Nano for offline use on Android devices, a more powerful Gemini Pro for Google AI services and Bard, and the ultra-capable Gemini Ultra for data centers and enterprise applications.
The model is rolling out in several ways: Bard is now powered by Gemini Pro, and Pixel 8 Pro users get new features from Gemini Nano. Developers and enterprise customers will gain access to Gemini Pro through Google Generative AI Studio or Vertex AI in Google Cloud from December 13th. Gemini is currently available only in English, but Google plans to eventually integrate it into its search engine, ad products, Chrome browser, and more worldwide. This marks a significant leap forward for Google, with the model set to influence a wide array of its products and services.
OpenAI launched ChatGPT a year and a week ago, and the company and product immediately became the biggest things in AI. Now, Google — the company that created much of the foundational technology behind the current AI boom, that has called itself an “AI-first” organization for nearly a decade, and that was clearly and embarrassingly caught off guard by how good ChatGPT was and how fast OpenAI’s tech has taken over the industry — is finally ready to fight back.
So, let’s just get to the important question, shall we? OpenAI’s GPT-4 versus Google’s Gemini: ready, go. This has very clearly been on Google’s mind for a while. “We’ve done a very thorough analysis of the systems side by side, and the benchmarking,” Hassabis says. Google ran 32 well-established benchmarks comparing the two models, from broad overall tests like the Massive Multitask Language Understanding (MMLU) benchmark to one that compares the two models’ ability to generate Python code. “I think we’re substantially ahead on 30 out of 32” of those benchmarks, Hassabis says, with a bit of a smile on his face. “Some of them are very narrow. Some of them are larger.”
In those benchmarks (which really are mostly very close) Gemini’s clearest advantage comes from its ability to understand and interact with video and audio. This is very much by design: multimodality has been part of the Gemini plan from the beginning. Google hasn’t trained separate models for images and voice, the way OpenAI created DALL-E and Whisper; it built one multisensory model from the beginning. “We’ve always been interested in very, very general systems,” Hassabis says. He’s especially interested in how to mix all of those modes — to collect as much data as possible from any number of inputs and senses and then give responses with just as much variety.
Right now, Gemini’s most basic models are text in and text out, but more powerful models like Gemini Ultra can work with images, video, and audio. “It’s going to get even more general than that,” Hassabis says. “There’s still things like action, and touch, more like robotics-type things.” Over time, he says, Gemini will get more senses, become more aware, and become more accurate and grounded in the process. “These models just sort of understand better about the world around them.” These models still hallucinate, of course, and they still have biases and other problems. But the more they know, Hassabis says, the better they’ll get.
Benchmarks are just benchmarks, though, and ultimately, the true test of Gemini’s capability will come from everyday users who want to use it to brainstorm ideas, look up information, write code, and much more. Google seems to see coding in particular as a killer app for Gemini; it uses a new code-generating system called AlphaCode 2 that it says performs better than 85 percent of coding competition participants, up from 50 percent for the original AlphaCode. But Pichai says that users will notice an improvement in just about everything the model touches.
Equally important to Google is that Gemini is apparently a far more efficient model. It was trained on Google’s own Tensor Processing Units and is both faster and cheaper to run than Google’s previous models like PaLM. Alongside the new model, Google is also launching a new version of its TPU system, the TPU v5p, a computing system designed for use in data centers for training and running large-scale models.
Talking to Pichai and Hassabis, it’s clear that they see the Gemini launch both as the beginning of a larger project and as a step change in itself. Gemini is the model Google has been waiting for, the one it has been building toward for years, maybe even the one it should have had ready before OpenAI and ChatGPT took over the world.
Google, which declared a “code red” after ChatGPT’s launch and has been perceived to be playing catch-up ever since, still seems to be trying to hold fast to its “bold and responsible” mantra. Hassabis and Pichai both say they’re not willing to move too fast just to keep up, especially as we get closer to the ultimate AI dream: artificial general intelligence, the term for an AI that is self-improving, smarter than humans, and poised to change the world. “As we approach AGI, things are going to be different,” Hassabis says. “It’s kind of an active technology, so I think we have to approach that cautiously. Cautiously, but optimistically.”
Google says it has worked hard to ensure Gemini’s safety and responsibility through both internal and external testing and red-teaming. Pichai points out that ensuring data security and reliability is particularly important for enterprise-first products, which is where most generative AI makes its money. However, Hassabis acknowledges that one of the risks of launching a state-of-the-art AI system is that it will have issues and attack vectors no one could have predicted. “That’s why you have to release things,” he says, “to see and learn.” Google is taking the Ultra release particularly slowly; Hassabis compares it to a controlled beta, with a “safer experimentation zone” for Google’s most capable and unrestrained model. Basically, if there’s a marriage-ruining alternate personality inside Gemini, Google is trying to find it before you do.
For years, Pichai and other Google executives have waxed poetic about the potential for AI. Pichai himself has said more than once that AI will be more transformative to humanity than fire or electricity. In this first generation, the Gemini model may not change the world. Best-case scenario, it might just help Google catch up to OpenAI in the race to build great generative AI. (Worst-case scenario, Bard stays boring and mediocre, and ChatGPT keeps winning.) But Pichai, Hassabis, and everyone else at Google seem to think this is the beginning of something truly huge. The web made Google a tech giant; Gemini could be even bigger.