This was going to be an important week for OpenAI, with the bulk of its announcements planned across 12 consecutive days, an event that started last week and was set to take center stage in the tech media. But amid the lackluster nature of its first announcements (a $200-a-month subscription and the launch of a Sora that does not seem to have improved much in the almost a year since its presentation) and the moves being made by the competition, the company is taking a backseat.
Yesterday, X (the old Twitter) made its AI, Grok, available to all of its free-account users, complete with an impressive new image generator that is causing a sensation. And today, Google has announced the launch of the first model in its new family of LLMs (large language models): Gemini 2.0, in its Flash version.
Gemini 2.0 Flash is now available to all users in the web version of Gemini, as one more option among the language models the company offers (the others are 1.5 Flash and, for Gemini Advanced subscribers, 1.5 Pro). It will reach the mobile apps in January 2025.
And what improvements does Gemini 2.0 Flash bring?
As Demis Hassabis, CEO of Google DeepMind, explains, Gemini 2.0 Flash ‘is as good as the current Pro model (1.5). So you can think of it as a whole level up, with the same cost efficiency and performance, and with greater speed. We are very happy with that.’ This means that free-account users now have access to a model as good as the one paying subscribers have used since its launch 10 months ago, and that those subscribers will, before long, get a Gemini 2.0 Pro that justifies the monthly Gemini Advanced fee.
According to Google, Gemini 2.0 Flash, which is still considered experimental, delivers better results than 1.5 Flash and 1.5 Pro on a series of general-purpose benchmarks and in areas such as coding, mathematics, and reasoning.
But beyond surpassing Gemini 1.5 Pro, there are new features. In addition to its better performance (twice as fast as 1.5 Pro) and low latency, Gemini 2.0 Flash includes native support for multimodal output. This lets it generate images natively, interleaved with text, as well as steerable multilingual text-to-speech audio, on top of accepting multimodal inputs such as images, video, and audio. The new model can also make native calls to tools, including Google Search, code execution, and other functions.
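For developers, these capabilities are exposed through the Gemini API. The sketch below is a minimal illustration, assuming the google-generativeai Python SDK and the experimental model identifier gemini-2.0-flash-exp used at launch; the prompt and the API-key placeholder are hypothetical:

```python
import google.generativeai as genai

# Illustrative placeholder; substitute your own Gemini API key.
genai.configure(api_key="YOUR_API_KEY")

# "gemini-2.0-flash-exp" is assumed here as the experimental model
# name at launch; check the model list for the current identifier.
model = genai.GenerativeModel(
    model_name="gemini-2.0-flash-exp",
    tools="code_execution",  # native tool calling: the model may write and run code
)

response = model.generate_content(
    "What is the sum of the first 50 prime numbers? "
    "Write and run code to compute it."
)

# The reply can interleave explanatory text, the generated code,
# and the result of executing it.
print(response.text)
```

The same tools mechanism is where features such as Google Search grounding or user-defined functions are wired in, which is what ‘native calls to tools’ means in practice.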
The era of AI agents
According to Hassabis, Gemini 2.0’s multimodal capabilities set the stage for what will be the next trend in artificial intelligence: intelligent agents. AI agents are bots that can perform tasks on behalf of the user. Now, with Gemini 2.0, these agents, which were announced at the last Google I/O, Google’s annual developer conference, are being tested by ‘trusted tester’ groups. Among these intelligent agents are:
- Project Astra is a visual recognition system that can identify objects, help you navigate the world, and even tell you where you left your glasses. It can converse in multiple languages, use Google Search, Lens, and Maps, and has up to 10 minutes of in-session memory.
- Project Mariner is an AI agent in the form of a Chrome extension that understands and reasons about the information on the screen and can use the browser for you.
- Jules is an AI agent designed to help developers find and fix bugs in their code.
Other examples, in other areas, are agents that help the user navigate virtual worlds in video games, or even the real world.
‘We really see 2025 as the true beginning of the agent era and Gemini 2.0 is the foundation of that,’ Hassabis said.
Gemini 2.0 will also be incorporated into the AI-generated results in Google Search (which have not yet arrived in Spain) and into many Google applications, such as those in Workspace, Notebook, or Pinpoint.