The news from Google I/O: AI arrives in Search, Project Astra, Veo vs. Sora, and the new Gemini

Google kicked off its annual developer conference, Google I/O, this afternoon. The event has traditionally been dominated by Android and Google's app ecosystem, but last year AI already took up a significant share of the opening keynote. In 2024, practically nothing else was discussed, and the keynote left a long string of announcements. Many, as is often the case in this field, will only be available in the future or are very limited for now, but there were also specific products that can already be used.

AI Overviews: AI is now available in Google Search in the United States

At last year's I/O, Google announced the arrival of artificial intelligence in its search engine, a substantial change to the product and to Google's advertising-based business model. Over the past year it has been known as Google Search Generative Experience (SGE) and has been available to a limited number of users through Search Labs. Now it is launching openly in the United States and in English; other countries and languages will follow later.

Gemini arrives in the search engine. Google.

Google Search is not giving up its ordered lists of websites, from which users choose what seems most appropriate. Instead, it adds a new module called AI Overviews, which answers the user's search in natural language and includes links. It uses a version of Gemini adapted for this purpose and is multimodal: it can understand various formats, not just text but also audio and images.

The question hanging over Google is how AI may affect its advertising business. According to Liz Reid, the company's vice president of Search, the links displayed by AI Overviews receive more clicks than those in the traditional results list. Still, it is worth asking why users would click a link and visit another website with more ads if they already have the information they need, predigested by the AI, right in the search engine. Searches related to purchases or reservations are a different case, and that is where Reid's statement fits best.

It will also be able to answer complex, multi-faceted questions. For example, it can be asked about the best places to practice pilates in a city, how far away they are, and what offers they have for new clients, and AI Overviews will provide all the necessary information in natural language, requiring minimal cognitive effort from the user.

Project Astra

Yesterday OpenAI presented its new language model, GPT-4o, along with new voice capabilities that make it closely resemble the AI assistant voiced by Scarlett Johansson in Spike Jonze's film Her. Google's product along the same lines is Project Astra, which is unlikely to be its commercial name when it becomes available.

It is an AI-powered virtual assistant that will arrive as a mobile app, though not only in that form. As explained by Demis Hassabis, co-founder of DeepMind, which is now Google's AI division, it will be able to see what is around you, identify it, and answer questions about it.

Hassabis said that Google's aim with Project Astra is to “develop universal AI agents that can be useful in our daily lives” and that can understand and respond as humans do. They should also “remember what you see and hear to understand the context and act.”

In the example shown in the keynote video, which was not live, a person uses a mobile phone to identify what is around her, for example the neighborhood she is in, by pointing the camera out of a window. The surprise came when she asked the assistant where her glasses were, and they turned out to be glasses with a camera and Project Astra integration. It should be available before the end of the year.

Gemini in Photos: Ask Photos

Gemini in Photos. Google.

Google has integrated Gemini into the Photos app. Ask Photos is a new experimental feature, arriving in the coming months, that will make it easier to find lost images in the gallery.

With Ask Photos, users can search their gallery with natural-language questions, for example “show me the best photo of all the amusement parks I've visited”, so they will not have to make the selection manually.

Veo and Imagen 3

Landscape created with Imagen 3. Google.

Google has also presented new text-to-image and text-to-video models. The first is Imagen 3, which as its name indicates is an evolution of previous models. Its main improvement is its ability to render text within images, something that remains hit-or-miss in other models such as DALL-E and Midjourney.

Veo is the answer to Sora, OpenAI's text-to-video AI that left everyone speechless last February. It will be able to generate videos at 1080p resolution and one minute long from the prompt entered by the user. It has a deep understanding of natural language so that the videos respond precisely to the request, it can work with visual concepts such as “time lapse” or “aerial landscape shot”, and it stands out for its simulation of physics within a scene. When will they be available? We will have to wait.

Gemini 1.5 Pro with 2 million tokens, Gemini Flash

Gemini arrived last year to replace Bard, and it did so on the basis of a new language model. By now, the naming has become complicated. It initially launched as Gemini Nano, Pro, and Ultra: Nano to run locally on mobile phones such as the Pixel 8, Pro with greater capabilities, and Ultra as the most advanced, comparable to GPT-4.

Then Pro moved forward, though Ultra did not, becoming 1.5 Pro with a context window of 1 million tokens. Tokens correspond to a certain number of words and measure how much an AI can handle in a conversation with the user, that is, the context it can take in. Pro has now been raised to 2 million, which means you can go much deeper with it. Until now it was available in Spain through the Gemini Advanced subscription, but only in English. It now gains Spanish and can perform actions such as summarizing a hundred of the user's emails or handling documents of up to 1,500 pages.

Gemini 1.5 Flash, Google's new language model. Google.

To all this we must add a new member of the Gemini family, Gemini Flash. It is a lighter version of Gemini Pro that is faster and more economical to run at scale. For now, it will be available to developers, not the general public.