Google Search AI gives more than 1.3 billion inaccurate answers a day, according to a study

Google Search has changed a lot over the past year in Spain, and over the two years since AI Overviews launched in the United States. It is now common to find AI-generated answers at the top of the page, above the traditional list of organic results. We know that AI models make mistakes, and the companies behind them keep reminding us of this in their disclaimers, but the convenience of getting the answer you are looking for right away wins over many users. So how often does Google Search get its AI answers wrong?

The New York Times has published an analysis evaluating the accuracy of AI Overviews, concluding that it gets the answer right 90 percent of the time. It is wrong, therefore, in one out of every ten answers, which may not seem like much until you look at the search engine’s global figures. Google handles an estimated five trillion searches a year. That 10 percent of wrong answers adds up to 500 billion per year, or roughly 1,369 million every day and 57 million every hour. That is hardly a negligible amount.
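The arithmetic behind these figures can be reproduced in a few lines. This is only a rough sketch: it assumes five trillion searches a year (the volume consistent with the 1.3 billion daily figure), a flat 10 percent error rate, and, as the calculation above implicitly does, that every search returns an AI answer. The function and constant names are illustrative.

```python
# Back-of-the-envelope check of the figures quoted in the article.
# Assumptions: five trillion searches a year, a flat 10 percent error
# rate, and an AI-generated answer on every search.
SEARCHES_PER_YEAR = 5_000_000_000_000
ERROR_RATE = 0.10


def wrong_answers(searches_per_year: int, error_rate: float) -> dict:
    """Return the estimated number of wrong answers per year, day and hour."""
    per_year = searches_per_year * error_rate
    per_day = per_year / 365
    per_hour = per_day / 24
    return {"year": per_year, "day": per_day, "hour": per_hour}


result = wrong_answers(SEARCHES_PER_YEAR, ERROR_RATE)

for period, value in result.items():
    # Report in millions to match the units used in the text.
    print(f"Wrong answers per {period}: {value / 1e6:,.0f} million")
```

If AI Overviews appear on only a fraction of searches, the totals shrink proportionally, so these numbers are best read as an upper bound under the stated assumptions.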

The New York Times conducted this analysis with the help of a startup called Oumi, which develops AI models. The company used SimpleQA, a test commonly used to rate the factual reliability of generative models such as Gemini, the AI behind AI Overviews. SimpleQA, published in 2024 by OpenAI, consists of a list of more than 4,000 questions with verifiable answers that can be fed to an AI.

Oumi started running the test last year, when Gemini 2.5 was Google’s best model. At that time, the benchmark showed an accuracy rate of 85 percent. When the test was repeated after the search engine was updated to Gemini 3, AI Overviews correctly answered 91 percent of the questions.

The report includes examples where AI Overviews failed. When asked the date on which Bob Marley’s former home became a museum, it cited three pages in its response, two of which did not even mention the date. The third, Wikipedia, included two contradictory years, and AI Overviews chose the wrong one.

The benchmark also asks the models for the date on which Yo-Yo Ma was inducted into the Classical Music Hall of Fame. Although AI Overviews cited the organization’s website, where Ma’s induction was listed, it stated that there is no such thing as the Classical Music Hall of Fame.

Google, as expected, disagrees with these results. Ned Adriance, a company spokesperson, told The New York Times that SimpleQA contains incorrect information. Google evaluates its models using a similar test called SimpleQA Verified, which uses a smaller set of questions that have supposedly been reviewed more rigorously. ‘This study has serious flaws. It doesn’t reflect what people actually search for on Google,’ he told the newspaper.

The company has explained that AI Overviews does not rely on a single model, but rather uses the ‘appropriate’ one for each query. Although Search would give better answers if it always ran Gemini 3.1 Pro, that would be too slow and expensive. To load quickly on a search page, the overview uses Gemini Flash models, which are faster but less precise, whenever possible, and everything indicates that this happens most of the time. So the bottom line on whether AI Overviews is reliable is that it depends on how lucky you are.