
    Gemini Isn’t As Good At Studying Data As Google Says It Is

By David | Parhlo World | July 1, 2024

One of the main selling points of Google’s Gemini 1.5 Pro and 1.5 Flash generative AI models is the amount of data they can supposedly process and analyze. In press briefings and demos, Google has repeatedly claimed that the models’ “long context” lets them do things that weren’t possible before, like summarizing several hundred-page documents or searching across scenes in film footage.

But new research suggests that the models aren’t actually very good at those things.

Two separate studies examined how well Google’s Gemini models, among others, make sense of enormous amounts of data, on the order of a “War and Peace”-length work. Both find that Gemini 1.5 Pro and 1.5 Flash struggle to answer questions about large datasets correctly; in one series of document-based tests, the models gave the right answer only 40% to 50% of the time.

“While models like Gemini 1.5 Pro can technically process long contexts, we have seen many cases indicating that the models don’t actually ‘understand’ the content,” said Marzena Karpinska, a postdoc at UMass Amherst and a co-author on one of the studies.

What Is A Context Window?

A model’s context, or context window, is the input data (such as text) that the model considers before generating output (such as more text). The context can be as simple as a question like “Who won the 2020 U.S. presidential election?” or as large as a movie script, a TV show, or an audio clip. As context windows grow, so does the size of the documents that fit inside them.

The newest versions of Gemini can take in more than 2 million tokens as context. “Tokens” are subdivided chunks of raw data, like the syllables “fan,” “tas,” and “tic” in the word “fantastic.” Two million tokens is roughly 1.4 million words, two hours of video, or 22 hours of audio, the largest context of any commercially available model.
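As a rough back-of-the-envelope check on those figures, here is a minimal sketch using the commonly cited heuristic of about 0.7 English words per token. The ratio is an approximation for illustration, not Gemini’s actual tokenizer behavior:

```python
# Heuristic ratio of English words per token; real tokenizers vary
# by language and content, so treat this as an order-of-magnitude tool.
WORDS_PER_TOKEN = 0.7

def approx_words(context_tokens: int) -> int:
    """Approximate how many English words fit in a context window."""
    return int(context_tokens * WORDS_PER_TOKEN)

# A 2-million-token window holds roughly 1.4 million words of text.
print(approx_words(2_000_000))  # 1400000
```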

At a briefing earlier this year, Google showed several pre-recorded demos meant to illustrate the potential of Gemini’s long-context capabilities. In one, Gemini 1.5 Pro searched the transcript of the Apollo 11 moon landing telecast, around 402 pages, for lines containing jokes, and then found a scene in the telecast that matched a pencil sketch.

The briefing was led by Oriol Vinyals, VP of research at Google DeepMind, who described the model as “magical.”

“[1.5 Pro] performs these kinds of reasoning tasks across every single page, every single word,” he said.

That Claim May Have Been Overstated

In one of the two studies, Karpinska, along with researchers from the Allen Institute for AI and Princeton, asked the models to judge true/false claims about English-language fiction books. The researchers chose recent works so the models couldn’t “cheat” by relying on prior knowledge, and they peppered the claims with specific details and plot points that could only be verified by reading the books in their entirety.

Given a claim like “By using her skills as an Apoth, Nusis is able to reverse engineer the type of portal opened by the reagents key found in Rona’s wooden chest,” Gemini 1.5 Pro and 1.5 Flash had to say whether the claim was true or false and explain their reasoning.

Tested on one book of around 260,000 words (about 520 pages), 1.5 Pro answered the true/false statements correctly 46.7% of the time, while Flash answered correctly only 20% of the time. That means a coin flip would beat Google’s latest AI models at answering questions about the book. Averaged across all the test results, neither model managed question-answering accuracy better than random chance.
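The comparison against chance is simple to reproduce in principle. Here is a minimal sketch with made-up verdicts; the labels and predictions are illustrative, not the study’s data:

```python
def accuracy(predictions: list, labels: list) -> float:
    """Fraction of true/false verdicts matching the ground truth."""
    correct = sum(p == l for p, l in zip(predictions, labels))
    return correct / len(labels)

# Illustrative only: ground truth for 15 hypothetical claims.
labels = [True, False, True] * 5

# A degenerate model that answers "true" to everything scores the
# base rate of true claims; a fair coin averages 50% on any mix.
always_true = [True] * len(labels)
print(f"{accuracy(always_true, labels):.1%}")  # 66.7%
```

The point of the binary setup is that 50% is the floor a guessing strategy achieves, which is why scores of 46.7% and 20% are damning.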

“We’ve noticed that the models have more difficulty verifying claims that require considering larger portions of the book, or even the entire book, compared to claims that can be solved by retrieving sentence-level evidence,” Karpinska said. “Qualitatively, we also observed that the models struggle with verifying claims about information that is clear to a human reader but not explicitly stated in the text.”


The second study, co-authored by researchers at UC Santa Barbara, tested how well Gemini 1.5 Flash (but not 1.5 Pro) could “reason over” videos, that is, search through them and answer questions about their contents.

The co-authors created a dataset of images (a photo of a birthday cake, for example) paired with questions for the model to answer about the objects depicted (such as “What cartoon character is on this cake?”). To evaluate the models, they picked one image at random and inserted “distractor” images before and after it, creating slideshow-like footage.

Flash didn’t perform well. In a test that had the model transcribe six handwritten digits from a “slideshow” of 25 images, Flash got around 50% of the transcriptions right. With eight digits, accuracy dropped to around 30%.
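The distractor setup described above can be sketched roughly as follows. The function and frame names are invented for illustration and are not the study’s actual code:

```python
import random

def build_slideshow(target, distractors, total_frames=25, seed=0):
    """Hide one target frame among distractor frames at a random
    position, mimicking the slideshow-style evaluation described
    above (an illustrative sketch, not the paper's implementation)."""
    rng = random.Random(seed)
    frames = rng.sample(distractors, total_frames - 1)
    position = rng.randrange(total_frames)
    frames.insert(position, target)
    return frames, position

frames, pos = build_slideshow("digits: 4 1 7 3 9 2",
                              [f"distractor_{i}" for i in range(100)])
assert len(frames) == 25 and frames[pos].startswith("digits")
```

The model then has to both locate the one informative frame and read its contents, which is where Flash appears to break down.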

“On real question-answering tasks over images, it appears to be particularly hard for all the models we tested,” Michael Saxon, a PhD student at UC Santa Barbara and one of the study’s co-authors, told TechCrunch. “That small amount of reasoning — recognizing that a number is in a frame and reading it — might be what is breaking the model.”

    Google Is Making Too Many Claims About Gemini

Neither study has been peer-reviewed, and neither probes the releases of Gemini 1.5 Pro and 1.5 Flash with their 2-million-token contexts. (Both tested the 1-million-token context releases.) And Flash isn’t meant to be as capable as Pro; Google markets it as a low-cost alternative.

Nevertheless, both studies add fuel to the criticism that Google has been overpromising with Gemini and under-delivering. None of the models the researchers tested, including OpenAI’s GPT-4o and Anthropic’s Claude 3.5 Sonnet, performed well. But Google is the only model provider that gives the context window top billing in its advertisements.

“There’s nothing wrong with the simple claim, ‘Our model can take X number of tokens,’” Saxon said. “But the question is, what useful thing can you do with it?”

The criticism comes as businesses and investors grow increasingly frustrated with the limitations of generative AI as a whole.

In a pair of recent surveys from Boston Consulting Group, about half of the C-suite executives surveyed said they didn’t expect generative AI to bring substantial productivity gains, and they worried that generative AI-powered tools could lead to mistakes and data breaches. PitchBook recently reported that early-stage generative AI dealmaking has declined for two consecutive quarters, falling 76% from its Q3 2023 peak.

With meeting-summarizing chatbots that invent fictional details about people and AI search platforms that amount to plagiarism generators, customers are looking for ways to tell one product or service apart from another. Google, which has raced, at times clumsily, to catch up with its generative AI rivals, was keen to make Gemini’s context one of those differentiators.

    But It Looks Like The Bet Was Made Too Soon

“We haven’t settled on a way to really show that ‘reasoning’ or ‘understanding’ over long documents is taking place, and essentially every group releasing these models is cobbling together their own ad hoc evaluations to make these claims,” Karpinska said. “It is hard to say how well-founded these claims are, because companies don’t share the details of how their long-context processing is implemented.”

Google Did Not Respond To A Request For Comment

Both Saxon and Karpinska believe the antidote to hyped-up claims about generative AI is better benchmarks and, in the same vein, a greater emphasis on third-party critique. Saxon notes that the “needle in the haystack” test, which Google cites often in its marketing materials, only measures how well a model can retrieve particular pieces of information from datasets, like names and numbers, not how well it can answer complex questions about that information.
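Saxon’s point can be made concrete: the retrieval half of a needle-in-the-haystack task is so mechanical that plain substring search already solves it, which is exactly why passing it says little about reasoning. A toy sketch, with a made-up planted fact:

```python
def find_needle(haystack: str, needle: str) -> bool:
    """Retrieval-style check: is the planted fact present at all?
    This measures lookup, not understanding of the surrounding text."""
    return needle in haystack

# A long "document" with one planted fact buried in filler.
doc = ("lorem ipsum " * 50_000
       + "The launch code is 0451. "
       + "lorem ipsum " * 50_000)
print(find_needle(doc, "The launch code is 0451"))  # True
```

A benchmark worth the name would instead ask questions whose answers depend on combining information spread across the document, which is where the studies above show the models falling down.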


“All scientists and most engineers using these models agree that our current benchmark culture is broken,” Saxon said, “so it’s important that the public understands to take these giant reports with numbers like ‘general intelligence across benchmarks’ with a huge grain of salt.”
