Google Unveils "Gemini", a Multimodal AI Poised to Outdo GPT-4

13 Dec 2023

Google has stepped further into the AI race with "Gemini," its most advanced large language model to date. This multimodal LLM goes beyond understanding text: it is designed to process images, video, and audio as well. To refine Gemini's capabilities, Google is considering drawing on an array of personal data, including individual search histories and photographs.

The initiative, "Project Ellmann," is named in memory of the biographer Richard David Ellmann. It aims to explore users' personal digital archives, such as photos, files, and search data, with the goal of assembling a narrative of a person's life story.

Gemini, the model behind this project, is meant to give users a bird's-eye summary of their life experiences. By analyzing content from users' Google accounts, including written documents and photos, the AI could act as a personal archivist, drawing significant life events out of a mass of digital data.

It is not yet clear whether Google will integrate these features into Google Photos. However, related improvements, such as automatically grouping similar photographs and organizing screenshots, had already been added to the platform as of November.

At an "internal summit," Google's team showcased the AI's capabilities, demonstrating that it can infer personal details such as birthdates and possible family connections. The AI is designed to sift through data, including photographs and their metadata, to build a storyline covering milestones such as graduations, reunions, weddings, and the journey to parenthood.

In addition to identifying major life events, Gemini is said to have a knack for deducing personal preferences and habits, from frequently used apps to favorite cuisines, by analyzing image content and search queries.

The project even aims to work out dietary preferences: if a user's photo history is full of Italian dishes, Gemini would infer a fondness for Italian cuisine.

During the presentation, the team introduced "Ellmann Chat," an AI conversation system capable of answering nuanced questions about a user's life based on the digital breadcrumbs in their Google Photos. For example, it could pinpoint the last time a relative visited, inferred from photographic evidence.

A Google spokesperson said that Project Ellmann is currently in the experimental phase. If the company decides to implement it fully, Google has pledged to take ample time to ensure the system is valuable for users while upholding stringent privacy protocols.

An AI with such intimate access to personal data may raise privacy concerns, including how deeply into personal histories Google plans to dig. While a public release remains uncertain, the project shows how far AI technology can go in personalizing digital experiences.
