Yesterday was Google's big annual I/O event, and boy did they bring the fireworks when it came to new AI announcements. I have to admit, after OpenAI's splashy release of their latest language model the day before, I wasn't expecting too much from Google. But they absolutely brought their A-game.
Now, I'll be the first to admit I'm a bit of a tech geek at heart. I still remember being utterly mesmerized the first time I saw an early demo of Google's AI assistant years ago. The way it could understand and respond to freeform questions just blew my suburban-teenage mind. Since then, every new AI breakthrough has hit me with that same childlike wonder and excitement for the future.
So, you can imagine how I felt watching this year's keynote. It was like an AI junkie's dream!
Gemini Rising
Let's start with the big enchilada - the upgraded version of Google's large language model, Gemini. Those lucky enough to be Gemini Advanced subscribers (they paid for early access) can now play with Gemini 1.5. This super-sized model can handle context windows of up to 1 million tokens, or roughly 750,000 words! That's orders of magnitude more context than GPT-3 ever offered, and several times larger than GPT-4 Turbo's 128,000-token window.
But wait, there's more! Google says that huge window will eventually expand to 2 million tokens, or 1.5 million words of context. Just let that ludicrous number sink in for a second. We're talking about an AI that could theoretically read and understand multiple full-length novels before answering your question.
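If you want to sanity-check those word counts yourself, here's a quick back-of-the-napkin sketch in Python. It just uses the common rule of thumb that one token works out to roughly 0.75 English words - the exact ratio varies by tokenizer and by text, so treat the results as ballpark figures, not gospel.

```python
# Rough token-to-word conversion using the common ~0.75 words-per-token heuristic.
# The real ratio depends on the tokenizer and the text, so these are ballpark figures.
WORDS_PER_TOKEN = 0.75

for context_tokens in (1_000_000, 2_000_000):
    approx_words = int(context_tokens * WORDS_PER_TOKEN)
    print(f"{context_tokens:,} tokens ≈ {approx_words:,} words of context")
```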
One of the coolest demos, in my opinion, showed Gemini answering questions about your photo library - things like "What's my license plate number?" or "When did Lucy learn how to swim?" The AI would then scan all your uploaded pictures to find the relevant ones and respond. As someone with thousands of random photos clogging up my phone, that's the kind of magic trick I could really use!
But, yeah, I know, privacy is a concern.
Your AI Butler
But Gemini isn't just a parlor trick - Google envisions it as a true AI assistant that can help you across all their products and services. They showed examples of it working seamlessly inside Gmail, summarizing all the important messages for you. In Google Docs, it can research topics and elaborate on points. With Google Drive, the AI can quickly find that PDF you've been looking for based on a vague description.
The ultimate goal, which Google calls "AI Agents," is for Gemini to actually take actions on your behalf across applications.
Now I have to issue the standard tech skeptic's warning here - we've seen plenty of these breathtaking concept demos that never quite made it to real-world products. And with Google's track record of losing interest in projects, I'll believe it when I can actually buy an AI Agent subscription.
But, hopes and dreams aside, you have to admit the vision is tantalizing. Just imagine having your own personal AI assistant handling all the little tedious chores and mental clutter that bogs us down every day. Maybe then we'd have more time and energy for...wait, what was my point again? I got a little distracted daydreaming about how awesome that would be! Where was I?
AI Multimedia Madness
Oh right, there was a ton of other cool stuff Google showed off. For their NotebookLM assistant, they demonstrated a wild feature that can turn uploaded documents and audio notes into an AI-generated, podcast-style discussion of the material. You could even jump in with questions that it would address on the fly.
Then they unveiled Imagen 3, Google's new text-to-image generator to rival DALL-E and others. While not a massive leap forward, it looked very capable. More impressive in my book was their new AI video generator, Veo, which can create 1080p video clips of up to a minute in length from text descriptions. We've seen snippets of this kind of thing before, but Google is now letting wait-listed users actually put Veo through its paces.
There was also considerable progress shown for core AI functions like real-time captioning, automatic document summarization, and workflow assistance. In a move sure to intrigue developers, they announced forthcoming open-weight releases in their Gemma model family, including Gemma 2.
Search, Evolved
One of the most impactful announcements for regular users, though, has to be the AI overhaul coming to Google Search. We all know and love the core search experience, but now they're imbuing it with more advanced reasoning capabilities using Gemini and other models.
The example they gave was pretty mind-blowing. You could type in something like:
"Find the best Pilates studios in Boston near Beacon Hill that have introductory discounts, and show me their walking distance from my location."
Instead of a list of blue website links, Google would comprehensively answer that entire multi-part query with a formatted summary result! I can't overstate how transformative that would be for quickly getting the information you actually want without sifting through web pages.
At one point in the presentation, they even had Gemini analyze the keynote transcript in real time to count how many times the speakers said "AI." That meta little stunt admittedly made me laugh.
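You don't even need a frontier model for that particular party trick, of course - if you had a text transcript handy, a few lines of Python would get you a rough count (the file name below is just a placeholder):

```python
import re

def count_ai_mentions(transcript: str) -> int:
    # Match "AI" as a standalone word so we don't accidentally count words like "said" or "air".
    return len(re.findall(r"\bAI\b", transcript))

# "keynote_transcript.txt" stands in for wherever you saved the transcript.
with open("keynote_transcript.txt", encoding="utf-8") as f:
    print(f"'AI' was said {count_ai_mentions(f.read())} times")
```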
Human After All
Amidst all the tech dazzle, there were some smaller human moments that really stuck with me, too. At one point, the presenter got an incoming call during the live demo. Before he could answer, the on-device AI pegged it as a potential scam attempt and warned him about it. A handy real-world use case!
The New AI Race
So in summary, while OpenAI's launch captured the world's attention Monday, Google absolutely refused to be upstaged the very next day. They unleashed an incredible cavalcade of new AI tools, assistants, and multimedia generators that push the boundaries in enticing ways.
Make no mistake—we've now entered a full-fledged AI innovation arms race between the tech giants. Google, Microsoft, OpenAI, Anthropic, and others are all doubling down to try to establish supremacy and mindshare in this pivotal emerging field.
Competition breeds excellence, though, and I, for one, am incredibly stoked to see where these companies take AI next. If Google's display this week is any indication, we're in for some wizardry and whiplash-inducing AI advancements in the months and years ahead.
Watch the full Google I/O keynote: