Women in AI Summit 2024 - Talks by Experts

List of sessions and key takeaways:

AI-powered transformation: Driving innovation and reshaping organizations - Joana Carrasqueira, Senior Manager, AI Developer Relations, Google (15:09)

Building with the Gemini API and Google AI Studio - Shrestha Basu Mallick, Product Lead, Gemini API and AI Studio, Google (23:58)

The presentation introduced the Gemini API and AI Studio, highlighting several key features and capabilities.

Multimodal Input: Gemini models support a variety of input modalities, including text, images, audio, and video.

Gemini offers an exceptionally large context window of 2 million tokens, which lets users process extensive amounts of data, such as entire code bases, large document sets, or long videos. One application is giving sales representatives access to all product information when interacting with customers. Context caching is also available, which can significantly lower costs when the same context is used repeatedly.

AI Studio and the Gemini API offer several powerful features, including:

Code Execution: This tool provides an execution sandbox for generating and running code, aiding in tasks like solving math problems. It also allows for data cleaning and formatting.
Structured Output: This feature allows Gemini to respond in a specific JSON format, enabling structured data extraction from unstructured or mixed format input.
Function Calling: Users can define their own functions that Gemini can call as tools.
Grounding with Google Search: This tool allows Gemini to access and utilise Google Search results to provide more accurate, recent, and rich responses.
Vision and Video Understanding: Gemini models are exceptionally strong in vision and video understanding tasks. They can detect bounding boxes within images, generate SVGs, and analyse video content. The models excel in tasks like technical analysis, narrative analysis, and plot analysis of videos.
Easy Access and Setup: Developers can get started with the Gemini API in under five minutes with just an API key. AI Studio provides a prompt gallery to explore models and lets users create API keys.
Fine-tuning: The API also supports fine-tuning of Gemini models, allowing developers to customize the models for specific tasks.
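
To illustrate why the structured output feature listed above is useful, here is a minimal sketch that parses a JSON reply and checks it against an expected set of fields. It makes no API call: the `raw_response` string is a made-up stand-in for what a model might return, and the field names (`product`, `price_usd`, `in_stock`) are hypothetical.

```python
import json

# Hypothetical schema: field name -> expected Python type.
EXPECTED_FIELDS = {"product": str, "price_usd": float, "in_stock": bool}


def parse_structured_response(raw: str) -> dict:
    """Parse a JSON string and verify it has the expected fields and types."""
    data = json.loads(raw)
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], expected_type):
            raise ValueError(f"field {field!r} should be {expected_type.__name__}")
    return data


# Made-up example of a reply constrained to JSON (not real model output).
raw_response = '{"product": "Widget A", "price_usd": 99.0, "in_stock": true}'
record = parse_structured_response(raw_response)
print(record["product"], record["price_usd"], record["in_stock"])
```

Because the reply follows a known shape, downstream code can read fields directly instead of scraping free-form text.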

Introduction to Gemini APIs and Google AI Studio - Paige Bailey, AI Developer Experience Engineer, Google

A context window measures how many tokens (the smallest building blocks, such as part of a word, image or video) the model can process at once. With Gemini 1.5 Pro's support for 2M+ tokens, you can provide all of your information in a single prompt and let the model work from it. Because the model can receive so much information at inference time, this can eliminate the need for fine-tuning or vector databases.

A "feel" for 1-10M tokens:

All of the emails you've sent in the last year
All of the text messages you'll send in a lifetime
50-500K lines of code
1-11 hours of video @ 1fps
8-80 English novels
Full text of all of the US federal laws and regulations
Main text of all of the papers published at NeurIPS this year
Transcripts of over podcast episodes
2K - 20K news articles
Audio for over 200-2000 songs
Earnings scripts for 200-2000 companies
All of the calendars of a small to medium sized startup
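
For a rough sense of the video item in the list above, the arithmetic below takes the figure reported in this talk's AI Studio demo (a 5-minute video consuming about 89K tokens) and extrapolates how much video fits in a given token budget. It assumes token usage scales linearly with video length, which is a simplification, and the function name is only illustrative.

```python
# Figures reported in the AI Studio demo: a 5-minute video used ~89K tokens.
DEMO_VIDEO_SECONDS = 5 * 60
DEMO_VIDEO_TOKENS = 89_000

# Assumption: token usage grows linearly with video length.
TOKENS_PER_SECOND = DEMO_VIDEO_TOKENS / DEMO_VIDEO_SECONDS


def hours_of_video(token_budget: int) -> float:
    """Estimate how many hours of video fit in a token budget."""
    return token_budget / TOKENS_PER_SECOND / 3600


for budget in (1_000_000, 2_000_000, 10_000_000):
    print(f"{budget:>10,} tokens ~ {hours_of_video(budget):.1f} hours of video")
```

Under that assumption, 1M, 2M and 10M tokens come out at roughly 0.9, 1.9 and 9.4 hours, which is in the same ballpark as the range quoted in the list.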

AI Studio can be accessed at aistudio.google.com with a Gmail account. It allows users to experiment with the Gemini models and offers a prompt gallery for inspiration.

AI Studio allows users to upload files, record audio and video, and use sample media.

In the AI Studio demo, a 5-minute video on dinosaurs was added to the prompt through the AI Studio Playground. The prompt asked the model to identify each dinosaur and the timestamp at which it appears in the video, along with a fun fact about it. Gemini 1.5 Flash consumed ~89K tokens for the video and took 14 seconds to analyze it. The demo also covered generating a transcript from selected pages of a PDF and calculating the weekday of a birthday using the Code Execution feature. Turning on code execution gives the model access to an isolated sandbox environment with Python.
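
The birthday weekday calculation suits code execution because date arithmetic is easy to get wrong by reasoning alone. The code the model generated in the session is not recorded in these notes; the sketch below only shows the kind of Python that could run in such a sandbox, using a made-up date.

```python
from datetime import date

# date.weekday() returns 0 for Monday through 6 for Sunday.
WEEKDAYS = [
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
]


def birthday_weekday(year: int, month: int, day: int) -> str:
    """Return the weekday name for the given calendar date."""
    return WEEKDAYS[date(year, month, day).weekday()]


# Hypothetical example date, not the one used in the demo.
print(birthday_weekday(2024, 12, 25))  # prints "Wednesday"
```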

Through the Safety settings, sliders for Harassment, Hate, Sexually Explicit, and Dangerous Content can be adjusted to control potentially harmful responses.

Gemini can be grounded with Google Search to get real-time information. It can synthesise the top Google search results and provide sources.

AI Studio can automatically generate code based on the actions performed in the UI and allows users to open these examples in Colab.

The models can be used with the OpenAI API library.

The function calling feature can be used for tool selection and information extraction, as well as for calling other models or satellite imagery segmenters.
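
The tool-selection pattern can be sketched without any SDK: the model is shown a set of function declarations and replies with the name and arguments of the one it wants, and the application then runs it. In the sketch below, the `model_call` dictionary is a hand-written stand-in for that reply and both tool functions are hypothetical placeholders. This is not the Gemini API's actual request or response format.

```python
from typing import Any, Callable


def get_weather(city: str) -> str:
    """Hypothetical tool: a real version would query a weather service."""
    return f"Weather lookup requested for {city}"


def segment_satellite_image(image_id: str) -> str:
    """Hypothetical tool: a real version would run an image segmenter."""
    return f"Segmentation requested for image {image_id}"


# Registry of tools the application is willing to run.
TOOLS: dict[str, Callable[..., str]] = {
    "get_weather": get_weather,
    "segment_satellite_image": segment_satellite_image,
}


def dispatch(call: dict[str, Any]) -> str:
    """Run the tool named in `call` with the supplied arguments."""
    name = call["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**call["args"])


# Stand-in for a model reply that selects a tool (written by hand).
model_call = {"name": "segment_satellite_image", "args": {"image_id": "tile-42"}}
print(dispatch(model_call))
```

Keeping an explicit registry means the application, not the model, decides which functions can ever be executed.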

API keys can be generated directly within Colab or AI Studio.

Users can upload data to tune Gemini models.

Code can be exported and used in various applications using Python or JavaScript.

AI Studio is free to use, and the Gemini API offers a free tier.

AI for everyone with Gemma - Kathleen Kenealy, Staff Research Engineer, Google

Gemma is a family of lightweight, open models built from the same research and technology as Gemini.
Gemma is 'open' and 'offline': you can run the model on local machines or host it yourself.
The models are designed to fit on standard hardware, not requiring giant clusters of GPUs or TPUs. This makes them accessible to a wider range of developers.
The weights are publicly available, which is the 'open' part of the model.
In Korea, a developer is leveraging Gemma to build a dialect translator specifically focused on the Jeju dialect.
Tuning recipes - A collection of guides and examples for the Gemma open models from Google.
A Kaggle competition with a $100,000 prize invites developers to fine-tune Gemma 2 for a specific language or cultural context.

AI in your pocket: Building intelligent Android apps - Jingyu Shi, Developer Relations Engineering Manager, Google

Gemini Nano is Google’s most efficient AI model built for on-device tasks on Android.
Gemini Nano runs in Android's AICore system service, which leverages device hardware to enable low inference latency and keeps the model up-to-date.
Access to the Gemini Nano API and AICore is provided by the Google AI Edge SDK.
Benefits of on-device execution:

Local processing
Offline availability
Potentially reduced latency
No additional cost

Gemini in Android Studio is your coding companion for Android development.

Prompt to production: Building an AI app with Flutter - Khanh Nguyen, Developer Relations Engineer, Google (1:17:43)

Flutter is an open-source framework for building multi-platform apps. Flutter code is written in Dart.
The Google AI Dart Client SDK is the fastest way to prototype generative AI features in your Flutter and Dart apps. It lets you use Gemini models from Google AI Studio through a Dart API.
The presenter describes a scenario in which she records the sound of her stalled car and uses generative AI to diagnose the problem. She discusses the potential for the app to expand its services to include a list of local repair shops capable of fixing the issue.

How AI is changing the healthcare system - Fernanda Wanderley, Senior Data Scientist, Kunumi & Google Developer Expert (1:32:17)

Convolutional Neural Networks (CNNs) are used, for example, to detect pathologies in X-ray or CT scans. This can help non-radiologists improve their work.
Some advantages of using AI:

Faster care
More efficiency
More time to care
Risk assessment
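
For readers new to the CNNs mentioned in this talk, the core operation is sliding a small kernel over an image and summing the element-wise products. The sketch below does this in plain Python on a tiny made-up grid with a hand-picked vertical-edge kernel. Real medical-imaging models learn their kernels from data and stack many such layers, and nothing here reflects the speaker's actual models.

```python
def convolve2d(image, kernel):
    """Valid (no padding) 2D cross-correlation, as used in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(
                image[r + i][c + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


# Made-up 4x4 "image": dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hand-picked kernel that responds to left-to-right brightness increases.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
print(feature_map)  # prints [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
```

The response is strongest in the middle column, which is where the dark-to-bright edge sits in the input.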

Fast-tracking your AI career with Kaggle - Ruchi Bhatia, Product Marketing Manager, Data Science & AI, HP (1:41:10)

Triple Kaggle Grandmaster Ruchi Bhatia's Data Science journey

A Kaggle profile helps your Data Science portfolio by offering:

Credibility and visibility
Live accessible projects
Centralized portfolio - Datasets, Notebooks & Competitions in one place

Consistency is key

Assessing AI's progress - Panel discussion
