Posts

Leaderboards for Evaluating Language Models

Leaderboards test a model's ability to perform across diverse tasks, including factual accuracy, general knowledge, reasoning, and ethical alignment (see the YouTube benchmarks explainer by bycloud). They incorporate benchmarks such as MMLU (Massive Multitask Language Understanding, which measures general academic and professional knowledge), TruthfulQA (which tests the truthfulness of responses), and HellaSwag (which tests commonsense reasoning and natural language inference) to assess different aspects of model performance.
* Stanford Holistic Evaluation of Language Models (HELM) Leaderboard - A reproducible and transparent framework for evaluating foundation models. These leaderboards cover many scenarios, metrics, and models, with support for multimodality and model-graded evaluation.
* Artificial Analysis provides benchmarking and related information to help people and organizations choose the right model for their use case and the right provider for that model.
* Ch...

Hee-Haw Heroics

See more of my AI co-creations

Women in AI Summit 2024 - Talks by Experts

The list of sessions & key takeaways:
* AI-powered transformation: Driving innovation and reshaping organizations - Joana Carrasqueira, Senior Manager, AI Developer Relations, Google (15:09)
* Building with the Gemini API and Google AI Studio - Shrestha Basu Mallick, Product Lead, Gemini API and AI Studio, Google (23:58)
* Introduction to Gemini APIs and Google AI Studio - Paige Bailey, AI Developer Experience Engineer, Google
* AI for everyone with Gemma - Kathleen Kenealy, Staff Research Engineer, Google
  * Gemma is 'open' and 'offline': you can run the model on local machines or host it yourself. The weights are publicly available, which is the 'open' part of the model.
  * Tuning recipes - A collection of guides and examples for the Gemma open models from Google.
  * A $100,000 prize is offered for a Kaggle competition to fine-tune Gemma 2 for a specific language or cultural context.
* AI in your pocket: Building intelligent Android ...