Posts

Showing posts from March, 2025

This Week I Learned - Week #13 2025

Image
This Week I Learned -  * FastAPI is a high-performance web framework for building HTTP-based service APIs in Python 3.8+. * Scrollytelling is a web design technique that uses scroll-triggered animations, transitions, and interactions to create immersive and engaging stories. Some popular scrollytelling pieces: New York Times: Close Read ABC News: How to spot fake and AI images Stuff: This is the tale of two pandemics Permutation Test: A Visual Explanation of Statistical Reasoning * GenAI is more like a stochastic parrot imitating human language by matching statistical patterns. * "I'd decided to start AI Dev because while there're great academic AI conferences that disseminate research work (such as NeurIPS, ICML and ICLR) and also great meetings held by individual companies, often focused on each company's product offerings, there were few vendor-neutral conferences for AI developers . With the wide range of AI tools now available, there is a ...

Stronger Than Steel… Until Pizza Shows Up!

Image
See more of my AI co-creations To arrive at the final image after researching about bones , I had a long conversation which included this key prompt - "Draw a side-profile cartoonish illustration of an overweight human body from head to toe with a visible skeleton. The image should depict the skeleton in high detail, showing bones clearly while maintaining the outer body’s soft tissue contours. The background is neutral, with a scientific and anatomical style, focusing on the contrast between the body fat and the skeletal structure. Use a semi-transparent or X-ray-like effect to highlight the bones inside" Screenshot of the rendering process in ChatGPT

Generative AI: Essentials for Startups - Notes

Image
This session provides an overview of Google's foundational AI technologies – Vertex AI, Gemini models, Model Garden, databases, and storage. In the early days, there were only Large language models (LLMs). Then, retrieval augmented generation (RAG) emerged and evolved to Al agents for reasoning and orchestration 60+ models in Model Garden. Large language models (LLMs) can now enable building solutions that previously required research teams, potentially achievable "over a weekend". Generative AI involves large language models and often retrieval augmented generation (RAG) for information retrieval, leading to more factual and less hallucinated responses. The current era includes AI agents, which add logic and control to the LLM and RAG stack, empowering them with tools to take actions. Vertex AI in Google Cloud provides a user interface ( Vertex AI Studio ) to experiment with models like Gemini and offers a Model Garden with various foundation and partner models. A gen...

Bone Voyage

Image
See more of my AI co-creations

The Diet Delinquent

Image
See more of my AI co-creations With just a prompt, Grok was able to take a  repurposed   hand-drawn cartoon from 2004  and generate an image of it doing a 'namaste'.  I’ve been getting by with free Windows-native tools like Paint, Paint 3D , and the Generative Erase feature in Photos . Grok's image editing feature, which can be activated simply by prompting, is a fantastic resource for amateur cartoonists. 

Eco-friendly, by force

Image
  See more of my AI co-creations

This Week I Learned - Week #12 2025

Image
This Week I Learned -  *  Teachable Machine is a web-based tool from Google that makes creating machine learning models fast, easy, and accessible to everyone. *  Model Context Protocol (MCP) introduces a standardized, open-source way for AI applications to fetch real-time data, interact with external systems, and execute actions—without endless custom coding. It is an open standard released publicly as open-source by Anthropic in November 2024.  *  Kokoro TTS is a cutting-edge text-to-speech model with just 82 million parameters, delivering high-quality, natural-sounding speech. Kokoro Text to Speech (TTS) model is open-source and licensed under the Apache 2.0 license, making it free for both commercial and personal use. Developers can integrate it into their applications without any licensing restrictions. * Phi-4-multimodal, an open weights model that processes text, images, and speech simultaneously is Microsoft's first official large language model ...

The Mercurial Grok AI Assistant Understands & Speaks Indian Languages

Image
Grok , the AI model developed by Elon Musk's xAI, can understand when you type Indian languages like Hindi, Telugu, Odia or other Indian regional languages using English letters (like when you type 'namaste' instead of 'नमस्ते'), and it can respond by mixing English with those languages. Grok doesn't necessarily need the native script of these languages. It has natural language processing abilities that extend to multiple languages. This is great innovation because many people in India, especially in online communication, use transliteration. Grok can generate responses that combine English words and phrases with words and phrases from the regional language you used in your input. For example, if you ask a question using a mix of English and Telugu transliteration, Grok might respond with a sentence that includes both English and Telugu words. Check these Hindi, Telugu, Odiya samples - Grok, ab Hindi mein Grōk, ippuḍu telugulō Grok, Oḍiā re This way Grok is more...

Code for Free, Worth Trillions: The Unsung Value of Open Source

Image
Key facts from the working paper " The Value of Open Source Software " [^PDF, 42 pages] by Manuel Hoffmann, Frank Nagle, and Yanuo Zhou (Working Paper 24-038, January 1, 2024): The paper aims to measure the economic and social value of open source software (OSS) , a global public good critical to the modern economy but challenging to quantify due to its free nature and lack of centralized usage tracking. Despite its value, OSS generally shows up as zero in direct economic measurement since prices equal zero and quantity is hard to track. This paradox highlights a flaw in traditional economics, making the paper’s valuation effort a wake-up call. OSS is one of the most successful and impactful modern examples of 'the commons,' at risk of underinvestment without proper valuation. Framing OSS as a digital "commons" ties it to historical economic concepts, emphasizing the need to protect it from a "tragedy of the commons." It uses two complementary dat...

Notes: "Reasoning with o1"

Image
Notes from the short course " Reasoning with o1 " on DeepLearning.AI, presented by Colin Jarvis of OpenAI - o1 is a reasoning model for complex tasks that require broad general knowledge, including function calling and image input. It can reason through complex tasks in domains like mathematics, coding, science, strategy, and logistics. o1 is different from other models because it thinks before it speaks. o1 requires less context in prompting to produce very effective results. o1 uses large-scale reinforcement learning to generate a chain of thought before answering. o1's chain of thought (CoT) is longer and higher quality than what you can typically attain by a prompt alone. CoT  contains behavior like: - Error correction - Trying multiple strategies - Breaking down problems into smaller steps o1 performs well at understanding images out-of-the-box. It can be used to extract a detailed JSON that describes the image and what's going on in it. o1 follows a test-and...

AI: From Assistant to Gossip Reporter!

Image
See more of my AI co-creations

This Week I Learned - Week #11 2025

Image
This Week I Learned -  * Since Windows 10 (version 1803) and later, curl comes pre-installed. You can use it directly in the Command Prompt. * Google’s new Gemma 3 series of AI models are optimised for efficient performance on a single GPU or its tensor processing unit (TPU) . These new open-source models incorporate both text and visual reasoning capabilities. Google said the models support over 35 languages and can be further fine-tuned to accommodate up to 140 languages.  *  Over the past few decades, as programming has moved from assembly language to higher-level languages like C, from desktop to cloud, from raw text editors to IDEs to AI assisted coding where sometimes one barely even looks at the generated code (which some coders recently started to call vibe coding), it is getting easier with each step . - The Batch * Alibaba's LLM QwQ-32B is a version of Qwen2.5-32B that was fine-tuned to generate chains of thought using reinforcement learning (RL). * De...

Bookmarker Bookmarklet v2

Image
Remember the days of Google Bookmarks? A handy little service that let you keep all your interesting web finds in one place. Well, that service has sailed off into the digital sunset. I built an alternative for myself that saves my Bookmarks to a spreadsheet in Google Sheets using Google App Script - the Bookmarker bookmarklet . It works by using a bookmarklet - a browser bookmark that runs a bit of JavaScript code when you click it. This code grabs the page's URL and title and adds it as a new row in your own copy of Google Sheets.  This provides a centralized and flexible way to keep track of interesting articles and websites, offering the ability to add your own notes in the corresponding columns. The motivation behind this was to have all bookmarks in one location, accessible from any device, rather than being tied to a specific browser. I enhanced the original code by prompting Gemini Flash within GitHub Copilot Chat to generate code to categorize a bookmarked URL. To mak...

This Week I Learned - Week #10 2025

Image
This Week I Learned -  * Adding "-hi" or any equivalent Indian language code to the India-specific subdomain of the Open Food Facts URL will show few parts of the product page in Hindi (or equivalent regional language). For example, some words which have English to Hindi translations in Crowdin , a translation mapping service that Open Food Facts uses, will be shown on this webpage in Hindi -  https://in -hi .openfoodfacts.org/product/8908015592355/khatti-premium-seedless-tamarind-de-naturel * What is Crowdin? * Due to the localisation that Crowdin enables, different nationalities can benefit from utilizing the Open Food Facts database through its website in their native language. * Reinforcement Learning is a machine learning technique where an agent learns to make decisions by interacting with an environment and receiving feedback in the form of rewards or penalties. Andrew Barto and Richard Sutton are recognized for developing the foundational concepts and algorithms ...

PhonaTick - A Word List for Confusing Pronunciations

Image
The English language is full of surprises, especially when it comes to pronunciation. With more than 19 vowel sounds represented by just five vowels (plus sometimes “y”), English can be a challenge—especially for non-native speakers. Many words contain silent or extra letters that make pronunciation tricky. If your native language follows phonetic spelling (where words are pronounced as they are written), you might instinctively apply the same logic to English. This can lead to unexpected mistakes—like pronouncing the “t” in ballet or being puzzled by why colonel is pronounced ker-nil . To help with this, I started compiling a list of such tricky words in a Google Spreadsheet. Then, I learned that Google Sheets can function as a read-only database , so I built a simple web app using the Google Sheets API and JavaScript to display the list online.  Check it out ! The pronunciations are from Google & WordWeb (created by a physicist, Antony Lewis).  Switching to Datasett...

"How I use LLMs" - Andrej Karpathy

Image
This video [28 Feb 2025] by Andrej Karpathy is part of a "general audience series on large language models". It aims to show practical applications of LLMs and how to use them. It includes examples , different settings, and personal usage demonstrations. Key points (paraphrased notes including summary generated by NotebookLM): Language model is essentially a self-contained entity, like a "one terabyte zip file," which contains knowledge from pre-training and style from post-training. It's important to be mindful of the specific model being used, as different models have varying capabilities and pricing. Deep Research  combines internet search and thinking for in-depth analysis The launch of  ChatGPT  by  OpenAI  launched in  2022  marked the first instance where the general public could engage in conversational interactions with an LLM via a simple text interface. Its rapid spread and popularity online were immense, significantly impacting the...

GitHub Copilot Features

Image
A good way to understand commercial software products is to check features for its different pricing plans.  Here is the list of the 50+ GitHub Copilot features from its Pricing page - Chat Messages and interactions Access to OpenAI GPT-4o [Preview] Access to OpenAI GPT-4.5 [Preview] Access to Anthropic Claude 3.5 Sonnet [Preview] Access to Anthropic Claude 3.7 Sonnet [Preview] Access to OpenAI o1 [Preview] Access to OpenAI o3-mini [Preview] Access to Google Gemini 2.0 Flash Context-aware coding support and explanations Debugging and security remediation assistance Access to knowledge from top open source repositories Generate tests, docs, and more with slash commands Answers about issues, PRs, discussions, files, commits, etc. Web search powered by Bing [Preview] Explain failed Actions jobs [Preview] Multi-file editing in VS Code [Preview] Switch between models [Preview] Add images to prompts Code completion Real-time code suggestions [Preview] Next edit suggestions Comments to c...

Datasette: The Open-Source Tool for Data Exploration and Publication

Image
Can you imagine sharing a CSV file as URL and letting the recipients view the data through just their browser? Guess what, you can do that & more with  Datasette . Click on this link to see a table showing shelf life of some perishable food items Datasette is a tool created by Simon Willison for exploring and publishing data .  It's a web application that provides a  user interface (UI) for browsing, viewing,  faceting, filtering,  sharing and exploring data .  Datasette allows you to convert CSV files into a database table . Datasette runs on top of SQLite , which is a fast, widely used database where each database is a single file that's easy to copy and back up. Datasette is designed for read-only data, meaning you can't make changes to the data through the Datasette interface. You can run your own SQL queries against the data, which is usually risky for web applications but safe in Datasette because it uses a read-only database. Datasette's JSO...