"How I use LLMs" - Andrej Karpathy


This video [28 Feb 2025] by Andrej Karpathy is part of a "general audience series on large language models". It aims to show practical applications of LLMs and how to use them. It includes examples, different settings, and personal usage demonstrations.

Key points (paraphrased notes, including a summary generated by NotebookLM):

A language model is essentially a self-contained entity, like a "one terabyte zip file," which contains knowledge from pre-training and style from post-training.

It's important to be mindful of the specific model being used, as different models have varying capabilities and pricing.

Deep Research combines internet search and thinking for in-depth analysis.

The launch of ChatGPT by OpenAI in 2022 marked the first instance where the general public could engage in conversational interactions with an LLM via a simple text interface. Its rapid spread and popularity online were immense, significantly impacting the digital landscape. However, since its inception, the LLM ecosystem has expanded dramatically.

By 2025, a multitude of applications resembling ChatGPT have emerged, creating a much richer and more diverse environment. While OpenAI's ChatGPT stands as a pioneering and feature-rich incumbent, largely due to its longer presence in the market, numerous other alternatives, often described as "clones", offer unique user experiences not necessarily found within ChatGPT itself. Resources like Chatbot Arena and the Scale leaderboard serve as valuable tools for tracking and comparing the performance of these various models based on metrics like Elo scores and evaluation results.

The fundamental mode of interaction with an LLM involves providing text as input and receiving text as output. A simple request, such as asking for a haiku, effectively demonstrates the model's inherent talent for writing. On a technical level, both the user's input (query) and the model's response are segmented into smaller units known as tokens. These tokens form a sequential, one-dimensional stream.

Tools like Tiktokenizer allow for the visual inspection of this tokenisation process. To maintain the flow of a conversation, the system employs special tokens that delineate the start and end of messages from both the user and the AI assistant within this underlying token stream. Initiating a new chat effectively clears the token window, which functions as the model's working memory or context window, resetting the conversation and the tokens to zero. The context window is crucial as it holds the tokens that the model can directly access and utilise during any given turn of the conversation.

At its core, an LLM can be conceived as a self-contained entity, akin to a sizable data file (perhaps one terabyte) housing the parameters of a complex neural network.

The creation of this entity involves two primary training phases: pre-training and post-training. Pre-training entails the model processing and "compressing" vast quantities of text from the internet. This compression is lossy and probabilistic in nature, as it's impossible to perfectly represent the entirety of the internet within a finite file size. Instead, the model captures statistical patterns and the general "vibes" of the data, enabling it to predict the subsequent token in a sequence and acquire a broad understanding of the world. This accumulated knowledge is embedded within the model's parameters but has a knowledge cut-off date because the pre-training process is computationally demanding and infrequent.
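The "predict the next token" objective can be illustrated with a deliberately tiny toy, nothing like a real model's billions of parameters: here a bigram count table over a miniature "internet" stands in for them, but the probabilistic flavour of the prediction is the same.

```python
# Toy illustration (not the real thing): next-token prediction as
# statistics over a tiny corpus. A bigram count table stands in for
# the billions of learned parameters of an actual LLM.
from collections import Counter, defaultdict
import random

corpus = "the cat sat on the mat the cat ate the fish".split()

# "Pre-training": count which word follows which.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict(prev):
    # Sample the next word in proportion to how often it followed `prev`,
    # mirroring the probabilistic, lossy nature of the compression.
    options = counts[prev]
    words = list(options)
    weights = [options[w] for w in words]
    return random.choices(words, weights=weights)[0]

print(predict("the"))  # one of "cat", "mat", "fish", sampled probabilistically
```

Note the knowledge cut-off analogy: the table only knows word pairs that appeared in the corpus it was built from.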

Conversely, post-training focuses on refining the pre-trained model using datasets of human-generated conversations. This phase imbues the model with the persona of a helpful assistant, capable of engaging in dialogue and answering questions in a more natural and user-friendly style. Therefore, the final LLM artefact combines the extensive knowledge base acquired during pre-training with the conversational finesse developed during post-training.

Basic interactions with LLMs can range from simple factual queries, such as inquiring about the caffeine content of an Americano, to seeking advice on common health concerns, like remedies for a runny nose. While LLMs can offer valuable insights, it's crucial to remember that their responses are rooted in a probabilistic recollection of internet data, and thus accuracy cannot be absolutely guaranteed.

It is always prudent to cross-reference information with reliable primary sources, particularly in situations where precision is paramount. 

For optimal performance and to manage computational resources effectively, it is advisable to maintain relatively short conversation lengths and to initiate a new chat when transitioning between unrelated topics. This prevents the model's context window from becoming cluttered with irrelevant information. 

Furthermore, users should remain conscious of the specific model they are interacting with, as different models (e.g., GPT-4o, GPT-4o mini) possess varying capabilities and are often associated with distinct pricing tiers (free, Plus, Pro). Exploring the diverse range of models available from various providers (e.g., Claude, Gemini, Grok) can lead to finding the most suitable tool for a given task.

Thinking models represent a more advanced stage in LLM development. These models are further refined through reinforcement learning, a process that allows them to learn and internalise sophisticated problem-solving strategies akin to human internal reasoning. This enhanced cognitive ability often results in significantly improved accuracy, particularly when tackling complex tasks such as mathematical problems and coding challenges. These models (e.g., OpenAI models with names starting with the letter 'o', DeepSeek R1) may require more processing time and generate longer outputs as they engage in more extensive "thought" processes. While standard models excel in providing quick responses for straightforward queries, thinking models are invaluable when a higher degree of accuracy is essential for intricate problems. Some platforms, like Grok, offer users a specific "think" mode to activate this deeper reasoning capability.

Tool use expands the functionality of LLMs beyond their inherent knowledge by enabling them to interact with external resources. Internet search stands out as a particularly useful tool, allowing models to access and integrate up-to-date information. This is typically achieved when the model emits a special token, signalling the LLM application to perform a web search based on the user's query. The application then retrieves relevant web pages, extracts their content, and incorporates it into the model's context window. This mechanism is vital for addressing questions about recent events or information not included in the model's original training data. Platforms such as Perplexity AI and ChatGPT have seamlessly integrated internet search capabilities, often providing citations to the sources used. The level of integration and the need for explicit user instruction to activate search can vary across different models and applications. Employing search tools is particularly beneficial for queries where the desired information is likely to be found within the top search results, such as current affairs, factual lookups, and information subject to frequent updates.
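The emit-a-special-token loop described above can be sketched schematically. Everything here is hypothetical: the token names and both stub functions are illustrative stand-ins, not any vendor's real API; the point is only that the application, not the model, performs the search and feeds the retrieved text back into the context window.

```python
# Hypothetical sketch of the tool-use loop. Token names and the two
# stub functions are illustrative, not a real vendor API.
SEARCH_START = "<search>"
SEARCH_END = "</search>"

def fake_model(context):
    # Stand-in for the LLM: requests a search once, then answers.
    if "results:" in context:
        return "Answer based on the retrieved pages."
    return f"{SEARCH_START}latest news about X{SEARCH_END}"

def fake_search(query):
    # Stand-in for the web-search backend.
    return f"results: top pages for '{query}'"

def chat(user_message):
    context = user_message
    while True:
        output = fake_model(context)
        if SEARCH_START in output:
            # The application detects the special token, runs the search,
            # and appends the retrieved text to the context window.
            query = output.split(SEARCH_START)[1].split(SEARCH_END)[0]
            context += "\n" + fake_search(query)
        else:
            return output

print(chat("What happened with X today?"))
```

The same pattern, iterated many times over many documents, is essentially what the deep-research features described below do at much larger scale.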

Deep research represents an even more sophisticated capability, combining internet search with prolonged and in-depth analysis. Currently accessible through premium subscription tiers (e.g., ChatGPT Pro), this feature enables the model to conduct multiple iterative searches, meticulously analyse retrieved documents and research papers, and ultimately generate comprehensive reports on intricate subjects. Similar functionalities are also available on platforms like Perplexity AI (Deep Research) and Grok (Deep Search). Deep research outputs often resemble custom-written research papers, complete with detailed summaries and citations. While exceptionally powerful for thorough investigations, users should maintain a degree of critical evaluation due to the inherent potential for the model to generate inaccuracies or "hallucinations". Examples of practical applications include conducting detailed product comparisons, exploring complex scientific topics, and synthesising information from a multitude of sources. The underlying process involves effectively providing the LLM with a wealth of relevant documents within its context window, similar to a regular internet search but with a significantly more sustained and analytical approach.

LLMs also offer the ability to process file uploads, allowing users to directly introduce specific documents (e.g., PDF files) into the model's context window. This functionality proves highly useful for tasks such as summarising lengthy research papers, analysing personal documents like blood test results, or even enhancing the experience of reading books by having an AI assistant readily available for questions and clarifications. While dedicated tools designed for seamless book interaction with LLMs are still in their nascent stages, the current methods of copying and pasting relevant text sections remain a valuable way to leverage this capability.

Another exceptionally powerful tool in the LLM toolkit is the Python interpreter. This feature enables the LLM to not only generate code but also to execute it. When confronted with tasks that exceed its internal computational abilities (e.g., complex mathematical calculations), a model equipped with a Python interpreter, such as ChatGPT, can write and run Python code to arrive at accurate solutions. The availability of this feature can differ among various LLMs. ChatGPT Advanced Data Analysis builds upon the Python interpreter to offer the functionality of a junior data analyst. Users can upload data (e.g., in spreadsheet format), and the model can then write and execute code to analyse the data, generate visualisations such as figures and plots, and provide insights. While this offers a potent means for data exploration and analysis, users must exercise caution by carefully reviewing the generated code and the resulting outputs to identify any potential errors, inconsistencies, or implicit assumptions made by the model.
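To make the "junior data analyst" idea concrete, here is the kind of code such a feature might write and execute behind the scenes. The data, file contents, and column names are invented for illustration; in the real feature the user uploads a spreadsheet.

```python
# Illustrative example of code a data-analysis feature might generate.
# The inline CSV stands in for a user-uploaded spreadsheet.
import csv, io, statistics

uploaded = io.StringIO("month,revenue\nJan,120\nFeb,135\nMar,150\n")

rows = list(csv.DictReader(uploaded))
revenue = [float(r["revenue"]) for r in rows]

print("mean revenue:", statistics.mean(revenue))
print("growth Jan->Mar:", revenue[-1] - revenue[0])
```

This is exactly the class of output worth double-checking, as the text advises: a wrong column name or an implicit assumption (e.g., that rows are already sorted by month) would silently skew the result.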

Claude Artifacts is a distinctive feature specific to the Claude platform. It empowers the model to generate functional applications and visual diagrams directly within the user's web browser. For instance, it can transform a block of text into an interactive flashcard application or create conceptual diagrams of book chapters using the Mermaid syntax. This allows for more dynamic and visual engagement with information.

For professional software development, dedicated Integrated Development Environments (IDEs) like Cursor are often favoured over the in-browser code generation features of LLMs. Cursor, which can leverage underlying models such as Claude 3.7 via API integration, operates directly with the user's local file system, providing the necessary contextual awareness for intricate coding projects. Features like the Composer within Cursor facilitate autonomous code generation and modification across multiple files, a workflow sometimes playfully termed "vibe coding".

LLMs are increasingly capable of interacting through multiple modalities, extending beyond simple text-based exchanges to include audio, images, and video. For audio input, users can utilise system-wide speech-to-text functionalities or the integrated microphone options available in mobile applications. For audio output, many LLM applications offer a "read aloud" feature powered by text-to-speech technology. Advanced Voice Mode (also known as true audio), found in models like ChatGPT and Grok, processes audio natively using audio tokens, enabling more natural and nuanced conversations, including voice imitation and even the generation of sounds. NotebookLM from Google introduces a unique audio interaction through its podcast generation feature, which creates custom audio content from user-uploaded documents.

Image input capabilities allow LLMs to analyse the content of uploaded images and respond to related queries. This can be applied to tasks such as extracting information from nutrition labels, interpreting the data presented in medical test results, or understanding the context of visual content like memes. Often, Optical Character Recognition (OCR) technology is employed to first transcribe any text present within the image before the LLM proceeds with its analysis. Conversely, image output functionalities, exemplified by ChatGPT's integration with DALL-E 3, enable the generation of entirely new images based on textual prompts provided by the user. 

Several quality of life features significantly enhance the user experience with LLMs. ChatGPT's memory feature allows the model to retain information across different conversations, gradually learning about the user's preferences and enabling more personalised and contextually relevant responses over time. Custom instructions empower users to define global preferences regarding the model's behaviour, tone of voice, and even a sense of identity, which are then applied consistently across all interactions. Custom GPTs, a feature within ChatGPT, enable users to create tailored versions of the model for specific, recurring tasks by pre-defining instructions and providing illustrative examples. These are particularly beneficial for streamlining repetitive workflows, such as vocabulary extraction for language learning or the creation of highly specific translation tools. Custom GPTs essentially save prompting time by storing detailed instructions that are automatically applied whenever that specific custom GPT is invoked; their creation and management are typically handled through a dedicated section of the platform's user interface. Combining various functionalities, such as image input with OCR and custom instructions, can lead to the development of powerful, bespoke tools, as demonstrated by a custom GPT designed for translating and dissecting subtitles embedded in video screenshots.
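Conceptually, a custom GPT (or a set of custom instructions) is a saved prompt prefix that the platform silently prepends to every conversation. This sketch makes that idea concrete; the instruction text and function names are illustrative, not how any platform actually implements the feature.

```python
# Conceptual sketch: a custom GPT as a saved prompt prefix. The
# instruction text and helper names are illustrative only.
CUSTOM_INSTRUCTIONS = (
    "You are a Korean vocabulary extractor. For each sentence, list "
    "the key words with a simple dictionary-style translation."
)

def build_context(user_message, history=()):
    # The saved instructions are prepended on every call, saving the
    # user from re-typing the detailed prompt each time.
    parts = [CUSTOM_INSTRUCTIONS, *history, user_message]
    return "\n".join(parts)

ctx = build_context("Translate: 안녕하세요")
print(ctx.startswith(CUSTOM_INSTRUCTIONS))  # the prefix is always present
```

The time saving comes entirely from reuse: the detailed instructions are written once and applied automatically on every invocation.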

In summary, the realm of LLM applications is a vibrant, rapidly evolving, and highly dynamic space. While ChatGPT stands as a foundational and feature-rich platform, other contenders are swiftly progressing, often achieving feature parity or even surpassing ChatGPT in specific areas. For instance, while ChatGPT now offers internet search, platforms like Perplexity AI, which have focused on search for a longer period, may still offer superior performance in this domain. Similarly, for tasks like quick web application prototyping and diagram generation, Claude Artifacts presents unique advantages not found in ChatGPT. When it comes to purely conversational interactions, ChatGPT's advanced voice mode is notable, although users seeking a less "cringey" or more uninhibited experience might turn to Grok. Ultimately, each LLM application possesses its own set of strengths and weaknesses, but ChatGPT remains a strong and versatile default choice. Key considerations when navigating this landscape include the underlying model's capabilities, the user's subscription tier, the availability and utility of thinking models, the suite of tools offered (such as internet search and code interpreters), the support for multimodal interactions (audio, image, video), and the array of quality of life features (like memory and custom instructions). It's also important to note that feature availability can vary between the web and mobile versions of these applications. This dynamic environment necessitates ongoing exploration and experimentation to identify the tools that best align with individual needs and preferences.
