The Windsurf Story

Paraphrased notes from an interview with Windsurf CEO and Co-Founder, Varun Mohan, based on a summary initially generated by NotebookLM:

Key Points:

  • Windsurf's origin traces back to a company called Exafunction, which initially focused on building GPU virtualisation systems for workloads such as large-scale simulations in robotics and autonomous vehicles. With the advent of large generative models around mid-2022, the company pivoted to the application layer, leveraging its infrastructure background to build code-related tools.
  • The decision to build their own Large Language Models (LLMs) and inference stack early on was driven by the limitations of existing open models, particularly their inability to handle "fill-in-the-middle" completion, which is crucial for coding, where changes are often needed within existing lines or snippets.
  • Evaluating new models for coding use cases is complex due to their non-deterministic properties. Windsurf uses a rigorous evaluation infrastructure, inspired by practices in autonomous vehicles, which includes comprehensive evaluation suites that test not only end-to-end task completion but also metrics like retrieval accuracy, edit accuracy, and undesirable behaviours such as redundant changes. This allows them to quickly assess performance on tens of thousands of repositories by programmatically testing against ground truth from open source commits.
  • Latency is identified as the number one challenge for Windsurf, especially for the passive code-suggestion experience. They aim for very low latency: under a couple of hundred milliseconds to first token and hundreds of tokens per second during generation. Optimising GPU usage involves balancing compute load (highly parallelisable) with memory bandwidth (a potential bottleneck). Even a small increase in latency (e.g., 10 milliseconds) can materially affect users' willingness to use the product. Physical data centre location and local network conditions also play a role.
  • Handling context for large and growing codebases is a significant challenge. Current models have limited context windows, and using larger ones incurs higher costs and latency. Windsurf tackles this through a mixture of techniques, including approximations like better checkpointing (with some data loss) and enhancing retrieval accuracy. They don't see a single "silver bullet" solution but rather a combination of improved checkpointing, better context usage, faster models, and leveraging codebase structure like knowledge graphs and dependencies.
  • Windsurf's approach to indexing and retrieval combines multiple strategies, including embedding-based and keyword-based searches. They found that relying solely on embedding search resulted in low recall. Therefore, they fuse various methods like keyword search, embedding search, knowledge graph insights, and Abstract Syntax Tree (AST) dependencies, often performing additional computation at retrieval time to achieve higher precision and recall.
  • The company leverages its infrastructure background, having built custom GPU virtualisation and inference systems. They have achieved FedRAMP High compliance, which was facilitated by keeping their systems "very tight" and managing dependencies. Their system processes over 500 billion tokens of code daily.
  • Windsurf is built on a fork of Code OSS (the open-source base of VS Code), allowing them to innovate while leveraging a familiar environment and existing extension ecosystem. They avoided Microsoft's proprietary components, necessitating the development of their own versions of essential extensions like language servers and remote SSH. To support other IDEs like JetBrains, Eclipse, and Vim, they developed a shared binary (language server) that performs the heavy lifting, enabling support across multiple platforms without extensive code duplication.
  • The development of their agent product, Cascade, was a result of models improving significantly, combined with advancements in their retrieval stack for larger codebases and the development of systems for quickly editing code based on high-level plans. They use Windsurf internally for development, with an "insiders developer mode" and a tiered release system to dogfood features and get immediate feedback.
  • Windsurf sees potential in enabling non-developers to build software, exemplified by their own internal use cases where non-developers build simple, often stateless, applications to replace bespoke SaaS tools. This suggests a future where the reduced cost and effort of software creation could lead to a proliferation of custom internal tools, potentially disrupting certain categories of SaaS products.
  • Regarding the future of software engineering, Varun is optimistic, countering predictions of massive job losses. He argues that the decreased cost of building software increases the return on investment, encouraging companies to build more technology, thereby increasing demand for engineers. AI tools are seen as increasing engineer efficiency and reducing friction and mental fatigue, allowing them to focus on higher-level problem-solving and collaboration. Great engineers are defined by their problem-solving skills and ability to navigate layers of abstraction, a skill that remains highly valuable.
  • Windsurf supports MCP (Model Context Protocol) and sees its potential in democratising access to company systems from within the coding environment and in allowing secure internal integrations. However, open questions remain about the right level of granularity for access controls and whether the current specification provides enough flexibility compared to what LLMs can already achieve with zero-shot integration.
  • On a personal note, Varun manages to incorporate endurance sports into his busy schedule by making exercise convenient (e.g., indoor cycling) and dedicating specific time slots. He recommends the book "The Idea Factory" about Bell Labs for its insights into balancing innovation with commercial goals.
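
As a rough illustration of the evaluation style described in the key points (scoring a model's proposed edit against ground truth taken from an open-source commit), the sketch below compares a model-proposed file to the real post-commit file. This is a hypothetical example, not Windsurf's actual harness; the function names, the metrics' exact definitions, and the use of Python's difflib are assumptions.

```python
import difflib

def edit_accuracy(predicted: str, ground_truth: str) -> float:
    """Similarity between the model-proposed file and the ground-truth
    post-commit file, as a crude stand-in for an edit-accuracy metric."""
    return difflib.SequenceMatcher(None, predicted, ground_truth).ratio()

def redundant_change_count(before: str, predicted: str, ground_truth: str) -> int:
    """Count lines the model changed that the real commit left untouched,
    a rough proxy for the 'redundant changes' behaviour mentioned above."""
    changed_by_model = {
        line[2:]
        for line in difflib.ndiff(before.splitlines(), predicted.splitlines())
        if line.startswith("- ")  # lines removed/altered by the model
    }
    changed_by_commit = {
        line[2:]
        for line in difflib.ndiff(before.splitlines(), ground_truth.splitlines())
        if line.startswith("- ")  # lines removed/altered by the real commit
    }
    return len(changed_by_model - changed_by_commit)
```

Run over tens of thousands of repositories, metrics like these can be computed programmatically with no human in the loop, which is what makes commit histories attractive as ground truth.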
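The multi-signal retrieval described in the key points has to merge ranked results from several retrievers (keyword search, embedding search, and so on). One common way to do that is reciprocal rank fusion (RRF), shown below purely as an illustrative sketch rather than Windsurf's actual method; the snippet IDs and the damping constant k = 60 are assumptions.

```python
from collections import defaultdict

def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion: combine several ranked lists of document IDs
    into one ranking. Larger k dampens the influence of any single retriever."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] += 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from two retrievers over the same codebase.
keyword_hits = ["utils.py", "auth.py", "db.py"]
embedding_hits = ["auth.py", "session.py", "utils.py"]
fused = rrf_fuse([keyword_hits, embedding_hits])
# "auth.py" ranks first: it appears near the top of both lists.
```

Fusion of cheap ranked lists like this is one way to raise recall beyond what embedding search alone provides, consistent with the low-recall finding mentioned above; heavier reranking can then be spent only on the fused shortlist.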

Sound Bites:

  • "...technology actually increases the ceiling of your company much faster"
  • "...if you have a good test, it's a lot more easy to verify software"
  • "...fine-tuning was a bump but it was a very modest bump compared to what great personalization and great retrieval could do"
  • "...when you make a developer wait, the answer better be 100% correct"
  • "...a 10 millisecond increase in latency affects people's willingness to use the product materially"
  • "I would be totally fine if 50% of the bets we make don't work"
