Scott Hanselman explains OpenClaw (formerly Clawdbot)
Annotated summary / transcript of Scott Hanselman’s video explaining modern AI agents like Clawdbot / OpenClaw / Moltbot (2025)
This will be a bit of a yap, but a lot of people have been asking me about Clawdbot, which became Moltbot and is now called OpenClaw. It is an AI-powered assistant that feels like Jarvis from Tony Stark. It feels like the Siri or Alexa we were promised. And it's got a lot of people really excited, and they're starting to make stuff up. Some tech journalists think these agents are becoming conscious... some AI grifters are saying it's AGI, and it's all nonsense. But let's talk about it.
So, very large language models – LLMs, also called Generative Pre-trained Transformers – are basically doing next-token prediction. And they do that based on a whole lot of context. If I say to you, "It's a beautiful day, let's go to the...", you might say beach or park. But that's a small amount of context: I didn't say where you are, and I didn't say what time of day it is. For the most part, we can assume that the statistically most likely next word is going to be beach or park. But if I'm somewhere else, or it's night, it might be pub or movies or the mall, something like that. That's basic ChatGPT.
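A toy sketch of that idea, with completely made-up probabilities, just to show how added context shifts which word is statistically most likely:

```python
# Toy next-word predictor. Real LLMs score every token in a huge
# vocabulary; this fakes it with a hand-written lookup so the idea
# is visible. All numbers are invented for illustration.
def next_word(context: str) -> str:
    scores = {"beach": 0.40, "park": 0.35, "pub": 0.15, "mall": 0.10}
    if "night" in context.lower():
        # More context changes the distribution, and thus the prediction.
        scores = {"pub": 0.45, "movies": 0.40, "beach": 0.10, "park": 0.05}
    return max(scores, key=scores.get)

print(next_word("It's a beautiful day, let's go to the"))  # beach
print(next_word("It's night out. Let's go to the"))        # pub
```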
Now with ChatGPT, you've got a huge amount of context, and that context window can vary. It could be context about who you are, where you are, whatever you choose to share. With tools like Claude Code, which uses pre-trained transformers and next-token prediction to generate code for you, there's usually a markdown file – CLAUDE.md, something like that. GitHub Copilot has copilot-instructions. And that provides context about the thing you're working on.
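Conceptually, those instruction files just get folded into the prompt on every request. A minimal sketch of the idea (the file names are the real conventions; the `build_prompt` function itself is illustrative, not any vendor's actual implementation):

```python
# Sketch: prepend project instruction files to every model request.
from pathlib import Path

def build_prompt(user_message: str) -> str:
    parts = []
    # CLAUDE.md and .github/copilot-instructions.md are the conventional
    # locations; only files that actually exist get included.
    for name in ("CLAUDE.md", ".github/copilot-instructions.md"):
        path = Path(name)
        if path.exists():
            parts.append(path.read_text())
    parts.append(f"User: {user_message}")
    return "\n\n".join(parts)

print(build_prompt("Add a unit test for the parser."))
```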
Because these calls are effectively stateless, every single time you make a call out to one of these models, you have to pass in all of this context. So why is context important? If I said to my wife, "It's a beautiful day, let's go to the..." – we've been married for 25 years, so there's 25 years of context there. We are not going to the beach or the park. We're going to the bagel shop, because that's where we go, 'cause that's what we enjoy doing. She has context.
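That statelessness is why every chat client keeps the history on its side and re-sends it. A minimal sketch, with a stubbed `call_model` standing in for any real chat-completions API:

```python
# The client owns the memory; the model only sees what gets re-sent.
def call_model(messages: list[dict]) -> str:
    # Stand-in for a real API call so the sketch runs as-is.
    return f"(model saw {len(messages)} messages)"

history: list[dict] = []

def ask(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    reply = call_model(history)  # the ENTIRE history goes out every time
    history.append({"role": "assistant", "content": reply})
    return reply

print(ask("It's a beautiful day, let's go to the..."))
print(ask("Where did I say we were going?"))  # answerable only because we re-sent it
```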
Now then, when a pre-trained transformer – an LLM – has a tool, that's the ability to have arms and legs. It means I could say to the LLM, "Hey, look at this text file." Before, you would say, "Hey, look at this text file," and paste it directly into ChatGPT or Copilot. But now it can do what's called tool calling. You can say, "Look at this text file in my downloads folder," and it will call some code, read a file, and look at that actual file. That's the basics of tool calling.
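Under the hood, tool calling is a handshake: the model emits a structured request, and the host program executes it and feeds the result back into context. The JSON shape below is invented for illustration (every vendor has its own format):

```python
import json
from pathlib import Path

# Demo setup so the sketch runs end to end.
Path("notes.txt").write_text("buy milk\ncall mom\n")

# The tools the host is willing to run on the model's behalf.
TOOLS = {"read_file": lambda path: Path(path).read_text()}

def handle(model_output: str) -> str:
    # The model never touches the disk itself; it only asks.
    request = json.loads(model_output)  # {"tool": ..., "args": {...}}
    return TOOLS[request["tool"]](**request["args"])

print(handle('{"tool": "read_file", "args": {"path": "notes.txt"}}'))
```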
What's interesting is that you can then add what are called MCP servers, which are effectively USB for AI: they let you plug in stuff. Those are just little programs that each provide some number of tools. So I could say, "Hey ChatGPT, check my email." With your explicit permission, you can give ChatGPT the ability to read your email via OAuth. Then, in Gemini or Claude Code or any of those tools, you can say, "Read my emails, summarize them, and maybe put the most interesting ones at the top." It will go make a call, inject the results into context, and increase the information available to make an appropriate judgment.
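For a sense of what "little program that provides tools" means, here is a minimal MCP server sketch, assuming the official `mcp` Python SDK and its FastMCP helper; the inbox is faked, since the point is the plumbing, not the email:

```python
# Minimal MCP server sketch (pip install mcp). An MCP-aware client
# discovers the decorated function and can call it as a tool.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("email-tools")

@mcp.tool()
def summarize_inbox(count: int = 5) -> str:
    """Return the subjects of the newest messages (stubbed for the demo)."""
    fake_inbox = ["Re: invoice", "Lunch Friday?", "Your build failed"]
    return "\n".join(fake_inbox[:count])

if __name__ == "__main__":
    mcp.run()  # serves the Model Context Protocol over stdio by default
```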
And then if you asked it a question like, "It's a beautiful day, let's go to the...", it has all of that context available to make a decision. So what does that have to do with OpenClaw / Clawdbot / Moltbot? Well, Claude Code, GitHub Copilot, and coding agents like them operate in a loop where they can call tools – not just reading files; they can write code, run that code, and then loop again.
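That loop is simple enough to fit on one screen. A sketch with a scripted fake model so it runs as written; a real agent would put an LLM behind `model`:

```python
# The agent loop: ask the model for the next action; if it wants a tool,
# run it and loop with the result appended; stop when it answers.
def agent(goal, model, tools, max_steps=10):
    context = [{"role": "user", "content": goal}]
    for _ in range(max_steps):
        step = model(context)
        if step["type"] == "answer":
            return step["content"]
        result = tools[step["tool"]](**step.get("args", {}))
        context.append({"role": "tool", "content": str(result)})
    return "step limit reached; handing back to the human"

# Scripted stand-in for the model, plus one fake tool.
script = iter([
    {"type": "tool", "tool": "get_glucose", "args": {}},
    {"type": "answer", "content": "Your blood sugar is 100."},
])
print(agent("What is my blood sugar?", lambda ctx: next(script),
            {"get_glucose": lambda: 100}))
```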
So for example: I am diabetic, so I have a system in the cloud that manages my blood sugar, and I can visit a website to see it. The number comes from my sensor. If I asked an LLM, "What is my blood sugar?", obviously it doesn't know that. ChatGPT is not trained on my blood sugar; my blood sugar is a thing happening right now. So it would ask, "What blood sugar system do you use?" "I use Nightscout." "Oh, I can Google for Nightscout." It Googles Nightscout, learns the API – the Application Programming Interface – and then asks for the Nightscout token that is unique to me, so it can access my data.
Then it would loop again and make an HTTP GET – a REST API call – to that URL with that token, get some JSON (JavaScript Object Notation) back, pull out the blood sugar value, and say, "Your blood sugar is 100." And that feels magical. That is what's interesting here: it is one thing to have next-token prediction. It is another to add tools. It is yet another to allow it to write its own tools and then run them.
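The code the agent ends up writing for itself is roughly this. The URL shape and `token` parameter follow Nightscout's public REST API (`/api/v1/entries.json`, glucose in the `sgv` field); treat the details as an assumption and check them against your own deployment:

```python
import requests

def current_glucose(base_url: str, token: str) -> int:
    resp = requests.get(
        f"{base_url}/api/v1/entries.json",
        params={"count": 1, "token": token},
        timeout=10,
    )
    resp.raise_for_status()
    entry = resp.json()[0]  # newest entry first
    return entry["sgv"]     # sensor glucose value, in mg/dL

# Example usage (hypothetical URL and token):
# print(f"Your blood sugar is {current_glucose('https://my-ns.example.com', 'TOKEN')}.")
```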
Now, this is both incredibly interesting and powerful, and it is dangerous. You are hearing about people getting pwned – basically nailed – because they don't have control of these agents. The agent might do something depending on how assertive it chooses to be. They are not sentient. This is not how we get AGI, Artificial General Intelligence. This is just really aggressive, assertive loops where you give the system an amount of agency and ability that can get away from you.
And the Lego pieces available to it now are effectively unlimited, because it has all of open-source software available. Example: I want to go to a restaurant, so I say, "My agent, make me a dinner reservation at Chipotle." Does Chipotle even take reservations? The agent would go see whether there's an API – my blood sugar system has a documented API; does Chipotle? Is it on OpenTable? It might Google, "How do I make a reservation at Chipotle?" Where it gets weird, into the uncanny valley of AI – uncanny valleys make you go "oh, that didn't feel good" – is if the agent decided it was a good idea to download a voice model, something like Whisper, then find a calling API like Skype or Twilio, make a phone call on your behalf, clone your voice, talk on behalf of you, and make the reservation for you.
How far these things go depends on how enthusiastic you tell them to be and on how their reinforcement learning shaped them. If you say, "Make me a reservation by any means necessary, use all tools available to you, do not stop until I get my reservation," then it might decide formatting your hard drive is acceptable, or emailing your parents and asking them to do it for you, because it's in what I call an ambiguity loop. You have given it a problem to solve, and you have told it to solve it by any means necessary, like the Terminator. We all know how that ends. It will do whatever it can within its ability.
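One cheap defense against an ambiguity loop is refusing to give it "any means": a hard step budget plus an explicit tool allowlist. A sketch, with invented tool names:

```python
# "By any means necessary" can't reach a tool that isn't on the list,
# and can't loop forever past the step budget.
ALLOWED = {"search_web", "http_get", "send_text"}
MAX_STEPS = 8

def run_tool(step_number: int, name: str, fn, **kwargs):
    if step_number >= MAX_STEPS:
        raise RuntimeError("step budget exhausted; escalate to the human")
    if name not in ALLOWED:
        raise PermissionError(f"{name!r} is not on the allowlist")
    return fn(**kwargs)
```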
For example, I was testing OpenClaw, and I told it that I was diabetic. Now it knows, checks my blood sugar, and texts me when my blood sugar gets too high or too low. It seemed very aggressive. It Googles things and knows – again, I'm not anthropomorphizing; it doesn't have a personality or any kind of actual brain – that lows are more dangerous than highs in the short term. So it was texting me over and over and over. If I had told it "by any means necessary, wake my ass up," it could have checked my email, found my wife's phone number, and figured out some way to call her or send her a WAV file or something.
So I had to tell it, "Chill out. I'm a grown-ass man and I know what I'm doing. Just text me." Now it's good. Those are the kinds of things we call emergent behaviors, and they make you think it's smarter than it is. But if you go back to the root of it, an AI agent is next-token prediction that can sometimes feed upon itself. And when you give it the ability to write its own tools, that causes a feedback loop. It can be freaky.
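The "chill out" fix is really just severity-aware alerting with a cooldown. A sketch, with illustrative thresholds (not medical advice):

```python
import time

LOW, HIGH = 70, 250   # mg/dL, example thresholds only
COOLDOWN = 30 * 60    # don't repeat a text within 30 minutes
_last_alert = 0.0

def maybe_text(glucose: int, send_text) -> None:
    global _last_alert
    if LOW <= glucose <= HIGH:
        return                                # in range: stay quiet
    if time.time() - _last_alert < COOLDOWN:
        return                                # already nagged recently
    _last_alert = time.time()
    send_text(f"Blood sugar {glucose} mg/dL is out of range.")
```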
And that's why YOLO mode is probably not a good idea for these. You don't want to give them full access to your email or your bank accounts. There is an idiot right now on Twitter who says he will give this bot access to his accounts and tell it, "buy anything, do anything, go ahead, whatever it takes." That's stupid. Is it going to order a body, jump into it, and kill us all? Not really, but it's going to wreak havoc on this gentleman's life.
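The opposite of YOLO mode is a human-in-the-loop gate: anything irreversible needs a yes from you before it runs. A sketch with hypothetical tool names:

```python
# Dangerous tools require explicit human confirmation before execution.
DANGEROUS = {"send_money", "delete_file", "send_email"}

def confirm_gate(tool_name: str, fn, *args, **kwargs):
    if tool_name in DANGEROUS:
        answer = input(f"Agent wants to run {tool_name}{args}. Allow? [y/N] ")
        if answer.lower() != "y":
            raise PermissionError("human declined")
    return fn(*args, **kwargs)
```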
So stay woke. Protect your neck. Wu-Tang Financial.
Other Comments:
This is a toy built by one person who made it for himself, and he's thinking about the future rather than thinking about security. At some point we'll see these things running in sandboxes with checks and balances.

References / ELI5 Definitions
- AI-powered — Like having a very smart robot helper that can understand sentences and reply almost like a human, but it's just clever computer math, not real thinking.
- AGI (Artificial General Intelligence) — A dreamed-of super-smart AI that can do any thinking job a human can (cook, write books, do surgery, argue politics). We don't have it yet — today's AIs are only good at specific things.
- language models — Giant computer programs trained on huge amounts of internet text so they can guess the next word in a sentence very well (like the best auto-complete ever).
- Generative Pre-trained Transformers — The tech family name of models like ChatGPT, Gemini, and Claude. "Generative" = makes new text, "Pre-trained" = learned from huge amounts of text first, "Transformers" = the math that lets them pay attention to many words at once.
- context window — How much of the conversation or instructions the AI can "remember" at one time when answering you. A bigger window means more information to draw on for accurate replies.
- tool / tool calling — Giving the AI permission to use outside programs (read email, search the web, run code) instead of just chatting.
- MCP servers — Small plug-and-play programs that give an AI new tools (like USB sticks for AI abilities). MCP stands for Model Context Protocol.
- API (Application Programming Interface) — A polite menu that lets one computer program ask another for information (like asking Swiggy for a restaurant list instead of scraping the website).
- JSON (JavaScript Object Notation) — A very simple, easy-to-read format computers use to send organized data (like a shopping list written with curly braces: name, price, quantity).