Introducing GuardRails: Spec-Driven Development for Your Coding Agent

A road with futuristic cars driving on a highway with rails and cameras overhead scanning cars to ensure they are where they're supposed to be.
GuardRails uses gates to ensure tasks don't just disappear into the void. Gates prevent AI coding agents from auto-closing tasks.

I originally wrote this post completely differently, but after some feedback I decided to rewrite it entirely to better introduce GuardRails. I really like the term "spec-driven development" as opposed to "vibe coding" because it better matches how I use Claude Code with GuardRails. I didn't come up with the terminology, but I will gladly adopt it.

So what does any of this mean? What is spec driven development? Are there other tools out there that accomplish the same outcome?

Spec-driven development is a step up from simply vibe coding, and it borrows from standard software engineering practices. Instead of asking for simple things with no context, you give the model more to work with. There are a few different techniques to achieve this, and if it's done right, you can write very simple prompts and still get higher quality output from your model. Think of a developer sprint, then condense it into a 30-minute session with an AI model.

GuardRails gives your AI model tooling to track tasks. Think of Jira, Azure DevOps (ADO) boards, GitHub Issues, or whatever ticketing system you've used: it is now handed directly to your AI model to use. This lets us direct the model in interesting ways, including making sure it does not forget tasks. When Claude Code wipes old memories away, it can always go back to the ticketing system for critical details, which lets it keep running for hours.

Will it mess up? Sure, but what developer writes perfect code the first time? What do you do? What do developers do? They go back and fix the bug. You can either let Claude code as much as it can, asking it to commit changes per task accomplished, so you have a paper trail, or you can sit alongside Claude as it churns through GuardRails tasks.

So we've got a ticketing system, but why is this any different from Claude's TODO list or someone else's markdown file system? Well, that's where things get more interesting. I built GuardRails out of minor frustrations with Beads, which is a fantastic project. If you choose not to try my take on this problem, you should check out Beads. I went with building my own because Beads is too married to Git: it works with Git hooks, and I may not always have a project that's on Git. Maybe it's on SVN, or TFS, or literally just a zip file someone sent me.

Things I've also added so far include GitHub Issues syncing. I use GitHub Issues for my personal and private projects, so I figured it might be worth adding to GuardRails, and sure enough it was the best decision I made. The sync is two-way, so if you want to pull down a GitHub issue from a project, you can "claim it", meaning the tool will leave a comment on the GitHub issue so anyone else who has GuardRails or something similar will know not to touch the task.

I'm going to walk you through my workflow as I have used it with Beads and now GuardRails. It's going to be a really simple rundown, but I want to illustrate it with a brand new project.

As you can see, I have a folder called "comments" under Projects. Since I'm improvising, I'm going to make this a React app that we'll write with Claude Code using GuardRails. Since I feel safe enough using GuardRails, I'm going to run Claude in dangerous mode (and also because I want to get through this task much quicker so I can finally finish this blog post).

Now here is where the magic begins. I'm going to ask Claude to start by initializing "GuardRails" installed globally as "gur" via:

go install github.com/Giancarlos/guardrails/cmd/gur@latest

I'll give Claude some hints, because gur is a new tool that Claude doesn't know about. I'm basically nobody, right? So here's the prompt I'm going to write:

I'm working on a brand new greenfield project. I want to build a very simple project let's start by setting up guardrails, which I installed using "go install" as gur, this globally accessible command will handle ticketing / tasks for you. Please run the help commands, once you've initialized it and copy a summary of how to use the tool into CLAUDE.md

Now that we've initialized GuardRails the real fun begins.

Showing the current file structure with guardrails installed and the CLAUDE.md file filled in.

So I prompt Claude with my initial prompt:

Alright let's create some tasks for the following project: I want to build a ReactJS application using raw CSS (so no third party libraries) with a material design inspired hand crafted CSS, I don't have a back-end system I want to use for this, so let's just use the browser's localStorage to save comments. This is a comment system similar to what places like YouTube or Reddit would have. I can leave a comment, and reply to a comment, it only nests once, so if I reply to someone in a nested comment it just nests on the parent not a new nest, similar to YouTube. I want the username displayed. I want to ask the user for their username and for the comment. Create tasks for all of these things, but do not start on any of them just yet.
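The "it only nests once" rule in that prompt is the part that's easiest to get subtly wrong, so here is a rough sketch of how I read it. This is not the code Claude generated (that app is React); it's Go purely for illustration, and the `Comment` and `Thread` types and their field names are hypothetical. The idea: a reply to a reply gets attached to the original top-level comment, so the thread never goes deeper than one level.

```go
package main

import "fmt"

// Comment is a hypothetical shape for one stored comment; the field names
// are made up for this sketch and are not taken from the generated app.
type Comment struct {
	ID       int
	ParentID int // 0 means this is a top-level comment
	Username string
	Body     string
}

// Thread keeps comments in the order they were posted.
type Thread struct {
	comments []Comment
	nextID   int
}

// find returns the comment with the given ID, if it exists.
func (t *Thread) find(id int) (Comment, bool) {
	for _, c := range t.comments {
		if c.ID == id {
			return c, true
		}
	}
	return Comment{}, false
}

// Add posts a comment. replyTo is 0 for a top-level comment, otherwise the
// ID of the comment being answered.
func (t *Thread) Add(username, body string, replyTo int) (Comment, error) {
	parentID := 0
	if replyTo != 0 {
		target, ok := t.find(replyTo)
		if !ok {
			return Comment{}, fmt.Errorf("comment %d does not exist", replyTo)
		}
		parentID = target.ID
		// Replying to a reply attaches to that reply's parent instead,
		// so the thread never goes deeper than one level.
		if target.ParentID != 0 {
			parentID = target.ParentID
		}
	}
	t.nextID++
	c := Comment{ID: t.nextID, ParentID: parentID, Username: username, Body: body}
	t.comments = append(t.comments, c)
	return c, nil
}

func main() {
	var t Thread
	top, _ := t.Add("ana", "First!", 0)
	reply, _ := t.Add("bob", "Welcome", top.ID)
	nested, _ := t.Add("cy", "Replying to bob", reply.ID)
	// The third comment answers bob's reply but still hangs off comment 1.
	fmt.Println(top.ParentID, reply.ParentID, nested.ParentID) // prints: 0 1 1
}
```

Storing a flat list with a parent ID, rather than a tree, is just one way to do it; it happens to map nicely onto a single JSON array in localStorage.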

This is what I call "brain dumping" into Claude. As you can see below, Claude figured out how to break up the tasks, and then fed them to gur.

Sometimes I ask Claude to do "market level research" for all these tasks and fill in the details. I also suggest that Claude ask me clarifying questions. Remember when I said to think of a developer sprint and condense it down to about 30 minutes? This is basically why.

Alright, can you do market level research on these tasks and commenting systems, and ask me any clarifying questions, and add more to the requirements, think of bugs to account for and issues that each task should consider. Give me a staff level engineer working on mission critical rocketry levels of detail.

In this follow-up prompt I get slightly more dramatic. I tell it what I expect, and then I suggest it think like a staff-level engineer at a fancy rocketry company where all the code is deemed mission critical. It will sometimes spin up agents in parallel to handle multiple tasks.

It is at this time that I go grab a root beer or a cherry coke. When I come back, it's usually done (I'm probably the one who finishes last, let's be real). By the end of this process the model has asked me its clarifying questions and has essentially refined the stories. Sometimes I ask for the full descriptions so I can review them myself, but for the sake of this demonstration I'm going to 100% vibe code and trust that Claude did it all correctly.

One thing I wish Anthropic would do, and if you're reading this I hope you'll consider it, is let tools like mine and Beads take over the Claude Code UI for remaining tasks. Right now I have no way to give the end user visuals while tasks are still being worked on, other than the status bar. See below:

That last tangent was enough time that Claude finished with all 4 background workers.

Now I have two options:

  1. Ask Claude to do all these tasks, and do any in parallel when it can / as soon as it can.
  2. Ask Claude to add gates to these tasks that make sense.

Wait, what the heck are gates? Well, that was one pain point for me with Beads. With Beads I can feed in ideas, have Claude do research, and update each Bead (what Beads calls an individual task), and then it would just happily close them as completed without any sort of follow-through with the human, unit testing, or compilation checks. So I came up with a concept I called gates, since it sorta fits the theme of the name.

A gate is a hard dependency that must be satisfied before a task can be closed. Every task must have exactly one gate, and gates must pass before a task can be closed. Gates can be reused, but the pass / fail result is tracked separately for each gate / task combination, so if you reuse a gate on another task it has to pass again for that task. A gate can be unit testing, checking that a build works, or even just asking a human to validate the change.
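If it helps to see the gate rule as code, here is a minimal sketch of the idea. To be clear, this is not how gur is implemented and none of these names come from the real tool: `Tracker`, `SetGate`, `Pass`, and `Close` are hypothetical, and everything lives in memory. It only shows the two properties described above: a task can't be closed until its gate has passed, and a reused gate is tracked per task, so it has to pass again.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrGateNotPassed is returned when a task is closed before its gate passes.
var ErrGateNotPassed = errors.New("gate has not passed for this task")

// gateKey identifies one gate's result for one task. Keying results by the
// pair means a reused gate has to pass again for every task it guards.
type gateKey struct {
	taskID, gateID string
}

// Tracker is a hypothetical in-memory stand-in for a task store.
type Tracker struct {
	gateFor map[string]string // task ID -> ID of the gate guarding it
	passed  map[gateKey]bool  // (task, gate) -> has the gate passed?
	closed  map[string]bool   // task ID -> is the task closed?
}

// NewTracker returns an empty tracker.
func NewTracker() *Tracker {
	return &Tracker{
		gateFor: map[string]string{},
		passed:  map[gateKey]bool{},
		closed:  map[string]bool{},
	}
}

// SetGate attaches a gate (for example "unit-tests") to a task.
func (t *Tracker) SetGate(taskID, gateID string) { t.gateFor[taskID] = gateID }

// Pass records that a gate passed for one specific task.
func (t *Tracker) Pass(taskID, gateID string) { t.passed[gateKey{taskID, gateID}] = true }

// IsClosed reports whether a task has been closed.
func (t *Tracker) IsClosed(taskID string) bool { return t.closed[taskID] }

// Close refuses to close a task that has no gate or whose gate is pending,
// and says why, so the caller knows what is still missing.
func (t *Tracker) Close(taskID string) error {
	gateID, ok := t.gateFor[taskID]
	if !ok {
		return fmt.Errorf("task %s has no gate, refusing to close", taskID)
	}
	if !t.passed[gateKey{taskID, gateID}] {
		return fmt.Errorf("task %s blocked by gate %q: %w", taskID, gateID, ErrGateNotPassed)
	}
	t.closed[taskID] = true
	return nil
}

func main() {
	tr := NewTracker()
	tr.SetGate("task-1", "unit-tests")
	tr.SetGate("task-2", "unit-tests") // same gate reused on a second task

	tr.Pass("task-1", "unit-tests")
	fmt.Println(tr.Close("task-1")) // gate passed for task-1, so this closes
	fmt.Println(tr.Close("task-2")) // refused: the gate never passed for task-2
}
```

Returning an error with a reason, rather than silently ignoring the close, is the important part: the agent gets told what is blocking it instead of assuming the task is done.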

Now that Claude has finished, it is asking me clarifying questions I didn't even think about. I just went with all the default answers.

And here's everything I told Claude

Now that we've got every answer for Claude, I'm going to go ahead and ask Claude to add some gates for all of these tasks.

And then finally, I'll have Claude work on all the tasks to completion.

... and now I wait ... I will post the screenshot of the output.

And voila, as you can see I was too lazy to change terminals, so Claude took my intent and ran npm run dev for me.

As for the output, see for yourself below:

This is what I envisioned when I typed YouTube comments with Reddit-style replies. Everything is stored in localStorage for simplicity.

Finally, I ask Claude to close everything, but to check with me first wherever it needs my confirmation.

As you can see, the model is prevented from just closing tasks, and it is given a solid reason why NOT to close them.

As you can see, you can basically go from nothing to a prototype, with tasks, some research done by the model, and GitHub sync. (Which reminds me, I need to push that project to GitHub. I forgot to commit as the model worked on it, but I can still push all the tasks the model worked on up to GitHub so you can see the outcome.)

In the meantime, if this interests you, whether you've never used something like Beads, or you have but, like me, wanted some things to be a bit different, I welcome you to the world of GuardRails. It is fully open source and definitely "vibe coded", but the full spec was my design, based on research and several years of experience developing software. I'd rather call it spec-driven development, since I did pour weeks of my own time into researching different techniques.

While it's mostly SQLite based right now, I am also exploring the idea of letting this tool run fully off of markdown files and letting you import from and export to markdown files. I think this could make it more resilient. Another thing I would love is to have others test it with different harnesses, since I only have Claude Code. If some fancy AI company wants me to test their harness, feel free to donate me a subscription for about a week so I can evaluate it; otherwise I'm not spending more money on models.

Till next time. See you on Hacker News. ;)

GitHub Repository:

https://github.com/Giancarlos/GuardRails