Creating an interactive data table with spec.md as a guide
Now I've heard a lot about one-shot coding with AI, but after my last experiment I doubted that, using Cursor the way I have been (i.e., entering a prompt with a description of the end product), it would actually generate a usable prototype in one shot.
In my last experiment, I mostly focused on CI/CD and Snakemake to create a reproducible workflow with a data source that is fetched daily. In this case I wanted to understand how much efficiency a spec.md actually adds to the development process - this is partially inspired by Sebastian’s post here. Of course this depends on the complexity of the tool, but at the very least I’m no longer making ad hoc requests in the chat window, which, as I understand it, is tied to the context window and therefore subject to its limitations (e.g., forgetting discussions and changes made in the middle of the “window”). While none of the things I’m prototyping are complex enough to hit the context window limit, keeping track of the spec changes I make myself is already quite difficult.
My last project was more complex in that it had 3 components (CI/CD to automate daily data fetching + hosting it on GitHub Pages + building it as a Shinylive app). In the present case, I wanted to simply create an interactive data table where one can select filters to inspect the data in a PyShiny app and host it on Posit Cloud. This took ~45 minutes: I started coding at 9am, got it working by 9:45am, and deployed it on Posit Cloud by 10am.
However, I wanted to make specific changes to the UI, and I ended up spending more than 4 hours trying to make it work. Of course, this is in part due to my lack of experience with implementing PyShiny, but I was surprised by how much time I spent i) figuring out the possible choices in the design space and ii) debugging the UI/front-end design instead of the backend.
DataTable design choices
PyShiny has a DataGrid output feature (see documentation) that lets users filter the dataset using embedded column headers. However, showing all the rows at once was hurting performance, so it made sense to turn those off with DataGrid(filters=False) and instead use a custom workflow where a separate UI (outside of the data grid) applies filters to the dataset. In this case, a reactive.event updates the active_rules and re-renders the data table so that DataGrid displays the filtered dataset.

I thought there might be a way to progressively load the dataset as the filters were applied, but that turned out not to be the case, as I had a priori fixed the number of rows to be displayed. So regardless of the filters applied, it would only return the first N rows alphabetically - even after I adjusted it to account for the filtered station, nothing showed up when I progressively filtered for the year range, as seen below,
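In hindsight the bug reduces to an ordering problem: the row limit was applied before the filters rather than after. A stripped-down illustration (the station names and years are made up):

```python
# Made-up station records to illustrate the ordering bug.
rows = [
    {"station": s, "year": y}
    for s in ("Alpha", "Bravo", "Zulu")
    for y in range(2000, 2010)
]
N = 10  # number of rows the table displays

# Buggy order: take the first N rows alphabetically, *then* filter.
# The first 10 rows are all "Alpha", so a "Zulu" filter returns nothing.
first_n = sorted(rows, key=lambda r: r["station"])[:N]
buggy = [r for r in first_n if r["station"] == "Zulu" and 2003 <= r["year"] <= 2007]

# Correct order: apply every filter first, then truncate for display.
matching = [r for r in rows if r["station"] == "Zulu" and 2003 <= r["year"] <= 2007]
fixed = matching[:N]

print(len(buggy), len(fixed))  # 0 vs 5
```

The same principle applies to a pandas-backed DataGrid: filter the full dataframe, then slice off the display limit as the very last step.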

AI is still no good at design
Eventually I settled on reverting to a previous design (using a reactive.event to trigger filtering); however, there was no way to move the data table next to the search/filter UI, and after several attempts I decided to leave it as is. I echo the sentiments shared in this LinkedIn post from Feb 2025 that AI is still no good at design, and my experience affirmed that this may have little to do with me being a novice software designer (in the sense that I am not a SWE in industry) - despite multiple attempts at changing my prompts, it could not make the simple change of moving the data table to the right of the UI.

Coupling between spec and generated code
I’m not sure how much quicker I could have been if I had entered better prompts or included agents.md or skills.md files. I reckon they may be unnecessary since this is just a tiny experiment with limited scope, and my limiting factor is how much I already know about PyShiny. My guess is I could have benefited if the agent had presented several options or examples from the PyShiny documentation, and explicitly asked which of those packages (e.g., DT, gt, reactable) or alternative design choices would best fulfill my goals.
The nice thing about using spec.md is that even without a plan.md I could keep track of what has been implemented and what remains by asking the agent to break the work down into chunks and tell me i) what the remaining tasks are and ii) if I had made any ad hoc requests in the chat window, compare them against spec.md, figure out what is different, and list the changes so I can put them back into spec.md. Surely I could have asked it to make changes directly to spec.md, but this is where I prefer human interference, so that I am still in the driver’s seat and actually re-reading what is in spec.md rather than having the AI update it automatically for me. Otherwise, I am not sure what the added value of spec.md is, other than preserving context for the AI throughout the project - and at that point the human programmer is ceding too much agency to the AI without being closely involved in the loop of validation and review.
The closest analogy I can think of for working with agents effectively is collaborating with short-term interns in the lab - we don’t expect them to deliver the cleanest or most efficient code since it is a learning experience for them, but they need a baseline level of competency. Going back and forth over spec.md is akin to having a meeting doc + README where we iterate over design choices. So overall, I cannot say working with agents is more or less efficient than what I am used to when working with interns, as many others have echoed. There is a lot of back and forth between reviewing the spec and ensuring the code base is still in sync with the specifications in it. Still, this is faster than what I could have come up with on my own at a first attempt, going from starting a repo to deployment in ~1 hour.
Technical Debt and Documentation Debt
However, I do worry about accruing technical debt as the project grows. Beyond technical debt, there is also documentation debt - the changes made are no longer a few lines of code but larger chunks of features, made possible by discarding various portions of the specs, and I wonder how much documentation then becomes necessary for future reference. The kind of documentation debt I find more concerning is related to workflow. At this point, my repo doesn’t include any information about how the data is fetched; one could argue it is self-explanatory, and many codebases don’t explicitly state how their data was obtained. But I think if humans are already sloppy with documentation, we will only continue to be sloppy, or sloppier, with AI.