I find that even though this isn't standard, these CLI tools will scan the repo for .md files and, for the most part, execute the skills accordingly. That said, I would much prefer standards, not just for this but for plugins as well.
Standards for plugins make sense, because you're establishing a protocol that both sides need to follow in order to work together.
But I don't see why you need a strict standard for "an informal description of how to do a particular task". I say "informal" because it's necessarily written in prose -- if it were formal, it'd be a shell script.
I mean, it'd be good if these tools followed the XDG Base Directory spec and put their config in `~/.config/claude` etc. instead of `~/.claude`.
It's one of my biggest pet peeves with a lot of these tools (admittedly many of them have a config env var to override the location, but it'd be nice if they just did the right thing automatically).
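i.e. something like (the file name is just illustrative):

    ~/.config/claude/settings.json   # XDG-compliant
    ~/.claude/settings.json          # what it actually does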
Eventually, you can standardize what you don't understand
The problem I see now is that everyone wants to be the winner of a hype cycle and be the standards-bringer. How many "standards" have we seen put out by now? No one talks about MCP much anymore, I haven't seen LangChain mentioned in more than a year; will we be talking about Skills in another year?
Why do I want to throw away my dependency management system and shared libraries folder for putting scripts in skills?
What tools do they have access to, and can I define that dynamically? Do skills even have a concept of sub-tools or sub-agents? Why do I want to put references in a folder instead of a search engine? Does frontmatter even make sense? Why not something closer to a package.json file sitting next to the skill?
Does it even make sense to have skills in the repo? How do I use them across projects? How do we build an ecosystem and dependency-management system for skills (which are themselves versioned)?
> They are more than that, for example the frontmatter and code files around them.
You are right. I have edited my post slightly.
> Why do I want to throw away my dependency management system and shared libraries folder for putting scripts in skills?
You don't have to put scripts in skills. The script can be anywhere the agent can access. The skill just needs to tell the LLM how to run it.
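For example (the paths, skill name, and script are made up; the name/description frontmatter fields are the ones Claude Code's docs describe):

    ---
    name: release-notes
    description: Generate release notes from merged PRs. Use when asked for a changelog or release summary.
    ---

    Run the existing script from the repo root:
    `node tools/release-notes.js --since <last-tag>`
    The script stays in tools/, versioned and dependency-managed like any other code.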
> Does it even make sense to have skills in the repo? How do I use them across projects?
You don't have to put them in the repo. E.g. with Claude Code you can put project-specific skills in `.claude/skills` in the repo and system-wide skills in `~/.claude/skills`.
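Concretely (the skill names here are invented):

    your-repo/.claude/skills/deploy/SKILL.md    # project-specific, travels with the repo
    ~/.claude/skills/changelog/SKILL.md         # system-wide, follows you across projects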
2. The spec / docs show people how to put code in a subdir. While you can reference external scripts, there is a blessed pattern that seems like an anti-pattern to me
3. Generalize: how do I store, maintain, and distribute skills shared by employees who work across multiple repos? Sounds like standard dependency management to me. It does to some of the people building collections/registries, too. Not sure if any of them account for versioning; I haven't seen anything tied to lock files (though I'd avoid that by using MVS, i.e. minimal version selection, for dependency selection).
Agreed. I think being overly formal about what can be in the frontmatter would be a mistake, but the beauty of doing this with an LLM is that you can pretty much emulate skills in any agent by telling it to start by reading the frontmatter of each skills file and use that to decide when to read the rest, so given that as a fallback, it's hardly imposing some massive burden to standardise it a bit.
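A rough sketch of that fallback in Node (the directory layout and the naive frontmatter regex are assumptions, not any tool's actual implementation):

    import { readFileSync, readdirSync } from "node:fs";
    import { join } from "node:path";

    // Collect the frontmatter of every skill file so an agent can decide
    // which full skill bodies are worth reading for the task at hand.
    function indexSkills(dir) {
      const index = [];
      for (const entry of readdirSync(dir, { withFileTypes: true })) {
        const path = join(dir, entry.name);
        if (entry.isDirectory()) {
          index.push(...indexSkills(path)); // skills often live one dir deep
        } else if (entry.name.endsWith(".md")) {
          const text = readFileSync(path, "utf8");
          const m = text.match(/^---\n([\s\S]*?)\n---/); // naive YAML frontmatter
          if (m) index.push({ path, frontmatter: m[1] });
        }
      }
      return index;
    }

    // Hand this to any agent with: "read the rest of a file only when its
    // frontmatter looks relevant to the current task."
    console.log(indexSkills(".claude/skills"));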
I see it similar to browser user-agents all claiming to be an ancient version of Mozilla or KHTML. We pick whatever works and then move on. It might not be "correct," but as long as our tools know what to do, who cares?
My repos are littered with agent-specific files containing “treat this other file as if it were this one.” We’re moving so fast on so many fronts, and it seems odd that this is the persistent problem. It doesn’t even help lock folks into one agent, so I’m not clear why the industry hasn’t standardized on one project-specific file name yet.
That doesn't work very well if your developers are on Windows (and most are). Uneven Git support for symbolic links across platforms is going to end up causing more problems than it solves.
It's why I wrapped my tiny skills repo with a script that softlinks them into whichever skills folder you use, defaulting to Claude's, but it could be any other.
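Roughly like this, in case it's useful to anyone (a sketch; the non-Claude target and the flat skills/ layout are assumptions):

    import { existsSync, mkdirSync, readdirSync, symlinkSync } from "node:fs";
    import { homedir } from "node:os";
    import { join, resolve } from "node:path";

    // Where each agent expects skills to live; Claude is the default.
    const TARGETS = {
      claude: join(homedir(), ".claude", "skills"),
      other: join(homedir(), ".other-agent", "skills"), // placeholder
    };

    const target = TARGETS[process.argv[2] ?? "claude"];
    mkdirSync(target, { recursive: true });

    // Softlink every skill from this repo into the agent's folder.
    for (const name of readdirSync("skills")) {
      const link = join(target, name);
      if (!existsSync(link)) symlinkSync(resolve("skills", name), link, "dir");
    }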
I treat my skills the same way I would write tiny bash scripts and fish functions in days gone by to simplify my life by typing 2 words instead of 2 sentences. A tiny improvement that only makes sense to a programmer at heart.
> It's in Java, but the lessons can be applied in every language.
I can only discourage anyone from applying Java patterns all over the place. One example in JavaScript: there was some functionality that required parameters with default values. The plain solution would have been:
    function doStuff({ x = 9, y = 10 } = {}) { ... }
Instead, they created a class with private properties and used the builder pattern to set them. Totally unnecessary.
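For contrast, the builder version looked roughly like this (reconstructed; the names are made up):

    class StuffParams {
      #x = 9;
      #y = 10;
      withX(x) { this.#x = x; return this; }
      withY(y) { this.#y = y; return this; }
      build() { return { x: this.#x, y: this.#y }; }
    }

    // A class, two setters, and a build() call to express one default override:
    doStuff(new StuffParams().withX(3).build());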
- Everything locally stored in the repo: PRs, comments, issues, discussions, boards, ...
- CLI first
- Offline first (+ syncing)
- A website for hosting/presentation
Noted :) In another comment I linked to beads, which is a cool project to keep your issue tracker in your repo, but that's just a personal thing, no comment on what the company plans to do (or not) in this area.
I use command-line tooling much more than IDEs (e.g. VS Code), so the `gh` command-line tool (https://cli.github.com) for doing most of the usual hub-oriented workflow (PR authoring, viewing issues, status updates, etc) really helps a lot - I don't have to constantly <cmd>+<tab> to my browser, and point-click-point-click through web pages so much. It would be fantastic if ersc or any other jj-centered code-sharing hub had similar tooling early on.
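For anyone who hasn't tried it, the day-to-day loop with `gh` is roughly (all real subcommands; the PR number is just an example):

    gh pr create --fill        # open a PR from the current branch
    gh pr status               # where your PRs stand
    gh pr checkout 123         # pull down a colleague's PR locally
    gh issue list --assignee @me
    gh pr view --web           # pop a browser only when you really need one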
When I tried Fossil it had things weirdly separated.
I was expecting that when I made a commit, I would have the facility to specify which issues it addressed, and it would close them for me automatically. There seemed to be so much opportunity to "close the loop" when the issue tracker etc. is integrated into your VCS, but it wasn't taken.
That's my favourite thing about fossil though. History is what it is, not simplified to look "clean" (i.e. hide what actually happened and when) and you get a lot fewer footguns to ruin everything by accidentally rebasing things to the wrong place without noticing.
I have huge respect for Mitchell, it's impressive what he achieved.
I agree with all the points of this article and would like to add one: Have a quick feedback loop. For me, it's really motivating to be able to make a change and quickly see the results. Many problems just vanish or become tangible to solve when you playfully modify your source code and observe the effect.
This perfectly aligns with my experience.
Every large project I have worked on showed a clear correlation between how easy it was to set up and run and the number of problems on the project, like bugs and missed deadlines.
Totally agree. I work in LLM training software and I believe progress in the field is actually much slower than it should be because of the excruciatingly long feedback loops involved in development. The software stacks are deep and abstract and much of the testing involves full integration tests that take a long time to spin up.
Interesting. What aspects of the development workflow/cycle have the most room for improvement (i.e. is there a ranking of the "height" of the "hanging fruit" throughout the process)? What sort of software tooling would help?
YES that is one of the all-time most inspiring talks I've ever seen. DX is so important. I got a taste for this kind of thing when I first encountered LiveReload (circa 2012?) and radically upgraded my and my team's webdev workflows.
E2E tests in a high ratio to other tests will cause problems. They’re slow and brittle and become a job all on their own. It’s possible that they might help at the start of debugging, but try to isolate the bugs to smaller units of code (or interactions between small pieces of code).
Hermetic e2e tests (i.e. ones that can run offline against fake APIs/databases) don't have that problem so much.
They also have the advantage that you can A) refactor pretty much everything underneath them without breaking the test, B) test realistically (an underrated quality) and C) write tests which more closely match requirements rather than implementation.
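For instance, with Playwright you can fake the network layer so the suite runs offline (the endpoint and payload here are invented):

    import { test, expect } from "@playwright/test";

    test("shows the user's orders", async ({ page }) => {
      // Fake the API so the test is hermetic: offline, deterministic, fast.
      await page.route("**/api/orders", (route) =>
        route.fulfill({ json: [{ id: 1, status: "shipped" }] })
      );
      await page.goto("/orders");
      await expect(page.getByText("shipped")).toBeVisible();
    });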
> i.e. ones that can run offline against fake APIs/databases
I can see a place for this, but these are no longer e2e tests. I guess that’s what “hermetic” means? If so it’s almost sinister to still call these e2e tests. They’re just frontend tests.
> A) refactor pretty much everything underneath them without breaking the test
This should always be true of any type of tests unless it’s behavior you want to keep from breaking.
> B) test realistically (an underrated quality)
Removing major integration points from a test is anything but realistic. You can do this, but don’t pretend you’re getting the same quality as a colloquial e2e tests.
> C) write tests which more closely match requirements rather than implementation
If you’re ever testing implementation you’re doing it wrong. Tests should let you know when a requirement of your app breaks. This is why unit tests are often kinda harmful. They test contracts that might not exist.
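A toy illustration of the difference, using node's built-in test runner (the module under test is invented):

    import test from "node:test";
    import assert from "node:assert/strict";

    // Hypothetical module under test.
    function sortOrders(orders) {
      return [...orders].sort((a, b) => a.createdAt - b.createdAt);
    }

    // Requirement-level: survives any refactor of how the sort is done.
    // An implementation-level test (e.g. spying that some quickSort helper
    // was called) would break on refactor while guarding no requirement.
    test("orders come back oldest first", () => {
      const orders = [{ createdAt: 5 }, { createdAt: 1 }];
      assert.equal(sortOrders(orders)[0].createdAt, 1);
    });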
> try to isolate the bugs to smaller units of code (or interactions between small pieces of code).
This is why unit tests before e2e tests.
It's higher risk to build on components without unit test coverage, even if the paltry smoke/e2e tests say it's fine per the customer's input examples.
Is it better to fuzz low-level components or high-level user-facing interfaces first?
IIUC in relation to Formal Methods, tests and test coverage are not sufficient but are advisable.
Competency Story: The customer and product owner can write BDD tests in order to validate the app against the requirements
Prompt: Write Playwright tests for #token_reference that run a named, factored-out login sequence, and then test, as a human user would, that clicking on Home navigates to / (given browser MCP and, recently, the Gemini 2.5 Computer Operator model).
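The resulting test might look something like this (the selectors and login helper are assumptions; #token_reference is left out):

    import { test, expect } from "@playwright/test";

    // Named, factored-out login sequence, reused across tests.
    async function login(page) {
      await page.goto("/login");
      await page.getByLabel("Email").fill("user@example.com");
      await page.getByLabel("Password").fill("hunter2");
      await page.getByRole("button", { name: "Log in" }).click();
    }

    test("clicking Home navigates to /", async ({ page }) => {
      await login(page);
      // Interact the way a human user would: find the link by its role.
      await page.getByRole("link", { name: "Home" }).click();
      await expect(page).toHaveURL("/"); // assumes baseURL is configured
    });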
And I would add that e2e tests should be more about the business rules: making sure everything is there for a specific flow, and not caring that much about the intricacies of things. As such, it should really be part of Ops, not Dev.
Quick feedback with unit tests can help. It can be a pain to decouple stuff so you can test them better, but it’s worth it IMO.
This might be said in jest. But does everything have to be for world domination? Is the guy not allowed to have actual hobby projects that go just where he fancies, including potentially nowhere at all, really...