Every workflow deserves a skill (and other lies I told myself)

I have a problem. Somewhere around my fifteenth custom skill for my AI agent, I started to wonder if I'd crossed a line.

It started innocently. I set up OpenClaw, named my bot Boo, and within a few days I'd built a skill to format my standup notes. Then one to manage my task list. Then one for summarizing Slack threads. Then meeting agendas. Then git flows. Every time I found myself prompting Boo the same way twice, my brain said: that should be a skill.

I wrote about the three building blocks of a real agent recently, and skills were one of them. I still believe that. But I'm starting to think there's a version of the skills story nobody's talking about yet: what happens when you build too many.

The instinct and its limits

The whole point of skills is to encode repeated workflows so you don't re-explain them every session. Anthropic's Tariq shared how they think about it internally: skills are for repeated workflows, prompts are for one-off instructions. Clean distinction.

In practice, the line gets blurry. Is a prompt I've used three times a repeated workflow? What about something I do regularly but slightly differently each time? I kept defaulting to "just make it a skill," and before I knew it I had this sprawling collection of markdown files, each encoding some micro-workflow that felt important when I wrote it.

Some of them I haven't triggered in weeks.

There's a finding that put this in perspective. Researchers looked at whether agents perform better with skills, and the answer is yes, with a massive caveat. Human-curated skills, built from real expertise about where things break, improved task completion by about 16 percent. Skills that agents generated for themselves? Slightly worse than having no skills at all.

The value of a skill isn't the format or the automation. It's the judgment baked into it. The gotchas. The "don't do this because it breaks in production" warnings. Stuff that only comes from a human who's been burned.

What earns its keep

After going through this cycle of build-everything, question-everything, I've started to see which skills actually matter.

They encode expertise that isn't obvious. Not "how to format markdown" but "how to format our markdown, with the gotchas that'll bite you on deploy." Tariq's post has a whole taxonomy: library references, product verification, data fetching, runbooks. The highest-value ones capture knowledge that lives in people's heads and nowhere else.

They verify work. Anthropic has a whole category for skills that test whether the agent's output actually works. Tariq made the point that it can be worth having an engineer spend a full week making verification skills excellent. That's a big investment for a markdown file. But if it catches bugs that would otherwise ship, it pays for itself.

And they get better over time. You keep adding gotchas as you discover new edge cases, without breaking what already works. A skill that stays stable while it accumulates that knowledge is more valuable than one that ships features every week but breaks old workflows.

The ones gathering dust in my collection? Those automated something obvious. Things Claude already knows how to do. I was writing skills for the sake of having skills.

The line I'm drawing

I haven't deleted the excess. But I've stopped building new ones reflexively. Now when I catch myself prompting the same way twice, I ask: is there real expertise to encode here, or am I writing down what Claude already knows?

More often than not, the answer is "just prompt it." And that's fine. Skills and prompts aren't competing. The skill is for when you've got hard-won judgment to preserve. The prompt is for everything else.
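
If it helps to see that question written down, here is a rough sketch in Python. It is only an illustration of the heuristic, not a tool I actually run: the Workflow class, its field names, and the "prompted at least twice" threshold are all made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    """A candidate workflow. Every field name here is hypothetical."""
    name: str
    times_prompted: int           # how often I've typed roughly the same prompt
    has_hard_won_gotchas: bool    # judgment the model wouldn't have on its own
    model_already_knows_it: bool  # e.g. generic markdown formatting


def skill_or_prompt(w: Workflow) -> str:
    """Return "skill" only when the workflow is repeated AND encodes real expertise."""
    repeated = w.times_prompted >= 2  # assumed threshold: "the same way twice"
    real_expertise = w.has_hard_won_gotchas and not w.model_already_knows_it
    return "skill" if repeated and real_expertise else "prompt"


if __name__ == "__main__":
    candidates = [
        Workflow("format standup notes", 12, False, True),
        Workflow("format our markdown for deploy", 5, True, False),
    ]
    for w in candidates:
        print(f"{w.name}: {skill_or_prompt(w)}")
```

Repetition alone never returns "skill" here. That mirrors the point above: prompting something twice is the trigger for asking the question, not the answer to it.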

The skill ecosystem is growing fast. Over 96,000 on SkillsMP, 13,000 on OpenClaw, 1,500 on Claude Code. That growth carries the same risk as every plugin marketplace before it: a lot of noise, some real gems, and a constant temptation to install one more thing.

The agents that work best won't be the ones with the most skills. They'll be the ones with the right skills, built by people who know what they're encoding and why. That's the part that can't be automated.