There is something interesting happening in AI-assisted coding that I think most teams are getting wrong.
The standard workflow goes like this: you prompt an LLM, it produces code, you fix the code, you prompt it again, it produces slightly different code, you fix that too. Repeat. The LLM never updates its priors about you. It has no representation of your preferences.
┌─────────┐    ┌──────────┐    ┌──────────┐    ┌──────────┐
│ Prompt  │───▶│ LLM Code │───▶│ You Fix  │───▶│ Prompt   │
└─────────┘    └──────────┘    └──────────┘    └──────────┘
                                    │
                                    ▼
                        (signal falls on floor)

Every session starts from scratch, sampling from the same internet-scale distribution of "average code." You are, in effect, a stateless correction function in an unrolled loop that never converges.
This is surprisingly wasteful. And I think the fix is not better models. It is better conditioning.
What is coding taste?
Let me be precise about what I mean by "taste" because the word sounds subjective and fuzzy, and the concept is actually neither.
When you write code, you make hundreds of micro-decisions per hour. Variable naming conventions. When to extract a helper. Whether to use named or default exports. How you structure error handling. Whether you reach for commander or yargs or roll your own arg parser. ISO dates vs. locale strings. Tabs vs. spaces (ok, that one is subjective).
Each of these decisions, individually, is almost irrelevant. Collectively, they define your engineering identity. They are the residual of years of building, debugging, refactoring, and shipping. They encode hard-won intuitions about what makes code maintainable, readable, and correct in practice rather than in theory.
I want to call this a "style prior": a learned probability distribution over code decisions that is specific to an individual or team. LLMs have their own style prior, learned from the entire internet. Yours is different. And yours is almost certainly better for your codebase.
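To make "micro-decision" concrete, here is a hypothetical pair of functionally equivalent snippets. A style prior is what makes one of them feel obviously right to you and the other feel foreign; nothing here is from any particular codebase.

```python
# Style A: generic errors, validation inlined at the call site.
def load_config_a(path):
    if not path.endswith(".json"):
        raise Exception("bad path: " + path)
    ...

# Style B: typed errors with codes, validation extracted into a helper.
class ConfigError(Exception):
    def __init__(self, code: str, message: str):
        super().__init__(f"[{code}] {message}")
        self.code = code

def validate_config_path(path: str) -> None:
    if not path.endswith(".json"):
        raise ConfigError("CFG001", f"expected a .json path, got {path!r}")

def load_config_b(path: str):
    validate_config_path(path)
    ...
```

Neither style is wrong. But a team that has settled on Style B will reject Style A on sight, and that rejection is exactly the signal this post is about.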
The information-theoretic argument
Here is a simple way to think about this. Consider the mutual information between:
- The code an LLM generates for you
- The code you actually want
Without any personalization, the LLM is sampling from P(code | prompt). With taste, it samples from P(code | prompt, taste). The taste variable reduces the entropy of the output distribution. It is literally additional bits of information that constrain generation toward what you want.
Without taste:          With taste:
┌──────────────┐        ┌──────────────┐
│ ◆ (target)   │        │ ◆ (target)   │
│              │        │              │
│ ••••••••••   │        │   •••••      │
│ ••••••••••   │        │   •••••      │
│ ••••••••••   │        │   •••••      │
└──────────────┘        └──────────────┘
  wide scatter            tight cluster

The key insight is that these bits are already being generated -- every time you accept, reject, or edit AI-produced code, you are producing a supervision signal. The question is just whether anyone is collecting and using it. Right now, for most teams, the answer is no. You generate the signal, and it falls on the floor.
This is like running gradient descent but throwing away the gradients after each step. You can technically still get somewhere by random search, but you are leaving enormous amounts of information on the table.
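The "extra bits" claim can be made literal. Here is a minimal sketch with made-up numbers: Shannon entropy over a set of candidate completions, before and after taste constraints prune the set.

```python
import math

def entropy_bits(probs):
    """Shannon entropy, in bits, of a discrete distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical: the model considers 64 plausible completions uniformly.
without_taste = [1 / 64] * 64   # P(code | prompt)
# Taste constraints rule most of them out, leaving 4 candidates.
with_taste = [1 / 4] * 4        # P(code | prompt, taste)

h0 = entropy_bits(without_taste)   # 6.0 bits of uncertainty
h1 = entropy_bits(with_taste)      # 2.0 bits of uncertainty
print(f"taste contributed {h0 - h1:.1f} bits")  # taste contributed 4.0 bits
```

The uniform distributions are an illustration only; the point is that conditioning on taste shrinks the output distribution's entropy, and those bits have to come from somewhere -- namely, your accumulated accept/reject/edit decisions.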
Why this matters at the team level
Individual taste is interesting. Team taste is where this gets really powerful.
A codebase is a shared artifact, and the best codebases have a strong prior -- a consistent set of conventions, patterns, and micro-decisions that make the code feel like it was written by one person even when twenty people contributed. This is what people mean when they talk about "code quality" in a way that goes beyond "does it pass the tests."
Traditionally, teams try to encode this prior in three ways:
Linting rules. These capture the syntactic subset of taste: formatting, import ordering, naming patterns. They are great at what they do, but they only cover maybe 10% of what taste actually is. You cannot write an ESLint rule for "extract validators into separate files" or "prefer domain-specific error types over generic Error."
Taste Coverage:
[████░░░░░░░░░░░░░░░░░░░░░░░░] Linting (10%)

Style guides and documentation. These attempt to capture the semantic subset. They are useful when first written and then decay monotonically. I have never seen a style guide that was fully up to date. The half-life of a style guide's accuracy is maybe six months. After that, the guide and the codebase diverge, and new developers have to figure out which one to trust (hint: trust the code).
Style Guide Accuracy Over Time:
100% ┤╲
     │ ╲
 50% ┤  ╲___
     │      ╲___
  0% └─────────────────── (6 months)

Code review. This is actually the best mechanism teams have today. A senior engineer reviews a PR and says "we don't do it that way here." The problem is this doesn't scale. Senior engineers are expensive. Review cycles are slow. And the knowledge transfer is lossy: the reviewer catches the big stuff but the micro-decisions slip through.
What you want is something that captures taste from the full signal surface -- not just what gets written down in a linter config, but the entire space of accept/reject/edit decisions -- and applies it at generation time. You want the taste to be a first-class object that updates continuously, is shared across the team, and conditions every line of AI-generated code.
The compounding dynamics
The reason I think this matters now and not in some abstract future is the compounding effect.
If you use an AI coding agent for 8 hours a day and it generates code that requires, say, 3 edits on average before it is acceptable, you are spending a significant fraction of your day doing corrections. If after a week those 3 edits drop to 1.5, and after a month they drop to 0.4, the productivity difference is enormous. Not 2x. More like 5-10x. Because it is not just the edits that disappear. It is the cognitive overhead of context-switching between "what the AI wrote" and "what I want." When those converge, you stop reviewing and start flowing.
Edits Required Per Day:
3.0 ┤ ●
    │  ╲
1.5 ┤   ●
    │    ╲
0.4 ┤     ●
    │
    └─────────────── (week 1 → 4)

Productivity Gain: 5-10x

And here is the part that gets interesting for teams: taste is shareable. If a senior engineer's taste profile captures their years of accumulated micro-decisions, a junior engineer can pull that profile and immediately benefit from constraints they would have taken years to develop on their own. This is not the same as the junior engineer having the senior's judgment -- taste is patterns, not reasoning. But it closes a meaningful gap.
Think of it as knowledge distillation, not from a large model to a small model, but from a senior engineer's brain to a team-wide constraint system.
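A back-of-envelope version of the compounding numbers above, with explicitly made-up costs (2 minutes per corrective edit, 40 AI-assisted tasks per day):

```python
EDIT_COST_MIN = 2    # assumed minutes spent per corrective edit
TASKS_PER_DAY = 40   # assumed AI-assisted generations per day

def correction_minutes(edits_per_task: float) -> float:
    """Daily minutes spent correcting AI output, under the assumptions above."""
    return edits_per_task * EDIT_COST_MIN * TASKS_PER_DAY

week1 = correction_minutes(3.0)   # 240 min/day spent correcting
week4 = correction_minutes(0.4)   # 32 min/day spent correcting
print(f"{week1:.0f} -> {week4:.0f} min/day ({week1 / week4:.1f}x less correction time)")
```

That is 7.5x on correction time alone; the 5-10x range in the text additionally credits the disappearance of context-switching overhead, which this sketch does not model.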
What the architecture probably needs to look like
I have been thinking about what kind of system can actually do this, and I think the answer is something hybrid -- part neural, part symbolic.
Pure fine-tuning does not work here. It is too expensive, requires too much data, and does not update in real time. You are not going to retrain a model every time a developer rejects a suggestion.
RAG (retrieval-augmented generation) gets you part of the way. You can retrieve similar past code and put it in context. But the model still generates in its default style. Retrieval gives you relevant content but not the right form.
What you actually want is a symbolic constraint layer that sits between the user and the model. Something that encodes patterns as explicit, inspectable, updateable rules: rules that are learned from behavior, not manually written. The symbolic layer is lightweight (easy to update in real time), interpretable (you can inspect what it learned), and compositional (you can combine taste profiles from multiple sources).
┌──────────────────────────────────────┐
│             Your Prompt              │
└──────────────────┬───────────────────┘
                   │
        ┌──────────▼──────────┐
        │  Taste Constraints  │ ◄─── Accept/Reject/Edit
        │  (Symbolic Layer)   │       Signals
        └──────────┬──────────┘
                   │
        ┌──────────▼──────────┐
        │   LLM Generation    │
        │      (Neural)       │
        └──────────┬──────────┘
                   │
        ┌──────────▼──────────┐
        │      Your Code      │
        └─────────────────────┘

This is a neuro-symbolic approach, and I think it is the right one for this problem. The neural component handles generation capability: writing code that compiles and works. The symbolic component handles alignment: making sure that code matches your patterns.
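A minimal sketch of the symbolic layer's generation-time job, assuming learned rules are stored as (rule, confidence) pairs: render the high-confidence ones into a preamble that conditions the model, and keep low-confidence rules out of context until more signal accumulates. The threshold and storage format are my assumptions.

```python
def render_constraints(constraints: dict[str, float], threshold: float = 0.7) -> str:
    """Turn learned (rule -> confidence) pairs into a prompt preamble.

    Only rules the system is reasonably sure about make it into context.
    """
    active = sorted(rule for rule, conf in constraints.items() if conf >= threshold)
    if not active:
        return ""
    return "Follow these project conventions:\n" + "\n".join(f"- {r}" for r in active)

taste = {
    "Use typed error classes": 0.85,
    "Prefer named exports": 0.55,      # not confident enough yet
    "Log to stderr, not stdout": 0.75,
}
preamble = render_constraints(taste)
# preamble lists the two rules at or above 0.7 and skips the 0.55 one
```

Because the layer is just data plus a render step, it is cheap to update, trivial to inspect, and composable: merging two profiles is a dictionary merge plus a policy for conflicting confidences.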
The important property is that the symbolic layer updates on every interaction. Accept? Reinforce the pattern. Reject? Weaken it. Edit? Extract the delta and learn from it. No batch training. No scheduled updates. Continuous adaptation.
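The update itself can be as simple as nudging a rule's confidence toward 1 on accept and toward 0 on reject. Here is a sketch using an exponential moving average; the learning rate and starting confidence are arbitrary choices for illustration. An edit decomposes into both signals: a reject of the emitted pattern plus an accept of the pattern extracted from the user's delta.

```python
def update_confidence(conf: float, accepted: bool, alpha: float = 0.2) -> float:
    """EMA update: accept pulls confidence toward 1, reject toward 0."""
    target = 1.0 if accepted else 0.0
    return (1 - alpha) * conf + alpha * target

conf = 0.5
for outcome in (True, True, True, False):   # three accepts, then one reject
    conf = update_confidence(conf, outcome)
# conf ends above its 0.5 starting point despite the reject
```

The appeal of a rule like this is that it needs no batch job: every interaction moves exactly the rules it touched, immediately.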
The taste.md idea
One thing I find appealing is the idea of storing learned taste in a human-readable format. Something like:
## Error Handling
- Use typed error classes. Confidence: 0.85
- Always include error codes. Confidence: 0.90
- Log to stderr, not stdout. Confidence: 0.75

Each learned preference has a confidence score that updates over time. You can read it, understand it, and correct it if the system learned something wrong. This is the kind of interpretability that I think is essential. If your coding agent makes a decision you disagree with, you should be able to trace it back to a specific learned preference and override it.
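Because the format is just markdown, machine-reading it is a couple of lines of regex. The sketch below assumes the exact line shape shown in the snippet above; any real format would need versioning and section handling.

```python
import re

# Matches lines like: "- Use typed error classes. Confidence: 0.85"
LINE = re.compile(r"^- (?P<rule>.+?)\s+Confidence: (?P<conf>[01]\.\d+)$")

def parse_taste_md(text: str) -> dict[str, float]:
    """Extract (rule -> confidence) pairs from a taste.md section."""
    prefs = {}
    for line in text.splitlines():
        m = LINE.match(line.strip())
        if m:
            prefs[m["rule"].rstrip(".")] = float(m["conf"])
    return prefs

section = """\
## Error Handling
- Use typed error classes. Confidence: 0.85
- Always include error codes. Confidence: 0.90
- Log to stderr, not stdout. Confidence: 0.75
"""
prefs = parse_taste_md(section)
# {'Use typed error classes': 0.85, 'Always include error codes': 0.9, ...}
```

The round trip matters: the same file a human edits in a code review is the file the constraint layer loads, so a manual correction is indistinguishable from a learned one.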
Interpretability Comparison:

taste.md:
┌─────────────────────────────┐
│ Use typed errors (0.85)     │ ◄─── You can read & edit
│ Include error codes (0.90)  │
│ Log to stderr (0.75)        │
└─────────────────────────────┘

Fine-tuning:
┌─────────────────────────────┐
│ [7B parameters]             │ ◄─── Black box
│ ███████████████████████████ │
│ ███████████████████████████ │
└─────────────────────────────┘

Compare this to fine-tuning, where the learned behavior is buried somewhere in 7 billion parameters and you have no idea why the model decided to use camelCase instead of snake_case.
Why now
Three things are converging:
LLMs are good enough at code generation. The base capability is there. The bottleneck is no longer "can the model write correct code" but rather "can the model write code I want."
Developer-AI interaction is generating massive signal. Every day, millions of developers accept, reject, and edit AI-generated code. This is a rich supervision signal that is almost entirely unused.
The infrastructure for continuous learning is maturing. Real-time constraint updating, symbolic reasoning layers, efficient conditioning mechanisms -- these are all tractable problems now.
The teams that figure out how to capture and use coding taste will have a compounding advantage. Their AI tools will get better every day. Their codebases will stay consistent. Their onboarding will be faster. And their developers will spend less time fighting the AI and more time building.
What to do about it
If I were leading an engineering team right now, here is what I would do:
Start paying attention to the correction signal. Every time your developers edit AI-generated code, that is information. Right now you are throwing it away.
Look at tools that learn from behavior rather than requiring manual configuration. Rules files are a start, but they are a point-in-time snapshot. You want something that updates continuously.
Think about taste as a team artifact, not just an individual one. The best teams have a shared sensibility about code. Make that sensibility explicit, shareable, and enforceable.
And finally, measure correction loops, not just output volume. The metric is not "how much code did the AI generate." It is "how much of that code survived review unchanged." That is the number that tells you whether your AI tools are actually aligned with how your team works.
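One way to make "survived review unchanged" measurable: diff the AI's output against what actually merged and count unchanged lines. This is a sketch using Python's difflib; the metric name and the line-level granularity are my choices, not an established standard.

```python
import difflib

def survival_rate(generated: str, merged: str) -> float:
    """Fraction of AI-generated lines that reached the merged code unchanged."""
    gen_lines = generated.splitlines()
    if not gen_lines:
        return 1.0
    matcher = difflib.SequenceMatcher(a=gen_lines, b=merged.splitlines())
    unchanged = sum(block.size for block in matcher.get_matching_blocks())
    return unchanged / len(gen_lines)

generated = "a = 1\nb = 2\nc = 3\n"
merged = "a = 1\nb = 20\nc = 3\n"
rate = survival_rate(generated, merged)   # 2 of 3 lines survived
```

Track this per developer and per week: a rising survival rate is direct evidence that the correction loop is actually tightening, independent of how much code is being generated.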
The era of generic AI code generation is ending. The next phase is personal. And the teams that get there first will compound their advantage every single day.

