rules_jsonschema
API reference, generated from the module’s .bzl docstrings (stardoc).
RFC-001 — Codegen Plugin Protocol
Status: draft, revised. Captures the architecture pivot from “Rust-binary-per-output-language” to “per-language plugins reading the schema directly via a minimal stdin/stdout contract”.
Earlier drafts of this RFC proposed a protoc-style architecture with a frontend, a parsed AST proto, and a dual ast/raw plugin mode. That design was abandoned because (a) JSON Schema is already JSON, so every plugin language can parse it directly, and (b) most realistic plugins wrap upstream tools (typify, atombender/go-jsonschema, oapi-codegen, …) that have their own parsing anyway. The AST was a small spec language we’d be inventing for marginal benefit. See “Why we abandoned the AST” below for the full reasoning.
Goal
Decouple rules_jsonschema’s user-facing rules from a hardcoded codegen language. After this RFC lands, adding a new output language is:
- Write a plugin binary in that language so it leverages native AST tooling: go/format for Go, quote/syn for Rust, ts-morph for TypeScript.
- Register a jsonschema_codegen_toolchain pointing at it.
- Add a jsonschema_<lang>_library user-facing rule that wraps the target language’s *_library Bazel rule.
The plugin reads the schema bytes from stdin, options from argv, writes the generated file content to stdout, and signals errors via stderr + exit code. No protobuf dep, no AST proto, no frontend binary. Stdlib-only plugins are achievable in any language.
The contract
A plugin is any executable that conforms to:
INPUT
  stdin     the schema file contents (raw bytes)
  argv      --key=value pairs, repeated. Plugin-specific.
            The rule may also pass standard flags it owns.
OUTPUT
  stdout    the generated file content (raw bytes)
  stderr    diagnostics / error messages
EXIT
  0         success: stdout is the generated file
  non-zero  failure: stderr explains why
That’s it. A plugin in Go is:
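package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

func main() {
	// The whole schema arrives on stdin as raw bytes.
	schemaBytes, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read:", err)
		os.Exit(1)
	}
	var schema map[string]any
	if err := json.Unmarshal(schemaBytes, &schema); err != nil {
		fmt.Fprintln(os.Stderr, "parse:", err)
		os.Exit(1)
	}
	// ... generate Go source from schema ...
	// Placeholder output so the example compiles; real codegen goes here.
	generated := fmt.Sprintf("// placeholder: schema has %d top-level keys\n", len(schema))
	os.Stdout.Write([]byte(generated))
}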
A plugin in Rust is the same thing with serde_json. A plugin in
Python wraps json.load(sys.stdin.buffer). There is no contract-
specific dep in any language.
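For comparison, here is a minimal sketch of the same contract in Python, assuming nothing beyond the standard library. The function name run_plugin and the placeholder "codegen" (one comment line per top-level property) are illustrative and not part of rules_jsonschema; the point is only that parsing, output, and error reporting map directly onto stdin, stdout, stderr, and the exit code.

```python
import json
import sys


def run_plugin(schema_bytes):
    """Return (exit_code, stdout_bytes, stderr_text) for one invocation."""
    try:
        schema = json.loads(schema_bytes)
    except ValueError as err:  # covers JSONDecodeError and bad UTF-8
        # On failure: nothing on stdout, explanation on stderr, non-zero exit.
        return 1, b"", "parse: {}\n".format(err)
    # Placeholder codegen (an assumption for illustration only):
    # emit one comment line per top-level property, in sorted order.
    props = schema.get("properties", {}) if isinstance(schema, dict) else {}
    lines = ["# property: {}".format(name) for name in sorted(props)]
    return 0, ("\n".join(lines) + "\n").encode(), ""


def main():
    code, out, err = run_plugin(sys.stdin.buffer.read())
    sys.stderr.write(err)
    sys.stdout.buffer.write(out)
    return code

# When saved as a script, finish with: sys.exit(main())
```

Keeping the logic in a pure function that returns the exit code and both streams makes the plugin easy to unit-test without spawning a process.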
Standard argv conventions
The rule passes a fixed set of flags every plugin receives, plus
whatever the consumer set in options:
| Flag | Set by | Meaning |
|---|---|---|
| --schema-name=NAME | rule | Original schema file basename (e.g. compose-spec.json). For error messages and stable codegen header comments. |
| --rule-name=NAME | rule | The Bazel target’s name. Useful for picking output identifiers. |
| --<consumer-flag>=VAL | consumer | Free-form per-plugin options from the rule attrs. |
Plugins should treat unknown flags as a hard error so misconfigured options don’t silently degrade output.
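A strict parser for the --key=value convention could look like the sketch below. The helper name parse_flags and the consumer flag --package used in the usage note are hypothetical; the sketch also assumes that when a key is repeated the last value wins, which the contract does not specify.

```python
def parse_flags(argv, known):
    """Parse --key=value arguments into a dict.

    Raises ValueError for malformed arguments and for any key that is
    not listed in `known`, so a typo in an option fails the build
    instead of silently changing the output.
    """
    options = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError("malformed flag: {!r}".format(arg))
        key, value = arg[2:].split("=", 1)
        if key not in known:
            raise ValueError("unknown flag: --{}".format(key))
        options[key] = value  # assumption: last occurrence wins
    return options
```

A plugin would call something like parse_flags(sys.argv[1:], {"schema-name", "rule-name", "package"}) and, on ValueError, print the message to stderr and exit non-zero.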
Bazel output declaration
Bazel rules must declare their outputs at analysis time, before any action runs. Three real options were considered:
| Approach | Pros | Cons |
|---|---|---|
| A. Single file per rule invocation | Output path known at analysis. Simple. Matches protoc-gen-go in practice. | Plugin authors can’t naturally split output. |
| B. declare_directory (tree artifact) | Plugin emits arbitrarily many files. | Downstream rust_library / go_library rules have to glob the directory or expand it. Awkward, non-standard. |
| C. Two-pass: pre-flight + emit | Plugin advertises outputs given a schema, then generates. | Two plugin invocations per build. Doubles action overhead. |
Decision: A. Plugin produces exactly one file (on stdout) per rule invocation. Multi-output needs (types vs validators, client vs server) split into separate rule targets:
jsonschema_go_types(name = "person_types", schema = "person.json")
jsonschema_go_validators(name = "person_validators", schema = "person.json")
Each target is independently cacheable; the build graph is clearer. Tree artifacts (B) remain available as an escape hatch for the rare genuinely-multi-file plugin.
Bazel rule shape
Each per-language user-facing rule has the same structure:
load("@bazel_skylib//lib:shell.bzl", "shell")

def _jsonschema_rust_codegen_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".rs")
    tc = ctx.toolchains[_RUST_TOOLCHAIN].codegen_info

    args = [
        "--schema-name=" + ctx.file.schema.basename,
        "--rule-name=" + ctx.label.name,
    ]

    # Plugin-specific options passed through from rule attrs.
    for k, v in ctx.attr.options.items():
        args.append("--{}={}".format(k, v))

    # Schema goes in on stdin; the plugin's stdout becomes the declared output.
    ctx.actions.run_shell(
        inputs = [ctx.file.schema],
        outputs = [out],
        tools = [tc.binary],
        command = '{plugin} {args} < {schema} > {out}'.format(
            plugin = tc.binary.path,
            args = " ".join([shell.quote(a) for a in args]),
            schema = ctx.file.schema.path,
            out = out.path,
        ),
    )
    return [DefaultInfo(files = depset([out]))]
User-facing macro composes that codegen with the target language’s library rule:
def jsonschema_rust_library(name, schema, **kwargs):
    gen_name = name + "_rs_gen"
    _jsonschema_rust_codegen(name = gen_name, schema = schema)
    rust_library(
        name = name,
        srcs = [":" + gen_name],
        edition = "2021",
        deps = [...],
        **kwargs
    )
Same shape per language.
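To see what shell command the codegen action ends up running, the string assembly can be mirrored outside Bazel. The sketch below is a Python approximation that uses shlex.quote where the rule uses shell.quote; the function name build_command and all paths are made up for illustration, and unlike the Starlark above it also quotes the plugin, schema, and output paths.

```python
import shlex


def build_command(plugin_path, schema_path, out_path, rule_name, options):
    """Assemble the `plugin args < schema > out` command line as a string."""
    args = [
        "--schema-name=" + schema_path.rsplit("/", 1)[-1],  # basename only
        "--rule-name=" + rule_name,
    ]
    # Consumer options become --key=value flags, in dict order.
    args += ["--{}={}".format(k, v) for k, v in options.items()]
    return "{plugin} {args} < {schema} > {out}".format(
        plugin=shlex.quote(plugin_path),
        args=" ".join(shlex.quote(a) for a in args),
        schema=shlex.quote(schema_path),
        out=shlex.quote(out_path),
    )
```

Quoting each flag matters because option values come from rule attrs and may contain spaces or shell metacharacters; without it, a value such as "a b" would be split into two arguments.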
Why we abandoned the AST
The first draft of this RFC proposed a protoc-style architecture: a frontend parses the schema into a canonical AST proto, plugins consume that AST instead of raw bytes. After looking at it harder I think this was the wrong call. Reasons:
- The protoc analogy doesn’t transfer. protoc has an AST because .proto files have a grammar nobody else has implemented. Plugin authors would otherwise re-implement parsing. JSON Schema is already JSON; every plugin language has a JSON parser in stdlib or a one-line dep. The “no plugin reparses” argument is ~free to ignore for us.
- Most plugins wrap upstream tools. typify, atombender/go-jsonschema, oapi-codegen, openapi-generator all take raw schema bytes and have their own parsing. Our AST would be throwaway work for them. The dual mode = "ast" | "raw" we briefly proposed was evidence the AST wasn’t the natural fit.
- Cross-plugin consistency was illusory. Different upstream tools interpret edge cases differently (recursive refs, allOf ordering, oneOf discriminator behavior). Putting an AST in front doesn’t unify them; each wrapping plugin still defers to its underlying library.
- Maintenance cost is real. Defining Schema/Type/UnionType/IntersectionType is a small spec language we invent and ship. Every JSON Schema feature we don’t model becomes an extra_json escape hatch. We’d end up maintaining a parallel type system that nothing consumes natively.
- Plugin author ergonomics matter. “Read stdin, write stdout” is the lowest possible barrier to entry. A Bash script could be a plugin. Adding “deserialise a protobuf request” pushes plugin authors into language-specific toolchain setup before they write the first line of codegen logic.
The toolchain pattern (toolchain types per output language, register your own plugin to override) survives the simplification unchanged.
Why we also abandoned the proto envelope
Even without an AST, we considered keeping a thin proto wrapper:
CodeGenRequest{raw_schema, options, version} in, CodeGenResponse{file, error, features} out. Forward-compat without the AST baggage.
The argument against:
- The structured-options part is the only piece of the proto that isn’t trivially expressible as stdin/argv/stderr/exit-code. argv handles structured options fine.
- For ~5 plugins over the foreseeable future, “add a field without breaking old plugins” isn’t load-bearing; we can coordinate.
- Plugin author barrier matters more than abstract evolvability. A one-file Python plugin (15 lines) beats a Rust plugin with protobuf codegen deps for any reasonable measure.
- We can always add a proto envelope later if we hit a real wall. Migrating plugins is straightforward — only the stdin-parsing changes, the codegen logic doesn’t.
Open questions
- Stable JSON Schema spec-version handling. Plugins should probably refuse to operate on schemas whose $schema doesn’t match what they expect. Convention: plugins error with --schema-name=… : unsupported $schema: <value> rather than producing wrong output. Each plugin owns its own version detection.
- Cross-plugin shared parsing. If we ever need it (we don’t yet), a future RFC could add an optional sidecar artifact: the rule runs a one-time jsonschema_parse action that emits a normalised JSON form, and plugins opt into reading that instead of the original schema. Backward compatible: old plugins still consume raw.
- Diagnostic format. stderr is freeform today. If we ever want structured diagnostics (file:line:col annotations), we’d define a stderr-line format like WARNING:path:line:col:msg. Not v1.
- Toolchain attr surface. Currently the toolchain rule just carries binary. Future fields might include: supported_drafts (list of $schema values), default_options (dict), version (for diagnostic banners). All additive.
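The spec-version convention from the first open question could be implemented per plugin along these lines. This is a sketch under stated assumptions: the set of accepted drafts, the helper name check_schema_version, and the choice to reject a schema with no $schema at all are all illustrative, since the RFC leaves version detection to each plugin.

```python
# Assumption: this hypothetical plugin understands exactly these two drafts.
SUPPORTED_DRAFTS = {
    "http://json-schema.org/draft-07/schema#",
    "https://json-schema.org/draft/2020-12/schema",
}


def check_schema_version(schema, schema_name):
    """Return None if the schema's $schema is supported, else an error line.

    `schema_name` is the value received via --schema-name, used as the
    message prefix so the error points at the offending file.
    """
    declared = schema.get("$schema")
    if declared in SUPPORTED_DRAFTS:
        return None
    # Assumption: a missing $schema is rejected too (reported as None).
    return "{}: unsupported $schema: {}".format(schema_name, declared)
```

A plugin would write the returned line to stderr and exit non-zero before emitting anything on stdout.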
Decisions to lock in before Phase 1
- Plugin contract: stdin = schema bytes, argv = options, stdout = generated file content, stderr + exit code for errors. No proto, no AST.
- Bazel outputs: single file per rule invocation. Multi-output needs split into separate targets. Tree-artifact escape hatch for genuine many-file plugins.
- Plugin discovery: toolchain types per output language (already in place).
- Repo naming: stay rules_jsonschema.
Phases
Phase 1: nail down the contract in code
- //jsonschema:plugin_contract.md (or similar): a concise written spec of stdin/argv/stdout/stderr the contract docs reference.
- Refit the existing Rust + Starlark codegen binaries to the new contract. schema_to_rust already mostly does this (it reads a path from --schema); switch to stdin and the standard argv flags.
- Update //rust:defs.bzl and //starlark:defs.bzl to invoke plugins via the contract.
- Existing rules_docker_compose tests should pass byte-identical.
Phase 2: Go plugin (in Go)
- tools/plugin_go/main.go reads schema bytes from stdin, parses via encoding/json, emits Go types using go/format. Uses rules_go.
- //go:defs.bzl with jsonschema_go_library.
- Smoke example: person.json → Go types → round-trip decode test.
This validates the cross-language contract works as cleanly as the RFC claims. If implementing the Go plugin is harder than the “15 lines” pitch, the contract needs tightening.
Phase 3: contract testing
A small integration-test rule that runs an arbitrary plugin against a curated set of “interesting” schemas (compose-spec subset, edge cases, malformed input) and asserts on stdout/stderr/exit behavior. Lets plugin authors verify conformance before registering as a toolchain.
Phase 4: rules_docker_compose migration
Should be invisible to end users: the codegen binaries still exist and are simply invoked through the new contract. Tests pass byte-identical.
Plugin conformance test.
jsonschema_plugin_contract_test(name, plugin) runs the contract
test driver against any executable that claims to implement the
rules_jsonschema plugin contract (see
plugin_contract.md). The driver exercises:
- Minimum-viable invocation produces non-empty stdout + exit 0.
- Malformed JSON input → non-zero exit, stderr explanation, empty stdout (the discipline most likely to be violated by plugins emitting partial output before erroring).
- Unknown flags are rejected.
- Output is deterministic across identical invocations.
Plugin authors use it to gate their toolchain registration:
load("@rules_jsonschema//jsonschema:contract_test.bzl",
     "jsonschema_plugin_contract_test")

jsonschema_plugin_contract_test(
    name = "my_plugin_conforms",
    plugin = "//my:rust_codegen",
)
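The four scenarios the driver exercises can be approximated in a few lines of Python, which may help when debugging a plugin locally. This is not the driver shipped with rules_jsonschema, whose implementation is not shown here; the names run_subprocess and contract_failures, the malformed input, and the --no-such-flag probe are all assumptions made for the sketch.

```python
import subprocess


def run_subprocess(cmd):
    """Adapt an executable to the (argv, stdin) -> (code, stdout, stderr) shape."""
    def run(argv, stdin):
        proc = subprocess.run(cmd + argv, input=stdin, capture_output=True)
        return proc.returncode, proc.stdout, proc.stderr
    return run


def contract_failures(run, valid_schema, base_args):
    """Return a list of human-readable contract violations (empty if none)."""
    failures = []

    # 1. Minimum-viable invocation: exit 0 and non-empty stdout.
    code, out, _ = run(base_args, valid_schema)
    if code != 0 or not out:
        failures.append("valid schema must give exit 0 and non-empty stdout")

    # 2. Malformed JSON: non-zero exit, something on stderr, nothing on stdout.
    code, out, err = run(base_args, b"{not json")
    if code == 0 or out or not err:
        failures.append("malformed JSON must give non-zero exit, stderr text, empty stdout")

    # 3. Unknown flags are rejected.
    code, _, _ = run(base_args + ["--no-such-flag=1"], valid_schema)
    if code == 0:
        failures.append("unknown flags must be rejected")

    # 4. Identical invocations produce identical stdout.
    if run(base_args, valid_schema)[1] != run(base_args, valid_schema)[1]:
        failures.append("output must be deterministic")

    return failures
```

Taking the runner as a callable keeps the checks independent of how the plugin is launched: run_subprocess(["path/to/plugin"]) wraps a real binary, while an in-process function can stand in for one in unit tests.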
jsonschema_plugin_contract_test
load("@rules_jsonschema//jsonschema:contract_test.bzl", "jsonschema_plugin_contract_test")
jsonschema_plugin_contract_test(name, plugin)
Run the rules_jsonschema plugin contract scenarios against a plugin binary.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| plugin | The plugin binary to test. Any executable that claims to implement the rules_jsonschema plugin contract. | Label | required | |
Go user-facing rules for rules_jsonschema.
jsonschema_go_library is the Go-specific shape of the schema → code
pipeline:
- Resolves the go_codegen_toolchain_type toolchain.
- Runs the toolchain’s binary on the schema (stdin/argv/stdout per //jsonschema/plugin_contract.md), producing a .go file.
- Wraps the .go in a go_library from @rules_go.
The default toolchain (registered by rules_jsonschema’s MODULE.bazel)
points at the in-repo schema_to_go Go binary. Coverage is minimal —
primitives, structs, slices, maps, optional pointers, refs. For
fuller JSON-Schema-to-Go support, register your own
jsonschema_codegen_toolchain pointing at a different binary (e.g.
atombender/go-jsonschema).
jsonschema_go_library
load("@rules_jsonschema//go:defs.bzl", "jsonschema_go_library")
jsonschema_go_library(name, schema, importpath, package, extra_args, visibility,
**go_library_kwargs)
Generate a go_library of typed schema bindings.
The emitted package exports one Go type per schema $defs /
definitions entry plus a top-level type from the schema’s
title (if set). Required properties become value-typed fields;
optional properties become pointer-typed with ,omitempty tags.
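The required-versus-optional rule can be illustrated with a small Python sketch that turns an object schema's properties into Go struct field lines. It is not the schema_to_go implementation: the primitive type mapping in GO_TYPES, the capitalise-first-letter field naming, and the sorted field order are assumptions chosen only to make the example concrete.

```python
# Assumed mapping from JSON Schema primitive types to Go types.
GO_TYPES = {
    "string": "string",
    "integer": "int64",
    "number": "float64",
    "boolean": "bool",
}


def go_fields(schema):
    """Return Go struct field declarations for an object schema's properties."""
    required = set(schema.get("required", []))
    lines = []
    for name, prop in sorted(schema.get("properties", {}).items()):
        go_type = GO_TYPES[prop["type"]]
        field = name[:1].upper() + name[1:]  # assumed naming rule
        if name in required:
            # Required: plain value type, no omitempty.
            lines.append('{} {} `json:"{}"`'.format(field, go_type, name))
        else:
            # Optional: pointer type so "absent" differs from the zero value.
            lines.append('{} *{} `json:"{},omitempty"`'.format(field, go_type, name))
    return lines
```

The pointer on optional fields is what lets a decoded value distinguish a missing property from one explicitly set to its zero value.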
PARAMETERS
Helpers used by schema_to_starlark-generated rule code.
Kept in a separate file (rather than inlined per generated .bzl) so
the codegen output stays small and any helper fix benefits every
consumer at once. Generated .bzl files load from this module:
load("@rules_jsonschema//runtime:helpers.bzl", "strip_empty", "parse_json_or_none")
parse_json_or_none
load("@rules_jsonschema//runtime:helpers.bzl", "parse_json_or_none")
parse_json_or_none(s)
Return None for empty input, otherwise json.decode(s).
Used for typed schema attrs whose value is a structured object
or array. Generated rule callers pass json.encode({...}) (or
leave the attr empty); the generated impl invokes this to expand
the encoded payload back into a Starlark dict/list that gets
merged into the shard.
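The behaviour described above is small enough to restate as a Python equivalent, which can be handy when testing generated shards outside Bazel. This sketch uses json.loads in place of Starlark's json.decode and is not the helper's actual source.

```python
import json


def parse_json_or_none(s):
    """Return None for empty input, otherwise the decoded JSON value."""
    if not s:
        return None
    return json.loads(s)
```

A caller that passed json.dumps({"ports": [80]}) gets the dict back, while an attr left at its empty default yields None and can be skipped when building the shard.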
PARAMETERS
strip_empty
load("@rules_jsonschema//runtime:helpers.bzl", "strip_empty")
strip_empty(d)
Drop dict entries whose values are absent / zero / empty.
Matches the omitempty convention of Go’s encoding/json so generated shards stay
terse: Bazel attr.* zero values (0, False, "", [], {}) shouldn’t
serialise as explicit overrides. Distinguishing “user set to 0”
from “user didn’t set” isn’t possible at the Starlark layer, so
we conflate them: every typed schema field that wants to mean
something non-default ships a non-zero/-empty value.
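A Python equivalent makes the conflation of "unset" and "zero" easy to see. This is a sketch, not the helper's source: it assumes the filter is shallow (nested dicts are not cleaned recursively) and that "absent" means None.

```python
def strip_empty(d):
    """Drop entries whose value is None, 0, False, "", [] or {}."""
    empties = (None, 0, False, "", [], {})
    # Shallow filter (assumption): values nested inside kept entries are untouched.
    return {k: v for k, v in d.items() if v not in empties}
```

For example, strip_empty({"image": "nginx", "replicas": 0, "ports": []}) keeps only the image entry, so a user who really wants zero replicas cannot express that through this path.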
PARAMETERS
Providers exposed by rules_jsonschema.
JsonschemaCodegenToolchainInfo is the contract every codegen
toolchain provides: a single binary File that implements the
schema → output-language conversion. Per-language user-facing rules
resolve a toolchain by type
(@rules_jsonschema//jsonschema:<lang>_codegen_toolchain_type),
fetch this provider, and run the binary.
Splitting it out from defs.bzl lets language modules (//rust:,
//starlark:, //go:, …) load just the provider without dragging in
language-specific BUILD machinery.
JsonschemaCodegenToolchainInfo
load("@rules_jsonschema//jsonschema:providers.bzl", "JsonschemaCodegenToolchainInfo")
JsonschemaCodegenToolchainInfo(binary)
A schema → code codegen tool.
FIELDS
| Name | Description |
|---|---|
| binary | File: the codegen executable. Invoked with --schema PATH --out PATH and any language-specific flags the calling rule passes through. |
Rust user-facing rules for rules_jsonschema.
jsonschema_rust_library is the Rust-specific shape of the
schema → code pipeline:
- Resolves the rust_codegen_toolchain_type toolchain.
- Runs the toolchain’s binary on the schema, producing a .rs.
- Wraps the .rs in a rust_library with serde / serde_json / regress threaded as direct deps.
The default toolchain (registered by rules_jsonschema’s MODULE.bazel)
points at the in-repo typify-based schema_to_rust binary. Swap by
declaring your own jsonschema_codegen_toolchain + registering it
ahead of the default.
jsonschema_rust_library
load("@rules_jsonschema//rust:defs.bzl", "jsonschema_rust_library")
jsonschema_rust_library(name, schema, extra_args, serde, serde_json, regress, visibility,
**rust_library_kwargs)
Generate a rust_library of typed schema bindings.
The emitted library exports one Rust struct/enum per top-level
JSON-Schema definition, with #[derive(Serialize, Deserialize)]
plus #[serde(deny_unknown_fields)] wherever the source schema
sets additionalProperties: false.
PARAMETERS
Starlark user-facing rule for rules_jsonschema.
jsonschema_starlark_codegen emits typed Bazel rule() definitions
from a JSON Schema:
- Resolves the starlark_codegen_toolchain_type toolchain.
- Runs the toolchain’s binary on the schema, producing a .bzl.
The default toolchain (registered by rules_jsonschema’s MODULE.bazel)
points at the in-repo schema_to_starlark binary. Swap by declaring
your own jsonschema_codegen_toolchain and registering it ahead of
the default.
The output is meant to be committed in the consumer repo; pair with a
diff_test to catch drift (re-runs codegen on every CI build and
asserts the committed .bzl matches what the toolchain emits).
jsonschema_starlark_codegen
load("@rules_jsonschema//starlark:defs.bzl", "jsonschema_starlark_codegen")
jsonschema_starlark_codegen(name, schema, kinds, extra_args, **kwargs)
Generate a .bzl of typed rules from a JSON Schema.
PARAMETERS
Toolchain rules for rules_jsonschema codegen.
jsonschema_codegen_toolchain wraps a single codegen executable
(schema_to_rust, schema_to_starlark, schema_to_go, …) as a
Bazel toolchain. The matching toolchain_type lives in
//jsonschema:BUILD.bazel — one type per output language so a
consumer can independently swap, say, the Rust generator without
touching the Starlark or Go ones.
Default toolchains are registered in //rust:BUILD.bazel,
//starlark:BUILD.bazel, //go:BUILD.bazel. To swap an
implementation, declare your own jsonschema_codegen_toolchain and
register_toolchains(...) it ahead of rules_jsonschema’s default in
your MODULE.bazel.
jsonschema_codegen_toolchain
load("@rules_jsonschema//jsonschema:toolchains.bzl", "jsonschema_codegen_toolchain")
jsonschema_codegen_toolchain(name, binary)
Declare a schema → code codegen executable as a Bazel toolchain.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| binary | The codegen executable for this toolchain. Must accept --schema PATH --out PATH plus any language-specific flags. | Label | required | |
write_source_files: copy generated outputs back into source.
The canonical Bazel pattern for committed-codegen workflows. A typical
setup pairs a codegen rule (whose output sits under bazel-bin/...)
with a write_source_files target that copies the output to a path
under source control:
jsonschema_starlark_codegen(
    name = "compose_rules_gen",
    schema = "...",
    kinds = [...],
)

write_source_files(
    name = "update_compose_rules",
    files = {
        "compose_rules.bzl": ":compose_rules_gen",
    },
)
- bazel build //compose:update_compose_rules is a no-op.
- bazel run //compose:update_compose_rules copies each generated file to its source-tree destination, respecting BUILD_WORKSPACE_DIRECTORY so multi-repo workspaces still work.
Pair with a diff_test to gate freshness:
diff_test(
    name = "compose_rules_up_to_date",
    file1 = "compose_rules.bzl",
    file2 = ":compose_rules_gen",
)
This rule replaces ad-hoc sh_binary + update.sh pairs throughout
rules_jsonschema’s consumers. Functionally equivalent to
@aspect_bazel_lib//lib:write_source_files.bzl, but in-repo so we
don’t take on aspect_bazel_lib as a dep for a single rule.
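The copy step that bazel run performs can be pictured with the Python sketch below. It is an illustration rather than the rule's real implementation, which is not shown here: the function signature, the package argument, and the workspace_dir override are hypothetical. BUILD_WORKSPACE_DIRECTORY itself is the environment variable Bazel sets for bazel run.

```python
import os
import shutil


def write_source_files(files, package, workspace_dir=None):
    """Copy generated files to package-relative paths in the source tree.

    `files` maps a package-relative destination path to the path of the
    generated file. Returns the list of destination paths written.
    """
    # `bazel run` exports BUILD_WORKSPACE_DIRECTORY; the override exists for tests.
    root = workspace_dir or os.environ["BUILD_WORKSPACE_DIRECTORY"]
    written = []
    for dest_rel, generated in files.items():
        dest = os.path.join(root, package, dest_rel)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.copyfile(generated, dest)
        written.append(dest)
    return written
```

Because the destination is resolved against the workspace root rather than the current directory, the copy lands in the checked-out source tree even though the process itself runs from Bazel's runfiles directory.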
write_source_files
load("@rules_jsonschema//util:write_source_files.bzl", "write_source_files")
write_source_files(name, files)
bazel run-able target that copies generated files back into source control.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| files | Map of package-relative destination path → label whose single output file should be copied there. Each source label must produce exactly one output file. | Dictionary: String -> Label | required | |