rules_jsonschema

API reference, generated from the module’s .bzl docstrings (stardoc).

RFC-001 — Codegen Plugin Protocol

Status: draft, revised. Captures the architecture pivot from “Rust-binary-per-output-language” to “per-language plugins reading the schema directly via a minimal stdin/stdout contract”.

Earlier drafts of this RFC proposed a protoc-style architecture with a frontend, a parsed AST proto, and a dual ast / raw plugin mode. That design was abandoned because (a) JSON Schema is already JSON — every plugin language can parse it directly — and (b) most realistic plugins wrap upstream tools (typify, atombender/go-jsonschema, oapi-codegen, …) that have their own parsing anyway. The AST was a small spec language we’d be inventing for marginal benefit. See “Why we abandoned the AST” below for the full reasoning.

Goal

Decouple rules_jsonschema’s user-facing rules from a hardcoded codegen language. After this RFC lands, adding a new output language is:

Write a plugin binary in that language so it leverages native AST tooling — go/format for Go, quote/syn for Rust, ts-morph for TypeScript.
Register a jsonschema_codegen_toolchain pointing at it.
Add a jsonschema_<lang>_library user-facing rule that wraps the target language’s *_library Bazel rule.

The plugin reads the schema bytes from stdin, options from argv, writes the generated file content to stdout, and signals errors via stderr + exit code. No protobuf dep, no AST proto, no frontend binary. Stdlib-only plugins are achievable in any language.

The contract

A plugin is any executable that conforms to:

INPUT
  stdin              the schema file contents (raw bytes)
  argv               --key=value pairs, repeated. Plugin-specific.
                     The rule may also pass standard flags it owns.

OUTPUT
  stdout             the generated file content (raw bytes)
  stderr             diagnostics / error messages

EXIT
  0                  success — stdout is the generated file
  non-zero           failure — stderr explains why

That’s it. A plugin in Go is:

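package main

import (
    "encoding/json"
    "fmt"
    "io"
    "os"
)

func main() {
    schemaBytes, err := io.ReadAll(os.Stdin)
    if err != nil {
        fmt.Fprintln(os.Stderr, "read stdin:", err)
        os.Exit(1)
    }
    var schema map[string]any
    if err := json.Unmarshal(schemaBytes, &schema); err != nil {
        fmt.Fprintln(os.Stderr, "parse:", err)
        os.Exit(1)
    }
    // ... generate Go source from schema ...
    // Placeholder output so the example compiles; real codegen goes here.
    generated := fmt.Sprintf("package schema // %d top-level keys\n", len(schema))
    os.Stdout.Write([]byte(generated))
}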

A plugin in Rust is the same thing with serde_json. A plugin in Python wraps json.load(sys.stdin.buffer). There is no contract-specific dep in any language.
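To make the Python case concrete, here is a minimal sketch of a stdlib-only plugin. The `generate` function is a hypothetical placeholder that only emits comment lines, and the restriction to object-typed top-level schemas is an assumption of this sketch, not part of the contract.

```python
import json
import sys


def generate(schema: dict) -> str:
    """Placeholder codegen (hypothetical): one comment line per top-level property."""
    lines = ["# generated from schema: " + str(schema.get("title", "untitled"))]
    for name in sorted(schema.get("properties", {})):
        lines.append("# property: " + name)
    return "\n".join(lines) + "\n"


def run(stdin, stdout, stderr) -> int:
    """Read schema bytes from stdin, write generated bytes to stdout.

    Returns the exit code the process should use: 0 on success, 1 on failure.
    """
    try:
        schema = json.loads(stdin.read())
    except ValueError as err:  # covers JSON syntax errors and bad UTF-8
        print("parse:", err, file=stderr)
        return 1
    if not isinstance(schema, dict):
        # Assumption of this sketch: only object schemas are handled.
        print("parse: top-level schema must be a JSON object", file=stderr)
        return 1
    # Build the whole output before writing, so stdout stays empty on failure.
    stdout.write(generate(schema).encode("utf-8"))
    return 0


def main() -> None:
    sys.exit(run(sys.stdin.buffer, sys.stdout.buffer, sys.stderr))
```

Calling `main()` under the usual `if __name__ == "__main__":` guard turns the file into an executable plugin. Passing the streams into `run` keeps the logic testable without spawning a process.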

Standard argv conventions

The rule passes a fixed set of flags every plugin receives, plus whatever the consumer set in options:

Flag	Set by	Meaning
`--schema-name=NAME`	rule	Original schema file basename (e.g. `compose-spec.json`). For error messages and stable codegen header comments.
`--rule-name=NAME`	rule	The Bazel target’s name. Useful for picking output identifiers.
`--<consumer-flag>=VAL`	consumer	Free-form per-plugin options from the rule attrs.

Plugins should treat unknown flags as a hard error so misconfigured options don’t silently degrade output.
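A sketch of flag handling that follows this convention, in Python. `parse_flags` and `FlagError` are hypothetical names, and "last value wins" for a repeated key is an assumption; the contract only says flags arrive as `--key=value` pairs.

```python
# Flags the rule always passes, per the table above.
STANDARD_FLAGS = {"schema-name", "rule-name"}


class FlagError(Exception):
    """Raised for malformed or unknown flags (hypothetical helper type)."""


def parse_flags(argv, plugin_flags=frozenset()):
    """Parse --key=value arguments, rejecting anything the plugin does not know."""
    allowed = STANDARD_FLAGS | set(plugin_flags)
    options = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise FlagError("expected --key=value, got: " + arg)
        # Split on the first '=' only, so values may themselves contain '='.
        key, value = arg[2:].split("=", 1)
        if key not in allowed:
            raise FlagError("unknown flag: --" + key)
        options[key] = value  # assumption: a repeated key overwrites the earlier value
    return options
```

A plugin would catch `FlagError`, print the message to stderr, and exit non-zero, so a typo such as `--pacakge=models` fails the build instead of being ignored.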

Bazel output declaration

Bazel rules must declare their outputs at analysis time, before any action runs. Three real options were considered:

Approach	Pros	Cons
A. Single file per rule invocation	Output path known at analysis. Simple. Matches `protoc-gen-go` in practice.	Plugin authors can’t naturally split output.
B. `declare_directory` (tree artifact)	Plugin emits arbitrarily many files.	Downstream `rust_library` / `go_library` rules have to glob the directory or expand it. Awkward, non-standard.
C. Two-pass: pre-flight + emit	Plugin advertises outputs given a schema, then generates.	Two plugin invocations per build. Doubles action overhead.

Decision: A. Plugin produces exactly one file (on stdout) per rule invocation. Multi-output needs (types vs validators, client vs server) split into separate rule targets:

jsonschema_go_types(name = "person_types", schema = "person.json")
jsonschema_go_validators(name = "person_validators", schema = "person.json")

Each target is independently cacheable; the build graph is clearer. Tree artifacts (B) remain available as an escape hatch for the rare genuinely-multi-file plugin.

Bazel rule shape

Each per-language user-facing rule has the same structure:

load("@bazel_skylib//lib:shell.bzl", "shell")

def _jsonschema_rust_codegen_impl(ctx):
    out = ctx.actions.declare_file(ctx.label.name + ".rs")
    tc = ctx.toolchains[_RUST_TOOLCHAIN].codegen_info

    # Standard flags every plugin receives.
    args = [
        "--schema-name=" + ctx.file.schema.basename,
        "--rule-name=" + ctx.label.name,
    ]
    # Plugin-specific options passed through from rule attrs.
    for k, v in ctx.attr.options.items():
        args.append("--{}={}".format(k, v))

    # Schema on stdin, generated file on stdout; every word is shell-quoted.
    ctx.actions.run_shell(
        inputs = [ctx.file.schema],
        outputs = [out],
        tools = [tc.binary],
        command = "{plugin} {args} < {schema} > {out}".format(
            plugin = shell.quote(tc.binary.path),
            args = " ".join([shell.quote(a) for a in args]),
            schema = shell.quote(ctx.file.schema.path),
            out = shell.quote(out.path),
        ),
    )
    return [DefaultInfo(files = depset([out]))]

A user-facing macro composes that codegen rule with the target language’s library rule:

def jsonschema_rust_library(name, schema, **kwargs):
    gen_name = name + "_rs_gen"
    _jsonschema_rust_codegen(name = gen_name, schema = schema)
    rust_library(
        name = name,
        srcs = [":" + gen_name],
        edition = "2021",
        deps = [...],
        **kwargs
    )

Same shape per language.
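For debugging a plugin outside Bazel, the invocation performed by the codegen action (schema on stdin, flags on argv, stdout captured as the output file) can be reproduced with a few lines of Python. `run_plugin` is a hypothetical helper, not part of rules_jsonschema.

```python
import subprocess


def run_plugin(plugin_cmd, schema_path, out_path, flags):
    """Invoke a plugin per the contract and write its stdout to out_path.

    plugin_cmd is the argv prefix that starts the plugin (a list of strings).
    Raises RuntimeError carrying the plugin's stderr on a non-zero exit.
    """
    with open(schema_path, "rb") as schema:
        result = subprocess.run(
            list(plugin_cmd) + list(flags),
            stdin=schema,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
    if result.returncode != 0:
        raise RuntimeError(result.stderr.decode("utf-8", "replace"))
    # Only write the output file once the plugin has reported success.
    with open(out_path, "wb") as out:
        out.write(result.stdout)
```

Unlike the shell redirection, this sketch leaves no output file behind when the plugin fails. Under Bazel this difference should not matter, since the outputs of a failed action are not used.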

Why we abandoned the AST

The first draft of this RFC proposed a protoc-style architecture: a frontend parses the schema into a canonical AST proto, plugins consume that AST instead of raw bytes. After looking at it harder I think this was the wrong call. Reasons:

The protoc analogy doesn’t transfer. protoc has an AST because .proto files have a grammar nobody else has implemented; without it, plugin authors would have to re-implement parsing. JSON Schema is already JSON, and every plugin language has a JSON parser in its stdlib or a one-line dep away, so the “no plugin reparses” benefit is worth close to nothing for us.
Most plugins wrap upstream tools. typify, atombender/go-jsonschema, oapi-codegen, openapi-generator all take raw schema bytes and have their own parsing. Our AST would be throwaway work for them. The dual mode = "ast" | "raw" we briefly proposed was evidence the AST wasn’t the natural fit.
Cross-plugin consistency was illusory. Different upstream tools interpret edge cases differently (recursive refs, allOf ordering, oneOf discriminator behavior). Putting an AST in front doesn’t unify them — each wrapping plugin still defers to its underlying library.
Maintenance cost is real. Defining Schema / Type / UnionType / IntersectionType is a small spec language we invent and ship. Every JSON Schema feature we don’t model becomes an extra_json escape hatch. We’d end up maintaining a parallel type system that nothing consumes natively.
Plugin author ergonomics matter. “Read stdin, write stdout” is the lowest possible barrier to entry. A Bash script could be a plugin. Adding “deserialise a protobuf request” pushes plugin authors into language-specific toolchain setup before they write the first line of codegen logic.

The toolchain pattern (toolchain types per output language, register your own plugin to override) survives the simplification unchanged.

Why we also abandoned the proto envelope

Even without an AST, we considered keeping a thin proto wrapper: CodeGenRequest{raw_schema, options, version} in, CodeGenResponse{file, error, features} out. Forward-compat without the AST baggage.

The argument against:

Structured options are the only piece of the proto that isn’t trivially expressible as stdin/stderr/exit-code, and argv handles structured options fine.
For ~5 plugins over the foreseeable future, “add a field without breaking old plugins” isn’t load-bearing; we can coordinate.
Plugin author barrier matters more than abstract evolvability. A one-file Python plugin (15 lines) beats a Rust plugin with protobuf codegen deps for any reasonable measure.
We can always add a proto envelope later if we hit a real wall. Migrating plugins is straightforward — only the stdin-parsing changes, the codegen logic doesn’t.

Open questions

Stable JSON Schema spec-version handling. Plugins should probably refuse to operate on schemas whose $schema doesn’t match what they expect. Convention: plugins error with --schema-name=… : unsupported $schema: <value> rather than producing wrong output. Each plugin owns its own version detection.
Cross-plugin shared parsing. If we ever need it (we don’t yet), a future RFC could add an optional sidecar artifact: the rule runs a one-time jsonschema_parse action that emits a normalised JSON form, and plugins opt into reading that instead of the original schema. Backward compatible — old plugins still consume raw.
Diagnostic format. stderr is freeform today. If we ever want structured diagnostics (file:line:col annotations), we’d define a stderr-line format like WARNING:path:line:col:msg. Not v1.
Toolchain attr surface. Currently the toolchain rule just carries binary. Future fields might include: supported_drafts (list of $schema values), default_options (dict), version (for diagnostic banners). All additive.
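The spec-version convention from the first open question could look like the following sketch. The two draft URIs are the published identifiers for draft-07 and 2020-12; which drafts a plugin supports, the decision to accept a missing $schema, and the reading of the message prefix as the value passed via --schema-name are all assumptions here.

```python
# Drafts this hypothetical plugin claims to support.
SUPPORTED_DRAFTS = frozenset({
    "http://json-schema.org/draft-07/schema#",
    "https://json-schema.org/draft/2020-12/schema",
})


def check_schema_version(schema, schema_name, supported=SUPPORTED_DRAFTS):
    """Return an error message if $schema is unsupported, otherwise None."""
    declared = schema.get("$schema")
    if declared is None:
        # Assumption: a schema without $schema is accepted rather than rejected.
        return None
    if declared not in supported:
        return "{}: unsupported $schema: {}".format(schema_name, declared)
    return None
```

A plugin would print the returned message to stderr and exit non-zero before generating anything.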

Decisions to lock in before Phase 1

Plugin contract: stdin = schema bytes, argv = options, stdout = generated file content, stderr + exit code for errors. No proto, no AST.
Bazel outputs: single file per rule invocation. Multi-output needs split into separate targets. Tree-artifact escape hatch for genuine many-file plugins.
Plugin discovery: toolchain types per output language (already in place).
Repo naming: stay rules_jsonschema.

Phases

Phase 1: nail down the contract in code

//jsonschema:plugin_contract.md (or similar): a concise written spec of the stdin/argv/stdout/stderr contract that the rest of the docs can reference.
Refit the existing Rust + Starlark codegen binaries to the new contract. schema_to_rust already mostly does this (it reads a path from --schema); switch to stdin and the standard argv flags.
Update //rust:defs.bzl and //starlark:defs.bzl to invoke plugins via the contract.
Existing rules_docker_compose tests should pass with byte-identical output.

Phase 2: Go plugin (in Go)

tools/plugin_go/main.go reads schema bytes from stdin, parses via encoding/json, emits Go types using go/format. Uses rules_go.
//go:defs.bzl with jsonschema_go_library.
Smoke example: person.json → Go types → round-trip decode test.

This validates the cross-language contract works as cleanly as the RFC claims. If implementing the Go plugin is harder than the “15 lines” pitch, the contract needs tightening.

Phase 3: contract testing

A small integration-test rule that runs an arbitrary plugin against a curated set of “interesting” schemas (compose-spec subset, edge cases, malformed input) and asserts on stdout/stderr/exit behavior. Lets plugin authors verify conformance before registering as a toolchain.

Phase 4: rules_docker_compose migration

Should be a no-op for end users: the codegen binaries still exist and are just invoked through the new contract. Tests pass with byte-identical output.

Plugin conformance test.

jsonschema_plugin_contract_test(name, plugin) runs the contract test driver against any executable that claims to implement the rules_jsonschema plugin contract (see plugin_contract.md). The driver exercises:

Minimum-viable invocation produces non-empty stdout + exit 0.
Malformed JSON input → non-zero exit, stderr explanation, empty stdout (the discipline most likely to be violated by plugins emitting partial output before erroring).
Unknown flags are rejected.
Output is deterministic across identical invocations.

Plugin authors use it to gate their toolchain registration:

load("@rules_jsonschema//jsonschema:contract_test.bzl",
     "jsonschema_plugin_contract_test")

jsonschema_plugin_contract_test(
    name = "my_plugin_conforms",
    plugin = "//my:rust_codegen",
)
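The four scenarios the driver exercises can be approximated by hand with the Python sketch below. It is not the actual test driver; `check_contract`, the malformed input, and the probe flag name are illustrative assumptions.

```python
import subprocess


def invoke(plugin_cmd, stdin_bytes, flags=()):
    """Run the plugin once with the given stdin and flags, capturing both streams."""
    return subprocess.run(
        list(plugin_cmd) + list(flags),
        input=stdin_bytes,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


def check_contract(plugin_cmd, valid_schema, flags=()):
    """Return the list of violated scenarios; an empty list means all four passed."""
    failures = []
    ok = invoke(plugin_cmd, valid_schema, flags)
    if ok.returncode != 0 or not ok.stdout:
        failures.append("valid schema must exit 0 with non-empty stdout")
    bad = invoke(plugin_cmd, b"{not json", flags)
    if bad.returncode == 0 or bad.stdout or not bad.stderr:
        failures.append("malformed JSON must exit non-zero with stderr and empty stdout")
    # The probe flag name is arbitrary; any flag the plugin does not define works.
    unknown = invoke(plugin_cmd, valid_schema, list(flags) + ["--definitely-not-a-flag=1"])
    if unknown.returncode == 0:
        failures.append("unknown flags must be rejected")
    again = invoke(plugin_cmd, valid_schema, flags)
    if again.stdout != ok.stdout:
        failures.append("output must be deterministic")
    return failures
```

Running this against a plugin during development gives quick feedback before wiring up the Bazel test target.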

jsonschema_plugin_contract_test

load("@rules_jsonschema//jsonschema:contract_test.bzl", "jsonschema_plugin_contract_test")

jsonschema_plugin_contract_test(name, plugin)

Run the rules_jsonschema plugin contract scenarios against a plugin binary.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
plugin	The plugin binary to test. Any executable that claims to implement the rules_jsonschema plugin contract.	Label	required

Go user-facing rules for rules_jsonschema.

jsonschema_go_library is the Go-specific shape of the schema → code pipeline:

Resolves the go_codegen_toolchain_type toolchain.
Runs the toolchain’s binary on the schema (stdin/argv/stdout per //jsonschema/plugin_contract.md), producing a .go file.
Wraps the .go in a go_library from @rules_go.

The default toolchain (registered by rules_jsonschema’s MODULE.bazel) points at the in-repo schema_to_go Go binary. Coverage is minimal — primitives, structs, slices, maps, optional pointers, refs. For fuller JSON-Schema-to-Go support, register your own jsonschema_codegen_toolchain pointing at a different binary (e.g. atombender/go-jsonschema).

jsonschema_go_library

load("@rules_jsonschema//go:defs.bzl", "jsonschema_go_library")

jsonschema_go_library(name, schema, importpath, package, extra_args, visibility,
                      **go_library_kwargs)

Generate a go_library of typed schema bindings.

The emitted package exports one Go type per schema $defs / definitions entry plus a top-level type from the schema’s title (if set). Required properties become value-typed fields; optional properties become pointer-typed with ,omitempty tags.
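The required/optional rule can be illustrated with a small Python sketch that renders one Go struct field. It is not the code of schema_to_go: `go_field` is a hypothetical helper, the snake_case-to-CamelCase naming is an assumption, and only scalar Go types are considered.

```python
def go_field(json_name, go_type, required):
    """Render a Go struct field: value-typed if required, pointer + omitempty if optional."""
    # Assumed naming rule: split on '_' or '-' and capitalise each part.
    parts = json_name.replace("-", "_").split("_")
    go_name = "".join(part[:1].upper() + part[1:] for part in parts)
    if required:
        return '{} {} `json:"{}"`'.format(go_name, go_type, json_name)
    return '{} *{} `json:"{},omitempty"`'.format(go_name, go_type, json_name)
```

For instance, a required `name` string and an optional `nick_name` string would render as a plain `string` field and a `*string` field tagged `omitempty`, respectively.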

PARAMETERS

Name	Description	Default Value
name	go_library target name. Consumers add to `deps`.	none
schema	label of a `.json` schema file.	none
importpath	Go import path for the generated package.	none
package	Go package name. Defaults to a sanitised rule name.	`None`
extra_args	extra `--key=value` flags appended to the plugin’s argv. Use to set plugin-specific options without registering a new toolchain.	`None`
visibility	forwarded to go_library.	`None`
go_library_kwargs	forwarded to go_library.	none

Helpers used by schema_to_starlark-generated rule code.

Kept in a separate file (rather than inlined per generated .bzl) so the codegen output stays small and any helper fix benefits every consumer at once. Generated .bzl files load from this module:

load("@rules_jsonschema//runtime:helpers.bzl", "strip_empty", "parse_json_or_none")

parse_json_or_none

load("@rules_jsonschema//runtime:helpers.bzl", "parse_json_or_none")

parse_json_or_none(s)

Return None for empty input, otherwise json.decode(s).

Used for typed schema attrs whose value is a structured object or array. Generated rule callers pass json.encode({...}) (or leave the attr empty); the generated impl invokes this to expand the encoded payload back into a Starlark dict/list that gets merged into the shard.
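A Python analogue of the documented behaviour, with `json.loads` standing in for Starlark’s `json.decode`. This mirrors the description above rather than the helper’s actual source.

```python
import json


def parse_json_or_none(s):
    """Return None for empty input, otherwise the decoded JSON value."""
    if not s:
        return None
    return json.loads(s)
```

A caller that passes `json.dumps({"ports": [80, 443]})` gets the dict back, while an attr left at its empty default yields `None`.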

PARAMETERS

Name	Description	Default Value
s	-	none

strip_empty

load("@rules_jsonschema//runtime:helpers.bzl", "strip_empty")

strip_empty(d)

Drop dict entries whose values are absent / zero / empty.

Matches the JSON omitempty convention so generated shards stay terse: Bazel attr.* zero values (0, False, "", [], {}) shouldn’t serialise as explicit overrides. Distinguishing “user set to 0” from “user didn’t set” isn’t possible at the Starlark layer, so we conflate them: every typed schema field that wants to mean something non-default must ship a non-zero, non-empty value.
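The conflation is easy to see in a Python analogue of the documented behaviour. This sketches the description above; it is not the helper’s Starlark source.

```python
# Values treated as "not set", matching the zero values listed above plus None.
_EMPTY = (None, 0, False, "", [], {})


def strip_empty(d):
    """Return a copy of d without entries whose value is absent, zero, or empty."""
    return {k: v for k, v in d.items() if v not in _EMPTY}
```

With this definition, `strip_empty({"replicas": 0})` returns an empty dict, so an explicit `replicas = 0` cannot be told apart from an unset attr.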

PARAMETERS

Name	Description	Default Value
d	-	none

Providers exposed by rules_jsonschema.

JsonschemaCodegenToolchainInfo is the contract every codegen toolchain provides: a single binary File that implements the schema → output-language conversion. Per-language user-facing rules resolve a toolchain by type (@rules_jsonschema//jsonschema:<lang>_codegen_toolchain_type), fetch this provider, and run the binary.

Splitting it out from defs.bzl lets language modules (//rust:, //starlark:, //go:, …) load just the provider without dragging in language-specific BUILD machinery.

JsonschemaCodegenToolchainInfo

load("@rules_jsonschema//jsonschema:providers.bzl", "JsonschemaCodegenToolchainInfo")

JsonschemaCodegenToolchainInfo(binary)

A schema → code codegen tool.

FIELDS

Name	Description
binary	File: the codegen executable. Invoked with `--schema PATH --out PATH` and any language-specific flags the calling rule passes through.

Rust user-facing rules for rules_jsonschema.

jsonschema_rust_library is the Rust-specific shape of the schema → code pipeline:

Resolves the rust_codegen_toolchain_type toolchain.
Runs the toolchain’s binary on the schema, producing a .rs.
Wraps the .rs in a rust_library with serde / serde_json / regress threaded as direct deps.

The default toolchain (registered by rules_jsonschema’s MODULE.bazel) points at the in-repo typify-based schema_to_rust binary. Swap by declaring your own jsonschema_codegen_toolchain + registering it ahead of the default.

jsonschema_rust_library

load("@rules_jsonschema//rust:defs.bzl", "jsonschema_rust_library")

jsonschema_rust_library(name, schema, extra_args, serde, serde_json, regress, visibility,
                        **rust_library_kwargs)

Generate a rust_library of typed schema bindings.

The emitted library exports one Rust struct/enum per top-level JSON-Schema definition, with #[derive(Serialize, Deserialize)] plus #[serde(deny_unknown_fields)] wherever the source schema sets additionalProperties: false.

PARAMETERS

Name	Description	Default Value
name	rust_library target name. Consumers add this to `deps`.	none
schema	label of a `.json` schema file.	none
extra_args	extra `--key=value` flags appended to the plugin’s argv. Use to set plugin-specific options without registering a new toolchain. The default plugin (schema_to_rust) accepts no extra flags today; consumers of custom toolchains will.	`None`
serde	label of the `serde` crate to use as a direct dep. Defaults to rules_jsonschema’s own `@crates//:serde`. Consumers whose binary also depends on serde must point this at their own crate repo; otherwise the generated types implement the traits of a different copy of `serde` than the one the consumer compiles against, and Rust treats the two copies’ traits as distinct (`error[E0277]: the trait bound Service: serde::Serialize is not satisfied`).	`None`
serde_json	same story for `serde_json`.	`None`
regress	same story for `regress` (typify uses it for `pattern`-validated string newtypes).	`None`
visibility	forwarded to rust_library.	`None`
rust_library_kwargs	forwarded to rust_library (e.g. extra `deps`).	none

Starlark user-facing rule for rules_jsonschema.

jsonschema_starlark_codegen emits typed Bazel rule() definitions from a JSON Schema:

Resolves the starlark_codegen_toolchain_type toolchain.
Runs the toolchain’s binary on the schema, producing a .bzl.

The default toolchain (registered by rules_jsonschema’s MODULE.bazel) points at the in-repo schema_to_starlark binary. Swap by declaring your own jsonschema_codegen_toolchain and registering it ahead of the default.

The output is meant to be committed in the consumer repo; pair with a diff_test to catch drift (re-runs codegen on every CI build and asserts the committed .bzl matches what the toolchain emits).

jsonschema_starlark_codegen

load("@rules_jsonschema//starlark:defs.bzl", "jsonschema_starlark_codegen")

jsonschema_starlark_codegen(name, schema, kinds, extra_args, **kwargs)

Generate a .bzl of typed rules from a JSON Schema.

PARAMETERS

Name	Description	Default Value
name	target name; output file is `<name>.bzl`.	none
schema	label of a `.json` schema document.	none
kinds	list of `(id, pointer, rule_name, provider_name)` 4-tuples. - `id`: short tag used in generated symbol names + the rule-name attr (e.g. `service`). - `pointer`: JSON-pointer into the schema for the definition whose `properties` become attrs (e.g. `#/definitions/service`). - `rule_name`: the public Starlark symbol the emitted rule binds to. - `provider_name`: the public Starlark symbol the rule’s companion provider binds to. Optional — if omitted, `extra_args` typically enables the plugin’s auto-kinds derivation (e.g. `--kinds-pointer-base=...` for the default `schema_to_starlark` toolchain). Leaving both empty produces a preamble-only `.bzl` (legal but rarely useful).	`None`
extra_args	extra `--key=value` flags appended to the plugin’s argv. Use to set plugin-specific options without registering a new toolchain.	`None`
kwargs	forwarded to the underlying rule (visibility, etc.).	none

Toolchain rules for rules_jsonschema codegen.

jsonschema_codegen_toolchain wraps a single codegen executable (schema_to_rust, schema_to_starlark, schema_to_go, …) as a Bazel toolchain. The matching toolchain_type lives in //jsonschema:BUILD.bazel — one type per output language so a consumer can independently swap, say, the Rust generator without touching the Starlark or Go ones.

Default toolchains are registered in //rust:BUILD.bazel, //starlark:BUILD.bazel, //go:BUILD.bazel. To swap an implementation, declare your own jsonschema_codegen_toolchain and register_toolchains(...) it ahead of rules_jsonschema’s default in your MODULE.bazel.

jsonschema_codegen_toolchain

load("@rules_jsonschema//jsonschema:toolchains.bzl", "jsonschema_codegen_toolchain")

jsonschema_codegen_toolchain(name, binary)

Declare a schema → code codegen executable as a Bazel toolchain.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
binary	The codegen executable for this toolchain. Must accept `--schema PATH --out PATH` plus any language-specific flags.	Label	required

write_source_files: copy generated outputs back into source.

The canonical Bazel pattern for committed-codegen workflows. A typical setup pairs a codegen rule (whose output sits under bazel-bin/...) with a write_source_files target that copies the output to a path under source control:

jsonschema_starlark_codegen(
    name = "compose_rules_gen",
    schema = "...",
    kinds = [...],
)

write_source_files(
    name = "update_compose_rules",
    files = {
        "compose_rules.bzl": ":compose_rules_gen",
    },
)

bazel build //compose:update_compose_rules — no-op.
bazel run //compose:update_compose_rules — copies each generated file to its source-tree destination, respecting BUILD_WORKSPACE_DIRECTORY so multi-repo workspaces still work.
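What the `bazel run` step does can be sketched in Python as follows. `copy_to_source_tree` is a hypothetical stand-in for the rule’s runner, not its actual implementation. `BUILD_WORKSPACE_DIRECTORY` is the environment variable Bazel sets for `bazel run`.

```python
import os
import shutil


def copy_to_source_tree(files, package, env=os.environ):
    """Copy each generated file to <workspace>/<package>/<dest>.

    files maps a package-relative destination path to the generated file's path.
    Returns the list of written paths.
    """
    workspace = env["BUILD_WORKSPACE_DIRECTORY"]  # only set under `bazel run`
    written = []
    for dest, generated in sorted(files.items()):
        target = os.path.join(workspace, package, dest)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copyfile(generated, target)
        written.append(target)
    return written
```

Resolving destinations against the workspace directory rather than the current directory is what makes the copy land in the source tree instead of the runfiles tree.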

Pair with a diff_test to gate freshness:

diff_test(
    name = "compose_rules_up_to_date",
    file1 = "compose_rules.bzl",
    file2 = ":compose_rules_gen",
)

This rule replaces ad-hoc sh_binary + update.sh pairs throughout rules_jsonschema’s consumers. Functionally equivalent to @aspect_bazel_lib//lib:write_source_files.bzl, but in-repo so we don’t take on aspect_bazel_lib as a dep for a single rule.

write_source_files

load("@rules_jsonschema//util:write_source_files.bzl", "write_source_files")

write_source_files(name, files)

bazel run-able target that copies generated files back into source control.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
files	Map of package-relative destination path → label whose single output file should be copied there. Each source label must produce exactly one output file.	Dictionary: String -> Label	required

fastverk