rules_rdf

API reference, generated from the module’s .bzl docstrings (stardoc).

rules_rdf roadmap

Two waypoints between today’s scaffold and a usable abstract RDF toolchain layer. Each waypoint is one published bazel-registry release.

v0.1 — toolchain types + plugin contract + placeholder rules

The goal is for a consumer to be able to declare every planned target type (rdf_dataset, sparql_query_test, rdf_validate_test, rdf_transform, rdf_reason) today, against a no-op default toolchain, then swap in a real implementation (e.g. rules_jena) without touching their BUILD files. This makes rules_rdf adoptable incrementally — consumers can wire their build graph before any engine is integrated.

Deliverables:

- Plugin contract document at rdf/plugin_contract.md (draft already in tree). Same shape as rules_jsonschema’s plugin_contract.md, adjusted for RDF semantics:
  - stdin = the RDF document bytes (the dataset; format declared via --in-format), not a JSON schema.
  - argv = --key=value pairs (same as jsonschema). Standard flags: --rule-name, --in-format. Per-toolchain flags: --query, --shapes, --out-format, --profile.
  - stdout = generated output (query results / validation report / converted graph / inferred triples). Same single-file-per-invocation discipline.
  - stderr = diagnostics.
  - exit = 0 on success, non-zero on failure.
- All four toolchain types defined in //rdf:BUILD.bazel: sparql_engine_toolchain_type, rdf_validator_toolchain_type, rdf_serializer_toolchain_type, rdf_reasoner_toolchain_type.
- Providers: RdfDatasetInfo, SparqlEngineToolchainInfo, RdfValidatorToolchainInfo, RdfSerializerToolchainInfo, RdfReasonerToolchainInfo. Each toolchain info wraps a single binary File, matching the jsonschema pattern.
- Default user-facing rules implemented as _no_op placeholders:
  - rdf_dataset: real (returns RdfDatasetInfo; no toolchain needed).
  - sparql_query_test, sparql_query_run, rdf_validate_test, rdf_transform, rdf_reason: these declare their toolchain dependency and accept all their final attrs, but the in-repo default toolchain points at a _no_op binary that writes an empty stdout and exits 0. Consumers can declare targets and they build; swapping in rules_jena makes them actually run.
- Conformance test driver rdf_plugin_contract_test covering the same scenarios as the jsonschema driver: valid_minimal (small dataset round-trips), malformed_input (garbage on stdin → exit non-zero, empty stdout), unknown_flag (rejects unknown argv), determinism (byte-identical stdout on identical invocations). One driver, parameterised by toolchain type.
- stardoc for the public surface, with diff_test freshness.

Out of scope for v0.1: chained pipelines, real-engine examples, result-set diff helpers.

v0.2 — cross-toolchain wiring + real-engine examples

Once rules_jena is published and registered, rules_rdf grows the glue that ties multiple toolchains together in one pipeline.

Deliverables:

- Chained pipelines: rdf_validate_test and sparql_query_test accept the output of rdf_reason as their dataset, so a consumer can express “materialise inferences, then run shape validation on the closure” as a typed build graph. The intermediate inferred graph is a real RdfDatasetInfo-bearing target, not a hidden side effect.
- Result-set helpers: a small Starlark helper for the common zero-row-CSV gate pattern, plus an rdf_results_diff_test for golden SPARQL result sets (SRX/JSON normalisation).
- Examples directory using a real RDF corpus:
  - W3C example datasets fetched via http_file with a pinned sha256 (the same fetch-and-pin discipline rules_docker_compose uses for the compose-spec schema).
  - One end-to-end smoke target per toolchain type, registered against rules_jena.
- CI matrix running the conformance test driver against every registered concrete implementation we know about, gating rules_rdf releases on at least one concrete backend passing.

After v0.2 the abstract layer is feature-complete; further work moves into the concrete-implementation repos.
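
The chained-pipeline deliverable for v0.2 might look roughly like the following in a consumer's BUILD file. This is a sketch of planned behaviour rather than something known to work today: it assumes rdf_reason's target can be passed directly as `dataset`, and the target names and file paths are hypothetical.

```starlark
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//reason:defs.bzl", "rdf_reason")
load("@rules_rdf//validate:defs.bzl", "rdf_validate_test")

# Hypothetical source layout.
rdf_dataset(name = "ontology", srcs = glob(["ontology/*.ttl"]))

# Materialise the RDFS closure; include_base keeps the asserted triples
# alongside the inferred ones so the shapes see the whole graph.
rdf_reason(
    name = "closure",
    base = ":ontology",
    include_base = True,
    profile = "rdfs",
)

# ASSUMPTION (v0.2 roadmap): the rdf_reason target carries RdfDatasetInfo
# and is therefore accepted wherever an rdf_dataset is.
rdf_validate_test(
    name = "closure_conforms",
    dataset = ":closure",
    shapes = "shapes.ttl",
)
```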

rdf_plugin_contract_test(name, plugin, toolchain_type) runs the rules_rdf conformance test driver against any executable claiming to implement the plugin contract for the named toolchain type. See plugin_contract.md for what the driver asserts.

Plugin authors gate toolchain registration on it:

load("@rules_rdf//rdf:contract_test.bzl", "rdf_plugin_contract_test")

rdf_plugin_contract_test(
    name = "jena_sparql_conforms",
    plugin = "//jena:jena_sparql",
    toolchain_type = "sparql_engine",
)

The four toolchain types each have their own minimum-valid input inside the driver; pass the bare name (without the _toolchain_type suffix or @rules_rdf//rdf: prefix).

rdf_plugin_contract_test

load("@rules_rdf//rdf:contract_test.bzl", "rdf_plugin_contract_test")

rdf_plugin_contract_test(name, plugin, toolchain_type)

Run the rules_rdf conformance test driver against a plugin binary. See plugin_contract.md.

ATTRIBUTES

Name	Description	Type	Mandatory
name	A unique name for this target.	Name	required
plugin	The plugin binary to test. Any executable that claims to implement the rules_rdf plugin contract.	Label	required
toolchain_type	Which toolchain type’s scenarios to run: one of sparql_engine, rdf_validator, rdf_serializer, rdf_reasoner.	String	required
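
A plugin repository that ships one binary per toolchain type could declare all four conformance tests from a single table. The sketch below uses only the documented attributes; the plugin labels other than //jena:jena_sparql are hypothetical.

```starlark
load("@rules_rdf//rdf:contract_test.bzl", "rdf_plugin_contract_test")

# Bare toolchain-type name -> plugin binary. Labels are illustrative.
_PLUGINS = {
    "sparql_engine": "//jena:jena_sparql",
    "rdf_validator": "//jena:jena_shacl",
    "rdf_serializer": "//jena:jena_riot",
    "rdf_reasoner": "//jena:jena_infer",
}

# One conformance test per toolchain type, e.g. :jena_sparql_engine_conforms.
[
    rdf_plugin_contract_test(
        name = "jena_{}_conforms".format(kind),
        plugin = plugin,
        toolchain_type = kind,
    )
    for kind, plugin in _PLUGINS.items()
]
```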

rdf_dataset(name, deps, srcs, in_format): declare a labeled collection of RDF files.

This is the single source of “what triples are in this graph?” that every other rule consumes. Carrying both the file depset and the format string up-front lets sparql_query_test / rdf_validate_test / … avoid sniffing extensions at action time and lets consumers keep datasets of different declared formats side by side in one BUILD file without ambiguity.

Multi-file datasets are concatenated by the consuming rule in lexicographic order before being piped to the plugin’s stdin (see rdf/plugin_contract.md). Consumers that care about ordering should name files to sort accordingly.

rdf_dataset

load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")

rdf_dataset(name, deps, srcs, in_format)

A labeled collection of RDF source files + linked-graph deps.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
deps	Other `rdf_dataset`s this graph links to (imported ontologies, vocabulary modules). Their files are folded into this dataset’s `transitive_files` closure, so reasoning/query over the linked vocabularies resolves. Deps should share `in_format` (normalize otherwise).	List of labels	optional	`[]`
srcs	RDF source files. Concatenated in lexicographic order by consuming rules before being piped to the plugin’s stdin.	List of labels	required
in_format	Serialization of every file in `srcs`. Mixed-format datasets aren’t supported in v0.1 — use rdf_transform first.	String	optional	`"turtle"`
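
As an illustration of `deps`, a dataset that links to a separately declared vocabulary might be written as below. File paths and target names are hypothetical; both targets use the same `in_format`, as the attribute description asks.

```starlark
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")

# A vocabulary module declared once and shared.
rdf_dataset(
    name = "skos",
    srcs = ["vocab/skos.ttl"],
    in_format = "turtle",
)

# The catalog's own files are its `files`; its `transitive_files`
# additionally contains everything from :skos.
rdf_dataset(
    name = "catalog",
    srcs = glob(["catalog/*.ttl"]),
    in_format = "turtle",
    deps = [":skos"],
)
```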

Providers for the four rules_rdf toolchain types.

Each provider wraps both the executable and the runfiles needed to invoke it. Carrying runfiles in the provider matters for plugin implementations that aren’t a single self-contained binary — py_binary, java_binary, sh_binary all stage helper files via runfiles. Consuming rules merge the provider’s runfiles into their own to make the plugin actually executable inside a Bazel sandbox.
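
To make the runfiles point concrete, here is a minimal sketch of a hypothetical consuming rule (not part of rules_rdf) that pipes a dataset through the serializer plugin. The field name used to reach the provider on the resolved toolchain (`serializer`) and the value passed to --rule-name are assumptions; the flags themselves come from the plugin contract summary.

```starlark
# Hypothetical .bzl file in a consumer repo; not part of rules_rdf.
load("@rules_rdf//rdf:providers.bzl", "RdfDatasetInfo")

_TOOLCHAIN_TYPE = "@rules_rdf//rdf:rdf_serializer_toolchain_type"

def _path(f):
    return f.path

def _to_ntriples_impl(ctx):
    # ASSUMPTION: the resolved toolchain exposes RdfSerializerToolchainInfo
    # under a field named `serializer`; check rdf/toolchains.bzl for the real name.
    plugin = ctx.toolchains[_TOOLCHAIN_TYPE].serializer
    dataset = ctx.attr.dataset[RdfDatasetInfo]
    out = ctx.actions.declare_file(ctx.label.name + ".nt")

    # Concatenate the closure in lexicographic path order and pipe it to stdin.
    srcs = sorted(dataset.transitive_files.to_list(), key = _path)
    ctx.actions.run_shell(
        inputs = srcs,
        outputs = [out],
        # Passing files_to_run lets Bazel stage the plugin's runfiles tree.
        tools = [plugin.files_to_run],
        # Paths are left unquoted for brevity; real code should shell-quote them.
        command = "cat {srcs} | {bin} --rule-name={name} --in-format={fmt} --out-format=ntriples > {out}".format(
            srcs = " ".join([f.path for f in srcs]),
            bin = plugin.binary.path,
            name = ctx.label.name,
            fmt = dataset.in_format,
            out = out.path,
        ),
    )
    return [DefaultInfo(files = depset([out]))]

to_ntriples = rule(
    implementation = _to_ntriples_impl,
    attrs = {"dataset": attr.label(mandatory = True, providers = [RdfDatasetInfo])},
    toolchains = [_TOOLCHAIN_TYPE],
)
```

In practice rdf_transform already covers this use case; the sketch is only meant to show where `binary` and `files_to_run` fit in an action.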

RdfDatasetInfo

load("@rules_rdf//rdf:providers.bzl", "RdfDatasetInfo")

RdfDatasetInfo(files, transitive_files, in_format)

A declared RDF dataset.

FIELDS

Name	Description
files	depset[File]: this dataset’s own source files (excludes `deps`).
transitive_files	depset[File]: the full graph closure, i.e. this dataset’s files plus the transitive closure of every `deps` dataset. Consumers needing all linked triples (sparql_query, rdf_reason, rdf_validate_test) operate over this; the subclass/import closure of a grounding ontology (schema.org + SKOS + DC + modules) is assembled here.
in_format	str: serialization of the dataset files. One of turtle, ntriples, nquads, trig, jsonld, rdfxml. The whole closure must share this format (normalize a differing dep with rdf_transform first).

RdfReasonerToolchainInfo

load("@rules_rdf//rdf:providers.bzl", "RdfReasonerToolchainInfo")

RdfReasonerToolchainInfo(binary, runfiles, files_to_run)

An RDF inference engine. Resolved by rdf_reason.

FIELDS

Name	Description
binary	File: an executable that runs RDFS / OWL / custom-rule inference and emits derived triples.
runfiles	runfiles: the plugin binary’s runfiles bundle.
files_to_run	FilesToRunProvider: pass in an action’s `tools=` to materialize the plugin’s runfiles tree.

RdfSerializerToolchainInfo

load("@rules_rdf//rdf:providers.bzl", "RdfSerializerToolchainInfo")

RdfSerializerToolchainInfo(binary, runfiles, files_to_run)

An RDF format converter. Resolved by rdf_transform.

FIELDS

Name	Description
binary	File: an executable that converts between RDF serializations (Turtle / N-Triples / N-Quads / JSON-LD / RDF/XML / TriG).
runfiles	runfiles: the plugin binary’s runfiles bundle.
files_to_run	FilesToRunProvider: pass in an action’s `tools=` to materialize the plugin’s runfiles tree.

RdfValidatorToolchainInfo

load("@rules_rdf//rdf:providers.bzl", "RdfValidatorToolchainInfo")

RdfValidatorToolchainInfo(binary, runfiles, files_to_run)

An RDF validator (SHACL today; ShEx in scope for v0.2). Resolved by rdf_validate_test.

FIELDS

Name	Description
binary	File: an executable that validates an RDF dataset against a shapes graph per the contract.
runfiles	runfiles: the plugin binary’s runfiles bundle.
files_to_run	FilesToRunProvider: pass in an action’s `tools=` to materialize the plugin’s runfiles tree.

SparqlEngineToolchainInfo

load("@rules_rdf//rdf:providers.bzl", "SparqlEngineToolchainInfo")

SparqlEngineToolchainInfo(binary, runfiles, files_to_run)

A SPARQL query engine. Resolved by sparql_query_test and sparql_query.

FIELDS

Name	Description
binary	File: an executable that runs SPARQL queries per the rules_rdf plugin contract.
runfiles	runfiles: the plugin binary’s runfiles bundle.
files_to_run	FilesToRunProvider: pass in an action’s `tools=` so Bazel materializes the plugin’s runfiles tree (java_binary / py_binary plugins fail to locate runfiles otherwise).

User-facing inference rules.

rdf_reason runs the registered rdf_reasoner toolchain over an RDF dataset and emits the derived-triples graph (Turtle) as a build artifact. Unlike sparql_query_test / rdf_validate_test, this is a regular rule — its output is a file that downstream rules can declare as a src or data dependency.

load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//reason:defs.bzl", "rdf_reason")

rdf_dataset(name = "ontology", srcs = glob(["*.ttl"]))

rdf_reason(
    name = "inferred",
    base = ":ontology",
    profile = "rdfs",
)

For custom rule sets (Jena RETE rules):

rdf_reason(
    name = "inferred",
    base = ":ontology",
    profile = "custom",
    rules = "rules/transitive.rule",
)

The reasoner toolchain implementation decides which profiles are supported; the abstract layer only validates that profile = "custom" is paired with rules and vice versa.

rdf_reason

load("@rules_rdf//reason:defs.bzl", "rdf_reason")

rdf_reason(name, base, include_base, profile, rules)

Run inference over an RDF dataset; emit the derived-triples graph (Turtle).

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
base	RDF dataset to run inference over.	Label	required
include_base	If True, emit base + derived triples; otherwise emit only the derived triples (the default).	Boolean	optional	`False`
profile	Reasoning profile. `custom` requires `rules`.	String	optional	`"rdfs"`
rules	Custom rule file (Jena RETE syntax). Required iff profile = ‘custom’.	Label	optional	`None`

User-facing SPARQL rules.

sparql_query_test is the zero-row gate idiom: declare an invariant as a SPARQL query whose result set is empty when the graph satisfies the invariant. CI runs it as a Bazel test; any returned row fails the test.

It’s the rules_rdf analog of the production GateZeroRows.java pattern in the Aion RFC repo’s kg/java/. v0.1 wires the rule through sparql_engine_toolchain_type; the actual SPARQL execution comes from whichever concrete toolchain the consumer registered (rules_jena, a future rules_rdflib, etc.).

load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//sparql:defs.bzl", "sparql_query_test")

rdf_dataset(name = "corpus", srcs = glob(["*.ttl"]))

sparql_query_test(
    name = "no_dangling_refs",
    dataset = ":corpus",
    query = "queries/dangling.rq",
)

sparql_query

load("@rules_rdf//sparql:defs.bzl", "sparql_query")

sparql_query(name, dataset, out_format, query)

Run a SPARQL query and emit the results as a build artifact (the producer counterpart to sparql_query_test’s gate). Turns a reasoned graph into queryable, downstream-consumable data — e.g. grounding tuples for training-data generation.

ATTRIBUTES

Name	Description	Type	Mandatory
name	A unique name for this target.	Name	required
dataset	The `rdf_dataset` (closure) to query.	Label	required
out_format	Result serialization. Tabular (tsv/csv/json/xml) for SELECT/ASK; RDF (turtle/ntriples/…) for CONSTRUCT/DESCRIBE (also yields an rdf_dataset).	String	required
query	The SPARQL query file (SELECT/ASK → tabular; CONSTRUCT/DESCRIBE → graph).	Label	required
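
A possible use of sparql_query with a SELECT query, using only the attributes listed above. The query path and target names are hypothetical.

```starlark
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//sparql:defs.bzl", "sparql_query")

rdf_dataset(name = "corpus", srcs = glob(["*.ttl"]))

# SELECT query -> tabular output, available to downstream rules as a file.
sparql_query(
    name = "labels_tsv",
    dataset = ":corpus",
    out_format = "tsv",
    query = "queries/labels.rq",
)
```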

sparql_query_smoke_test

load("@rules_rdf//sparql:defs.bzl", "sparql_query_smoke_test")

sparql_query_smoke_test(name, dataset, queries)

Assert that a set of SPARQL queries all parse + execute against a dataset. The query-smoke gate idiom — catches syntax errors and reference rot after schema changes.

ATTRIBUTES

Name	Description	Type	Mandatory
name	A unique name for this target.	Name	required
dataset	An `rdf_dataset` the queries run against.	Label	required
queries	SPARQL query files. The test passes iff every one parses and executes without error (no row-count assertion — that’s `sparql_query_test`).	List of labels	required
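
For example, every query under a hypothetical queries/ directory could be smoke-tested against one dataset like this:

```starlark
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//sparql:defs.bzl", "sparql_query_smoke_test")

rdf_dataset(name = "corpus", srcs = glob(["*.ttl"]))

# Passes as long as each query parses and executes; row counts are ignored.
sparql_query_smoke_test(
    name = "queries_still_run",
    dataset = ":corpus",
    queries = glob(["queries/*.rq"]),
)
```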

sparql_query_test

load("@rules_rdf//sparql:defs.bzl", "sparql_query_test")

sparql_query_test(name, dataset, query)

Run a SPARQL query against an RDF dataset; fail if the result set is non-empty. The zero-row gate idiom.

ATTRIBUTES

Name	Description	Type	Mandatory
name	A unique name for this target.	Name	required
dataset	An `rdf_dataset` whose triples the query runs against.	Label	required
query	The SPARQL query file. Result set must be empty for the test to pass (per `--fail-on-nonempty`).	Label	required

Toolchain registration rules for rules_rdf.

One rule per toolchain type. Each takes the plugin binary as a mandatory exec-config label and exposes the matching *ToolchainInfo provider with both the binary File and its runfiles bundle.

Concrete plugins (rules_jena, rules_rdflib, …) register via:

load("@rules_rdf//rdf:toolchains.bzl", "sparql_engine_toolchain")

sparql_engine_toolchain(
    name = "jena_arq_sparql_toolchain",
    binary = ":jena_sparql",
)

toolchain(
    name = "jena_arq_sparql",
    toolchain = ":jena_arq_sparql_toolchain",
    toolchain_type = "@rules_rdf//rdf:sparql_engine_toolchain_type",
)
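
Declaring the `toolchain` target does not by itself make Bazel select it; it also has to be registered. With bzlmod that is typically one line in MODULE.bazel, sketched here under the assumption that the targets above live in a package named //jena (a hypothetical path):

```starlark
# MODULE.bazel of the plugin repo (or of a consumer that wants this engine).
# register_toolchains is a standard Bazel MODULE.bazel function.
register_toolchains("//jena:jena_arq_sparql")
```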

rdf_reasoner_toolchain

load("@rules_rdf//rdf:toolchains.bzl", "rdf_reasoner_toolchain")

rdf_reasoner_toolchain(name, binary)

Declare an RDF reasoner (inference) toolchain.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
binary	The plugin executable. Must conform to the contract in rdf/plugin_contract.md.	Label	required

rdf_serializer_toolchain

load("@rules_rdf//rdf:toolchains.bzl", "rdf_serializer_toolchain")

rdf_serializer_toolchain(name, binary)

Declare an RDF serializer (format-converter) toolchain.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
binary	The plugin executable. Must conform to the contract in rdf/plugin_contract.md.	Label	required

rdf_validator_toolchain

load("@rules_rdf//rdf:toolchains.bzl", "rdf_validator_toolchain")

rdf_validator_toolchain(name, binary)

Declare an RDF validator toolchain.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
binary	The plugin executable. Must conform to the contract in rdf/plugin_contract.md.	Label	required

sparql_engine_toolchain

load("@rules_rdf//rdf:toolchains.bzl", "sparql_engine_toolchain")

sparql_engine_toolchain(name, binary)

Declare a SPARQL engine toolchain.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
binary	The plugin executable. Must conform to the contract in rdf/plugin_contract.md.	Label	required

User-facing format-conversion rule.

rdf_transform re-serializes an RDF dataset into a different format via the registered rdf_serializer toolchain. The output is a regular build artifact.

load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//transform:defs.bzl", "rdf_transform")

rdf_dataset(name = "src_turtle", srcs = ["data.ttl"], in_format = "turtle")

rdf_transform(
    name = "data_ntriples",
    dataset = ":src_turtle",
    out_format = "ntriples",
)

Output filename = <name>.<ext> where <ext> is the canonical extension for out_format (.ttl, .nt, .nq, .trig, .jsonld, .rdf).

rdf_transform

load("@rules_rdf//transform:defs.bzl", "rdf_transform")

rdf_transform(name, dataset, out_format)

Convert an RDF dataset between serializations.

ATTRIBUTES

Name	Description	Type	Mandatory
name	A unique name for this target.	Name	required
dataset	RDF dataset to convert.	Label	required
out_format	Target serialization.	String	required
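
One way to satisfy the shared-format requirement on `deps` is to convert the odd-format dataset first and wrap the result in a new rdf_dataset. This sketch assumes the converted file is rdf_transform's default output, so that it can be listed in `srcs`; file and target names are hypothetical.

```starlark
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//transform:defs.bzl", "rdf_transform")

# A vocabulary that only ships as JSON-LD.
rdf_dataset(name = "vocab_jsonld", srcs = ["vocab.jsonld"], in_format = "jsonld")

# Normalise it to Turtle (expected output file: vocab_ttl.ttl).
rdf_transform(name = "vocab_ttl", dataset = ":vocab_jsonld", out_format = "turtle")

# Re-wrap the converted file so it can be used as a dep.
rdf_dataset(name = "vocab", srcs = [":vocab_ttl"], in_format = "turtle")

rdf_dataset(
    name = "corpus",
    srcs = glob(["data/*.ttl"]),
    in_format = "turtle",
    deps = [":vocab"],
)
```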

User-facing RDF validation rules.

rdf_validate_test runs a SHACL shapes graph against an RDF dataset and fails when the validation report contains results at or above the configured severity (violations by default). It resolves through rdf_validator_toolchain_type so the actual SHACL engine is pluggable (rules_jena’s org.apache.jena.shacl.ShaclValidator, a future rules_pyshacl, …).

load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//validate:defs.bzl", "rdf_validate_test")

rdf_dataset(name = "ontology", srcs = glob(["ontology/*.ttl"]))

rdf_validate_test(
    name = "ontology_conforms",
    dataset = ":ontology",
    shapes = "shapes.ttl",
)

ShEx support is in scope for v0.2 (the toolchain contract leaves room for it via the --shapes-language arg, but for v0.1 the shapes file is assumed Turtle-encoded SHACL).

rdf_validate_test

load("@rules_rdf//validate:defs.bzl", "rdf_validate_test")

rdf_validate_test(name, dataset, severity, shapes)

Validate an RDF dataset against a SHACL shapes graph.

ATTRIBUTES

Name	Description	Type	Mandatory	Default
name	A unique name for this target.	Name	required
dataset	An `rdf_dataset` to validate.	Label	required
severity	Minimum severity that fails the build.	String	optional	`"violation"`
shapes	SHACL shapes graph (Turtle).	Label	required
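
A stricter gate could lower the failing threshold. Only the default value `"violation"` is documented above; the value `"warning"` below is an assumption based on SHACL's three severity levels (Info, Warning, Violation), so check the rule's docstring before relying on it.

```starlark
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//validate:defs.bzl", "rdf_validate_test")

rdf_dataset(name = "ontology", srcs = glob(["ontology/*.ttl"]))

rdf_validate_test(
    name = "ontology_strict",
    dataset = ":ontology",
    # ASSUMPTION: "warning" is an accepted value; only "violation" is documented.
    severity = "warning",
    shapes = "shapes.ttl",
)
```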


fastverk