rules_rdf
API reference, generated from the module’s .bzl docstrings (stardoc).
rules_rdf roadmap
Two waypoints between today’s scaffold and a usable abstract RDF toolchain layer. Each waypoint is one published bazel-registry release.
v0.1 — toolchain types + plugin contract + placeholder rules
The goal is for a consumer to be able to declare every planned target
type (rdf_dataset, sparql_query_test, rdf_validate_test,
rdf_transform, rdf_reason) today, against a no-op default
toolchain, then swap in a real implementation (e.g.
rules_jena) without
touching their BUILD files. This makes rules_rdf adoptable
incrementally — consumers can wire their build graph before any
engine is integrated.
Deliverables:
- Plugin contract document at rdf/plugin_contract.md (draft already in tree). Same shape as rules_jsonschema’s plugin_contract.md, adjusted for RDF semantics:
  - stdin = the RDF document bytes (the dataset; format declared via --in-format), not a JSON schema.
  - argv = --key=value pairs (same as jsonschema). Standard flags: --rule-name, --in-format. Per-toolchain flags: --query, --shapes, --out-format, --profile.
  - stdout = generated output (query results / validation report / converted graph / inferred triples). Same single-file-per-invocation discipline.
  - stderr = diagnostics.
  - exit = 0 on success, non-zero on failure.
- All four toolchain types defined in //rdf:BUILD.bazel: sparql_engine_toolchain_type, rdf_validator_toolchain_type, rdf_serializer_toolchain_type, rdf_reasoner_toolchain_type.
- Providers: RdfDatasetInfo, SparqlEngineToolchainInfo, RdfValidatorToolchainInfo, RdfSerializerToolchainInfo, RdfReasonerToolchainInfo. Each toolchain info wraps a single binary File, matching the jsonschema pattern.
- Default user-facing rules implemented as _no_op placeholders:
  - rdf_dataset: real (returns RdfDatasetInfo; no toolchain needed).
  - sparql_query_test, sparql_query_run, rdf_validate_test, rdf_transform, rdf_reason: declare their toolchain dependency and accept all their final attrs, but the in-repo default toolchain points at a _no_op binary that writes an empty stdout and exits 0. Consumers can declare targets and they build; swapping in rules_jena makes them actually run.
- Conformance test driver rdf_plugin_contract_test covering the same scenarios as the jsonschema driver: valid_minimal (small dataset round-trips), malformed_input (garbage on stdin → exit non-zero, empty stdout), unknown_flag (rejects unknown argv), determinism (byte-identical stdout on identical invocations). One driver, parameterised by toolchain type.
- stardoc for the public surface, with diff_test freshness.
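To make the contract concrete, here is a minimal sketch of a no-op plugin in Python (the kind of thing a py_binary could wrap). It is an illustration under stated assumptions, not the in-repo _no_op binary: the accepted flag set is simply the flags listed in the roadmap (the real plugin_contract.md may accept more), and exit code 2 for bad argv is an arbitrary choice. It parses --key=value argv, rejects unknown or malformed flags with a non-zero exit and empty stdout, and otherwise ignores stdin, writes nothing, and exits 0.

```python
import sys

# Assumption: the flags named in the v0.1 roadmap; plugin_contract.md is authoritative.
KNOWN_FLAGS = {"rule-name", "in-format", "query", "shapes", "out-format", "profile"}


def parse_argv(argv):
    """Parse --key=value pairs into a dict; raise ValueError on anything else."""
    opts = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"malformed argument: {arg!r}")
        key, value = arg[2:].split("=", 1)
        if key not in KNOWN_FLAGS:
            raise ValueError(f"unknown flag: --{key}")
        opts[key] = value
    return opts


def run(argv, stdin_bytes):
    """One invocation: return (exit_code, stdout_bytes, stderr_text)."""
    try:
        parse_argv(argv)
    except ValueError as err:
        # Failure path: diagnostics on stderr, nothing on stdout.
        return 2, b"", str(err)
    # No-op: the dataset on stdin is ignored, stdout stays empty.
    return 0, b"", ""


def main():
    code, out, err = run(sys.argv[1:], sys.stdin.buffer.read())
    sys.stdout.buffer.write(out)
    if err:
        print(err, file=sys.stderr)
    return code

# Entry point for a py_binary: call sys.exit(main()) at module level.
```

Keeping the logic in a pure `run()` function makes the argv and exit-code behaviour testable without spawning a process.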
Out of scope for v0.1: chained pipelines, real-engine examples, result-set diff helpers.
v0.2 — cross-toolchain wiring + real-engine examples
Once rules_jena is published and registered, rules_rdf grows the
glue that ties multiple toolchains together in one pipeline.
Deliverables:
- Chained pipelines: rdf_validate_test and sparql_query_test accept the output of rdf_reason as their dataset, so a consumer can express “materialise inferences, then run shape validation on the closure” as a typed build graph. The intermediate inferred graph is a real RdfDatasetInfo-bearing target, not a hidden side effect.
- Result-set helpers: a small Starlark helper for the common zero-row-CSV gate pattern, plus an rdf_results_diff_test for golden SPARQL result sets (SRX/JSON normalisation).
- Examples directory using a real RDF corpus:
  - W3C example datasets fetched via http_file with a pinned sha256 (the same fetch-and-pin discipline rules_docker_compose uses for the compose-spec schema).
  - One end-to-end smoke target per toolchain type, registered against rules_jena.
- CI matrix running the conformance test driver against every registered concrete implementation we know about, gating rules_rdf releases on at least one concrete backend passing.
After v0.2 the abstract layer is feature-complete; further work moves into the concrete-implementation repos.
rdf_plugin_contract_test(name, plugin, toolchain_type) runs
the rules_rdf conformance test driver against any executable
claiming to implement the plugin contract for the named toolchain
type. See plugin_contract.md for what the
driver asserts.
Plugin authors gate toolchain registration on it:
load("@rules_rdf//rdf:contract_test.bzl", "rdf_plugin_contract_test")
rdf_plugin_contract_test(
name = "jena_sparql_conforms",
plugin = "//jena:jena_sparql",
toolchain_type = "sparql_engine",
)
The four toolchain types each have their own minimum-valid input
inside the driver; pass the bare name (without the
_toolchain_type suffix or @rules_rdf//rdf: prefix).
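To show roughly what the scenarios assert, here is a minimal Python sketch of three of the checks (unknown_flag, malformed_input, determinism), treating the plugin as an argv command. It is not the rules_rdf driver. The probe flag, the garbage bytes, the run count, and all helper names are hypothetical. valid_minimal is left out because its per-toolchain inputs live inside the real driver.

```python
import subprocess


def invoke(cmd, args, stdin_bytes):
    """Run the plugin once with argv and stdin; return (code, stdout, stderr)."""
    proc = subprocess.run(list(cmd) + list(args), input=stdin_bytes, capture_output=True)
    return proc.returncode, proc.stdout, proc.stderr


def check_unknown_flag(cmd):
    # Hypothetical probe flag: anything outside the contract must be rejected.
    code, out, _ = invoke(cmd, ["--not-a-contract-flag=1"], b"")
    if code == 0:
        raise AssertionError("unknown_flag: plugin exited 0")
    if out:
        raise AssertionError("unknown_flag: plugin wrote to stdout")


def check_malformed_input(cmd, args):
    # Garbage on stdin: expect a non-zero exit and empty stdout.
    code, out, _ = invoke(cmd, args, b"\x00\xff this is not RDF {{{")
    if code == 0 or out:
        raise AssertionError("malformed_input: expected non-zero exit and empty stdout")


def check_determinism(cmd, args, stdin_bytes, runs=3):
    # Identical invocations must yield byte-identical stdout.
    outputs = {invoke(cmd, args, stdin_bytes)[1] for _ in range(runs)}
    if len(outputs) != 1:
        raise AssertionError("determinism: stdout differed between identical runs")
```

Parameterising by toolchain type would then amount to choosing the argv and the minimum-valid stdin per type before calling these checks.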
rdf_plugin_contract_test
load("@rules_rdf//rdf:contract_test.bzl", "rdf_plugin_contract_test")
rdf_plugin_contract_test(name, plugin, toolchain_type)
Run the rules_rdf conformance test driver against a plugin binary. See plugin_contract.md.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| plugin | The plugin binary to test. Any executable that claims to implement the rules_rdf plugin contract. | Label | required | |
| toolchain_type | Which toolchain type’s scenarios to run: one of sparql_engine, rdf_validator, rdf_serializer, rdf_reasoner. | String | required | |
rdf_dataset(name, srcs, in_format, deps) declares a labeled
collection of RDF files.
This is the single source of “what triples are in this graph?” that every other rule consumes. Carrying both the file depset and the format string up-front lets sparql_query_test / rdf_validate_test / … avoid sniffing extensions at action time and lets consumers mix datasets with declared formats in one BUILD target without ambiguity.
Multi-file datasets are concatenated by the consuming rule in
lexicographic order before being piped to the plugin’s stdin
(see rdf/plugin_contract.md). Consumers that care about ordering
should name files to sort accordingly.
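A minimal Python sketch of the ordering rule. It makes two assumptions that plugin_contract.md should confirm: “lexicographic” means sorting the full path strings, and files are joined byte-for-byte with no separator. `concat_dataset` is a hypothetical helper name.

```python
from pathlib import Path


def concat_dataset(paths):
    """Concatenate dataset files in lexicographic path order, as raw bytes."""
    ordered = sorted(str(p) for p in paths)
    # Assumption: no separator is inserted between files.
    return b"".join(Path(p).read_bytes() for p in ordered)
```

Because the sort compares strings, not numbers, `10_extra.ttl` sorts before `2_core.ttl`. Zero-padded prefixes (`02_core.ttl`, `10_extra.ttl`) keep the intended order. Under this sketch a file without a trailing newline runs straight into the next one, so ending every file with a newline is a safe habit.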
rdf_dataset
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
rdf_dataset(name, deps, srcs, in_format)
A labeled collection of RDF source files + linked-graph deps.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| deps | Other rdf_datasets this graph links to (imported ontologies, vocabulary modules). Their files are folded into this dataset’s transitive_files closure, so reasoning/query over the linked vocabularies resolves. Deps should share this dataset’s in_format; normalize mismatched deps first (e.g. with rdf_transform). | List of labels | optional | [] |
| srcs | RDF source files. Concatenated in lexicographic order by consuming rules before being piped to the plugin’s stdin. | List of labels | required | |
| in_format | Serialization of every file in srcs. Mixed-format datasets aren’t supported in v0.1 — use rdf_transform first. | String | optional | "turtle" |
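Below is a small Python model of two things: how deps fold into transitive_files, and where the “deps should share in_format” expectation applies. It is a sketch and not the rule’s Starlark. `Dataset`, `transitive_files`, and `mismatched_deps` are hypothetical names. A plain list stands in for a Bazel depset, so the ordering here is illustrative only. The model assumes an acyclic dep graph, which Bazel requires anyway.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    srcs: tuple
    in_format: str = "turtle"
    deps: tuple = ()


def transitive_files(ds):
    """Files of ds plus every linked dep's files, each listed once (deps first)."""
    seen, out = set(), []

    def visit(d):
        for dep in d.deps:
            visit(dep)
        for f in d.srcs:
            if f not in seen:
                seen.add(f)
                out.append(f)

    visit(ds)
    return out


def mismatched_deps(ds):
    """Names of transitive deps whose in_format differs from ds.in_format."""
    bad, stack = [], list(ds.deps)
    while stack:
        d = stack.pop()
        if d.in_format != ds.in_format and d.name not in bad:
            bad.append(d.name)
        stack.extend(d.deps)
    return sorted(bad)
```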
Providers for the four rules_rdf toolchain types.
Each provider wraps both the executable and the runfiles needed
to invoke it. Carrying runfiles in the provider matters for
plugin implementations that aren’t a single self-contained binary
— py_binary, java_binary, sh_binary all stage helper files via
runfiles. Consuming rules merge the provider’s runfiles into
their own to make the plugin actually executable inside a Bazel
sandbox.
RdfDatasetInfo
load("@rules_rdf//rdf:providers.bzl", "RdfDatasetInfo")
RdfDatasetInfo(files, transitive_files, in_format)
A declared RDF dataset.
FIELDS
RdfReasonerToolchainInfo
load("@rules_rdf//rdf:providers.bzl", "RdfReasonerToolchainInfo")
RdfReasonerToolchainInfo(binary, runfiles, files_to_run)
An RDF inference engine. Resolved by rdf_reason.
FIELDS
RdfSerializerToolchainInfo
load("@rules_rdf//rdf:providers.bzl", "RdfSerializerToolchainInfo")
RdfSerializerToolchainInfo(binary, runfiles, files_to_run)
An RDF format converter. Resolved by rdf_transform.
FIELDS
RdfValidatorToolchainInfo
load("@rules_rdf//rdf:providers.bzl", "RdfValidatorToolchainInfo")
RdfValidatorToolchainInfo(binary, runfiles, files_to_run)
An RDF validator (SHACL today; ShEx in scope for v0.2). Resolved by rdf_validate_test.
FIELDS
SparqlEngineToolchainInfo
load("@rules_rdf//rdf:providers.bzl", "SparqlEngineToolchainInfo")
SparqlEngineToolchainInfo(binary, runfiles, files_to_run)
A SPARQL query engine. Resolved by sparql_query_test and sparql_query_run.
FIELDS
User-facing inference rules.
rdf_reason runs the registered rdf_reasoner toolchain over an
RDF dataset and emits the derived-triples graph (Turtle) as a
build artifact. Unlike sparql_query_test / rdf_validate_test,
this is a regular rule — its output is a file that downstream
rules can declare as a src or data dependency.
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//reason:defs.bzl", "rdf_reason")
rdf_dataset(name = "ontology", srcs = glob(["*.ttl"]))
rdf_reason(
name = "inferred",
base = ":ontology",
profile = "rdfs",
)
For custom rule sets (Jena RETE rules):
rdf_reason(
name = "inferred",
base = ":ontology",
profile = "custom",
rules = "rules/transitive.rule",
)
The reasoner toolchain implementation decides which profiles are
supported; the abstract layer only validates that profile = "custom" is paired with rules and vice versa.
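The pairing check is small enough to show directly. Here is a minimal Python sketch. It assumes the abstract layer checks only the profile/rules pairing and passes every profile name through to the toolchain. `check_reason_attrs` is a hypothetical name. In Starlark the error would be raised with fail() rather than returned.

```python
def check_reason_attrs(profile, rules):
    """Return an error message for an invalid profile/rules pairing, else None."""
    if profile == "custom" and rules is None:
        return 'profile = "custom" requires rules'
    if profile != "custom" and rules is not None:
        return f'rules is only allowed with profile = "custom" (got "{profile}")'
    # Any other profile name passes; the toolchain decides whether it is supported.
    return None
```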
rdf_reason
load("@rules_rdf//reason:defs.bzl", "rdf_reason")
rdf_reason(name, base, include_base, profile, rules)
Run inference over an RDF dataset; emit the derived-triples graph (Turtle).
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| base | RDF dataset to run inference over. | Label | required | |
| include_base | If True, emit base + derived triples; otherwise only the derived (default). | Boolean | optional | False |
| profile | Reasoning profile. custom requires rules. | String | optional | "rdfs" |
| rules | Custom rule file (Jena RETE syntax). Required iff profile = ‘custom’. | Label | optional | None |
User-facing SPARQL rules.
sparql_query_test is the zero-row gate idiom: declare an
invariant as a SPARQL query whose result set is empty when the
graph satisfies the invariant. CI runs it as a Bazel test; any
non-empty row triggers a failure.
It’s the rules_rdf analog of the production GateZeroRows.java
pattern in the Aion RFC repo’s kg/java/. v0.1 wires the rule
through sparql_engine_toolchain_type; the actual SPARQL
execution comes from whichever concrete toolchain the consumer
registered (rules_jena, a future rules_rdflib, etc.).
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//sparql:defs.bzl", "sparql_query_test")
rdf_dataset(name = "corpus", srcs = glob(["*.ttl"]))
sparql_query_test(
name = "no_dangling_refs",
dataset = ":corpus",
query = "queries/dangling.rq",
)
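In the end, the gate checks one thing: the result set has a header and no rows. Here is a minimal Python sketch. It assumes the engine emits SELECT results in the SPARQL 1.1 TSV results format, which is a header line of ?variables followed by one line per solution. Whether the check lives in the rule or in the plugin (via --fail-on-nonempty) is up to the implementation. `count_rows` and `gate` are hypothetical names.

```python
import sys


def count_rows(tsv_text):
    """Solution rows in a SPARQL 1.1 TSV result: every line after the header.

    Lines are not filtered: in a single-variable result, a solution with the
    variable unbound is an empty line, and it still counts as a row.
    """
    lines = tsv_text.splitlines()
    return max(len(lines) - 1, 0)


def gate(tsv_text, out=None):
    """Zero-row gate: return 0 if the result set is empty, 1 otherwise."""
    out = out if out is not None else sys.stderr
    rows = count_rows(tsv_text)
    if rows:
        print(f"invariant violated: {rows} offending row(s)", file=out)
        # Echo the offending rows so the test log shows what broke the invariant.
        for line in tsv_text.splitlines()[1:]:
            print(line, file=out)
        return 1
    return 0
```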
sparql_query
load("@rules_rdf//sparql:defs.bzl", "sparql_query")
sparql_query(name, dataset, out_format, query)
Run a SPARQL query and emit the results as a build artifact (the producer counterpart to sparql_query_test’s gate). Turns a reasoned graph into queryable, downstream-consumable data — e.g. grounding tuples for training-data generation.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| dataset | The rdf_dataset (closure) to query. | Label | required | |
| out_format | Result serialization. Tabular (tsv/csv/json/xml) for SELECT/ASK; RDF (turtle/ntriples/…) for CONSTRUCT/DESCRIBE (also yields an rdf_dataset). | String | required | |
| query | The SPARQL query file (SELECT/ASK → tabular; CONSTRUCT/DESCRIBE → graph). | Label | required | |
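The out_format / query-form pairing can be checked before any engine runs. Here is a minimal Python sketch. The tabular names come from the out_format row in the table. The RDF set lists only turtle and ntriples, the two RDF names this page confirms, so extend it with whatever other serializations your toolchain accepts. `check_out_format` and the query-form strings it takes are hypothetical.

```python
TABULAR = {"tsv", "csv", "json", "xml"}
RDF = {"turtle", "ntriples"}  # extend with the toolchain's other RDF serializations


def check_out_format(query_form, out_format):
    """Raise ValueError if out_format does not suit the SPARQL query form."""
    form = query_form.upper()
    if form in ("SELECT", "ASK"):
        allowed = TABULAR
    elif form in ("CONSTRUCT", "DESCRIBE"):
        allowed = RDF
    else:
        raise ValueError(f"unknown query form: {query_form}")
    if out_format not in allowed:
        raise ValueError(
            f"{form} results cannot be serialized as {out_format!r}; "
            f"expected one of {sorted(allowed)}"
        )
```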
sparql_query_smoke_test
load("@rules_rdf//sparql:defs.bzl", "sparql_query_smoke_test")
sparql_query_smoke_test(name, dataset, queries)
Assert that a set of SPARQL queries all parse + execute against a dataset. The query-smoke gate idiom — catches syntax errors and reference rot after schema changes.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| dataset | An rdf_dataset the queries run against. | Label | required | |
| queries | SPARQL query files. The test passes iff every one parses and executes without error (no row-count assertion — that’s sparql_query_test). | List of labels | required | |
sparql_query_test
load("@rules_rdf//sparql:defs.bzl", "sparql_query_test")
sparql_query_test(name, dataset, query)
Run a SPARQL query against an RDF dataset; fail if the result set is non-empty. The zero-row gate idiom.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| dataset | An rdf_dataset whose triples the query runs against. | Label | required | |
| query | The SPARQL query file. Result set must be empty for the test to pass (per --fail-on-nonempty). | Label | required | |
Toolchain registration rules for rules_rdf.
One rule per toolchain type. Each takes the plugin binary as a
mandatory exec-config label and exposes the matching *ToolchainInfo
provider with both the binary File and its runfiles bundle.
Concrete plugins (rules_jena, rules_rdflib, …) register via:
load("@rules_rdf//rdf:toolchains.bzl", "sparql_engine_toolchain")
sparql_engine_toolchain(
name = "jena_arq_sparql_toolchain",
binary = ":jena_sparql",
)
toolchain(
name = "jena_arq_sparql",
toolchain = ":jena_arq_sparql_toolchain",
toolchain_type = "@rules_rdf//rdf:sparql_engine_toolchain_type",
)
rdf_reasoner_toolchain
load("@rules_rdf//rdf:toolchains.bzl", "rdf_reasoner_toolchain")
rdf_reasoner_toolchain(name, binary)
Declare an RDF reasoner (inference) toolchain.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| binary | The plugin executable. Must conform to the contract in rdf/plugin_contract.md. | Label | required | |
rdf_serializer_toolchain
load("@rules_rdf//rdf:toolchains.bzl", "rdf_serializer_toolchain")
rdf_serializer_toolchain(name, binary)
Declare an RDF serializer (format-converter) toolchain.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| binary | The plugin executable. Must conform to the contract in rdf/plugin_contract.md. | Label | required | |
rdf_validator_toolchain
load("@rules_rdf//rdf:toolchains.bzl", "rdf_validator_toolchain")
rdf_validator_toolchain(name, binary)
Declare an RDF validator toolchain.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| binary | The plugin executable. Must conform to the contract in rdf/plugin_contract.md. | Label | required | |
sparql_engine_toolchain
load("@rules_rdf//rdf:toolchains.bzl", "sparql_engine_toolchain")
sparql_engine_toolchain(name, binary)
Declare a SPARQL engine toolchain.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| binary | The plugin executable. Must conform to the contract in rdf/plugin_contract.md. | Label | required | |
User-facing format-conversion rule.
rdf_transform re-serializes an RDF dataset into a different
format via the registered rdf_serializer toolchain. The output
is a regular build artifact.
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//transform:defs.bzl", "rdf_transform")
rdf_dataset(name = "src_turtle", srcs = ["data.ttl"], in_format = "turtle")
rdf_transform(
name = "data_ntriples",
dataset = ":src_turtle",
out_format = "ntriples",
)
Output filename = <name>.<ext> where <ext> is the canonical
extension for out_format (.ttl, .nt, .nq, .trig,
.jsonld, .rdf).
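Here is a sketch of the naming rule as a lookup table. Only "turtle" and "ntriples" are out_format values this page confirms. The other keys (nquads, trig, jsonld, rdfxml) are guesses paired with the extensions listed in the rdf_transform description, so check the rule's source before relying on them. `output_filename` is a hypothetical helper.

```python
# Assumed out_format -> canonical extension table. Only "turtle" and
# "ntriples" are confirmed by the docs; the remaining keys are guesses.
CANONICAL_EXT = {
    "turtle": "ttl",
    "ntriples": "nt",
    "nquads": "nq",
    "trig": "trig",
    "jsonld": "jsonld",
    "rdfxml": "rdf",
}


def output_filename(name, out_format):
    """Return '<name>.<ext>' for an rdf_transform target."""
    try:
        ext = CANONICAL_EXT[out_format]
    except KeyError:
        raise ValueError(f"unsupported out_format: {out_format!r}") from None
    return f"{name}.{ext}"
```

For the example above, the data_ntriples target would produce data_ntriples.nt.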
rdf_transform
load("@rules_rdf//transform:defs.bzl", "rdf_transform")
rdf_transform(name, dataset, out_format)
Convert an RDF dataset between serializations.
ATTRIBUTES
| Name | Description | Type | Mandatory | Default |
|---|---|---|---|---|
| name | A unique name for this target. | Name | required | |
| dataset | RDF dataset to convert. | Label | required | |
| out_format | Target serialization, e.g. "ntriples". | String | required | |
User-facing RDF validation rules.
rdf_validate_test runs a SHACL shapes graph against an RDF
dataset and fails the build if any violations are reported.
Resolves through rdf_validator_toolchain_type so the actual
SHACL engine is pluggable (rules_jena’s
org.apache.jena.shacl.ShaclValidator, a future
rules_pyshacl, …).
load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//validate:defs.bzl", "rdf_validate_test")
rdf_dataset(name = "ontology", srcs = glob(["ontology/*.ttl"]))
rdf_validate_test(
name = "ontology_conforms",
dataset = ":ontology",
shapes = "shapes.ttl",
)
ShEx support is in scope for v0.2 (the toolchain contract leaves
room for it via the --shapes-language arg, but for v0.1 the
shapes file is assumed Turtle-encoded SHACL).
rdf_validate_test
load("@rules_rdf//validate:defs.bzl", "rdf_validate_test")
rdf_validate_test(name, dataset, severity, shapes)
Validate an RDF dataset against a SHACL shapes graph.
ATTRIBUTES