rules_rdf

API reference, generated from the module’s .bzl docstrings (stardoc).


rules_rdf roadmap

Two waypoints between today’s scaffold and a usable abstract RDF toolchain layer. Each waypoint is one published bazel-registry release.

v0.1 — toolchain types + plugin contract + placeholder rules

The goal is for a consumer to be able to declare every planned target type (rdf_dataset, sparql_query_test, rdf_validate_test, rdf_transform, rdf_reason) today, against a no-op default toolchain, then swap in a real implementation (e.g. rules_jena) without touching their BUILD files. This makes rules_rdf adoptable incrementally — consumers can wire their build graph before any engine is integrated.

Deliverables:

  • Plugin contract document at rdf/plugin_contract.md (draft already in tree). Same shape as rules_jsonschema’s plugin_contract.md, adjusted for RDF semantics:
    • stdin = the RDF document bytes (the dataset; format declared via --in-format), not a JSON schema.
    • argv = --key=value pairs (same as jsonschema). Standard flags: --rule-name, --in-format. Per-toolchain flags: --query, --shapes, --out-format, --profile.
    • stdout = generated output (query results / validation report / converted graph / inferred triples). Same single-file-per-invocation discipline.
    • stderr = diagnostics.
    • exit = 0 / non-zero.
  • All four toolchain types defined in //rdf:BUILD.bazel: sparql_engine_toolchain_type, rdf_validator_toolchain_type, rdf_serializer_toolchain_type, rdf_reasoner_toolchain_type.
  • Providers: RdfDatasetInfo, SparqlEngineToolchainInfo, RdfValidatorToolchainInfo, RdfSerializerToolchainInfo, RdfReasonerToolchainInfo. Each toolchain info wraps the plugin’s binary File, matching the jsonschema pattern (the published providers also carry its runfiles).
  • Default user-facing rules, with every toolchain-backed rule implemented as a _no_op placeholder:
    • rdf_dataset — real (returns RdfDatasetInfo; no toolchain needed).
    • sparql_query_test, sparql_query_run, rdf_validate_test, rdf_transform, rdf_reason — declare their toolchain dependency and accept all their final attrs, but the in-repo default toolchain points at a _no_op binary that writes an empty stdout and exits 0. Consumers can declare targets and they build; swapping in rules_jena makes them actually run.
  • Conformance test driver rdf_plugin_contract_test covering the same scenarios as the jsonschema driver — valid_minimal (small dataset round-trips), malformed_input (garbage on stdin → exit non-zero, empty stdout), unknown_flag (rejects unknown argv), determinism (byte-identical stdout on identical invocations). One driver, parameterised by toolchain type.
  • stardoc for the public surface, with a diff_test freshness check.
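A minimal sketch of a plugin entry point with the contract’s stdin / argv / stdout / stderr / exit shape, written in Python since plugins can ship as py_binary. The flag table, the function names (parse_argv, handle, main), and exit code 2 for a bad flag are assumptions for illustration, not part of rdf/plugin_contract.md. Unlike the in-repo _no_op placeholder, which simply exits 0, this skeleton also rejects unknown flags, as the unknown_flag conformance scenario expects.

```python
import sys

# Hypothetical flag table: the standard flags plus the per-toolchain flags
# named in the roadmap. A real plugin would accept only its own type's flags.
ALLOWED_FLAGS = {
    "--rule-name", "--in-format",                        # standard
    "--query", "--shapes", "--out-format", "--profile",  # per-toolchain
}


def parse_argv(argv):
    """Parse --key=value pairs into a dict; raise ValueError otherwise."""
    flags = {}
    for arg in argv:
        key, sep, value = arg.partition("=")
        if not sep or key not in ALLOWED_FLAGS:
            raise ValueError("unknown or malformed flag: " + arg)
        flags[key] = value
    return flags


def handle(argv, stdin_bytes):
    """No-op body. Returns (exit_code, stdout_bytes, stderr_text)."""
    try:
        parse_argv(argv)
    except ValueError as err:
        # Diagnostics go to stderr; stdout stays empty on failure.
        return 2, b"", str(err) + "\n"
    # stdin_bytes holds the dataset; a real plugin would parse it here
    # and write its single output document to stdout.
    return 0, b"", ""


def main():
    """Entry point a py_binary would call; not invoked in this sketch."""
    code, out, err = handle(sys.argv[1:], sys.stdin.buffer.read())
    sys.stdout.buffer.write(out)
    sys.stderr.write(err)
    return code
```

Keeping the logic in a pure handle() function separates contract behaviour from stream wiring, which makes the skeleton easy to exercise without spawning a process.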

Out of scope for v0.1: chained pipelines, real-engine examples, result-set diff helpers.

v0.2 — cross-toolchain wiring + real-engine examples

Once rules_jena is published and registered, rules_rdf grows the glue that ties multiple toolchains together in one pipeline.

Deliverables:

  • Chained pipelines: rdf_validate_test and sparql_query_test accept the output of rdf_reason as their dataset, so a consumer can express “materialise inferences, then run shape validation on the closure” as a typed build graph. The intermediate inferred graph is a real RdfDatasetInfo-bearing target, not a hidden side effect.
  • Result-set helpers: a small Starlark helper for the common zero-row-CSV gate pattern, plus an rdf_results_diff_test for golden SPARQL result sets (SRX/JSON normalisation).
  • Examples directory using a real RDF corpus:
    • W3C example datasets fetched via http_file with a pinned sha256 (the same fetch-and-pin discipline rules_docker_compose uses for the compose-spec schema).
    • One end-to-end smoke target per toolchain type, registered against rules_jena.
  • CI matrix running the conformance test driver against every registered concrete implementation we know about, gating rules_rdf releases on at least one concrete backend passing.
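The SRX/JSON normalisation planned for rdf_results_diff_test is not specified yet. The sketch below shows one possible approach for the JSON side (SPARQL 1.1 Query Results JSON Format, SELECT results only). Sorting the variable list and the bindings treats projection order and solution order as insignificant, which is an assumption; blank-node labels are left untouched, and the function name is hypothetical.

```python
import json


def normalise_select_json(text):
    """Canonical text of a SPARQL JSON SELECT result, for golden diffs.

    Variables and bindings are sorted so that ordering differences vanish;
    keys are sorted and indentation fixed so formatting differences vanish.
    """
    doc = json.loads(text)
    variables = sorted(doc["head"]["vars"])
    # Serialize each solution canonically, then sort those strings.
    keyed = sorted(json.dumps(b, sort_keys=True)
                   for b in doc["results"]["bindings"])
    canonical = {
        "head": {"vars": variables},
        "results": {"bindings": [json.loads(k) for k in keyed]},
    }
    return json.dumps(canonical, sort_keys=True, indent=2)
```

Two normalised result files then compare equal under a plain text diff exactly when they have the same variables and the same multiset of solutions.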

After v0.2 the abstract layer is feature-complete; further work moves into the concrete-implementation repos.


rdf_plugin_contract_test(name, plugin, toolchain_type) runs the rules_rdf conformance test driver against any executable claiming to implement the plugin contract for the named toolchain type. See plugin_contract.md for what the driver asserts.

Plugin authors gate toolchain registration on it:

load("@rules_rdf//rdf:contract_test.bzl", "rdf_plugin_contract_test")

rdf_plugin_contract_test(
    name = "jena_sparql_conforms",
    plugin = "//jena:jena_sparql",
    toolchain_type = "sparql_engine",
)

The four toolchain types each have their own minimum-valid input inside the driver; pass the bare name (without the _toolchain_type suffix or @rules_rdf//rdf: prefix).
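The four scenarios can be sketched as a small Python function. To keep the sketch self-contained, the plugin is modeled as a callable plugin(argv, stdin_bytes) returning (exit_code, stdout_bytes, stderr_text) rather than as an executable; the real driver runs the binary and chooses the minimal input per toolchain type. The garbage payload and the --no-such-flag=1 probe are hypothetical.

```python
# Hypothetical malformed payload; the real driver picks its own.
GARBAGE = b"\x00\xff not RDF {{"


def check_conformance(plugin, valid_argv, minimal_input):
    """Run the four scenarios; return the names of those that failed.

    plugin(argv, stdin_bytes) -> (exit_code, stdout_bytes, stderr_text)
    An empty list means the plugin passed every scenario.
    """
    failures = []

    # valid_minimal: a small valid dataset must succeed.
    code, _, _ = plugin(valid_argv, minimal_input)
    if code != 0:
        failures.append("valid_minimal")

    # malformed_input: garbage must fail AND leave stdout empty.
    code, out, _ = plugin(valid_argv, GARBAGE)
    if code == 0 or out:
        failures.append("malformed_input")

    # unknown_flag: an unrecognised --key=value must be rejected.
    code, _, _ = plugin(valid_argv + ["--no-such-flag=1"], minimal_input)
    if code == 0:
        failures.append("unknown_flag")

    # determinism: identical invocations give byte-identical stdout.
    first = plugin(valid_argv, minimal_input)[1]
    second = plugin(valid_argv, minimal_input)[1]
    if first != second:
        failures.append("determinism")

    return failures
```

Against a real binary, each plugin(...) call would become subprocess.run([binary, *argv], input=stdin_bytes, capture_output=True), reading returncode, stdout, and stderr from the result.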

rdf_plugin_contract_test

load("@rules_rdf//rdf:contract_test.bzl", "rdf_plugin_contract_test")

rdf_plugin_contract_test(name, plugin, toolchain_type)

Run the rules_rdf conformance test driver against a plugin binary. See plugin_contract.md.

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| plugin | The plugin binary to test. Any executable that claims to implement the rules_rdf plugin contract. | Label | required | |
| toolchain_type | Which toolchain type’s scenarios to run: one of sparql_engine, rdf_validator, rdf_serializer, rdf_reasoner. | String | required | |

rdf_dataset(name, deps, srcs, in_format): declare a labeled collection of RDF files, optionally linked to other datasets through deps.

This is the single source of “what triples are in this graph?” that every other rule consumes. Carrying both the file depset and the format string up front lets sparql_query_test / rdf_validate_test / … avoid sniffing file extensions at action time, and lets consumers keep datasets with different declared formats side by side in one BUILD file without ambiguity.

Multi-file datasets are concatenated by the consuming rule in lexicographic order before being piped to the plugin’s stdin (see rdf/plugin_contract.md). Consumers that care about ordering should name files to sort accordingly.
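The ordering rule can be sketched in Python. Sorting by plain string comparison is an assumption about what “lexicographic” means here, as is the newline inserted between files (it guards against a file with no trailing newline); the function name is hypothetical.

```python
def concat_for_stdin(files):
    """Build the stdin payload for a multi-file dataset.

    files maps each source path to its bytes. Paths are sorted as plain
    strings (lexicographic), and a newline separates consecutive files.
    """
    return b"\n".join(files[path] for path in sorted(files))
```

Because comparison is character by character, 10_core.ttl sorts before 2_ext.ttl; zero-padded prefixes such as 02_ext.ttl keep the numeric intent.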

rdf_dataset

load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")

rdf_dataset(name, deps, srcs, in_format)

A labeled collection of RDF source files + linked-graph deps.

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| deps | Other rdf_dataset targets this graph links to (imported ontologies, vocabulary modules). Their files are folded into this dataset’s transitive_files closure, so reasoning and queries over the linked vocabularies resolve. Deps should share in_format (normalize a differing dep with rdf_transform first). | List of labels | optional | [] |
| srcs | RDF source files. Consuming rules concatenate them in lexicographic order before piping them to the plugin’s stdin. | List of labels | required | |
| in_format | Serialization of every file in srcs. Mixed-format datasets aren’t supported in v0.1; convert with rdf_transform first. | String | optional | "turtle" |

Providers for rules_rdf: RdfDatasetInfo, plus one provider for each of the four toolchain types.

Each toolchain provider wraps both the executable and the runfiles needed to invoke it. Carrying runfiles in the provider matters for plugins that aren’t a single self-contained binary: py_binary, java_binary, and sh_binary targets all stage helper files via runfiles. Consuming rules merge the provider’s runfiles into their own so the plugin is actually executable inside a Bazel sandbox.

RdfDatasetInfo

load("@rules_rdf//rdf:providers.bzl", "RdfDatasetInfo")

RdfDatasetInfo(files, transitive_files, in_format)

A declared RDF dataset.

FIELDS

| Name | Description |
|------|-------------|
| files | depset[File]: this dataset’s own source files (excludes deps). |
| transitive_files | depset[File]: the full graph closure, i.e. this dataset’s files plus the transitive closure of every deps dataset. Consumers needing all linked triples (sparql_query, rdf_reason, rdf_validate_test) operate over this; the subclass/import closure of a grounding ontology (schema.org + SKOS + DC + modules) is assembled here. |
| in_format | str: serialization of the dataset files. One of turtle, ntriples, nquads, trig, jsonld, rdfxml. The whole closure must share this format (normalize a differing dep with rdf_transform first). |
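A Python model of what ends up in transitive_files. In the .bzl rule this would presumably be a depset(direct = ..., transitive = [...]) over each dep’s transitive_files; the Dataset class below is a hypothetical stand-in for the provider, and the result is returned sorted only to make it easy to inspect.

```python
class Dataset:
    """Hypothetical stand-in for an rdf_dataset's RdfDatasetInfo fields."""

    def __init__(self, files, in_format="turtle", deps=()):
        self.files = list(files)
        self.in_format = in_format
        self.deps = list(deps)


def transitive_files(ds):
    """Own files plus every dep's closure, deduplicated and sorted.

    Raises ValueError if any dep's in_format differs from ds.in_format,
    since the whole closure must share one serialization.
    """
    closure = set(ds.files)
    for dep in ds.deps:
        if dep.in_format != ds.in_format:
            raise ValueError(
                "dep uses %s, dataset uses %s; normalize with rdf_transform"
                % (dep.in_format, ds.in_format))
        closure.update(transitive_files(dep))
    return sorted(closure)
```

A vocabulary reached along two paths (directly and through another dep) appears once, which is the deduplication a depset also provides.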

RdfReasonerToolchainInfo

load("@rules_rdf//rdf:providers.bzl", "RdfReasonerToolchainInfo")

RdfReasonerToolchainInfo(binary, runfiles, files_to_run)

An RDF inference engine. Resolved by rdf_reason.

FIELDS

| Name | Description |
|------|-------------|
| binary | File: an executable that runs RDFS / OWL / custom-rule inference and emits derived triples. |
| runfiles | runfiles: the plugin binary’s runfiles bundle. |
| files_to_run | FilesToRunProvider: pass in an action’s tools= to materialize the plugin’s runfiles tree. |

RdfSerializerToolchainInfo

load("@rules_rdf//rdf:providers.bzl", "RdfSerializerToolchainInfo")

RdfSerializerToolchainInfo(binary, runfiles, files_to_run)

An RDF format converter. Resolved by rdf_transform.

FIELDS

| Name | Description |
|------|-------------|
| binary | File: an executable that converts between RDF serializations (Turtle / N-Triples / N-Quads / JSON-LD / RDF/XML / TriG). |
| runfiles | runfiles: the plugin binary’s runfiles bundle. |
| files_to_run | FilesToRunProvider: pass in an action’s tools= to materialize the plugin’s runfiles tree. |

RdfValidatorToolchainInfo

load("@rules_rdf//rdf:providers.bzl", "RdfValidatorToolchainInfo")

RdfValidatorToolchainInfo(binary, runfiles, files_to_run)

An RDF validator (SHACL today; ShEx in scope for v0.2). Resolved by rdf_validate_test.

FIELDS

| Name | Description |
|------|-------------|
| binary | File: an executable that validates an RDF dataset against a shapes graph per the contract. |
| runfiles | runfiles: the plugin binary’s runfiles bundle. |
| files_to_run | FilesToRunProvider: pass in an action’s tools= to materialize the plugin’s runfiles tree. |

SparqlEngineToolchainInfo

load("@rules_rdf//rdf:providers.bzl", "SparqlEngineToolchainInfo")

SparqlEngineToolchainInfo(binary, runfiles, files_to_run)

A SPARQL query engine. Resolved by the SPARQL rules (sparql_query, sparql_query_test, sparql_query_smoke_test).

FIELDS

| Name | Description |
|------|-------------|
| binary | File: an executable that runs SPARQL queries per the rules_rdf plugin contract. |
| runfiles | runfiles: the plugin binary’s runfiles bundle. |
| files_to_run | FilesToRunProvider: pass in an action’s tools= so Bazel materializes the plugin’s runfiles tree (java_binary / py_binary plugins fail to locate runfiles otherwise). |

User-facing inference rules.

rdf_reason runs the registered rdf_reasoner toolchain over an RDF dataset and emits the derived-triples graph (Turtle) as a build artifact. Unlike sparql_query_test / rdf_validate_test, this is a regular rule — its output is a file that downstream rules can declare as a src or data dependency.

load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//reason:defs.bzl", "rdf_reason")

rdf_dataset(name = "ontology", srcs = glob(["*.ttl"]))

rdf_reason(
    name = "inferred",
    base = ":ontology",
    profile = "rdfs",
)

For custom rule sets (Jena RETE rules):

rdf_reason(
    name = "inferred",
    base = ":ontology",
    profile = "custom",
    rules = "rules/transitive.rule",
)

The reasoner toolchain implementation decides which profiles are supported; the abstract layer only validates that profile = "custom" is paired with rules and vice versa.
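The pairing check, sketched in Python. A .bzl rule would call fail() during analysis instead of raising; the function name and messages are hypothetical.

```python
def check_reason_attrs(profile, rules):
    """Raise ValueError unless profile == "custom" exactly when rules is set."""
    if profile == "custom" and rules is None:
        raise ValueError('profile = "custom" requires rules')
    if profile != "custom" and rules is not None:
        raise ValueError('rules is only allowed with profile = "custom"')
```

Which non-custom profiles are valid is left entirely to the toolchain, so the check only looks at whether the two attributes agree.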

rdf_reason

load("@rules_rdf//reason:defs.bzl", "rdf_reason")

rdf_reason(name, base, include_base, profile, rules)

Run inference over an RDF dataset; emit the derived-triples graph (Turtle).

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| base | RDF dataset to run inference over. | Label | required | |
| include_base | If True, emit base plus derived triples; otherwise emit only the derived triples (the default). | Boolean | optional | False |
| profile | Reasoning profile. "custom" requires rules. | String | optional | "rdfs" |
| rules | Custom rule file (Jena RETE syntax). Required iff profile = "custom". | Label | optional | None |

User-facing SPARQL rules.

sparql_query_test is the zero-row gate idiom: declare an invariant as a SPARQL query whose result set is empty when the graph satisfies the invariant. CI runs it as a Bazel test; any returned row fails the test.

It’s the rules_rdf analog of the production GateZeroRows.java pattern in the Aion RFC repo’s kg/java/. v0.1 wires the rule through sparql_engine_toolchain_type; the actual SPARQL execution comes from whichever concrete toolchain the consumer registered (rules_jena, a future rules_rdflib, etc.).

load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//sparql:defs.bzl", "sparql_query_test")

rdf_dataset(name = "corpus", srcs = glob(["*.ttl"]))

sparql_query_test(
    name = "no_dangling_refs",
    dataset = ":corpus",
    query = "queries/dangling.rq",
)
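An illustration of the gate’s pass/fail decision, assuming the engine’s result set arrives as SPARQL 1.1 CSV (a header row of variable names, then one record per solution). In rules_rdf the check is delegated to the plugin via --fail-on-nonempty, so this Python function (a hypothetical name) only shows the logic, not the shipped implementation.

```python
import csv
import io


def zero_row_gate(csv_text):
    """Return (passed, rows) for a SPARQL SELECT result in CSV form.

    The first record is the header (variable names); each later record is
    one solution. The gate passes iff there are no solutions.
    """
    records = list(csv.reader(io.StringIO(csv_text)))
    rows = records[1:]  # an all-unbound solution still counts as a row
    return len(rows) == 0, rows
```

Returning the offending rows alongside the verdict lets a test log print exactly which resources broke the invariant.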

sparql_query

load("@rules_rdf//sparql:defs.bzl", "sparql_query")

sparql_query(name, dataset, out_format, query)

Run a SPARQL query and emit the results as a build artifact (the producer counterpart to sparql_query_test’s gate). Turns a reasoned graph into queryable, downstream-consumable data — e.g. grounding tuples for training-data generation.

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| dataset | The rdf_dataset (closure) to query. | Label | required | |
| out_format | Result serialization. Tabular (tsv/csv/json/xml) for SELECT/ASK; RDF (turtle/ntriples/…) for CONSTRUCT/DESCRIBE (also yields an rdf_dataset). | String | required | |
| query | The SPARQL query file (SELECT/ASK → tabular; CONSTRUCT/DESCRIBE → graph). | Label | required | |
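A sketch of the query-form / out_format pairing described in the table above. The tabular set comes from the attribute doc; the graph set fills the “…” with the in_format names listed under RdfDatasetInfo, which is an assumption, and whether the rule enforces this pairing at analysis time is not stated here. The function name is hypothetical.

```python
# Tabular formats come from the out_format doc; the graph formats reuse the
# RdfDatasetInfo.in_format names (an assumption).
TABULAR_FORMATS = {"tsv", "csv", "json", "xml"}
GRAPH_FORMATS = {"turtle", "ntriples", "nquads", "trig", "jsonld", "rdfxml"}


def check_out_format(query_form, out_format):
    """Raise ValueError if out_format does not suit the query form."""
    form = query_form.upper()
    if form in ("SELECT", "ASK"):
        allowed = TABULAR_FORMATS
    elif form in ("CONSTRUCT", "DESCRIBE"):
        allowed = GRAPH_FORMATS
    else:
        raise ValueError("unknown query form: " + query_form)
    if out_format not in allowed:
        raise ValueError("%s results cannot be written as %s" % (form, out_format))
```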

sparql_query_smoke_test

load("@rules_rdf//sparql:defs.bzl", "sparql_query_smoke_test")

sparql_query_smoke_test(name, dataset, queries)

Assert that a set of SPARQL queries all parse + execute against a dataset. The query-smoke gate idiom — catches syntax errors and reference rot after schema changes.

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| dataset | An rdf_dataset the queries run against. | Label | required | |
| queries | SPARQL query files. The test passes iff every one parses and executes without error (no row-count assertion; use sparql_query_test for that). | List of labels | required | |

sparql_query_test

load("@rules_rdf//sparql:defs.bzl", "sparql_query_test")

sparql_query_test(name, dataset, query)

Run a SPARQL query against an RDF dataset; fail if the result set is non-empty. The zero-row gate idiom.

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| dataset | An rdf_dataset whose triples the query runs against. | Label | required | |
| query | The SPARQL query file. Result set must be empty for the test to pass (per --fail-on-nonempty). | Label | required | |

Toolchain registration rules for rules_rdf.

One rule per toolchain type. Each takes the plugin binary as a mandatory exec-config label and exposes the matching *ToolchainInfo provider with both the binary File and its runfiles bundle.

Concrete plugins (rules_jena, rules_rdflib, …) register via:

load("@rules_rdf//rdf:toolchains.bzl", "sparql_engine_toolchain")

sparql_engine_toolchain(
    name = "jena_arq_sparql_toolchain",
    binary = ":jena_sparql",
)

toolchain(
    name = "jena_arq_sparql",
    toolchain = ":jena_arq_sparql_toolchain",
    toolchain_type = "@rules_rdf//rdf:sparql_engine_toolchain_type",
)

rdf_reasoner_toolchain

load("@rules_rdf//rdf:toolchains.bzl", "rdf_reasoner_toolchain")

rdf_reasoner_toolchain(name, binary)

Declare an RDF reasoner (inference) toolchain.

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| binary | The plugin executable. Must conform to the contract in rdf/plugin_contract.md. | Label | required | |

rdf_serializer_toolchain

load("@rules_rdf//rdf:toolchains.bzl", "rdf_serializer_toolchain")

rdf_serializer_toolchain(name, binary)

Declare an RDF serializer (format-converter) toolchain.

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| binary | The plugin executable. Must conform to the contract in rdf/plugin_contract.md. | Label | required | |

rdf_validator_toolchain

load("@rules_rdf//rdf:toolchains.bzl", "rdf_validator_toolchain")

rdf_validator_toolchain(name, binary)

Declare an RDF validator toolchain.

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| binary | The plugin executable. Must conform to the contract in rdf/plugin_contract.md. | Label | required | |

sparql_engine_toolchain

load("@rules_rdf//rdf:toolchains.bzl", "sparql_engine_toolchain")

sparql_engine_toolchain(name, binary)

Declare a SPARQL engine toolchain.

ATTRIBUTES

| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| binary | The plugin executable. Must conform to the contract in rdf/plugin_contract.md. | Label | required | |

User-facing format-conversion rule.

rdf_transform re-serializes an RDF dataset into a different format via the registered rdf_serializer toolchain. The output is a regular build artifact.

load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//transform:defs.bzl", "rdf_transform")

rdf_dataset(name = "src_turtle", srcs = ["data.ttl"], in_format = "turtle")

rdf_transform(
    name = "data_ntriples",
    dataset = ":src_turtle",
    out_format = "ntriples",
)

The output file is named <name>.<ext>, where <ext> is the canonical extension for out_format: .ttl (turtle), .nt (ntriples), .nq (nquads), .trig (trig), .jsonld (jsonld), .rdf (rdfxml).
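The naming rule as a lookup table. The format-to-extension pairing assumes the extension list lines up with the format names used by RdfDatasetInfo.in_format; the dict literal is also valid Starlark and could feed ctx.actions.declare_file in a .bzl, but the helper name is hypothetical.

```python
# Canonical extension per out_format (pairing assumed from the lists above).
CANONICAL_EXT = {
    "turtle": "ttl",
    "ntriples": "nt",
    "nquads": "nq",
    "trig": "trig",
    "jsonld": "jsonld",
    "rdfxml": "rdf",
}


def output_filename(name, out_format):
    """Return <name>.<ext> for a supported out_format."""
    if out_format not in CANONICAL_EXT:
        raise ValueError("unsupported out_format: " + out_format)
    return "%s.%s" % (name, CANONICAL_EXT[out_format])
```

For the example above, name = "data_ntriples" with out_format = "ntriples" yields data_ntriples.nt.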

rdf_transform

load("@rules_rdf//transform:defs.bzl", "rdf_transform")

rdf_transform(name, dataset, out_format)

Convert an RDF dataset between serializations.

ATTRIBUTES

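| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| dataset | RDF dataset to convert. | Label | required | |
| out_format | Target serialization. | String | required | |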

User-facing RDF validation rules.

rdf_validate_test runs a SHACL shapes graph against an RDF dataset and fails the test if the validation report contains any result at or above the configured severity (violations by default). Resolves through rdf_validator_toolchain_type so the actual SHACL engine is pluggable (rules_jena’s org.apache.jena.shacl.ShaclValidator, a future rules_pyshacl, …).

load("@rules_rdf//rdf:dataset.bzl", "rdf_dataset")
load("@rules_rdf//validate:defs.bzl", "rdf_validate_test")

rdf_dataset(name = "ontology", srcs = glob(["ontology/*.ttl"]))

rdf_validate_test(
    name = "ontology_conforms",
    dataset = ":ontology",
    shapes = "shapes.ttl",
)

ShEx support is in scope for v0.2 (the toolchain contract leaves room for it via the --shapes-language arg, but for v0.1 the shapes file is assumed Turtle-encoded SHACL).

rdf_validate_test

load("@rules_rdf//validate:defs.bzl", "rdf_validate_test")

rdf_validate_test(name, dataset, severity, shapes)

Validate an RDF dataset against a SHACL shapes graph.

ATTRIBUTES

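| Name | Description | Type | Mandatory | Default |
|------|-------------|------|-----------|---------|
| name | A unique name for this target. | Name | required | |
| dataset | An rdf_dataset to validate. | Label | required | |
| severity | Minimum severity that fails the test. | String | optional | "violation" |
| shapes | SHACL shapes graph (Turtle). | Label | required | |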
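How a severity threshold could be applied to a SHACL report, sketched in Python. SHACL defines sh:Info, sh:Warning, and sh:Violation; the lowercase attribute values "info" and "warning" are assumptions (only "violation" appears above), as is the helper name.

```python
# SHACL's three built-in severities, lowest to highest. The lowercase names
# other than "violation" are assumed attribute values.
SEVERITY_RANK = {"info": 0, "warning": 1, "violation": 2}


def validation_fails(result_severities, threshold="violation"):
    """True iff any validation result is at or above the threshold."""
    floor = SEVERITY_RANK[threshold]
    return any(SEVERITY_RANK[s] >= floor for s in result_severities)
```

With the default threshold, a report containing only warnings passes; lowering severity to "warning" makes the same report fail.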