Architecture

Dozer takes an opinionated, horizontal approach that cuts across several product categories. In Dozer, you will find modules and functionality comparable to streaming databases, caches, search engines and API generation tools.

[Architecture diagram]

Key Entities

Connections

A Connection describes a single connection to a data store. One Connection can back multiple Sources. Connection details and credentials are typically described in the configuration section.

Connectors are implemented in the dozer-ingestion module.
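A minimal sketch of a connection definition in the YAML configuration (the connector type, field names and values here are illustrative assumptions; check the configuration reference for the exact schema):

```yaml
connections:
  - name: users_conn        # referenced by sources below
    config: !Postgres       # connector type; connectors live in dozer-ingestion
      host: localhost
      port: 5432
      user: postgres
      password: postgres
      database: users_db
```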

Sources

Each Source essentially describes one unique table with a name and schema.
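Following the same illustrative YAML shape, a source ties one table to a connection by name (the field names are assumptions, not the authoritative schema):

```yaml
sources:
  - name: users             # unique source name, referenced by SQL and endpoints
    table_name: users       # table in the underlying data store
    connection: users_conn  # must match a connection defined above
```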

Endpoints

Each Endpoint describes one API endpoint that is deployed when Dozer is running. The full schema is documented in the configuration reference.

Every Endpoint attaches REST and gRPC API routes to a Cache Reader instance. Every Endpoint also creates a Sink in the pipeline, where a Cache Writer is initialized.
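A hedged sketch of an endpoint definition, in the same illustrative YAML shape (the path and field names are assumptions):

```yaml
endpoints:
  - name: users
    path: /users            # REST route; a matching gRPC service is generated as well
    table_name: users       # source (or SQL output) table served by this endpoint
```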

Pipeline

Dozer instantiates a data pipeline, which is essentially a directed acyclic graph (DAG). The pipeline contains sources, processors and sinks.

  • Every Source described above acts as a pipeline source.
  • SQL is compiled into a chain of processors.
  • A Sink is initialized for each Endpoint.

Pipeline and DAG construction is implemented under dozer-core.
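As an illustration of how SQL becomes processors, a query like the one below would be compiled into a join processor feeding an aggregation processor, with the output table available for an Endpoint's Sink (the table and column names are invented for this example):

```yaml
sql: |
  SELECT u.id, u.email, COUNT(o.id) AS order_count
  INTO user_order_counts          -- output table an endpoint can serve
  FROM users u
  JOIN orders o ON u.id = o.user_id
  GROUP BY u.id, u.email;
```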

Cache

The cache interface exposes methods to insert, update, delete and query records. The cache also creates secondary and full-text indexes for fast lookups and queries. A Cache Writer is initialized within each Sink, and data is processed and committed in bulk as part of the pipeline. Both the REST and gRPC API servers initialize Cache Readers and interact with data in the storage layer.

The Cache Reader also supports authorization based on record properties.

This is implemented under dozer-cache.
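Indexes are typically declared per endpoint. The sketch below is a rough illustration of primary-key index configuration in the same YAML style; the exact keys and index-type names are assumptions to be checked against the configuration reference:

```yaml
endpoints:
  - name: users
    path: /users
    table_name: users
    index:
      primary_key:
        - id                # unique lookup key in the cache
      # hypothetical secondary / full-text index declarations,
      # shown only to indicate where such settings would live:
      # secondary:
      #   - !SortedInverted [email]
      #   - !FullText [bio]
```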

Authorization

JWT tokens with narrowed-down data-access permissions can be issued through the API. Access can be scoped per Endpoint, or even based on document properties.
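Conceptually, such a token carries claims that scope what the bearer may read. The decoded payload below is purely illustrative; the claim names (other than the standard `sub` and `exp`) are hypothetical and not Dozer's actual token schema:

```yaml
# Hypothetical decoded JWT payload; claim names are NOT Dozer's schema.
sub: client-42
exp: 1735689600
access:
  users:                    # permissions narrowed to one endpoint
    filter:
      tenant_id: acme       # only records matching this document property
```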