
Breaking down microservices silos: Building real-time cohesive APIs with Dozer

9 min read · Matteo · Vivek

As a developer, you're no stranger to the challenges of creating a cohesive API from multiple microservices. With Dozer, you can say goodbye to that complexity and hello to lightning-fast results. Dozer automates every step of the process, from building data pipelines to real-time data aggregation, indexing, and low-latency queries. With just a simple configuration, you can generate gRPC and REST APIs and integrate them with your customer-facing products. Let's walk through the speed and ease of real-time API creation with Dozer.


Diagram

Introduction

Microservices architecture is a software design approach that focuses on breaking down large, monolithic applications into smaller, independent services that can be developed, deployed, and maintained independently. However, this approach can introduce new challenges, such as dealing with the complexity of distributed systems, ensuring consistency and reliability of data, and handling failures.

One solution to these challenges is the Saga pattern, which provides a way to manage distributed transactions in a microservices architecture by coordinating multiple services to complete a transaction, even when failures occur.

While the Saga pattern is a powerful tool for managing distributed transactions, it can introduce additional challenges when data from multiple services need to be brought together to create a cohesive API.

All configuration files and sample databases are available in our GitHub samples repo.

Scenario

To start, let's consider an example of an airline booking website. The entire application is split across two main microservices:

  • A booking microservice: handling all the bookings, tickets and boarding passes
  • A flight master microservice: maintaining all the flights' master data including routes, aircraft, etc.

Each service maintains its own database:

  • BOOKINGS, TICKETS, TICKET_FLIGHTS tables are part of the booking microservice database
  • AIRPORTS, FLIGHTS are part of the flights' master microservice database.

Below is an ER diagram of our databases:

ER diagram

To improve the user experience we want to create a few new APIs:

  • Booking retrieval APIs: providing booking details for a specific passenger
  • Routes listing APIs: providing a list of all available routes

Since these APIs are customer-facing, latency must be negligible and the system must sustain a high throughput. For obvious reasons, this cannot be achieved by querying the Postgres tables on every request. We could alternatively create materialized views, but these would have to be refreshed every time a record is updated, which is also unfeasible.

Traditionally, these use cases are solved by introducing an intermediate caching layer such as Redis, but, even in this case, it is not trivial to keep the cache always fresh and to decide the correct eviction policy. On top of all this, we need to build a highly efficient and scalable API layer, ideally using protocols like gRPC.

In short, there is no easy solution and the implementation will require some non-trivial engineering effort.

How Dozer can help

The requirements above represent the perfect use case for Dozer. Dozer is a low code open source platform that allows you to automagically build blazing-fast gRPC and REST read-only APIs, with always up-to-date data ready to be integrated with frontend code, such as React. To keep the caching layer always up-to-date, Dozer relies on Postgres CDC and implements a SQL streaming engine that can be used to pre-process the data, while in transit.

Let's dig more into each of the APIs and how we can build them with Dozer. The following is the list of APIs we want to implement:

Path                     Category   Notes
GET /bookings            Bookings   Retrieves all bookings for a passenger
GET /bookings/details    Bookings   Retrieves detailed information about a booking
GET /routes              Routes     Retrieves a summary of all routes available

Configuring connections and sources

To get started we need to create a new dozer-config.yaml. The file has 4 main sections: connections, sources, sql and endpoints. Let's start defining two connections to our microservices databases:

app_name: flight-microservices
connections:
  - config: !Postgres
      user: postgres
      password: postgres
      host: 0.0.0.0
      port: 5437
      database: flights
    name: bookings_conn
  - config: !Postgres
      user: postgres
      password: postgres
      host: 0.0.0.0
      port: 5437
      database: flights
    name: flights_conn

Once connections are defined, we need to specify the source tables we will be fetching data from:

sources:
  - name: bookings
    table_name: bookings
    columns:
    connection: !Ref bookings_conn
  - name: flights
    table_name: flights
    columns:
    connection: !Ref flights_conn
  - ...

Above are just a couple of source definitions, as an example. For the complete list, you can refer to the full configuration file here.

Configuring and building endpoints

Once sources are defined, we finally get to the interesting part of building APIs. Dozer configuration has a section endpoints where we can define the list of our endpoints. In our case we will have 3 definitions:

endpoints:
  - name: bookings_details
    path: /bookings/details
    table_name: bookings_details
    index:
      primary_key:
        - book_ref
        - ticket_no
        - flight_no
  - name: bookings_summary
    path: /bookings
    table_name: bookings_summary
    index:
      primary_key:
        - book_ref
  - name: routes
    path: /routes
    table_name: routes
    index:
      primary_key:
        - flight_no
        - days_of_week

This is all you need to define new API endpoints. You are now probably wondering how sources and endpoints get connected. Let's dig deeper into the endpoint configuration: among the parameters of each endpoint, the table_name property defines the actual source of the data to be cached. It can either match the table_name property of a source (if we want to replicate the full source table in the cache) or the name of a SQL temporary table (explained below).
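For example, an endpoint that skips SQL entirely and caches a source table as-is could look like the following hypothetical sketch (the airports endpoint and its airport_code primary key are assumptions for illustration, not part of the sample configuration):

```yaml
endpoints:
  # Hypothetical endpoint replicating the airports source table directly:
  # table_name matches the source's table_name, so no SQL step is involved.
  - name: airports
    path: /airports
    table_name: airports
    index:
      primary_key:
        - airport_code
```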

Preprocessing data using SQL

In our scenario, we want to pre-process data before inserting it into the cache. For this purpose Dozer provides a sql section where transformations can be expressed. Every SQL SELECT statement must include an INTO clause, whose value matches the table_name property of an endpoint. Continuing with our sample, here is what our sql section looks like:

sql: |
  -- BOOKING DETAILS
  select passenger_id, passenger_name,
         b.book_ref, book_date, total_amount,
         t.ticket_no, tf.flight_id, fare_conditions, tf.amount,
         f.flight_no, f.scheduled_arrival, f.scheduled_departure,
         f.departure_airport, f.arrival_airport, f.actual_arrival, f.actual_departure
  into bookings_details
  from bookings b
  inner join tickets t on t.book_ref = b.book_ref
  inner join ticket_flights tf on tf.ticket_no = t.ticket_no
  inner join flights f on tf.flight_id = f.flight_id;
  ...

The SELECT statement is followed by an INTO clause, whose value matches the name of the cache we want to build.
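The routes query follows the same pattern. The exact statement ships with the samples repo; the following is only an illustrative sketch (days_of_week, part of the endpoint's primary key, would be derived by aggregating scheduled departures, an aggregation omitted here):

```sql
-- ROUTES (illustrative sketch only; see the samples repo for the real query).
-- days_of_week, used in the endpoint's primary key, is assumed to be
-- computed by an aggregation over scheduled departures, omitted here.
select flight_no, departure_airport, arrival_airport
into routes
from flights
group by flight_no, departure_airport, arrival_airport;
```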

This is all you need to create low-latency REST and gRPC APIs using Dozer. All APIs come with a pre-embedded authorization layer, to allow direct integration with your frontend application. We will cover this topic in more detail in another upcoming article.

Keeping the cache in sync

Because Dozer directly interfaces with Postgres CDC, no special action is needed to keep the cache up to date. Whenever any UPDATE, INSERT or DELETE operation happens in Postgres, changes are automatically pre-processed based on the SQL query and results are automatically propagated to the cache in a matter of milliseconds.
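To make the mechanics concrete, here is a minimal Python sketch (illustrative only — Dozer's real engine is a Rust streaming pipeline, and the cache key, column names, and values below are taken from the sample data, not from Dozer's internals) of how a CDC UPDATE on a source row can patch an already-denormalized cached record:

```python
# Denormalized cache keyed by (book_ref, ticket_no, flight_no),
# mirroring the bookings_details endpoint's primary key.
cache = {
    ("0002E0", "0005434407173", "PG0678"): {
        "passenger_name": "IGOR KARPOV",
        "total_amount": 8960000,
    },
}

def apply_cdc_update(table: str, old: dict, new: dict) -> None:
    """Patch every cached record that joins against the changed source row."""
    if table == "bookings":
        for key, record in cache.items():
            # The first component of the cache key is book_ref.
            if key[0] == old["book_ref"]:
                record["total_amount"] = new["total_amount"]

# A CDC event arrives: the booking's total_amount changed in Postgres.
apply_cdc_update(
    "bookings",
    old={"book_ref": "0002E0", "total_amount": 8960000},
    new={"book_ref": "0002E0", "total_amount": 9000000},
)
print(cache[("0002E0", "0005434407173", "PG0678")]["total_amount"])  # 9000000
```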

Dozer is built entirely in Rust to give you low data and query latencies.

Querying the data

It is straightforward to query Dozer using gRPC or REST. The gRPC endpoint also supports reflection, which is particularly useful for enforcing schemas and discovering services. Let's retrieve, for example, all the booking details for a given passenger_id using the grpcurl CLI tool:

List Services

grpcurl -plaintext localhost:50051 list
dozer.common.CommonGrpcService
dozer.generated.bookings_details.BookingsDetails
dozer.generated.bookings_summary.BookingsSummaries
dozer.generated.routes.Routes
dozer.health.HealthGrpcService
grpc.reflection.v1alpha.ServerReflection

Query an Endpoint with filters

grpcurl -d '{"query":"{\"$filter\": {\"passenger_id\": \"3986 620108\"}}"}' \
  -plaintext localhost:50051 \
  dozer.generated.bookings_details.BookingsDetails/query
{
  "records": [
    {
      "id": "3682",
      "record": {
        "passengerId": "3986 620108",
        "passengerName": "IGOR KARPOV",
        "bookRef": "0002E0",
        "bookDate": "2017-07-11T13:09:00Z",
        "totalAmount": {
          "lo": 8960000
        },
        "ticketNo": "0005434407173",
        "flightId": "26920",
        "fareConditions": "Economy",
        "amount": {
          "lo": 1640000
        },
        "flightNo": "PG0678",
        "scheduledArrival": "2017-08-01T13:45:00Z",
        "scheduledDeparture": "2017-08-01T11:30:00Z",
        "departureAirport": "MCX",
        "arrivalAirport": "SVO",
        "actualArrival": "2017-08-01T13:51:00Z",
        "actualDeparture": "2017-08-01T11:33:00Z",
        "DozerRecordVersion": 1
      }
    },
    ...
  ]
}

Since the entire object has already been pre-computed and stored in a low-latency hybrid (memory + disk) cache, response times are near-instantaneous.

Under the hood

We have so far shown how easy it is to get started with Dozer; now let's dig under the hood and look at Dozer's architecture and what is happening behind the scenes.

In order to process data in real time, Dozer transforms all the queries in dozer-config.yaml into a DAG (Directed Acyclic Graph). The DAG defines the streaming execution of the query, where each node is a source, a processor, or a sink. Below is, for instance, the generated DAG for the BOOKING DETAILS query:

DAG diagram

You will notice we have two connectors (bookings_conn and flights_conn), each with multiple output ports, one for each database table. All the connections feed into a product node, responsible for executing the join operation, followed by a projection node, which executes the field selection.

Upon startup, Dozer connects to the source databases, optionally takes a snapshot of the database tables and starts listening to CDC events. These events are then pumped into the DAG.

Each node is fundamentally a processor capable of handling 3 types of operations: INSERTs, DELETEs and UPDATEs. Whenever CDC messages are received, each processor updates its internal state and stops, propagates or generates new messages for the downstream nodes. This flow of events is eventually propagated to the cache so that its state is always synchronized with the sources.
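As an illustration of that flow, here is a simplified Python sketch (not Dozer's actual Rust implementation, which also handles DELETEs and UPDATEs and persists its state) of a streaming inner-join processor: it keeps per-port state and, on each incoming INSERT, probes the opposite side's state to emit joined records downstream:

```python
# Simplified sketch of a streaming inner-join processor (illustrative only).

class JoinProcessor:
    def __init__(self, left_key, right_key):
        self.left_key = left_key      # join column on the left input port
        self.right_key = right_key    # join column on the right input port
        self.left_state = {}          # key -> left records seen so far
        self.right_state = {}         # key -> right records seen so far
        self.emitted = []             # joined records propagated downstream

    def on_insert(self, port, record):
        if port == "left":
            key = record[self.left_key]
            self.left_state.setdefault(key, []).append(record)
            # Probe the right-side state for matches and emit joined records.
            for other in self.right_state.get(key, []):
                self.emitted.append({**record, **other})
        else:
            key = record[self.right_key]
            self.right_state.setdefault(key, []).append(record)
            for other in self.left_state.get(key, []):
                self.emitted.append({**other, **record})

# Joining tickets to ticket_flights on ticket_no, as in the sample query.
join = JoinProcessor(left_key="ticket_no", right_key="ticket_no")
join.on_insert("left", {"ticket_no": "0005434407173", "book_ref": "0002E0"})
join.on_insert("right", {"ticket_no": "0005434407173", "flight_id": 26920})
print(join.emitted)
# [{'ticket_no': '0005434407173', 'book_ref': '0002E0', 'flight_id': 26920}]
```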

If you are interested in learning more about the internals of Dozer, feel free to check out our GitHub repo here. And please do remember to ⭐️ us and share with your peers 🙏! Feel free to drop by our Discord channel and share your feedback.

If you would like to try out this sample, please follow the link to our GitHub samples repo.