Skip to main content

Real-Time Data Integration: Ingesting Data From MySQL and Powering APIs with Dozer

· 7 min read
Aarya

In this tutorial, we'll walk you through configuring Dozer to seamlessly ingest real-time MySQL data and make it accessible via queryable APIs. This comprehensive guide covers everything you need, from environment setup to querying data from MySQL using Dozer. While our demonstration employs the Chinook database, rest assured, these principles can be universally applied to any MySQL database.

Setting up and configuring Dozer is a breeze, as we'll illustrate in this blog post. Furthermore, we'll introduce you to the convenience of Dozer Live, a tool that simplifies pipeline visualization and API execution, all in one integrated platform. Join us on this journey towards harnessing the power of Dozer for real-time data transformation and accessibility.

What is Dozer?

Dozer is a powerful open-source data API backend that simplifies the process of ingesting realtime data from various sources and serving it through REST and gRPC endpoints. It supports a wide range of data sources, including databases, and object storage services etc. You can learn more about Dozer from its official documentation and its GitHub repository.

Why use Dozer?

Dozer is designed to simplify the process of ingesting data from various sources and serving it through APIs.Through one single configuration file, it is possible to create Real-time APIs that perform complex data transformations. For this example we will be using the Dozer MySQL connector which allows us to continously monitor and ingest data from MySQL databases, and serve it through APIs. With a streaming SQL engine and cache fully built in Rust, APIs can be built easily which update not in hours or minutes but in seconds!

Prerequisites

Before we begin, make sure you have the following:

  • A MySQL server configured on your local machine.
  • Dozer installed. You can install Dozer using the following command: cargo install --git https://github.com/getdozer/dozer dozer-cli --locked. For more installation instructions, visit the Dozer documentation.
  • Basic knowledge of SQL for writing data transformation queries.

Project Structure

You can find a sample on MySQL connector in the dozer-samples repository, complete with instructions on running it. In this tutorial, we'll build upon the capabilities demonstrated in that sample.

Our project consists of the following:

  • The Chinook database which is our primary data source.
  • A Dozer configuration file (YAML) that defines the data sources, transformations, and APIs.

Step 1: Setup Chinook Database on your MySQL server

If you don't have the database ready, you can download the SQL script from here. The script initialises the database for us. Execute the following command in your terminal.

mysql -u <username> -p < Chinook_MySql.sql

Verify that Chinook database has been created.

Understanding the Dataset

The Chinook data model simulates a digital media store and encompasses various tables for managing information related to artists, albums, media tracks, invoices, and customers. The media-related data has been sourced from actual iTunes Library data.

For customer and employee details, fabricated information has been thoughtfully crafted, including fictitious names, addresses that can be located on Google Maps, and meticulously formatted contact information such as phone numbers, fax numbers, and email addresses.

Sales information is automatically generated with random data spanning a four-year period, adding to the realism and comprehensiveness of the database.

Below, you'll find an entity-relationship diagram illustrating the structure of this database.

Chinook

Step 2: Configure Dozer

The next step is to configure Dozer. This involves creating a YAML configuration file that defines the data sources, transformations, and APIs.

Dozer uses a YAML configuration file to specify the data sources, the tables to ingest, the SQL queries to run on the ingested data, and the endpoints to serve the results.

Here's an example of a Dozer configuration file:

version: 1
app_name: mysql-chinook-sample
home_dir: ./.dozer
cache_dir: ./.dozer/cache
connections:
- config: !MySQL
url: mysql://<username>:<password>@<host>:<port>/Chinook
name: chinook_mysql
sources:
- name: invoice
table_name: Invoice
columns:
- InvoiceId
- CustomerId
- InvoiceDate
- Total
connection: chinook_mysql

- name: customer
table_name: Customer
connection: chinook_mysql
- name: InvoiceLine
table_name: InvoiceLine
connection: chinook_mysql

- name: Track
table_name: Track
connection: chinook_mysql

- name: Genre
table_name: Genre
connection: chinook_mysql

sql:

SELECT C.CustomerId AS customer_id, C.FirstName as Name,
SUM(I.Total) AS Money_Spent
INTO money_spent
FROM customer C
JOIN invoice I ON C.CustomerId = I.CustomerId
GROUP BY 1;

SELECT DISTINCT C.Email as email,
C.FirstName as firstname,
C.LastName as lastname,
G.Name as Genre
INTO rock_listeners
FROM customer C
JOIN invoice I ON C.CustomerId = I.CustomerId
JOIN InvoiceLine IL ON IL.InvoiceLineId = I.InvoiceId
JOIN Track T ON IL.TrackId = T.TrackId
JOIN Genre G ON T.GenreId = G.GenreId
WHERE G.Name = 'Rock';

endpoints:

- name: money_spent
path: /money_spent
table_name: money_spent

- name: rock_listeners
path: /rock_listeners
table_name: rock_listeners

In this configuration file, we establish a connection to our MySQL server effortlessly using Dozer. All that's required is specifying a single URI-like string. Additionally, we define a set of SQL queries designed to operate on the ingested data and specify the endpoints where the query results will be accessible.

Step 3: Running Dozer

Once the configuration file is set up, we can start Dozer by running the following command in the terminal:

dozer run

Running this command spins up Dozer, initiating the process of data ingestion from the designated MySQL database and populating the cache. Logs during execution can be observed in the console. The outcomes of the SQL queries will subsequently be accessible at the designated endpoints.

Step 4: Querying the Dozer APIs

Dozer generates automatic REST and gRPC APIs based on the endpoint configuration provided in the dozer config. We can now query the Dozer endpoints to get the results of our SQL queries. You can query the cache using gRPC - both typed and common service or REST. For example, to get the list of rock listeners, we can send a POST request to the /rock_listeners endpoint .

gRPC grpc-typed

REST

rest

This will return the list of rock listeners.

Dozer also supports other methods like count , and filters like $order_by and $limit in the query. To know more about querying Dozer APIs, check out the Dozer documentation.

Streamlining the Process with Dozer Live

In the second segment of this tutorial example, we will illustrate how Dozer Live can be utilized for visualizing data pipelines, debugging them, and executing API requests, all conveniently within a unified playground.

All you need to do is run the following command in your terminal:

dozer live

This will open up the Dozer Live playground in your browser. You can now start ingesting data and querying it.

dozer-live

Here you can visualise all the data sources, transformations, and endpoints in a single view. You can also execute the API requests and view the results in the same view. This makes it easy to debug your data pipelines and APIs without using any third party tools.

Real Time Data Ingestion

Dozer, being designed for real-time data ingestion, will automatically detect the newly added data in the database and start ingesting it. This means you don't need to change any configuration if you have recurring data to process. You can view the real time data ingestion in the Dozer Live through the OnEvent method.

To simulate a real-time data ingestion scenario, we setup a MySQL event which inserted 3 rows into the Artist table.

This illustrates how data becomes instantly accessible to APIs, ensuring that any queries made reflect the most up-to-date information.

onevent

It's important to note that the ingestion time can fluctuate, influenced by factors like record size, quantity, data intricacy, and Dozer's available resources. Nonetheless, the remarkable feature of Dozer lies in its capacity to enable real-time querying of data as it's being ingested, enabling you to gain valuable insights nearly instantaneously.

Conclusion

Clearly, Dozer simplifies the task of seamlessly ingesting real-time data from MySQL databases and transforming it into accessible APIs for querying. With minimal setup, it facilitates the connection of diverse data sources, their real-time integration, and the immediate availability of low-latency data APIs. This characteristic positions Dozer as a formidable instrument for the swift development of data-driven products.

For more information and examples, check out the Dozer GitHub repository and dozer-samples repository.

Stay tuned for more captivating articles coming your way!