
Powering E-Commerce Sales in Real Time with Dozer and Supabase

Aarya · 12 min read

In the dynamic world of modern application development, data-driven insights and real-time processing have become pivotal to building successful and competitive solutions. This is where Dozer, an open-source data API backend, emerges as a powerful tool. Dozer simplifies the intricate task of ingesting real-time data from a diverse range of sources and seamlessly delivers it through REST and gRPC endpoints. It offers a robust streaming SQL engine that enables real-time data transformations, empowering developers and analysts to derive valuable insights on the fly.

Dozer's support for a wide array of data sources, from databases to object storage services, makes it a versatile solution for organizations looking to harness the potential of their data. With Dozer, you can streamline data processing, debug and test queries with its built-in Dozer Live playground, and ensure your cache is always up-to-date with its comprehensive features.

This article explores how Dozer empowers stream processing, its core functionalities, and its potential to reshape data-driven application development. We'll also dive into a practical use case of building a real-time e-commerce pipeline with Dozer, showcasing its capabilities in action.

What is Dozer?

Dozer is a robust open-source data API backend that streamlines the intricate task of ingesting real-time data from diverse sources and delivering it effortlessly through REST and gRPC endpoints. Its versatility is evident in its support for a vast array of data sources, ranging from databases to object storage services and beyond. Furthermore, Dozer simplifies the development process with its built-in Dozer Live playground, accessible through the CLI. This feature serves as a valuable tool, aiding developers in seamless debugging and application development. You can learn more about Dozer from its official documentation and its GitHub repository.

Why Supabase?

Supabase is a powerful open-source platform for modern application development. It provides a comprehensive suite of backend services, including a PostgreSQL-based database, authentication, real-time data synchronization, and serverless functions. Developers can use Supabase to streamline the creation and scaling of web and mobile applications, all while benefiting from its real-time capabilities and the flexibility to customize and extend its functionality.

OLTP databases are optimized for handling numerous short read and write transactions, making them ideal for applications that require real-time data processing, such as e-commerce systems, financial applications, and user management systems. By building on PostgreSQL, Supabase inherits these OLTP capabilities, making it a robust choice for applications with high transactional workloads. You can learn more about Supabase from its official documentation.

How Dozer Empowers Stream Processing

Built-in Streaming SQL Engine:

Dozer features its own streaming SQL engine, enabling real-time transformations on the data ingested from various sources. This lets developers and analysts generate valuable insights from data in real time. The engine supports a wide range of standard SQL constructs, including aggregations, joins, and window functions. The output of DozerSQL is a stream of data ready to be inserted into a cache, increasing the performance and scalability of the data pipeline. DozerSQL operates in real time and supports only SELECT statements, with the goal of providing an optimized data-preparation step before results land in the cache.
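
For a flavor of the syntax, here is a minimal sketch, assuming a hypothetical orders source declared in the connection config: a transformation is an ordinary SELECT whose INTO clause names the cache table it feeds.

-- Minimal DozerSQL sketch: aggregate a hypothetical `orders` source
-- into a cache table named `customer_totals`, which an API endpoint
-- can then expose directly.
SELECT CustomerID, SUM(Amount) AS TotalSpent
INTO customer_totals
FROM orders
GROUP BY CustomerID;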

Time Window Functionalities:

Effective management of time windows is a pivotal aspect of any streaming system, and Dozer excels here by offering a versatile range of time window functions, including TUMBLE, HOP, and TTL. These functions let users perform recurring aggregations on streaming data, transforming raw data into valuable insights; a short SQL sketch follows the list below.

  • TUMBLE(): TUMBLE is instrumental in creating distinct and non-overlapping time windows for streaming data. Each window has a predetermined duration, and data within each window is aggregated. This feature proves invaluable when conducting repetitive aggregations or computations on streaming data.

  • HOP(): HOP, on the other hand, establishes overlapping time windows for streaming data. Each window has both a defined duration and a specific slide, which dictates how frequently windows are generated. Data within these windows is aggregated, making HOP a powerful tool for advanced data processing.

  • TTL(): Time-To-Live (TTL) is a function that grants users the ability to set a lifespan for their data. Data that surpasses the specified time limit is automatically purged from the cache. This feature is particularly beneficial when managing continuous streams of data, ensuring efficient memory utilization and streamlined data maintenance.
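
To make these concrete, here is a hedged sketch that reuses the sales_data source and InvoiceDate column from the pipeline built later in this article. The TUMBLE signature mirrors the daily_sales query shown below; the HOP argument order (source, timestamp column, hop size, window size) is an assumption worth verifying against the official documentation.

-- TUMBLE: non-overlapping 1-hour windows over the sales stream.
SELECT SUM(t.Quantity * t.UnitPrice) AS hourly_sales, t.window_start, t.window_end
INTO hourly_sales
FROM TUMBLE(sales_data, InvoiceDate, '1 HOUR') t;

-- HOP: overlapping 1-hour windows that slide every 15 minutes
-- (argument order is assumed; check the docs).
SELECT SUM(t.Quantity * t.UnitPrice) AS rolling_sales, t.window_start, t.window_end
INTO rolling_sales
FROM HOP(sales_data, InvoiceDate, '15 MINUTES', '1 HOUR') t;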

Keeping the Cache Fresh:

Dozer employs two methods to keep the cache consistently up-to-date:

Change Data Capture (CDC):

In cases where data sources support CDC, such as Postgres, Dozer utilizes this feature to actively monitor the data source for any alterations, including updates, insertions, and deletions. It then promptly updates the cache to reflect these changes. This approach guarantees that the cache always contains the latest and most accurate data.
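
Postgres CDC is built on logical replication. Managed platforms such as Supabase generally ship with it enabled; on a self-managed instance, you would typically confirm the WAL level before pointing Dozer at the database:

-- Logical replication is the standard CDC mechanism for Postgres.
SHOW wal_level;                           -- should report 'logical'
ALTER SYSTEM SET wal_level = 'logical';   -- requires a server restart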

Polling:

For other data sources, such as object stores and Snowflake, Dozer employs a polling mechanism. At regular intervals, Dozer checks these data sources to identify any modifications to the data. If any changes are detected, Dozer promptly updates the cache so the data remains current.

These methods collectively ensure that the cache remains synchronized with the underlying data sources, making certain that the data is consistently fresh and accurate for use in real-time applications.

Building a Real Time E-Commerce Pipeline with Dozer

DataFlow

Let's explore an e-commerce scenario where Supabase serves as an OLTP database for real-time sales transaction recording. We'll harness Dozer to seamlessly ingest data from Supabase. Then, using DozerSQL, we'll conduct real-time aggregations on this data and store the outcomes in the cache. To complete the picture, we'll expose these results through gRPC/REST APIs, enabling other services within our e-commerce ecosystem to drive sales and make informed decisions based on the transformed data.

In essence, Dozer emerges as the all-in-one data infrastructure backbone for your business needs. There's no need for separate source and sink databases; Dozer provides direct APIs for data querying. It functions as a unified data access layer, simplifying and streamlining your data processing requirements across the board.

For production environments, Dozer Cloud comes into play as a fully managed, hosted experience provided by the Dozer team. It eliminates the need to worry about infrastructure concerns and allows you to build robust real-time data pipelines effortlessly. To embark on your journey with Dozer Cloud, you can explore detailed guidance in the official documentation.

Prerequisites

Before we begin, make sure you have the following:

  • A Supabase account.
  • Dozer installed. For installation instructions, visit the Dozer documentation.
  • Basic knowledge of SQL for writing data transformation queries.

Understanding The Data

To simulate sales transactions, we'll be utilizing a sample sales dataset with the following schema (key columns: StockCode, Quantity, InvoiceDate, UnitPrice, CustomerID, and Country).
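
If you want to mirror this setup, a table along these lines can be created in Supabase's SQL editor. This is a sketch: the column types are assumptions inferred from how the queries below use each field, and your actual dataset may carry extra columns such as invoice numbers or product descriptions.

-- Hedged sketch of the sales table; adjust types and extra columns
-- to match your real dataset.
CREATE TABLE sales_data (
    StockCode   TEXT,
    Quantity    INTEGER,
    InvoiceDate TIMESTAMPTZ,
    UnitPrice   NUMERIC(10, 2),
    CustomerID  TEXT,
    Country     TEXT
);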

Our e-commerce system stores transaction data in a Postgres database, which is hosted on the Supabase platform. To establish the connection between Dozer and Supabase for data ingestion, you can include the following configuration details in your Dozer configuration file:

connections:
  - config: !Postgres
      user: {{your-supabase-user}}
      password: {{your-password}}
      host: {{your-supabase-host-url}}
      port: 5432
      database: {{database_name}}
      sslmode: null
      connection_url: null
    name: postgres

sources:
  - name: sales_data
    table_name: {{TABLE_NAME}}
    connection: postgres

Replace "your-supabase-host-url", "your-password" , "your-supabase-user" and "database_name" with your specific Supabase connection details. Replace TABLE_NAME with your actual table that you wish to reference. These configurations will allow Dozer to establish a real time connection to your Postgres DB hosted on Supabase and ingest data.

Notice how effortlessly Dozer simplifies the process of adding data sources. Adding an additional data source is as simple as appending another connection block to the configuration file, as sketched below. Dozer supports a wide range of data sources, including databases, object storage services, and more. You can learn more about the supported data sources from the official documentation.
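
For instance, a hypothetical reviews database (all names here are placeholders) could be added by appending to the existing connections and sources lists, reusing the same Postgres syntax shown above:

connections:
  # ...existing postgres connection...
  - config: !Postgres
      user: {{reviews-db-user}}
      password: {{reviews-db-password}}
      host: {{reviews-db-host}}
      port: 5432
      database: {{reviews_database}}
    name: reviews_postgres

sources:
  # ...existing sales_data source...
  - name: product_reviews
    table_name: {{REVIEWS_TABLE}}
    connection: reviews_postgres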

Adding Transformations

Now that we have set up the data source, we can add transformations to generate useful data.

For this article, we will be exploring the following scenarios:

Customer Lifetime Value

Customer Lifetime Value (CLV) is a key metric that businesses can leverage to boost sales, enhance personalization, and improve recommendations. By understanding the long-term value of each customer, companies can tailor marketing campaigns, offer personalized incentives, and optimize product recommendations, encouraging repeat purchases and higher spending. High CLV customers can be targeted with exclusive offers and loyalty programs, fostering brand loyalty and increasing revenue. Additionally, CLV insights enable businesses to allocate resources effectively, adjust pricing strategies, and gather valuable feedback, ultimately driving more sales and ensuring a superior customer experience.

Best Selling Products

Identifying Best-Selling Products is a fundamental strategy for businesses to maximize sales and profitability. By analyzing sales data and customer preferences, companies can pinpoint their most popular items and strategically promote them. This knowledge allows for focused marketing campaigns and personalized recommendations, driving higher sales as customers are more likely to purchase these well-received products. It also aids in optimizing inventory management, ensuring that high-demand items are consistently available, reducing stockouts, and increasing revenue.

Best Selling Products In Each Country

By analyzing sales data on a regional basis, businesses can tailor their product offerings to match local preferences and trends. This enables them to optimize their inventory and marketing strategies to meet the specific demands of each area. It also empowers them to provide localized recommendations, advertising, and promotions, resulting in increased sales and improved customer engagement.

Daily Sales Data

As mentioned above, Dozer provides time window functions that enable recurring aggregations on streaming data, along with a TTL function that keeps memory usage in check when dealing with infinite streams. This is particularly useful for generating daily sales data, which can be leveraged to identify trends and patterns, optimize inventory management, and enhance marketing strategies. By analyzing daily sales data, businesses can identify their peak sales periods and adjust their inventory accordingly. They can also tailor their marketing campaigns to target customers during these peak periods, driving sales and increasing revenue.

By including the following sql and endpoints blocks in the configuration file, we can add these transformations to our pipeline:

sql: |

  SELECT CustomerID, SUM(Quantity * UnitPrice) AS LifetimeValue
  INTO customer_lifetime_value
  FROM sales_data
  GROUP BY CustomerID;

  SELECT StockCode, SUM(Quantity) AS BuyCount
  INTO bestsellers
  FROM sales_data
  GROUP BY StockCode;

  SELECT Country, SUM(Quantity * UnitPrice) AS TotalSales
  INTO sales_by_country
  FROM sales_data
  GROUP BY Country;

  SELECT SUM(t.Quantity * t.UnitPrice) AS total_sales, t.window_start AS start, t.window_end AS end
  INTO daily_sales
  FROM TTL(TUMBLE(sales_data, InvoiceDate, '1 DAY'), InvoiceDate, '1 DAY') t;

endpoints:
  - name: customer_lifetime_value
    path: /clv
    table_name: customer_lifetime_value

  - name: bestsellers
    path: /bestsellers
    table_name: bestsellers

  - name: sales_by_country
    path: /sales_by_country
    table_name: sales_by_country

  - name: daily_sales
    path: /daily_sales
    table_name: daily_sales

In addition, you can also explore the following use cases:

  1. Customer Segmentation: Directly classify customers into different segments based on their spending patterns.

  2. Customer Churn Prediction: Attach an ML model using the built-in support for ONNX UDFs to predict which customers are likely to churn, enabling businesses to take proactive measures to retain them (a hedged configuration sketch follows this list).

  3. Adding more data sources: One of the most powerful features of Dozer is its ability to ingest data from multiple sources and merge them together. You can add more data sources to gain more actionable insights. For example, you can add a data source for product reviews and ratings, and join it with the sales data to better understand customer preferences.
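
As a rough sketch of the ONNX route, a churn model might be wired up as follows. The configuration keys and the idea of calling the UDF inside DozerSQL are assumptions based on Dozer's UDF support, so verify them against the official documentation.

udfs:
  - name: predict_churn
    config: !Onnx
      path: ./models/churn_model.onnx   # hypothetical model file

sql: |
  -- Hypothetical: feed an aggregated spend feature to the ONNX model.
  SELECT CustomerID, predict_churn(SUM(Quantity * UnitPrice)) AS churn_score
  INTO churn_predictions
  FROM sales_data
  GROUP BY CustomerID;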

Running Dozer

Once the configuration file is set up, we can start Dozer by running the following command in the terminal:

dozer run

This will start Dozer, which will begin ingesting data from our data source and populating the cache. Alternatively, you can start Dozer Live to see the ingestion progress in real time. Dozer Live also lets you visualize the transformations you have put in place, allowing you to debug and test your queries. You can start Dozer Live by running the following command in the terminal:

dozer live


Querying the Dozer APIs

Dozer automatically generates REST and gRPC APIs based on the endpoint configuration provided. We can now query the Dozer endpoints to get the results of our SQL queries, using either gRPC (via the typed or common service) or REST.

By default, the gRPC service starts on port 50051 and the REST service starts on port 8080. You can change these ports in the configuration file.

Dozer APIs also support query parameters for pagination, sorting, and filtering. You can learn more about these filters from the official documentation.
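
For example, assuming the default REST port and the /clv path configured earlier (the query body follows Dozer's filter syntax, which is worth double-checking in the documentation):

# Fetch records from the customer_lifetime_value endpoint
curl http://localhost:8080/clv

# Query with a limit and a descending sort on LifetimeValue
curl -X POST http://localhost:8080/clv/query \
  -H "Content-Type: application/json" \
  -d '{"$limit": 10, "$order_by": {"LifetimeValue": "desc"}}'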

Dozer also provides client libraries in Python and React, making it incredibly easy to connect to the Dozer APIs.
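
As a hypothetical sketch of the Python client (the class and method names here are assumptions based on published examples, so consult the client library's README for the actual API):

# Hypothetical usage of the Python client; names are assumptions.
from pydozer.api import ApiClient

clv_client = ApiClient("clv")   # endpoint name from the config above
print(clv_client.query())       # fetch records from the cache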

Conclusion

Dozer, with its innovative features and seamless data processing capabilities, stands as a powerful tool for modern application development. Its built-in streaming SQL engine enables real-time data transformations, allowing organizations to extract valuable insights from their data as it flows in from various sources.

The versatile time window functionalities offered by Dozer provide developers with the means to effectively manage streaming data and perform recurring aggregations. This empowers organizations to make data-driven decisions and enhance their applications in real time.

As demonstrated in the e-commerce pipeline use case, Dozer can seamlessly integrate with various data sources, making it a valuable asset in building data-driven applications. Its ability to generate APIs for querying data, coupled with client libraries in Python and React, simplifies the process of extracting and utilizing insights.

Dozer, coupled with platforms like Supabase, opens up new horizons for businesses seeking to harness the potential of real-time data. In a fast-paced digital landscape, the ability to ingest, transform, and leverage data on the fly can be a game-changer, and Dozer serves as a powerful tool on that journey.