
This week at Dozer #3

Isabel

Welcome to this week's update on Dozer! We are excited to share the latest developments and the progress we have made. Here are the updates for this week.

Release v0.1.13

Dozer v0.1.13 is available. Check out the release notes here.

Insert and Update Conflict Resolution #1267

Dozer now supports conflict resolution when writing data to sinks. Developers can choose how the app behaves depending on the type of data, for example when consistency and accuracy matter far more than speed and estimates.

endpoints:
  - name: data_api
    conflict_resolution:
      # Options: nothing | update | panic
      on_insert: update
      # Options: nothing | upsert | panic
      on_update: upsert
      # Options: nothing | panic
      on_delete: nothing
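To make the `on_insert` options concrete, here is a minimal sketch of how such a policy could behave on a key conflict. The `Sink` and `OnInsert` types are hypothetical and exist only for this illustration; they are not Dozer's internal types, and the reading of `nothing` as "keep the existing row" is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical policy for an insert whose key already exists.
/// Variant names mirror the config options; the type itself is illustrative.
#[derive(Clone, Copy, Debug, PartialEq)]
enum OnInsert {
    Nothing, // assumed meaning: keep the existing row, ignore the new one
    Update,  // overwrite the existing row
    Panic,   // abort on conflict
}

/// Toy in-memory sink keyed by primary key (not Dozer's sink).
struct Sink {
    rows: HashMap<u64, String>,
    on_insert: OnInsert,
}

impl Sink {
    fn new(on_insert: OnInsert) -> Self {
        Sink { rows: HashMap::new(), on_insert }
    }

    /// Inserts a row, applying the configured policy when the key is taken.
    fn insert(&mut self, key: u64, value: &str) {
        if self.rows.contains_key(&key) {
            match self.on_insert {
                OnInsert::Nothing => return,
                OnInsert::Update => {}
                OnInsert::Panic => panic!("insert conflict on key {}", key),
            }
        }
        self.rows.insert(key, value.to_string());
    }
}
```

With `OnInsert::Update`, inserting key 1 twice leaves the second value in place; with `OnInsert::Nothing`, the first value survives.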

Parallelized Joins #1180

Performance improvement in the order of 4x to 5x.

We have simplified and optimized the Join implementation, which resulted in a significant performance boost. When the query has a single source, the Processor is simply bypassed, since no operation on the record is necessary in this case. When the SQL contains one or more JOIN operators, one Product Processor is created for each join and the processors are connected.

For example:

SELECT name, department.name as dep, salary
FROM user
JOIN department ON user.department_id = department.id
JOIN country ON user.country_id = country.id;

This query is converted to a pipeline:
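As a rough illustration of that shape (a sketch only, not Dozer's Product Processor), each JOIN can be modeled as one stage whose output feeds the next stage. The `Row` type, the helper functions, and the qualified column names below are assumptions made for this example, and the stages run sequentially here rather than in parallel.

```rust
use std::collections::HashMap;

/// A row is a list of (column, value) pairs; a simplified stand-in for a record.
type Row = Vec<(String, String)>;

/// Builds a row from string pairs (helper for the example).
fn row(cols: &[(&str, &str)]) -> Row {
    cols.iter().map(|(c, v)| (c.to_string(), v.to_string())).collect()
}

/// Looks up a column value by name.
fn get<'a>(row: &'a Row, col: &str) -> Option<&'a str> {
    row.iter().find(|(c, _)| c.as_str() == col).map(|(_, v)| v.as_str())
}

/// One join stage: an inner hash join of `left` against `right`.
fn join_stage(left: Vec<Row>, left_key: &str, right: &[Row], right_key: &str) -> Vec<Row> {
    // Index the right side by its join key.
    let mut index: HashMap<&str, Vec<&Row>> = HashMap::new();
    for r in right {
        if let Some(k) = get(r, right_key) {
            index.entry(k).or_default().push(r);
        }
    }
    // Probe the index with each left row and emit the concatenated rows.
    let mut out = Vec::new();
    for l in left {
        if let Some(k) = get(&l, left_key) {
            if let Some(matches) = index.get(k) {
                for m in matches {
                    let mut joined = l.clone();
                    joined.extend(m.iter().cloned());
                    out.push(joined);
                }
            }
        }
    }
    out
}

/// Two connected stages, one per JOIN in the query.
fn run_pipeline(users: Vec<Row>, departments: &[Row], countries: &[Row]) -> Vec<Row> {
    let stage1 = join_stage(users, "user.department_id", departments, "department.id");
    join_stage(stage1, "user.country_id", countries, "country.id")
}
```

A query with a single source would skip `join_stage` entirely, which corresponds to the bypass described above.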

Testing Strategy

Our focus has been on introducing test cases that increase the stability of Dozer.

Data type tests for connectors

Populate an external data source with every data type the connector supports, and Dozer will automatically check that all conversions work correctly. Put the data-populating code in DataReadyConnectorTest::new and you are done!

pub trait DataReadyConnectorTest: Send + Sized + 'static {
    type Connector: Connector;

    fn new() -> (Self, Self::Connector);
}

For example, the local storage connector implements it like this:

pub struct LocalStorageObjectStoreConnectorTest {
    _temp_dir: TempDir,
}

impl DataReadyConnectorTest for LocalStorageObjectStoreConnectorTest {
    type Connector = ObjectStoreConnector<LocalStorage>;

    fn new() -> (Self, Self::Connector) {
        let record_batch = record_batch_with_all_supported_data_types();
        let (temp_dir, connector) = create_connector("sample".to_string(), &record_batch);
        (
            Self {
                _temp_dir: temp_dir,
            },
            connector,
        )
    }
}
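The following self-contained sketch shows one way a generic test runner could consume this trait. It assumes a toy `Connector` trait with a single `readable_types` method, which is a hypothetical simplification and not Dozer's real `Connector`; the runner `run_data_type_test` and the in-memory types are likewise invented for this example.

```rust
/// Toy stand-in for Dozer's `Connector` trait (hypothetical, heavily simplified).
trait Connector {
    /// Names of the data types this connector reports it can read.
    fn readable_types(&self) -> Vec<&'static str>;
}

/// Same shape as the trait shown above, bound to the toy `Connector`.
trait DataReadyConnectorTest: Send + Sized + 'static {
    type Connector: Connector;

    fn new() -> (Self, Self::Connector);
}

/// Hypothetical generic runner: builds the fixture, then checks that
/// every expected type is readable through the connector.
fn run_data_type_test<T: DataReadyConnectorTest>(expected: &[&str]) -> bool {
    // Bind the fixture to a name so resources it owns (for example a
    // temporary directory) stay alive until the end of the test.
    let (_fixture, connector) = T::new();
    let readable = connector.readable_types();
    expected.iter().all(|t| readable.iter().any(|r| r == t))
}

/// Example fixture and connector that need no external resources.
struct InMemoryConnector;

impl Connector for InMemoryConnector {
    fn readable_types(&self) -> Vec<&'static str> {
        vec!["int", "string", "timestamp"]
    }
}

struct InMemoryConnectorTest;

impl DataReadyConnectorTest for InMemoryConnectorTest {
    type Connector = InMemoryConnector;

    fn new() -> (Self, Self::Connector) {
        (InMemoryConnectorTest, InMemoryConnector)
    }
}
```

Returning `(Self, Self::Connector)` lets the fixture own setup resources, as `_temp_dir` does in the local storage example, so they are released only when the test finishes.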

Ingestion tests for connectors

Test if a connector ingests data as expected. Implement InsertOnlyConnectorTest (optionally CudConnectorTest) to test the most common connector methods used in Dozer. The test suite simulates a full run of Dozer to make sure the tested connector ingests and outputs data correctly.

For example, PostgresConnectorTest implements CudConnectorTest by executing SQL against the Postgres database.

impl CudConnectorTest for PostgresConnectorTest {
    fn start_cud(&self, operations: Vec<Operation>) {
        ...
        std::thread::spawn(move || {
            for operation in operations {
                client
                    .batch_execute(&operation_to_sql(
                        schema_name.as_deref(),
                        &table_name,
                        &operation,
                        &schema,
                    ))
                    .unwrap();
            }
        });
    }
}

As long as a connector passes this test suite, Dozer can guarantee data integrity using that connector. The local storage and Postgres connectors have passed the test suite.
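To give a feel for what translating an operation into SQL can look like, here is a minimal sketch over a fixed two-column table. The `ToyOperation` enum and `toy_operation_to_sql` function are hypothetical and much simpler than the `Operation` type and `operation_to_sql` helper used in the Postgres test above, whose real signatures take a schema name, table name, and schema.

```rust
/// Toy change operation on a table with columns (id, name).
/// Hypothetical; not Dozer's `Operation` type.
enum ToyOperation {
    Insert { id: i64, name: String },
    Update { id: i64, name: String },
    Delete { id: i64 },
}

/// Wraps a value in single quotes, doubling any embedded single quote
/// so it is valid inside a SQL string literal.
fn quote(value: &str) -> String {
    format!("'{}'", value.replace('\'', "''"))
}

/// Renders one operation as a SQL statement for `table`.
/// The table name is inserted as-is, so it must come from trusted code.
fn toy_operation_to_sql(table: &str, op: &ToyOperation) -> String {
    match op {
        ToyOperation::Insert { id, name } => format!(
            "INSERT INTO {} (id, name) VALUES ({}, {});",
            table,
            id,
            quote(name)
        ),
        ToyOperation::Update { id, name } => format!(
            "UPDATE {} SET name = {} WHERE id = {};",
            table,
            quote(name),
            id
        ),
        ToyOperation::Delete { id } => format!("DELETE FROM {} WHERE id = {};", table, id),
    }
}
```

A test harness in this style would feed each generated statement to the database and then compare what the connector ingests against the list of operations it issued.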

Integration Tests for Dozer Samples

We've added an integration test for each of the samples, so they won't break unexpectedly! See SQL integration tests #1282.

Prop Tests

We have complemented our unit tests with a range of prop tests. Read more about prop tests here.

We have included various data type tests using the following approach (#1245):

proptest!(ProptestConfig::with_cases(1000), |(a in ".*", b in ".*")| {
    // Tests
});
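For readers new to the idea, the sketch below hand-rolls the core loop of a property test using only the standard library. It is a stand-in for illustration and not the proptest crate: there is no shrinking and no regex-based string generation, and the `XorShift` generator and `check_i64_property` helper are names invented for this example.

```rust
/// Tiny xorshift64 generator so the sketch needs no external crates.
/// The seed must be non-zero, otherwise every output is zero.
struct XorShift(u64);

impl XorShift {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
}

/// Runs `property` on `cases` pseudo-random i64 inputs and returns the
/// first input that violates it, or `None` if all cases pass.
fn check_i64_property(cases: usize, seed: u64, property: impl Fn(i64) -> bool) -> Option<i64> {
    let mut rng = XorShift(seed);
    for _ in 0..cases {
        // Reinterpret the random bits as a signed integer.
        let input = rng.next_u64() as i64;
        if !property(input) {
            return Some(input);
        }
    }
    None
}

/// Example property: formatting an i64 and parsing it back is lossless.
fn roundtrips_through_string(n: i64) -> bool {
    n.to_string().parse::<i64>() == Ok(n)
}
```

Calling `check_i64_property(1000, seed, roundtrips_through_string)` with any non-zero seed mirrors the 1000-case configuration used above, trading proptest's input shrinking for zero dependencies.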

Other Improvements & Fixes

Local Storage test #1290: This PR adds the mechanism needed to set up a local storage connector in e2e tests, and adds a new e2e test based on dozer-samples.

DataReadyConnectorTest #1296

Postgres test #1299

Graceful handling of gRPC API errors #1289

Add NY taxi sample to e2e test #1263

Add Postgres connector sample to e2e tests #1278

Changelog

https://github.com/getdozer/dozer/compare/v0.1.12...v0.1.13
