DataGathering¶
The Data Gathering feature supports data synchronization between services; more specifically, between a service that owns the data (the source) and a service that requires a customized view of that data (the client). Following this premise, the feature is composed of two large modules:

- DataGathering module, intended to be implemented on the client side.
- DataProviding module, implemented on the source side.
The contact point between these two components is an event, that is, a DTO that is produced from the required data and served by the DataProviding module, and that should be understood and handled on the client side by the DataGathering module. Let's take a closer look at how each module should be configured.
DataProviding module¶
The DataProviding module is responsible for exposing whatever data we expect to be available for clients. First, we need to add a dependency on the DataProviding project and module from the service that actually owns the data we require.
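As a sketch, the dependency and registration might look as follows; the project path and the AddDataProvidingModule extension method are assumptions for illustration, since the exact API is not shown here:

```xml
<!-- Hypothetical project reference; the actual package or project name may differ. -->
<ItemGroup>
  <ProjectReference Include="..\SourceService\SourceService.DataProviding.csproj" />
</ItemGroup>
```

```csharp
// Hypothetical module registration; the actual extension method may differ.
services.AddDataProvidingModule(options =>
{
    // Entities are exposed through DataProvidingModuleOptions.
});
```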
This module will configure our service to act as the source of truth for the entities we decide to expose. The DataProvidingModuleOptions
provide an API for exposing our entities.
Let's take a closer look:

- Entity (required) is the entity type that we are exposing.
- Event (required) is the DTO through which we are exposing the entity.
- IncludeSpecification (optional) lets us provide a specification that will be applied by the module when querying the database, in case some relational data is required.
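The options above can be sketched as follows; the fluent shape and the Order, OrderDataEvent, and OrderWithLinesSpecification names are illustrative assumptions:

```csharp
// Hypothetical use of DataProvidingModuleOptions; the property names follow the list above.
services.AddDataProvidingModule(options =>
{
    options.Expose(entity =>
    {
        entity.Entity = typeof(Order);         // required: the exposed entity type
        entity.Event = typeof(OrderDataEvent); // required: the DTO served to clients
        entity.IncludeSpecification =          // optional: pulls in relational data
            new OrderWithLinesSpecification();
    });
});
```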
Important
An AutoMapper profile from the entity to the event type is also required so the feature knows how to map the data to the DTO. No further configuration is needed beyond defining the profile, which is auto-discovered.
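For example, a minimal profile for a hypothetical Order entity exposed through an OrderDataEvent DTO could look like this (the type names are placeholders):

```csharp
using AutoMapper;

// Auto-discovered by the feature; no further registration is needed.
public class OrderDataEventProfile : Profile
{
    public OrderDataEventProfile()
    {
        CreateMap<Order, OrderDataEvent>();
    }
}
```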
Note
The DataProviding module handles exposed entities by sorting them according to their CorrelationId. This way we can ensure data is migrated in the same order it was originally created.
This implies that the exposed entity is an ICorrelatedEntity
and that sequential GUIDs are used.
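For reference, the assumed shape of that interface (the actual definition lives in the Suite and may differ):

```csharp
// Assumed interface shape; sorting by a sequential GUID preserves creation order.
public interface ICorrelatedEntity
{
    Guid CorrelationId { get; }
}
```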
DataGathering module¶
This is the side where most of the magic happens. Following the common steps for depending on a Suite Module
:
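A sketch of the dependency and registration; the project path, the AddDataGatheringModule extension method, and the MyDbContext type are illustrative assumptions:

```xml
<!-- Hypothetical project reference to the source service's client package. -->
<ItemGroup>
  <ProjectReference Include="..\SourceService\SourceService.Client.csproj" />
</ItemGroup>
```

```csharp
// Hypothetical registration; note the DbContext type supplied to the module.
services.AddDataGatheringModule<MyDbContext>(options =>
{
    // Expected events are configured here.
});
```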
The module includes a state machine capable of starting and handling data requests against the source, based on the event type being requested. For this reason, we need to provide the DbContext type in the configuration.
We can now define which events we are expecting to be handled:
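A sketch of an event registration, assuming a hypothetical Handle method and placeholder OrderDataEvent/OrderDataConsumer types:

```csharp
services.AddDataGatheringModule<MyDbContext>(options =>
{
    // Gather OrderDataEvent rows, handled by OrderDataConsumer.
    options.Handle<OrderDataEvent, OrderDataConsumer>();
});
```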
A Consumer needs to be specified as well. It can be any consumer capable of handling the specified event type. In most cases, the default consumer provided by the service's client modules should be enough, but nothing stops you from providing your own if it fits. The only requirement is that it handles request-response for the data-gathering coordination process:
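A minimal consumer sketch, assuming a MassTransit-style IConsumer contract; the DataGathered response type is a placeholder for whatever the coordination process actually expects:

```csharp
using System.Threading.Tasks;
using MassTransit;

public class OrderDataConsumer : IConsumer<OrderDataEvent>
{
    public async Task Consume(ConsumeContext<OrderDataEvent> context)
    {
        // Persist the gathered data locally...
        // ...then respond so the saga can proceed with the next batch.
        await context.RespondAsync(new DataGathered(context.Message.CorrelationId));
    }
}
```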
Additionally, you can configure the batch size for the request. This determines how many rows of data will be included per response once the process starts. We discourage modifying this value unless it is strictly required.
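If the registration API accepts it, the batch size might be set alongside the event; the parameter name here is an assumption:

```csharp
// Defaults are usually fine; override only when strictly required.
options.Handle<OrderDataEvent, OrderDataConsumer>(batchSize: 100);
```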
DataGathering process¶
By design, depending on the data-gathering module means you need to synchronize your data with the source of truth as soon as your application starts. This is especially useful for services that come alive at a late stage and need to catch up on existing data. Once this process finishes successfully, it won't be triggered again by service restarts, but only by manual activation.
Flat vs Hierarchical¶
In our current ecosystem we encounter several different entities, each with its unique concepts and constraints, but most of them can be classified into two general cases: those that are arranged in a hierarchical setting (in most cases, via a parent-children relation), and those that do not hold any relation or dependency with other entities. The DataGathering module supports both cases and exposes a configurable API accordingly. Let's take a closer look at each one.
Flat¶
This is the simplest scenario, where no row of data holds any dependency on any other. For this case the algorithm will simply order the data by insertion time and provide it with no constraints or filtering criteria other than the configured batch size.
Hierarchical¶
Some entities might hold a "parent" relationship, ultimately arranging them in a tree-like hierarchy graph. In such cases, a more sophisticated migration strategy might be preferred, or even required, to ensure data consistency between services.
A hierarchical entity can be easily exposed for data-providing:
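A sketch, assuming the exposure options accept a parent-id setting alongside Entity and Event (the names are illustrative):

```csharp
options.Expose(entity =>
{
    entity.Entity = typeof(Category);
    entity.Event = typeof(CategoryDataEvent);
    entity.ParentId = nameof(Category.ParentId); // property used to slice the tree into levels
});
```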
The DataProviding module takes advantage of the natural tree-like arrangement and slices the data into levels. Taking this structure into consideration, the process is as follows:
- A saga will start the process orchestration, initially requesting all rows that do not have a defined parent, according to the parent id property that we specify when configuring the entity for the module.
- For each level, the data-gathering process behaves like it was a flat-configured entity, since rows at the same level of the hierarchy do not hold any dependencies between them.
- Once a level is completely migrated, the saga can safely proceed to the lower level and repeat the process, until no lower rows of data are found.
On the client side, as we are only aware of the event we're listening to, we cannot infer (at least for the moment) the strategy used by the data provider for each entity type. Hence, explicit configuration is required:
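A sketch of that configuration, assuming a hypothetical strategy parameter and enum:

```csharp
// Flat is the default; Hierarchical must be opted into explicitly.
options.Handle<CategoryDataEvent, CategoryDataConsumer>(
    strategy: GatheringStrategy.Hierarchical);
```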
The strategy to be used can be configured for each event type, with Flat
being the default choice.
HealthChecks¶
The DataGathering module contributes to service readiness by default, while allowing us to configure which entities we want to be taken into account when executing the health check.
The readiness probe will only return ready
when the last data-gathering process for every event marked for health-check has been completed successfully.
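As a sketch, marking an event for the readiness check might look like this (the parameter name is an assumption):

```csharp
// Only events opted in here gate the readiness probe.
options.Handle<OrderDataEvent, OrderDataConsumer>(contributesToHealthCheck: true);
```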