DataGathering

The Data Gathering feature supports data synchronization between services: specifically, between a service that owns the data (the source) and a service that requires a customized view of that data (the client). Accordingly, the feature is composed of two large modules:

  • DataGathering module, which is intended to be implemented on the client-side.
  • DataProviding module for the source-side.

The contact point between these two components is an event, that is, a DTO that the DataProviding module builds from the requested data and serves, and that the DataGathering module must understand and handle on the client side. Let's take a closer look at how each module should be configured.

DataProviding module

The DataProviding module is responsible for exposing whatever data we expect to be available for clients. First, the service that actually owns the required data needs to reference the DataProviding project and depend on its module.

XML
<ProjectReference Include="$(ModulesPath)ITsynch.Suite.DataGathering\ITsynch.Suite.DataGathering.csproj" />
C#
public class MyServiceApplicationModule : SuiteAspNetApplicationModule
{
    public override void SetupModule(IModuleBuilder builder)
    {
        base.SetupModule(builder);
        builder.DependsOn<DataProvidingModule>();

    }
}

This module configures our service to act as the source of truth for the entities we decide to expose. DataProvidingModuleOptions provides an API for exposing our entities.

C#
public class MyServiceApplicationModule : SuiteAspNetApplicationModule
{
    public override void SetupModule(IModuleBuilder builder)
    {
        base.SetupModule(builder);
        builder.DependsOn<DataProvidingModule, DataProvidingModuleOptions>(
            opts =>
            {
                opts.ExposeEntity<Entity, Event, IncludeSpecification>();
            });

    }
}

Let's take a closer look:

  • Entity (required) is the entity type that we are exposing.
  • Event (required) is the DTO through which we are exposing the entity.
  • IncludeSpecification (optional) lets us provide a specification that will be applied by the module when querying the database in case some relational data is required.

Important

An AutoMapper profile from the entity type to the event type is also required, so that the feature knows how to map the data to the DTO. Defining the profile is enough: it is auto-discovered, and no further configuration is needed.
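As an illustration, a minimal profile could look like the sketch below. MyEntity, MyEvent, and their properties are hypothetical placeholders rather than Suite types; only Profile and CreateMap come from AutoMapper itself.

```csharp
using System;
using AutoMapper;

// Hypothetical entity type, for illustration only.
public class MyEntity
{
    public Guid CorrelationId { get; set; }
    public string Name { get; set; } = string.Empty;
}

// Hypothetical event (DTO) type, for illustration only.
public class MyEvent
{
    public Guid CorrelationId { get; set; }
    public string Name { get; set; } = string.Empty;
}

// Properties with matching names and types are mapped by convention,
// so no per-member configuration is needed in this simple case.
public class MyEntityToMyEventProfile : Profile
{
    public MyEntityToMyEventProfile()
    {
        CreateMap<MyEntity, MyEvent>();
    }
}
```

If the event shape diverges from the entity, the per-member mapping would be added inside the same profile.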

Note

The DataProviding module handles exposed entities by sorting them by their CorrelationId. This way we can ensure data is migrated in the same order in which it was originally created. This requires the exposed entity to be an ICorrelatedEntity and to use sequential GUIDs.

DataGathering module

This is the side where most of the magic happens. We start by following the common steps for depending on a Suite module:

XML
<ProjectReference Include="$(ModulesPath)ITsynch.Suite.DataGathering\ITsynch.Suite.DataGathering.csproj" />
C#
public class MyClientApplicationModule : SuiteAspNetApplicationModule
{
    public override void SetupModule(IModuleBuilder builder)
    {
        base.SetupModule(builder);
        builder.DependsOn<DataGatheringModule, DataGatheringModuleOptions>(
            opts =>
            {
                opts.SetDbContext<DbContext>();
            });
    }
}

The module includes a state machine that is capable of starting and handling data requests against the source based on the event type that we are requesting. For this reason we need to provide the DbContext type in the configuration.

We can now define which events we are expecting to be handled:

C#
public class MyClientApplicationModule : SuiteAspNetApplicationModule
{
    public override void SetupModule(IModuleBuilder builder)
    {
        base.SetupModule(builder);
        builder.DependsOn<DataGatheringModule, DataGatheringModuleOptions>(
            opts =>
            {
                opts.SetDbContext<DbContext>();
                opts.AddRequiredEventType<Event, EventConsumer>();
            });
    }
}

A consumer must be specified as well. It can be any consumer capable of handling the specified event type. In most cases, the default consumer provided by the services' client modules should be enough, but nothing stops you from providing your own if it fits. The only requirement is that it handles the request-response exchange of the data-gathering coordination process:

C#
public class MyConsumer : IConsumer<MyEvent>
{
    public Task Consume(ConsumeContext<MyEvent> context)
    {
        // Your custom logic goes here.
        ...

        // Respond with the newly-created entity correlation id.
        return context.RespondConsumedAsync(correlationId);
    }
}

Additionally, you can configure the batch size for the request. It determines how many rows of data are included per response once the process starts. We discourage modifying this value unless it is strictly required.

DataGathering process

By design, depending on the data-gathering module means that your data is synchronized against the source of truth as soon as your application starts. This is especially useful for services that come alive at a late stage and need to catch up on existing data. Once this process finishes successfully, service restarts won't trigger it again; only manual activation will.

Flat vs Hierarchical

In our current ecosystem we encounter several different entities, each with its own concepts and constraints, but most of them fall into one of two general cases: those that are arranged in a hierarchy (in most cases, via a parent-children relation), and those that hold no relation to or dependency on other entities. The DataGathering module supports both cases and exposes a configurable API accordingly. Let's take a closer look at each one.

Flat

This is the simplest scenario, where no row of data depends on any other. In this case the algorithm just orders the data by insertion time and serves it with no constraints or filtering criteria other than the configured batch size.
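The flat case can be pictured with the sketch below. It is not the module's implementation: FlatRow, FlatBatchingSketch, and the cursor parameter are hypothetical, and a plain long Sequence stands in for the insertion-ordered CorrelationId.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row: Sequence stands in for the insertion-ordered CorrelationId.
public sealed record FlatRow(long Sequence, string Payload);

public static class FlatBatchingSketch
{
    // Returns the next batch that follows the given cursor.
    // A null cursor means "start from the oldest row".
    public static List<FlatRow> NextBatch(IEnumerable<FlatRow> rows, long? cursor, int batchSize)
    {
        return rows
            .Where(r => cursor is null || r.Sequence > cursor.Value)
            .OrderBy(r => r.Sequence)   // insertion order
            .Take(batchSize)            // the only constraint applied
            .ToList();
    }
}
```

Under these assumptions, the caller would feed the Sequence of the last row it received back in as the cursor, and stop once an empty batch comes back.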

Hierarchical

Some entities might hold a "parent" relationship, ultimately arranging them in a tree-like hierarchy graph. In such cases, a more sophisticated migration strategy might be preferred, or even required, to ensure data consistency between services.

A hierarchical entity can be easily exposed for data-providing:

C#
public class MyServiceApplicationModule : SuiteAspNetApplicationModule
{
    public override void SetupModule(IModuleBuilder builder)
    {
        base.SetupModule(builder);
        builder.DependsOn<DataProvidingModule, DataProvidingModuleOptions>(
            opts =>
            {
                opts.ExposeHierarchicalEntity<Entity, Event, IncludeSpecification>(entity => entity.ParentId);
            });

    }
}

The DataProviding module takes advantage of the natural tree-like arrangement and slices the data into levels. Taking this structure into consideration, the process is as follows:

  • A saga will start the process orchestration, initially requesting all rows that do not have a defined parent, according to the parent id property that we specify when configuring the entity for the module.
  • For each level, the data-gathering process behaves as if it were handling a flat-configured entity, since rows at the same level of the hierarchy do not depend on each other.
  • Once a level is completely migrated, the saga can safely proceed to the lower level and repeat the process, until no lower rows of data are found.
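The level-by-level order described above can be sketched as follows. This is an in-memory illustration under stated assumptions, not the saga itself: HierarchyRow and HierarchicalSlicingSketch are hypothetical names, and integer ids replace the real keys.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical row with an optional parent, for illustration only.
public sealed record HierarchyRow(int Id, int? ParentId);

public static class HierarchicalSlicingSketch
{
    // Groups rows into levels: roots first, then their children, and so on.
    public static List<List<HierarchyRow>> SliceByLevel(IReadOnlyCollection<HierarchyRow> rows)
    {
        var childrenByParent = rows.ToLookup(r => r.ParentId);
        var levels = new List<List<HierarchyRow>>();

        // First level: rows that do not have a defined parent.
        var current = rows.Where(r => r.ParentId is null).OrderBy(r => r.Id).ToList();

        while (current.Count > 0)
        {
            // Rows within a level are independent, so they can be served like a flat entity.
            levels.Add(current);

            // Next level: all children of the rows that were just migrated.
            current = current
                .SelectMany(parent => childrenByParent[parent.Id])
                .OrderBy(r => r.Id)
                .ToList();
        }

        return levels;
    }
}
```

Because a child only appears after its parent's level has been emitted, a client that applies the levels in order never receives a row whose parent is still missing.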

On the client side, since we are only aware of the event we're listening to, we cannot infer (at least for the moment) the behavior the data provider uses for each entity type. Hence, explicit configuration is required:

C#
public class MyClientApplicationModule : SuiteAspNetApplicationModule
{
    public override void SetupModule(IModuleBuilder builder)
    {
        base.SetupModule(builder);
        builder.DependsOn<DataGatheringModule, DataGatheringModuleOptions>(
            opts =>
            {
                opts.SetDbContext<DbContext>();
                opts.AddRequiredEventType<Event, EventConsumer>(strategy: DataGatheringStrategy.Hierarchical);
            });
    }
}

The strategy to be used can be configured for each event type, with Flat being the default choice.

HealthChecks

The DataGathering module contributes to the service readiness check by default, and lets us configure which entities are taken into account when the health check runs.

C#
public class MyClientApplicationModule : SuiteAspNetApplicationModule
{
    public override void SetupModule(IModuleBuilder builder)
    {
        base.SetupModule(builder);
        builder.DependsOn<DataGatheringModule, DataGatheringModuleOptions>(
            opts =>
            {
                opts.SetDbContext<DbContext>();
                opts.AddRequiredEventType<Event, EventConsumer>(includeInHealthChecks: false);
            });
    }
}

The readiness probe only reports ready once the last data-gathering process for every event marked for health checks has completed successfully.
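The readiness rule can be expressed as a small predicate. The sketch below is only a model of that rule: GatheringStatus, EventGatheringState, and ReadinessSketch are hypothetical names, and treating a service with no marked events as ready is an assumption.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical status of the last data-gathering run for one event type.
public enum GatheringStatus { NotStarted, Running, Completed, Failed }

// Hypothetical per-event state, for illustration only.
public sealed record EventGatheringState(string EventType, bool IncludeInHealthChecks, GatheringStatus LastRun);

public static class ReadinessSketch
{
    // Ready only when every event marked for health checks has a completed last run.
    // Events excluded from health checks are ignored entirely.
    public static bool IsReady(IEnumerable<EventGatheringState> states)
    {
        return states
            .Where(s => s.IncludeInHealthChecks)
            .All(s => s.LastRun == GatheringStatus.Completed);
    }
}
```

In this model, an event registered with includeInHealthChecks: false can still be running or failed without holding back readiness.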