Working with databases¶

Due to the nature of the DataSync project, which is in charge of manipulating the data of any ITsynch application that uses replication, we need to abstract a way to achieve such manipulation without coupling to the technology that those applications use for their persistence.
This of course also comes with the impossibility of knowing details about the models these applications use, and how they are mapped to their respective tables in the database, preventing the use of any ORM for us.
In addition, the no use of an ORM also leads to the need to be careful concerning the database's safety, since if done incorrectly, we could end being victims of SQL injection attacks.

Multiple DBMS support¶

These impediments lead us to work entirely with ADO.Net abstractions, allowing us to achieve that underlying database engine decoupling we need in our replication engine.

DataAccessService¶

This service is the core of our replication engine, whose responsibility is to execute the rules as commands/queries to the database of the applications. For this purpose, we make use of multiple abstractions, making it so that we do not have to worry about the underlying DBMS, but using the best tools that ADO.Net can bring us nonetheless.

C#
class DataAccessService : IDataAccessService
{
    private readonly IDbConnectionProvider dbConnectionProvider;
    private readonly IDbOperationsProvider dbOperationProvider;
    private readonly ILogger<DataAccessService> logger;

    public DataAccessService(
        IDbConnectionProvider dbConnectionProvider,
        IDbOperationsProvider dbOperationProvider;
        ILogger<DataAccessService> logger)
    {
        this.dbConnectionProvider = dbConnectionProvider;
        this.dbOperationProvider = dbOperationProvider;
        this.logger = logger;
    }

    /// [...]
}

Abstractions¶

DataSync defines a series of abstractions that serve the purpose of providing us with control over things that are hopelessly dependent on the underlying DBMS. The implementations are defined in different modules for each specific DBMS. To use them, it is only necessary to add a dependency on the module you need in the DataSyncApplicationModule:

DBMS	Module	Support
SQL Server	DataSyncSqlServerModule	For now, it only supports MS SQL Server (2016 and onwards) and Azure.
Oracle	[Not implemented yet]

IDbOperationsProvider¶

The IDbOperationsProvider abstraction is the main service whose responsibility is to bring "operation handlers" following the CQRS pattern:

C#
/// <summary>
/// Provides handlers to perform queries and operations to the underlying
/// database engine, following CQRS concerns.
/// </summary>
public interface IDbOperationsProvider
{
    /// <summary>
    /// Gets a <see cref="IDbQueryHandler"/> implementation to perform QUERIES
    /// to the database, based on the provided <paramref name="statement"/> for the
    /// provided <paramref name="connectionString"/> and <paramref name="context"/>.
    /// </summary>
    /// <param name="connectionString">Connection string to connect to the database.</param>
    /// <param name="statement"><see cref="SelectStatementParameters"/> from which to create the query.</param>
    /// <param name="context"><see cref="ReplicationContext"/> to use in the query.</param>
    /// <returns>A <see cref="IDbQueryHandler"/>.</returns>
    IDbQueryHandler GetQuery(
        string connectionString,
        SelectStatementParameters statement,
        ReplicationContext context);

    /// <summary>
    /// Gets a <see cref="IDbCommandHandler"/> implementation to perform OPERATIONS
    /// to the database, based on the provided <paramref name="statement"/> for the
    /// provided <paramref name="connectionString"/> and <paramref name="context"/>.
    /// </summary>
    /// <param name="connectionString">Connection string to connect to the database.</param>
    /// <param name="statement"><see cref="IDBStatement"/> from which to create the command.</param>
    /// <param name="table"><see cref="Table"/> that the command operates onto.</param>
    /// <param name="context"><see cref="ReplicationContext"/> to use in the operation.</param>
    /// <param name="optionsOverride">(Optional) <see cref="DbCommandOptions"/> to configure the operation behavior.</param>
    /// <returns>A <see cref="IDbQueryHandler"/>.</returns>
    /// <returns>A <see cref="IDbCommandHandler"/>.</returns>
    IDbCommandHandler GetCommand(
        string connectionString,
        IDBStatement statement,
        Table table,
        ReplicationContext context,
        DbCommandOptions? optionsOverride = null);
}

These handlers follow the command pattern and allow to perform a query or command to the database engine. By default, these handlers use DbCommand and DbParameter classes to execute their logic. In the case of the ICommandHandler it also may use another ADO.Net tool to perform bulk operations.

IDbConnectionProvider¶

The IDbConnectionProvider abstraction is a service in charge of providing the type of DbConnection needed, based on the underlying database engine (Oracle or SQLServer), using the connection string provided.

C#
/// <summary>
/// Provides connections to databases.
/// </summary>
interface IDbConnectionProvider
{
    /// <summary>
    /// Creates a connection to a database based on the provided connection string.
    /// </summary>
    /// <param name="connectionString">Connection string used to connect.</param>
    /// <returns>A <see cref="DbConnection"/>.</returns>
    DbConnection CreateDbConnection(string connectionString);
}

IDbSyntaxProvider¶

The IDbSyntaxProvider abstraction, on the other hand, is in charge of building the SQL statements of the DbCommand that will be executed against the database. It also provides any other DBMS' syntax-dependent logic that could be needed, like handling the format of the placeholder's names used in the params (as DbParameters) of the DbCommand (explained more below).

C#
/// <summary>
/// Provides SQL syntax-dependent statements and logic for different operations for the underlying database engine.
/// </summary>
public interface IDbSyntaxProvider
{
    public string GetSelectStatement(SelectParameters parameters);

    public string GetUpsertStatement(UpsertParameters parameters);

    public string FormatPlaceholder(string placeholderName);
}

Security¶

Since we are talking about executing commands directly against the database without the security that we would get using ORMs, it is necessary to take measures to avoid SQL injection attacks that may surge from the information submitted by the users of the applications.

To prevent this the DbCommandExecutor uses DbParameters in the DbCommand it executes, whose statements generated by the IDbSyntaxProvider uses only placeholders and not the values directly:

C#
private readonly IDbSyntaxProvider dbSyntaxProvider;

// [...]

public async Task<TableContent> UpsertAsync(...)
{
    //[...]
    var upsertParams = new UpsertParameters
    {
        TableName = table,
        ComparisonColumn = comparisonColumn,
        Columns = columnNames,
    };

    // Note how the statement is generated without knowing any value
    // it needs to put in the columns either in the INSERT or UPDATE case.
    var sql = this.dbSyntaxProvider.GetUpsertStatement(upsertParams);

    using var dbCommand = connection.CreateCommand();

    dbCommand.CommandText = sql;

    // Some generic placeholders are added for the use.
    this.AddContextualParameter(dbCommand, PlaceholderNames.TableName, table);
    this.AddContextualParameter(dbCommand, PlaceholderNames.ComparisonColumn, comparisonColumn);
    // Then, all the (user-input) values are transformed into DbParameter.
    this.AddContextualParameters(dbCommand, [content to insert/update]);
}

private void AddContextualParameter(DbCommand command, string name, object value)
{
    var parameter = command.CreateParameter();

    // Since the syntax of the placeholder name varies depending on the database engine
    // the name is formatted by the IDbSyntaxProvider.
    string parameterName = this.dbSyntaxProvider.FormatPlaceholder(name);

    parameter.ParameterName = parameterName;
    parameter.Value = value;

    command.Parameters.Add(parameter);
}

This practice is known as "parametrized query" and is the first and most important step to prevent SQL injection. In a nutshell, this makes the underlying database engine interpret the user-input values as literal ones instead of executable code, preventing any malicious code hidden in some column's value from being executed.

Connection management¶

DataSync needs to handle the connections to the database in a multi-threaded way to take advantage of asynchronous programming, involving the need of using these the best way possible.
There are two main approaches about how to handle the DbConnection usage in multi-threaded applications with ADO.Net, each one having its advantages and disadvantages.

In DataSync we follow the approach of having a DbConnection in each thread: We leave the responsibility of creating, using, and disposing of the connection to the logic executed on each thread inside the DbCommandExecutor service.

The reasons to do so are:

Ensures the correct handling of IDisposables, like the DbConnection, DbTransaction, etc.
It ease the tests, leaving all these ADO.Net low-level abstractions which are problematic to mock in just one place.
Better performance, since it makes full use of ADO.Net connection pooling feature.

Transaction management¶

At the time being, DataSync doesn't provide a way to configure when a transaction should be committed.
Currently, the strategy followed is that every SQL command for an insert/update of a certain row is executed to the database on its own transaction, the reason being that it is needed to know exactly what went wrong with that row if an exception arises.

Table management¶

Primary key columns¶

Currently DataSync only works with applications whose tables only have a single primary key column. This PK is used in:

Content repositories: as an index to navigate and find rows.
Auto-Healing requests: as an identifier to trace a tuple's flow, and to obtain the rows that must be re-sended.
SQL statements: as a filter when executing a statement against specific rows, such as in stamp rules.