Working with databases¶
Due to the nature of the DataSync project, which is in charge of manipulating
the data of any ITsynch application that uses replication, we need to abstract a
way to achieve such manipulation without coupling to the technology that those
applications use for their persistence.
This of course also comes with the impossibility of knowing details about the
models these applications use, and how they are mapped to their respective
tables in the database, preventing the use of any ORM for us.
In addition, the no use of an ORM also leads to the need to be careful
concerning the database's safety, since if done incorrectly, we could end being
victims of SQL injection attacks.
Multiple DBMS support¶
These impediments lead us to work entirely with ADO.Net abstractions, allowing us to achieve that underlying database engine decoupling we need in our replication engine.
DataAccessService¶
This service is the core of our replication engine, whose responsibility is to execute the rules as commands/queries to the database of the applications. For this purpose, we make use of multiple abstractions, making it so that we do not have to worry about the underlying DBMS, but using the best tools that ADO.Net can bring us nonetheless.
Abstractions¶
DataSync defines a series of abstractions that serve the purpose of providing us
with control over things that are hopelessly dependent on the underlying DBMS.
The implementations are defined in different modules for each specific DBMS. To
use them, it is only necessary to add a dependency on the module you need in the
DataSyncApplicationModule
:
DBMS | Module | Support |
---|---|---|
SQL Server | DataSyncSqlServerModule | For now, it only supports MS SQL Server (2016 and onwards) and Azure. |
Oracle | [Not implemented yet] |
IDbOperationsProvider¶
The IDbOperationsProvider
abstraction is the main service whose responsibility
is to bring "operation handlers" following the
CQRS pattern:
These handlers follow the command pattern and allow to perform a query or command to the database engine. By default, these handlers use DbCommand and DbParameter classes to execute their logic. In the case of the ICommandHandler it also may use another ADO.Net tool to perform bulk operations.
IDbConnectionProvider¶
The IDbConnectionProvider
abstraction is a service in charge of providing the
type of DbConnection
needed, based on the underlying database engine (Oracle
or SQLServer), using the connection string provided.
IDbSyntaxProvider¶
The IDbSyntaxProvider
abstraction, on the other hand, is in charge of
building the SQL statements of the DbCommand
that will be executed against
the database. It also provides any other DBMS' syntax-dependent logic that could
be needed, like handling the format of the
placeholder's names
used in the params (as DbParameters
) of the DbCommand
(explained more
below).
Security¶
Since we are talking about executing commands directly against the database without the security that we would get using ORMs, it is necessary to take measures to avoid SQL injection attacks that may surge from the information submitted by the users of the applications.
To prevent this the DbCommandExecutor uses DbParameters
in the DbCommand it
executes, whose statements generated by the IDbSyntaxProvider
uses only
placeholders and not the values directly:
This practice is known as "parametrized query" and is the first and most important step to prevent SQL injection. In a nutshell, this makes the underlying database engine interpret the user-input values as literal ones instead of executable code, preventing any malicious code hidden in some column's value from being executed.
Connection management¶
DataSync needs to handle the connections to the database in a multi-threaded way
to take advantage of asynchronous programming, involving the need of using these
the best way possible.
There are two main approaches about how to handle the
DbConnection
usage in multi-threaded applications
with ADO.Net, each one having its advantages and disadvantages.
In DataSync we follow the approach of having a DbConnection
in each thread: We
leave the responsibility of creating, using, and disposing of the connection to
the logic executed on each thread inside the DbCommandExecutor
service.
The reasons to do so are:
- Ensures the correct handling of
IDisposables
, like theDbConnection
,DbTransaction
, etc. - It ease the tests, leaving all these ADO.Net low-level abstractions which are problematic to mock in just one place.
- Better performance, since it makes full use of ADO.Net connection pooling feature.
Transaction management¶
At the time being, DataSync doesn't provide a way to configure when a
transaction should be committed.
Currently, the strategy followed is that every SQL command for an insert/update
of a certain row is executed to the database on its own transaction, the reason
being that it is needed to know exactly what went wrong with that row if an
exception arises.
Table management¶
Primary key columns¶
Currently DataSync only works with applications whose tables only have a single primary key column. This PK is used in:
- Content repositories: as an index to navigate and find rows.
- Auto-Healing requests: as an identifier to trace a tuple's flow, and to obtain the rows that must be re-sended.
- SQL statements: as a filter when executing a statement against specific rows, such as in stamp rules.