Error Handling
We all know that bad things can (and will) happen. Error handling is not about avoiding errors; it is about making conscious decisions about how the system should react to them. The error handling implemented in the Suite is built on, and leverages, the error handling mechanisms provided by MassTransit.
Idempotence
Info
Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application. (Wikipedia)
Before reviewing how errors are handled and published, we should first discuss when an error should be produced, and when it shouldn't.
A common scenario when processing messages is receiving a request to perform an operation that was already done. This can happen because of infrastructure errors (problems with message delivery, etc.) or because of a bug on the requesting side.
For example, let's say you have a Saga that listens for a message to complete a Work Order (let's call it CompleteWorkOrder), and when it succeeds it publishes a message (let's call it WorkOrderCompleted). Now imagine you receive a message to complete a Work Order, the message is valid, and the Work Order can be completed, so the code to do so runs and modifies the entity accordingly. The database transaction is then successfully committed, but right after that there is a critical system failure (network disconnection, power outage, etc.). This means that the CompleteWorkOrder message was not removed from the queue (it was not marked as processed) and the WorkOrderCompleted message was not published.
Info
Consumers and Sagas need to keep two systems in sync, the database and the broker, and updating both is not an atomic operation. Hence, the chances of one being updated without the other are high.
In the scenario above, since the message was left on the queue, the system will try to process it again once it is back online, because the broker has no way to tell that the message was already processed. In these cases it might be tempting to produce an error, for example by throwing some kind of InvalidOperationException, CannotCompleteWorkOrderException, WorkOrderTransitionException, etc. This is generally not a good idea. Instead, we should adhere to the Idempotence principle and produce the same output for a given message. This generally means publishing the same messages we would have published if it were the first time the message arrived.
Following the example scenario above, when processing CompleteWorkOrder a second time (or a third, or a fourth, etc.), we would publish the WorkOrderCompleted message again. There's no need to actually "complete the Work Order" in the database, since that was already done: the Saga is already in the "Completed" state. We just need to make sure the public API reflects what we would have done in the normal consumption of this message.
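A library-free sketch of the idempotent handling described above follows. WorkOrderSaga is a hypothetical stand-in for a real Saga (it is not MassTransit or Suite code), and Handle simply returns the events that would be published. The first delivery changes the state; a redelivery of the same CompleteWorkOrder changes nothing, but produces the same WorkOrderCompleted again instead of throwing.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical message contracts, named after the example in the text.
public record CompleteWorkOrder(Guid WorkOrderId);
public record WorkOrderCompleted(Guid WorkOrderId);

public enum WorkOrderState { Open, Completed }

// Library-free stand-in for a Saga instance (not MassTransit or Suite code).
public class WorkOrderSaga
{
    public WorkOrderState State { get; private set; } = WorkOrderState.Open;

    // Counts how many times the "database" change was actually made.
    public int CompletionWrites { get; private set; }

    // Returns the events that would be published for this message.
    public IReadOnlyList<object> Handle(CompleteWorkOrder message)
    {
        if (State == WorkOrderState.Open)
        {
            // First delivery: modify the entity (this is the part committed to the DB).
            State = WorkOrderState.Completed;
            CompletionWrites++;
        }

        // Redeliveries reach this point with State == Completed: no exception and
        // no second write, but the same public outcome is produced again.
        return new object[] { new WorkOrderCompleted(message.WorkOrderId) };
    }
}
```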
You might be wondering: what happens if there is a service listening for the WorkOrderCompleted message? Wouldn't publishing the same message several times cause problems? The answer is no, as long as that service also adheres to the Idempotence principle.
We also have to consider that the producer of the original message is probably expecting an outcome. Hence, from the point of view of the public API, we need to behave exactly as we would have the first time we processed the message.
First line of defence: retry
Many errors in our applications are caused by short-lived issues that are quickly resolved by automatic processes or IT personnel. For example, a database might be unreachable for a few seconds due to overload, network issues, etc. In these scenarios, an operation that fails can often succeed simply by being retried later.
MassTransit includes a mechanism to do this automatically. Whenever an exception is thrown and not caught by the code in the Consumer or Saga, MassTransit will retry processing the message according to the configured retry policy.
Info
How many times to retry, and how long to wait between attempts, can be configured in the Saga or Consumer Definition. When using the Resilient Definitions, the Suite provides default values that should be good enough for most cases.
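To make the idea concrete, here is a minimal, library-free sketch of an interval retry: a failed attempt is retried after a fixed delay, up to a limit, and the last exception is rethrown once the limit is reached. IntervalRetry, retryLimit, and interval are hypothetical names chosen for illustration; this is not how MassTransit implements retry, and the real limits and delays come from the Definitions mentioned above.

```csharp
using System;
using System.Threading;

// Library-free sketch of an interval retry policy (not MassTransit's implementation).
public static class IntervalRetry
{
    // Runs 'action'; on failure waits 'interval' and tries again, allowing up to
    // 'retryLimit' extra attempts. Returns the number of attempts it took, or
    // rethrows the last exception once the retries are exhausted.
    public static int Execute(Action action, int retryLimit, TimeSpan interval)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                action();
                return attempt;
            }
            catch (Exception) when (attempt <= retryLimit)
            {
                // Transient failure with retries left: wait, then loop.
                Thread.Sleep(interval);
            }
        }
    }
}
```

For example, a call that fails twice with a TimeoutException and then succeeds completes on the third attempt, without the caller ever seeing the exception.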
When Retry is not enough
There are two conditions where retry might not be enough to solve the issue:
- The infrastructure problem (e.g. database, network, etc.) could not be solved in time.
- The issue is related to a business condition that cannot be met (e.g. the data of the message is invalid).
For the second case, it would be great to be able to short-circuit the retry mechanism, to avoid retrying a message that we know cannot be processed. To support this, the Suite is pre-configured not to retry if the exception thrown is an IBusinessException.
After giving up on retrying, either because all attempts were exhausted or because the exception thrown is an IBusinessException, the system will produce a Fault.
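The decision described above can be modeled roughly as follows. This is a simplified sketch: IBusinessException here is an empty stand-in for the Suite's interface, ResilientProcessing, Outcome, and InvalidWorkOrderException are hypothetical, and delays between attempts are left out. Transient exceptions are retried until the limit is exhausted, while a business exception ends processing after the first attempt; in both cases, the point where the sketch returns a faulted Outcome is where the real system would produce a Fault.

```csharp
using System;

// Empty stand-in for the Suite's IBusinessException; the real interface may declare members.
public interface IBusinessException { }

// Hypothetical business error: the message data can never be processed.
public class InvalidWorkOrderException : Exception, IBusinessException
{
    public InvalidWorkOrderException(string message) : base(message) { }
}

public record Outcome(bool Faulted, int Attempts);

public static class ResilientProcessing
{
    // Runs the handler; transient errors are retried up to 'retryLimit' extra times,
    // business errors stop processing immediately. Delays between attempts are omitted.
    public static Outcome Process(Action handler, int retryLimit)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                handler();
                return new Outcome(Faulted: false, Attempts: attempt);
            }
            catch (Exception ex)
            {
                if (ex is IBusinessException || attempt > retryLimit)
                {
                    // Giving up: this is where the real system would produce a Fault.
                    return new Outcome(Faulted: true, Attempts: attempt);
                }
                // Transient error with attempts left: loop and try again.
            }
        }
    }
}
```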
Faults
From the MassTransit documentation:
When a message consumer throws an exception instead of returning normally, a Fault<T> is produced, which may be published or sent depending upon the context. A Fault<T> is a generic message contract including the original message that caused the consumer to fail, as well as the ExceptionInfo, HostInfo, and the time of the exception.
Basically, producers of a message T can listen to failures of that message by consuming Fault<T>.
Instead of creating custom messages to report failed operations, we rely on MassTransit Faults, which are produced simply by throwing Business Exceptions.
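As a rough, library-free model of what a Fault carries, the sketch below wraps the original message and a summary of the exception whenever a consumer throws. FaultEnvelope, ExceptionSummary, FaultingDispatcher, and DuplicateEmailException are hypothetical names; MassTransit's actual Fault<T> also includes host information and identifiers, which are omitted here.

```csharp
using System;
using System.Collections.Generic;

public record CreateUserMessage(string Email);

// Hypothetical business error thrown by the consumer.
public class DuplicateEmailException : Exception
{
    public DuplicateEmailException(string message) : base(message) { }
}

// Simplified stand-ins for the data in MassTransit's Fault<T> (host info and ids omitted).
public record ExceptionSummary(string ExceptionType, string Message);
public record FaultEnvelope<T>(T Message, IReadOnlyList<ExceptionSummary> Exceptions, DateTime Timestamp);

public static class FaultingDispatcher
{
    // Runs the consumer; if it throws, the original message and the exception details
    // are wrapped in a FaultEnvelope and handed to 'publishFault'.
    public static void Dispatch<T>(T message, Action<T> consumer, Action<FaultEnvelope<T>> publishFault)
    {
        try
        {
            consumer(message);
        }
        catch (Exception ex)
        {
            var summary = new ExceptionSummary(ex.GetType().Name, ex.Message);
            publishFault(new FaultEnvelope<T>(message, new[] { summary }, DateTime.UtcNow));
        }
    }
}
```

The producer never receives a custom error message here: it only needs to handle the envelope, which already contains the message it sent.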
Handling Faults
MassTransit will route a Fault based on how the message that caused it was consumed.
When doing Publish/Subscribe, the producer needs to consume Fault<T>. For consumers such as Sagas, the Fault will be published like any other message, and it can be listened for using the same mechanisms used to consume any other type of message. If an error is raised while consuming a CreateUserMessage message, a Fault<CreateUserMessage> message will be published. This means that the Fault message can be consumed by the same Saga that produced it, or by any other Saga. You just need to declare the Event as you do for any other message.
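The routing described above can be sketched with a tiny in-memory bus: when a subscriber of T throws, the bus publishes a fault of T, which any subscriber (including the one that failed) can consume like a regular message. InMemoryBus and SimpleFault<T> are hypothetical stand-ins written for illustration; in a real Saga you would declare an Event for Fault<CreateUserMessage> instead.

```csharp
using System;
using System.Collections.Generic;

public record CreateUserMessage(string Email);

// Hypothetical, simplified stand-in for MassTransit's Fault<T>.
public record SimpleFault<T>(T Message, string ExceptionType);

// Tiny in-memory publish/subscribe bus, for illustration only.
public class InMemoryBus
{
    private readonly Dictionary<Type, List<Action<object>>> _subscribers = new();

    public void Subscribe<T>(Action<T> handler)
    {
        if (!_subscribers.TryGetValue(typeof(T), out var handlers))
        {
            handlers = new List<Action<object>>();
            _subscribers[typeof(T)] = handlers;
        }
        handlers.Add(message => handler((T)message));
    }

    public void Publish<T>(T message)
    {
        if (!_subscribers.TryGetValue(typeof(T), out var handlers)) return;

        foreach (var handler in handlers)
        {
            try
            {
                handler(message);
            }
            catch (Exception ex)
            {
                // A failing subscriber causes a SimpleFault<T> to be published,
                // which is delivered like any other message type.
                Publish(new SimpleFault<T>(message, ex.GetType().Name));
            }
        }
    }
}
```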
For Request/Response patterns, the Fault will automatically be sent to the requesting client, where it will manifest itself as a RequestFaultException, which can be handled with a classic try/catch block.
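On the requesting side, MassTransit's request client call (GetResponse on an IRequestClient<T>) is typically wrapped in a try/catch for RequestFaultException. The sketch below reproduces only the shape of that pattern without MassTransit: FakeRequestClient and RequestFaultedException are hypothetical stand-ins, named differently on purpose so they are not mistaken for the real types.

```csharp
using System;
using System.Threading.Tasks;

public record CreateUserRequest(string Email);
public record UserCreated(Guid UserId);

// Hypothetical stand-in for MassTransit's RequestFaultException (name changed on purpose).
public class RequestFaultedException : Exception
{
    public string FaultedExceptionType { get; }

    public RequestFaultedException(string exceptionType, string message) : base(message)
        => FaultedExceptionType = exceptionType;
}

// Hypothetical client: runs the handler and turns its exception into a fault for the caller.
public class FakeRequestClient
{
    private readonly Func<CreateUserRequest, UserCreated> _handler;

    public FakeRequestClient(Func<CreateUserRequest, UserCreated> handler) => _handler = handler;

    public Task<UserCreated> GetResponse(CreateUserRequest request)
    {
        try
        {
            return Task.FromResult(_handler(request));
        }
        catch (Exception ex)
        {
            return Task.FromException<UserCreated>(new RequestFaultedException(ex.GetType().Name, ex.Message));
        }
    }
}

public static class Caller
{
    // The requesting side handles the fault with an ordinary try/catch.
    public static async Task<string> CreateUserAsync(FakeRequestClient client, string email)
    {
        try
        {
            var response = await client.GetResponse(new CreateUserRequest(email));
            return $"created {response.UserId}";
        }
        catch (RequestFaultedException ex)
        {
            return $"failed: {ex.FaultedExceptionType}";
        }
    }
}
```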
Faults caused by BusinessException
You can use the HasBusinessException() extension method on Fault messages to determine whether a Fault was produced by a BusinessException. If you want to check for a particular BusinessException Code, you can use the HasBusinessExceptionWithCode method.
If you need to get all the BusinessException Codes contained in a Fault, you can use the GetBusinessExceptionCodes method.
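The real helpers live in the Suite, and their exact signatures are not shown here. The following sketch only illustrates the kind of checks they perform, over a hypothetical FaultSketch type where each recorded exception carries an optional business code (empty when there is none). The method names mirror the Suite's helpers, but the implementation and the way codes are stored are assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical shape: each exception recorded in a fault, with a business code if it has one.
public record ExceptionEntry(string ExceptionType, string BusinessCode);
public record FaultSketch(IReadOnlyList<ExceptionEntry> Exceptions);

// Illustrative re-implementation, not the Suite's code.
public static class SketchFaultExtensions
{
    // In this sketch, an entry counts as a business exception when it carries a non-empty code.
    public static bool HasBusinessException(this FaultSketch fault) =>
        fault.Exceptions.Any(e => !string.IsNullOrEmpty(e.BusinessCode));

    public static bool HasBusinessExceptionWithCode(this FaultSketch fault, string code) =>
        !string.IsNullOrEmpty(code) && fault.Exceptions.Any(e => e.BusinessCode == code);

    public static IReadOnlyList<string> GetBusinessExceptionCodes(this FaultSketch fault) =>
        fault.Exceptions
            .Where(e => !string.IsNullOrEmpty(e.BusinessCode))
            .Select(e => e.BusinessCode)
            .Distinct()
            .ToList();
}
```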
A note on using the Catch Method
There is a method available when setting up a Saga, Catch, that can be used to catch exceptions thrown in the pipe. It might be tempting to use it to catch an exception and publish a custom error message instead, for example a CreatedUserFailed message.
This is generally considered an Anti-Pattern and can cause several issues. If the Saga has a Catch call that handles an issue, then that error is considered, well, handled. This means that the error will not bubble up and the operation will be considered successful by MassTransit, so any database operation will be committed, even though the exception could have left some field in an undefined state.
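The commit problem can be illustrated with a minimal unit-of-work model: the handler edits a working copy, which is committed only when no exception escapes. UnitOfWork, UserRecord, and Handlers are hypothetical, and the model mirrors the argument above rather than MassTransit's internals. When the exception is swallowed, as a Catch would do, the half-updated record (email set, status still "Pending") is committed; when it is left to bubble up, nothing is committed and the message faults.

```csharp
using System;

// Hypothetical entity: Email and Status are meant to change together.
public class UserRecord
{
    public string Email { get; set; } = "";
    public string Status { get; set; } = "Pending";
    public UserRecord Clone() => new UserRecord { Email = Email, Status = Status };
}

// Minimal unit-of-work model: the handler edits a working copy, which is "committed"
// only if no exception escapes the handler.
public static class UnitOfWork
{
    public static (UserRecord State, bool Faulted) Run(UserRecord committed, Action<UserRecord> handler)
    {
        var working = committed.Clone();
        try
        {
            handler(working);
            return (working, false);   // success: the working copy is committed
        }
        catch (Exception)
        {
            return (committed, true);  // unhandled error: roll back and fault
        }
    }
}

public static class Handlers
{
    // Sets the email, then fails before the status is updated.
    public static void CreateUser(UserRecord user)
    {
        user.Email = "ana@example.com";
        throw new InvalidOperationException("downstream call failed");
    }

    // The same handler with the error swallowed, which is what a Catch that "handles" it amounts to.
    public static void CreateUserWithCatch(UserRecord user)
    {
        try
        {
            CreateUser(user);
        }
        catch (InvalidOperationException)
        {
            // e.g. publish a custom CreatedUserFailed message here
        }
    }
}
```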
This Anti-Pattern also adds a lot of unnecessary code that has to be maintained and tested. Generally there is no need to manually publish a CreatedUserFailed custom error message; the system will automatically publish the corresponding Fault<CreateUserMessage> message.
Important
Avoid using the Catch method. If you think there is a scenario where you really need it, please contact the Suite team to discuss the matter.