Failed messages not on the filesystem

Storing failed messages, not on the filesystem, but somewhere else entirely

As a general rule, we recommend that failed messages are stored somewhere that always exists, like the filesystem; this is a bit more predictable, since having to error-handle your error handling can lead to a rabbit warren of twisty little passages, all alike. There are times when you can't do that because the filesystem isn't appropriate (e.g. you're running in a container and haven't got permanent volumes), but you still want to write the messages into a permanent store and retry them on demand.

We had a situation where a customer wanted to store all the failed messages in S3. Sadly, failed-message-retrier only works with polling consumer implementations, and our S3 integration is service-based. It is still possible, though, by having an additional channel and a normal filesystem-based retrier.

Handling the Errors

Essentially your error handling chain is similar to a normal filesystem-based error chain; the only difference is that you are uploading to S3 rather than writing to the filesystem.
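
For comparison, the filesystem flavour of the same chain is typically nothing more than a mime-encode followed by an fs-producer. A rough sketch follows; the ${adapter.fail.dir.url} property is just an illustrative name for wherever you keep failed messages.

<processing-exception-service class="service-list">
  <services>
    <encoding-service>
      <encoder class="mime-encoder"/>
    </encoding-service>
    <standalone-producer>
      <!-- write the encoded message to your failed-message directory -->
      <producer class="fs-producer">
        <destination class="configured-produce-destination">
          <destination>${adapter.fail.dir.url}</destination>
        </destination>
        <create-dirs>true</create-dirs>
      </producer>
    </standalone-producer>
  </services>
</processing-exception-service>

The S3-based version that we actually used looks like this: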

<message-error-handler class="standard-processing-exception-handler">
  <unique-id>S3+HTTP_500</unique-id>
  <always-handle-exception>true</always-handle-exception>
  <processing-exception-service class="service-list">
    <services>
      <generate-unique-metadata-value-service>
        <metadata-key>error-id</metadata-key>
        <generator class="guid-generator"/>
      </generate-unique-metadata-value-service>
      <encoding-service>
        <encoder class="mime-encoder"/>
      </encoding-service>
      <amazon-s3-service>
        <connection class="shared-connection">
          <lookup-name>shared-amazon-s3</lookup-name>
        </connection>
        <operation class="amazon-s3-upload">
          <bucket-name class="constant-data-input-parameter">
            <value>${amazon.s3.bucket}</value>
          </bucket-name>
          <key class="constant-data-input-parameter">
            <value>interlok/%message{error-id}</value>
          </key>
        </operation>
      </amazon-s3-service>
      <exception-report-service>
        <exception-serializer class="exception-as-json-with-stacktrace"/>
      </exception-report-service>
      <!-- Insert your own error reporting system here -->
      <standalone-producer>
        <producer class="jetty-standard-response-producer">
          <status-provider class="http-configured-status">
            <status>INTERNAL_ERROR_500</status>
          </status-provider>
          <content-type-provider class="http-configured-content-type-provider">
            <mime-type>application/json</mime-type>
          </content-type-provider>
          <send-payload>true</send-payload>
        </producer>
      </standalone-producer>
    </services>
  </processing-exception-service>
</message-error-handler>

In this specific instance the following things happen:

  • Generate a new unique id, which is used as the filename in the S3 bucket; we did this so that you have a record of every error, since a complex workflow might fail in different places.
  • Encode the data using mime-encoder; this preserves the workflowId where the message failed (the retrier uses this to figure out how to retry the message).
  • Upload the encoded message to S3.
  • Report the error back to the caller (in this case an HTTP 500 with a jsonified stacktrace).
    • Bonus error reporting using something like Rollbar?

Triggering a retry

This is basically the reverse of the error handling: once the message has been written back to the filesystem, a standard failed message retrier can kick in (there's a sketch of one after the workflow below).

  • Figure out the message we want to download from S3.
  • Download it from S3.
  • Write it out to the filesystem in the expected directory for the failed message retrier.
    • We don’t need to encode it, because it’s already encoded.

<standard-workflow>
  <unique-id>CopyToRetry</unique-id>
  <consumer class="jetty-message-consumer">
    <unique-id>/retry</unique-id>
    <destination class="configured-consume-destination">
      <destination>/retry/*</destination>
      <configured-thread-name>CopyToRetry</configured-thread-name>
    </destination>
    <parameter-handler class="jetty-http-parameters-as-metadata"/>
    <header-handler class="jetty-http-headers-as-metadata"/>
  </consumer>
  <service-collection class="service-list">
    <services>
      <!-- assume http://localhost:8080/retry?errorId=XXXX -->
      <amazon-s3-service>
        <connection class="shared-connection">
          <lookup-name>shared-amazon-s3</lookup-name>
        </connection>
        <operation class="amazon-s3-download">
          <bucket-name class="constant-data-input-parameter">
            <value>${amazon.s3.bucket}</value>
          </bucket-name>
          <key class="constant-data-input-parameter">
            <value>interlok/%message{errorId}</value>
          </key>
        </operation>
      </amazon-s3-service>
      <standalone-producer>
        <producer class="fs-producer">
          <destination class="configured-produce-destination">
            <destination>${adapter.retry.dir.url}</destination>
          </destination>
          <fs-worker class="fs-overwrite-file"/>
          <create-dirs>true</create-dirs>
        </producer>
      </standalone-producer>
      <standalone-producer>
        <producer class="jetty-standard-response-producer">
          <status-provider class="http-configured-status">
            <status>ACCEPTED_202</status>
          </status-provider>
          <response-header-provider class="jetty-no-response-headers"/>
          <content-type-provider class="http-raw-content-type-provider">
            <content-type>text/plain</content-type>
          </content-type-provider>
          <send-payload>false</send-payload>
        </producer>
      </standalone-producer>
    </services>
  </service-collection>
</standard-workflow>
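
For completeness, the filesystem-based retrier that then picks those messages up is just the standard one. A minimal sketch might look like the following; it assumes the stock default-failed-message-retrier with an fs-consumer polling the same ${adapter.retry.dir.url} directory, so tune the poller and naming to taste.

<failed-message-retrier class="default-failed-message-retrier">
  <standard-workflow>
    <unique-id>RetryFromFilesystem</unique-id>
    <consumer class="fs-consumer">
      <destination class="configured-consume-destination">
        <destination>${adapter.retry.dir.url}</destination>
      </destination>
      <poller class="fixed-interval-poller"/>
      <create-dirs>true</create-dirs>
      <!-- mirrors the encoding-service in the error handler, so the original
           workflowId and metadata are restored before the retry -->
      <encoder class="mime-encoder"/>
    </consumer>
  </standard-workflow>
</failed-message-retrier>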

A full adapter that contains the logic can be found here. Trigger the /fail endpoint with any data that you want, and make a note of the error-id (check for it in the S3 bucket). Use that error-id when posting back to the /retry endpoint, and you will see the message fail again and another message appear in your S3 bucket.
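
If you want to poke it without crafting requests by hand, a quick Python sketch along these lines works; it assumes the adapter's embedded Jetty is listening on localhost:8080 (as per the comment in the retry workflow) and that you've already fished the error-id out of the bucket.

import requests

BASE_URL = "http://localhost:8080"  # assumption: the adapter's embedded jetty port

# Send any old payload to the failing workflow; the response should be an
# HTTP 500 whose body is the jsonified stacktrace.
failed = requests.post(BASE_URL + "/fail", data=b"hello world")
print(failed.status_code, failed.text)

# The object key in the bucket is interlok/<error-id>; hand that id to the
# /retry endpoint, and a 202 Accepted means the message has been copied back
# to the filesystem for the retrier to pick up.
error_id = "replace-with-the-error-id-from-s3"
retried = requests.post(BASE_URL + "/retry", params={"errorId": error_id})
print(retried.status_code)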