
OMM Provider Application Crashes Periodically When Sending Status Message

Hi Developers,

We have received several reports from different customers over the past few months of our OMM provider application crashing seemingly at random. Upon examining the memory dumps provided by the customers, the common point of failure is in the rfa::sessionLayer::RSSL_Prov_ChannelSession::removeToken(rfa::sessionLayer::RSSLRequestToken*) method. This is called when posting an OMM status message that closes the stream.

Partial Stack Trace from LLDB:

frame #0: 0x00007f6193959033`rfa::sessionLayer::RSSL_Prov_ChannelSession::removeToken(rfa::sessionLayer::RSSLRequestToken*) + 483
frame #1: 0x00007f6193959414`rfa::sessionLayer::RSSL_Prov_ChannelSession::processStatusMsg(RsslMsg*, rfa::sessionLayer::RSSLRequestToken*, unsigned char, unsigned char, rfa::common::RFA_String&) + 260
frame #2: 0x00007f61932f03a8`rfa::sessionLayer::OMMProviderImpl::submitCmd(long, rfa::sessionLayer::OMMSolicitedItemCmd const&, void*) + 664
frame #3: 0x00007f61932f05c1`rfa::sessionLayer::OMMProviderImpl::submit(rfa::sessionLayer::OMMCmd const*, void*) + 337
frame #4: Our code...

From looking through the disassembly of removeToken it appears that the crash occurred outside of a critical section (pthread_mutex_unlock appears to have been called recently), so I don't know if it's some kind of race condition, but if you could let me know what the root cause might be and/or if there is a fix available I'd be very grateful.

The weirdest part is that this can happen twice in a week, then go silent for months. We've had 3 reports of this in the last year from different customers. The changelog for 7.5.1.L1 notes a fix for "OMM RFA provider crashes randomly in removeToken function". Could this be a regression from that release, or is something else at fault (either on the RFA side or in our application code)?

Other info:

  • We're currently using RFA 7.6.2.E2 on Linux (RHEL6 x64 GCC44). Unfortunately we cannot upgrade to RFA8 since we need to continue supporting a limited number of RHEL5 customers as well as RHEL6.
  • The crashing application is a multi-threaded OMM provider application.
  • The full stack trace indicates that our application attempted to send a Status message that closed the stream (in this case, the requested item was not found on the server).
  • We are not able to reliably replicate the issue in our lab tests.

I am more than happy to provide more information as required.

Many thanks.


1 Answer


A RequestToken is considered closed and deleted when the provider finishes processing the close request from the consumer, or when the provider application closes the stream (i.e. sends a message with StreamState Closed).

It is the provider application's responsibility to avoid using a deleted RequestToken.

RFA does attempt to catch this and fire an error message when the application uses an invalid RequestToken, but it may not cover every situation.

From the Dev guide, section "Request Tokens":

If an application attempts to send data on a request token that is closed, it may receive an OMMCmdErrorEvent in response. The application will receive the error event one time per provider on the first submit after the token has been closed. All data sent on a closed request token will be dropped.

Providers should not attempt to submit data in the following cases:

  • After the associated client session has become inactive
  • After the application has unregistered the client session associated with the token
  • After a complete refresh has been submitted for a non-streaming request
  • After the provider has received and processed a close request for an item
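
One way a multi-threaded provider could enforce the cases above is to keep its own record of which tokens are still usable and consult it before every submit. The sketch below is only an illustration under assumptions: LiveTokenRegistry and its methods are hypothetical names, not part of RFA; tokens are treated as opaque pointers and never dereferenced; and it assumes a C++11 toolchain (on an older compiler a pthread mutex and std::set could be substituted).

```cpp
#include <mutex>
#include <unordered_set>

// Hypothetical helper, NOT part of RFA: tracks which request tokens the
// application may still submit on. Tokens are stored as opaque pointers
// and are never dereferenced here.
class LiveTokenRegistry {
public:
    // Call when a new request (and its token) arrives.
    void add(const void* token) {
        std::lock_guard<std::mutex> lock(mutex_);
        live_.insert(token);
    }

    // Call before submitting a closing status or the final refresh of a
    // non-streaming request, and when a close request or an inactive
    // client session is processed.
    // Returns true only for the single caller that retired the token.
    bool retire(const void* token) {
        std::lock_guard<std::mutex> lock(mutex_);
        return live_.erase(token) == 1;
    }

    // Reports whether the token has been added and not yet retired.
    bool isLive(const void* token) const {
        std::lock_guard<std::mutex> lock(mutex_);
        return live_.count(token) != 0;
    }

private:
    mutable std::mutex mutex_;
    std::unordered_set<const void*> live_;
};
```

With something like this, a thread that wants to send the closing Status message would call retire() first and submit only if it returns true, so a second thread handling a close request for the same item cannot also use the token. Note that isLive() followed by a submit is still a check-then-act sequence; for ordinary updates the application would need to serialize per-token submits (for example, by routing all submits for a token through one thread) for the check to be meaningful.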
