Hi Developers,
We have received several reports from different customers over the past few months of our OMM provider application crashing seemingly at random. Upon examining the memory dumps provided by the customers, the common point of failure is in the rfa::sessionLayer::RSSL_Prov_ChannelSession::removeToken(rfa::sessionLayer::RSSLRequestToken*) method. This is called when posting an OMM status message that closes the stream.
Partial Stack Trace from LLDB:
frame #0: 0x00007f6193959033 libRFA_RSSL_Prov_Adapter.so`rfa::sessionLayer::RSSL_Prov_ChannelSession::removeToken(rfa::sessionLayer::RSSLRequestToken*) + 483
frame #1: 0x00007f6193959414 libRFA_RSSL_Prov_Adapter.so`rfa::sessionLayer::RSSL_Prov_ChannelSession::processStatusMsg(RsslMsg*, rfa::sessionLayer::RSSLRequestToken*, unsigned char, unsigned char, rfa::common::RFA_String&) + 260
frame #2: 0x00007f61932f03a8 libRFA_SessionLayer_OMM.so`rfa::sessionLayer::OMMProviderImpl::submitCmd(long, rfa::sessionLayer::OMMSolicitedItemCmd const&, void*) + 664
frame #3: 0x00007f61932f05c1 libRFA_SessionLayer_OMM.so`rfa::sessionLayer::OMMProviderImpl::submit(rfa::sessionLayer::OMMCmd const*, void*) + 337
frame #4: Our code...
From looking through the disassembly of removeToken, it appears that the crash occurred outside of a critical section (pthread_mutex_unlock appears to have been called shortly before), so I don't know whether it's some kind of race condition. If you could let me know what the root cause might be, and whether a fix is available, I'd be very grateful.
The weirdest part is that this can happen twice in a week, then go silent for months. We've had 3 reports of this in the last year from different customers. I noticed that the changelog for 7.5.1.L1 lists a fix for "OMM RFA provider crashes randomly in removeToken function". Could this be a regression from that release, or is something else at fault (either on the RFA side or in our application code)?
Other info:
I am more than happy to provide more information as required.
Many thanks.
A RequestToken is considered closed and is deleted when the provider finishes processing the close request from the consumer, or when the provider application closes the stream (i.e. sends a message with StreamState Closed).
It is the provider application's responsibility to avoid using a deleted RequestToken.
RFA does attempt to catch this and fire an error message when the application tries to use an invalid RequestToken, but it may not cover every situation.
From the Dev guide, section 11.3.3.2, Request Tokens:
If an application attempts to send data on a request token that is closed, it may receive an OMMCmdErrorEvent in response. The application will receive the error event one time per provider on the first submit after the token has been closed. All data sent on a closed request token will be dropped.
Providers should not attempt to submit data in the following cases:
• After the associated client session has become inactive
• After the application has unregistered the client session associated with the token
• After a complete refresh has been submitted for a non-streaming request
• After the provider has received and processed a close request for an item
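One way an application could enforce the rules above is to keep its own record of which request tokens are still open and check it before every submit. The following is only a minimal sketch under stated assumptions: `OpenTokenRegistry`, `markOpen`, `markClosed` and `submitIfOpen` are hypothetical application-side names, not part of the RFA API, and the `submit` callback stands in for whatever code actually builds the command and calls the provider's submit.

```cpp
#include <functional>
#include <mutex>
#include <unordered_set>

// Hypothetical application-side helper (NOT an RFA class).
// Token is an opaque handle; in an RFA provider it would presumably be
// a pointer to the request token received with the request.
template <typename Token>
class OpenTokenRegistry {
public:
    // Call when a request has been accepted and the token may be used.
    void markOpen(Token token) {
        std::lock_guard<std::mutex> lock(mutex_);
        open_.insert(token);
    }

    // Call from the close-request, session-inactive and unregister paths.
    // Returns true if the token was still open.
    bool markClosed(Token token) {
        std::lock_guard<std::mutex> lock(mutex_);
        return open_.erase(token) > 0;
    }

    // Runs submit(token) only if the token is still open.
    // When closesStream is true (e.g. a status with a closed stream state,
    // or the final refresh of a non-streaming request), the token is
    // forgotten before the call so no later submit can reuse it.
    // The lock is held during submit so a concurrent markClosed cannot
    // interleave; submit must therefore not call back into this registry.
    bool submitIfOpen(Token token, bool closesStream,
                      const std::function<void(Token)>& submit) {
        std::lock_guard<std::mutex> lock(mutex_);
        auto it = open_.find(token);
        if (it == open_.end()) {
            return false;  // token already closed: drop the message
        }
        if (closesStream) {
            open_.erase(it);
        }
        submit(token);
        return true;
    }

private:
    std::mutex mutex_;
    std::unordered_set<Token> open_;
};
```

With something like this in place, a second closing status for the same token (for example from two threads racing to close the same stream) would be dropped by the application instead of reaching RFA with a token that has already been deleted. Whether this matches the actual cause of the crash in removeToken is not confirmed here.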