EMA memory usage increase

The issue:

Increase in memory usage over time, signifying a possible memory leak.


Platform and software:

Refinitiv Real-Time-SDK 2.2.1, EMA C++ on Linux


The EMA usage profile:

  1. ~3 million RICs subscribed for MMT_MARKET_PRICE, spread over 10 Edge hosts, with two connections to each, i.e. 20 client channels
  2. A subset of that RIC universe is also subscribed for MMT_MARKET_BY_PRICE and MMT_MARKET_MAKER
  3. RefreshMsg, StatusMsg and UpdateMsg are copied in the client callback, using their copy constructors, for processing on separate threads.
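
To make item 3 concrete, here is a minimal sketch of the hand-off pattern, not our actual code. `MsgCopy` is a hypothetical stand-in for a copied EMA message, and `HandoffQueue` and `runHandoffDemo` are illustrative names. Each copy is owned by a `std::unique_ptr`, so it is freed as soon as the worker thread is done with it.

```cpp
#include <condition_variable>
#include <cstddef>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>

// Hypothetical stand-in for a copied EMA message (e.g. a copy of an UpdateMsg).
struct MsgCopy {
    std::string ric;
    std::string payload;
};

// Minimal thread-safe queue that transfers ownership of each copy.
class HandoffQueue {
public:
    void push(std::unique_ptr<MsgCopy> msg) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(msg));
        }
        ready_.notify_one();
    }

    // Signals that no more messages will arrive.
    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

    // Blocks until a message is available; returns nullptr once closed and drained.
    std::unique_ptr<MsgCopy> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !queue_.empty(); });
        if (queue_.empty()) return nullptr;
        std::unique_ptr<MsgCopy> msg = std::move(queue_.front());
        queue_.pop();
        return msg;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::unique_ptr<MsgCopy>> queue_;
    bool closed_ = false;
};

// Simulates `count` callbacks on one thread and processing on another.
// Returns the number of messages the worker processed.
std::size_t runHandoffDemo(std::size_t count) {
    HandoffQueue queue;
    std::size_t processed = 0;

    std::thread worker([&] {
        while (std::unique_ptr<MsgCopy> msg = queue.pop()) {
            ++processed;  // real processing would go here
        }                 // each copy is destroyed at the end of its iteration
    });

    for (std::size_t i = 0; i < count; ++i) {
        // In a real callback this would be a copy of the received message.
        queue.push(std::make_unique<MsgCopy>(MsgCopy{"RIC" + std::to_string(i), "data"}));
    }
    queue.close();
    worker.join();
    return processed;
}
```

With this shape, any growth in the process would have to come from the queue backing up or from the copies themselves, rather than from copies that were never deleted.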

The behavior:

  • Base case
    • Items 1 and 2 remained as is.
    • Item 3 was changed: no copies are made, the callbacks are simply received and nothing is done with them. This results in a small, non-increasing memory footprint.
  • Fast increase in memory usage
    • All three items above remained as is.
    • Occurs when exceptions are thrown by EMA with high frequency, e.g.:
      • Exception Type='OmmInvalidUsageException', Text='The enum value 2315 for the field Id 374 does not exist in the enumerated type dictionary', ErrorCode='-22'
      • Exception Type='OmmInvalidUsageException', Text='The enum value 1115 for the field Id 374 does not exist in the enumerated type dictionary', ErrorCode='-22'
      • Exception Type='OmmInvalidUsageException', Text='Failed to convert to UTF8 in RmtesBufferImpl::toString(). Reason: RSSL_RET_FAILURE', ErrorCode='-1'
  • Slow increase in memory usage
    • All three items above remained as is.
    • Occurs when no exceptions are thrown.
    • Nothing in the logs indicates a problem.

We are continuing to profile the application to identify the source of the increase in memory usage.
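
One way to put numbers on the "fast" and "slow" cases is to sample the process's resident set size periodically. The following is a minimal sketch assuming Linux, where `/proc/self/status` exposes a `VmRSS` line in kB. The helper names `readRssKb` and `growthKbPerSecond` are made up for illustration.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Returns the resident set size of the current process in kB,
// or -1 if it cannot be read (e.g. on a non-Linux system).
long readRssKb() {
    std::ifstream status("/proc/self/status");
    std::string line;
    while (std::getline(status, line)) {
        if (line.compare(0, 6, "VmRSS:") == 0) {
            std::istringstream fields(line.substr(6));
            long kb = -1;
            if (fields >> kb) return kb;
            return -1;
        }
    }
    return -1;
}

// Average growth in kB per second between two samples (illustrative helper).
double growthKbPerSecond(long startKb, long endKb, double elapsedSeconds) {
    if (elapsedSeconds <= 0.0) return 0.0;
    return static_cast<double>(endKb - startKb) / elapsedSeconds;
}
```

Logging `readRssKb()` once a minute alongside the count of exceptions caught in that minute should show whether the growth rate tracks the exception rate.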

Best Answer

  • Gurpreet (admin)

    Hello @leonardo.pupacic,

    You have correctly identified that the memory leak is happening somewhere in the event loop, where copied objects or some other data are not being released back to the heap. The only suggestions I can offer are:

    1. Try an EMA example with the same set of instruments. This will help rule out a memory leak in the SDK.

    2. Use profilers to identify which objects are not being reclaimed.
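
If a full profiler is too heavy for a production-sized subscription, a lightweight alternative is to count live instances of the application's own types. The sketch below is an assumption about how that could look, not an EMA facility. `InstanceCounted` is a small CRTP mix-in and `CopiedUpdate` is a hypothetical wrapper around a copied message.

```cpp
#include <atomic>
#include <cstddef>

// CRTP mix-in that counts live instances of each derived type.
template <typename T>
class InstanceCounted {
public:
    static std::size_t liveCount() { return live_.load(); }

protected:
    InstanceCounted() { ++live_; }
    InstanceCounted(const InstanceCounted&) { ++live_; }
    InstanceCounted& operator=(const InstanceCounted&) = default;
    ~InstanceCounted() { --live_; }

private:
    static std::atomic<std::size_t> live_;
};

template <typename T>
std::atomic<std::size_t> InstanceCounted<T>::live_{0};

// Hypothetical application type that wraps a copied message.
struct CopiedUpdate : InstanceCounted<CopiedUpdate> {
    int fieldCount = 0;
};
```

Logging `CopiedUpdate::liveCount()` periodically shows whether the number of outstanding copies keeps climbing. If it stays flat while memory grows, the growth is more likely elsewhere.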

Answers

  • So far in our investigation, we don't believe we've gotten to the bottom of the main cause.
    However, along the way we did find some concerning things that we believe should be reported and fixed.

    During regular message parsing we discovered two out-of-bounds reads, one in OmmUIntDecoder and the other
    in OmmEnumDecoder:

    Invalid read of size 4
    at 0x9981A8: rsslDecodeUInt
    by 0x59CDF3: refinitiv::ema::access::OmmUIntDecoder::setRsslData(RsslDecIterator*, RwfBuffer*)
    by 0x5DA31E: refinitiv::ema::access::Decoder::setRsslData(refinitiv::ema::access::Data**, unsigned char, RsslDecIterator*, RwfBuffer*, RsslDataDictionary const*, void*) const
    by 0x57DCAF: refinitiv::ema::access::FieldListDecoder::getNextData()
    by 0x57B7CC: refinitiv::ema::access::FieldList::forth() const
    Address 0x12770b753 is 2,019 bytes inside a block of size 2,022 alloc'd

    This occurred once during the run.

    Invalid read of size 2
    at 0x9991C5: rsslDecodeEnum
    by 0x5987C3: refinitiv::ema::access::OmmEnumDecoder::setRsslData(RsslDecIterator*, RwfBuffer*)
    by 0x5DA31E: refinitiv::ema::access::Decoder::setRsslData(refinitiv::ema::access::Data**, unsigned char, RsslDecIterator*, RwfBuffer*, RsslDataDictionary const*, void*) const
    by 0x57DCAF: refinitiv::ema::access::FieldListDecoder::getNextData()
    by 0x57B7CC: refinitiv::ema::access::FieldList::forth() const
    Address 0x927f0a74 is 164 bytes inside a block of size 165 alloc'd

    This second read occurred much more frequently than the first:
    369 errors in context 2 of 35
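
For anyone reading the two Valgrind reports, the arithmetic can be checked directly: a read of N bytes at offset K inside a block of size S runs past the end by K + N - S bytes. The helper below (`overrunBytes` is just an illustrative name) shows that both reads end exactly one byte past their block, going by the numbers Valgrind printed.

```cpp
#include <cstddef>

// Number of bytes a read extends past the end of its block (0 if in bounds).
constexpr std::size_t overrunBytes(std::size_t offset, std::size_t readSize, std::size_t blockSize) {
    return (offset + readSize > blockSize) ? (offset + readSize - blockSize) : 0;
}

// rsslDecodeUInt: 4-byte read at offset 2,019 in a 2,022-byte block.
static_assert(overrunBytes(2019, 4, 2022) == 1, "read ends 1 byte past the block");
// rsslDecodeEnum: 2-byte read at offset 164 in a 165-byte block.
static_assert(overrunBytes(164, 2, 165) == 1, "read ends 1 byte past the block");
```

That both overruns are a single byte at the very end of the buffer may point to a fixed-width load on the last field of the message, but that is a guess on our part.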

    We also found the following small leaks, which could add up over time (especially the login one, if the application is disconnected):

    320 bytes in 20 blocks are definitely lost in loss record 19,934 of 21,026
    at 0x4C2A593: operator new(unsigned long)
    by 0x61BB79: refinitiv::ema::access::LoginCallbackClient::getLoginItem(refinitiv::ema::access::ReqMsg const&, refinitiv::ema::access::OmmConsumerClient&, void*)
    by 0x614274: refinitiv::ema::access::ItemCallbackClient::registerClient(refinitiv::ema::access::ReqMsg const&, refinitiv::ema::access::OmmConsumerClient&, void*, unsigned long long)
    by 0x5949D8: refinitiv::ema::access::OmmConsumerImpl::registerClient(refinitiv::ema::access::ReqMsg const&, refinitiv::ema::access::OmmConsumerClient&, void*, unsigned long long)

    The leak above was also reported here: https://github.com/Refinitiv/Real-Time-SDK/issues/280

    2,840 (1,920 direct, 920 indirect) bytes in 20 blocks are definitely lost in loss record 20,798 of 21,026
    at 0x4C2A593: operator new(unsigned long)
    by 0x602B81: refinitiv::ema::access::EmaConfigBaseImpl::processXMLnodePtr(refinitiv::ema::access::XMLnode*, _xmlNode* const&)
    by 0x591392: refinitiv::ema::access::OmmConsumerConfigImpl::validateSpecifiedSessionName()
    by 0x58FA08: refinitiv::ema::access::OmmConsumer::OmmConsumer(refinitiv::ema::access::OmmConsumerConfig const&)

  • Hello @adam.melaney,

    Thanks for bringing these up. As with the login leak, you can report these issues on GitHub. The development team will need some way to reproduce them, so some code examples would help identify the cause.