I've discovered a situation that (sometimes) causes my program to
terminate with a segmentation fault during the reconnection logic.
The SEGV occurs somewhere during the processing of the runReactorWorker
function, in the shared library librsslVA.so.
Since the library was shipped without debug symbols, I had to create my own
version of the .so from the source included in the Impl directory in the
distribution (1.0.6).
The problem reproduces easily with the version compiled locally including
debug symbols.
The problem shows up at line 614 of rsslReactorWorker.c.
The line of source causing the problem is:
if (pReactorChannel->reactorChannel.pRsslChannel->socketId != REACTOR_INVALID_SOCKET)
When the problem occurs, pReactorChannel->reactorChannel.pRsslChannel is 0.
By looking at the logs in the Elektron Edge Device, I can see that the program
was disconnected as a slow consumer (the output buffer had an overflow).
Elsewhere in rsslReactorWorker.c, in the processing logic for
RSSL_RC_CET_CHANNEL_DOWN_RECONNECTING, the following line appears:
pReactorChannel->reactorChannel.pRsslChannel = 0;
This is line 529, immediately following a call to close the channel.
That a SEGV does NOT occur every time a reconnect is processed indicates that
there is some sort of timing issue here, which makes the error difficult to find.
The fix may be as simple as checking whether
pReactorChannel->reactorChannel.pRsslChannel is 0 before retrieving the socketId.
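
To make the suggestion concrete, here is a minimal sketch of the kind of guard I have in mind. It is not the library's code: the structs below are simplified stand-ins that model only the fields mentioned in this post, the value given to REACTOR_INVALID_SOCKET is assumed for illustration, and channelHasValidSocket is a hypothetical helper name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed value, for illustration only; the real definition lives in the
 * library source. */
#define REACTOR_INVALID_SOCKET (-1)

/* Simplified stand-ins for the real types: only the fields named in this
 * post are modelled. */
typedef struct {
    int socketId;
} RsslChannel;

typedef struct {
    RsslChannel *pRsslChannel;
} RsslReactorChannel;

typedef struct {
    RsslReactorChannel reactorChannel;
} RsslReactorChannelImpl;

/* Hypothetical helper: returns true only when the channel pointer is set
 * AND its socket is valid. The pointer is tested before it is dereferenced,
 * so a channel that was closed and zeroed by the reconnect path is treated
 * as having no valid socket instead of crashing. */
static bool channelHasValidSocket(const RsslReactorChannelImpl *pReactorChannel)
{
    assert(pReactorChannel != NULL);

    const RsslChannel *pChannel = pReactorChannel->reactorChannel.pRsslChannel;
    return pChannel != NULL && pChannel->socketId != REACTOR_INVALID_SOCKET;
}
```

With something like this, the condition at line 614 would become if (channelHasValidSocket(pReactorChannel)). A guard of this kind would avoid the null dereference, but it would not by itself explain or change the ordering that lets the reconnect path clear the pointer first.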
(I've just had a look at the 1.0.7 version of the source to see if the problem
is still present. In the newer version, the line of source that causes the SEGV
is identical.)