Invalid EndOfStream c# gzip

If I try to read a gzip file with GZipStream in C# like this:

// Requires: using System; using System.IO; using System.IO.Compression;
string filePath = "....gz";
int count = 0;

using (FileStream reader = File.OpenRead(filePath))
using (var zip = new GZipStream(reader, CompressionMode.Decompress))
using (StreamReader unzip = new StreamReader(zip))
{
    // Count every line of the decompressed stream
    while (!unzip.EndOfStream)
    {
        var data = unzip.ReadLine();
        count++;
    }
}
Console.WriteLine(count);

I get fewer rows than when I read the decompressed CSV file (decompressed with the Windows shell):

filePath = "...csv";
count = 0;

using (FileStream reader = File.OpenRead(filePath))
using (StreamReader unzip = new StreamReader(reader))
{
    while (!unzip.EndOfStream)
    {
        var data = unzip.ReadLine();
        count++;
    }
}
Console.WriteLine(count);

The sample is at https://developers.thomsonreuters.com/elektron-data-solutions/datascope-select-rest-api/downloads

Any ideas? The Size and Packed size shown for the .gz archive are strange: the Packed size is larger than the decompressed Size (in the WinRAR UI).

tick-history-rest-api, c#, Download, compression

@M.ROSSIGNOLI, exactly which sample did you use? Please note that the samples under the URL you posted are for DSS. For TRTH the samples are at https://developers.thomsonreuters.com/thomson-reuters-tick-history-trth/thomson-reuters-tick-history-trth-rest-api/downloads.

I mean the "C# Example Application", the Dss.Api.Examples.sln .NET solution.

Ah yes, ok. That one is the same for DSS and TRTH.

I found a workaround for now: with ICSharpCode.SharpZipLib.GZip.GZipInputStream (https://github.com/icsharpcode/SharpZipLib) I can read all the lines, so the problem seems to be the .NET built-in GZipStream (or malformed files from TRTH).
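
For reference, a minimal sketch of that workaround, assuming the SharpZipLib NuGet package is referenced by the project. The class and method names (SharpZipLineCounter, CountLines) are made up for illustration; only GZipInputStream comes from the library.

```csharp
using System.IO;
using ICSharpCode.SharpZipLib.GZip; // assumption: SharpZipLib NuGet package is installed

public static class SharpZipLineCounter
{
    // Counts the lines of a .gz file, decompressing with SharpZipLib's
    // GZipInputStream instead of the built-in GZipStream.
    public static int CountLines(string gzPath)
    {
        int count = 0;
        using (FileStream file = File.OpenRead(gzPath))
        using (var zip = new GZipInputStream(file))
        using (var reader = new StreamReader(zip))
        {
            // ReadLine returns null once the decompressed stream is exhausted
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }
}
```

Comparing this count with the one from the decompressed CSV file should show whether the library reads the whole archive.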

@M.ROSSIGNOLI, what you observe reminds me of this issue. It was in Java, but the symptoms were similar: the line count differed depending on whether we decompressed the data stream straight from the server or a file saved on disk. Small amounts of data worked fine, but with larger ones the end of the file was dropped. The issue was intermittent, so the line count varied for what should have been a constant number of lines.

We found out that it was due to an issue with decompressing data on the fly: the popular public libraries we were using were not reliable enough to decompress large amounts of data flowing in through an input stream. After a long investigation we found other libraries that were more reliable. We also found a workaround, which was to first save the file to disk (without decompressing), and then read it back from disk and decompress it at that time. That worked fine, without dropping data.
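
A minimal C# sketch of that save-first approach, keeping the download and the decompression as two separate steps. The names SaveThenDecompress, SaveCompressed and CountLines are hypothetical, and the decoder is passed in as a parameter (an assumption of this sketch, not something from the original Java fix) so that a different gzip library can be swapped in without touching the rest.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class SaveThenDecompress
{
    // Step 1: copy the still-compressed bytes to disk without decompressing them.
    public static void SaveCompressed(Stream source, string gzPath)
    {
        using (FileStream file = File.Create(gzPath))
        {
            source.CopyTo(file);
        }
    }

    // Step 2: reopen the saved file and count the decompressed lines.
    // `decoder` wraps the raw file stream in a decompressing stream.
    public static int CountLines(string gzPath, Func<Stream, Stream> decoder)
    {
        int count = 0;
        using (FileStream file = File.OpenRead(gzPath))
        using (Stream zip = decoder(file))
        using (var reader = new StreamReader(zip))
        {
            // ReadLine returns null once the decompressed stream is exhausted
            while (reader.ReadLine() != null)
            {
                count++;
            }
        }
        return count;
    }
}
```

With the built-in decoder the second call would look like `SaveThenDecompress.CountLines(path, s => new GZipStream(s, CompressionMode.Decompress))`; passing a factory for another library's stream changes only that lambda.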

The sample above reads from a file on disk. I logged in to DataScope, downloaded the .gz file with a browser (Chrome), and then ran the code. It's very weird behaviour.

@M.ROSSIGNOLI

I have found a similar issue. It seems that either the .NET GZipStream or DeflateStream somehow cannot completely decompress large gzip files generated by TRTH.

I use SevenZipSharp to decompress the file as a workaround. It requires both SevenZipSharp.dll and the 7-Zip 9.15 DLL files. To use it, you need to add SevenZipSharp.dll as a Reference and modify the path in the code to point to the location of the 7z.dll file. Below is the sample code.

// Requires: using System; using System.IO; using System.Linq;
// filePath and count are declared as in the question's code.
//using (var zip = new GZipStream(reader, CompressionMode.Decompress))
SevenZip.SevenZipExtractor.SetLibraryPath(@"<your local path>\7z.dll");

using (var extractor = new SevenZip.SevenZipExtractor(filePath))
using (MemoryStream ms = new MemoryStream())
{
    int indexZip = extractor.ArchiveFileData.First().Index;
    // Decompress the result into the memory stream
    extractor.ExtractFile(indexZip, ms);
    ms.Position = 0;
    using (StreamReader unzip = new StreamReader(ms))
    {
        while (!unzip.EndOfStream)
        {
            var data = unzip.ReadLine();
            count++;
        }
    }
}
Console.WriteLine(count);

Hope this helps.
