question

Upvotes
Accepted
16 0 0 2

FMRD: possible ways to extract data from the jsonl files downloaded through s3 buckets?

I downloaded the FRMD file from S3 buckets like RFT-FMRD-Quote_Symbology-EQUFND-Delta, RFT-FMRD-MKT_EVNT-REF-Init. 
When unzipped these files were in .jsonl and each row was a json object.
What would be the possible ways to extract the data from this jsonl file to a more readable format. I did try extracting speicific (key, values) from each json obj row and save them to a csv file but that would take > 24 hrs and not sure if each row could have its own schema and i fear that i might miss some data if i try to choose specific (key,values)

please find this sample json row from both the files for reference.sample1.txtsample2.txt


#productrefinitiv-data-platform
sample1.txt (4.7 KiB)
sample2.txt (1.4 KiB)
icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

To make this data more readable and break down the complex structure to simpler csv:

import json
import json
import pandas as pd

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)


def flatten_dict(d, prefix=''):
    flat_dict = {}
    for key, value in d.items():
        new_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat_dict.update(flatten_dict(value, prefix=new_key))
        elif isinstance(value, list) and all(isinstance(x, dict) for x in value):
            for i, v in enumerate(value):
                flat_dict.update(flatten_dict(v, prefix=f"{new_key}.{i}"))
        else:
            flat_dict[new_key] = value
    return flat_dict



count = 1

rows = []
with open(jsonl_file_path, encoding="utf-8") as f:
    for line in f:
        json_obj = json.loads(line)
        print(count)
        if json_obj is not None:
            flat_data = flatten_dict(json_obj)
            rows.append(flat_data)
            count += 1

    df = pd.DataFrame(rows)


df.to_csv(csv_file.csv)

@akhilesh.naredla

Hi,

Thank you for your participation in the forum.

Is the reply below satisfactory in answering your question?

If yes please click the 'Accept' text next to the reply. This will guide all community members who have a similar question.

Otherwise please post again offering further insight into your question.

Thanks,

AHS

Please be informed that a reply has been verified as correct in answering the question, and has been marked as such.

Thanks,


AHS


1 Answer

· Write an Answer
Upvotes
Accepted
14.2k 30 5 10

Hi @akhilesh.naredla ,

Have you tried using json.loads() in Python

First, let's say the string is stored in the variable x

1681106460045.png

we're loading them into variable y

y = json.loads(x)

1681106525608.png

Then with the variable y, we can access the value with the keys

y['Reference']['Names']

the output of the code is

[{'Name': 'SmartPool Trading Sessions'}]


Hope this helps and please let me know in case you have any further questions


1681106460045.png (36.2 KiB)
1681106525608.png (138.6 KiB)
icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.