FMRD: possible ways to extract data from the jsonl files downloaded through s3 buckets?

akhilesh.naredla

I downloaded the FRMD file from S3 buckets like RFT-FMRD-Quote_Symbology-EQUFND-Delta, RFT-FMRD-MKT_EVNT-REF-Init. 
When unzipped these files were in .jsonl and each row was a json object.
What would be the possible ways to extract the data from this jsonl file to a more readable format. I did try extracting speicific (key, values) from each json obj row and save them to a csv file but that would take > 24 hrs and not sure if each row could have its own schema and i fear that i might miss some data if i try to choose specific (key,values)

please find this sample json row from both the files for reference.sample1.txtsample2.txt

Find more posts tagged with

refinitiv-data-platform

#product

Accepted answers

raksina.samasiri

Hi @akhilesh.naredla ,

Have you tried using json.loads() in Python

First, let's say the string is stored in the variable x

we're loading them into variable y

y = json.loads(x)

Then with the variable y, we can access the value with the keys

y['Reference']['Names']

the output of the code is

[{'Name': 'SmartPool Trading Sessions'}]

Hope this helps and please let me know in case you have any further questions

1681106460045.png

1681106525608.png

All comments

raksina.samasiri

Hi @akhilesh.naredla ,

Have you tried using json.loads() in Python

First, let's say the string is stored in the variable x

we're loading them into variable y

y = json.loads(x)

Then with the variable y, we can access the value with the keys

y['Reference']['Names']

the output of the code is

[{'Name': 'SmartPool Trading Sessions'}]

Hope this helps and please let me know in case you have any further questions

1681106460045.png

1681106525608.png

akhilesh.naredla

To make this data more readable and break down the complex structure to simpler csv:

import json
import json
import pandas as pd

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)


def flatten_dict(d, prefix=''):
    flat_dict = {}
    for key, value in d.items():
        new_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat_dict.update(flatten_dict(value, prefix=new_key))
        elif isinstance(value, list) and all(isinstance(x, dict) for x in value):
            for i, v in enumerate(value):
                flat_dict.update(flatten_dict(v, prefix=f"{new_key}.{i}"))
        else:
            flat_dict[new_key] = value
    return flat_dict



count = 1

rows = []
with open(jsonl_file_path, encoding="utf-8") as f:
    for line in f:
        json_obj = json.loads(line)
        print(count)
        if json_obj is not None:
            flat_data = flatten_dict(json_obj)
            rows.append(flat_data)
            count += 1

    df = pd.DataFrame(rows)


df.to_csv(csv_file.csv)

Quick Links