question

Upvotes
Accepted
16 0 0 3

FMRD: possible ways to extract data from the jsonl files downloaded through s3 buckets?

I downloaded the FRMD file from S3 buckets like RFT-FMRD-Quote_Symbology-EQUFND-Delta, RFT-FMRD-MKT_EVNT-REF-Init. 
When unzipped these files were in .jsonl and each row was a json object.
What would be the possible ways to extract the data from this jsonl file to a more readable format. I did try extracting speicific (key, values) from each json obj row and save them to a csv file but that would take > 24 hrs and not sure if each row could have its own schema and i fear that i might miss some data if i try to choose specific (key,values)

please find this sample json row from both the files for reference.sample1.txtsample2.txt


#productrefinitiv-data-platform
sample1.txt (4.7 KiB)
sample2.txt (1.4 KiB)
icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

To make this data more readable and break down the complex structure to simpler csv:

import json
import json
import pandas as pd

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)


def flatten_dict(d, prefix=''):
    flat_dict = {}
    for key, value in d.items():
        new_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat_dict.update(flatten_dict(value, prefix=new_key))
        elif isinstance(value, list) and all(isinstance(x, dict) for x in value):
            for i, v in enumerate(value):
                flat_dict.update(flatten_dict(v, prefix=f"{new_key}.{i}"))
        else:
            flat_dict[new_key] = value
    return flat_dict



count = 1

rows = []
with open(jsonl_file_path, encoding="utf-8") as f:
    for line in f:
        json_obj = json.loads(line)
        print(count)
        if json_obj is not None:
            flat_data = flatten_dict(json_obj)
            rows.append(flat_data)
            count += 1

    df = pd.DataFrame(rows)


df.to_csv(csv_file.csv)

1 Answer

· Write an Answer
Upvotes
Accepted
14.9k 32 5 10

Hi @akhilesh.naredla ,

Have you tried using json.loads() in Python

First, let's say the string is stored in the variable x

1681106460045.png

we're loading them into variable y

y = json.loads(x)

1681106525608.png

Then with the variable y, we can access the value with the keys

y['Reference']['Names']

the output of the code is

[{'Name': 'SmartPool Trading Sessions'}]


Hope this helps and please let me know in case you have any further questions


1681106460045.png (36.2 KiB)
1681106525608.png (138.6 KiB)
icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.