FMRD: possible ways to extract data from the jsonl files downloaded through s3 buckets?

akhilesh.naredla
Newcomer
I downloaded FMRD files from S3 buckets such as RFT-FMRD-Quote_Symbology-EQUFND-Delta and RFT-FMRD-MKT_EVNT-REF-Init.
When unzipped, these files were in .jsonl format, and each row was a JSON object.
What are the possible ways to extract the data from these JSONL files into a more readable format? I tried extracting specific (key, value) pairs from each JSON object row and saving them to a CSV file, but that would take more than 24 hours. I am also not sure whether each row could have its own schema, and I fear I might miss some data if I choose only specific (key, value) pairs.
Please find sample JSON rows from both files attached for reference: sample1.txt, sample2.txt
Best Answer
-
Hi @akhilesh.naredla,
Have you tried using json.loads() in Python?
Let's say one line of the file is stored in the variable x, and we load it into the variable y:
y = json.loads(x)
Then, with the variable y, we can access values by their keys:
y['Reference']['Names']
The output of the code is:
[{'Name': 'SmartPool Trading Sessions'}]
Hope this helps, and please let me know in case you have any further questions.
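For completeness, this idea can be put together as a minimal runnable sketch. The sample line below is a hypothetical fragment modelled on the output shown above, not the actual FMRD file contents:

```python
import json

# Hypothetical sample line resembling one row of the .jsonl file
line = '{"Reference": {"Names": [{"Name": "SmartPool Trading Sessions"}]}}'

y = json.loads(line)              # parse the JSON text into a Python dict
names = y['Reference']['Names']   # navigate to a nested value by its keys
print(names)                      # [{'Name': 'SmartPool Trading Sessions'}]
```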
Answers
-
To make this data more readable and break the complex nested structure down into a simpler CSV:
import json
import pandas as pd

pd.set_option("display.max_columns", None)
pd.set_option("display.max_rows", None)

def flatten_dict(d, prefix=''):
    """Recursively flatten nested dicts (and lists of dicts) into dotted keys."""
    flat_dict = {}
    for key, value in d.items():
        new_key = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat_dict.update(flatten_dict(value, prefix=new_key))
        elif isinstance(value, list) and all(isinstance(x, dict) for x in value):
            for i, v in enumerate(value):
                flat_dict.update(flatten_dict(v, prefix=f"{new_key}.{i}"))
        else:
            flat_dict[new_key] = value
    return flat_dict

jsonl_file_path = "input.jsonl"   # path to the downloaded .jsonl file
rows = []
with open(jsonl_file_path, encoding="utf-8") as f:
    for line in f:
        json_obj = json.loads(line)
        if json_obj is not None:
            rows.append(flatten_dict(json_obj))

df = pd.DataFrame(rows)
df.to_csv("output.csv", index=False)
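As a possibly faster alternative to a hand-rolled flattener, pandas ships a pandas.json_normalize() helper that flattens nested dicts in one call (lists of dicts are kept as-is unless you pass record_path). A minimal sketch with made-up two-line sample data; the field names Id and Reference are assumptions for illustration, not the real FMRD schema:

```python
import json
import pandas as pd

# Hypothetical two-line .jsonl sample; in practice read lines from the downloaded file
jsonl_text = "\n".join([
    json.dumps({"Id": 1, "Reference": {"Names": [{"Name": "A"}]}}),
    json.dumps({"Id": 2, "Reference": {"Names": [{"Name": "B"}]}}),
])

# Parse each JSONL line, then let pandas flatten the nested dicts into dotted columns
records = [json.loads(line) for line in jsonl_text.splitlines()]
flat = pd.json_normalize(records, sep=".")   # columns: "Id", "Reference.Names"
flat.to_csv("flat_output.csv", index=False)
```

Because json_normalize is vectorised over the whole record list, it may be noticeably quicker than flattening row by row, though for very large files reading and parsing the lines will still dominate.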