My customer who contracted for both Research API and Research bulk feed(= historical data of real-time research reports) is now trying to import a metadata file of the Research bulk feed to python using the following scripts, but can't import it successfully.
Can you please advise on how to import the metadata file to python successfully?
Scripts:
import pandas as pd df = pd.read_json("ctb4614_050_metadata.json", lines=True)
Returen:
--------------------------------------------------------------------------- ValueError Traceback (most recent call last) <ipython-input-14-87067cc29fae> in <module> 1 # df = pd.read_json("ctb4614_050_metadata.json", orient="split") ----> 2 df = pd.read_json("ctb4614_050_metadata.json", lines=True) 3 # df = pd.read_json("ctb4614_050_metadata.json", lines=True, orient="values") 4 # df = pd.read_json("ctb4614_050_metadata.zip", compression='infer') /opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression) 590 return json_reader 591 --> 592 result = json_reader.read() 593 if should_close: 594 try: /opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in read(self) 713 elif self.lines: 714 data = ensure_str(self.data) --> 715 obj = self._get_object_parser(self._combine_lines(data.split("\n"))) 716 else: 717 obj = self._get_object_parser(self.data) /opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in _get_object_parser(self, json) 737 obj = None 738 if typ == "frame": --> 739 obj = FrameParser(json, **kwargs).parse() 740 741 if typ == "series" or obj is None: /opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in parse(self) 847 848 else: --> 849 self._parse_no_numpy() 850 851 if self.obj is None: /opt/anaconda3/lib/python3.7/site-packages/pandas/io/json/_json.py in _parse_no_numpy(self) 1091 if orient == "columns": 1092 self.obj = DataFrame( -> 1093 loads(json, precise_float=self.precise_float), dtype=None 1094 ) 1095 elif orient == "split": ValueError: Expected object or value
Hello @hiroko.yamaguchi
The ctb4614_050_metadata.json file is not a valid JSON message format file. It contains a lot of "docID" nodes without array element (example {"docID": 1}, {"docID": 2}, {"docID":3}).
The application needs to add "[" and "]" to set all "docID" nodes in the same array element
[{"docID": 1}, {"docID": 2}, {"docID":3}]
I have tried to add "[" at the first of the file and "]" at the end of file, then load with pd.read_json("ctb4614_050_metadata.json") and it works fine.
I could replicate it! Thank you!
On the other hand, they're provided with lots of such a file of metadata for all brokers they're entitled to, therefore, is there any way to add the "[" and "]" programmatically?
The Python application can use json.load() function to validate JSON file, but the file needs to be manual fix if it is invalid format.