Hi there!
We downloaded data for thousands of mutual funds and ETFs to build a proprietary fund database. After some descriptive analysis, we found that about 50% of the funds show extreme returns that are incorrect (verified and cross-checked manually against external data sources). In other words, the NAV and return data appear to be corrupted. We dug deeper and tried different functions and methods to download the data (see the PDF for the code), intentionally testing funds whose data is corrupted. All tested methods return historical NAV data with extreme returns (>50%, >100%). These jumps do not reflect actual fund performance and appear to be caused by:
• Fund splits/consolidations not being adjusted
• Data source inconsistencies
• Missing adjustment factors
The issue persists across different API methods and fields. Interestingly, when we manually downloaded the data for these same funds using the Excel Add-In, we got correct data without any extreme returns, which suggests that the Excel Add-In and the API draw on different data sources.
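To make the problem concrete, here is a minimal sketch of how such jumps can be flagged in a downloaded NAV series. It is illustrative only and is not the code from the PDF: the function name `flag_extreme_returns`, the 50% threshold, and the sample NAV values are assumptions, and it uses plain Python lists rather than any specific API response format.

```python
def flag_extreme_returns(navs, threshold=0.5):
    """Return (index, simple_return) pairs where |return| exceeds threshold.

    navs is a chronologically ordered list of NAV values (hypothetical input);
    missing (None) or zero values are skipped rather than interpolated.
    """
    flagged = []
    for i in range(1, len(navs)):
        prev, curr = navs[i - 1], navs[i]
        if prev is None or curr is None or prev == 0:
            continue
        ret = curr / prev - 1.0  # simple period-over-period return
        if abs(ret) > threshold:
            flagged.append((i, ret))
    return flagged


# Made-up series with a jump that looks like an unadjusted 10:1 change
navs = [100.0, 101.0, 10.1, 10.2, 10.15]
suspicious = flag_extreme_returns(navs)
for index, ret in suspicious:
    print(f"observation {index}: return {ret:.1%}")
```

A flagged observation whose NAV ratio is close to a round factor (such as 10 or 0.1) would be consistent with an unadjusted split or consolidation, though that alone does not prove the cause.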
However, downloading the NAV and return data for thousands of funds manually via the Excel Add-In is not feasible for us, and the corrupted data from the Codebook renders our project useless. Any ideas what we could do?
Thanks a lot in advance!
Marvin