How do I clean the dataset?

Hi Dev Community,


Just a little background on my coding experience: I'm fairly green when it comes to Python, picking things up and learning as I go. My only real experience is with VBA, and I'm self-taught. Apologies in advance if the solution is obvious.


I'm trying to pull some historical FX data to determine the average daily % change between close dates.

df = ek.get_timeseries(rics, 
                        start_date='2021-03-01',  
                        end_date='2022-06-30',
                        fields='Open Close',
                        interval='daily')

*** On a separate note, is there a limit to the amount of data I can pull on these queries?


I've exported the DataFrame to Excel and noticed that some currencies include prices on weekends. I've handled these by converting the index to strings that include the day name and then excluding the weekend rows from the DataFrame.

df.index = df.index.strftime('%a, %d-%b-%y')
df = df[~df.index.str.contains('Sat|Sun')]

weekend.png


I've also noticed that some currencies have missing opening or closing prices. I'm struggling to figure out how to cycle through the dataframe (array?) to account for these missing prices.

  • If there is a missing OPEN price, I want to populate the field with the prior day's CLOSE price
  • If there is a missing CLOSE price, I want to populate the field with the next day's OPEN price.


Once this is all sorted, I'll use the following to calculate the daily % change and the average.

df_delta = df.pct_change()
df_mean = df_delta.mean()


Thanks!


Tags: eikon-data-api, python, time-series, encoding

Hi @allatuw,

First, the limits are described in the Eikon Data API Usage and Limits Guideline.

As for the missing prices, let's say the list of RICs is the following:

rics = ['AUD=','EUR=','GBP=','CNY=','CHF=','NZD=']
df = ek.get_timeseries(rics, 
                        start_date='2021-03-01',  
                        end_date='2022-06-30',
                        fields='Open Close',
                        interval='daily')
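# Turn the index into strings such as 'Mon, 01-Mar-21'
df.index = df.index.strftime('%a, %d-%b-%y')
# str.contains takes one pattern, so match both day names with a regex and keep the result
df = df.loc[~df.index.str.contains('Sat|Sun')]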

(I've slightly adjusted the code that filters out the Saturday and Sunday rows.)
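As an alternative to matching day names in a string index, here is a minimal sketch that filters on the day of the week directly. It assumes the index returned by get_timeseries is a pandas DatetimeIndex (the strftime call above suggests it is); the `mock` frame and its prices are made up purely for illustration.

```python
import pandas as pd

# Made-up frame standing in for the get_timeseries result.
# 2021-03-05 is a Friday, so the five days run Fri, Sat, Sun, Mon, Tue.
idx = pd.date_range("2021-03-05", periods=5, freq="D")
mock = pd.DataFrame({"CLOSE": [1.0, 1.1, 1.2, 1.3, 1.4]}, index=idx)

# dayofweek is Monday=0 ... Sunday=6, so values below 5 are weekdays.
weekdays_only = mock[mock.index.dayofweek < 5]

# The index is still a DatetimeIndex; format it only when displaying.
print(weekdays_only.index.strftime("%a, %d-%b-%y").tolist())
```

Keeping the index as dates rather than strings means later steps such as sorting or resampling should still work, and the day-name formatting can be applied at export time.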

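To replace the missing prices, the code below can be used:

for ric in rics:
    # Fill a missing OPEN with the previous row's CLOSE
    df[(ric, 'OPEN')] = df[(ric, 'OPEN')].fillna(df[(ric, 'CLOSE')].shift(1))
    # Fill a missing CLOSE with the next row's OPEN
    df[(ric, 'CLOSE')] = df[(ric, 'CLOSE')].fillna(df[(ric, 'OPEN')].shift(-1))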

Here's an example of the output for the case where a missing OPEN price is populated with the prior day's CLOSE price:

1663225522944.png

Hope this helps, and please let me know if you have further questions.



Testing with mock-up data:

1663226632363.png
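For anyone who wants to reproduce a test like this without an Eikon session, here is a rough sketch. It assumes the (RIC, field) column layout used in the fill loop above; the prices are invented, and the names `mock`, `closes` and `avg_daily_change` are only illustrative. It also shows one way to get the close-to-close % change from the question by selecting only the CLOSE columns.

```python
import numpy as np
import pandas as pd

# Invented prices; columns mimic the (RIC, field) layout of the real result.
idx = pd.date_range("2021-03-01", periods=4, freq="B")  # business days only
cols = pd.MultiIndex.from_product([["AUD=", "EUR="], ["OPEN", "CLOSE"]])
mock = pd.DataFrame(
    [
        [0.70, 0.71, 1.20, 1.21],
        [np.nan, 0.72, 1.21, np.nan],  # AUD= OPEN and EUR= CLOSE are missing
        [0.72, 0.73, 1.23, 1.24],
        [0.73, 0.74, 1.24, 1.25],
    ],
    index=idx,
    columns=cols,
)

for ric in ["AUD=", "EUR="]:
    # Missing OPEN takes the previous row's CLOSE
    mock[(ric, "OPEN")] = mock[(ric, "OPEN")].fillna(mock[(ric, "CLOSE")].shift(1))
    # Missing CLOSE takes the next row's OPEN
    mock[(ric, "CLOSE")] = mock[(ric, "CLOSE")].fillna(mock[(ric, "OPEN")].shift(-1))

# Close-to-close daily % change per RIC, then its average
closes = mock.xs("CLOSE", axis=1, level=1)
avg_daily_change = closes.pct_change().mean()
print(mock)
print(avg_daily_change)
```

Note that shift works on row position, so "prior day" here means the previous row that is still in the DataFrame, which is the previous weekday once the weekend rows have been dropped.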


Brilliant, thanks for your help, @raksina.samasiri!
