question

Upvotes
Accepted
30 4 7 14

Remove Unwanted Columns After Fetching data DSGET

Hello, once we fetch data from DSWS we generally get this ouput:


# Set Start and End Dates with the following format "yyyy-mm-dd"
StartDate = "2023-03-01" #Set start date
EndDate = "2023-05-31"   #Set end date

# Set The Frequency Of The Data Equal to D,W or M
Freq = 'M'

# Initializations
sp500_data = pd.DataFrame()
ism_data = pd.DataFrame()

# Fetch daily data and add 1 period return               
sp500_data = ds.get_data(tickers='S&PCOMP', fields=['PI'],start=StartDate, end=EndDate, freq=Freq)    

The sp500_data dataframe will looks like this:

Instrument S&PCOMP
Field PI
Currency U$
Dates
2023-03-01 3951.39
2023-04-01 4124.51
2023-05-01 4167.87


Now, how to convert that output into a better dataframe similar to the following:

Dates S&PCOMP
3/1/2023 3951.39
4/1/2023 4124.51
5/1/2023 4167.87


I am not able to find a proper solution, looks like it originally fetches data using MultiIndex that always create a mess when you have to manage data into dataframes. BE aware that the ticker is near the "Dates" not ABOVE so a reset_index will not solve here (Dates is my "index").

python#technologydatadatastream-apidatastream
icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

1 Answer

· Write an Answer
Upvotes
Accepted
6.9k 21 3 6

Hi @Aleniles ,

You might be interested in 'Date format and column headings' or try this (I did it with two fields to make it more generic/globally useful):


sp500_data = ds.get_data(
    tickers='S&PCOMP',
    fields=['PI', 'X'],
    start="2023-03-01",
    end="2023-05-31",
    freq='M')
sp500_data.columns = [', '.join(sp500_data.columns[i])
                      for i in range(len(sp500_data.columns))]
sp500_data
icon clock
10 |1500

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.

Hi Jonathan, thanks for getting back to me.


Unfortunately this doesn't solve.

As you can see in the attached image, the output still doesn't put the header in one line only (see red blank cells) Annotation 2023-06-05 101601.jpg

You can, of course, achieve this by resetting the index but I would like to have all in one header without resetting (keeping the "dates" as the index).

I mean, the name of the fields and the date must be on the same line (1 row of header) without re-indexing.

Hi @Aleniles,

The Datastream API Owners, here at LSEG Refinitiv, chose to output their Python Library's data using Pandas dataframes because it has become the norm in the Python community. Pandas seperates the data in its dataframes and the label for each row (named 'index'). Pandas also allows you to name its index; here it was named 'Dates'.

As can be seen with the Python command `sp500_data.columns`, you will see that 'Dates' does not show up (the output is `Index(['S&PCOMP, PI, U$', 'S&PCOMP, X, U$'], dtype='object')`). The Python command `sp500_data.index` shows you the index (`Index(['2023-03-01', '2023-04-01', '2023-05-01'], dtype='object', name='Dates')` in this instance).

If you would like to remove the index name, you can do so with the Python command `sp500_data.index.name = None`. I would not advise it, but it's up to you. This, however, does not "put" the 'Dates' name in the columns' name, since Dates, here, is not in the dataframe, it's the index.

You can have the date in the dataframe if you want, with code like this:


sp500_data = ds.get_data(
    tickers='S&PCOMP',
    fields=['PI', 'X'],
    start="2023-03-01",
    end="2023-05-31",
    freq='M')
sp500_data.columns = [', '.join(sp500_data.columns[i])
                      for i in range(len(sp500_data.columns))]
sp500_data = sp500_data.reset_index()

Write an Answer

Hint: Notify or tag a user in this post by typing @username.

Up to 2 attachments (including images) can be used with a maximum of 512.0 KiB each and 1.0 MiB total.