Python Pandas - remove rows based on a given value in a cell?

I have the following script, which removes rows where the date in the column DATE OF OPERATION starts with 202110. I understood that spaces are not allowed in column names, so the script also replaces each space with _ and, after the rows are removed, puts the spaces back. For some reason I couldn't attach the CSV here, so please see the example below:

Column1  DATE OF OPERATION  NAVpUnits
dsasa    20211201           24324
dsasa    20211022           23232
sd       20211022           232
sd       20211022           3-2802.6667


The code is below. The error I'm getting is: KeyError: 'DATE_OF_OPERATION'

Could you advise what the cause of the error is? Thank you.

import os
import glob
import pandas as pd
from pathlib import Path
source_files = sorted(Path(r'/Users/maciejgrzeszczuk/Downloads/').glob('*.csv'))

for file in source_files:
    df = pd.read_csv(file)
    df.columns = df.columns.str.replace(' ', '_')
    df = df[~df['DATE_OF_OPERATION'].astype(str).str.startswith('202110')]
    df.columns = df.columns.str.replace('_', ' ')
    name, ext = file.name.split('.')
    df.to_csv(f'{name}.{ext}', index=0)
Best Answer

  • pf LSEG
    Answer ✓

    Hi @grzeszczukmaciek ,

    This is a pure Python question (and not an issue related to our APIs), but let's try to propose a solution.

    Spaces are allowed in column names, so your code can be simplified to:

    for file in source_files:
        df = pd.read_csv(file)
        df = df[~df['DATE OF OPERATION'].astype(str).str.startswith('202110')]
        name, ext = file.name.split('.')
        df.to_csv(f'{name}.{ext}', index=0)

    On my side, it worked with the following file.csv content:

    DATE OF OPERATION,ASK,BID,TRADE PRICE
    20210901,10,12,11
    20211001,11,13,12
    20211101,12,14,13
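
If the KeyError still appears, it means at least one CSV read in the loop has no column with exactly that name. The original file isn't available, so the causes handled below are assumptions: stray whitespace around the header name, a delimiter other than a comma (for example `;`), or another CSV in the same folder that lacks the column. The following is a minimal sketch of a check for these cases; the helper name `drop_rows_with_prefix` and the inline sample data are hypothetical and used only for illustration.

```python
import io

import pandas as pd

COLUMN = "DATE OF OPERATION"
PREFIX = "202110"


def drop_rows_with_prefix(df, column=COLUMN, prefix=PREFIX):
    """Return df without the rows whose `column` value starts with `prefix`.

    Returns None when the column is missing, so the caller can skip the file.
    (Hypothetical helper, not part of pandas.)
    """
    # Strip stray whitespace from header names, e.g. "DATE OF OPERATION ".
    df = df.rename(columns=lambda c: str(c).strip())
    if column not in df.columns:
        # Show what pandas actually parsed; a single long name containing ';'
        # suggests the file uses a different delimiter.
        print(f"{column!r} not found, columns are: {df.columns.tolist()}")
        return None
    mask = df[column].astype(str).str.startswith(prefix)
    return df[~mask]


# Made-up sample with a trailing space in the header name.
comma_csv = io.StringIO(
    "Column1,DATE OF OPERATION ,NAV\n"
    "dsasa,20211201,24324\n"
    "dsasa,20211022,23232\n"
)
kept = drop_rows_with_prefix(pd.read_csv(comma_csv))

# Made-up semicolon-separated sample: with the default sep=',' pandas reads
# the whole header as one column, so the lookup fails and None is returned.
semicolon_csv = io.StringIO(
    "Column1;DATE OF OPERATION;NAV\n"
    "sd;20211022;232\n"
)
skipped = drop_rows_with_prefix(pd.read_csv(semicolon_csv))
```

In the file loop, the same helper could be called on each `pd.read_csv(file)` result, with `continue` when it returns None. If the printed column list shows one long name containing `;`, passing `sep=';'` to `pd.read_csv` is the likely fix.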
