
Python Pandas How To Remove Outliers From A Dataframe And Replace With An Average Value Of Preceding Records

I have a dataframe with 16k records and multiple groups of countries and other fields. I have produced an initial output of the data that looks like the snippet below. Now I need to do

Solution 1:

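I don't know of any built-in that does this directly, but the loop below replaces each value above a cutoff with the mean of two neighbouring values. You should be able to customize it to meet your needs, no?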

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.rand(10,5),columns=list('ABCDE'))
df.index = list('abcdeflght')

# Define cutoff value
cutoff = 0.90

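for col in df.columns:
    # Integer position of this column, needed for positional access with .iloc
    col_loc = df.columns.get_loc(col)

    # Identify index labels of values above the cutoff
    outliers = df.index[df[col] > cutoff]

    # Go through the outliers and average according to index location
    for idx in outliers:
        # Get the integer position of this index label
        loc = df.index.get_loc(idx)

        if loc < df.shape[0] - 2:
            # Not one of the last two rows: average the next two values
            df.iloc[loc, col_loc] = df.iloc[loc + 1:loc + 3, col_loc].mean()
        else:
            # One of the last two rows: average the two preceding values
            df.iloc[loc, col_loc] = df.iloc[loc - 2:loc, col_loc].mean()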
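
If you specifically want the average of the preceding records, as the title asks, a vectorized variant might look like the sketch below. This is not a pandas built-in: the function name `replace_outliers_with_preceding_mean` and the `window` parameter are made up for illustration, and it assumes an outlier is simply any value above `cutoff`. Outliers are masked out before averaging so they do not leak into the replacement values.

```python
import numpy as np
import pandas as pd


def replace_outliers_with_preceding_mean(df, cutoff=0.90, window=2):
    """Hypothetical helper: replace values above `cutoff` with the mean of
    the non-outlier values found in the `window` rows immediately before."""
    is_outlier = df > cutoff

    # Blank out the outliers so they are ignored by the rolling mean
    masked = df.mask(is_outlier)

    # shift(1) excludes the current row; min_periods=1 allows a partial window
    preceding_mean = masked.shift(1).rolling(window, min_periods=1).mean()

    # Keep the original values, swapping in the preceding mean for outliers
    return df.mask(is_outlier, preceding_mean)


example = pd.DataFrame({"A": [0.1, 0.3, 0.95, 0.5, 0.99, 0.97]})
cleaned = replace_outliers_with_preceding_mean(example, cutoff=0.90, window=2)
print(cleaned)
```

An outlier with no usable preceding values (for example one in the first row) comes out as NaN here, so you would still need to decide how to fill those. If the averages should not cross group boundaries such as countries, the same function could probably be applied per group with `groupby`, but I have not tried that against your data.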
