This article gives an example of how to use an exponentially weighted moving average (EWMA) filter to remove noise from a data set using the pandas library in Python 3. I am writing this because the syntax for the library function has changed. The syntax I had been using is shown in Connor Johnson’s well-explained example here.
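As a minimal sketch of the syntax change, assuming a recent pandas release: the smoothing is now a method chain on the Series itself rather than a top-level function call. The toy series below is made up purely to show the call pattern.

```python
import pandas as pd

# Hypothetical toy series, used only to show the call pattern.
s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Current syntax: build an exponentially weighted window, then take its mean.
smoothed = s.ewm(span=3).mean()

# The older function-style call was deprecated and later removed from pandas,
# so it is left commented out here:
# smoothed_old = pd.ewma(s, span=3)

print(smoothed)
```

The result is a Series of the same length as the input, so it can be plotted or processed exactly like the original data.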
I will give some example code, plot the data sets, then explain the code. The pandas documentation for this function is here. Like a lot of pandas documentation it is thorough, but it could do with some more worked examples. I hope this article will fill some of that gap.
Here’s the example code:
    import matplotlib.pyplot as plt
    import pandas as pd
    import numpy as np

    ewma = pd.Series.ewm
    x = np.linspace(0, 2 * np.pi, 100)
    y = 2 * np.sin(x) + 0.1 * np.random.normal(size=x.size)  # zero-mean noise, one value per sample
    df = pd.Series(y)
    # take EWMA in both directions then average them
    fwd = ewma(df, span=10).mean()        # take EWMA in fwd direction
    bwd = ewma(df[::-1], span=10).mean()  # take EWMA in bwd direction
    filtered = np.vstack((fwd, bwd[::-1]))  # lump fwd and bwd together
    filtered = np.mean(filtered, axis=0)    # average
    plt.title('filtered and raw data')
    plt.plot(y, color='orange')
    plt.plot(filtered, color='green')
    plt.plot(fwd, color='red')
    plt.plot(bwd, color='blue')
    plt.xlabel('samples')
    plt.ylabel('amplitude')
    plt.show()
This produces the following plot. Orange line = noisy data set. Blue line = backwards filtered EWMA data set. Red line = forwards filtered EWMA data set. Green line = average of the two EWMA data sets. This is the final filtered output.
Let’s look at the example code. Lines 1-3 import the libraries I need and line 5 gives pd.Series.ewm the shorter name ewma. I then create some example data. Line 6 creates 100 x-values spaced evenly from 0 to 2 * pi. Line 7 creates 100 y-values from these x-values, where each y-value = 2*sin(x) + some noise. The noise is generated using the np.random.normal function. Line 8 wraps the y-values in a pandas Series called df. This noisy sine function is plotted in line 15 and can be seen as the jagged orange line on the plot.
Forwards and backwards EWMA filtered data sets are created in lines 10 and 11.
Line 10 starts with the first sample and works forwards through the data, creating an EWMA filtered data set called fwd. This is plotted in line 17 as the red line.
Line 11 starts at the opposite end of the data set and works backwards to the first sample. This is the backwards EWMA filtered set, called bwd, and it is plotted in line 18 as the blue line.
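These two EWMA filtered data sets are stacked and averaged in lines 12-13. Because bwd was computed on the reversed data, it is reversed again in line 12 so that its samples line up with those of fwd. The resulting data set is called filtered and is plotted in line 16 as the green line.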
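The forward, backward and averaging steps can be gathered into one small function. This is only a sketch: the name ewma_fwd_bwd is a hypothetical helper of mine, not something pandas provides, and the seeded random generator is there just to make the example repeatable.

```python
import numpy as np
import pandas as pd


def ewma_fwd_bwd(values, span):
    """Average a forward and a backward EWMA of a 1-D sequence.

    Hypothetical helper for illustration; not part of pandas.
    """
    series = pd.Series(np.asarray(values, dtype=float))
    # Forward pass: first sample to last.
    fwd = series.ewm(span=span).mean().to_numpy()
    # Backward pass: reverse, filter, then reverse again so that the
    # samples line up with the forward pass.
    bwd = series[::-1].ewm(span=span).mean().to_numpy()[::-1]
    return (fwd + bwd) / 2.0


rng = np.random.default_rng(0)  # seeded so the example is repeatable
x = np.linspace(0, 2 * np.pi, 100)
noisy = 2 * np.sin(x) + 0.1 * rng.normal(size=x.size)

filtered = ewma_fwd_bwd(noisy, span=10)
print(filtered.shape)
```

Converting both passes to plain NumPy arrays before averaging avoids any surprises from pandas aligning the two Series on their index, since the backward Series carries a reversed index.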
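If you look at the ewma calls in lines 10 and 11, there is a parameter called span. This controls the width of the filter: pandas converts it to a smoothing factor alpha = 2 / (span + 1), so a larger span gives more weight to older samples. The forward EWMA lags behind the noisy data (its features appear later, shifted to the right on the plot) and the backward EWMA is shifted by the same amount in the opposite direction, which is why averaging the two largely cancels the lag in the final filtered output. For a slowly varying signal the shift is roughly (span - 1) / 2 samples, so about 4.5 samples for span=10, rather than the full span. Increasing the span increases both the smoothing and the lag. It also reduces the peaks of the filtered data relative to the unfiltered data. You need to try out different values to find one that suits your data.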
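One way to try out different values is to run the filter on a noise-free sine and compare the results. The sketch below assumes the relations given in the pandas documentation, alpha = 2 / (span + 1) and a centre of mass of (span - 1) / 2 samples, and uses the position of the first peak as a rough measure of lag. The span values 5, 10 and 20 are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

# Noise-free sine so that lag and peak loss are easy to read off.
x = np.linspace(0, 2 * np.pi, 100)
clean = pd.Series(2 * np.sin(x))
clean_peak_index = int(np.argmax(clean.to_numpy()))

results = {}
for span in (5, 10, 20):
    fwd = clean.ewm(span=span).mean().to_numpy()
    bwd = clean[::-1].ewm(span=span).mean().to_numpy()[::-1]
    filtered = (fwd + bwd) / 2.0
    results[span] = {
        "alpha": 2.0 / (span + 1),        # smoothing factor derived from span
        "nominal_lag": (span - 1) / 2.0,  # centre of mass, in samples
        "fwd_peak_index": int(np.argmax(fwd)),
        "bwd_peak_index": int(np.argmax(bwd)),
        "filtered_peak": float(filtered.max()),
    }

print("clean peak index:", clean_peak_index)
for span, r in results.items():
    print(span, r)
```

In the printed results the forward peak should land after the clean peak and the backward peak before it, while the peak of the averaged output shrinks as span grows. The measured shift on a sine will not match the nominal lag exactly, so treat that figure as a rule of thumb.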
My present application for this filter is removing jitter from accelerometer data. I have also used this filter to smooth signals from hydrophones.