Hw03
Kernel Density Estimation of Time Series Data
Save this data set to a file named 'goals.dat'. I found this dataset on this site. It records the goals scored by England against Scotland at Hampden Park, Glasgow, from 1872 to 1987. The first column is the number of goals and the second column is the year. Some years are missing because no match was played, and in some years the score was not recorded (marked NA); both are actual gaps in the data.
0,1872
1,1874
0,1876
2,1878
4,1880
1,1882
0,1884
1,1886
5,1888
1,1890
4,1892
2,1894
1,1896
3,1898
1,1900
1,1904
1,1906
1,1908
0,1910
1,1912
1,1914
NA,1916
NA,1918
0,1921
2,1923
0,1925
2,1927
0,1929
0,1931
1,1933
0,1935
1,1937
2,1939
NA,1941
NA,1943
NA,1945
2,1948
1,1950
2,1952
4,1954
1,1956
4,1958
1,1960
0,1962
0,1964
4,1966
1,1968
0,1970
1,1972
0,1974
1,1976
1,1978
2,1980
1,1982
1,1984
0,1985
0,1987
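The file layout described above (score first, then year, with NA marking an unrecorded score) could be read along the following lines. This is a minimal sketch; the helper name `parse_goals` is hypothetical and not part of the assignment, and the sample rows are copied from the data above.

```python
def parse_goals(lines):
    """Return (years, goals) as lists of floats, skipping 'NA' and blank lines."""
    years, goals = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate a trailing blank line in the file
        score, year = line.split(",")
        if score == "NA":
            continue  # no score recorded for this year
        years.append(float(year))
        goals.append(float(score))
    return years, goals

# A few rows copied from the data set, including two NA records
sample = ["0,1872", "1,1874", "NA,1916", "NA,1918", "0,1921"]
years, goals = parse_goals(sample)
print(years)
print(goals)
```

Because the function only iterates over lines, an open file can be passed in directly, for example `with open('goals.dat') as f: years, goals = parse_goals(f)`.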
Write a Python script to compute the kernel density estimate of this dataset using a Gaussian kernel with bandwidths of 1, 3, and 10. Which bandwidth do you feel produces the best results? Be sure to skip records in the file that have "NA" for the score in that year.
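To get a feel for what the bandwidth does before choosing one, it may help to look at a single Gaussian kernel on its own. The sketch below is only an illustration (the names `scaled_gaussian` and `area` are made up here): it shows that dividing by the bandwidth h lowers the kernel's peak in proportion to 1/h, while a simple Riemann sum suggests the area under the kernel stays at about 1.

```python
import math

def scaled_gaussian(t, h):
    """Gaussian kernel with bandwidth h, evaluated at offset t from its centre."""
    u = t / float(h)
    return math.exp(-0.5 * u * u) / (h * math.sqrt(2.0 * math.pi))

def area(h, half_width=200.0, step=0.01):
    """Riemann-sum approximation of the kernel's area over [-half_width, half_width]."""
    n = int(round(2 * half_width / step))
    total = sum(scaled_gaussian(-half_width + k * step, h) for k in range(n + 1))
    return total * step

for h in (1, 3, 10):
    # Peak height at the centre, and approximate area under the kernel
    print(h, round(scaled_gaussian(0.0, h), 4), round(area(h), 4))
```

So a bandwidth of 1 gives tall, narrow bumps around each match, while a bandwidth of 10 spreads each match's contribution over several decades.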
Your program should generate 1 plot with two lines in the same plot.
- A plot of the goals. The x-axis represents the year, the y-axis represents England's score.
- The kernel density estimation line of the data with your preferred bandwidth parameter.
Use "plt.ylabel" to label the y-axis as England's score and "plt.xlabel" to label the x-axis as the year.
import math
import matplotlib.pyplot as plt

SQRT_2PI = math.sqrt(2.0 * math.pi)

def gaussian(x):
    # Standard normal density
    return math.exp(-0.5 * x * x) / SQRT_2PI

def function_kde(x, y, h, binpoints):
    # Contribution of one observation (x, y) at every grid point:
    # a Gaussian bump centred on x with bandwidth h, weighted by y
    bins = [0.0] * len(binpoints)
    for i in range(len(binpoints)):
        bins[i] = y * gaussian((binpoints[i] - x) / float(h)) / float(h)
    return bins

if __name__ == '__main__':
    x = []
    y = []
    with open('goals.dat') as f:  # file() exists only in Python 2
        for line in f:
            line = line.strip()
            if not line:
                continue
            goals, year = line.split(",")
            if goals != "NA":  # skip years with no recorded score
                x.append(float(year))
                y.append(float(goals))

    n = len(x)
    low = min(x)
    high = max(x)
    nbins = n * 1
    # Evenly spaced grid from low to high, computed by index so that
    # floating-point drift cannot leave the last point unset
    delta = (high - low) / (nbins - 1)
    binpoints = [low + i * delta for i in range(nbins)]
    masterbin = [0.0] * nbins

    h = 3  # bandwidth
    for i in range(n):
        bins = function_kde(x[i], y[i], h, binpoints)
        for j in range(nbins):
            masterbin[j] += bins[j]

    # Plot the master bin and the raw scores
    plt.plot(binpoints, masterbin)
    plt.plot(x, y)
    plt.xlabel("Year")
    plt.ylabel("England's score")
    plt.show()
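The script uses a single bandwidth of 3. To compare the three bandwidths the assignment asks about, the same goal-weighted Gaussian sum can be evaluated once per bandwidth. The following is a rough sketch under stated assumptions: `weighted_kernel_sum` is an illustrative name, only the first ten rows of the data set are used as a sample, and the curve is evaluated once per year rather than on the script's grid.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)

def weighted_kernel_sum(t, years, goals, h):
    """Sum of goal-weighted Gaussian bumps of bandwidth h, evaluated at year t."""
    total = 0.0
    for year, score in zip(years, goals):
        u = (t - year) / float(h)
        total += score * math.exp(-0.5 * u * u) / (h * SQRT_2PI)
    return total

# First ten rows of goals.dat (1872 to 1890), used here only as a small sample
years = [1872, 1874, 1876, 1878, 1880, 1882, 1884, 1886, 1888, 1890]
goals = [0, 1, 0, 2, 4, 1, 0, 1, 5, 1]

grid = range(1872, 1891)  # one evaluation point per year
peaks = {}
for h in (1, 3, 10):
    curve = [weighted_kernel_sum(t, years, goals, h) for t in grid]
    peaks[h] = max(curve)
    print("h=%2d  peak=%.3f  min=%.3f" % (h, max(curve), min(curve)))
```

On this sample the peak of the curve drops as the bandwidth grows: a bandwidth of 1 follows individual matches closely, while a bandwidth of 10 flattens the curve toward a long-run trend. Replacing the sample lists with the values parsed from the full file gives the comparison for the whole data set.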