NPR N-W Estimator
Assume the errors $\varepsilon_i$ are uncorrelated, i.e. $\operatorname{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
To derive the estimator, note that we can express $m(x)$ in terms of the joint pdf $f(x, y)$ as follows:
$$m(x) = E[Y \mid X = x] = \int y\, f(y \mid x)\,dy = \frac{\int y\, f(x, y)\,dy}{\int f(x, y)\,dy}.$$
We estimate the numerator and denominator separately using kernel estimators. First, for the joint density $f(x, y)$ we use a product kernel density estimator, i.e.
$$\hat f(x, y) = \frac{1}{n h_x h_y} \sum_{i=1}^n K\!\left(\frac{x - x_i}{h_x}\right) K\!\left(\frac{y - y_i}{h_y}\right) = \frac{1}{n} \sum_{i=1}^n K_{h_x}(x - x_i)\, K_{h_y}(y - y_i),$$
where $K_h(u) = h^{-1} K(u/h)$ denotes the scaled kernel.
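As a concrete illustration, a minimal numpy sketch of this product kernel density estimator is given below; the Gaussian kernel and the name `joint_kde` are illustrative choices, not from the text.

```python
import numpy as np

def gauss(u):
    """Standard Gaussian kernel K(u)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def joint_kde(x, y, xs, ys, hx, hy):
    """Product kernel estimate of the joint density f(x, y) from
    paired samples (xs[i], ys[i]) with bandwidths hx and hy."""
    return np.mean(gauss((x - xs) / hx) * gauss((y - ys) / hy)) / (hx * hy)

# Example: estimate the joint density of two correlated normals at the origin
rng = np.random.default_rng(0)
xs = rng.normal(size=500)
ys = 0.5 * xs + rng.normal(scale=0.5, size=500)
print(joint_kde(0.0, 0.0, xs, ys, hx=0.3, hy=0.3))
```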
Hence, we have that
$$\int y\, \hat f(x, y)\,dy = \frac{1}{n} \int y \sum_{i=1}^n K_{h_x}(x - x_i)\, K_{h_y}(y - y_i)\,dy.$$
Now, $\int y\, K_{h_y}(y - y_i)\,dy = y_i$: substituting $t = (y - y_i)/h_y$ gives $\int (y_i + h_y t)\, K(t)\,dt = y_i$, since $\int K(t)\,dt = 1$ and $\int t\, K(t)\,dt = 0$ for a symmetric kernel. Hence, we can write
$$\int y\, \hat f(x, y)\,dy = \frac{1}{n} \sum_{i=1}^n K_{h_x}(x - x_i)\, y_i.$$
This is our estimate of the numerator. For the denominator we have
$$\int \hat f(x, y)\,dy = \frac{1}{n} \sum_{i=1}^n K_{h_x}(x - x_i) \int K_{h_y}(y - y_i)\,dy = \frac{1}{n} \sum_{i=1}^n K_{h_x}(x - x_i) = \hat f(x),$$
since the integral of $K_{h_y}(y - y_i)$ with respect to $y$ equals one.
Therefore, the Nadaraya-Watson estimate of the unknown regression function is given by
$$\hat m(x) = \frac{\sum_{i=1}^n K_{h_x}(x - x_i)\, y_i}{\sum_{i=1}^n K_{h_x}(x - x_i)} = \sum_{i=1}^n W_{h_x}(x, x_i)\, y_i,$$
where the weight function is
$$W_{h_x}(x, x_i) = \frac{K_{h_x}(x - x_i)}{\sum_{j=1}^n K_{h_x}(x - x_j)}.$$
Note that $\sum_{i=1}^n W_{h_x}(x, x_i) = 1$. This kernel regression estimator was first proposed by Nadaraya (1964) and Watson (1964). Note that the estimator is linear in the observations $\{y_i\}$ and is, therefore, a linear smoother.
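A minimal numpy implementation of the estimator may help fix ideas; the Gaussian kernel, the name `nw_estimate`, and the sine-curve test data are illustrative choices, not from the text. Because the same kernel appears in the numerator and the denominator, its normalising constant cancels.

```python
import numpy as np

def nw_estimate(x, xs, ys, hx):
    """Nadaraya-Watson estimate of m(x) at a point x with bandwidth hx."""
    k = np.exp(-0.5 * ((x - xs) / hx) ** 2)  # Gaussian kernel; constants cancel in the ratio
    w = k / np.sum(k)                        # weights W_{hx}(x, x_i); they sum to one
    return np.sum(w * ys)                    # linear in the observations y_i

# Example: recover a sine curve from noisy observations
rng = np.random.default_rng(0)
xs = rng.uniform(0, 1, 200)
ys = np.sin(2 * np.pi * xs) + rng.normal(0, 0.2, size=200)
print(nw_estimate(0.25, xs, ys, hx=0.05))  # should be near sin(pi/2) = 1
```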
Asymptotic properties
The asymptotic analysis is complicated by the fact that the estimator is a ratio of two correlated random variables. For the denominator we have that
$$E[\hat f(x)] \approx f(x) + \frac{h_x^2 \sigma_K^2}{2}\, f^{(2)}(x) \qquad \text{and} \qquad V(\hat f(x)) \approx \frac{R(K)\, f(x)}{n h_x}$$
(see Section 2 on kernel density estimation).
For the numerator,
$$E\left[\frac{1}{n}\sum_{i=1}^n K_{h_x}(x - x_i)\, Y_i\right] = \iint v\, \frac{1}{h_x} K\!\left(\frac{x - u}{h_x}\right) f(u, v)\,du\,dv$$
$$= \iint v\, K(s)\, f(x - h_x s, v)\,ds\,dv \qquad (+)$$
using the change of variable $s = (x - u)/h_x$. Now,
$$f(v \mid x - h_x s) = \frac{f(x - h_x s, v)}{f(x - h_x s)},$$
so that $f(x - h_x s, v) = f(v \mid x - h_x s)\, f(x - h_x s)$. The integral in $(+)$ above is therefore equal to
$$\iint v\, K(s)\, f(v \mid x - h_x s)\, f(x - h_x s)\,ds\,dv = \int K(s)\, f(x - h_x s) \left[\int v\, f(v \mid x - h_x s)\,dv\right] ds$$
$$= \int K(s)\, f(x - h_x s)\, m(x - h_x s)\,ds$$
$$= f(x)\, m(x) + h_x^2 \sigma_K^2 \left[f^{(1)}(x)\, m^{(1)}(x) + f^{(2)}(x)\, m(x)/2 + f(x)\, m^{(2)}(x)/2\right] + o(h_x^2)$$
using Taylor series expansions for $f(x - h_x s)$ and $m(x - h_x s)$, together with the kernel moment conditions $\int K(s)\,ds = 1$, $\int s\, K(s)\,ds = 0$ and $\int s^2 K(s)\,ds = \sigma_K^2$. Therefore,
$$E[\hat m(x)] \approx \frac{E\left[\int y\, \hat f(x, y)\,dy\right]}{E[\hat f(x)]} \approx \frac{f(x)\left[m(x) + h_x^2 \sigma_K^2 \left(f^{(1)} m^{(1)}/f + f^{(2)} m/(2f) + m^{(2)}/2\right)\right]}{f(x)\left[1 + h_x^2 \sigma_K^2\, f^{(2)}/(2f)\right]}$$
$$= m(x) + \frac{h_x^2 \sigma_K^2}{2}\left[m^{(2)}(x) + 2 m^{(1)}(x)\, \frac{f^{(1)}(x)}{f(x)}\right],$$
using the approximation $(1 + h_x^2 c)^{-1} \approx 1 - h_x^2 c$ for small $h_x$ in the denominator factor and multiplying through. Hence, for a random design,
$$\operatorname{bias}(\hat m(x)) \approx \frac{h_x^2 \sigma_K^2}{2}\left[m^{(2)}(x) + 2 m^{(1)}(x)\, \frac{f^{(1)}(x)}{f(x)}\right].$$
However, in the fixed design case
$$\operatorname{bias}(\hat m(x)) \approx \frac{h_x^2 \sigma_K^2}{2}\, m^{(2)}(x).$$
When $f^{(1)}(x) = 0$ the bias under a random design equals that under a fixed design. The two situations are not identical, however: a random design has zero probability of being exactly equally spaced, even when $f(x)$ is the U(0, 1) pdf.
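A small Monte Carlo experiment can illustrate the random-design bias formula. Everything in this sketch (the regression function $m(x) = x^2$, the Beta(2, 2) design density, the sample sizes) is an illustrative setup of ours, not from the text.

```python
import numpy as np

rng = np.random.default_rng(1)

def nw(x, xs, ys, h):
    k = np.exp(-0.5 * ((x - xs) / h) ** 2)
    return np.sum(k * ys) / np.sum(k)

x0, h, n, reps = 0.25, 0.05, 2000, 2000
est = np.empty(reps)
for r in range(reps):
    xs = rng.beta(2, 2, n)              # random design: f(x) = 6x(1 - x)
    ys = xs**2 + rng.normal(0, 0.1, n)  # m(x) = x^2, so m'(x) = 2x and m''(x) = 2
    est[r] = nw(x0, xs, ys, h)

# Predicted bias: (h^2 sigma_K^2 / 2)[m'' + 2 m' f'/f], sigma_K^2 = 1 for the Gaussian kernel
f0, fp0, mp0 = 6 * x0 * (1 - x0), 6 * (1 - 2 * x0), 2 * x0
print("simulated bias:", est.mean() - x0**2)
print("predicted bias:", 0.5 * h**2 * (2.0 + 2.0 * mp0 * fp0 / f0))
```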
The variance $V(\hat m(x))$ can be obtained using the following approximation for the variance of a ratio of two random variables $N$ and $D$:
$$V\!\left(\frac{N}{D}\right) \approx \left(\frac{EN}{ED}\right)^2 \left[\frac{V(N)}{(EN)^2} + \frac{V(D)}{(ED)^2} - \frac{2\operatorname{Cov}(N, D)}{(EN)(ED)}\right],$$
provided the variance of the ratio exists. This result is based on a first-order Taylor series expansion. Now,
$$V\!\left(\frac{1}{n}\sum_{i=1}^n K_{h_x}(x - x_i)\, Y_i\right) = \frac{1}{n} E\!\left[\left(K_{h_x}(x - x_i)\, Y_i\right)^2\right] - O(n^{-1}) \approx \frac{R(K)\, f(x)}{n h_x}\left[\sigma^2 + m(x)^2\right]$$
using the fact that $\int v^2 f(v \mid x - h_x s)\,dv = \sigma^2 + m(x - h_x s)^2$, where $\sigma^2(x) = \sigma^2$ under the homoskedastic error assumption. Similarly,
$$\operatorname{Cov}\!\left(\frac{1}{n}\sum_{i=1}^n K_{h_x}(x - x_i)\, Y_i,\ \frac{1}{n}\sum_{i=1}^n K_{h_x}(x - x_i)\right) = \frac{1}{n} E\!\left[K_{h_x}(x - x_i)^2\, Y_i\right] - O(n^{-1}) \approx \frac{R(K)\, f(x)\, m(x)}{n h_x}.$$
Substituting into the approximation formula gives
$$V(\hat m(x)) \approx \frac{R(K)\, \sigma^2}{n h_x f(x)}.$$
The variance of $\hat m(x)$ thus involves the error variance $\sigma^2$ and the local amount of data through the design density $f(x)$.
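The variance approximation is straightforward to check by simulation. This sketch (our own setup, not from the text) uses a flat regression function, $m \equiv 0$, so the estimate's variability is pure noise, and a U(0, 1) design so that $f(x_0) = 1$.

```python
import numpy as np

rng = np.random.default_rng(2)

def nw(x, xs, ys, h):
    k = np.exp(-0.5 * ((x - xs) / h) ** 2)
    return np.sum(k * ys) / np.sum(k)

x0, h, n, sigma = 0.5, 0.05, 1000, 1.0
est = [nw(x0, rng.uniform(0, 1, n), rng.normal(0, sigma, n), h)
       for _ in range(4000)]

RK = 1.0 / (2.0 * np.sqrt(np.pi))  # R(K) = int K(u)^2 du for the Gaussian kernel
print("simulated variance  :", np.var(est))
print("theoretical variance:", RK * sigma**2 / (n * h))  # f(x0) = 1 here
```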
We can use the above pointwise bias and variance results to construct an expression for the asymptotic mean squared error (AMSE) of $\hat m(x)$:
$$\operatorname{AMSE}(\hat m(x)) \approx \frac{h_x^4 \sigma_K^4}{4}\left[m^{(2)}(x) + 2 m^{(1)}(x)\, \frac{f^{(1)}(x)}{f(x)}\right]^2 + \frac{R(K)\, \sigma^2}{n h_x f(x)}.$$
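Although the derivation above stops at the AMSE expression, a standard consequence (our addition, not part of the original text) is worth recording. Writing $B(x)$ for the bracketed bias factor, the AMSE trades an $h_x^4$ bias term against an $(n h_x)^{-1}$ variance term, and minimising over $h_x$ gives
$$\frac{d}{d h_x}\operatorname{AMSE} = h_x^3\, \sigma_K^4 B(x)^2 - \frac{R(K)\, \sigma^2}{n h_x^2 f(x)} = 0 \quad\Longrightarrow\quad h_{\mathrm{opt}} = \left[\frac{R(K)\, \sigma^2}{\sigma_K^4 B(x)^2 f(x)}\right]^{1/5} n^{-1/5},$$
so the optimal bandwidth shrinks at rate $n^{-1/5}$ and the resulting AMSE is of order $n^{-4/5}$, slower than the parametric $n^{-1}$ rate.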