
The Nadaraya-Watson Estimator

Derivation of the estimator

We have a random sample of bivariate data $(x_1, Y_1), \ldots, (x_n, Y_n)$.
The Nadaraya-Watson estimator we will be studying in this section is more suitable for a random design, i.e. when the data come from a joint pdf $f(x, y)$. The regression model is
$$Y_i = m(x_i) + \epsilon_i, \qquad i = 1, \ldots, n,$$
where $m(\cdot)$ is unknown. The errors $\{\epsilon_i\}$ satisfy
$$E(\epsilon_i) = 0, \qquad V(\epsilon_i) = \sigma_\epsilon^2, \qquad \mathrm{Cov}(\epsilon_i, \epsilon_j) = 0 \ \text{for } i \neq j.$$
To derive the estimator, note that we can express $m(x)$ in terms of the joint pdf $f(x, y)$ as follows:
$$m(x) = E[Y \mid X = x] = \int y\, f(y \mid x)\, dy = \frac{\int y\, f(x, y)\, dy}{\int f(x, y)\, dy}.$$
We want to estimate the numerator and denominator separately using kernel estimators. First, for the joint density $f(x, y)$ we use a product kernel density estimator, i.e.
$$\hat{f}(x, y) = \frac{1}{n h_x h_y} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h_x}\right) K\!\left(\frac{y - y_i}{h_y}\right) = \frac{1}{n} \sum_{i=1}^{n} K_{h_x}(x - x_i)\, K_{h_y}(y - y_i),$$
where $K_h(u) = K(u/h)/h$.
Hence, we have that
$$\int y\, \hat{f}(x, y)\, dy = \frac{1}{n} \int y \sum_{i=1}^{n} K_{h_x}(x - x_i)\, K_{h_y}(y - y_i)\, dy.$$
Now, $\int y\, K_{h_y}(y - y_i)\, dy = y_i$ (substituting $y = y_i + h_y t$ gives $y_i \int K(t)\, dt + h_y \int t\, K(t)\, dt = y_i$ for a symmetric kernel integrating to one). Hence, we can write
$$\int y\, \hat{f}(x, y)\, dy = \frac{1}{n} \sum_{i=1}^{n} K_{h_x}(x - x_i)\, y_i.$$
This is our estimate of the numerator. For the denominator we have
$$\int \hat{f}(x, y)\, dy = \frac{1}{n} \sum_{i=1}^{n} K_{h_x}(x - x_i) \int K_{h_y}(y - y_i)\, dy = \frac{1}{n} \sum_{i=1}^{n} K_{h_x}(x - x_i) = \hat{f}(x),$$
since the integral with respect to $y$ equals one.
Therefore, the Nadaraya-Watson estimate of the unknown regression function is given by
$$\hat{m}(x) = \frac{\sum_{i=1}^{n} K_{h_x}(x - x_i)\, y_i}{\sum_{i=1}^{n} K_{h_x}(x - x_i)} = \sum_{i=1}^{n} W_{h_x}(x, x_i)\, y_i,$$
where the weight function is
$$W_{h_x}(x, x_i) = \frac{K_{h_x}(x - x_i)}{\sum_{j=1}^{n} K_{h_x}(x - x_j)}.$$
Note that $\sum_{i=1}^{n} W_{h_x}(x, x_i) = 1$. This kernel regression estimator was first proposed by Nadaraya (1964) and Watson (1964). Note that the estimator is linear in the observations $\{y_i\}$ and is, therefore, a linear smoother.
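The following short Python sketch shows the resulting estimator in code. A Gaussian kernel and the helper name `nadaraya_watson` are our own assumptions for the example; any kernel integrating to one would do.

```python
import numpy as np

def nadaraya_watson(x0, x, y, h):
    """Nadaraya-Watson estimate m_hat(x0) at a single point x0.

    Builds the weights W_h(x0, x_i) = K_h(x0 - x_i) / sum_j K_h(x0 - x_j),
    which sum to one, and returns the weighted average of the y_i
    (so the estimator is linear in the observations).
    """
    K = lambda u: np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)   # Gaussian kernel
    k = K((x0 - x) / h) / h          # K_h(x0 - x_i)
    w = k / k.sum()                  # weights summing to one
    return np.sum(w * y)

# Example: random design on (0, 1) with m(x) = sin(2*pi*x)
rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 300)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, 300)
grid = np.linspace(0.05, 0.95, 19)
m_hat = np.array([nadaraya_watson(x0, x, y, h=0.05) for x0 in grid])
```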
Asymptotic properties
This is complicated by the fact that the estimator is the ratio of two correlated random variables. In the denominator we have that
$$E\,\hat{f}(x) \approx f(x) + \frac{h_x^2}{2}\, \sigma_K^2\, f^{(2)}(x) \qquad \text{and} \qquad V(\hat{f}(x)) \approx \frac{R(K)\, f(x)}{n h_x}$$
(see Section 2 on kernel density estimation), where $\sigma_K^2 = \int u^2 K(u)\, du$ and $R(K) = \int K(u)^2\, du$.
For the numerator,
$$E\left[\frac{1}{n}\sum_{i=1}^{n} K_{h_x}(x - x_i)\, Y_i\right] = \int\!\!\int v\, \frac{1}{h_x} K\!\left(\frac{x - u}{h_x}\right) f(u, v)\, du\, dv = \int\!\!\int v\, K(s)\, f(x - h_x s, v)\, ds\, dv \qquad (+)$$
using the change of variable $s = (x - u)/h_x$. Now,
$$f(v \mid x - h_x s) = \frac{f(x - h_x s, v)}{f(x - h_x s)},$$
so that $f(x - h_x s, v) = f(v \mid x - h_x s)\, f(x - h_x s)$. The integral in (+) above is therefore equal to
$$\int\!\!\int v\, K(s)\, f(v \mid x - h_x s)\, f(x - h_x s)\, ds\, dv = \int K(s)\, f(x - h_x s) \int v\, f(v \mid x - h_x s)\, dv\, ds$$
$$= \int K(s)\, f(x - h_x s)\, m(x - h_x s)\, ds$$
$$= f(x) m(x) + h_x^2\, \sigma_K^2 \left[f^{(1)}(x) m^{(1)}(x) + f^{(2)}(x) m(x)/2 + f(x) m^{(2)}(x)/2\right] + o(h_x^2),$$
using Taylor series expansions for $f(x - h_x s)$ and $m(x - h_x s)$. Therefore,
$$E\,\hat{m}(x) \approx \frac{E\left[\int y\, \hat{f}(x, y)\, dy\right]}{E\,\hat{f}(x)} \approx \frac{f(x)\left[m(x) + h_x^2\, \sigma_K^2 \left(f^{(1)} m^{(1)}/f + f^{(2)} m/(2f) + m^{(2)}/2\right)\right]}{f(x)\left[1 + h_x^2\, \sigma_K^2\, f^{(2)}/(2f)\right]}$$
$$= m(x) + \frac{h_x^2}{2}\, \sigma_K^2 \left[m^{(2)}(x) + 2\, m^{(1)}(x)\, \frac{f^{(1)}(x)}{f(x)}\right],$$
using the approximation $(1 + h^2 c)^{-1} \approx 1 - h^2 c$ for small $h$ in the factor in the denominator and multiplying through. Hence, for a random design,
$$\mathrm{bias}(\hat{m}(x)) \approx \frac{h_x^2}{2}\, \sigma_K^2 \left[m^{(2)}(x) + 2\, m^{(1)}(x)\, \frac{f^{(1)}(x)}{f(x)}\right].$$
However, in the fixed design case
$$\mathrm{bias}(\hat{m}(x)) \approx \frac{h_x^2}{2}\, \sigma_K^2\, m^{(2)}(x).$$
When $f^{(1)}(x) = 0$ the bias under a random design equals that under a fixed design. However, the two situations are not identical: the random design has zero probability of producing exactly equally spaced points, even when $f(x)$ is the $U(0, 1)$ pdf.
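To make the random-design bias formula concrete, here is a small Monte Carlo sketch. It is entirely our own construction: the regression function, the Beta(2, 2) design density, the bandwidth, and the sample size are all hypothetical choices. With a Gaussian kernel $\sigma_K^2 = 1$, so the approximation reduces to $(h_x^2/2)\left[m^{(2)}(x) + 2 m^{(1)}(x) f^{(1)}(x)/f(x)\right]$.

```python
import numpy as np

rng = np.random.default_rng(2)

def K(u):
    # Standard Gaussian kernel, for which sigma_K^2 = 1
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def nw(x0, x, y, h):
    # Nadaraya-Watson estimate at x0 (the 1/h factors cancel in the ratio)
    w = K((x0 - x) / h)
    return np.sum(w * y) / np.sum(w)

# Hypothetical regression function and its first two derivatives
m  = lambda x: np.sin(2 * np.pi * x)
m1 = lambda x: 2 * np.pi * np.cos(2 * np.pi * x)
m2 = lambda x: -(2 * np.pi) ** 2 * np.sin(2 * np.pi * x)

n, h, x0, reps = 500, 0.05, 0.3, 2000
est = np.empty(reps)
for r in range(reps):
    x = rng.beta(2, 2, n)                      # non-uniform design density f
    y = m(x) + rng.normal(0, 0.2, n)
    est[r] = nw(x0, x, y, h)

# Beta(2, 2) density f(x) = 6x(1 - x) and its derivative at x0
f_x0, f1_x0 = 6 * x0 * (1 - x0), 6 * (1 - 2 * x0)
theory = (h**2 / 2) * (m2(x0) + 2 * m1(x0) * f1_x0 / f_x0)
print(est.mean() - m(x0), theory)   # observed bias vs. asymptotic approximation
```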
The $V(\hat{m}(x))$ can be obtained by using the following approximation for the variance of the ratio of two random variables, $N$ and $D$:
$$V\!\left(\frac{N}{D}\right) \approx \left(\frac{EN}{ED}\right)^2 \left[\frac{V(N)}{(EN)^2} + \frac{V(D)}{(ED)^2} - \frac{2\,\mathrm{Cov}(N, D)}{(EN)(ED)}\right],$$
provided the variance of the ratio exists. This result is based on a first-order Taylor series expansion.
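A quick numerical sanity check of this approximation (purely illustrative; the means and covariance matrix below are arbitrary choices of ours, picked so that $D$ stays well away from zero):

```python
import numpy as np

rng = np.random.default_rng(1)
mean = np.array([5.0, 10.0])                  # EN, ED
cov = np.array([[1.0, 0.6],
                [0.6, 2.0]])                  # V(N), Cov(N, D); Cov(N, D), V(D)
N, D = rng.multivariate_normal(mean, cov, size=200_000).T

empirical = np.var(N / D)
approx = (mean[0] / mean[1]) ** 2 * (
    cov[0, 0] / mean[0] ** 2
    + cov[1, 1] / mean[1] ** 2
    - 2 * cov[0, 1] / (mean[0] * mean[1])
)
print(empirical, approx)   # the two values should agree closely
```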
Now,
$$V\!\left(\frac{1}{n}\sum_{i=1}^{n} K_{h_x}(x - x_i)\, Y_i\right) = \frac{1}{n}\, E\!\left[\{K_{h_x}(x - x_i)\, Y_i\}^2\right] - O(n^{-1}) \approx \frac{R(K)\, f(x)}{n h_x}\left[\sigma_\epsilon^2 + m(x)^2\right],$$
using the facts that $\int v^2 f(v \mid x - h_x s)\, dv = \sigma_\epsilon^2(x - h_x s) + m(x - h_x s)^2$ and $\sigma_\epsilon^2(x) = \sigma_\epsilon^2$ for all $x$ (i.e. a constant).
Also,
$$V(\hat{f}(x)) \approx \frac{R(K)\, f(x)}{n h_x}.$$
Finally,
$$\mathrm{Cov}\!\left(\frac{1}{n}\sum_{i=1}^{n} K_{h_x}(x - x_i)\, Y_i,\ \frac{1}{n}\sum_{i=1}^{n} K_{h_x}(x - x_i)\right) = \frac{1}{n}\, E\!\left[K_{h_x}(x - x_i)^2\, Y_i\right] - O(n^{-1}) \approx \frac{R(K)\, f(x)\, m(x)}{n h_x}.$$
Substituting into the approximation formula gives
$$V(\hat{m}(x)) \approx \frac{R(K)\, \sigma_\epsilon^2}{n h_x f(x)}.$$
The variance of $\hat{m}(x)$ involves terms relating to the error variance $\sigma_\epsilon^2$ and the relative amount of data through $f(x)$.
We can use the above point-wise bias and variance results to construct an expression for the AMSE of $\hat{m}(x)$, which is as follows:
$$\mathrm{AMSE}(\hat{m}(x)) \approx \frac{h_x^4}{4}\, \sigma_K^4 \left[m^{(2)}(x) + 2\, m^{(1)}(x)\, \frac{f^{(1)}(x)}{f(x)}\right]^2 + \frac{R(K)\, \sigma_\epsilon^2}{n h_x f(x)}.$$
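The AMSE formula makes the usual bias-variance trade-off in $h_x$ explicit: the squared-bias term grows like $h_x^4$ while the variance term shrinks like $(n h_x)^{-1}$. The sketch below simply evaluates the expression over a grid of bandwidths for one hypothetical set of plug-in quantities (all the numbers are assumptions of ours, not values from these notes); for the Gaussian kernel $R(K) = 1/(2\sqrt{\pi})$ and $\sigma_K^2 = 1$.

```python
import numpy as np

# Hypothetical plug-in quantities at a fixed point x
sigma2_eps = 0.09                    # error variance sigma_eps^2
R_K = 1.0 / (2.0 * np.sqrt(np.pi))   # R(K) for the Gaussian kernel
sigma2_K = 1.0                       # sigma_K^2 for the Gaussian kernel
n = 200
m2, m1 = -4.0, 1.5                   # assumed m''(x) and m'(x)
f_x, f1 = 1.0, 0.2                   # assumed f(x) and f'(x)

h = np.linspace(0.02, 0.5, 200)
bias2 = (h ** 4 / 4) * sigma2_K ** 2 * (m2 + 2 * m1 * f1 / f_x) ** 2
var = R_K * sigma2_eps / (n * h * f_x)
amse = bias2 + var
print("bandwidth minimising AMSE on this grid:", h[np.argmin(amse)])
```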