\documentclass[twocolumn,letterpaper]{article} \pagestyle{myheadings} %\input{seteps}
\usepackage{mcfnsSep2012}
%\usepackage[preprint] {draftcopy} %On Sep. 20, 2014, McfnsDraftcopy renamed to draftcopy
\usepackage{amsmath,bm,url}
\usepackage[pdftex]{graphicx}
\usepackage{setspace,lineno}
\usepackage{natbib} % no sorting, please!
\usepackage{color}
\usepackage[breaklinks,pdfstartview={FitH -32768},pdfborder={0 0 0},bookmarksopen,bookmarksnumbered]{hyperref} %\usepackage{bibtexlogo}
\usepackage[utf8]{inputenc}
\usepackage{epstopdf}
%______________________________________________________________________
% Define MCFNS variables:
% % % % % % % % % % % % % % % % % % %
% % % %Not sure how to format authors
%Roger C. Lowe III, PhD
%The University of Georgia,
%Warnell School of Forestry and Natural Resources, Athens Georgia, USA30605
%lowe@uga.edu
%Chris J. Cieszewski, PhD
%The University of Georgia,
%Warnell School of Forestry and Natural Resources, Athens Georgia, USA 30605
%biomat@uga.edu
% % % % % % % % % % % % % % % % % % %
\setcounter {page} {65} \def\issueno {2}
\def\editors	 {{\href{mailto:pbettinger@warnell.uga.edu}{Editor:~Pete Bettinger}}}
\def\submit 	 {May~23,~2014} 	%Submission date can be different than the issue year \issueyear
\def\accept 	 {Sep.~21,~2014} 		%The works should be Accepted & Published in the year of the Current_Issue \issueyear
\def\lasterrata	 {Sep.~26,~2014} 	%Last Errata date can be different than the Issue-Year \issueyear
\def\citename	{Lowe} 		%"Author" or "FirstAuthor et al."
\def\citeemail	{lowe@uga.edu} 	% Use later: {\href{mailto://\citeemail}{\citename}}
\def\citeetal	{et al.} 		% or {} %for a single author; or
\author{	{\href{mailto://\citeemail}{Roger C. \citename}}, %Change only 1st name of 1st author
		{\href{mailto:mail@cjci.net} {Chris J. Cieszewski}}
%}\affiliation{	\small\it{Professor, {\href{http://web.unbc.ca/~garcia/}{University of Northern British Columbia, Prince George, BC, Canada. Ph./FAX:\,250.960.5004/5539}}}
}\affiliation{
  \small\it{{\href{http://www.uga.edu}{Warnell School of Forestry and Natural Resources, The University of Georgia, Athens GA 30602 USA }}}
}\def\yourtitle	{{Multi-source K-nearest neighbor, Mean Balanced forest inventory of
	Georgia}} 				%need double {{ for \\ e.g.: {{Title \\ Subtitle}}
\def\yourkwords	{Landsat 5 Thematic Mapper, Forest Inventory and Analysis,
	landscape analysis, total balancing, large-area inventories}
\def\yourabstract{
We describe a case study in compiling a high-resolution forest
inventory for central Georgia using the K-nearest neighbor (k-NN) approach with
multi-source data and a Mean Balancing correction for estimation bias. In
general, multi-source data collected through various incompatible designs
cannot be mixed due to intractable variances and unknown bias. Because of
this incompatibility, abundant information about the environment (e.g.,
atmospheric conditions, soil composition, spatio-temporal data from nearly
40 years of satellite imaging, and a wealth of site-specific studies with
sampling for various growth attributes) frequently cannot be used to produce
new unbiased estimates for the variables and areas of interest. This study
was carried out in central Georgia, and the k-NN approach was used to
fuse together various incompatible data from public and private sources. We used
the Mean Balancing approach to remove the bias resulting from this data fusion.
The result of the study is an unbiased
high-resolution forest inventory that can be used for small-area fiber
supply assessment.}%----------------------------------------------------------------------

% put any of your personal LaTeX definitions etc here.
    % --- math  ---
\newcommand{\asd}{\ensuremath{\mbox{\sc asd}}}
\newcommand{\astime}{\ensuremath{\mbox{\sc ast}}}
%\newcommand{\ttc}{\ensuremath{\mbox{\sc ttc}}}
\newcommand{\ud}{\,\mathrm{d}}
\newcommand{\ie}{i.e.\ }
\newcommand{\eg}{e.g.\ }
\usepackage{amssymb}

    \usepackage{amsmath,bm}
    \newcommand{\vc}[1]{\bm{#1}}
    \newcommand{\mat}[1]{{\mathrm #1}}  % or \bf
    \newcommand{\der}[2]{\frac{{\mathrm d}#1}{{\mathrm d}#2}}
    \newcommand{\pder}[2]{\frac{\partial #1}{\partial #2}}
    \newcommand{\dr}[2]{{\mathrm d}#1/{\mathrm d}#2}
    \newcommand{\dd}{\,{\mathrm d}}
    \newcommand{\diag}{\mathop{\mathgroup\symoperators diag}\nolimits}
    \newcommand{\abs}{\mathop{\mathgroup\symoperators abs}\nolimits}
    \providecommand{\e}{\mathrm e} % included in amsmath?

    %\usepackage{slpflts}
    \setcounter{topnumber}{5}
    \setcounter{totalnumber}{5}
    \renewcommand{\topfraction}{0.90}
    \renewcommand{\bottomfraction}{0.90}
    \renewcommand{\textfraction}{0.10}
    \renewcommand{\floatpagefraction}{0.80}
    \setcounter{dbltopnumber}{5}
    \renewcommand{\dbltopfraction}{0.90}
    \renewcommand{\dblfloatpagefraction}{0.80}

    \newcommand{\captionfont}{\small} % or {\sf}
    % or \newcommand{\captionfont}{\small}

    % Figure (tag, caption)
    \newcommand{\fig}[2]{\begin{figure}[htbp]\leavevmode\centering%
    \includegraphics[width=0.48\textwidth]{#1.pdf}\caption{\captionfont #2}%
    \label{fig:#1}\end{figure}}

    % Figure, two-column (tag, caption)
    \newcommand{\figw}[2]{\begin{figure*}[htbp]\leavevmode\centering%
    \includegraphics[width=0.96\textwidth]{#1.pdf}\caption{\captionfont #2}%
    \label{fig:#1}\end{figure*}}

    % Double figure
    \newcommand{\figdouble}[3]{\begin{figure*}[htbp]\leavevmode\centering%
    \includegraphics[width=0.48\textwidth]{#1.pdf}%
    \hfill%
    \includegraphics[width=0.48\textwidth]{#2.pdf}%
    \caption{\captionfont #3}\label{fig:#1}%
    \end{figure*}}

    % Two figures side by side
    \newcommand{\figs}[4]{%
    \begin{figure}[htbp]%
    \begin{minipage}[t]{0.48\linewidth}%
    \centering%
    \includegraphics[width=\linewidth]{#1}%
    \caption{\captionfont #2} \label{fig:#1}%
    \end{minipage}%
    \hfill%
    \begin{minipage}[t]{0.48\linewidth}%
    \centering%
    \includegraphics[width=\linewidth]{#3}%
    \caption{\captionfont #4} \label{fig:#3}%
    \end{minipage}%
    \end{figure}}

    \bibliographystyle{SAF}
    \bibpunct{(}{)}{,}{a}{}{,}

% THE REST SHOULD BE AUTOMATIC ... Go To the first Section ...

\title{\Large\bf\uppercase\yourtitle}
\begin{document} \markright{\hfil{{{\href{mailto://\citeemail}{\citename}\citeetal}}~(\issueyear)/\mcfnshead}} \twocolumn[ \begin{@twocolumnfalse}
\maketitle \hypersetup{pdftitle={\mcfnshead}, pdfauthor={\citename~(\issueyear)}, pdfsubject={\yourtitle}, pdfkeywords={\yourkwords}}  \hrule
\begin{abstract}\yourabstract\\\\{\bf Keywords:}~\yourkwords \end{abstract}\hrule\vspace{.3truein}\end{@twocolumnfalse}]
%\numberwithin{figure}{section}


% Continue with the first Section:

\section{Introduction}

Under multi-use sustainable natural resource management, the provision of
timely, reliable, and accurate information about natural resources, their
forested ecosystems, and adjacent areas is essential for maintaining their
ecological balance and sustained productivity. This is especially important
where forests tend to be fast growing and changing, highly fragmented in
area and ownership, and the demand for their wood products is high, such as
those in Georgia and other southeastern states. Yet however great the need for
forest products, detailed stand-level information is lacking for large
portions of this region.

The United States Forest Service Forest Inventory and Analysis (FIA)
program collects forest information and produces regular reports on the
condition of forests throughout the country. In Georgia, FIA data are
used in various large-area, inventory-based analyses ranging from carbon
studies to tree mortality analysis (\citealt{vandeusen10}, \citealt{meng07}).
The FIA inventory provides reliable, unbiased estimates suitable for
reporting across large areas (\citealt{blackard08}, \citealt{walker07},
\citealt{sivianpillai06}, \citealt{wayman00}). However, the large-area FIA inventories
are not suitable for applications to smaller areas, and there is still a compelling need for
higher-resolution forest information. A more suitable source of this
information is the inventories compiled by local agencies that have
intimate knowledge of the areas they manage. The forest products
industry and other large-area forest owners typically maintain their own
private inventories, which are more detailed and better suited to
small-area, stand-level forest management.

Nearest neighbor methods are an established means to generate estimates of
forest volume (\citealt{trotter97}, \citealt{franco01}, \citealt{mcroberts12}),
basal area (\citealt{mcroberts09a}, \citealt{meng09a},
\citealt{sivianpillai06}), biomass (\citealt{gjertsen07}, \citealt{tomppo08}, \citealt{reese10}),
and carbon (\citealt{mcroberts10}, \citealt{labrecque06},
\citealt{blackard08}), to name only a few. This method's popularity stems,
in part, from its intuitive implementation, the ability to generate
simultaneous estimates for multiple variables using the same parameters
(usually the number of nearest neighbors, K), and the ability to make use of
noisy data for prediction (\citealt{cieszewski08}). However, estimates from
nearest neighbor methods applied to multi-source data are inherently biased
(\citealt{iles09}), and this bias should be accounted for.

The total-balancing concept proposed by Iles (\citealt{iles09}, also \citealt{cieszewski05}) is the foundation of our
approach to addressing the issue of bias in our high-resolution forest
inventory for the state of Georgia. In this approach, the large-area FIA
information and local forest inventories are used together to develop a
spatially explicit inventory that maintains the large-area unbiased
properties of the FIA inventory and the local precision of the forest
industry inventories, even though they are traditionally viewed as having
incompatible variances. The purpose of this research is twofold. First, we
generate a broad-area, high-resolution, spatially explicit inventory for
Georgia whose mean volume per hectare equals the unbiased mean derived from
the FIA. Second, we demonstrate the potential gains in local precision that can be
obtained by fusing local inventory information with the spatially explicit inventory
while maintaining overall balancing.

\section{Methods}
\subsection{Study area}

The study area for this research is the state of Georgia, USA (Fig.~\ref{fig1}). As a
whole, Georgia is a typical southern state with 66.7{\%} forest cover. It
has over 9.7 million hectares of forestland, of which approximately 45{\%}
is conifer, 42{\%} is deciduous, 12{\%} is a mixed forest type, and the
remainder is non-stocked (\citealt{cieszewski07}). Adding to the
complexity of the landscape, approximately 650,000 non-industrial
landowners hold 75{\%} of the forestland, and the average parcel size is
decreasing (Georgia Forestry Commission, 2008).



\begin{figure}[htb]
	\centerline{\includegraphics[width=.45\textwidth]{Lowe_etal_galtmscenes.eps}}
	\caption{The 12 Landsat WRS 2 scenes included in the study.}
	\label{fig1}
\end{figure}



The composition of Georgia's forests differs distinctly from north to
south. Hardwood ecosystems dominate
the northern part of Georgia (Fig.~\ref{fig2}A, Tab.~\ref{tab1}). The forests
transition to conifer-dominated ecosystems as one proceeds southward and to
the east (Fig.~\ref{fig2}B, Tab.~\ref{tab1}).

\begin{figure}[htb]
	\centerline{\includegraphics[width=.48\textwidth]{Lowe_etal_fiaregvols.eps}}
	\caption{Total volume summarized by FIA regions for the A) coniferous forestland, and B) deciduous forestland for the state of Georgia.}
	\label{fig2}
\end{figure}

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htb]
	\caption{Forestland area and total volume summarized by FIA regions reported by the FIA.}
	\begin{center}
\begin{tabular}{ccccccc}
	\cline{2-7}      & \multicolumn{2}{c}{Conifer Forest} & \multicolumn{2}{c}{Mixed Forest} & \multicolumn{2}{c}{Deciduous Forest} \\
	\hline
	FIA Region & Forestland & Volume & Forestland & Volume & Forestland & Volume\\
	& (1000 ha) & (Mil. m$^3$) & (1000 ha) & (Mil. m$^3$) & (1000 ha) & (Mil. m$^3$)\\
	\hline
	Northern & 228.2 & 486.4 & 189.0 & 435.0 & 782.3 & 1,781.0 \\
	North Central & 452.4 & 911.4 & 176.0 & 314.8 & 679.5 & 1,504.2 \\
	Central & 1,487.2 & 2,378.3 & 340.7 & 463.0 & 1,241.2 & 2,060.7 \\
	Southeastern & 1,764.8 & 2,560.4 & 311.6 & 392.5 & 1,109.6 & 1,678.2 \\
	Southwestern & 568.2 & 836.9 & 145.7 & 182.9 & 446.4 & 695.7 \\
	\hline
\end{tabular}
		\label{tab1}
	\end{center}
	
\end{table*}


\subsection {Satellite imagery}

We used Landsat 5 Thematic Mapper (TM) satellite imagery to model
volume per hectare (m$^{3}$) using the k-nearest neighbor approach. We attempted to
obtain imagery from the leaf-off season early in the year, leaf-on imagery from the
summer months, and another leaf-off image from late in the year. However,
this was not possible in all cases (Tab.~\ref{tab2}) because of cloudy conditions. We
acquired two to four cloud-free images for each of the 12 scenes that wholly
or partially cover most of the state. A minimum of eight well-distributed
ground control points were located on each scene, and the root-mean-square
error (RMSE) was calculated using the early leaf-off scene as the base image. No RMSE
exceeded 30 meters, and a visual inspection revealed no egregious misalignment
in the imagery. Two UTM zones, zone 16 and zone 17, overlap the state. To
facilitate processing, we created a custom coordinate system definition that
shifted UTM zone 17 west by 500,000 meters and projected each image to this
custom UTM zone. Landsat 5 Thematic Mapper bands 1--5 and 7 from each
image were composited and used at the 30-meter resolution.

%\textbf{2. Acquisition dates of the Landsat 5 satellite imagery used in the volume estimation %processes.}
% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Acquisition dates of the Landsat 5 satellite imagery used in the volume estimation processes.}
	\begin{tabular}{ccccc}
		\hline
		TM Scene & Image 1 & Image 2 & Image 3 & Image 4 \\
		\hline
		Path 17, Row 37 & 4/11/2010 & 9/2/2010 & 12/7/2010 & NA \\
		Path 17, Row 38 & 3/13/2010 & 6/14/2010 & 12/7/2010 & NA \\
		Path 17, Row 39 & 6/14/2010 & 10/4/2010 & 11/21/2010 & NA \\
		Path 18, Row 36 & 4/18/2010 & 6/21/2010 & 11/12/2010 & NA \\
		Path 18, Row 37 & 4/2/2010 & 10/11/2010 & 12/14/2010 & NA \\
		Path 18, Row 38 & 1/12/2010 & 5/20/2010 & 10/11/2010 & 12/14/2010 \\
		Path 18, Row 39 & 2/13/2010 & 10/11/2010 & 12/14/2010 & NA \\
		Path 19, Row 36 & 11/16/2009 & 3/24/2010 & 10/2/2010 & 11/19/2010 \\
		Path 19, Row 37 & 2/20/2010 & 7/30/2010 & 11/19/2010 & NA \\
		Path 19, Row 38 & 2/20/2010 & 7/30/2010 & 11/19/2010 & NA \\
		Path 19, Row 39 & 2/20/2010 & 10/18/2010 & 11/19/2010 & NA \\
		Path 20, Row 36 & 1/29/2010 & 10/9/2010 & NA    & NA \\
		\hline
	\end{tabular}%
	\label{tab2}%
\end{table*}%

\subsection {Inventory data}

To maintain plot integrity, the Forest Service does not release cruise plot
coordinates to the public. However, it does allow their use at one of its
secure data centers. We processed the satellite imagery using the field-measured
GPS locations at the Southern Research Station in Knoxville,
Tennessee, in December 2010. We used a series of \textit{arcpy} (\citealt{esri10}) scripts to
extract the TM band 1--5 and 7 pixel values at each FIA cruise location
for all images used in this study (Tab.~\ref{tab2}).

A fundamental aspect of the FIA's measurement protocol is that
measured plots shall not be given preferential treatment by the inventory
crew or the public. The landowner is permitted to manage the forest as they
see fit. Thus, the database may contain
outdated information about a plot, since any changes to the land that occur
after the inventory are not recorded until the next measurement cycle.
Without access to the plot locations outside the FIA's data center, we were unable
to perform a visual inspection of the TM data at each plot center. However,
we did evaluate the spectral information stored in the training sample list
using a series of pseudo-image composites, where a ``pseudo-image
composite'' is an image whose pixels have been sequentially
rearranged from the lowest NDVI (Eq.~\ref{eq.ndvi}) (on the left) to the highest NDVI
(on the right). We used the following steps to generate the pseudo-images
for each scene:

\begin{enumerate}
	\item sort the training samples according to their NDVI (Eq.~\ref{eq.ndvi}) values,
	\item reorganize the data into a grid that stores the information from one TM band,
	\item repeat step 2 for each spectral band in the training sample list,
	\item import each band into ArcGIS, and generate the pseudo-image using the \textit{Composite Bands} command, and then
	\item repeat steps 2 and 4 for the NDVI values.
\end{enumerate}


\begin{equation}
	\mathit{NDVI} = \frac{\mathit{NIR}-\mathit{RED}}{\mathit{NIR}+\mathit{RED}} \label{eq.ndvi}
\end{equation}
where $\mathit{NIR}$ is the near-infrared layer (TM band 4) and $\mathit{RED}$ is the red layer (TM band 3).
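
As a worked illustration with hypothetical pixel values (chosen for arithmetic convenience, not taken from the study data), a sample with $\mathit{NIR}=80$ and $\mathit{RED}=20$ gives
\begin{equation*}
\mathit{NDVI}=\frac{80-20}{80+20}=\frac{60}{100}=0.60,
\end{equation*}
whereas a sparsely vegetated sample with $\mathit{NIR}=30$ and $\mathit{RED}=25$ gives $\mathit{NDVI}=(30-25)/(30+25)=5/55\approx 0.09$. Sorting the samples by this index would therefore place the second sample ahead of the first in the pseudo-image. For non-negative band values the index is bounded between $-1$ and $1$.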


The pseudo-image shown in Figure~\ref{fig3} presents the spectral information stored
in the training sample list in an organized manner, where the site with the
lowest NDVI measure occurs in the lower left and the site with the highest
NDVI measure occurs in the upper right. By visual inspection of the actual
TM and NDVI images, and with the aid of the 2010 NAIP aerial photography, we
were able to loosely define the cover types represented in sections A--G,
as seen in Figure~\ref{fig4}. The pixels in sections A and B were captured in areas
void of green vegetation, such as a cultivated field (Fig.~\ref{fig4}) or a place
inundated with water. Near the other end of the NDVI spectrum, the samples
in frame F (Fig.~\ref{fig3}) are sites captured in mature forested areas with full
canopy closure (Fig.~\ref{fig4}). The sites in frames C through E contain samples
from old fields, young pine plantations, and thinned forests. Frame G
contains the samples with the highest NDVI values. These are cropland sites
with abundant, low-lying, fast-growing green vegetation. We used the
forested/non-forested thresholds determined by this process for each scene
to assess which, if any, FIA plots had been harvested between the time a
plot was measured and the capture of the late-winter TM image. We assigned
those plots a volume per hectare (m$^{3}$) of zero.

We used the Forest Vegetation Simulator (\citealt{wykoff96}, \citealt{dixon02}) to
project the FIA field measurements to a common end-of-year 2010 date. These data
are our \textit{2010 common timeline FIA} data. We implemented the SN variant and accepted the data
processing defaults. The projected dataset contains 6,367 total plots that
we classified as deciduous, mixed, or evergreen according to their
dominant species (Tab.~\ref{tab3}). There were 150 non-stocked and
2,122 non-forested plots within the state that were not used in the
analysis.

\begin{figure}[htb]
	\hspace{-.05in}\centerline{\includegraphics[width=.485\textwidth]{Lowe_etal_pseudoimage1.eps}}
	\caption{Pseudo-Landsat image generated from FIA sample sites representing A) bare ground sites, B-C) the transition to forest, D-F) the transition to a closed canopy forest, and G) cropland.}
	\label{fig3}
\end{figure}

\begin{figure}[htb]
	\centerline{\includegraphics[width=.475\textwidth]{Lowe_etal_pseudoimage2.eps}}
	\caption{Visually assessed A--C) non-forest, D--F) sparse forest, and G--I) closed canopy sites as they relate to samples in the J) pseudo-Landsat image (Figure 3), and how they appear in the 2010 color-infrared NAIP (A, D, G), the winter TM (B, E, H), and the NDVI (C, F, I) images.}
	\label{fig4}
\end{figure}
% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Summary of age, basal area, and cubic-meter volume per hectare for all FIA ground measurements.}
	\begin{tabular}{rrccccc}
		\hline
		&       & Hardwood & Mix   & Pine  & Non-stocked & No Forest \\
		\hline
		& \multicolumn{1}{c}{\# of Stands} & 1,628 & 634   & 1,833 & 150   & 2,122 \\
		\multicolumn{1}{c}{Age} & \multicolumn{1}{c}{Mean} & 48    & 38    & 27    & 3     & 0 \\
		& \multicolumn{1}{c}{St. Dev.} & 30    & 24    & 17    & 2     & 0 \\
		& \multicolumn{1}{c}{Min.} & 0     & 1     & 0     & 0     & 0 \\
		& \multicolumn{1}{c}{Max.} & 149   & 162   & 115   & 5     & 0 \\
		\multicolumn{1}{c}{Basal Area (m$^2$)} & \multicolumn{1}{c}{Mean} & 9.1   & 21.8  & 21.8  & 0     & 0 \\
		& \multicolumn{1}{c}{St. Dev.} & 4.9   & 10.3  & 10.6  & 0     & 0 \\
		& \multicolumn{1}{c}{Min.} & 0     & 0     & 0     & 0     & 0 \\
		& \multicolumn{1}{c}{Max.} & 38.3  & 60.4  & 98.7  & 0     & 0 \\
		\multicolumn{1}{c}{Vol / Ha (m$^3$)} & \multicolumn{1}{c}{Mean} & 144.1 & 125   & 117   & 0     & 0 \\
		& \multicolumn{1}{c}{St. Dev.} & 109.8 & 89.8  & 83.2  & 0     & 0 \\
		& \multicolumn{1}{c}{Min.} & 0     & 0     & 0     & 0     & 0 \\
		& \multicolumn{1}{c}{Max.} & 810.6 & 436   & 622   & 0     & 0 \\
		\hline
	\end{tabular}%
	\label{tab3}%
\end{table*}%

We stratified the plots further by the WRS2 scene boundaries (Fig.~\ref{fig1}). The
scenes overlap in both the north-south and east-west directions, so
some plots were used multiple times in different scene-level calculations
(Tab.~\ref{tab4}). These stratified data are the source of the target mean used in
the scene-level scaling process and serve as the input \textit{training} samples in the volume
estimation process. The data files, one for each TM scene, include FIA plot
age, basal area per hectare (BA, m$^{2}$), cubic-meter volume per hectare
(CF), county FIPS code, the TM scene identifier, and the TM spectral
summaries that were recorded at each plot center.

We obtained 918 conifer forest polygons and associated stand summary
information from our various industrial partners with holdings in WRS2 path
18, row 37. We visually inspected each area on the early- and late-year
leaf-off TM imagery and on the 2010 USDA Farm Service Agency National
Agriculture Imagery Program (NAIP) aerial photography to ensure the data did not
include any partially harvested stands. We manually recoded the stand
summaries to zero for any stand that reflected a total harvest. We projected
the individual stand ages, volumes, and basal area measures to a common 2010
end-of-year timeline. The final industrial data set covered 19,210
hectares. Stand ages ranged from zero to 61 years, and the average volume per hectare
was 158 m$^{3}$.
% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Summary of 2010 common timeline FIA plot measurements for the 12 TM scenes encompassing the state of Georgia.}
	\begin{tabular}{ccccccc}
		\hline
		&       &       & \multicolumn{4}{c}{Volume / Hectare (m$^3$)} \\
		\cline{4-7}    Path / Row & Cover Type & \# Plots & Mean  & St. Dev. & Min.  & Max. \\
		\hline
		17 / 37 & Pine  & 358   & 126   & 8     & 0     & 569 \\
		17 / 38 & Pine  & 682   & 111   & 5.4   & 0     & 569 \\
		17 / 39 & Pine  & 264   & 97.3  & 7.6   & 0     & 348 \\
		18 / 36 & Pine  & 107   & 151   & 17.6  & 0     & 622 \\
		18 / 37 & Pine  & 613   & 124   & 6     & 0     & 569 \\
		18 / 38 & Pine  & 689   & 106   & 5     & 0     & 569 \\
		18 / 39 & Pine  & 183   & 102   & 9     & 0     & 295 \\
		19 / 36 & Pine  & 143   & 139   & 14.7  & 0     & 622 \\
		19 / 37 & Pine  & 345   & 125   & 8.1   & 0     & 513 \\
		19 / 38 & Pine  & 253   & 105   & 7.9   & 0     & 376 \\
		19 / 39 & Pine  & 37    & 108   & 21.7  & 0     & 276 \\
		20 / 36 & Pine  & 84    & 115   & 15.7  & 0     & 513 \\
		\hline
	\end{tabular}%
	\label{tab4}%
\end{table*}

\subsection {Land cover}

We composited the 2008 Land Use Trends Land Cover of Georgia (GLUT)
(NARSAL 2006) and the National Land Cover Data (NLCD) 2006 (USDOI, 2006) to
stratify the land base into generic conifer, mixed forest, and deciduous
forest types. The composite was created using a raster intersection where
\begin{equation*}
\textit{Composite Land Cover}=\textit{GLUT}\times 1000+\textit{NLCD}.
\end{equation*}
This procedure outputs a single raster layer whose values represent the
inputs from both data sets. For example, a cell whose output is ``31042''
represents an area classified by GLUT as \textit{clearcut}, class 31, and classified by NLCD
as \textit{evergreen}, class 42. The overlay resulted in almost 200 unique combinations. We
reduced the number of classes by reclassifying the cells using the
cross-matrix in Tab.~\ref{tab5}. We assumed that the cells classified by GLUT as a
clearcut would ultimately result in an evergreen forest. We assigned all
cells classified as evergreen by one agency and deciduous by the other to
the mixed forest type. The classes not listed in the table (i.e., urban, GLUT
cropland/pasture, and water) were used as a non-forest mask. Statewide,
a total of 9,744,747.8 hectares of forested land are represented in
this dataset. Under the above reclassification scheme, we labeled
44{\%} of stands as conifer, 18{\%} as mixed, and 37{\%} as deciduous.
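
As a further illustration of this encoding, and assuming every NLCD class code is smaller than 1000 so that the two inputs can be recovered unambiguously, a GLUT class of 41 (deciduous) combined with an NLCD class of 42 (evergreen) gives
\begin{equation*}
41\times 1000+42=41042,
\end{equation*}
and the inputs are recovered as $\lfloor 41042/1000\rfloor=41$ and $41042 \bmod 1000=42$. Under the reclassification in Tab.~\ref{tab5}, this combination is assigned to the mixed forest type, consistent with the rule for cells labeled evergreen by one product and deciduous by the other.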

\subsection {Software}

We used a variety of commercial software and in-house programs to process
the data. Image co-registration, data projection, land cover
reclassification, and data cataloging tasks were performed in ESRI's ArcGIS
(\citealt{esri10}) and ERDAS Imagine (\citealt{erdas10}). We converted the data layers
among common GIS image formats and generic binary formats using GDAL
interfaced with Python (\citealt{vanrossum03}) and Perl (The Perl Foundation). We
developed custom Fortran programs, compiled with the Lahey/Fujitsu LF95 v.~8.1b
compiler, to implement the nearest-neighbor processing, data summarization,
and image generation.

\subsection {Initial KNN estimation based on the FIA data}

In this study, the volume prediction for a pixel was determined using:
\begin{equation*}
\hat{y}_{i}=\frac{1}{k}\sum_{j=1}^{k}y_{j}^{i}
\end{equation*}
where $\hat{y}_i$ is the predicted value for pixel $i$, and
$\{y_{j}^{i},\ j=1,2,\ldots,k\}$ are the $k$ spectrally nearest response values stored in the training list.
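
For example, with $k=3$ and hypothetical neighbor volumes of 100, 130, and 160 m$^{3}$ per hectare (illustrative values only, not study data),
\begin{equation*}
\hat{y}_{i}=\frac{1}{3}\left(100+130+160\right)=130 .
\end{equation*}
Because the estimate is an unweighted mean, it always lies between the smallest and largest of the $k$ selected values.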


This process can be modified using a weighting factor, which is commonly
based on the physical distance between pixel $i$ and the location of the K
neighbor(s). However, we did not implement weighting because of limited access to the
actual FIA plot locations. We tested the number of nearest neighbors for each
scene using leave-one-out cross-validation. Using the FIA training
list data as input, we generated volume estimates for K = 1 to 20. In this
process, we limited the nearest neighbor selection to entries with the same
composite land cover type. Following the recommendation of McRoberts (2002),
the optimal K was selected as the value of K that produces an RMSE (Eq. 2)
no more than 2.5{\%} larger than the minimum RMSE value across the same range of
K.

\begin{equation}
\mathit{RMSE}=\sqrt{\frac{\sum_{i=1}^{n}(y_i-\hat{y}_i)^2}{n}}
\end{equation}
where $y_i$ is the ground-observed measurement for sample $i$ (assumed to be true), $\hat{y}_i$ is the predicted value for sample $i$, and $n$ is the total number of samples.
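
As a numerical illustration with hypothetical values (not study data), suppose $n=4$ samples have residuals $y_i-\hat{y}_i$ of $10$, $-10$, $20$, and $-20$ m$^{3}$ per hectare. Then
\begin{equation*}
\mathit{RMSE}=\sqrt{\frac{10^2+(-10)^2+20^2+(-20)^2}{4}}=\sqrt{\frac{1000}{4}}=\sqrt{250}\approx 15.8 .
\end{equation*}
If the selection rule is read as accepting a value of K whose RMSE is at most 2.5{\%} above the minimum, which is an assumed interpretation, then a minimum RMSE of 80.0 would admit any K with $\mathit{RMSE}\le 80.0\times 1.025=82.0$.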

Note that the goal in selecting the optimal K is not only
to improve accuracy but often also to preserve the covariance between
different predicted variables and the range of their predicted
values. The process of generating volume per hectare (m$^{3}$) estimates for
pixel $i$ began with the selection of the K nearest entries in the training
list. Nearness in this study refers to the Euclidean spectral distance (ESD),
calculated using Eq. 3. The process was executed with the
following steps:

\begin{enumerate}
	\item calculate ESD from each forested pixel $i$ to each entry in the training list having the same composite land cover type,
	\item use Fortran's intrinsic \textit{minval} and \textit{minloc} functions to find the closest remaining neighbor in the list,
	\item store the volume per hectare value associated with that spectrally nearest entry in the training list and mask the entry from the list of spectral distances,
	\item repeat steps 2 and 3 until K neighbors have been selected, and then
	\item average those samples to form the KNN-based volume per hectare (m$^{3}$) estimate.
\end{enumerate}

\begin{equation}
\mathit{ESD}=\sqrt{\sum_{i=1}^{6}(j_i-k_i)^2}
\end{equation}
where $j_{i}$ is the band $i$ value for the $j^{th}$ entry in the training list and $k_{i}$ is the band $i$ value for the current pixel in the image. Here $i$ indexes the spectral bands rather than a pixel.
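
For illustration, take hypothetical six-band values (not study data) of $(60, 30, 25, 80, 70, 30)$ for a training entry and $(58, 31, 27, 76, 73, 30)$ for the current pixel. The band-wise differences are $2, -1, -2, 4, -3, 0$, so
\begin{equation*}
\mathit{ESD}=\sqrt{2^2+(-1)^2+(-2)^2+4^2+(-3)^2+0^2}=\sqrt{34}\approx 5.83 .
\end{equation*}
A second training entry at a larger distance, say $\sqrt{50}\approx 7.07$, would be spectrally farther from the pixel and would be selected only after the first.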

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Reclassification matrix used to combine the GLUT and NLCD land cover products.}
	\begin{tabular}{ccccccc}
		& \multicolumn{6}{c}{2008 GLUT}\\
		\cline{1-7}          & Clear- & Decid- & Ever- &       & Forested & Nonforest\\
		NLCD 2006 & cut(31) & uous(41) & green(42) & Mixed(43) & Wetland(91) & Wet.(93)\\
		\hline
		Deciduous(41) & Ever. & Decid. & Mixed & Mixed & Decid. & NA \\
		Evergreen (42) & Ever. & Mixed & Ever. & Mixed & Ever. & NA \\
		Mixed (43) & Ever. & Mixed & Mixed & Mixed & Mixed & NA \\
		Evergreen (52) & Ever. & Mixed & Ever. & Mixed & Ever. & NA \\
		Clearcut (71) & Ever. & Decid. & Ever. & Mixed & Ever. & NA \\
		Crop (81, 82) & Ever. & Decid. & Ever. & Mixed & NA    & NA \\
		Wetland (90) & Ever. & Decid. & Ever. & Mixed & Decid. & Decid. \\
		\hline
	\end{tabular}%
	\label{tab5}%
\end{table*}%


An advantage of the KNN method is the ability to make many estimates for a
single location given the information available in the training list. The
additional information we stored for each pixel included mean spectral
distance and a blended land cover. The blended land cover was created by
storing the majority composite land cover type. We assigned a mixed type
where there was no majority.

In step 1 of the initial estimation process, we used the composite land cover
data to influence the nearest neighbor selection by limiting the available
entries in the training list to only those with similar cover types
(conifer, mixed, or deciduous). While the GLUT and NLCD were the most
current state and/or national scale data sets publicly available, they were
not current to the dates of the TM imagery used in this study and required
fine-tuning to bring them up to the current common timeline. We transformed
the late-season TM image to NDVI and generated two derivatives. The first
(NDVI$_{F}$) contained the NDVI information for all forested pixels
represented in the composite land cover dataset; the remaining non-forested
pixels were masked out. The second derivative (NDVI$_{NF}$) contained NDVI
information for the non-forested pixels represented in the composite land
cover, with the forested pixels masked out. We used a series of
thresholds and visual inspections of both NDVI derivatives to create a
current-timeline (1) forested mask and (2) land cover layer (LCOV). We
removed from the forested mask the areas that were originally classified as
forest but that, through visual inspection of the NDVI$_{F}$ data, were
determined to be non-forested. Conversely, we added to the forested
mask the areas we determined to be wrongly classified as non-forest in the
NDVI$_{NF}$ data. We assigned the forested pixels the blended land cover
label to create the LCOV layer.

\subsection {Mean-balancing to the FIA mean volume per-hectare}

The objective of the Mean Balancing process is to remove any potential bias
in the estimated mean by adjusting individual pixel estimates up or down so
that the TM-based mean for an area, in this study a Landsat scene, equals the
mean of the FIA plot measurements from the same area. We implemented two
balancing methods. In the first method, which we refer to in this paper
as \textit{ordered Mean Balancing}, scaling is based on each pixel's Euclidean spectral distance (ESD), and cells
with large ESD values are adjusted more often. Throughout the iterative
process, pixels with the largest ESD are adjusted
first. In each subsequent pass, the ESD threshold for pixel selection and
adjustment is lowered to include a larger number of pixels. Some pixels,
especially those with a large ESD, may be adjusted multiple times, while
others may not be adjusted at all. Each TM scene was processed
separately, as were the conifer, mixed, and deciduous cover types denoted
in the LCOV dataset. We followed this protocol:

\begin{enumerate}
	\item calculate the TM-based mean volume per hectare (VAC$_{L}$) for a TM scene, including only cells attributed with the current LCOV type (conifer, deciduous, or mixed);
	\item calculate the mean volume per hectare of the FIA plots (VAC$_{F}$) that fall within the same TM scene and are attributed with the current LCOV type (conifer, deciduous, or mixed);
	\item select the pixels with an ESD equal to or larger than the ESD threshold, and either
	\begin{enumerate}
		\item adjust the selected pixels by the ratio of the maximum FIA plot volume per hectare to the maximum estimated volume per hectare represented in this set of pixels if VAC$_{L}$ is less than VAC$_{F}$, or
		\item decrease the selected pixel values by 2.5{\%} if VAC$_{L}$ is greater than VAC$_{F}$;
	\end{enumerate}
	\item recalculate VAC$_{L}$;
	\item repeat steps 3 and 4 until VAC$_{L}$ is within 2{\%} of VAC$_{F}$; and then
	\item rescale all pixels by the ratio of VAC$_{F}$ to VAC$_{L}$ to ensure that the mean of the balanced volume per hectare pixel estimates for the area equals the FIA's estimate for the same area.
\end{enumerate}
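
Steps 3 through 6 can be summarized in symbols. The following is only a sketch: the notation is introduced here for illustration and is not part of the protocol itself. Let $\hat{v}_i$ be the current estimate for pixel $i$, $d_i$ its ESD, $\tau$ the current ESD threshold, $S=\{i : d_i \geq \tau\}$ the set of selected pixels, and $v^{F}_{\max}$ the maximum FIA plot volume per hectare. One pass then updates
\[
\hat{v}_i \leftarrow
\begin{cases}
\hat{v}_i\,\dfrac{v^{F}_{\max}}{\max_{j \in S}\hat{v}_j}, & i \in S,\ \mathrm{VAC}_{L} < \mathrm{VAC}_{F},\\[1.5ex]
0.975\,\hat{v}_i, & i \in S,\ \mathrm{VAC}_{L} > \mathrm{VAC}_{F},\\[0.5ex]
\hat{v}_i, & i \notin S,
\end{cases}
\]
after which VAC$_{L}$ is recalculated and $\tau$ is lowered. Assuming that ``within 2{\%}'' is measured relative to VAC$_{F}$, the passes stop once $|\mathrm{VAC}_{L}-\mathrm{VAC}_{F}| \leq 0.02\,\mathrm{VAC}_{F}$, and step 6 applies $\hat{v}_i \leftarrow \hat{v}_i\,\mathrm{VAC}_{F}/\mathrm{VAC}_{L}$ to every pixel.
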
In the second method, pixel values were scaled proportionally by the ratio
of the FIA target mean, VAC$_{F}$, to the TM-based mean volume per hectare,
VAC$_{L}$. In this paper, we refer to this approach as \textit{proportional Mean Balancing}.
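
In symbols, with $\hat{v}_i$ denoting the initial estimate for pixel $i$ and $N$ the number of pixels of the current LCOV type in the scene (notation introduced here for illustration only), proportional scaling amounts to
\[
\hat{v}_i' = \hat{v}_i\,\frac{\mathrm{VAC}_{F}}{\mathrm{VAC}_{L}},
\qquad
\frac{1}{N}\sum_{i=1}^{N}\hat{v}_i' = \frac{\mathrm{VAC}_{F}}{\mathrm{VAC}_{L}}\cdot\frac{1}{N}\sum_{i=1}^{N}\hat{v}_i = \mathrm{VAC}_{F},
\]
so the target mean is reached in a single pass. Because every pixel is multiplied by the same factor, zero-volume pixels remain zero and the standard deviation of the estimates changes by that same factor.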

\subsection {Fusion of industry and initial KNN estimates}

We demonstrate an additional improvement to our spatially explicit inventory
with the fusion of information from a high-intensity ground-based inventory
of industrial pine sites in the central Georgia, path 18, row 37 scene. The
goal of this process was to incorporate those measurements we think are
highly accurate into our TM-based volume estimates and preserve them
throughout the balancing process. Equalization was implemented on a
stand-by-stand basis where only the pixels within an inventoried stand were
adjusted. Pixel estimates for areas outside these managed areas were not
modified. For each stand individually, we:

\begin{enumerate}
	\item determined the mean of the initial KNN estimates for a given stand, then
	\item adjusted the initial KNN estimates within its stand boundary by the ratio of the industry mean to the KNN mean,
	\item reset the ESD measure for each of the pixels within the given stand boundary to zero (indicating a very accurate estimate), and then
	\item re-ran the ordered Mean Balancing routine for the entire scene.
\end{enumerate}
By resetting the ESD measures within each stand to zero, we reduce the
likelihood, but do not eliminate the possibility, that an individual estimate
will be adjusted during the Mean Balancing process.
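
For a single stand $s$, the first two steps can be sketched as follows, where $P_s$ is the set of $n_s$ pixels inside the stand boundary, $\hat{v}_i$ is the initial KNN estimate for pixel $i$, and $\bar{v}^{I}_{s}$ is the industry mean volume per hectare for the stand (symbols assumed here for illustration):
\[
\bar{v}^{K}_{s} = \frac{1}{n_s}\sum_{i \in P_s}\hat{v}_i,
\qquad
\hat{v}_i' = \hat{v}_i\,\frac{\bar{v}^{I}_{s}}{\bar{v}^{K}_{s}}
\quad (i \in P_s).
\]
The stand mean of the adjusted estimates then equals $\bar{v}^{I}_{s}$ before the scene is re-balanced. The sketch assumes $\bar{v}^{K}_{s} > 0$.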

\subsection {Assessment}

We present the leave-one-out RMSE associated with each optimal K (Fig. \ref{fig5}) as
a measure of accuracy of the initial KNN estimation process. Additionally,
we calculated mean absolute errors (MAE) (Eq. 4) for the Mean Balancing
results. We also present summaries of the Mean Balancing processes for each
scene for the pine type contained in LCOV for the initial KNN estimates and
the Mean Balanced estimates.

\begin{equation}
MAE=\frac{\sum_{i=1}^{n}\left | \hat{y}_i-y_i \right |}{n}
\end{equation}
where $y_i$ is the ground-observed measurement for sample $i$, assumed to be true; $\hat{y}_i$ is the predicted value for sample $i$; and $n$ is the total number of samples.

We present an assessment of the estimates generated by 1) the initial KNN,
2) the two Mean Balancing approaches, and 3) the industry-fused and Mean
Balanced process for the central Georgia scene, path 18, row 37. The field
measurements and GIS data obtained from our industrial partners were not
used in the first two estimation routines. Therefore, we use the RMSE and
MAE calculated across each industrial stand as an assessment of their
accuracy based on an independent, albeit limited in terms of forest type,
data source. The industrial data are an integral part of the industry-fused
process, so we do not consider them suitable samples for independent
validation of that process. However, we present their summaries to confirm the improvement
in prediction accuracy achieved through this process.

Finally, to demonstrate the varying results one would attain by querying
1) the standard FIA database, 2) the initial KNN, 3) both Mean Balanced, and
4) the industry-fused data, we present the results of a series of queries at
varying scales. We first present summaries for Hancock County, Georgia, for
the conifer type. There are 97 industrial stands, covering approximately 2,023
hectares, located in the county. The final two summaries are centered at
33.3141 degrees north and 82.9368 degrees west with radii of
0.5 mile (203.4 hectares) and
3.5 miles (9,967.1 hectares). There are five industry stands within the
3.5-mile radius that encompass 83 hectares and one stand less than 8.1
hectares in size within the 0.5-mile radius.
\section{Results}

The path 19, row 39 scene is located in the extreme southwestern part of the
state. The image covers approximately 74,866.9 hectares of forested land and
contains 69 FIA plots. The scene with the next smallest coverage of the
state, path 18, row 39, encompasses 1.2 million hectares of forested land
and 339 forested FIA plots. Due to the small number of plots and a
relatively large percentage of overlap by adjacent scenes, nearly 88{\%}, we
processed the path 19, row 39 data using K=1. While we scaled the volume per
hectare estimates to its FIA scene mean using the same approach as the other
scenes, we used the estimates from the adjacent scenes in the overlapping
areas (path 19, row 38 and path 18, row 39) when possible. Unless specified,
the following sections focus on the remaining 11 scenes used in this study.

\begin{figure}[htb]
	\centerline{\includegraphics[width=.48\textwidth]{Lowe_etal_optimalk.eps}}
	\caption{Root-mean squared error measures for K=1 to K=20 for the 12 TM scenes that were generated during the determination of the optimal K}
	\label{fig5}
\end{figure}

\subsection {Selection of the optimal K}

The leave-one-out KNN assessment of cubic-meter volume per hectare based on
the training data revealed an initial decrease in RMSE as the number of
neighbors was increased. The gain in accuracy continued from K=3 to K=10 and
then leveled off (Fig. \ref{fig5}, Tab. \ref{tab6}). Root-mean squared error values for the
optimal K ranged from 55.3 m$^{3}$/ha, or 45{\%} of the FIA mean, for path 19,
row 39, to 88.9 m$^{3}$/ha, or 73{\%} of the FIA mean, for path 18, row 36
(Tab. \ref{tab6}).

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Optimum K and resulting combined-type RMSE for each TM scene used in the study.}
	\begin{tabular}{cccc}
		\hline
		Path / Row & Optimal K & RMSE (m$^3$/ha) & Relative RMSE \\
		\hline
		17 / 37 & 4     & 81.8  & 67\% \\
		17 / 38 & 5     & 73.8  & 60\% \\
		17 / 39 & 6     & 73.9  & 60\% \\
		18 / 36 & 8     & 88.9  & 73\% \\
		18 / 37 & 10    & 71.3  & 58\% \\
		18 / 38 & 8     & 87.2  & 71\% \\
		18 / 39 & 3     & 56.1  & 46\% \\
		19 / 36 & 6     & 74.9  & 61\% \\
		19 / 37 & 6     & 68.3  & 56\% \\
		19 / 38 & 5     & 71.9  & 59\% \\
		19 / 39 & 1     & 55.3  & 45\% \\
		20 / 36 & 6     & 67.8  & 55\% \\
		\hline
	\end{tabular}%
	\label{tab6}%
\end{table*}%

The compression of the range of initial volume per hectare estimates is
apparent in this study. Initial volume per hectare estimates assessed on the
entries of the training list data ranged from 0 to 388.6 m$^{3}$/ha, less
than half of the range of the FIA measurements (Fig. \ref{fig6}). The KNN-derived
mean for the training list entries was 21{\%} below the mean calculated from
the FIA data, 101.7 m$^{3}$/ha and 129.5 m$^{3}$/ha, respectively.

The northern Georgia path 18, row 36 scene yielded the largest RMSE (88.9
m$^{3}$/ha) and the southern Georgia scene, path 18, row 39, produced the
smallest (56.1 m$^{3}$/ha). There were approximately 9{\%} more forested
hectares, a total of 10.6 million hectares, in LCOV than reported by the
initial composite land cover data. There are 5.2 million hectares of conifer
represented in LCOV, 4.6 million hectares of deciduous, and 2.9 million
hectares of mixed forest type.

\subsection {Model assessment}

The summaries shown below are products of an assessment made simultaneously
on the training list samples compiled during the estimation processes where
the training list entries were treated as a separate list of pixels in need
of an estimate. The cover type designations used in these summaries were
assigned by the LCOV data layer.

Initial KNN point estimates of conifer volume per hectare (m$^{3}$) were, on
average, 22{\%} below the 2010 common timeline FIA estimates (Tab. \ref{tab7}); the
ordered Mean Balanced estimates were 13{\%} below the 2010 common timeline FIA
estimates; and the proportional Mean Balanced estimates were, on average,
26{\%} above the 2010 common timeline FIA estimates. Minimum RMSE for the
initial KNN, 69.9 m$^{3}$/ha, and for both the ordered and proportional Mean
Balanced processes, 72 m$^{3}$/ha and 56.3 m$^{3}$/ha, respectively,
occurred in the southern Georgia 18/39 scene. However, the maximum RMSE
occurred in a different scene for each model. The maximum RMSE and MAE for
the initial KNN process occurred in the northern Georgia scene 18/36,
with an RMSE of 101.3 m$^{3}$/ha. The maximum RMSE for the
ordered Mean Balancing approach, 131.5 m$^{3}$/ha, occurred in the extreme
northwestern Georgia scene 20/36, and that for the proportional Mean Balancing
approach, 115.9 m$^{3}$/ha, occurred in the extreme north-central Georgia
scene 19/36.

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Comparison of mean volume estimates (m$^{3}$/ha), RMSE (m$^{3}$/ha), and MAE (m$^{3}$/ha) for training list entries for the: i) conifer Forest Inventory and Analysis (FIA) plot measurements; ii) initial KNN estimates; iii) ordered Mean Balanced (Ordered MB) estimates; and iv) proportional Mean Balanced (Proportional MB) estimates. MAE is mean absolute error.}
    \begin{tabular}{rcccccccccc}
    \hline
    &  FIA     & \multicolumn{3}{c}{KNN} & \multicolumn{3}{c}{Ordered MB} & \multicolumn{3}{c}{Proportional MB} \\
    \cline{3-11}    \multicolumn{1}{c}{Path / Row} & Mean   & Mean  & RMSE  & MAE   & Mean  & RMSE  & MAE   & Mean  & RMSE  & MAE \\

    \hline
    \multicolumn{1}{c}{17 / 37} & 125.5 & 102.2 & 91    & 65.4  & 100.1 & 108.8 & 79.5  & 145.9 & 74.5  & 58.7 \\
    \multicolumn{1}{c}{17 / 38} & 111.3 & 86.7  & 82.3  & 59.6  & 94.3  & 88.9  & 65.7  & 129.2 & 73.2  & 56.7 \\
    \multicolumn{1}{c}{17 / 39} & 97.3  & 72.2  & 74.5  & 57.2  & 81.3  & 87.7  & 68.4  & 115.7 & 65.4  & 53 \\
    \multicolumn{1}{c}{18 / 36} & 150.7 & 106.4 & 101.3 & 78.6  & 125.2 & 129.9 & 107   & 196.8 & 91.1  & 73.9 \\
    \multicolumn{1}{c}{18 / 37} & 124.3 & 107.8 & 87.7  & 64.7  & 108.9 & 95.7  & 70.1  & 148.6 & 71.2  & 57.1 \\
    \multicolumn{1}{c}{18 / 38} & 105.5 & 76.5  & 88.8  & 64.2  & 90.6  & 109.2 & 83.3  & 126.2 & 69.9  & 54.1 \\
    \multicolumn{1}{c}{18 / 39} & 101.7 & 82.4  & 69.9  & 51.4  & 70.7  & 72    & 54.5  & 120.8 & 56.3  & 45.3 \\
    \multicolumn{1}{c}{19 / 36} & 138.7 & 68.2  & 100.6 & 74.1  & 126.9 & 128.6 & 103.5 & 211.2 & 115.9 & 94 \\
    \multicolumn{1}{c}{19 / 37} & 125.3 & 95    & 90.8  & 65.9  & 112.1 & 98.8  & 74    & 155.7 & 74.5  & 59.1 \\
    \multicolumn{1}{c}{19 / 38} & 105.2 & 80.9  & 75.8  & 58.4  & 83.5  & 81.4  & 64.8  & 128   & 60.3  & 47.8 \\
    \multicolumn{1}{c}{19 / 39} & 108.2 & 132.5 & 17.9  & 5.1   & 121.2 & 35.3  & 29.7  & 124   & 43.3  & 36.9 \\
    \multicolumn{1}{c}{20 / 36} & 114.8 & 76.4  & 83.5  & 65.1  & 109.7 & 131.5 & 108.3 & 179.5 & 107.5 & 88.2 \\
    \hline
    		\end{tabular}%
	\label{tab7}%
\end{table*}%


\begin{figure}[htb]	
	\centerline{\includegraphics[width=.5\textwidth]{Lowe_etal_histo.eps}}
	\caption{Histogram comparing the distributions of the FIA and the remotely sensed estimates made during the initial KNN process.}
	\label{fig6}
\end{figure}

The greatest differences between the initial KNN and both Mean Balanced
processes occurred in the extreme north-central Georgia scene 19/36 and the
extreme northwestern Georgia scene 20/36. In scene 19/36, the ordered Mean Balancing
procedure increased the mean estimate, assessed at each FIA sample point, by
86{\%}, and the proportional Mean Balancing process increased it by more
than 209{\%}. The smallest difference between the initial KNN and both Mean
Balancing processes occurred in the central Georgia scene 18/37, with a
difference of less than 2{\%} for ordered Mean Balancing and less than
38{\%} for the proportional Mean Balancing process (Tab. \ref{tab7}).
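
As a check on these percentages, the relative change in the mean can be recomputed from the rounded scene 19/36 means in Table \ref{tab7} (initial KNN 68.2, ordered Mean Balanced 126.9, and proportional Mean Balanced 211.2 m$^{3}$/ha). The symbol $\Delta$ is shorthand introduced here for illustration:
\[
\Delta_{\mathrm{ord}} = \frac{126.9-68.2}{68.2} \approx 0.86,
\qquad
\Delta_{\mathrm{prop}} = \frac{211.2-68.2}{68.2} \approx 2.10,
\]
that is, increases of about 86{\%} and 210{\%}, respectively.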

\subsection {Scene-wide summaries}

Summaries of the entire initial KNN and Mean Balanced estimated surfaces
follow. All forested pixels are included in these results. The cover type
specifications were assigned by the LCOV data layer.

Mean conifer volume per hectare estimates before ordered Mean Balancing were
on average 26{\%} lower than the target FIA mean. Thirty-nine percent of all
the conifer-classified pixels in the state required adjustment to attain
equalization. Four scenes needed adjustments to 100{\%} (Tab. \ref{tab8}) of their
conifer-classified areas, while the other eight scenes required adjustments
to 20{\%} or fewer. In total, 2,304,005 conifer hectares across the 12
scenes (Tab. \ref{tab8}) were scaled. After ordered Mean Balancing, conifer volume
per hectare estimates ranged from 0 m$^{3}$/ha to 795 m$^{3}$/ha with a mean of 115.9
m$^{3}$/ha, compared to 0-395.6 m$^{3}$/ha and a mean of 85.8 m$^{3}$/ha
before processing. All conifer pixels were scaled during the proportional
Mean Balancing process, yielding a range of volumes from 0 m$^{3}$/ha to 808
m$^{3}$/ha (Tab. \ref{tab9}).

The initial conifer mean in the northern scene, path 19, row 36, was 34{\%}
below the FIA target (Tab. \ref{tab8}). In order to raise that scene's conifer mean
to the appropriate level, 100{\%} of the conifer pixels (375,378 hectares)
had to be adjusted during the ordered Mean Balancing process. This
resulted in the range of data being increased from 0-395.6 m$^{3}$/ha to
0-795.3 m$^{3}$/ha with a mean of 138.8 m$^{3}$/ha, which is equal to the
FIA's. One hundred percent of the data were adjusted during the proportional
Mean Balancing process, which yielded the target mean of 138.8 m$^{3}$/ha and
a similar range of estimates from 0 to 808 m$^{3}$/ha (Tab. \ref{tab9}). However, the
standard deviation was more than twice as large as that from the ordered
Mean Balancing process, 116.6 m$^{3}$/ha and 50.2 m$^{3}$/ha, respectively.

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Area of conifer forestland adjusted on the pixel level during the ordered Mean Balancing processes.}
    \begin{tabular}{rccccc}
    	\hline
    	& Initial Mean & Adjusted Area & Adjusted Area & Adjusted Mean & Max. Est. \\
    	\multicolumn{1}{c}{Path / Row} &  (\% diff. from FIA Mean) & (ha)  & (\%)  &  (St. Dev.) (m$^{3}$/ha) &  (m$^{3}$/ha) \\
    	\hline
    	\multicolumn{1}{c}{17 / 37} & -13\% & 7,481.40 & 2\%   & 125.5 (68.6) & 637 \\
    	\multicolumn{1}{c}{17 / 38} & -20\% & 78,690.80 & 6\%   & 111.3 ( 66.1) & 625 \\
    	\multicolumn{1}{c}{17 / 39} & -23\% & 62,303.40 & 12\%  & 97.3 (50.1) & 381 \\
    	\multicolumn{1}{c}{18 / 36} & -32\% & 162,672.80 & 100\% & 150.7 (66.0) & 472 \\
    	\multicolumn{1}{c}{18 / 37} & -12\% & 42,196.60 & 4\%   & 124.3 (67.7) & 606 \\
    	\multicolumn{1}{c}{18 / 38} & -27\% & 1,182,838.10 & 100\% & 105.4 (66.9) & 244 \\
    	\multicolumn{1}{c}{18 / 39} & -39\% & 33,265.60 & 12\%  & 101.7 (50.2) & 300 \\
    	\multicolumn{1}{c}{19 / 36} & -34\% & 375,378.60 & 100\% & 138.8 (50.2) & 795 \\
    	\multicolumn{1}{c}{19 / 37} & -30\% & 132,418.10 & 20\%  & 125.4 (82.8) & 570 \\
    	\multicolumn{1}{c}{19 / 38} & -23\% & 54,152.70 & 12\%  & 105.2 (60.6) & 414 \\
    	\multicolumn{1}{c}{19 / 39} & -2\%  & 6.5   & $<$1\%  & 108.2 (59.9) & 279 \\
    	\multicolumn{1}{c}{20 / 36} & -49\% & 172,600.60 & 100\% & 114.8 (82.6) & 273 \\
    	\hline
    \end{tabular}%

	\label{tab8}%
\end{table*}%

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Area of conifer forestland adjusted on the pixel level during the proportional Mean Balancing processes.}
    \begin{tabular}{rccccc}
    	\hline
    	& Initial Mean & Adjusted Area & Adjusted Area & Adjusted Mean & Max. Est. \\
    	\multicolumn{1}{c}{Path / Row} &  (\% diff. from FIA Mean) & (ha)  & (\%)  &  (St. Dev.) (m$^{3}$/ha) &  (m$^{3}$/ha) \\
    	\hline
    	\multicolumn{1}{c}{17 / 37} & -13\% & 463,276 & 100\% & 125.5 (68.6) & 426 \\
    	\multicolumn{1}{c}{17 / 38} & -20\% & 1,394,528 & 100\% & 111.3 ( 66.1) & 439 \\
    	\multicolumn{1}{c}{17 / 39} & -23\% & 491,604 & 100\% & 97.3 (50.1) & 330 \\
    	\multicolumn{1}{c}{18 / 36} & -32\% & 158,057 & 100\% & 150.7 (66.0) & 473 \\
    	\multicolumn{1}{c}{18 / 37} & -12\% & 1,032,216 & 100\% & 124.3 (67.7) & 360 \\
    	\multicolumn{1}{c}{18 / 38} & -27\% & 1,132,081 & 100\% & 105.4 (66.9) & 407 \\
    	\multicolumn{1}{c}{18 / 39} & -39\% & 258,125 & 100\% & 101.7 (50.2) & 290 \\
    	\multicolumn{1}{c}{19 / 36} & -34\% & 339,060 & 100\% & 138.8 (50.2) & 808 \\
    	\multicolumn{1}{c}{19 / 37} & -30\% & 644,264 & 100\% & 125.4 (82.8) & 490 \\
    	\multicolumn{1}{c}{19 / 38} & -23\% & 434,758 & 100\% & 105.2 (60.6) & 347 \\
    	\multicolumn{1}{c}{19 / 39} & -2\%  & 43,342 & 100\% & 108.2 (59.9) & 280 \\
    	\multicolumn{1}{c}{20 / 36} & -49\% & 146,237 & 100\% & 114.8 (82.6) & 453 \\
    	\hline
    \end{tabular}%

	\label{tab9}%
\end{table*}%

\subsection {Fusion of industrial data in path 18, row 37}

The path 18, row 37 mean of the initial KNN-based estimates for conifer
volume per hectare was more than 12{\%} below the 2010 common timeline FIA
estimate (Tab. \ref{tab8}). After scaling, the scene-wide conifer means were all nearly
equal to the 18/37 FIA target ($\pm$0.2{\%}). The maximum conifer pixel
estimates for the ordered Mean Balanced and industry-fused routines were both
almost twice their FIA and initial KNN counterparts, and the maximum value
yielded by the proportional Mean Balancing routine was 15{\%} larger (Tab.
\ref{tab10}).

The mean stand volume per hectare produced by the initial KNN estimation
routine was 27{\%} below the mean calculated from the industry ground
measurements (Tab. \ref{tab11}), and the range of predicted stand means was half as wide. The
ordered Mean Balancing average was 18{\%} below the industry's
measure, with a range of estimates compressed by almost 17{\%}, and the
proportional Mean Balancing mean was 16{\%} lower, with a range compressed by
almost half. By design, the average stand volume per hectare from the
industry-fused process and the industry measure are nearly equal. While the means are equal, the range
of individual pixel estimates is 10{\%} lower. The initial KNN and the
ordered Mean Balancing process yielded similar RMSE measures of 92.5
m$^{3}$/ha and 96.7 m$^{3}$/ha, and MAE measures of 73.6 m$^{3}$/ha and 78.8
m$^{3}$/ha, respectively (Tab. \ref{tab11}), only slightly higher than those from the
proportional Mean Balanced data. The RMSE and MAE from the industry-fused
process were nearly 60{\%} lower, 10.7 m$^{3}$/ha and 3.8 m$^{3}$/ha,
respectively.

The scatter plots in Figure \ref{fig7} reveal the weak positive relationship between
the industry-observed stand volume per hectare (m$^{3}$) and its remotely sensed
estimates from the initial KNN procedure (Fig. \ref{fig7}A), the proportional Mean
Balancing (Fig. \ref{fig7}B), and the ordered Mean Balancing routine (Fig. \ref{fig7}C). Because
the individual estimates within each industry stand were purposely scaled, there is a
strong positive relationship between the industry measures and the
industry-fused estimates (Fig. \ref{fig7}D). The effects of the scaling that occurred
during the ordered Mean Balancing process (Fig. \ref{fig7}C) are apparent throughout the
extent of the industry measurements. The range of estimates for the
zero-volume samples (i.e., harvested sites) expanded from 0 to just above
139.9 m$^{3}$/ha (Fig. \ref{fig7}A) to 0 to approximately 279.9 m$^{3}$/ha (Fig. \ref{fig7}C).

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Conifer volume per hectare estimates from the 2010 common timeline FIA and generated from the four remote sensing methods for the path 18, row 37 scene.}
	\begin{tabular}{cccc}
		\hline
		Method & MEAN (m$^{3}$/ha) & St. Dev. (m$^{3}$/ha) & Max (m$^{3}$/ha) \\
		\hline
		2010 Common Timeline FIA & 124   & 6     & 311 \\
		Initial KNN & 110   & 60    & 311 \\
		Proportional MB & 124   & 67    & 360 \\
		Ordered MB & 124   & 68    & 606 \\
		Industry Fused & 124   & 83    & 541 \\
		\hline
	\end{tabular}%
	\label{tab10}%
\end{table*}%

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Path 18, row 37 stand-level comparison of mean conifer volume per hectare generated from the four estimates based on remote sensing.}
    \begin{tabular}{ccccccc}
    	\hline
    	& Mean  & St. Dev. & Min   & Max   & RMSE  & MAE \\
    	Method & (m$^{3}$/ha) & (m$^{3}$/ha) & (m$^{3}$/ha) & (m$^{3}$/ha) & (m$^{3}$/ha) & (m$^{3}$/ha) \\
    	\hline
    	Industry & 158.2 & 108.0   & 0.0     & 426.3 & 0.0     & 0.0 \\
    	Initial KNN & 115.6 & 46.0    & 2.0     & 211.6 & 92.5  & 73.6 \\
    	Proportional MB & 133.1 & 53.4  & 3.0     & 253.4 & 83.8  & 67.6 \\
    	Ordered MB & 129.4 & 53.9  & 1.6   & 357   & 96.7  & 78.8 \\
    	Industry Fused & 158.2 & 104.3 & 0.0     & 406.3 & 10.7  & 3.8 \\
    	\hline
    \end{tabular}%

	\label{tab11}%
\end{table*}%

\begin{figure}[htb]
	\centerline{\includegraphics[width=.49\textwidth]{Lowe_etal_scatter.eps}}
	\caption{Scatter plots reflecting the volume per hectare (m$^{3}$) estimates for each industry stand from the A) initial KNN, the B) proportional Mean Balancing, C) the ordered Mean Balancing, and the D) industry-fused processes.}
	\label{fig7}
\end{figure}

\subsection {Multi-scale queries}

Total conifer area reported by the FIA and the area of conifer represented
in LCOV for Hancock County, Georgia, are nearly identical (Tab. \ref{tab12}). The FIA
reports 56,205 hectares of coniferous forestland while LCOV represents
56,196 total conifer hectares. Each of the remotely sensed processes yielded
a mean conifer volume per hectare larger than that reported by the FIA. The
initial KNN process yields a mean volume per hectare of 125.8 m$^{3}$/ha,
17{\%} more than the FIA; ordered Mean Balancing yields 133.5 m$^{3}$/ha,
25{\%} more; proportional Mean Balancing yields 143.2 m$^{3}$/ha, 34{\%}
more; and the industry-fused process yields 135.3 m$^{3}$/ha
(Tab. \ref{tab12}), 26{\%} more than the FIA. The difference between the FIA's
estimate, 213 million m$^{3}$ and the remote sensing estimates for total
conifer volume ranged from 16{\%} to 25{\%}. The initial KNN process yields
247 million m$^{3}$, ordered Mean Balancing 262 million m$^{3}$,
proportional Mean Balancing 247 million m$^{3}$, and the industry-fused
process 266 million m$^{3}$.

The FIA reported 2,269.1 hectares (Tab. \ref{tab13}) of conifer forestland area in
the 3.5-mile radius query area. However, the LCOV layer reports 4,468
hectares of conifer-classified pixels. All remotely sensed conifer volume
per hectare estimates were lower than the FIA's reported value. FIA reports
a volume per hectare of 259.3 m$^{3}$ while the TM-derived data reports a
volume per hectare of 120.3 to 146.7 m$^{3}$ (Tab. \ref{tab13}). FIA reports no
forestland area or volume in the half-mile query (Tab. \ref{tab14}). The remotely
sensed estimates in this query area ranged from a mean conifer volume per
hectare of 142.5 m$^{3}$ from the initial KNN estimate to 163.2 m$^{3}$ from
the proportional Mean Balanced data.

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Conifer volume per hectare estimates generated from the FIA, the initial KNN, Mean Balancing, and the industry-fused methods for Hancock County, Georgia.}
    \begin{tabular}{ccccccc}
    	\hline
    	& Area  & Min   & Max   & Mean  & St. Dev. & Volume \\
    	Method & (ha)  & (m$^{3}$/ha) & (m$^{3}$/ha) & (m$^{3}$/ha) & (m$^{3}$/ha) & (Mil. m$^{3}$) \\
    	\hline
    	FIA Db (Hancock) & 56,205 & NA    & NA    & 107.3 & NA    & 14.9 \\
    	Initial KNN & 56,196 & 0.0     & 289.2 & 125.8 & 53.7  & 17.3 \\
    	Proportional MB & 56,196 & 0.0     & 329.9 & 143.2 & 61.5  & 18.9 \\
    	Ordered MB & 56,196 & 0.1   & 594.1 & 133.5 & 60.2  & 18.3 \\
    	Industry-fused & 56,196 & 0.1   & 590.5 & 135.3 & 63.7  & 18.6 \\
    	\hline
    \end{tabular}%

	\label{tab12}%
\end{table*}%

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Query results from the 3.5-mile radius query to the FIA database, the initial KNN, Mean Balancing, and the industry-fused methods for the conifer type.}
    \begin{tabular}{ccccccc}
    	\hline
    	& Area  & Min   & Max   & Mean  & St. Dev. & Volume \\
    	Method & (ha)  & (m$^{3}$/ha) & (m$^{3}$/ha) & (m$^{3}$/ha) & (m$^{3}$/ha) & (Mil. m$^{3}$) \\
    	\hline
    	FIA Db Query & 2,269 & NA    & NA    & 259.3 & NA    & 1.5 \\
    	Initial KNN & 4,468 & 0.1   & 520   & 129.4 & 65.8  & 1.4 \\
    	Proportional MB & 4,468 & 0.0     & 355.3 & 144.8 & 63.7  & 1.6 \\
    	Ordered MB & 4,468 & 0.0     & 266.5 & 120.3 & 56.7  & 1.3 \\
    	Industry-fused & 4,468 & 0.1   & 538.4 & 130.2 & 71.4  & 1.4 \\
    	\hline
    \end{tabular}%

	\label{tab13}%
\end{table*}%

% Table generated by Excel2LaTeX from sheet 'Sheet1'
\begin{table*}[htbp]
	\centering
	\caption{Query results from the 0.5-mile radius query to the FIA database, the initial KNN, Mean Balancing, and the industry-fused methods for the conifer type.}
    \begin{tabular}{ccccccc}
    	\hline
    	& Area  & Min   & Max   & Mean  & St. Dev. & Volume \\
    	Method & (ha)  & (m$^{3}$/ha) & (m$^{3}$/ha) & (m$^{3}$/ha) & (m$^{3}$/ha) & (Thousand m$^{3}$) \\
    	\hline
    	FIA Db Query & 0     & NA    & NA    & 0     & NA    & 0 \\
    	Initial KNN & 121   & 4.5   & 253.4 & 142.5 & 52.3  & 42.5 \\
    	Proportional MB & 121   & 5.1   & 289.0   & 163.2 & 59.3  & 48.8 \\
    	Ordered MB & 121   & 6.4   & 462.4 & 147.2 & 55.6  & 43.9 \\
    	Industry-fused & 121   & 6.4   & 538.4 & 157.6 & 76.5  & 47.1 \\
    	\hline
    \end{tabular}%

	\label{tab14}%
\end{table*}%

\section{Discussion}

In the study described here, we used the novel approach of Mean Balancing to
remove bias from KNN estimates based on the FIA data and industrial
inventory data, modeled on satellite imagery, for the purpose of
redistributing the FIA pine inventory means to pixel-size areas of pine
forests. We based the approach on the rationale for balancing
an inventory to an unbiased total presented by Iles (2009). In essence, this
approach states that any process resulting in the same total as an unbiased
estimate is itself unbiased. The principal usefulness of this approach is in
fusing the large-area FIA information with other data for the purpose of
obtaining useful small-area estimates while maintaining the statistical
integrity of the landscape-level inventory.

Though Iles (2009) balanced on the total volume reported from a large-area
timber inventory, we balanced on the means reported by the FIA, which
essentially represents the same principle. In this approach, we allowed
individual pixel estimates to adjust upward or downward until the remote
sensing-based mean volume per hectare (m$^{3}$) equalized with the mean
derived from the FIA plot measurements and projected to a 2010 common
timeline. We used two methods of balancing. One was indiscriminate
with respect to any variables and consisted of equal scaling of all estimated values to
achieve the desired mean. The other scaled each
estimate according to its pixel's ESD, thus giving more stability to the
better-predicted estimates while scaling the poorer estimates more. In theory,
the latter approach seems more desirable and has a strong logical basis;
however, in our example it produced less accurate estimates for the testing
data. Based on these results, we conclude that further research is needed in
this area to improve the discriminate algorithm, because it seems that
estimates from better-matched stands should be more accurate than estimates
from mismatched stands, which would suggest that they should be changed
less.

This inventory of Georgia differentiates itself from other large-area,
remote sensing-based inventories in the northeastern United States and
abroad (\citealt{mcroberts10}, \citealt{tomppo08}, \citealt{mcroberts09b})
in the manner in which bias is addressed. Recommendations for minimizing bias include the
incorporation of a weighting factor during the nearest neighbor process
(\citealt{katila06}, \citealt{mcroberts09a}), generalization or segmentation
(\citealt{hyvonen05}, \citealt{woodcock01}), and the careful selection of the optimal K
(\citealt{mcroberts02}) and method of estimation (\citealt{labrecque06}).
We, on the other hand, accept the statistical integrity of the FIA's
large-area reports and conform our measurements to them.

Scaling estimates based solely on the ESD to attain equalization decreased
the local accuracy of our stand-level volume per hectare (m$^{3}$) estimates,
as seen in Figures \ref{fig7}A and \ref{fig7}C. Root-mean squared error increased by 4{\%} and
MAE by 7{\%} (Tab. \ref{tab11}) when compared to the initial KNN estimates. However,
at a more suitable summary unit for the FIA, scene-level summaries of mean
volume per hectare estimates from the balanced models were nearly equal.
Furthermore, after incorporating the small-area forest inventory, our local
accuracy increased by nearly 2.5 times, while maintaining large-area
per-hectare conformity with the FIA (Tab. \ref{tab10}).

Several issues requiring further assessment were identified during this
research. First, we did not explore balancing to the total volume. Our
rationale for using the mean as the target is that volume per
hectare is invariant to total area. Total volume, on the other hand, is a
function of forestland area and, unlike volume per hectare, fluctuates as
that area changes. However, total volume is the measure the FIA reports, so
the issue should be addressed.
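
The distinction can be sketched with notation introduced here only for
illustration: let $A$ be the forestland area (ha), $\bar{v}$ the mean
volume per hectare, and $V$ the total volume. Then
\begin{equation*}
V = \bar{v}\,A , \qquad \frac{\Delta V}{V} = \frac{\Delta A}{A}
\quad \text{for fixed } \bar{v},
\end{equation*}
so that, assuming the mean is held at its balanced value, any relative
error in the forestland area estimate carries over one-for-one into the
total volume.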

Second, there is room for more complete utilization of the small-area
measurements. This study only leveraged the information from our industry
partners within their stand boundaries. The high-resolution ground
information, however, can be used for estimates across the entire scene. For
instance, Sivanpillai (2006) used similar high-resolution forest
measurements in conjunction with remote sensing and multivariate regression
to estimate age and density for a site in eastern Texas, and \citet{meng09b} used
high-resolution forest information and satellite imagery with geostatistical
techniques for a large-area forest inventory.

Natural resource managers have a growing amount of data available for
incorporation into their decision-making and management processes.
Whatever the source, whether the product of a small forest
inventory designed for a locally accurate estimate, a report based on
sparsely located plots adequate for large-area approximations, or even
a bit of information that a foreman `knows' is true and that, for that reason
alone, must be included in the analysis, each contains useful
information. We used the total balancing concept to assimilate those
seemingly unrelated yet useful bits of information into our
high-resolution, spatially explicit inventory for the state of Georgia. The
inventory retains the FIA's unbiased nature across large areas for volume
per hectare (m$^{3}$); however, unlike the FIA, our inventory also maintains
the local accuracies provided by our forest industry partners.


\section*{Acknowledgments}
We are grateful to the two anonymous reviewers who provided helpful comments on an earlier version of this manuscript.

\addtolength{\textheight}{-3.251truein}

%\input{Lowe_etal_MultisourceKNN_MCFNS_Accepted-26Sept2014.bbl}
\input{LoweEtAlBibliography.bbl}

%% \vspace{.2in}
%% \section*{Biographical Note}

\label{docend}
\end{document}