Spatially Explicit Machine Learning Approaches for House Price Models
Abstract
Spatial data or georeferenced data are special in that it has spatial reference, meaning that it is linked with geographic coordinates on Earth. The spatial component allows for the identification of spatial patterns, relationships and trends among spatial objects. Spatial objects are usually not randomly or independently distributed, but spatially autocorrelated. In spatial data analysis, spatial autocorrelation has been well recognized with the advocate of spatial statistical techniques, such as spatial clustering, spatial interpolation, spatial regression, and spatial simulation. However, spatial effects or spatial context is largely absent in mainstream machine learning methods. With the popularity of machine learning in various applications in both industry and academia, a new research area has emerged in the spatial community: spatial explicit machine learning. It refers to the use of machine learning algorithms to analyze and predict spatial data with the explicit integration of spatial effects or patterns. It is expected to improve the model accuracy and prediction by incorporating spatial relationships or patterns in the data that have not been captured by traditional machine learning models and, subsequently, to gain better understanding of the data generation mechanism. This research utilizes Franklin County, OH residential house transaction data to explore three different data-driven approaches to integrate spatial perspectives into traditional machine learning algorithms: 1) imposing spatial constraints on unsupervised learning to delineate spatially constrained housing submarkets ; 2) integrating spatial weights into the cost function of supervised learning to improve house price prediction accuracy; and 3) enhancing data input using spatial feature engineering in tree-based ensemble learning for modeling multiscale spatial effects. It intends to contribute new insights for spatially explicit machine learning to the literature. Overall, three studies explore spatially explicit machine learning methods from three different aspects, and the empirical results show that spatially explicit machine learning methods are preferred over traditional machine learning methods when data have strong positive spatial autocorrelation, or more general, data include spatial information that is important for classification, clustering, or prediction tasks.