Small model, big breakthrough! Neural networks provide insight into spatial heterogeneity and accurately describe complex geographical phenomena

To promote the wider adoption of AI4S, lower the barriers to disseminating the research results of academic institutions, and provide a communication platform for more scholars, technology enthusiasts, and industry organizations, HyperAI has planned a live-stream series called "Meet AI4S". The series invites researchers and organizations deeply involved in AI for Science to share their research results and methods in video form.

In the first episode of the "Meet AI4S" series, we were fortunate to invite Ding Jiale, a Ph.D. student in remote sensing and geographic information systems at Zhejiang University. His laboratory, the Zhejiang Provincial Key Laboratory of Resources and Environmental Information System, has published a number of high-value research results in national high-tech fields such as digital earth, geographic information systems, remote sensing, and global positioning systems.

In this session, Dr. Ding Jiale presented his latest research under the title "Neural Networks Provide a New Explanation for the Spatial Heterogeneity of Housing Prices". The study builds the osp-GNNWR model by combining a spatial proximity measure optimized by a neural network (OSP) with the geographically neural network weighted regression (GNNWR) method. By solving the spatially non-stationary regression relationship between the dependent and independent variables, the model describes complex spatial processes and geographical phenomena more accurately.

Click here to view the full replay ⬇️

https://www.bilibili.com/video/BV14W42197on/

Without departing from the original meaning, HyperAI has compiled and summarized Dr. Ding Jiale's talk below.

Starting from model interpretability to advance the future development of science

As explorers of geography, if all we produce is a model that simply predicts house prices, the result seems uninteresting to me. What we pursue is to use the spatially varying regression coefficients output by these models to give reasonable scientific explanations of geographic processes or geographic models, so that the research is more forward-looking and practical. With this vision in mind, I chose "Neural Networks Provide a New Explanation for the Spatial Heterogeneity of Housing Prices" as today's topic.

Some time ago, our team published a research paper titled "A neural network model to optimize the measure of spatial proximity in geographically weighted regression approach: a case study on house price in Wuhan" in International Journal of Geographical Information Science, a well-known journal in the field of geographic information science.

Paper address:

https://www.tandfonline.com/doi/full/10.1080/13658816.2024.2343771

In this study, we introduce a neural network that nonlinearly couples multiple spatial proximity measures between observation points (such as Euclidean distance and travel time) to obtain an optimized spatial proximity measure (OSP), thereby improving the accuracy of the model's housing price predictions.

Two problems arise here: "spatial proximity" is an abstract quantity for which no loss function can be constructed directly, and the neural network is therefore difficult to train. To solve them, we combine OSP with the Geographically Neural Network Weighted Regression (GNNWR) method to build the osp-GNNWR model, which trains the neural network by solving the spatially non-stationary regression relationship between the dependent and independent variables. In the end, the model showed better global performance and described complex spatial processes and geographical phenomena more accurately.

Next, I will use this work as an example to show how neural networks can provide a new explanation for the spatial heterogeneity of housing prices.

Research Background: Scientific breakthroughs under the dual challenges

"Spatial heterogeneity" is the key factor causing the fluctuation of housing prices, but a single distance measure struggles to capture the spatial heterogeneity of housing prices in complex geographical environments. Traditional geographically weighted regression (GWR) models also face challenges in measuring spatial proximity. These are the reasons we chose to conduct this study.

Spatial heterogeneity: the differential expression of different spaces

First, let me introduce the background of spatial heterogeneity and geographically weighted regression.

Ordinary linear regression, usually fitted by ordinary least squares (OLS), is the most commonly used and fundamental statistical method for determining the regression relationship between variables. It describes the relationship between the dependent variable and multiple independent variables with a very concise formula, as shown in the figure below: y equals an intercept term plus the sum of the products of the regression coefficients and the independent variables.
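
For reference, the relationship described above is conventionally written as follows. The notation is an assumption, since the original figure is not reproduced here: \beta_0 is the intercept, \beta_k are the regression coefficients, x_{ik} is the k-th independent variable of sample i, and \varepsilon_i is the error term.

```latex
y_i = \beta_0 + \sum_{k=1}^{p} \beta_k x_{ik} + \varepsilon_i, \qquad i = 1, \dots, n
```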

When we apply statistical methods such as OLS to geography, we often need to account for the inherent spatial characteristics of geographical problems, which is why research on spatial statistics and spatiotemporal modeling has emerged.

Ordinary linear regression assumes that the regression coefficients are independent of the spatial and temporal location of the sample data, so each estimated coefficient is an average over the whole study area.

However, the regression relationship in a real geographical process differs from one spatial location to another. Taking housing prices as an example, the main factors influencing the price of the same type of house differ between the city center and the suburbs, so the regression relationship takes different forms as well. We call this characteristic spatial heterogeneity (spatial non-stationarity).

Spatial heterogeneity is an inherent feature of the relationships between geographic elements: those relationships or structures differ across spatial and temporal locations. It means that the mechanism generating the data differs from one location to another, which shows up either as a different form of regression model or as parameters that change with spatial position.

Geographically weighted regression: The transformation from spatial proximity to weights is achieved through a kernel function

Geographically weighted regression (GWR) is a modeling method for spatially heterogeneous processes proposed by Academician A. Stewart Fotheringham of the United States and his colleagues.

As the formula below shows, the overall form of GWR is still a linear regression, but its intercept and regression coefficients become functions of the coordinate position (ui, vi). In other words, the regression relationship differs from one coordinate position to another, so the relationship expressed by the whole formula changes with spatial location.
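
In the conventional notation (assumed here, with the same symbols as ordinary linear regression), the GWR model for sample i located at (u_i, v_i) reads:

```latex
y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i
```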

The regression coefficients of GWR are harder to determine than those of OLS. The most commonly used solution today is weighted least squares, which is analogous to the OLS solution.

In the formula below, the diagonal weight matrix W weights the samples to reflect the spatial correlation between them. Specifically, the weights are calculated from the spatial proximity of the samples: two points that are closer in space are more strongly correlated, so we assign them a larger weight in the model.
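
The weighted least squares estimator referred to here is usually written as below. The symbols are the conventional ones and are assumed rather than taken from the original figure: X is the design matrix with a leading column of ones, \mathbf{y} is the vector of observations, and w_{ij} is the weight of sample j when estimating at location i.

```latex
\hat{\boldsymbol{\beta}}(u_i, v_i) = \left( X^{\top} W(u_i, v_i)\, X \right)^{-1} X^{\top} W(u_i, v_i)\, \mathbf{y},
\qquad W(u_i, v_i) = \operatorname{diag}(w_{i1}, w_{i2}, \dots, w_{in})
```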

How is spatial proximity converted into weights? GWR uses a kernel function, such as a Gaussian or bisquare kernel, to convert spatial proximity into a weight and thereby construct the weight matrix. However, this approach has certain limitations.
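
To make the kernel step concrete, here is a minimal NumPy sketch of the two kernels named above and of a local weighted least squares fit at one location. The function names, the bandwidth parameter, and the exact kernel forms (one common variant of each) are assumptions for illustration, not the speaker's implementation.

```python
import numpy as np

def gaussian_kernel(d, bandwidth):
    """Gaussian kernel (one common form): weights decay smoothly with distance."""
    return np.exp(-0.5 * (d / bandwidth) ** 2)

def bisquare_kernel(d, bandwidth):
    """Bisquare kernel: weights drop to exactly zero at and beyond the bandwidth."""
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d < bandwidth, w, 0.0)

def gwr_coefficients(coords, X, y, target, bandwidth, kernel=gaussian_kernel):
    """Weighted least squares estimate of the local coefficients at `target`."""
    d = np.linalg.norm(coords - target, axis=1)   # Euclidean distance to every sample
    w = kernel(d, bandwidth)                      # diagonal entries of W
    Xd = np.column_stack([np.ones(len(X)), X])    # design matrix with intercept column
    XtW = Xd.T * w                                # X^T W without building the n x n matrix
    return np.linalg.solve(XtW @ Xd, XtW @ y)     # (X^T W X)^{-1} X^T W y
```

Calling `gwr_coefficients` once per sample location yields one coefficient vector per location, which is what makes the fitted relationship vary over space. The single `bandwidth` value is the only tunable part of the kernel.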

In the past, the key to modeling spatially heterogeneous processes was to design a spatiotemporal weight kernel function based on a measure of spatiotemporal proximity, then use local weighted regression theory to set up the non-stationary objective function, and finally model the spatiotemporal non-stationary relationship by optimizing a model evaluation criterion.

The limitation is that existing kernel functions are built around a single parameter and have a relatively simple structure. It is difficult for them to fully capture the complex effect of spatiotemporal proximity on spatiotemporal weights, so the spatiotemporal non-stationarity of complex geographical relationships cannot be solved accurately.

With the development of big data in recent years, a feasible way out of this dilemma in spatiotemporal relationship modeling is to exploit the massive data now available, make efficient use of the nonlinear fitting ability of deep neural networks, and use neural networks to explain spatial heterogeneity.

How can neural networks be used to explain spatial heterogeneity?

By incorporating SWNN, GNNWR has stronger generalization ability

Previously, we proposed the geographically neural network weighted regression model (GNNWR), which uses a deep neural network, the spatially weighted neural network (SWNN), to assign a set of spatial weights to the samples at each location.

GNNWR Paper Address:

https://doi.org/10.1080/13658816.2019.1707834

Specifically, SWNN takes the distance vector from each sample point to the other sample points as input, and outputs a series of spatial weights at that position, that is, the weight matrix W, so as to express the spatial heterogeneity.

To generalize well on smaller samples and converge faster during training, the GNNWR method multiplies the weights output by SWNN with the global regression coefficients obtained beforehand by OLS, producing spatially heterogeneous regression coefficients.

The regression equation obtained from the figure above is composed of the independent variables, the global regression coefficients, and the spatial non-stationarity adjustment parameters at each observation point. On this basis, we establish a neural-network-based spatial regression model for solving spatially non-stationary processes.
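
Written out, the multiplication described above gives an equation of the following form. The notation is assumed for illustration: w_k(u_i, v_i) is the weight SWNN outputs for coefficient k at location i, \beta_k^{\mathrm{OLS}} is the global OLS coefficient, and x_{i0} = 1 carries the intercept.

```latex
y_i = \sum_{k=0}^{p} w_k(u_i, v_i)\, \beta_k^{\mathrm{OLS}}\, x_{ik} + \varepsilon_i, \qquad x_{i0} = 1
```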
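
The structure just described can be sketched as a forward pass in NumPy. This is only an illustration under stated assumptions: the one-hidden-layer architecture, the layer sizes, the initialization, and all function names are hypothetical, and no training is shown.

```python
import numpy as np

def ols_coefficients(X, y):
    """Global OLS coefficients, intercept first."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    return beta

def init_swnn(n_samples, n_coef, hidden=16, seed=0):
    """Randomly initialised one-hidden-layer SWNN (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_samples, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.1, (hidden, n_coef)),
        "b2": np.ones(n_coef),  # weights start near 1, so predictions start near OLS
    }

def swnn_forward(params, dist):
    """Map each row of distances (to all training samples) to one weight per coefficient."""
    h = np.maximum(dist @ params["W1"] + params["b1"], 0.0)  # ReLU hidden layer
    return h @ params["W2"] + params["b2"]

def gnnwr_predict(params, dist, X, beta_ols):
    """Prediction y_hat_i = sum_k w_k(u_i, v_i) * beta_k^OLS * x_ik with x_i0 = 1."""
    Xd = np.column_stack([np.ones(len(X)), X])
    local_beta = swnn_forward(params, dist) * beta_ols       # spatially varying coefficients
    return np.sum(local_beta * Xd, axis=1), local_beta
```

Because the network only rescales the OLS coefficients, a model whose weights are all 1 reproduces the global OLS fit exactly. This is one way to read the faster convergence mentioned above: training starts from a sensible baseline rather than from scratch.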

Optimize spatial proximity metrics with neural networks

As mentioned earlier, SWNN takes as input the distance vector from each sample point to the other sample points. Here we generally use Euclidean distance, that is, the length of the straight line between two points in space, which is the most intuitive and easy-to-understand expression of distance.

However, in an urban environment, natural barriers and traffic conditions mean that Euclidean distance often fails to reflect actual spatial proximity. For example, if you want to reach the other side of the Qiantang River and cannot take the highway bridge, you have to make a long detour. In this case, although the straight-line distance between the two points is short, they are far apart in practice, and Euclidean distance does not fully reflect their spatial proximity.

In the real world, people and goods usually move through the road network, so road network distance (ND) and travel duration (TD) are also appropriate measures of spatial proximity.

However, because of traffic rules and road capacity limits, the same road network distance or the same travel time does not always represent the same spatial proximity. For example, in 13 minutes of driving you can cover only a short distance on ordinary city streets, but a long distance on a viaduct.

A single measure of spatial proximity therefore has certain limitations. We try instead to establish a distance fusion function that couples multiple distance measures together to characterize spatial proximity as well as possible.
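
Such a fusion function can be written compactly as follows. The notation is assumed for illustration: d^{(1)}_{ij}, \dots, d^{(m)}_{ij} are the m distance measures between points i and j (for example Euclidean distance and travel time), and f_{SP} is the fusion function.

```latex
\mathrm{OSP}_{ij} = f_{SP}\left( d^{(1)}_{ij}, d^{(2)}_{ij}, \dots, d^{(m)}_{ij} \right)
```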

Based on the above equation, we couple several "distances" between two points to form a better and more accurate representation of the true spatial proximity between two points.

But this equation also has a problem: the fusion function FSP has to unify distance measures of different dimensions. For example, travel time and Euclidean distance have inherently different units, and their orders of magnitude may differ greatly, so ordinary functions alone cannot couple them effectively. We therefore construct a spatial proximity neural network, SPNN, which maps these distances into a unified spatial proximity measure.

By training this neural network, the design of a specific function is turned into a data-driven fitting process. This is the idea behind using neural networks to optimize spatial proximity.
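
As a rough illustration of what such a network could look like, the sketch below rescales each distance measure and passes every pair of points through a small multilayer perceptron. The max-scaling step, the layer sizes, the softplus output, and the function names are all assumptions, not details taken from the paper.

```python
import numpy as np

def init_spnn(n_measures, hidden=8, seed=0):
    """Randomly initialised one-hidden-layer SPNN (hypothetical sizes)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.5, (n_measures, hidden)),
        "b1": np.zeros(hidden),
        "W2": rng.normal(0.0, 0.5, (hidden, 1)),
        "b2": np.zeros(1),
    }

def spnn_forward(params, measures):
    """Fuse m distance measures per pair of points into one proximity value.

    measures has shape (n, n, m); the result has shape (n, n).
    """
    # Rescale each measure to at most 1 so that metres and minutes become comparable.
    scaled = measures / measures.max(axis=(0, 1), keepdims=True)
    h = np.tanh(scaled @ params["W1"] + params["b1"])
    out = h @ params["W2"] + params["b2"]
    # Softplus keeps the fused proximity non-negative.
    return np.logaddexp(0.0, out[..., 0])
```

With this scaling, expressing travel time in seconds instead of minutes leaves the output unchanged, which addresses the unit mismatch in the simplest possible way. The nonlinear coupling itself lives in the hidden layer, whose parameters still need a training signal.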

Connect the two neural networks to form the osp-GNNWR

Spatial proximity is an abstract concept with no ground truth. Given a point a and a point b, for example, we cannot say that the spatial proximity between them is some definite value x. As a result, no loss function can be defined for SPNN on its own, and it cannot be trained in isolation.

Our solution is to connect SPNN directly to GNNWR, using the output of SPNN as the distance input, so that the two neural networks form a unified whole. We call it geographically neural network weighted regression with an optimized spatial proximity measure (osp-GNNWR).

With this model, we can train the entire network directly on the error of the sample estimates, using the error between the fitted value and the true value of the dependent variable y as the loss function. When the whole network is trained, the SPNN at the front is trained along with it, which solves the SPNN training problem and completes the regression task at the same time.
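
The chaining idea can be checked numerically with a toy example. The sketch below uses random stand-in data and hypothetical layer sizes, computes the regression loss of the chained networks, and uses a finite difference to show that the loss responds to a change in an SPNN weight. It does not train anything; in practice an automatic differentiation framework would supply the gradients.

```python
import numpy as np

def mlp(x, W1, b1, W2, b2):
    """One-hidden-layer network with tanh activation."""
    return np.tanh(x @ W1 + b1) @ W2 + b2

def osp_gnnwr_loss(spnn, swnn, measures, X, y, beta_ols):
    """Mean squared error of the chained SPNN -> SWNN -> regression model."""
    osp = mlp(measures, *spnn)[..., 0]          # (n, n) fused proximity, unconstrained for brevity
    weights = mlp(osp, *swnn)                   # (n, p + 1) spatial weights
    Xd = np.column_stack([np.ones(len(X)), X])
    y_hat = np.sum(weights * beta_ols * Xd, axis=1)
    return np.mean((y - y_hat) ** 2)

rng = np.random.default_rng(0)
n, p, m, h = 30, 2, 2, 8                        # toy sizes, not the paper's
measures = rng.random((n, n, m))                # stand-ins for scaled distance measures
X = rng.random((n, p))
y = 1.0 + X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=n)
beta_ols, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), X]), y, rcond=None)

spnn = [rng.normal(0, 0.5, (m, h)), np.zeros(h), rng.normal(0, 0.5, (h, 1)), np.zeros(1)]
swnn = [rng.normal(0, 0.1, (n, h)), np.zeros(h), rng.normal(0, 0.1, (h, p + 1)), np.ones(p + 1)]

# Finite-difference check: nudging one SPNN weight changes the regression loss,
# so the error on y can serve as a training signal for the SPNN.
base = osp_gnnwr_loss(spnn, swnn, measures, X, y, beta_ols)
spnn[0][0, 0] += 1e-4
grad_estimate = (osp_gnnwr_loss(spnn, swnn, measures, X, y, beta_ols) - base) / 1e-4
```

The point of the check is that `grad_estimate` is computed purely from the error on y: no target value for spatial proximity is ever needed.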

Taking Wuhan housing prices as an example, the osp-GNNWR provides a new explanation for the spatial heterogeneity of housing prices

Taking Wuhan housing prices as an example, we selected 968 independent second-hand housing transaction records in Wuhan and divided them into a training set and a test set in a ratio of 85:15. Using the hedonic price method commonly used in housing price modeling, we selected 10 independent variables in 3 categories, covering basic information about the houses, surrounding facilities, and transportation convenience. On this basis, we chose Euclidean distance and travel time as the input distances for SPNN to build the osp-GNNWR model.
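
One simple way to produce an 85:15 split of 968 records is a random shuffle of the sample indices, sketched below. The seed, the rounding rule, and the function name are assumptions; the source does not say how the split was actually drawn.

```python
import numpy as np

def split_indices(n_samples, train_fraction=0.85, seed=42):
    """Shuffle sample indices and split them into training and test sets."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(n_samples * train_fraction))
    return order[:n_train], order[n_train:]

train_idx, test_idx = split_indices(968)
```

Splitting indices rather than rows keeps the housing attributes and the pairwise distance matrices aligned, since the same index arrays can select from both.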

For the optimized spatial proximity measure, as shown in the figure below, the color of each point represents the difference in the residuals of the fitted results, with orange indicating that osp-GNNWR fits better than the original GNNWR model. The lines represent the difference between the optimized spatial proximity and the Euclidean distance.

As Figure A shows, OSP differs considerably from Euclidean distance in the urban fringe, and the difference varies with direction because of the road network structure. In particular, the difference is low in the direction of the red arrow. This is mainly because that direction coincides with the Wuhan Second Ring Expressway, where the Euclidean distance and the travel time used to construct the OSP are both small.

As can be seen in Figure B, in the center of the city, due to the well-developed transportation facilities, the spatial proximity of different directions is relatively balanced, so the difference between the OSP and Euclidean distances shows a regular concentric circle distribution.

Through these differences between OSP and Euclidean distances, we are also able to demonstrate the practical significance of optimizing spatial proximity metrics.

Based on the housing price modeling results, we can further discuss the spatial heterogeneity of the regression coefficients, for example by studying the impact of distance to universities on housing prices.

As shown in the chart below, the UA parameter in the center of Hongshan District, Wuhan, is significantly higher than in other areas, indicating that universities have a positive effect on housing prices there: the closer to educational institutions, the higher the price. In addition, these universities and research institutes have brought a more livable environment and created a more prosperous rental market.

Small models also matter

We did not use large models in the research above. Large neural network models and deep network models are very popular now, but small models still have practical significance. When computing power is limited and dataset samples are not abundant, designing a small, elegant model can also go a long way toward solving some problems.

Finally, here are some references. If you are interested, you can check them out.

Call for contributions

HyperAI (hyper.ai) is the largest search engine in the field of data science in China. It has long focused on the latest research results in AI for Science and has interpreted more than 100 academic papers from top journals.

We welcome research groups and teams working on AI for Science to contact us to share their latest research results, submit in-depth interpretation articles, or take part in the Meet AI4S live series. More ways to promote AI4S are waiting for us to explore together!

Add WeChat: Neuro Star (WeChat ID: Hyperai01)
