Chiang Mai Journal of Science

Print ISSN: 0125-2526 | eISSN : 2465-3845


Housing Price Prediction by Divided Regression Analysis

Yann Ling Goh, Yeh Huann Goh, Chun-Chieh Yip and Kooi Huat Ng
* Corresponding author; e-mail address: gohyl@utar.edu.my
Volume: Vol.49 No.6 (November 2022)
Research Article
DOI: https://doi.org/10.12982/CMJS.2022.102
Received: 17 May 2022, Revised: 28 August 2022, Accepted: 23 September 2022, Published: -

Citation: Goh Y.L., Goh Y.H., Yip C. and Ng K.H., Housing Price Prediction by Divided Regression Analysis, Chiang Mai Journal of Science, 2022; 49(6): 1669-1682. DOI 10.12982/CMJS.2022.102.

Abstract

     Regression analysis is a statistical methodology for investigating the relationship between a dependent variable and one or more independent variables. In the current era of big data, performing statistical analysis on a massive volume of data raises practical problems: the heavy computing load makes the computation time-consuming, and the accuracy of the results may suffer because of the sheer volume of data. Divided regression analysis is therefore proposed to reduce the computing load. This approach divides the dataset into several distinct subsets and fits a multiple linear regression model to each subset. The results obtained from the subsets are then combined into a single divided regression model, which is treated as the model for the original, full dataset. The dataset used in this paper is the KC Housesales Data, obtained from the Kaggle website. It contains statistical information about housing prices, for example, the size of the lot, the size of the living area and the selling price of the house. The goal of this paper is to predict the selling price of a house from the given attributes. The dataset is partitioned into five subsets, and a multiple linear regression model is fitted to each subset. Model adequacy checks are then applied to the fitted models. Testing for multicollinearity is also important, because collinearity among the independent variables affects the overall results; the variance inflation factor (VIF) is used to detect it. Finally, the divided regression model is obtained by combining the results from all the subsets, and its validity is verified.

Keywords: divided regression, multicollinearity, big data
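
The procedure outlined in the abstract (partition into five subsets, fit a multiple linear regression to each, combine the results, and check VIF) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the subset results are combined, so the simple average of the subset coefficients used here is an assumption. The data are synthetic stand-ins rather than the KC Housesales Data, and the function names `fit_ols`, `divided_regression` and `vif` are hypothetical.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit with an intercept; returns [b0, b1, ..., bp]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def divided_regression(X, y, n_subsets=5, seed=0):
    """Fit OLS on disjoint random subsets and combine the coefficients.

    Assumption: the combination rule is a plain average of the
    subset coefficient vectors; the source does not specify the rule.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))            # shuffle row indices
    parts = np.array_split(order, n_subsets)   # disjoint index blocks
    coefs = np.array([fit_ols(X[idx], y[idx]) for idx in parts])
    return coefs.mean(axis=0), coefs

def vif(X):
    """VIF_j = 1 / (1 - R_j^2), from regressing column j on the others."""
    values = []
    for j in range(X.shape[1]):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        coef = fit_ols(others, target)
        fitted = np.column_stack([np.ones(len(others)), others]) @ coef
        ss_res = np.sum((target - fitted) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        values.append(ss_tot / ss_res)         # equals 1 / (1 - R_j^2)
    return np.array(values)

# Synthetic data standing in for housing attributes (illustrative only).
rng = np.random.default_rng(42)
X = rng.normal(size=(5000, 3))
true_beta = np.array([10.0, 2.0, -1.5, 0.5])   # intercept, then slopes
y = true_beta[0] + X @ true_beta[1:] + rng.normal(size=5000)

combined, per_subset = divided_regression(X, y, n_subsets=5)
full = fit_ols(X, y)                           # single fit on all rows
vifs = vif(X)

print("divided regression coefficients:", np.round(combined, 3))
print("full-data coefficients:         ", np.round(full, 3))
print("VIF per predictor:              ", np.round(vifs, 3))
```

On well-behaved data the averaged coefficients should land close to the single full-data fit, which is one simple way to check the validity of the divided model. A VIF well above 1 for a predictor would indicate that it is largely explained by the other predictors.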
