Annals of R.S.C.B., ISSN:1583-6258, Vol. 25, Issue 6, 2021, Pages. 13633 – 13638 Received 25 April 2021; Accepted 08 May 2021.

House Price Forecasting Using Machine Learning

Mohammad Anas Adil, Rohankumar, Ashutosh Singh Tomar Galgotias University, Greater Noida

Abstract – Because it is difficult to predict fair prices in the real estate market, we use machine learning to estimate property costs more accurately. The purpose of this paper is to predict the market value of a property. The system helps determine the starting price of a property according to geographical variables. Future prices are predicted by analyzing past market patterns, price ranges, and upcoming developments. This study predicts house prices in the city of Mumbai using a decision tree regressor. This will help customers invest in a property without approaching an agent. The results of this study show that the decision tree regressor achieves an accuracy (R² score) of 89%.

Keywords - Decision tree regressor, machine learning.

1. INTRODUCTION

Every organization in today's real estate business is working to gain a competitive edge over its competitors. There is a need to simplify the process for the average person while giving the best results. This paper proposes a system that predicts housing prices using a machine learning algorithm. If you are going to sell a house, you need to know what price tag to put on it, and a computer algorithm can give you an accurate estimate. The regression model is designed to predict the price not only of ready-to-sell housing but also of housing under construction.

Regression is a machine learning tool that helps you make predictions by learning, from existing statistical data, the relationship between your target parameter and a set of independent parameters. Under this definition, the price of a house depends on parameters such as the number of rooms, the accommodation, and so on. If we apply a learning algorithm to these parameters, we can estimate housing prices in a given geographical region.

The target feature of the proposed model is the price of the property, and the independent features are: number of bedrooms, number of bathrooms, carpet area, built-up area, floor, age of the property, zip code, and latitude and longitude of the property. In addition to these factors, which are essential for predicting house prices, we have included two others: air quality and crime rate. These factors contribute significantly to the prediction because higher values of these factors lead to lower house prices.

The entire implementation is written in the Python programming language. The prediction model is built with the decision tree regressor from the "Scikit-learn" library. GridSearchCV helps find the best maximum depth for constructing the decision tree. Once the trained model is ready, it is integrated with the user interface using Flask (a Python framework).

2. RELATED WORK

The value of a particular property depends on the infrastructure surrounding it. In recent years, several authors have proposed a variety of techniques for finding the best property for a customer. Lakshmi and Raghunandhan [1] discussed the basic concepts of data mining, how it works, and the algorithms that support predictive tasks. The key question is which machine learning algorithm is best suited to predicting house prices accurately. Environmental factors often determine the price of a house as well; Manjula et al. [2] introduce the various factors used when predicting prices with good accuracy using a regression model. A. Varma et al. [3] designed a system that used real-time neighborhood data from Google Maps to obtain accurate real-world valuations.

Researchers have also shown that there is a link between visual appearance and non-visual attributes such as crime statistics, house prices, population density, etc. For example, "City Forensics: Using Visual Elements to Predict Non-Visual City Attributes" [4] uses visual attributes to predict the value of a property. Hujia Yu and Jiafu Wu (2016) [5] used classification and regression algorithms.

According to their analysis, the living area in square feet, the roof material, and the neighborhood are the most statistically significant factors in estimating the sale price of a home. The prediction analysis can also be improved by applying PCA. Li Li and Kai-Hsuan Chu (2017) [6] studied algorithms such as the backpropagation neural network (BPN) and the radial basis function (RBF) neural network. The RBF and BPN models were applied to identify differences between housing price indexes, such as the Cathay and Sinyi price indexes, and to obtain a complex function from macroeconomic analysis.

Nihar Bhagat, Ankit Mohokar, and Shreyash Mane (2016) [7] studied linear regression for house price forecasting. The aim of their paper is to predict efficient real estate prices for customers with respect to their budgets and priorities. Future house prices are predicted by analyzing past market trends and price ranges.

3. SYSTEM DESIGN AND ARCHITECTURE

Phase 1: Data collection

There are many methods and techniques for collecting data. We collected data on Mumbai real estate from various property websites. The data includes attributes such as location, carpet area, built-up area, age of the property, zip code, etc. The quantitative data we collect must be structured and categorized. Data collection is required before any kind of machine learning research can be done. The dataset must also be valid; otherwise there is no point in analyzing it.

Phase 2: Data processing

Data preprocessing is the process of cleaning our dataset. The dataset may contain missing values or outliers. These can be handled by data cleaning. If a variable has many missing values, we either drop those values or replace them with the average value.
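The cleaning step described above can be sketched as follows. This is a minimal illustration using pandas, not the authors' code: the column names and the sample values are hypothetical, and mean imputation is only one of the options the text mentions.

```python
import numpy as np
import pandas as pd

# Hypothetical sample records; the column names are assumptions,
# not the schema of the dataset used in the paper.
df = pd.DataFrame({
    "carpet_area": [650.0, 820.0, np.nan, 1100.0],
    "age": [5.0, np.nan, 12.0, 3.0],
    "price": [95.0, 120.0, 80.0, np.nan],
})

# Rows without a target price cannot be used for supervised training.
df = df.dropna(subset=["price"])

# Replace the remaining missing feature values with the column mean.
feature_cols = ["carpet_area", "age"]
df[feature_cols] = df[feature_cols].fillna(df[feature_cols].mean())

print(df)
```

Dropping rows is reasonable when only a few records are affected; imputing the mean keeps the row but flattens the variation in that column, so the choice depends on how many values are missing.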

Phase 3: Model training

Because the data is divided into two parts, the training set and the test set, we must first train the model. The training set includes the target variable. The decision tree algorithm is applied to the training set. The decision tree builds a regression model in the form of a tree structure.

Phase 4: Testing and integration with the UI

The trained model is applied to the test dataset and house prices are predicted. The trained model is then integrated with the front end using Flask in Python.

Fig 1. The generic flow of development

A. REGRESSION TECHNIQUES:

While developing this model, various regression techniques were studied. SVM, random forest, linear regression, multiple linear regression, decision tree regressor, and KNN were all tested on the training dataset. Of these, the decision tree regressor provided the highest accuracy in predicting house prices. The choice of algorithm depends largely on the size and type of the data used. The decision tree algorithm is well suited to our dataset.
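A comparison of this kind can be sketched with scikit-learn's cross-validation utilities. The sketch below is an assumption about how such a comparison might be run, not the procedure used in this paper: it uses a synthetic dataset from `make_regression`, default hyperparameters, and 5-fold cross-validated R² as the score.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption, not the Mumbai dataset).
X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

candidates = {
    "Linear regression": LinearRegression(),
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(n_estimators=50, random_state=0),
    "KNN": KNeighborsRegressor(),
    "SVM": SVR(),
}

# Mean 5-fold cross-validated R^2 for each candidate model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    for name, model in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

The ranking on synthetic linear data says nothing about the ranking on real housing data; the point is only that every candidate is scored on the same folds with the same metric.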

B. DECISION TREE REGRESSOR:

The decision tree regressor examines the attributes of each record and trains a model in the form of a tree that predicts a meaningful continuous output for future data. The maximum depth of the tree is chosen from the depth graphs, and the system analyzes the data accordingly.

GridSearchCV is an approach to parameter tuning that methodically builds and evaluates a model for every combination of algorithm parameters specified in a grid. Here, GridSearchCV is used to find the best value of the maximum depth with which to build the decision tree.
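A minimal sketch of this search is given below, assuming a synthetic dataset, a candidate depth range of 1 to 10, and 5-fold cross-validation scored by R²; none of these settings are stated in the paper.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption).
X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=42)

# Candidate depths to try; the range 1..10 is an illustrative choice.
param_grid = {"max_depth": list(range(1, 11))}

search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    scoring="r2",  # score each depth by cross-validated R^2
    cv=5,
)
search.fit(X, y)

best_depth = search.best_params_["max_depth"]
print("Best max_depth:", best_depth)
```

After fitting, `search.best_estimator_` holds a tree refitted on all the supplied data with the selected depth, so it can be used directly for prediction.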

C. INTEGRATION WITH FLASK

After the model has been built and produces satisfactory results, the next step is to integrate it with the UI, which we do using Flask. Flask is a lightweight web framework with which we can build a web application. Routes are easy to define in Flask, and the framework is commonly used to serve Python models.
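The integration can be sketched as a single Flask route that accepts property features and returns the predicted price. This is an illustrative sketch only: the `/predict` route, the JSON field names, and the toy two-feature model are assumptions, not the interface of the system described here.

```python
from flask import Flask, jsonify, request
from sklearn.tree import DecisionTreeRegressor

# Toy model trained on made-up values; the feature order
# (bedrooms, carpet_area) is an assumption for illustration.
model = DecisionTreeRegressor(max_depth=3, random_state=0)
model.fit(
    [[1, 400], [2, 650], [3, 900], [4, 1200]],
    [40.0, 70.0, 100.0, 130.0],
)

app = Flask(__name__)


@app.route("/predict", methods=["POST"])
def predict():
    # Read the features from the JSON body sent by the front end.
    payload = request.get_json()
    features = [[payload["bedrooms"], payload["carpet_area"]]]
    price = float(model.predict(features)[0])
    return jsonify({"predicted_price": price})
```

In practice the model would be trained offline and loaded at startup rather than fitted inside the web module; the app can then be served with Flask's development server, for example by calling `app.run()`.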

4. RESULT AND DISCUSSION

A. Data preprocessing: a data mining technique in which we convert raw data into more meaningful data.

Missing values in the age and floor parameters were handled. The target attribute was also separated from the training features. The pandas library is used for this purpose. To summarize the dataset statistically, the minimum, maximum, standard deviation, and mean of the target attribute were computed. We divide our data into a training set, which is 80% of the whole data, and a test set, which is the remaining 20%.
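The summary statistics and the 80/20 split can be sketched as follows. The data frame here is randomly generated and its column names are hypothetical; only the 80/20 ratio and the list of statistics come from the text.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Randomly generated stand-in data; column names are assumptions.
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "bedrooms": rng.integers(1, 5, size=100),
    "carpet_area": rng.uniform(300, 1500, size=100),
    "price": rng.uniform(30, 300, size=100),
})

# Summary statistics of the target attribute.
print(data["price"].agg(["min", "max", "mean", "std"]))

# Separate the target from the features, then split 80% / 20%.
X = data.drop(columns=["price"])
y = data["price"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)
print(len(X_train), "training rows,", len(X_test), "test rows")
```

Fixing `random_state` makes the split reproducible, which matters when comparing models on the same held-out rows.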

Max-depth:

As mentioned earlier, GridSearchCV helps to find the max-depth for the tree. We used Matplotlib to visualize model performance and complexity at different max-depth values.

Following are the visualizations:

Fig 2. Testing max-depth values (1) (On axis: Number of training pts. vs Score)

Fig 3. Testing max-depth values (2) (On axis: Number of training pts. vs Score)

Fig 4. Max-depth value for optimal model
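The numbers behind a max-depth complexity plot of this kind can be computed with scikit-learn's `validation_curve`. The sketch below is an assumption about how such curves could be produced, using synthetic data and depths 1 to 10; it prints the scores instead of plotting them, and the resulting arrays could be passed to Matplotlib.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption).
X, y = make_regression(n_samples=300, n_features=6, noise=15.0, random_state=1)
depths = np.arange(1, 11)

# R^2 on the training folds and on the held-out folds for each depth.
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=1),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="r2",
)

mean_train = train_scores.mean(axis=1)
mean_val = val_scores.mean(axis=1)
for depth, tr, va in zip(depths, mean_train, mean_val):
    print(f"max_depth={depth}: train R^2={tr:.3f}, validation R^2={va:.3f}")
```

A training score that keeps rising while the validation score levels off or drops indicates that the deeper trees are overfitting, which is the reason for limiting the depth.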

B. Fitting the model:

From the Scikit-learn library, a Decision tree regressor is used to train the model. The predict function is used to predict the test set results.
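A minimal sketch of this step is shown below, assuming synthetic data and a placeholder `max_depth` of 5; the depth actually selected for the model is not stated in the text.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the housing data (assumption).
X, y = make_regression(n_samples=250, n_features=6, noise=10.0, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=7
)

# max_depth=5 is a placeholder, not the value reported in the paper.
regressor = DecisionTreeRegressor(max_depth=5, random_state=7)
regressor.fit(X_train, y_train)

# Predict prices for the held-out test set.
y_pred = regressor.predict(X_test)
print("Test R^2:", regressor.score(X_test, y_test))
```

For a regressor, `score` returns the R² of the predictions on the data passed to it, so it is evaluated here on the test set only.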

The following plot shows the predicted vs actual prices, together with the accuracy of the prediction. The green dots represent the actual prices and the blue (zig-zag) line represents the predicted prices of the houses.

Fig 5. Actual vs predicted price graph based on the dataset

Accuracy here refers to the R² score of the regression model.
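For reference, the R² score is defined as 1 - SS_res / SS_tot, where SS_res is the sum of squared differences between actual and predicted values and SS_tot is the sum of squared differences between actual values and their mean. The sketch below computes it by hand on made-up numbers and compares it with scikit-learn's `r2_score`; the values are illustrative and unrelated to the results of this paper.

```python
import numpy as np
from sklearn.metrics import r2_score

# Made-up actual and predicted values, for illustration only.
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 1.9, 3.2, 3.8])

# Residual sum of squares: error left after prediction.
ss_res = np.sum((y_true - y_pred) ** 2)
# Total sum of squares: variation of the actual values around their mean.
ss_tot = np.sum((y_true - y_true.mean()) ** 2)

r2_manual = 1.0 - ss_res / ss_tot
print("manual R^2: ", round(r2_manual, 4))
print("sklearn R^2:", round(r2_score(y_true, y_pred), 4))
```

An R² of 1 means the predictions match the actual values exactly, while 0 means the model does no better than always predicting the mean.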

5. CONCLUSION

In this paper, the decision tree machine learning algorithm is used to construct a model that predicts potential selling prices for any real estate property. Additional features such as air quality and crime rate were included in the dataset to improve the predictions. These features are rarely included in the datasets of other prediction systems, which sets this system apart. They influence people's decisions when purchasing a property, so it is natural to include them when predicting house prices. The trained model is integrated with the user interface using the Flask framework. The system achieves an accuracy (R² score) of 89% when predicting real estate prices.

FUTURE SCOPE

In the future, we plan to present a comparative study of the system's predicted price and the price listed on real estate websites such as Housing.com for the same user input. To make things simpler for the user, we also plan to recommend real estate properties based on the predicted price.

The current dataset covers only the city of Mumbai; expanding it to other cities and states of India is a future goal. To make the system more informative and user-friendly, we will integrate Gmap. This will show neighborhood amenities such as hospitals and schools within a 1 km radius of the given location. These amenities can also be used as features in making predictions, since their presence increases the valuation of a real estate property.

REFERENCES

[1] Lakshmi, B. N., and G. H. Raghunandhan. "A conceptual overview of data mining." 2011 National Conference on Innovations in Emerging Technology. IEEE, 2011.

[2] Manjula, R., et al. "Real estate value prediction using multivariate regression models." Materials Science and Engineering Conference Series. Vol. 263. No. 4. 2017.

[3] A. Varma et al., "House Price Prediction Using Machine Learning And Neural Networks," 2018 Second International Conference on Inventive Communication and Computational Technologies, 2018, pp. 1936-1939.

[4] Arietta, Sean M., et al. "City forensics: Using visual elements to predict non-visual city attributes." IEEE transactions on visualization and computer graphics 20.12 (2014): 2624-2633.

[5] Yu, H., and J. Wu. "Real estate price prediction with regression and classification." CS 229 Autumn 2016 Project Final Report, pp. 1-5, 2016.

[6] Li, Li, and Kai-Hsuan Chu. "Prediction of real estate price variation based on economic parameters." 2017 International Conference on Applied System Innovation (ICASI). IEEE, 2017.

[7] Nihar Bhagat, Ankit Mohokar, Shreyash Mane, "House Price Forecasting using Data Mining," International Journal of Computer Applications, 2016.

[8] N. N. Ghosalkar and S. N. Dhage, "Real Estate Value Prediction Using Linear Regression," 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 2018, pp. 1-5.