It has been a while since I made my last Git commit to the project repository. Although I have been a bit busy with my day job recently, I will keep the side project rolling. This blog post covers my recent reflections on the project's direction. There are two goals I want to achieve in …
Predicting House Price in Hong Kong #5
In #4, I talked about how to find the hidden APIs of the Centaline Android app. In this article, I am going to show you how to write a scraper to collect the data. You can find the Scrapy crawler in this Git directory. 3 Levels of Data The scraping was done in 3 levels. First, …
Predicting House Price in Hong Kong #4
Date: 26 July 2020 In #3, I ran into a lot of missing values in the property transaction data. After searching the web, I found that Centaline claimed to have spent 10 million HKD on filling in the missing values. Maybe I should try scraping data from Centaline. Luckily, I have scraped Centaline data …
Predicting House Price in Hong Kong #3
Date: 13 July 2020 In part 2, we talked about splitting the data into a training set and a testing set. In part 3, I would like to share some findings from the first data exploration. Big Problem: Missing Data I did not expect so much missing data in some of the key fields. I …
Predicting House Price in Hong Kong #2
In part 1, we talked about scraping transactional data for apartments. In part 2, let’s talk about splitting the data into a training set and a testing set. Why do we need to split? The short answer is that we want to prevent overfitting/memorizing and hope that the trained model can generalize to unseen cases. For example, if …
Predicting House Price in Hong Kong #1
Date: 7 July 2020 Housing is still one of the greatest concerns of Hong Kong people. I think it will be fun to build a model to predict house prices in Hong Kong. Unlike in a Kaggle competition, the data is not readily available. So, the first step, which I am going to share in this post, …