
# Modelling Business Space Occupancy

The team, and our skillsets:

| Name | Specialities |
| -------- | -------- |
| Ellie | Data Analysis |
| Bea | Data Analysis, Machine Learning |
| Aldi | Stochastic Analysis |
| Santi | Game Theory, Networks, ML |
| Giulia | Numerical #stuff |

Here are the relevant links:

* [Google drive](https://drive.google.com/drive/folders/1xz9hn_Ex4lOIxfe5UbZ04nyQnrsa_ZTy)
* [Google colab](https://colab.research.google.com/drive/1Jg5LPudsLjbXD2sqhCcMoC0JJRxiA__M)
* [Overleaf](https://www.overleaf.com/project/62bec48688a16fb83d80db23)
* [Ways to import CSV files in Google Colab](https://www.geeksforgeeks.org/ways-to-import-csv-files-in-google-colab/)

## Tuesday

### Initial to do list

1. Clean data (remove repeating group labels)
2. Create some simple plots
    * pie chart of building types
    * building type vs. rate
    * building type vs. distance from city centre
    * distance from city centre vs. rate
    * histogram of length of occupancy
3. Group property categories together
    * Shops
    * Cafes + pubs + restaurants
    * Offices
    * Factories + warehouses + workshops
    * Other (things we aren't interested in, at least for now...)

### Interpreting the data as a graph

We can represent the properties in Luton as nodes of a graph, and connect nodes based on their Euclidean distance (two nodes are connected if their Euclidean distance is < x; we can vary x and see how it affects the graph). Each node will be encoded as occupied/unoccupied. We can additionally encode the property type.

*Hypothesis:* When a property/node becomes unoccupied, it has a knock-on effect on nearby properties.

To simplify things, we can start by looking only at a retail graph, i.e. shops + restaurants.

The first step is to create the graph. We can simplify this process using [GriSPy](https://grispy.readthedocs.io/en/latest/).

### Occupied or Empty Sites

| ![](https://i.imgur.com/rQX07GQ.png) |
|:--:|
| Map of empty (red) and occupied (blue) businesses. |

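
As a rough illustration of the distance-threshold graph described under "Interpreting the data as a graph", here is a minimal sketch. It uses a brute-force pairwise comparison from the Python standard library rather than GriSPy, and the function names, the toy coordinates and the "fraction of vacant neighbours" feature are all assumptions made for illustration, not the code or data we actually used.

```python
import math
from itertools import combinations


def build_threshold_graph(coords, x):
    """Link nodes i and j whenever their Euclidean distance is below x."""
    adjacency = {i: set() for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) < x:
            adjacency[i].add(j)
            adjacency[j].add(i)
    return adjacency


def vacant_neighbour_fraction(adjacency, occupied):
    """Share of each node's neighbours that are unoccupied (None if isolated)."""
    fractions = {}
    for node, neighbours in adjacency.items():
        if not neighbours:
            fractions[node] = None
            continue
        vacant = sum(1 for n in neighbours if not occupied[n])
        fractions[node] = vacant / len(neighbours)
    return fractions


# Toy data (made up, not from the Luton dataset): projected x/y positions
# in arbitrary units and an occupied/unoccupied flag per property.
coords = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (10.0, 10.0)]
occupied = [True, False, False, True]

graph = build_threshold_graph(coords, x=1.5)
fractions = vacant_neighbour_fraction(graph, occupied)
```

The pairwise loop is quadratic in the number of properties, which is fine for experimenting with different values of x on a small subset; a spatial index such as the one GriSPy provides would be the natural replacement for the full dataset.
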

| ![](https://i.imgur.com/pcbSxdy.png) |
|:--:|
| Map of empty (red) and occupied (blue) businesses, combined into one map. |

### Location by business type

![](https://i.imgur.com/bvKuK7d.png)

### Histograms with the new categories

## Wednesday

### Graph

Working on embedding the graph onto a map.

![](https://i.imgur.com/jhPsZsl.png)

* Thinking we can use this for feature mining using the graph structure.

### Random Forest

We built a random forest classifier. The input features are:

* Business type (SHOP AND PREMISES, RESTAURANT AND PREMISES, OFFICE AND PREMISES, FACTORY AND PREMISES, OTHER)
* Rateable value
* Distance from city centre (city centre was chosen to be the town hall)

Example of the random forest tree:

![](https://i.imgur.com/iIkdqGm.png)

*Results*

On the test set we got an accuracy score of 0.72. On the (bootstrapped) ensemble test set we got an accuracy score of 0.76.

The confusion matrix:

![](https://i.imgur.com/XDLDmf6.png)

This is for the first random forest tree and for the results of the ensemble test.

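
For reference, the accuracy score and confusion matrix quoted above are computed from the true and predicted labels roughly as in the sketch below. The label names and the toy predictions are made up for illustration (they are not our results), and the helper functions are hypothetical stand-ins for whatever the modelling library provides.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Rows are true labels, columns are predicted labels, in `labels` order."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy labels (made up, not from the Luton dataset).
labels = ["occupied", "empty"]
y_true = ["occupied", "occupied", "empty", "empty"]
y_pred = ["occupied", "empty", "empty", "empty"]

matrix = confusion_matrix(y_true, y_pred, labels)
score = accuracy(y_true, y_pred)
```

The diagonal of the matrix counts correct predictions, so the accuracy is the diagonal sum divided by the total number of samples.
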
We decided to add additional features:

* Business type (SHOP AND PREMISES, RESTAURANT AND PREMISES, OFFICE AND PREMISES, FACTORY AND PREMISES, OTHER)
* Rateable value (binned into range categories)
* Distance from city centre (city centre was chosen to be the town hall)
* Radial distance from city centre

Example of the random forest tree with the additional features:

![](https://i.imgur.com/1UNGodW.png)

and the new confusion matrix:

![](https://i.imgur.com/wKQ7cVq.png)

### Including data from other sources

Map of Luton colour coded according to the deprivation index:

![](https://i.imgur.com/xCYz8VM.png)

## Thursday

### Random Forest

We decided to add further features:

* Business type (SHOP AND PREMISES, RESTAURANT AND PREMISES, OFFICE AND PREMISES, FACTORY AND PREMISES, OTHER)
* Rateable value (binned into quartiles)
* Distance from city centre (city centre was chosen to be the town hall)
* Radial distance from city centre
* Deprivation index
* House price index

The businesses that the model found to be more likely to become unoccupied are shown in the following figure:

![](https://i.imgur.com/3yRg2Za.png)

We can see some clustering in the location of businesses of the same type. For example, four of the factories in this category (orange) are in the same area, to the west of the city centre along the river. Note that all of the places classified as *vulnerable* were small businesses. They appear to be the ones most sensitive to the other features used in the model.

![](https://i.imgur.com/NhE7AZ3.png)

This map shows the vacant sites that our model predicts to have the highest probability of becoming occupied. Again, we can see patterns in their location. In this case, some of them are aligned along the city's main arteries, which suggests these are the best places for future development.

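
As an aside on the "Rateable value (binned into quartiles)" feature listed above, quartile binning can be sketched with the standard library as follows. The function name and the toy rateable values are assumptions for illustration only, and the cut points here come from `statistics.quantiles` with its default method, which may differ slightly from the binning used in the notebook.

```python
import bisect
import statistics


def quartile_bins(values):
    """Map each value to a quartile index 0..3 using the sample's cut points."""
    cuts = statistics.quantiles(values, n=4)  # three cut points between quartiles
    return [bisect.bisect_right(cuts, v) for v in values]


# Toy rateable values (made up, not from the Luton dataset).
rateable_values = [1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000]
bins = quartile_bins(rateable_values)
```

Binning by quartiles rather than fixed ranges keeps the four categories roughly equally populated, which is one possible reason to prefer it for a skewed quantity such as rateable value.
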
