
# Blog entry - Predicting City Bicycle Station Occupancy
The City of Helsinki runs a popular city bicycle scheme. In recent years it has been extended to cover the neighboring city of Espoo as well. Residents of both cities, even non-cyclists, are generally happy with the system. It is well run and orderly, because bicycles must be returned to bicycle stations instead of being left lying around on sidewalks and in parks.
However, finding an empty bicycle station when one has planned a ride is an obvious source of dissatisfaction for a city bike customer. It is also foreseeable that, without a reliable model for predicting bicycle demand, stations have to be refilled reactively, which generates overhead and friction in the organization responsible for running the bicycle operation.
The [Introduction to Data Science course](https://courses.helsinki.fi/en/data11001/124843910) at the University of Helsinki includes a mini-project assignment on a topic of the students' choice. Our project team (all keen city bicycle users) felt that this was a good opportunity to contribute to improving the bicycle scheme further - for our own convenience, and for the common good. We wanted to demonstrate that, using openly available data, it is possible to create a machine learning model capable of predicting bicycle demand with useful accuracy.
The City of Helsinki publishes bicycle usage statistics as open data sets ([trip data](https://www.hsl.fi/hsl/avoin-data) and [station data](https://www.avoindata.fi/data/fi/dataset/hsl-n-kaupunkipyoraasemat)). Additionally, open programming interfaces are provided for querying the station status. These data, combined with weather, geolocation and demographic data, allowed us to build a model that predicts with reasonable accuracy which stations are likely to run out of bikes on any given day. The most critical hours can be identified, and maintenance actions planned accordingly.
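
As an illustration of the first data preparation step, the sketch below aggregates individual trip records into hourly departure counts per station, which is the kind of target a demand model could be trained on. It is a minimal example under stated assumptions, not the project's actual pipeline: the field names `departure_station` and `departure_time`, the function name `hourly_departures`, and the sample rows are all hypothetical.

```python
from collections import Counter
from datetime import datetime


def hourly_departures(trips):
    """Count departures per (station, hour) from an iterable of trip records."""
    counts = Counter()
    for trip in trips:
        # Assumed field names; the real trip data may use different columns.
        departed = datetime.fromisoformat(trip["departure_time"])
        # Truncate the timestamp to the start of its hour.
        hour = departed.replace(minute=0, second=0, microsecond=0)
        counts[(trip["departure_station"], hour)] += 1
    return counts


# Made-up rows for illustration only, not real trip data.
sample_trips = [
    {"departure_station": "Rautatientori", "departure_time": "2021-05-11T08:05:00"},
    {"departure_station": "Rautatientori", "departure_time": "2021-05-11T08:40:00"},
    {"departure_station": "Rautatientori", "departure_time": "2021-05-11T09:10:00"},
    {"departure_station": "Example station", "departure_time": "2021-05-11T08:15:00"},
]

counts = hourly_departures(sample_trips)
for (station, hour), n in sorted(counts.items()):
    print(station, hour.isoformat(), n)
```

Hourly counts like these could then be joined with weather or calendar features by the same `(station, hour)` key before fitting a model.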
## Proof-of-concept system

The data series above shows an example of our model's accuracy. The blue line shows the predicted bicycle departures from the Rautatientori bicycle station (at the main railway station) during May 11th - 17th, 2021. The orange line shows the departures that actually took place. This accuracy was reached with reasonable effort and a limited number of model parameters. We believe this shows that there is potential for very useful results with professional-level development and extended data access.
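
One way to put a number on how closely a predicted series follows the observed one is the mean absolute error (MAE). This post does not state which metric the project used, so the sketch below is only an assumed, common choice; the function name and the sample values are made up for illustration.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between two equally long series."""
    if len(predicted) != len(actual):
        raise ValueError("series must have the same length")
    if not predicted:
        raise ValueError("series must not be empty")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


# Made-up hourly departure counts, not the values behind the chart.
predicted = [12, 30, 45, 20]
actual = [10, 33, 41, 20]

mae = mean_absolute_error(predicted, actual)
print(f"MAE: {mae:.2f} departures per hour")
```

A lower MAE means the predicted line stays closer to the observed one on average, in the same unit as the data (departures per hour).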
To test the model, we created a simple [web user interface](https://city-bikes-react-app.oa.r.appspot.com/) (click to experiment). It lets the user pick a date and time of interest and receive a forecast, like the one below, of the stations with the highest risk of running out of bicycles at that point in time. The UI was implemented for demonstration purposes only. A professional system would clearly require a more sophisticated operations dashboard, but this was beyond the scope of this data science project.
