Data Scientist Assessment
Flight Delay Prediction

Please sign up to be eligible for up to HK$6,000 completion bonus.
This assessment awards an Intermediate level certificate.

You will be given a learning data set of flight delay claims from 2013/01 to 2016/07. The goal is to predict the claim amount for future flights in a hidden data set. Please use any machine learning techniques to make your prediction as accurate as possible. We judge model quality Q using the mean absolute error Q1 and the mean squared error (Brier error) Q2 between the predicted claim and the actual claim:

$$
Q_1 = \frac{1}{N}\sum_{i=1}^{N} \left| \mathrm{prediction}_i - \mathrm{actual}_i \right|
\qquad
Q_2 = \frac{1}{N}\sum_{i=1}^{N} \left( \mathrm{prediction}_i - \mathrm{actual}_i \right)^2
$$
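As a minimal sketch of how the two quality measures can be computed, the snippet below implements Q1 and Q2 in plain Python. The function names and the sample values are illustrative assumptions and are not part of the assessment material.

```python
from typing import Sequence


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Q1: average of |prediction_i - actual_i| over all N flights."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def mean_squared_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Q2 (called Brier error in this brief): average of (prediction_i - actual_i)^2."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted)


# Made-up example: three flights, claim amounts in HK$.
predicted = [0.0, 400.0, 800.0]
actual = [0.0, 800.0, 800.0]
q1 = mean_absolute_error(predicted, actual)
q2 = mean_squared_error(predicted, actual)
print(q1, q2)
```

Because Q2 squares each difference, a single large miss on one flight weighs far more heavily in Q2 than in Q1.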

Bonus point: you are welcome to add extra parameters or data to the learning and testing data sets for more precise prediction. This could become one of our evaluation criteria, since it demonstrates how good you are at crawling other useful data.

Business Objectives

A higher predicted claim amount, capped at $800 (an arbitrary value), should be assigned to high-risk flights. This adequately compensates us for the risk we take and naturally screens out high-risk flights.

A lower predicted claim amount, as low as $0, should be assigned to low-risk flights. This increases conversion among low-risk customers and expands the risk pool.

The optimization should result in a very low absolute error |Expected(is_claim) − Actual(is_claim)| in aggregate, which means we are neither over-charging nor under-charging customers (i.e. we price the risk precisely and dynamically for each customer).
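One possible reading of the aggregated objective is that the total predicted claim should match the total actual claim across the whole portfolio. The sketch below follows that reading; the function name `aggregate_gap` and the sample figures are assumptions made for illustration, and the brief does not define the aggregation precisely.

```python
from typing import Sequence


def aggregate_gap(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Absolute difference between mean predicted and mean actual claim.

    A value near 0 suggests the portfolio as a whole is neither
    over-charged nor under-charged, even if single flights are off.
    """
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return abs(sum(predicted) - sum(actual)) / len(predicted)


# Made-up example: four flights, one of which ends in an $800 claim.
actual = [0.0, 0.0, 0.0, 800.0]
balanced = aggregate_gap([200.0, 200.0, 200.0, 200.0], actual)
underpriced = aggregate_gap([0.0, 0.0, 0.0, 0.0], actual)
print(balanced, underpriced)
```

A small aggregate gap does not by itself imply good per-flight accuracy, so it complements Q1 and Q2 rather than replacing them.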

Judging Criteria

The claim logic for Flight Delay Refund is:

  • If the delay_time field is greater than 3 hours OR equal to 'Cancelled', $800 will be claimed. Otherwise, the claim amount will be $0.
  • We will calculate the result by taking the average absolute error Q1 and the Brier error Q2 (which should be your key optimization target) over all testing data.
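The claim rule above can be sketched as a small function. This is an illustration under two assumptions: that delay_time arrives either as a number or as a string (numeric text or the literal 'Cancelled'), and that "greater than 3 hours" is a strict comparison. The constant and function names are hypothetical.

```python
from typing import Union

CLAIM_AMOUNT = 800.0           # fixed payout stated in the brief
DELAY_THRESHOLD_HOURS = 3.0    # a claim is paid when the delay exceeds this


def claim_amount(delay_time: Union[str, float]) -> float:
    """Return 800 for a cancelled flight or a delay above 3 hours, else 0."""
    # Assumption: cancellation is marked by the text 'Cancelled' (case-insensitive here).
    if isinstance(delay_time, str) and delay_time.strip().lower() == "cancelled":
        return CLAIM_AMOUNT
    # Anything else is treated as a number of hours; float() raises on bad input.
    hours = float(delay_time)
    return CLAIM_AMOUNT if hours > DELAY_THRESHOLD_HOURS else 0.0


for value in ["Cancelled", "3.0", 3.1, "0.2"]:
    print(value, claim_amount(value))
```

A delay of exactly 3 hours yields no claim under the strict comparison assumed here.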

Deliverables

  1. Upload your source code to either GitHub or Bitbucket. Feel free to use any programming language and libraries.
  2. We will download and run your code. Please make sure it is executable and includes instructions on how to set up the environment. In general, we use the following guidelines to assess the submission.
  3. Prepare a short presentation in PowerPoint or PDF format, with your thoughts about the problem, a description of your approach, and next steps, e.g. anything you did not implement or possible areas for improvement. We will mainly judge the content of the presentation, so please focus on that rather than on its design or visual effects.

Skills to be graded

  1. Data Visualization
  2. Machine Learning
  3. Scripting and Command Line
  4. Communication and Business Acumen

Submit Assessment: https://t1.gl/submit-assessment.

Download Data

https://drive.google.com/a/terminal1.co/file/d/1AkEc76q6NbqEojk3BQJEfbx-RIigDCve/view?usp=sharing
(46MB CSV file)

Data Description

| Field name | Description |
| --- | --- |
| flight_id | Unique ID for each flight |
| Flight_no | Flight number of each flight |
| Week | Week of the year in which the departure date falls. For example, a flight departing on 17/1/2018 is in week 3 |
| Departure | Location of departure |
| Arrival | Location of arrival |
| Std_hour | Scheduled departure time, in 24-hour format |
| delay_time | Number of hours delayed, or 'Cancelled' |
| is_claim | Claim amount. Our insurance pays the customer a fixed amount of HK$800 when a delay happens. In your prediction, you can assign any value between $0 and $800 as the expected value of the predicted claim amount. Absolute error and Brier error will then be calculated from the difference between the actual claim amount and your predicted claim amount. |
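Since is_claim is either $0 or $800, one way to produce a prediction between those bounds is to estimate the probability of a claim and multiply it by the payout. The sketch below assumes a hypothetical classifier has already produced that probability; `expected_claim` and its parameters are illustrative names, not part of the data set or of any library.

```python
PAYOUT = 800.0  # fixed claim amount in HK$, as stated in the brief


def expected_claim(p_claim: float, payout: float = PAYOUT) -> float:
    """Expected claim amount for one flight: payout * P(claim).

    The probability is clipped to [0, 1] so the result always
    stays within the allowed range of 0 to `payout`.
    """
    p = min(max(p_claim, 0.0), 1.0)
    return payout * p


# Made-up probabilities for a low-risk, a medium-risk and a high-risk flight.
for p in (0.02, 0.25, 0.9):
    print(p, expected_claim(p))
```

For a squared-error measure such as Q2, the prediction that minimizes the expected error is the conditional mean, which for a 0-or-800 outcome is 800 times the claim probability. For an absolute-error measure such as Q1, the minimizer is the conditional median, which is either 0 or 800, so the two measures can favour different predictions.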

Copyright © 2016-2020 Terminal 1 Limited.