
Data Scientist Assessment
Conversion Speed Prediction

This assessment is in beta and may be changed.
Please fill our NDA/IP before starting, as it covers our entire interview process.

In this challenge, you are required to analyze the conversion speed of online videos and the factors affecting it. All of the files were taken from youku and converted to multiple formats with FFmpeg. Both incoming and outgoing video attributes, as well as the conversion time, are recorded in data.pdf.

Data Corpus

The source data consists of comma-separated records in PDF format. The first row is a header with the following fields:

id,key,type,length,i,p,i_size,p_size,size,frames,i_width,i_height,i_codec,i_bitrate,i_framerate,o_width,o_height,o_codec,o_bitrate,o_framerate,target

Each row has a unique id. The key field is the movie identifier; the same movie might have been converted several times. The type field is the content category. Example data records are below:

0,1,Gaming,130.35667,27,1537,64483,825054,889537,1564,176,144,mpeg4,54590,12.0,176,144,mpeg4,56000,12.0,0.612
1,1,Gaming,130.35667,27,1537,64483,825054,889537,1564,176,144,mpeg4,54590,12.0,320,240,mpeg4,56000,12.0,0.98
2,1,Gaming,130.35667,27,1537,64483,825054,889537,1564,176,144,mpeg4,54590,12.0,480,360,mpeg4,56000,12.0,1.216
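As a minimal sketch of how such records might be parsed once the text has been pulled out of the PDF, the Python snippet below keeps only lines that match the 21-field header and converts the numeric fields. The function name `parse_records` and the choice of which fields are integers or floats are assumptions based on the three sample rows above, not part of the assessment; the PDF text extraction itself is not shown.

```python
import csv

# Field names copied from the header row of the data corpus.
HEADER = (
    "id,key,type,length,i,p,i_size,p_size,size,frames,i_width,i_height,"
    "i_codec,i_bitrate,i_framerate,o_width,o_height,o_codec,o_bitrate,"
    "o_framerate,target"
).split(",")

# Assumed types, inferred from the sample rows only.
INT_FIELDS = ("id", "key")
TEXT_FIELDS = ("type", "i_codec", "o_codec")
FLOAT_FIELDS = tuple(
    name for name in HEADER if name not in INT_FIELDS + TEXT_FIELDS
)


def parse_records(lines):
    """Yield one dict per data row found in an iterable of text lines.

    Blank lines, repeated header rows and lines that do not have exactly
    len(HEADER) comma-separated fields (for example page numbers left over
    from the PDF) are skipped.
    """
    for fields in csv.reader(lines):
        fields = [field.strip() for field in fields]
        if len(fields) != len(HEADER) or fields == HEADER:
            continue
        row = dict(zip(HEADER, fields))
        try:
            for name in INT_FIELDS:
                row[name] = int(row[name])
            for name in FLOAT_FIELDS:
                row[name] = float(row[name])
        except ValueError:
            # A numeric field did not parse: treat the line as noise.
            continue
        yield row
```

Skipping lines that do not fit the header, rather than failing on them, is one possible design choice; a stricter extractor could instead log or count the rejected lines so that silent data loss is easier to spot.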

Deliverables

  1. You are required to extract the data from the PDF file and analyze it. Submit your extraction code in any programming language, together with instructions on how to run it. Your program should extract the data in a reasonable time (less than 10 minutes).

  2. Then you are required to build a forecasting model to predict movie conversion time based on incoming and outgoing file characteristics. Feel free to use any programming language for this task. Deliverables are source code files and instructions on how to set up and run your program.

    The output of the training phase should be a model file. An example training command is shown below:

    $ cat train.csv | ./train_my_amazing_model > model
    

    The testing phase outputs a predicted conversion time for every row in the test file:

    $ cat test.csv | ./my_amazing_model > predictions.csv
    $ head -n2 predictions.csv
    id,prediction
    0,1.23
    
  3. Finally, you are required to prepare a 20-minute presentation explaining your data extraction and processing, descriptive analysis of the data, forecasting model, and business impact.

    Please provide enough information about why a certain model was chosen, what alternative approaches exist, and how quality could be improved.

    Note that there will be both technical and non-technical people in the audience. Your presentation deliverable should be a video AND an attached presentation in PDF/PPT format. Please prepare for questions.
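The train and predict commands from deliverable 2 could be wired up roughly as in the Python sketch below. It is only an illustrative baseline, not a recommended model: it fits a one-feature least-squares line of `target` against the output frame size (`o_width * o_height`), writes the coefficients as JSON, and emits an `id,prediction` CSV. The function names, the single feature, and the JSON model format are all assumptions made for this example.

```python
import csv
import json


def output_pixels(row):
    # Single illustrative feature: number of pixels in each output frame.
    return float(row["o_width"]) * float(row["o_height"])


def train(train_stream, model_stream):
    """Fit target = intercept + slope * output_pixels by least squares.

    Reads CSV with a header row from train_stream and writes the fitted
    coefficients as JSON to model_stream (the hypothetical model file).
    """
    rows = list(csv.DictReader(train_stream))
    if not rows:
        raise ValueError("no training rows")
    xs = [output_pixels(row) for row in rows]
    ys = [float(row["target"]) for row in rows]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Fall back to predicting the mean if the feature never varies.
    slope = cov_xy / var_x if var_x else 0.0
    model = {"intercept": mean_y - slope * mean_x, "slope": slope}
    json.dump(model, model_stream)
    return model


def predict(model, test_stream, out_stream):
    """Write an id,prediction CSV with one line per row of test_stream."""
    writer = csv.writer(out_stream, lineterminator="\n")
    writer.writerow(["id", "prediction"])
    for row in csv.DictReader(test_stream):
        value = model["intercept"] + model["slope"] * output_pixels(row)
        writer.writerow([row["id"], f"{value:.3f}"])
```

To match the shell commands shown in deliverable 2, a training script would call `train(sys.stdin, sys.stdout)`, and a prediction script would load the JSON model from disk and call `predict(model, sys.stdin, sys.stdout)`. How the prediction script locates the model file is left open here, since the assessment does not specify it.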

Submission

At Terminal 1, we believe that really good analysts are also good developers and presenters, so your code quality, predictive model performance, and presentation structure will all be evaluated.

  1. You can find our grading guidelines at https://t1.gl/review.
  2. Submit your assessment at https://t1.gl/submit-assessment.

Copyright © 2016-2020 Terminal 1 Limited.