In this challenge you are required to analyze the conversion speed of online videos and the factors affecting it. All of the files were taken from Youku and converted to multiple formats with FFmpeg. Both incoming and outgoing video attributes, as well as conversion time, are recorded in data.pdf.
The source data consists of comma-separated records in PDF format. The first row is a header with the following fields:
Each row has a unique id. The key field represents the movie identifier. The same movie might have been converted several times. The movie type represents the content category. Example data records are below:
You are required to extract the data from the PDF file and analyze it. Submit your extraction code in any programming language and write instructions about how to execute it. Your program should extract the data in a reasonable time (less than 10 minutes).
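As an illustration of the extraction step, here is a minimal sketch in Python. It assumes the text of each PDF page has already been pulled out by a separate tool (for example `pdftotext` from Poppler, which is an assumption and not something the challenge prescribes), and that each record sits on a single line. The function name `parse_pages` is hypothetical. It treats the first non-empty line as the header, skips repeated headers on later pages, and drops stray lines whose field count does not match.

```python
import csv
from typing import Iterable, Iterator


def parse_pages(pages: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per record from per-page text holding comma-separated lines."""
    header = None
    for page_text in pages:
        # Drop blank lines and surrounding whitespace left over from PDF layout.
        lines = [ln.strip() for ln in page_text.splitlines() if ln.strip()]
        for row in csv.reader(lines):
            if header is None:
                header = row      # first non-empty line is taken as the header
                continue
            if row == header:
                continue          # header repeated at the top of a later page
            if len(row) != len(header):
                continue          # assumed stray line (e.g. a page number); skipped
            yield dict(zip(header, row))
```

If the text was produced with `pdftotext data.pdf data.txt`, pages are separated by form-feed characters, so `parse_pages(open("data.txt").read().split("\f"))` would feed the pages in order. Skipping mismatched lines silently is a simplification; counting them would make it easier to verify that no records were lost.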
Then you are required to build a forecasting model to predict movie conversion time based on incoming and outgoing file characteristics. Feel free to use any programming language for this task. Deliverables are source code files and instructions on how to set up and run your program.
The output of the training phase should be a model file. An example of the training command is shown below:
The testing phase outputs a predicted conversion time for every row in the test file:
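The train/test split described above can be sketched as follows. This is not the challenge's own command line, which is not reproduced here; it is a hedged illustration of one possible shape. The column names `conversion_time` and `output_codec`, the JSON model format, and the `train`/`predict` subcommands are all assumptions. The model itself is a deliberately trivial baseline (mean conversion time per group, with a global mean as fallback), meant only to show how a model file is written in one phase and read in the other.

```python
import argparse
import csv
import json
from collections import defaultdict

TARGET = "conversion_time"  # hypothetical name of the column to predict
GROUP = "output_codec"      # hypothetical categorical column to group by


def train(rows):
    """Fit a baseline: mean target per group, plus a global mean as fallback."""
    sums, counts = defaultdict(float), defaultdict(int)
    for r in rows:
        sums[r[GROUP]] += float(r[TARGET])
        counts[r[GROUP]] += 1
    total = sum(counts.values())
    return {
        "global_mean": sum(sums.values()) / total,
        "group_means": {g: sums[g] / counts[g] for g in sums},
    }


def predict(model, rows):
    """Return one predicted conversion time per input row."""
    means = model["group_means"]
    return [means.get(r[GROUP], model["global_mean"]) for r in rows]


def read_rows(path):
    """Load a CSV file with a header row into a list of dicts."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def main(argv=None):
    parser = argparse.ArgumentParser(description="Baseline train/predict sketch")
    sub = parser.add_subparsers(dest="cmd", required=True)
    t = sub.add_parser("train")
    t.add_argument("train_csv")
    t.add_argument("model_json")
    p = sub.add_parser("predict")
    p.add_argument("model_json")
    p.add_argument("test_csv")
    args = parser.parse_args(argv)
    if args.cmd == "train":
        # Training phase: the only output is the model file.
        with open(args.model_json, "w") as f:
            json.dump(train(read_rows(args.train_csv)), f)
    else:
        # Testing phase: print one prediction per row of the test file.
        with open(args.model_json) as f:
            model = json.load(f)
        for y in predict(model, read_rows(args.test_csv)):
            print(y)
```

Calling `main(["train", "train.csv", "model.json"])` followed by `main(["predict", "model.json", "test.csv"])` exercises both phases; adding the usual `if __name__ == "__main__": main()` guard would turn the sketch into a script. A real submission would replace the baseline with a model that uses the numeric input and output attributes.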
Finally, you are required to prepare a 20-minute presentation explaining your data extraction and processing, a descriptive analysis of the data, your forecasting model, and the business impact.
Please provide enough information about why a certain model was chosen, what alternative approaches there are and how quality could be improved.
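One way to support a model choice is to compare candidates on held-out data using a common error metric. The sketch below is an assumption about how that comparison might be done, not a requirement of the challenge. It computes mean absolute error and root mean squared error, plus a simple improvement ratio over a baseline; the function names are hypothetical.

```python
import math


def mae(actual, predicted):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE does."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def improvement_over_baseline(actual, predicted, baseline):
    """Fraction by which the model's MAE is lower than the baseline's MAE."""
    return 1.0 - mae(actual, predicted) / mae(actual, baseline)
```

Reporting both metrics for each candidate model, measured on rows that were not used for training, gives technical and non-technical audiences a concrete basis for the comparison.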
Note that there will be both technical and non-technical people in the audience. Your presentation deliverable should be a video AND an attached presentation in PDF/PPT format. Please prepare for questions.
At Terminal 1 we believe that really good analysts are also good developers and presenters, hence your code quality, predictive model performance, and presentation structure will all be evaluated.
Copyright © 2016-2020 Terminal 1 Limited.