# Job Descriptions NER

## Introduction

The objective of this exercise is to analyse the performance of a Named-Entity Recognition (NER) model trained on a list of `keywords` related to `job descriptions`. To accomplish this task you will be supplied with the following files:

- Training set: a file containing a set of automatically annotated entities within a list of job descriptions. The file contains a set of lines (__separated by blank lines__) with the following format:

  ```
  <START:jobtitle> encargado <END> a de <START:jobtitle> limpieza <END> <START:location> campo <END> de <START:location> gibraltar <END> <START:location> ceuta <END>

  2014 <START:temporal> summer <END> <START:jobtitle> intern <END> program corporate

  <START:jobtitle> consultor <END> high street
  ```

- Test set: a smaller file with a set of manually annotated entities, in the same format as above.
- A folder containing the software needed for training and evaluating this type of model.

## Tagger Tools

Inside the `tagger-tools` folder there is an executable in `bin/tagger-tools`. With this program you can train a model and evaluate its performance.

If you invoke the program without parameters, its general usage is printed out. For example, `./bin/tagger-tools` will produce the following output:

```
Usage:
$RUN generate_training_set <input_dir> <output_file>
$RUN train_model <iterations> <cutoff> <language> <training_data_file> <model_file>
$RUN train_cv_model <iterations> <cutoff> <n_folds> <language> <training_data_file> <model_file>
$RUN evaluate_model <model_file> <input_file> <output_file>
$RUN extract_examples <elasticsearch_url> <elasticsearch_index> <country_code> <examples_output_file>
$RUN extract_job_titles <credential_file> <spreadsheet_id> <output_path>
```

The relevant options here are:

#### train_model

To train a new model given a new training set you can execute:

- `train_model 100 4 es ner_training_source output_model`

A new model will be created from the input file `ner_training_source` and written to the `output_model` file.

#### evaluate_model

To evaluate the newly trained model you can execute:

- `evaluate_model output_model test-set evaluation.json`

In this case a new JSON file will be created containing all the relevant evaluation metrics.
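Since the metrics file is plain JSON, it can be loaded programmatically to compare classes. Below is a minimal Python sketch; it is only an illustration, not part of `tagger-tools`. The file name `evaluation.json` comes from the example command above, and the `evaluation_stats` keys follow the example output shown after the sketch.

```python
import json

# Load the metrics file produced by evaluate_model
# (file name taken from the example command above).
with open("evaluation.json") as f:
    stats = json.load(f)["evaluation_stats"]

# Sort classes by F1 (worst first) and keep the aggregated
# "global" entry for the end of the report.
classes = sorted(
    (name for name in stats if name != "global"),
    key=lambda name: stats[name]["f_measure"],
)

for name in classes + ["global"]:
    s = stats[name]
    support = s["tp"] + s["fn"]  # annotated entities of this class in the test set
    print(f"{name:10}  P={s['precision']:.2f}  R={s['recall']:.2f}  "
          f"F1={s['f_measure']:.2f}  support={support}")
```

Sorting by F1 makes it easy to spot the badly performing classes discussed in the objectives below.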
An example of an output file is shown below:

```
{
  "evaluation_stats": {
    "proglang": {
      "tp": 0,
      "fp": 1,
      "tn": 0,
      "fn": 2,
      "precision": 0.0,
      "recall": 0.0,
      "f_measure": 0.0
    },
    "location": {
      "tp": 29,
      "fp": 12,
      "tn": 0,
      "fn": 36,
      "precision": 0.7073170731707317,
      "recall": 0.4461538461538462,
      "f_measure": 0.5471698113207548
    },
    "jobspec": {
      "tp": 6,
      "fp": 35,
      "tn": 0,
      "fn": 61,
      "precision": 0.14634146341463414,
      "recall": 0.08955223880597014,
      "f_measure": 0.1111111111111111
    },
    "global": {
      "tp": 170,
      "fp": 183,
      "tn": 0,
      "fn": 332,
      "precision": 0.48158640226628896,
      "recall": 0.3386454183266932,
      "f_measure": 0.39766081871345027
    },
    "temporal": {
      "tp": 0,
      "fp": 7,
      "tn": 0,
      "fn": 13,
      "precision": 0.0,
      "recall": 0.0,
      "f_measure": 0.0
    },
    "jobtitle": {
      "tp": 130,
      "fp": 115,
      "tn": 0,
      "fn": 197,
      "precision": 0.5306122448979592,
      "recall": 0.39755351681957185,
      "f_measure": 0.4545454545454546
    },
    "seniority": {
      "tp": 4,
      "fp": 11,
      "tn": 0,
      "fn": 6,
      "precision": 0.26666666666666666,
      "recall": 0.4,
      "f_measure": 0.32
    },
    "lang": {
      "tp": 1,
      "fp": 2,
      "tn": 0,
      "fn": 17,
      "precision": 0.3333333333333333,
      "recall": 0.05555555555555555,
      "f_measure": 0.09523809523809525
    }
  }
}
```

## Objectives

The objective here is quite broad and there is no single right or wrong solution; we are expecting a descriptive analysis of the `training` and `test` sets. From this analysis we should be able to answer questions like:

- What entity classes are defined in the training set and the test set?
- What is the coverage of the annotations across classes, keeping in mind that the annotations in the training set have been created automatically? (A sketch for counting annotations per class is included at the end of this section.)
- Should we increase the number of annotations for a specific class?
- Do all classes have equivalent coverage in both sets, training and test?
- Can you estimate a minimum number of annotations needed to obtain reliable results?
- What is the performance for each class?
- Is there any class with poor performance?
- How can we try to improve the performance of these badly performing classes?
- In your opinion, should we focus on `precision` or `recall`? What consequences could having a good `precision` but a low `recall` have? And in the opposite case?

Extra points:

- Can you think of a quick win to improve the performance of the model?
- Go for it!!!
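As a possible starting point for the coverage questions above, here is a minimal Python sketch that counts annotations per class in both sets. The file names `ner_training_source` and `test-set` are taken from the example commands earlier; adjust them to your actual files.

```python
import re
from collections import Counter

# Matches the opening tag of an annotation, e.g. <START:jobtitle>
TAG = re.compile(r"<START:([a-z]+)>")

def class_counts(path):
    """Count annotations per entity class in an annotated file."""
    with open(path, encoding="utf-8") as f:
        return Counter(TAG.findall(f.read()))

# File names follow the example commands above; adjust as needed.
for name, path in [("training", "ner_training_source"), ("test", "test-set")]:
    counts = class_counts(path)
    total = sum(counts.values())
    print(f"{name}: {total} annotations")
    for cls, n in counts.most_common():
        print(f"  {cls:10} {n:6d}  ({n / total:.1%})")
```

Comparing the two distributions side by side should already hint at which classes are under-represented and whether the training and test sets cover the classes in a similar way.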