# QuilL at 2023 Samsung AI Challenge: Image Quality Assessment

## 01. Prepare the test set

For captioning inference, we first extract features from a pre-trained model. Check `inference_config.py` and set the key `test_dis_path` to the path of the folder containing the test images. Then run:

```
CUDA_VISIBLE_DEVICES=0 python custom_extract_features.py
```

Each image is represented as a 49 x 2048 feature matrix. The features of all test images are stored in the file `./test.hdf5` at the root path.

## 02. End-to-end inference

End-to-end inference is split into two stages: IQA and captioning. First, check `inference_config.py` and change only the following keys.

In `ic_config` (the configuration for the image captioning model), set the key `features_path` to the `.hdf5` file of test-image features extracted in step 01 (by default `./test.hdf5`), and set the key `path_meta_test_data` to the path of the `.json` information file. For example:

- `"features_path": "/root-path/test.hdf5"`
- `"path_meta_test_data": "/root-path/test.json"`

`test.json` has the following structure:

```
{
    "images": [
        {
            "file_name": "./dataset/test/j00zs3u6dr.jpg",
            "id": 0
        },
        {
            "file_name": "./dataset/test/ytv70so3zb.jpg",
            "id": 1
        }
    ]
}
```

In `iqa_config` (the configuration for the IQA model), set the image folder and the `.csv` information file at the keys `test_dis_path` and `dis_test_path`, respectively. `test.csv` has the following structure:

```
"img_name","img_path"
"j00zs3u6dr","./dataset/test/j00zs3u6dr.jpg"
"ytv70so3zb","./dataset/test/ytv70so3zb.jpg"
```

`test.csv` fixes the order of the filenames; every filename listed in `test.csv` must be present in the image folder at `test_dis_path`. For example:

- `"test_dis_path": "/root-path/test"`
- `"dis_test_path": "/root-path/test.csv"`

`test.json` and `test.csv` are already available in the source code folder.
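Although `test.json` and `test.csv` ship with the source code, they may need to be regenerated for a different image folder. A minimal sketch of how to do so, following the structures shown above (the helper name `build_meta_files` and the `*.jpg` glob pattern are our own assumptions, not part of the repository):

```python
import csv
import json
from pathlib import Path


def build_meta_files(image_dir, json_path="test.json", csv_path="test.csv"):
    """Hypothetical helper: rebuild test.json and test.csv from an image folder.

    Mirrors the structures documented above; not part of the provided code.
    """
    # Sort for a deterministic order; ids start at 0 as in the example.
    files = sorted(Path(image_dir).glob("*.jpg"))

    # test.json: a list of {"file_name", "id"} records under "images".
    meta = {
        "images": [{"file_name": str(p), "id": i} for i, p in enumerate(files)]
    }
    with open(json_path, "w") as f:
        json.dump(meta, f, indent=4)

    # test.csv: quoted header plus one (stem, full path) row per image,
    # fixing the filename order used at inference time.
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f, quoting=csv.QUOTE_ALL)
        writer.writerow(["img_name", "img_path"])
        for p in files:
            writer.writerow([p.stem, str(p)])
```

For example, `build_meta_files("./dataset/test")` would write both files into the current directory.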
Then run the inference file:

```
CUDA_VISIBLE_DEVICES=0 python end_to_end_inference.py
```

The submission file is generated as `full_submission.csv`.

The results produced by the provided source code are:

- Public test: MOS: 0.76895, Captioning score: 1.3576489877
- Private test: MOS: 0.56685, Captioning score: 0.9365593996

while our results on the leaderboard are:

- Public test: MOS: 0.76895, Captioning score: 1.3754521458
- Private test: MOS: 0.56685, Captioning score: 0.9377

The reproduced captioning score is slightly lower than the score on the private leaderboard, due to small variance introduced by checkpoint saving. This result is still valid because it does not affect the current leaderboard ranking: it is neither higher than the 4th team's score nor lower than the 6th team's.

https://drive.google.com/drive/folders/1GfdfV4ycxP1HgkguSpSBYrQyf1R3aePZ?usp=drive_link
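As a quick sanity check on a generated `full_submission.csv`, one can verify that it contains one prediction row per image listed in `test.csv`. A minimal sketch (this check is our own addition, not part of the provided code, and it only compares row counts, not column contents):

```python
import csv


def count_data_rows(path):
    """Count the rows of a CSV file, excluding the header line."""
    with open(path, newline="") as f:
        return sum(1 for _ in csv.reader(f)) - 1
```

Usage would be, for example, `assert count_data_rows("full_submission.csv") == count_data_rows("test.csv")` before uploading the submission.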