contributed by < weihsinyeh >
Graduate Institute of Computer Science, 葉惟欣
Student ID: R13922043
Problem 1: Zero-shot Image Captioning with LLaVA
Zero-shot: The pretrained model must not be fine-tuned in any way.
In this problem, you only need to evaluate the pretrained LLaVA on the image captioning task.
Input: (1) image (2) language instruction (3) generation config
Output: caption
Please use the model "llava-hf/llava-1.5-7b-hf" from the transformers package.
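The setup above (image, language instruction, and generation config in, caption out) can be sketched roughly as follows. This is a minimal sketch and not the assignment's reference solution. The helper names `build_prompt`, `extract_caption`, and `caption_image` are hypothetical. The "USER: <image>\n... ASSISTANT:" template is the chat format commonly used with LLaVA-1.5 checkpoints and is assumed here. The device and dtype choices are also assumptions.

```python
from typing import Any, Dict

MODEL_ID = "llava-hf/llava-1.5-7b-hf"  # model named in the problem statement


def build_prompt(instruction: str) -> str:
    """Wrap the instruction in the (assumed) LLaVA-1.5 chat template.

    The <image> placeholder marks where the image features are inserted.
    """
    return f"USER: <image>\n{instruction} ASSISTANT:"


def extract_caption(decoded: str) -> str:
    """Keep only the text that follows the last 'ASSISTANT:' marker."""
    return decoded.split("ASSISTANT:")[-1].strip()


def caption_image(image_path: str, instruction: str,
                  generation_config: Dict[str, Any]) -> str:
    """Caption one image with the frozen pretrained model (hypothetical helper)."""
    # Heavy imports are deferred so the two helpers above can be used
    # without torch/transformers installed.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, LlavaForConditionalGeneration

    # Assumption: use half precision on GPU, full precision on CPU.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32

    processor = AutoProcessor.from_pretrained(MODEL_ID)
    model = LlavaForConditionalGeneration.from_pretrained(
        MODEL_ID, torch_dtype=dtype
    ).to(device)
    model.eval()  # zero-shot: inference only, weights are never updated

    image = Image.open(image_path).convert("RGB")
    inputs = processor(
        images=image, text=build_prompt(instruction), return_tensors="pt"
    ).to(device, dtype)

    with torch.no_grad():
        output_ids = model.generate(**inputs, **generation_config)

    # The decoded string contains the prompt followed by the answer.
    decoded = processor.decode(output_ids[0], skip_special_tokens=True)
    return extract_caption(decoded)
```

A call might look like `caption_image("example.jpg", "Describe this image in one sentence.", {"max_new_tokens": 30, "do_sample": False})`. The file name, instruction text, and generation settings here are illustrative placeholders, not values given by the assignment.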