---
title: AI Research
tags: Product Research
---

## Prepare Training Data

Training data is how you teach GPT-3 what you'd like it to say.

Your data must be a JSONL document, where each line is a prompt-completion pair corresponding to a training example. You can use our CLI data preparation tool to easily convert your data into this file format.

```
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
{"prompt": "<prompt text>", "completion": "<ideal generated text>"}
...
```

## CLI Data Preparation Tool

```
openai tools fine_tunes.prepare_data -f <LOCAL_FILE>
```

## Preparing Dataset

### General best practices

Fine-tuning performs better with more high-quality examples. To fine-tune a model that performs better than using a high-quality prompt with our base models, you should provide at least a few hundred high-quality examples, ideally vetted by human experts. From there, performance tends to increase linearly with every doubling of the number of examples. Increasing the number of examples is usually the best and most reliable way of improving performance.

## Classification Problems

In classification problems, each input in the prompt should be classified into one of the predefined classes. For this type of problem, we recommend:

- Use a separator at the end of the prompt, e.g. `\n\n###\n\n`. Remember to also append this separator when you eventually make requests to your model.
- Choose classes that map to a single token. At inference time, specify `max_tokens=1` since you only need the first token for classification.
- Ensure that the prompt + completion doesn't exceed 2048 tokens, including the separator.
- Aim for at least ~100 examples per class.
- To get class log probabilities, you can specify `logprobs=5` (for 5 classes) when using your model.
- Ensure that the dataset used for fine-tuning is very similar in structure and type of task to what the model will be used for.

```
{"prompt":"Company: BHFF insurance\nProduct: allround insurance\nAd:One stop shop for all your insurance needs!\nSupported:", "completion":" yes"}
{"prompt":"Company: Loft conversion specialists\nProduct: -\nAd:Straight teeth in weeks!\nSupported:", "completion":" no"}
```

```
{"prompt":"Subject: <email_subject>\nFrom:<customer_name>\nDate:<date>\nContent:<email_body>\n\n###\n\n", "completion":" <numerical_category>"}
```

---

```
{"prompt":"<Product Name>\n<Wikipedia description>\n\n###\n\n", "completion":" <engaging ad> END"}
```

```
{"prompt":"Samsung Galaxy Feel\nThe Samsung Galaxy Feel is an Android smartphone developed by Samsung Electronics exclusively for the Japanese market. The phone was released in June 2017 and was sold by NTT Docomo. It runs on Android 7.0 (Nougat), has a 4.7 inch display, and a 3000 mAh battery.\nSoftware\nSamsung Galaxy Feel runs on Android 7.0 (Nougat), but can be later updated to Android 8.0 (Oreo).\nHardware\nSamsung Galaxy Feel has a 4.7 inch Super AMOLED HD display, 16 MP back facing and 5 MP front facing cameras. It has a 3000 mAh battery, a 1.6 GHz Octa-Core ARM Cortex-A53 CPU, and an ARM Mali-T830 MP1 700 MHz GPU. It comes with 32GB of internal storage, expandable to 256GB via microSD. Aside from its software and hardware specifications, Samsung also introduced a unique hole in the phone's shell to accommodate the Japanese perceived penchant for personalizing their mobile phones. The Galaxy Feel's battery was also touted as a major selling point since the market favors handsets with longer battery life. The device is also waterproof and supports 1seg digital broadcasts using an antenna that is sold separately.\n\n###\n\n", "completion":"Looking for a smartphone that can do it all? Look no further than Samsung Galaxy Feel! With a slim and sleek design, our latest smartphone features high-quality picture and video capabilities, as well as an award winning battery life. END"}
```

## Notes

- Impact: This dimension evaluates the degree to which a contribution has a positive impact on the organization's mission, goals, and objectives.
- Originality: This dimension evaluates the degree to which a contribution is new, unique, or innovative.
- Reusability: This dimension evaluates the degree to which a contribution can be used in other areas or projects within the organization.
- Complexity: This dimension evaluates the degree of technical skill or effort required to make the contribution.
- Community engagement: This dimension evaluates the degree to which the contribution improves or advances community engagement.
- Role and level of expertise: This dimension evaluates the user's role in the organization and their level of expertise.
- Time: This dimension evaluates the time a user invested in the project.

Each dimension can be broken down into sub-criteria and evaluated using a numerical scale (e.g. 1-5, 1-10) or a set of predefined categories (e.g. low, medium, high). The value of each dimension can be multiplied by a weight to reflect the importance that the organization assigns to it.

The matrix should be flexible and adaptable to the organization's needs and goals, and it can be reviewed and updated regularly.
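
The weighting described in these notes could be sketched roughly as follows. This is only an illustration: the dimension keys mirror the list above, but the weights, the 1-5 scores, and the `weighted_score` helper are all hypothetical, and dividing by the total weight (so the result stays on the same 1-5 scale) is an assumption rather than something the notes specify.

```python
# Hypothetical weights per dimension; the notes do not give actual values.
WEIGHTS = {
    "impact": 3.0,
    "originality": 2.0,
    "reusability": 2.0,
    "complexity": 1.0,
    "community_engagement": 1.0,
    "expertise": 1.0,
    "time": 1.0,
}


def weighted_score(scores, weights=WEIGHTS):
    """Multiply each dimension's score by its weight and normalize by total weight."""
    missing = set(weights) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    total_weight = sum(weights.values())
    weighted_sum = sum(scores[dim] * w for dim, w in weights.items())
    return weighted_sum / total_weight


# Made-up scores on a 1-5 scale for a single contribution.
example = {
    "impact": 4,
    "originality": 3,
    "reusability": 5,
    "complexity": 2,
    "community_engagement": 3,
    "expertise": 4,
    "time": 2,
}

print(round(weighted_score(example), 2))
```

Dropping the division by `total_weight` gives the raw weighted sum instead, which may be preferable if contributions are only ever ranked against each other.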