from dataclasses import dataclass, field
from typing import Optional

from datasets import features, load_dataset
from huggingface_hub import ModelCard
from transformers import HfArgumentParser


@dataclass
class ScriptArguments:
    r"""
    Arguments for the script.

    Args:
        push_to_hub (`bool`, *optional*, defaults to `False`):
            Whether to push the dataset to the Hugging Face Hub.
        repo_id (`str`, *optional*, defaults to `"trl-lib/rlaif-v"`):
            Hugging Face repository ID to push the dataset to.
        dataset_num_proc (`int` or `None`, *optional*, defaults to `None`):
            Number of workers to use for dataset processing.
    """

    push_to_hub: bool = field(
        default=False,
        metadata={"help": "Whether to push the dataset to the Hugging Face Hub."},
    )
    repo_id: str = field(
        default="trl-lib/rlaif-v",
        metadata={"help": "Hugging Face repository ID to push the dataset to."},
    )
    dataset_num_proc: Optional[int] = field(
        default=None,
        metadata={"help": "Number of workers to use for dataset processing."},
    )


def to_conversational(example):
    """
    Convert prompt from "xxx" to [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": "xxx"}]}]
    and chosen and rejected from "xxx" to [{"role": "assistant", "content": [{"type": "text", "text": "xxx"}]}].
    Images are wrapped into a list.
    """
    prompt = [{"role": "user", "content": [{"type": "image"}, {"type": "text", "text": example["question"]}]}]
    chosen = [{"role": "assistant", "content": [{"type": "text", "text": example["chosen"]}]}]
    rejected = [{"role": "assistant", "content": [{"type": "text", "text": example["rejected"]}]}]
    return {"prompt": prompt, "images": [example["image"]], "chosen": chosen, "rejected": rejected}


model_card = ModelCard("""
---
tags: [trl]
---

# RLAIF-V Dataset

## Summary

The RLAIF-V dataset is a processed version of the [openbmb/RLAIF-V-Dataset](https://huggingface.co/datasets/openbmb/RLAIF-V-Dataset#dataset-card-for-rlaif-v-dataset), specifically curated to train vision-language models using the [TRL library](https://github.com/huggingface/trl) for preference learning tasks. It contains 83,132 high-quality comparison pairs, each comprising an image and two textual descriptions: one preferred and one rejected. This dataset enables models to learn human preferences in visual contexts, enhancing their ability to generate and evaluate image captions.

## Data Structure

- **Format**: [Conversational](https://huggingface.co/docs/trl/main/dataset_formats#conversational)
- **Type**: [Preference](https://huggingface.co/docs/trl/main/dataset_formats#preference)

Columns:
- `"prompt"`: The task related to the image.
- `"images"`: The image.
- `"chosen"`: The preferred answer.
- `"rejected"`: An alternative answer that was not preferred.

This structure allows models to learn to prefer the _chosen_ response over the _rejected_ one, thereby aligning with human preferences in visual tasks.

## Generation script

The script used to generate this dataset can be found [here](https://github.com/huggingface/trl/blob/main/examples/datasets/rlaif-v.py).
""")

if __name__ == "__main__":
    parser = HfArgumentParser(ScriptArguments)
    script_args = parser.parse_args_into_dataclasses()[0]

    dataset = load_dataset("openbmb/RLAIF-V-Dataset", split="train")
    dataset = dataset.map(
        to_conversational,
        num_proc=script_args.dataset_num_proc,
        remove_columns=dataset.column_names,
        writer_batch_size=128,
    )

    # Cast the "images" column to a sequence of decoded Image features
    f = dataset.features
    f["images"] = features.Sequence(features.Image(decode=True))
    dataset = dataset.cast(f)

    dataset = dataset.train_test_split(test_size=0.01, writer_batch_size=128)

    if script_args.push_to_hub:
        dataset.push_to_hub(script_args.repo_id)
        model_card.push_to_hub(script_args.repo_id, repo_type="dataset")
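

# Optional structural check: a hypothetical helper, not part of TRL or of the
# original generation script. It is a minimal sketch of how one might confirm
# that an example matches the conversational preference layout described in
# the model card: a single user turn holding an image placeholder plus the
# question text, exactly one image wrapped in a list, and a single assistant
# turn each for "chosen" and "rejected". It assumes plain Python dicts/lists
# (as returned by `to_conversational`) and inspects only keys and types, never
# the image contents. The names `is_rlaif_v_preference_example`,
# `_single_turn_content` and `_is_text_part` are illustrative, not a library API.


def _single_turn_content(turns, role):
    """Return the content list of a one-turn conversation with `role`, or None if it does not match."""
    if not (isinstance(turns, list) and len(turns) == 1 and isinstance(turns[0], dict)):
        return None
    if turns[0].get("role") != role:
        return None
    content = turns[0].get("content")
    return content if isinstance(content, list) else None


def _is_text_part(part):
    """True if `part` is a {"type": "text", "text": <str>} content entry."""
    return isinstance(part, dict) and part.get("type") == "text" and isinstance(part.get("text"), str)


def is_rlaif_v_preference_example(example):
    """Return True if `example` has the layout produced by `to_conversational` (hypothetical helper)."""
    if not isinstance(example, dict) or set(example) != {"prompt", "images", "chosen", "rejected"}:
        return False
    # One image per example, wrapped in a list.
    images = example["images"]
    if not (isinstance(images, list) and len(images) == 1):
        return False
    # Prompt: a single user turn with an image placeholder followed by the question text.
    prompt_content = _single_turn_content(example["prompt"], "user")
    if prompt_content is None or len(prompt_content) != 2:
        return False
    first, second = prompt_content
    if not (isinstance(first, dict) and first.get("type") == "image" and _is_text_part(second)):
        return False
    # Chosen and rejected: a single assistant turn containing exactly one text part.
    for key in ("chosen", "rejected"):
        content = _single_turn_content(example[key], "assistant")
        if content is None or len(content) != 1 or not _is_text_part(content[0]):
            return False
    return True


# Design note: the image entry is matched on `.get("type") == "image"` rather
# than exact dict equality, so extra keys (for instance a `"text": None` that
# Arrow may add when list-of-struct columns are unified) do not cause a false
# negative when checking rows read back from a saved dataset.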