from dataclasses import dataclass, field
from typing import Optional

from transformers import TrainingArguments


@dataclass
class PRMConfig(TrainingArguments):
    r"""
    Configuration class for the [`PRMTrainer`].

    This class includes only the parameters that are specific to PRM training. For a full list of training arguments,
    please refer to the [`~transformers.TrainingArguments`] documentation. Note that default values in this class may
    differ from those in [`~transformers.TrainingArguments`].

    Using [`~transformers.HfArgumentParser`] we can turn this class into
    [argparse](https://docs.python.org/3/library/argparse#module-argparse) arguments that can be specified on the
    command line.

    Parameters:
        max_length (`int` or `None`, *optional*, defaults to `1024`):
            Maximum length of the sequences (prompt + completion) used for truncation.
        max_prompt_length (`int` or `None`, *optional*, defaults to `512`):
            Maximum length of the prompt used for truncation.
        max_completion_length (`int` or `None`, *optional*, defaults to `None`):
            Maximum length of the completion used for truncation. The completion is the concatenation of the steps.
        disable_dropout (`bool`, *optional*, defaults to `True`):
            Whether to disable dropout in the model.
        step_separator (`str`, *optional*, defaults to `"\n"`):
            Separator used to separate each step of the reasoning process.
        train_on_last_step_only (`bool`, *optional*, defaults to `False`):
            Whether to train only on the last step.
        dataset_num_proc (`int`, *optional*, defaults to `None`):
            Number of processes to use for processing the dataset.
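
    Example (a minimal sketch of the `HfArgumentParser` usage described above; the script name, the
    `--output_dir` value, and the `--max_length` override are illustrative, not required):

    ```python
    from transformers import HfArgumentParser

    from trl import PRMConfig

    parser = HfArgumentParser(PRMConfig)
    # e.g. `python train.py --output_dir prm_output --max_length 2048`
    (config,) = parser.parse_args_into_dataclasses()
    ```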
    """

    # Parameters whose default values are overridden from TrainingArguments
    learning_rate: float = field(
        default=1e-5,
        metadata={"help": "The initial learning rate for AdamW."},
    )
    logging_steps: float = field(
        default=10,
        metadata={
            "help": "Log every X updates steps. Should be an integer or a float in range `[0,1)`. If smaller than "
            "1, will be interpreted as ratio of total training steps."
        },
    )
    bf16: Optional[bool] = field(
        default=None,
        metadata={
            "help": "Whether to use bf16 (mixed) precision instead of 32-bit. Requires Ampere or higher NVIDIA "
            "architecture or Intel XPU or using CPU (use_cpu) or Ascend NPU. If not set, it defaults to `True` if "
            "`fp16` is not set."
        },
    )
    average_tokens_across_devices: bool = field(
        default=True,
        metadata={
            "help": "Whether or not to average tokens across devices. If enabled, will use all_reduce to "
            "synchronize num_tokens_in_batch for precise loss calculation. Reference: "
            "https://github.com/huggingface/transformers/issues/34242"
        },
    )

    max_length: Optional[int] = field(
        default=1024,
        metadata={"help": "Maximum length of the sequences (prompt + completion) used for truncation."},
    )
    max_prompt_length: Optional[int] = field(
        default=512,
        metadata={"help": "Maximum length of the prompt used for truncation."},
    )
    max_completion_length: Optional[int] = field(
        default=None,
        metadata={
            "help": "Maximum length of the completion used for truncation. The completion is the concatenation of "
            "the steps."
        },
    )
    disable_dropout: bool = field(
        default=True,
        metadata={"help": "Whether to disable dropout in the model and reference model."},
    )
    step_separator: str = field(
        default="\n",
        metadata={"help": "Separator used to separate each step of the reasoning process."},
    )
    train_on_last_step_only: bool = field(
        default=False,
        metadata={"help": "Whether to train only on the last step."},
    )
    dataset_num_proc: Optional[int] = field(
        default=None,
        metadata={"help": "Number of processes to use for processing the dataset."},
    )

    def __post_init__(self):
        self.bf16 = not (self.fp16) if self.bf16 is None else self.bf16
        super().__post_init__()