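from gymnasium.spaces import Box, MultiDiscrete, Tuple as TupleSpace
import logging
import numpy as np
import random
import time
from typing import Callable, Optional, Tuple

from ray.rllib.env.multi_agent_env import MultiAgentEnv
from ray.rllib.policy.policy import PolicySpec
from ray.rllib.utils.annotations import PublicAPI
from ray.rllib.utils.typing import MultiAgentDict, PolicyID, AgentID

logger = logging.getLogger(__name__)


@PublicAPI
class Unity3DEnv(MultiAgentEnv):
    """A MultiAgentEnv representing a single Unity3D game instance.

    For an example of how to use this env with a running Unity3D editor
    or with a compiled game, see:
    `rllib/examples/unity3d_env_local.py`
    For an example of how to use it inside a Unity game client, which
    connects to an RLlib policy server, see:
    `rllib/examples/envs/external_envs/unity3d_[client|server].py`

    Supports all Unity3D (ML-Agents) example games, multi- or single-agent,
    and is converted automatically into an ExternalMultiAgentEnv when used
    inside an RLlib PolicyClient for cloud/distributed training of Unity games.
    """

    # Default base port when connecting directly to a running Unity editor.
    _BASE_PORT_EDITOR = 5004
    # Default base port when connecting to a compiled game binary.
    _BASE_PORT_ENVIRONMENT = 5005
    # Class-wide counter that hands a distinct worker_id (port offset) to
    # each environment instance created in this process.
    _WORKER_ID = 0

    def __init__(
        self,
        file_name: str = None,
        port: Optional[int] = None,
        seed: int = 0,
        no_graphics: bool = False,
        timeout_wait: int = 300,
        episode_horizon: int = 1000,
    ):
        """Initializes a Unity3DEnv object.

        Args:
            file_name (Optional[str]): Name of the Unity game binary.
                If None, a locally running Unity3D editor is used instead.
            port (Optional[int]): Port number to connect to the Unity
                environment.
            seed: A random seed value to use for the Unity3D game.
            no_graphics: Whether to run the Unity3D simulator in
                no-graphics mode. Default: False.
            timeout_wait: Time (in seconds) to wait for a connection from
                the Unity3D instance.
            episode_horizon: A hard horizon to abide by. After at most
                this many `step()` calls, the episode is truncated and the
                Unity3D game is reset and starts again (finishing the
                multi-agent episode that the game represents).
                Note: The game itself may contain its own episode length
                limits, which are always obeyed (on top of this value).
        """
        super().__init__()

        if file_name is None:
            print(
                "No game binary provided, will use a running Unity editor "
                "instead.\nMake sure you are pressing the Play (|>) button in "
                "your editor to start."
            )

        import mlagents_envs
        from mlagents_envs.environment import UnityEnvironment

        # Try connecting to the Unity3D game instance. If the port is already
        # in use, retry with the next worker_id (i.e. the next port offset).
        port_ = None
        while True:
            # On retries, sleep for a random time so that many environments
            # starting concurrently do not keep colliding on the same port.
            if port_ is not None:
                time.sleep(random.randint(1, 10))
            port_ = port or (
                self._BASE_PORT_ENVIRONMENT if file_name else self._BASE_PORT_EDITOR
            )
            # Cache this instance's worker_id and increase the class-wide
            # counter for the next environment (the editor always uses 0).
            worker_id_ = Unity3DEnv._WORKER_ID if file_name else 0
            Unity3DEnv._WORKER_ID += 1
            try:
                self.unity_env = UnityEnvironment(
                    file_name=file_name,
                    worker_id=worker_id_,
                    base_port=port_,
                    seed=seed,
                    no_graphics=no_graphics,
                    timeout_wait=timeout_wait,
                )
                print("Created UnityEnvironment for port {}".format(port_ + worker_id_))
            except mlagents_envs.exception.UnityWorkerInUseException:
                pass
            else:
                break

        # ML-Agents API version as a list of ints, e.g. [1, 5, 0].
        self.api_version = self.unity_env.API_VERSION.split(".")
        self.api_version = [int(s) for s in self.api_version]

        # Truncate the episode (and reset the game) after this many `step()`
        # calls.
        self.episode_horizon = episode_horizon
        # Number of `step()` calls in the current episode.
        self.episode_timesteps = 0

    def step(
        self, action_dict: MultiAgentDict
    ) -> Tuple[
        MultiAgentDict, MultiAgentDict, MultiAgentDict, MultiAgentDict, MultiAgentDict
    ]:
        """Performs one multi-agent step through the game.

        Args:
            action_dict: Multi-agent action dict with:
                keys=agent identifier consisting of
                [ML-Agents behavior name, e.g. "Goalie?team=1"] + "_" +
                [agent index, a unique ML-Agents-assigned index per agent]

        Returns:
            A 5-tuple of multi-agent dicts:
                - obs: Multi-agent observation dict. Contains the agents
                    that have to act in the next `step()` call, plus agents
                    that just reached a terminal step (with their last
                    observation).
                - rewards: Rewards dict matching `obs`.
                - terminateds: Dict with only an `__all__` entry, which is
                    always False here.
                - truncateds: Dict with an `__all__` entry, which becomes
                    True (together with one True entry per stepped agent)
                    once `episode_horizon` has been exceeded.
                - infos: An (empty) info dict.
        """
        from mlagents_envs.base_env import ActionTuple

        # Set only the required actions (from the DecisionSteps) in Unity3D.
        all_agents = []
        for behavior_name in self.unity_env.behavior_specs:
            # ML-Agents API >= 1.4.0: set all agents' actions of one behavior
            # at the same time via an ActionTuple.
            if self.api_version[0] > 1 or (
                self.api_version[0] == 1 and self.api_version[1] >= 4
            ):
                actions = []
                for agent_id in self.unity_env.get_steps(behavior_name)[0].agent_id:
                    key = behavior_name + "_{}".format(agent_id)
                    all_agents.append(key)
                    actions.append(action_dict[key])
                if actions:
                    # float32 actions are continuous, everything else discrete.
                    if actions[0].dtype == np.float32:
                        action_tuple = ActionTuple(continuous=np.array(actions))
                    else:
                        action_tuple = ActionTuple(discrete=np.array(actions))
                    self.unity_env.set_actions(behavior_name, action_tuple)
            # Older API: no ActionTuple, set each agent's action individually.
            else:
                for agent_id in self.unity_env.get_steps(behavior_name)[
                    0
                ].agent_id_to_index.keys():
                    key = behavior_name + "_{}".format(agent_id)
                    all_agents.append(key)
                    self.unity_env.set_action_for_agent(
                        behavior_name, agent_id, action_dict[key]
                    )
        # Advance the Unity simulation by one step.
        self.unity_env.step()

        obs, rewards, terminateds, truncateds, infos = self._get_step_results()

        # Global horizon reached? -> Return `__all__` truncated=True so the
        # caller resets, and mark every stepped agent as truncated as well.
        self.episode_timesteps += 1
        if self.episode_timesteps > self.episode_horizon:
            return (
                obs,
                rewards,
                terminateds,
                dict({"__all__": True}, **{agent_id: True for agent_id in all_agents}),
                infos,
            )

        return obs, rewards, terminateds, truncateds, infos

    def reset(
        self, *, seed=None, options=None
    ) -> Tuple[MultiAgentDict, MultiAgentDict]:
        """Resets the entire Unity3D scene (a single multi-agent episode)."""
        self.episode_timesteps = 0
        self.unity_env.reset()
        obs, _, _, _, infos = self._get_step_results()
        return obs, infos

    def _get_step_results(self):
        """Collects those agents' obs/rewards that have to act in next `step`.

        Returns:
            Tuple:
                obs: Multi-agent observation dict.
                    Only those observations for which to get new actions are
                    returned (plus the last observation of agents that have
                    just terminated).
                rewards: Rewards dict matching `obs`.
                terminateds: Dict with only an `__all__` entry (always False).
                truncateds: Dict with only an `__all__` entry (always False).
                infos: An (empty) info dict.
        """
        obs = {}
        rewards = {}
        infos = {}
        for behavior_name in self.unity_env.behavior_specs:
            decision_steps, terminal_steps = self.unity_env.get_steps(behavior_name)
            # Agents requesting a decision: collect obs and rewards
            # (individual reward + group reward).
            for agent_id, idx in decision_steps.agent_id_to_index.items():
                key = behavior_name + "_{}".format(agent_id)
                os = tuple(o[idx] for o in decision_steps.obs)
                # Unwrap single-observation agents.
                os = os[0] if len(os) == 1 else os
                obs[key] = os
                rewards[key] = (
                    decision_steps.reward[idx] + decision_steps.group_reward[idx]
                )
            # Agents that just terminated: always overwrite the reward (the
            # last reward of the episode), but only add the obs if the agent
            # has no decision-step obs yet.
            for agent_id, idx in terminal_steps.agent_id_to_index.items():
                key = behavior_name + "_{}".format(agent_id)
                if key not in obs:
                    os = tuple(o[idx] for o in terminal_steps.obs)
                    obs[key] = os = os[0] if len(os) == 1 else os
                rewards[key] = (
                    terminal_steps.reward[idx] + terminal_steps.group_reward[idx]
                )

        # Episode ends are only signaled via the horizon check in `step()`.
        return obs, rewards, {"__all__": False}, {"__all__": False}, infos

    @staticmethod
    def get_policy_configs_for_game(
        game_name: str,
    ) -> Tuple[dict, Callable[[AgentID], PolicyID]]:
        """Returns policy specs and an agent-to-policy mapping fn for a game."""
        # The RLlib server must know up front which spaces the client will be
        # using inside Unity3D.
        obs_spaces = {
            "3DBall": Box(float("-inf"), float("inf"), (8,)),
            "3DBallHard": Box(float("-inf"), float("inf"), (45,)),
            "GridFoodCollector": Box(float("-inf"), float("inf"), (40, 40, 6)),
            "Pyramids": TupleSpace(
                [
                    Box(float("-inf"), float("inf"), (56,)),
                    Box(float("-inf"), float("inf"), (56,)),
                    Box(float("-inf"), float("inf"), (56,)),
                    Box(float("-inf"), float("inf"), (4,)),
                ]
            ),
            # SoccerTwos.
            "SoccerPlayer": TupleSpace(
                [
                    Box(-1.0, 1.0, (264,)),
                    Box(-1.0, 1.0, (72,)),
                ]
            ),
            # SoccerStrikersVsGoalie.
            "Goalie": Box(float("-inf"), float("inf"), (738,)),
            "Striker": TupleSpace(
                [
                    Box(float("-inf"), float("inf"), (231,)),
                    Box(float("-inf"), float("inf"), (63,)),
                ]
            ),
            "Sorter": TupleSpace(
                [
                    Box(float("-inf"), float("inf"), (20, 23)),
                    Box(float("-inf"), float("inf"), (10,)),
                    Box(float("-inf"), float("inf"), (8,)),
                ]
            ),
            "Tennis": Box(float("-inf"), float("inf"), (27,)),
            "VisualHallway": Box(float("-inf"), float("inf"), (84, 84, 3)),
            "Walker": Box(float("-inf"), float("inf"), (212,)),
            "FoodCollector": TupleSpace(
                [
                    Box(float("-inf"), float("inf"), (49,)),
                    Box(float("-inf"), float("inf"), (4,)),
                ]
            ),
        }
        action_spaces = {
            "3DBall": Box(-1.0, 1.0, (2,), dtype=np.float32),
            "3DBallHard": Box(-1.0, 1.0, (2,), dtype=np.float32),
            "GridFoodCollector": MultiDiscrete([3, 3, 3, 2]),
            "Pyramids": MultiDiscrete([5]),
            # SoccerStrikersVsGoalie.
            "Goalie": MultiDiscrete([3, 3, 3]),
            "Striker": MultiDiscrete([3, 3, 3]),
            # SoccerTwos.
            "SoccerPlayer": MultiDiscrete([3, 3, 3]),
            "Sorter": MultiDiscrete([3, 3, 3]),
            "Tennis": Box(-1.0, 1.0, (3,)),
            "VisualHallway": MultiDiscrete([5]),
            "Walker": Box(-1.0, 1.0, (39,)),
            "FoodCollector": MultiDiscrete([3, 3, 3, 2]),
        }

        # Policies (Unity: "behaviors") and agent-to-policy mapping fns.
        if game_name == "SoccerStrikersVsGoalie":
            policies = {
                "Goalie": PolicySpec(
                    observation_space=obs_spaces["Goalie"],
                    action_space=action_spaces["Goalie"],
                ),
                "Striker": PolicySpec(
                    observation_space=obs_spaces["Striker"],
                    action_space=action_spaces["Striker"],
                ),
            }

            def policy_mapping_fn(agent_id, episode, worker, **kwargs):
                return "Striker" if "Striker" in agent_id else "Goalie"

        elif game_name == "SoccerTwos":
            # Both teams share the same spaces but train separate policies.
            policies = {
                "PurplePlayer": PolicySpec(
                    observation_space=obs_spaces["SoccerPlayer"],
                    action_space=action_spaces["SoccerPlayer"],
                ),
                "BluePlayer": PolicySpec(
                    observation_space=obs_spaces["SoccerPlayer"],
                    action_space=action_spaces["SoccerPlayer"],
                ),
            }

            def policy_mapping_fn(agent_id, episode, worker, **kwargs):
                return "BluePlayer" if "1_" in agent_id else "PurplePlayer"

        else:
            # Single behavior: one policy named after the game.
            policies = {
                game_name: PolicySpec(
                    observation_space=obs_spaces[game_name],
                    action_space=action_spaces[game_name],
                ),
            }

            def policy_mapping_fn(agent_id, episode, worker, **kwargs):
                return game_name

        return policies, policy_mapping_fn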