o c²“i†gã @s~dZddlZddlZddlmZddlZddlZddlm Z e e¡Zzddl mZdZdaWn ey<dZdaYnwz ddlmZdZWneyRdZYnwz dd lmZdZWneyhdZYnwd d„Zdd „Zdd„ZGdd„deƒZeƒadd„Ze jj ddfde!de!de"de!fdd„Z#e jj ddfde!dee!de"de!fdd„Z$dbde"ddfd d!„Z%dbde"de!fd"d#„Z&dbde"de!fd$d%„Z'de j(j)fde"fd&d'„Z*de j(j)fd(e+de"fd)d*„Z,dbde"fd+d,„Z-dde j(j)fd-e!de"fd.d/„Z.ddde j(j)fd(e+d-e!d0e!de"fd1d2„Z/dcd3e!de"fd4d5„Z0 ddd3e!d6e!de"fd7d8„Z1dbd(e+de"fd9d:„Z2 dbd;e+d„Z3de j(j)fd(e+de"fd?d@„Z4de j(j)fde"fdAdB„Z5dbd-e!de"fdCdD„Z6 ded-e!dEe!de"dFe!fdGdH„Z7dbd3e!de"fdIdJ„Z8 ded3e!dKe!de"dFe!fdLdM„Z9dNe!fdOdP„Z:dbde"fdQdR„Z;dSdT„Zde!fdZd[„Z?d\d]„Z@d^d_„ZAd`da„ZBdS)fz5APIs exposed under the namespace ray.util.collective.éN)ÚList)Útypes)Ú NCCLGroupTF)Ú GLOOGroup)ÚTorchGLOOGroupcCst ¡r tr t d¡datS)NzqNCCL seems unavailable. Please install Cupy following the guide at: https://docs.cupy.dev/en/stable/install.html.F)ÚrayÚget_gpu_idsÚ_LOG_NCCL_WARNINGÚloggerÚwarningÚ_NCCL_AVAILABLE©r r úR/home/ubuntu/.local/lib/python3.10/site-packages/ray/util/collective/collective.pyÚnccl_available(sÿrcCótS©N)Ú_GLOO_AVAILABLEr r r rÚgloo_available4órcCrr)Ú_TORCH_DISTRIBUTED_AVAILABLEr r r rÚtorch_distributed_available8rrc@s8eZdZdZdd„Zdd„Zdd„Zdd „Zd d„ZdS) ÚGroupManageraUse this class to manage the collective groups we created so far. Each process will have an instance of `GroupManager`. Each process could belong to multiple collective groups. The membership information and other metadata are stored in the global `_group_mgr` object. cCsi|_i|_dSr)Ú_name_group_mapÚ_group_name_map)Úselfr r rÚ__init__Ds zGroupManager.__init__cCsÐt |¡}|tjjkrtdƒ‚|tjjkr(t d |¡¡t|||dd|d}n1|tjj kr=t d |¡¡t |||ƒ}n|tjjkrRt d |¡¡t|||ƒ}ntd|›ƒ‚||j |<||j|<|j |S) z¥The entry to create new collective groups in the manager. Put the registration and the group information into the manager metadata as well. zRay does not support MPI.zCreating GLOO group: '{}'...Úray_internal_kvÚtcp)Ú store_typeÚdevice_typeÚgloo_timeoutzCreating NCCL group: '{}'...z.Creating torch.distributed GLOO group: '{}'...zUnexpected backend: )rÚBackendÚMPIÚRuntimeErrorÚGLOOr ÚdebugÚformatrÚNCCLrÚ TORCH_GLOOrrr)rÚbackendÚ world_sizeÚrankÚ group_namer Úgr r rÚcreate_collective_groupHs2 úÿ z$GroupManager.create_collective_groupcCs ||jvSr)r©rr,r r rÚis_group_existms zGroupManager.is_group_existcCs(| |¡st d |¡¡dS|j|S)z,Get the collective group handle by its name.z"The group '{}' is not initialized.N)r0r rr&rr/r r rÚget_group_by_nameps zGroupManager.get_group_by_namecCsx| |¡st d |¡¡dS|j|}|j|=|j|=| ¡d|}z t |¡}t |¡WdSt y;YdSw)zGroup destructor.zThe group '{}' does not exist.NÚinfo_)r0r rr&rrÚ destroy_grouprÚ get_actorÚkillÚ ValueError)rr,r-ÚnameÚstorer r rÚdestroy_collective_groupws ÿz%GroupManager.destroy_collective_groupN) Ú__name__Ú __module__Ú__qualname__Ú__doc__rr.r0r1r9r r r rr<s%rcCs t |¡S)zDCheck if the group is initialized in this process by the group name.)Ú _group_mgrr0©r,r r rÚis_group_initialized‘s r@Údefaulté0ur*r+r,r cCsvtƒt |¡}t|ƒ|std |¡ƒ‚t |¡rtdƒ‚|dks$J‚|dks*J‚||ks0J‚t |||||¡dS)a=Initialize a collective group inside an actor process. Args: world_size: the total number of processes in the group. rank: the rank of the current process. backend: the CCL backend to use, NCCL or GLOO. group_name: the name of the collective group. Returns: None z%group_name '{}' needs to be a string.ú#Trying to initialize a group twice.rN) Ú_check_inside_actorrr!Ú_check_backend_availabilityr6r&r>r0r#r.)r*r+r)r,r r r rÚinit_collective_group–s ÿrFÚranksc Cs6t |¡}t|ƒd|}z t |¡tdƒ‚tyYnwt|ƒt|ƒkr4td t|ƒt|ƒ¡ƒ‚t |ƒt t t|ƒƒƒkrRtd t|ƒd dd„|Dƒ¡¡ƒ‚|dkr]td |¡ƒ‚t|ƒdksgtd ƒ‚t|ƒ|ksqtdƒ‚ddl m}d|}d d„|Dƒ}|j|dd ¡} t | j |||||¡g¡dS)a»Declare a list of actors as a collective group. Note: This function should be called in a driver process. Args: actors: a list of actors to be set in a collective group. world_size: the total number of processes in the group. ranks (List[int]): the rank of each actor. backend: the CCL backend to use, NCCL or GLOO. group_name: the name of the collective group. Returns: None r2rCzHEach actor should correspond to one rank. Got '{}' ranks but '{}' actorsz5Ranks must be a permutation from 0 to '{}'. Got '{}'.ÚcSsg|]}t|ƒ‘qSr )Ústr)Ú.0Úrr r rÚ äsz+create_collective_group..rz/World size must be greater than zero. Got '{}'.zRanks must be non-negative.z(Ranks cannot be greater than world_size.)ÚInfocSsg|]}|j‘qSr )Ú _ray_actor_id)rJÚar r rrLösÚdetached)r7ÚlifetimeN)rr!rErr4r#r6Úlenr&ÚsetÚrangeÚjoinÚallÚray.util.collective.utilrMÚoptionsÚremoteÚgetÚset_info) Úactorsr*rGr)r,r r7rMÚ actors_idÚinfor r rr.»sB ÿþÿÿÿ r.ÚreturncCstƒt |¡dS)z0Destroy a collective group given its group name.N)rDr>r9r?r r rr9ýsr9cCó"tƒt|ƒs dSt |¡}|jS)aReturn the rank of this process in the given group. Args: group_name: the name of the group to query Returns: the rank of this process in the named group, -1 if the group does not exist or the process does not belong to the group. éÿÿÿÿ)rDr@r>r1r+©r,r-r r rÚget_ranks rccCr`)aReturn the size of the collective group with the given name. Args: group_name: the name of the group to query Returns: The world size of the collective group, -1 if the group does not exist or the process does not belong to the group. ra)rDr@r>r1r*rbr r rÚget_collective_group_sizes rdcCs.t|ƒt|ƒ}tj}||_| |g|¡dS)aCollective allreduce the tensor across the group. Args: tensor: the tensor to be all-reduced on this process. group_name: the collective group name to perform allreduce. op: The reduce operation. Returns: None N)Ú_check_single_tensor_inputÚget_group_handlerÚAllReduceOptionsÚreduceOpÚ allreduce)Útensorr,Úopr-Úoptsr r rri's riÚtensor_listcCs<t ¡stdƒ‚t|ƒt|ƒ}tj}||_| ||¡dS)aCollective allreduce a list of tensors across the group. Args: tensor_list (List[tensor]): list of tensors to be allreduced, each on a GPU. group_name: the collective group name to perform allreduce. Returns: None ú&Multigpu calls requires NCCL and Cupy.N)rÚcupy_availabler#Ú_check_tensor_list_inputrfrgrhri)rmr,rkr-rlr r rÚallreduce_multigpu9s rqcCst|ƒ}| ¡dS)zBarrier all processes in the collective group. Args: group_name: the name of the group to barrier. Returns: None N)rfÚbarrierrbr r rrrOs rrÚdst_rankcCsFt|ƒt|ƒ}t||ƒt ¡}||_||_d|_| |g|¡dS)a:Reduce the tensor across the group to the destination rank. Args: tensor: the tensor to be reduced on this process. dst_rank: the rank of the destination process. group_name: the collective group name to perform reduce. op: The reduce operation. Returns: None rN) rerfÚ_check_rank_validrÚ ReduceOptionsrhÚ root_rankÚroot_tensorÚreduce)rjrsr,rkr-rlr r rrx\s rxÚ dst_tensorcCsbt ¡stdƒ‚t|ƒt|ƒ}t||ƒtt|ƒ|ƒt ¡}||_ ||_ ||_| ||¡dS)aÆReduce the tensor across the group to the destination rank and destination tensor. Args: tensor_list: the list of tensors to be reduced on this process; each tensor located on a GPU. dst_rank: the rank of the destination process. dst_tensor: the index of GPU at the destination. group_name: the collective group name to perform reduce. op: The reduce operation. Returns: None rnN) rror#rprfrtÚ_check_root_tensor_validrRrurhrvrwrx)rmrsryr,rkr-rlr r rÚreduce_multigpuvs r{Úsrc_rankcCs@t|ƒt|ƒ}t||ƒt ¡}||_d|_| |g|¡dS)a(Broadcast the tensor from a source process to all others. Args: tensor: the tensor to be broadcasted (src) or received (destination). src_rank: the rank of the source process. group_name: the collective group name to perform broadcast. Returns: None rN)rerfrtrÚBroadcastOptionsrvrwÚ broadcast©rjr|r,r-rlr r rr~šs r~Ú src_tensorcCs\t ¡stdƒ‚t|ƒt|ƒ}t||ƒtt|ƒ|ƒt ¡}||_ ||_ | ||¡dS)agBroadcast the tensor from a source GPU to all other GPUs. Args: tensor_list: the tensors to broadcast (src) or receive (dst). src_rank: the rank of the source process. src_tensor: the index of the source GPU on the source process. group_name: the collective group name to perform broadcast. Returns: None rnN)rror#rprfrtrzrRr}rvrwr~)rmr|r€r,r-rlr r rÚbroadcast_multigpu°s rcCsLt|ƒt|ƒt|ƒ}t|ƒ|jkrtdƒ‚t ¡}| |g|g|¡dS)a Allgather tensors from each process of the group into a list. Args: tensor_list: the results, stored as a list of tensors. tensor: the tensor (to be gathered) in the current process group_name: the name of the collective group. Returns: None zPThe length of the tensor list operands to allgather must be equal to world_size.N) rerprfrRr*r#rÚAllGatherOptionsÚ allgather)rmrjr,r-rlr r rrƒÌsÿrƒÚoutput_tensor_listsÚinput_tensor_listcCsBt ¡stdƒ‚t|ƒt|ƒt|ƒ}t ¡}| |||¡dS)a“Allgather tensors from each gpus of the group into lists. Args: output_tensor_lists (List[List[tensor]]): gathered results, with shape must be num_gpus * world_size * shape(tensor). input_tensor_list: (List[tensor]): a list of tensors, with shape num_gpus * shape(tensor). group_name: the name of the collective group. Returns: None rnN)rror#Ú_check_tensor_lists_inputrprfr‚rƒ)r„r…r,r-rlr r rÚallgather_multigpuåsr‡cCsRt|ƒt|ƒt|ƒ}t|ƒ|jkrtdƒ‚t ¡}||_| |g|g|¡dS)aÂReducescatter a list of tensors across the group. Reduce the list of the tensors across each process in the group, then scatter the reduced list of tensors -- one tensor for each process. Args: tensor: the resulted tensor on this process. tensor_list: The list of tensors to be reduced and scattered. group_name: the name of the collective group. op: The reduce operation. Returns: None zXThe length of the tensor list operands to reducescatter must not be equal to world_size.N) rerprfrRr*r#rÚReduceScatterOptionsrhÚ reducescatter)rjrmr,rkr-rlr r rr‰ýsÿr‰cCsHt ¡stdƒ‚t|ƒt|ƒt|ƒ}t ¡}||_| |||¡dS)a‘Reducescatter a list of tensors across all GPUs. Args: output_tensor_list: the resulted list of tensors, with shape: num_gpus * shape(tensor). input_tensor_lists: the original tensors, with shape: num_gpus * world_size * shape(tensor). group_name: the name of the collective group. op: The reduce operation. Returns: None. rnN) rror#r†rprfrˆrhr‰)Úoutput_tensor_listÚinput_tensor_listsr,rkr-rlr r rÚreducescatter_multigpusrŒcCóRt|ƒt|ƒ}t||ƒ||jkrtd |¡ƒ‚t ¡}||_| |g|¡dS)zìSend a tensor to a remote process synchronously. Args: tensor: the tensor to send. dst_rank: the rank of the destination process. group_name: the name of the collective group. Returns: None ú"The destination rank '{}' is self.N) rerfrtr+r#r&rÚSendOptionsrsÚsend)rjrsr,r-rlr r rr8ó rÚ dst_gpu_indexÚ n_elementscCó„t ¡stdƒ‚t|ƒt|ƒ}t||ƒ||jkr!td |¡ƒ‚|dkr,td |¡ƒ‚t ¡}||_ ||_ ||_| |g|¡dS)aSend a tensor to a remote GPU synchronously. The function assumes each process owns >1 GPUs, and the sender process and receiver process has equal number of GPUs. Args: tensor: the tensor to send, located on a GPU. dst_rank: the rank of the destination process. dst_gpu_index: the destination gpu index. group_name: the name of the collective group. n_elements: if specified, send the next n elements from the starting address of tensor. Returns: None z!send_multigpu call requires NCCL.úGThe dst_rank '{}' is self. Considering doing GPU to GPU memcpy instead?rz The n_elements '{}' should >= 0.N) rror#rerfrtr+r&rrsr’r“r)rjrsr’r,r“r-rlr r rÚ send_multigpuMs" þr–cCr)zíReceive a tensor from a remote process synchronously. Args: tensor: the received tensor. src_rank: the rank of the source process. group_name: the name of the collective group. Returns: None rŽN) rerfrtr+r#r&rÚRecvOptionsr|Úrecvrr r rr˜wr‘r˜Ú src_gpu_indexcCr”)aÁReceive a tensor from a remote GPU synchronously. The function asssume each process owns >1 GPUs, and the sender process and receiver process has equal nubmer of GPUs. Args: tensor: The received tensor, located on a GPU. src_rank: The rank of the source process. src_gpu_index: The index of the source GPU on the src process. group_name: The name of the collective group. Returns: None z!recv_multigpu call requires NCCL.r•rz#The n_elements '{}' should be >= 0.N) rror#rerfrtr+r&r—r|r™r“r˜)rjr|r™r,r“r-rlr r rÚ recv_multigpuŒs" þršÚgpu_idcCs,t ¡stdƒ‚ddl}|j |¡ ¡dS)zŽSynchronize the current process to a give device. Args: gpu_id: the GPU device id to synchronize. Returns: None z(synchronize call requires CUDA and NCCL.rN)rror#ÚcupyÚcudaÚDeviceÚsynchronize)r›Úcpr r rrŸ´s rŸc Cstƒt|ƒs„z3d|}tj|d}t |j ¡¡\}}}}}tjjj }|j ¡} || | ¡} t ||| ||¡WnItyƒ}z=dtjvrqtjd|krqttjdƒ}ttjdƒ}tjd}t dd¡}t |||||¡ntd |¡ƒ|‚WYd }~nd }~wwt |¡}|S)z·Check if the group is initialized and return the group handle. Args: group_name: the name of the collective group. Returns: The collective group handle. r2)r7Úcollective_group_nameÚcollective_rankÚcollective_world_sizeÚcollective_backendÚcollective_gloo_timeoutrBzr.r6ÚosÚenvironÚintÚgetenvr#r&r1) r,r7ÚmgrÚidsr*r+r)r r¨Úid_rKÚexcr-r r rrfÄsF ÿ ÿ ÿþýü€ö rfcCsVt|tjƒrdSt ¡rt|tjjƒrdSt ¡r"t|tjjƒr"dSt d t|ƒ¡ƒ‚)z-Check if the tensor is with a supported type.Nz[Unrecognized tensor type '{}'. Supported types are: np.ndarray, torch.Tensor, cupy.ndarray.)Ú isinstanceÚnpÚndarrayrror Útorch_availableÚthÚTensorr#r&Útype)rjr r rreõs þrer)cCs^|tjjkrtƒs tdƒ‚dS|tjjkrtƒstdƒ‚dS|tjjkr+tƒs-tdƒ‚dSdS)z'Check whether the backend is available.zGLOO is not available.zNCCL is not available.z#torch.distributed is not available.N) rr!r$rr#r'rr(r)r)r r rrEsÿÿþrEcCs"tjjj}|jtjkr dStdƒ‚)z1Check if currently it is inside a Ray actor/task.NzBThe collective APIs shall be only used inside a Ray actor or task.)rr§r¨r©ÚmodeÚWORKER_MODEr#)r¨r r rrDs ÿrDcCs6|dkrtd |¡ƒ‚||jkrtd ||j¡ƒ‚dS)z'Check the rank: 0 <= rank < world_size.rzrank '{}' is negative.z+rank '{}' must be less than world size '{}'N)r6r&r*)r-r+r r rrts ÿÿrtcCs>t|tƒstd t|ƒ¡ƒ‚|stdƒ‚|D]}t|ƒqdS)z7Check if the input is a list of supported tensor types.z.The input must be a list of tensors. Got '{}'.zGot an empty list of tensors.N)rµÚlistr#r&r»re)rmÚtr r rrp's þ ÿrpcCsDt|tƒstd t|ƒ¡ƒ‚|std|›ƒ‚|D]}t|ƒqdS)z@Check if the input is a list of lists of supported tensor types.z7The input must be a list of lists of tensors. Got '{}'.zDid not receive tensors. Got: N)rµr¾r#r&r»rp)Útensor_listsr¿r r rr†4s þ ÿr†cCs2|dkrtd |¡ƒ‚||krtd ||¡ƒ‚dS)z9Check the root_tensor device is 0 <= root_tensor < lengthrzroot_tensor '{}' is negative.z9root_tensor '{}' is greater than the number of GPUs: '{}'N)r6r&)Úlengthrwr r rrzAsþÿrz)rA)rrA)rrrA)rAr)Cr=ÚloggingrÚtypingrÚnumpyr¶rÚray.util.collectiverÚ getLoggerr:r Ú:ray.util.collective.collective_group.nccl_collective_grouprrr ÚImportErrorÚ:ray.util.collective.collective_group.gloo_collective_grouprrÚ@ray.util.collective.collective_group.torch_gloo_collective_grouprrrrrÚobjectrr>r@r!r'r¯rIrFr.r9rcrdÚReduceOpÚSUMrir¾rqrrrxr{r~rrƒr‡r‰rŒrr–r˜ršrŸrfrerErDrtrpr†rzr r r rÚs, þÿÿRûÿþü û)úþýû úBÿÿ ÿ ÿÿ ÿûÿþý ü$ÿÿÿ ÿÿÿÿ ÿÿÿ ÿ!ü ýûþýü û*ûþýü û(1