==============================================================
Hierarchical clustering (:mod:`pyannote.core.utils.hierarchy`)
==============================================================

linkage(X, method="single", metric="euclidean", **kwargs)

    Same as scipy.cluster.hierarchy.linkage, with more metrics and methods.
    In particular, method="pool" delegates to pool().

_average_pooling_func(u, v, S=None, C=None, **kwargs)

    Compute the average of a newly merged cluster.

    Parameters
    ----------
    u : int
        Cluster index.
    v : int
        Cluster index.
    S : (2 x n_observations - 1, ) np.ndarray
        Cluster size.
    C : (2 x n_observations - 1, dimension) np.ndarray
        Cluster average.

    Returns
    -------
    Cuv : (dimension, ) np.ndarray
        Average of the newly formed cluster.

_centroid_pooling_func(u, v, X=None, d=None, K=None, **kwargs)

    Compute the centroid of a newly merged cluster.

    Parameters
    ----------
    u : int
        Cluster index.
    v : int
        Cluster index.
    X : (n_observations, dimension) np.ndarray
        Observations.
    d : (n_observations, n_observations) np.ndarray
        Distance between observations.
    K : (n_observations, ) np.ndarray, optional
        Cluster assignment.

    Returns
    -------
    Cuv : (dimension, ) np.ndarray
        Centroid of the newly formed cluster.

propagate_constraints(cannot_link, must_link)

    Infer additional "cannot link" pairs from "must link" pairs. The function
    applies the propagation rule described for must_link_method="propagate"
    in pool() repeatedly, until no new pair can be inferred.

    Raises
    ------
    ValueError
        If a pair is required to be both linked and not linked.

pool(X, metric="euclidean", pooling_func="average", cannot_link=None,
     must_link=None, must_link_method="both")

    'pool' linkage

    Parameters
    ----------
    X : np.ndarray
        (n_samples, dimension) observations.
    metric : {"euclidean", "cosine", "angular"}, optional
        Distance metric. Defaults to "euclidean".
    pooling_func : {"average", "centroid"} or callable, optional
        Function used to compute the representation of a newly merged
        cluster. Any value that is neither "average", "centroid" nor a
        callable raises ValueError. Defaults to "average".
    cannot_link : list of pairs, optional
        Pairs of indices of observations that cannot be linked. For instance,
        [(1, 2), (5, 6)] means that observations 1 and 2 cannot end up in the
        same cluster, and neither can observations 5 and 6.
    must_link : list of pairs, optional
        Pairs of indices of observations that must be linked. For instance,
        [(1, 2), (5, 6)] means that observations 1 and 2 must end up in the
        same cluster, and so must observations 5 and 6.
    must_link_method : {"merge", "propagate", "both"}, optional
        Method used for taking "must link" constraints into account.

        * use "merge" to initialize clusters by merging "must link"
          observations before any other regular clustering iterations.
        * use "propagate" to infer additional "cannot link" constraints by
          applying the following propagation rule: if u and v cannot be
          linked and v and w must be linked, then u and w cannot be linked.
        * use "both" to apply both methods.

        Defaults to "both".

pool.<locals>.merge(u, v, iteration, constraint=False)

    Merge two clusters.

    Parameters
    ----------
    u, v : int
        Indices of clusters to merge.
    iteration : int
        Current clustering iteration.
    constraint : bool
        Set to True to indicate that this merge comes from a 'must_link'
        constraint. This artificially sets Z[iteration, 2] to 0.0.

    Returns
    -------
    uv : int
        Index of the resulting cluster.

    Raises
    ------
    ValueError
        In case of conflict between "must_link" and "cannot_link" constraints.

fcluster_auto(X, Z, metric="euclidean")

    Form flat clusters using the within-class sum of squares (WCSS) elbow
    criterion.

    Parameters
    ----------
    X : `np.ndarray`
        (n_samples, n_dimensions) feature vectors.
    Z : `np.ndarray`
        The hierarchical clustering encoded with the matrix returned by the
        `linkage` function.
    metric : `str`
        The distance metric to use. See the `pdist` function for a list of
        valid distance metrics.

    Returns
    -------
    T : ndarray
        An array of length n. T[i] is the flat cluster number to which
        original observation i belongs.

    Reference
    ---------
    H. Delgado, X. Anguera, C. Fredouille, J. Serrano. "Fast Single- and
    Cross-Show Speaker Diarization Using Binary Key Speaker Modeling".
    IEEE/ACM Transactions on Audio, Speech, and Language Processing.
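The "centroid", "median" and "ward" methods are defined for Euclidean distances. When a different metric is requested with one of these methods, linkage appears to L2-normalize X (via l2_normalize) and fall back to the Euclidean metric. The reasoning is as follows. For unit-norm vectors a and b, the squared Euclidean distance equals 2 - 2 cos(a, b), which is a decreasing function of cosine similarity. Normalizing therefore keeps the cosine ordering of pairs intact. The pure-Python sketch below checks that identity numerically. Its helper names are hypothetical and are not part of pyannote.core.

```python
import math
from typing import List, Sequence


def l2_normalized(x: Sequence[float]) -> List[float]:
    """Return x scaled to unit Euclidean norm (hypothetical helper)."""
    norm = math.sqrt(sum(xi * xi for xi in x))
    return [xi / norm for xi in x]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(ai * bi for ai, bi in zip(a, b))
    return dot / (math.sqrt(sum(ai * ai for ai in a)) * math.sqrt(sum(bi * bi for bi in b)))


def squared_euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


a = [3.0, 4.0, 0.0]
b = [1.0, 0.0, 2.0]

# After unit normalization, squared Euclidean distance equals 2 - 2 * cosine
# similarity, so Euclidean-only linkage methods respect the cosine ordering.
lhs = squared_euclidean(l2_normalized(a), l2_normalized(b))
rhs = 2.0 - 2.0 * cosine_similarity(a, b)
print(lhs, rhs)
```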
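_average_pooling_func represents a merged cluster by the average of its members. With only the current cluster averages C and cluster sizes S available, that average can be updated without revisiting the observations: each parent cluster is weighted by its size, giving C_uv = (S[u] C[u] + S[v] C[v]) / (S[u] + S[v]). The plain-Python sketch below illustrates that update. It assumes C and S are indexed by cluster, as the docstring describes, and the function name is hypothetical.

```python
from typing import List, Sequence


def merged_average(u: int, v: int, S: Sequence[int], C: Sequence[Sequence[float]]) -> List[float]:
    """Size-weighted average of clusters u and v (hypothetical helper)."""
    total = S[u] + S[v]
    return [(S[u] * cu + S[v] * cv) / total for cu, cv in zip(C[u], C[v])]


# Cluster 0 averages two observations, [0, 0] and [0, 2]; cluster 1 holds [3, 4].
S = [2, 1]
C = [[0.0, 1.0], [3.0, 4.0]]
print(merged_average(0, 1, S, C))  # [1.0, 2.0]
```

The result matches the plain mean of the three underlying observations. This is why storing sizes alongside averages is enough to keep the update exact.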
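_centroid_pooling_func takes the observations X, their pairwise distances d and the current cluster assignment K. One way to return a "centroid" that is an actual observation is to pick a medoid. The medoid is the member of the merged cluster with the smallest mean distance to the other members. The sketch below illustrates that medoid rule, assuming d is a square distance matrix. It is an illustration with a hypothetical name, not a transcription of the compiled function.

```python
from typing import List, Sequence


def merged_medoid(
    u: int,
    v: int,
    X: Sequence[Sequence[float]],
    d: Sequence[Sequence[float]],
    K: Sequence[int],
) -> Sequence[float]:
    """Return the member of clusters u and v with the smallest mean distance to the others."""
    # Indices of observations currently assigned to cluster u or v.
    members: List[int] = [j for j, k in enumerate(K) if k in (u, v)]

    def mean_distance(i: int) -> float:
        # Includes d[i][i] == 0, which does not change which member wins.
        return sum(d[i][j] for j in members) / len(members)

    return X[min(members, key=mean_distance)]


X = [[0.0], [1.0], [2.0], [10.0]]
d = [[abs(a[0] - b[0]) for b in X] for a in X]
K = [0, 0, 1, 2]  # observations 0 and 1 in cluster 0, observation 2 in cluster 1
print(merged_medoid(0, 1, X, d, K))  # [1.0]
```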
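propagate_constraints applies the rule stated in the pool docstring: if u and v cannot be linked and v and w must be linked, then u and w cannot be linked. Treating each pair as a set makes the rule short to express:

* If a cannot-link pair and a must-link pair share exactly one observation, their symmetric difference is the inferred cannot-link pair.
* If the two pairs are identical, the same pair is both required and forbidden, which is a conflict.

The sketch below repeats the rule until a fixed point is reached, so chains of must-link pairs are followed. It mirrors the documented rule under hypothetical names and is not necessarily the library's exact code.

```python
from typing import List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def _ordered(a: int, b: int) -> Pair:
    return (a, b) if a <= b else (b, a)


def propagate(cannot_link: Sequence[Pair], must_link: Sequence[Pair]) -> List[Pair]:
    """Apply 'cannot(x, y) and must(y, w) imply cannot(x, w)' until nothing new appears."""
    inferred: Set[Pair] = {_ordered(x, y) for x, y in cannot_link}
    while True:
        new: Set[Pair] = set()
        for x, y in inferred:
            for u, v in must_link:
                diff = {x, y} ^ {u, v}
                if not diff:
                    # The same pair is both must-link and cannot-link.
                    raise ValueError(f"conflicting constraints for pair ({x}, {y})")
                if len(diff) == 2:
                    # The pairs share one observation; the other two cannot be linked.
                    pair = _ordered(*diff)
                    if pair not in inferred:
                        new.add(pair)
        if not new:
            return sorted(inferred)
        inferred |= new


# 1, 2 and 3 must share a cluster, and 0 cannot join 1, so 0 cannot join 2 or 3 either.
print(propagate([(0, 1)], [(1, 2), (2, 3)]))  # [(0, 1), (0, 2), (0, 3)]
```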
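When must_link_method is "merge" or "both", pool first merges the observations tied by "must link" pairs, before any regular iteration. Grouping those observations amounts to finding the connected components of the graph whose edges are the must-link pairs. The module imports scipy's csr_matrix and connected_components, which fits that approach. The sketch below computes the same grouping with a small union-find in pure Python. The function name is hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def must_link_groups(n: int, must_link: Sequence[Tuple[int, int]]) -> List[List[int]]:
    """Group n observations into connected components of the must-link graph."""
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for u, v in must_link:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[rv] = ru

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Only groups with more than one observation need initial merges.
    return [g for g in groups.values() if len(g) > 1]


print(must_link_groups(6, [(0, 1), (1, 2), (4, 5)]))  # [[0, 1, 2], [4, 5]]
```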
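As its docstrings describe, pool is agglomerative. Each call to its inner merge function combines two clusters and records the merge at row `iteration` of the linkage matrix Z. The pure-Python sketch below illustrates such a loop with size-weighted average pooling and Euclidean distance. Cannot-link pairs are enforced by skipping any merge that would put them in the same cluster. This is a simplified, hypothetical illustration, not the library's implementation:

* it uses a naive cubic-time search;
* it has no must-link handling;
* it stops early, returning a partial dendrogram, when only forbidden merges remain.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Sequence, Set, Tuple


def pool_sketch(
    X: Sequence[Sequence[float]],
    cannot_link: Sequence[Tuple[int, int]] = (),
) -> List[List[float]]:
    """Greedy average-pooling agglomeration returning scipy-style linkage rows."""
    n = len(X)
    forbidden: Set[FrozenSet[int]] = {frozenset(pair) for pair in cannot_link}
    # Active clusters: cluster id -> (size-weighted average, member observations).
    active: Dict[int, Tuple[List[float], List[int]]] = {
        i: (list(x), [i]) for i, x in enumerate(X)
    }
    Z: List[List[float]] = []
    next_id = n  # scipy numbers the cluster formed at iteration i as n + i
    while len(active) > 1:
        best: Optional[Tuple[float, int, int]] = None
        ids = sorted(active)
        for pos, a in enumerate(ids):
            for b in ids[pos + 1:]:
                (ca, ma), (cb, mb) = active[a], active[b]
                # Skip merges that would put a cannot-link pair in one cluster.
                if any(frozenset((i, j)) in forbidden for i in ma for j in mb):
                    continue
                dist = math.dist(ca, cb)
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        if best is None:
            break  # only forbidden merges remain
        dist, a, b = best
        (ca, ma), (cb, mb) = active.pop(a), active.pop(b)
        sa, sb = len(ma), len(mb)
        # The merged cluster is represented by the size-weighted average.
        merged = [(sa * p + sb * q) / (sa + sb) for p, q in zip(ca, cb)]
        active[next_id] = (merged, ma + mb)
        Z.append([float(a), float(b), dist, float(sa + sb)])
        next_id += 1
    return Z


X = [[0.0], [1.0], [5.0], [6.0]]
print(pool_sketch(X))  # [[0.0, 1.0, 1.0, 2.0], [2.0, 3.0, 1.0, 2.0], [4.0, 5.0, 5.0, 4.0]]
print(pool_sketch(X, cannot_link=[(0, 1)]))  # [[2.0, 3.0, 1.0, 2.0], [1.0, 4.0, 4.5, 3.0]]
```

Each row follows scipy's layout: the two merged cluster ids, their distance, and the size of the new cluster.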
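fcluster_auto chooses its cut at the "elbow" of a within-class sum of squares (WCSS) curve, computed for candidate thresholds taken from the dendrogram. A common geometric definition of the elbow is the point farthest from the straight line joining the first and last points of the curve. The sketch below applies that rule to a precomputed list of WCSS values. It is a stand-alone illustration with a hypothetical name, not the module's code, and it leaves out the WCSS computation itself.

```python
import math
from typing import Sequence


def elbow_index(values: Sequence[float]) -> int:
    """Index of the point farthest from the chord joining the first and last points."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    if n == 1:
        return 0
    x1, y1 = 0.0, values[0]
    x2, y2 = float(n - 1), values[-1]
    # Line through (x1, y1) and (x2, y2), written as a*x + b*y + c = 0.
    a = y2 - y1
    b = x1 - x2
    c = x2 * y1 - x1 * y2
    norm = math.hypot(a, b)  # nonzero because b = -(n - 1) when n >= 2
    distances = [abs(a * i + b * y + c) / norm for i, y in enumerate(values)]
    return max(range(n), key=distances.__getitem__)


# WCSS typically drops quickly, then flattens; the elbow marks the transition.
wcss = [100.0, 40.0, 15.0, 10.0, 8.0, 7.0]
print(elbow_index(wcss))  # 2
```

In fcluster_auto, the selected index appears to pick the merge distance that is then passed as the threshold to scipy.cluster.hierarchy.fcluster with criterion="distance".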