"""
Melody extraction algorithms aim to produce a sequence of frequency values
corresponding to the pitch of the dominant melody from a musical
recording.  For evaluation, an estimated pitch series is evaluated against a
reference based on whether the voicing (melody present or not) and the pitch
is correct (within some tolerance).

For a detailed explanation of the measures please refer to:

    J. Salamon, E. Gomez, D. P. W. Ellis and G. Richard, "Melody Extraction
    from Polyphonic Music Signals: Approaches, Applications and Challenges",
    IEEE Signal Processing Magazine, 31(2):118-134, Mar. 2014.

and:

    G. E. Poliner, D. P. W. Ellis, A. F. Ehmann, E. Gomez, S.
    Streich, and B. Ong. "Melody transcription from music audio:
    Approaches and evaluation", IEEE Transactions on Audio, Speech, and
    Language Processing, 15(4):1247-1256, 2007.

For an explanation of the generalized measures (using non-binary voicings),
please refer to:

    R. Bittner and J. Bosch, "Generalized Metrics for Single-F0 Estimation
    Evaluation", International Society for Music Information Retrieval
    Conference (ISMIR), 2019.


Conventions
-----------

Melody annotations are assumed to be given in the format of a 1d array of
frequency values which are accompanied by a 1d array of times denoting when
each frequency value occurs.  In a reference melody time series, a frequency
value of 0 denotes "unvoiced".  In an estimated melody time series, unvoiced
frames can be indicated either by 0 Hz or by a negative Hz value; negative
values represent the algorithm's pitch estimate for frames it has determined as
unvoiced, in case they are in fact voiced.
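
The voicing convention above can be illustrated with a small plain-Python
sketch. The helper name ``split_voicing`` is hypothetical and is not part of
this module. It is meant to mimic what :func:`mir_eval.melody.freq_to_voicing`
does when no voicing array is supplied: frames with a strictly positive
frequency are voiced, all others are unvoiced, and the magnitude of a negative
value is kept as the pitch guess.

```python
def split_voicing(freqs_hz):
    # Hypothetical helper (not part of mir_eval): mirrors the default
    # behaviour of freq_to_voicing(frequencies, voicing=None).
    # A frame is voiced (1.0) only if its frequency is strictly positive.
    voicing = [1.0 if f > 0 else 0.0 for f in freqs_hz]
    # Negative values still carry a pitch guess, so keep their magnitude.
    freqs = [abs(f) for f in freqs_hz]
    return freqs, voicing


freqs, voicing = split_voicing([0.0, 220.0, -440.0])
print(freqs)    # [0.0, 220.0, 440.0]
print(voicing)  # [0.0, 1.0, 0.0]
```

The -440 Hz frame ends up unvoiced, but its 440 Hz guess is kept, so it can
still be scored by the pitch-only metrics.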

Metrics are computed using a sequence of reference and estimated pitches in
cents and voicing arrays, both of which are sampled to the same
timebase.  The function :func:`mir_eval.melody.to_cent_voicing` can be used to
convert a sequence of estimated and reference times and frequency values in Hz
to voicing arrays and frequency arrays in the format required by the
metric functions.  By default, the convention is to resample the estimated
melody time series to the reference melody time series' timebase.
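
A minimal sketch of the cent conversion step follows. It assumes plain Python
lists instead of NumPy arrays. It applies 1200 * log2(\|f\| / base_frequency)
with a 10 Hz default base, as :func:`mir_eval.melody.hz2cents` does, and it
leaves 0 Hz entries at 0. The name ``hz_to_cents_list`` is hypothetical and is
not part of this module.

```python
import math


def hz_to_cents_list(freqs_hz, base_frequency=10.0):
    # Hypothetical helper: 1200 cents per octave, measured relative to
    # base_frequency; entries equal to 0 Hz (no pitch) stay at 0 cents.
    # abs() lets negative (unvoiced-but-guessed) estimates be converted too.
    return [
        1200.0 * math.log2(abs(f) / base_frequency) if f != 0 else 0.0
        for f in freqs_hz
    ]


cents = hz_to_cents_list([0.0, 20.0, 440.0, 880.0])
print(cents[1])                       # 1200.0 (20 Hz is one octave above 10 Hz)
print(round(cents[3] - cents[2], 6))  # 1200.0 (an octave is 1200 cents)
```

Working in cents makes the pitch tolerance a fixed musical interval (50 cents
is half a semitone), regardless of the absolute frequency.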

Metrics
-------

* :func:`mir_eval.melody.voicing_measures`: Voicing measures, including the
  recall rate (proportion of frames labeled as melody frames in the reference
  that are estimated as melody frames) and the false alarm
  rate (proportion of frames labeled as non-melody in the reference that are
  mistakenly estimated as melody frames)
* :func:`mir_eval.melody.raw_pitch_accuracy`: Raw Pitch Accuracy, which
  computes the proportion of melody frames in the reference for which the
  frequency is considered correct (i.e. within half a semitone of the reference
  frequency)
* :func:`mir_eval.melody.raw_chroma_accuracy`: Raw Chroma Accuracy, where the
  estimated and reference frequency sequences are mapped onto a single octave
  before computing the raw pitch accuracy
* :func:`mir_eval.melody.overall_accuracy`: Overall Accuracy, which computes
  the proportion of all frames correctly estimated by the algorithm, including
  whether non-melody frames were labeled by the algorithm as non-melody
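
To make the frame-level definitions above concrete, here is a simplified
plain-Python sketch of all five scores. It operates on frame sequences that are
already aligned, in the format produced by
:func:`mir_eval.melody.to_cent_voicing`. The function ``melody_scores`` and the
toy data are hypothetical. Unlike the module's own functions, it assumes voicing
values of exactly 0 or 1 and skips input validation, so it does not cover the
generalized (non-binary) measures.

```python
import math


def melody_scores(ref_v, ref_c, est_v, est_c, cent_tolerance=50.0):
    # Hypothetical sketch: assumes equal-length lists and binary (0/1) voicing.
    n = len(ref_v)
    voiced = [i for i in range(n) if ref_v[i] > 0]
    unvoiced = [i for i in range(n) if ref_v[i] == 0]
    # Pitch is only compared where both sequences carry a nonzero pitch.
    both = [i for i in range(n) if ref_c[i] != 0 and est_c[i] != 0]

    def pitch_ok(i):
        return abs(ref_c[i] - est_c[i]) < cent_tolerance

    def chroma_ok(i):
        # Fold the error onto one octave by removing the nearest multiple of 1200.
        diff = abs(ref_c[i] - est_c[i])
        octave = 1200.0 * math.floor(diff / 1200.0 + 0.5)
        return abs(diff - octave) < cent_tolerance

    n_voiced = len(voiced)
    recall = sum(est_v[i] for i in voiced) / n_voiced if voiced else 1.0
    false_alarm = sum(est_v[i] for i in unvoiced) / len(unvoiced) if unvoiced else 0.0
    # Raw pitch/chroma ignore the estimated voicing entirely.
    rpa = sum(ref_v[i] for i in both if pitch_ok(i)) / n_voiced if voiced else 0.0
    rca = sum(ref_v[i] for i in both if chroma_ok(i)) / n_voiced if voiced else 0.0
    # Overall accuracy: a voiced frame counts if it is estimated as voiced with
    # a correct pitch; an unvoiced frame counts if it is estimated as unvoiced.
    hits = sum(ref_v[i] * est_v[i] for i in both if pitch_ok(i))
    hits += sum(1 - est_v[i] for i in unvoiced)
    overall = hits / n if n else 0.0
    return {
        "Voicing Recall": recall,
        "Voicing False Alarm": false_alarm,
        "Raw Pitch Accuracy": rpa,
        "Raw Chroma Accuracy": rca,
        "Overall Accuracy": overall,
    }


# Toy frames: the third estimate is one octave too high, the fourth is a
# false alarm, and the second has the right pitch but is marked unvoiced.
scores = melody_scores(
    ref_v=[1, 1, 1, 0, 0],
    ref_c=[6000.0, 6000.0, 6000.0, 0.0, 0.0],
    est_v=[1, 0, 1, 1, 0],
    est_c=[6020.0, 6010.0, 7200.0, 5000.0, 0.0],
)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
# Expected: recall 0.667, false alarm 0.500, raw pitch 0.667,
# raw chroma 1.000, overall 0.400
```

The octave error is forgiven only by raw chroma accuracy. The correctly pitched
but unvoiced second frame helps raw pitch accuracy but not overall accuracy.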

    N   )utilc                 C   s   | j dkr
td |j dkrtd |  dkrtd | dkr*td | jd |jd kr8td| |fD ]}t|dk |dk rNtdq<d	S )
zCheck that voicing inputs to a metric are in the correct format.

    Parameters
    ----------
    ref_voicing : np.ndarray
        Reference voicing array
    est_voicing : np.ndarray
        Estimated voicing array

    r   z!Reference voicing array is empty.z!Estimated voicing array is empty.z&Reference melody has no voiced frames.z&Estimated melody has no voiced frames.zAReference and estimated voicing arrays should be the same length.r   z'Voicing arrays must be between 0 and 1.N)	sizewarningswarnsumshape
ValueErrornp
logical_orany)ref_voicingest_voicingvoicing r   C/home/ubuntu/.local/lib/python3.10/site-packages/mir_eval/melody.pyvalidate_voicingM   s"   





r   c                 C   sp   |j dkr
td |j dkrtd | jd |jd ks2|jd |jd ks2|jd |jd kr6tddS )a  Check that voicing and frequency arrays are well-formed.  To be used in
    conjunction with :func:`mir_eval.melody.validate_voicing`

    Parameters
    ----------
    ref_voicing : np.ndarray
        Reference voicing array
    ref_cent : np.ndarray
        Reference pitch sequence in cents
    est_voicing : np.ndarray
        Estimated voicing array
    est_cent : np.ndarray
        Estimated pitch sequence in cents

    """
    if ref_cent.size == 0:
        warnings.warn("Reference frequency array is empty.")
    if est_cent.size == 0:
        warnings.warn("Estimated frequency array is empty.")
    # Make sure they're the same length
    if (
        ref_voicing.shape[0] != ref_cent.shape[0]
        or est_voicing.shape[0] != est_cent.shape[0]
        or ref_cent.shape[0] != est_cent.shape[0]
    ):
        raise ValueError(
            "All voicing and frequency arrays must have the same length."
        )


def hz2cents(freq_hz, base_frequency=10.0):
    """Convert an array of frequency values in Hz to cents.
    0 values are left in place.

    Parameters
    ----------
    freq_hz : np.ndarray
        Array of frequencies in Hz.
    base_frequency : float
        Base frequency for conversion.
        (Default value = 10.0)
    """
    freq_cent = np.zeros(freq_hz.shape[0])
    freq_nonz_ind = np.flatnonzero(freq_hz)
    normalized_frequency = np.abs(freq_hz[freq_nonz_ind]) / base_frequency
    freq_cent[freq_nonz_ind] = 1200 * np.log2(normalized_frequency)
    return freq_cent


def freq_to_voicing(frequencies, voicing=None):
    """Convert from an array of frequency values to frequency array +
    voiced/unvoiced array.

    Parameters
    ----------
    frequencies : np.ndarray
        Array of frequencies.  A frequency <= 0 indicates "unvoiced".

    voicing : np.ndarray
        Array of voicing values.
        (Default value = None)
        If None, the voicing is inferred from `frequencies`:

           - frames with frequency <= 0.0 are considered "unvoiced"
           - frames with frequency > 0.0 are considered "voiced"

        If specified, `voicing` is used as the voicing array, but
        frequencies with value 0 are forced to have 0 voicing.

           - Voicing inferred by negative frequency values is ignored.

    Returns
    -------
    frequencies : np.ndarray
        Array of frequencies, all >= 0.
    voiced : np.ndarray
        Array of voicings between 0 and 1, same length as frequencies,
        which indicates voiced or unvoiced

    """
    if voicing is not None:
        voicing[frequencies == 0] = 0
    else:
        voicing = (frequencies > 0).astype(float)
    return np.abs(frequencies), voicing


def constant_hop_timebase(hop, end_time):
    """Generate a time series from 0 to ``end_time`` with times spaced ``hop`` apart.

    Parameters
    ----------
    hop : float
        Spacing of samples in the time series
    end_time : float
        Time series will span ``[0, end_time]``

    Returns
    -------
    times : np.ndarray
        Generated timebase

    """
    # Compute new timebase.  Rounding/linspace is to avoid float problems.
    end_time = np.round(end_time, 10)
    times = np.linspace(
        0, hop * int(np.floor(end_time / hop)), int(np.floor(end_time / hop)) + 1
    )
    times = np.round(times, 10)
    return times


def resample_melody_series(times, frequencies, voicing, times_new, kind="linear"):
    """Resamples frequency and voicing time series to a new timescale. Maintains
    any zero ("unvoiced") values in frequencies.

    If ``times`` and ``times_new`` are equivalent, no resampling will be
    performed.

    Parameters
    ----------
    times : np.ndarray
        Times of each frequency value
    frequencies : np.ndarray
        Array of frequency values, >= 0
    voicing : np.ndarray
        Array which indicates voiced or unvoiced. This array may be binary
        or have continuous values between 0 and 1.
    times_new : np.ndarray
        Times to resample frequency and voicing sequences to
    kind : str
        kind parameter to pass to scipy.interpolate.interp1d.
        (Default value = 'linear')

    Returns
    -------
    frequencies_resampled : np.ndarray
        Frequency array resampled to new timebase
    voicing_resampled : np.ndarray
        Voicing array resampled to new timebase

    r   Nr   zNon-uniform timescale passed to resample_melody_series.  Pitch will be linearly interpolated, which will result in undesirable behavior if silences are indicated by missing values.  Silences should be indicated by nonpositive frequency values.r&   zeronearestr/   )r   r
   allclosediffmeanr   r   r'   maxappendarray	enumeratescipyinterpolateinterp1dallr   equal)r-   r$   r   	times_newkindfrequencies_heldn	frequencyfrequencies_resampledfrequency_maskis_binary_voicingvoicing_resampledr   r   r   resample_melody_series   sX   *
rG   c	                 C   s  | d dkr#t | dd} t |d|d }|dur#t |d|d }|d dkrFt |dd}t |d|d }|durFt |d|d }t||\}}	t||\}}t||}
t||}|durt| |
|	t||  |\}
}	t|||t|| |\}}n
t|||| |\}}|
jd |jd  }|dkrt |t 	|}t |t 	|}n|d|
jd  }|d|	jd  }|	|
||fS )a  Convert reference and estimated time/frequency (Hz) annotations to sampled
    frequency (cent)/voicing arrays.

    A zero frequency indicates "unvoiced".

    If est_voicing is not provided, a negative frequency indicates:
        "Predicted as unvoiced, but if it's voiced,
        this is the frequency estimate".

    If it is provided, negative frequency values are ignored, and the voicing
    from est_voicing is directly used.

    Parameters
    ----------
    ref_time : np.ndarray
        Time of each reference frequency value

    ref_freq : np.ndarray
        Array of reference frequency values

    est_time : np.ndarray
        Time of each estimated frequency value

    est_freq : np.ndarray
        Array of estimated frequency values

    est_voicing : np.ndarray
        Estimated voicing confidence.
        Default None, which means the voicing is inferred from est_freq:

          - frames with frequency <= 0.0 are considered "unvoiced"
          - frames with frequency > 0.0 are considered "voiced"

    ref_reward : np.ndarray
        Reference voicing reward.
        Default None, which means all frames are weighted equally.

    base_frequency : float
        Base frequency in Hz for conversion to cents
        (Default value = 10.)

    hop : float
        Hop size, in seconds, of a common timebase to which both sequences
        are resampled. Default None, which means the estimate is resampled
        to ``ref_time``.

    kind : str
        kind parameter to pass to scipy.interpolate.interp1d.
        (Default value = 'linear')

    Returns
    -------
    ref_voicing : np.ndarray
        Resampled reference voicing array
    ref_cent : np.ndarray
        Resampled reference frequency (cent) array
    est_voicing : np.ndarray
        Resampled estimated voicing array
    est_cent : np.ndarray
        Resampled estimated frequency (cent) array

    r   N)
r
   insertr%   r!   rG   r.   r5   r   r6   r   )ref_timeref_freqest_timeest_freqr   
ref_rewardr   r+   r?   r   r   r   len_diffr   r   r   to_cent_voicing<  sN   I


	
rO   c                 C   sP   | j dks
|j dkrdS | dkt}t|dkrdS t|| t| S )a  Compute the voicing recall given two voicing
    indicator sequences, one as reference (truth) and the other as the estimate
    (prediction).  The sequences must be of the same length.

    Examples
    --------
    >>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
    >>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
    >>> (ref_v, ref_c,
    ...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
    ...                                                  ref_freq,
    ...                                                  est_time,
    ...                                                  est_freq)
    >>> recall = mir_eval.melody.voicing_recall(ref_v, est_v)

    Parameters
    ----------
    ref_voicing : np.ndarray
        Reference boolean voicing array
    est_voicing : np.ndarray
        Estimated boolean voicing array

    Returns
    -------
    vx_recall : float
        Voicing recall rate, the fraction of voiced frames in ref
        indicated as voiced in est
    r           r   r   r"   r#   r
   r   r   r   ref_indicatorr   r   r   voicing_recall     rT   c                 C   sP   | j dks
|j dkrdS | dkt}t|dkrdS t|| t| S )a"  Compute the voicing false alarm rates given two voicing
    indicator sequences, one as reference (truth) and the other as the estimate
    (prediction).  The sequences must be of the same length.

    Examples
    --------
    >>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
    >>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
    >>> (ref_v, ref_c,
    ...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
    ...                                                  ref_freq,
    ...                                                  est_time,
    ...                                                  est_freq)
    >>> false_alarm = mir_eval.melody.voicing_false_alarm(ref_v, est_v)

    Parameters
    ----------
    ref_voicing : np.ndarray
        Reference boolean voicing array
    est_voicing : np.ndarray
        Estimated boolean voicing array

    Returns
    -------
    vx_false_alarm : float
        Voicing false alarm rate, the fraction of unvoiced frames in ref
        indicated as voiced in est
    r   rP   rQ   rR   r   r   r   voicing_false_alarm  rU   rV   c                 C   s&   t | | t| |}t| |}||fS )a  Compute the voicing recall and false alarm rates given two voicing
    indicator sequences, one as reference (truth) and the other as the estimate
    (prediction).  The sequences must be of the same length.

    Examples
    --------
    >>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
    >>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
    >>> (ref_v, ref_c,
    ...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
    ...                                                  ref_freq,
    ...                                                  est_time,
    ...                                                  est_freq)
    >>> recall, false_alarm = mir_eval.melody.voicing_measures(ref_v,
    ...                                                        est_v)

    Parameters
    ----------
    ref_voicing : np.ndarray
        Reference boolean voicing array
    est_voicing : np.ndarray
        Estimated boolean voicing array

    Returns
    -------
    vx_recall : float
        Voicing recall rate, the fraction of voiced frames in ref
        indicated as voiced in est
    vx_false_alarm : float
        Voicing false alarm rate, the fraction of unvoiced frames in ref
        indicated as voiced in est
    )r   rT   rV   )r   r   	vx_recallvx_false_almr   r   r   voicing_measures  s   
!

rY   2   c           	      C   s   t | | t| ||| | jdks!|  dks!|jdks!|jdkr#dS t|dk|dk}t|dkr5dS t|| | }||k }t| | | t|  }|S )a  Compute the raw pitch accuracy given two pitch (frequency) sequences in
    cents and matching voicing indicator sequences. The first pitch and voicing
    arrays are treated as the reference (truth), and the second two as the
    estimate (prediction).  All 4 sequences must be of the same length.

    Examples
    --------
    >>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
    >>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
    >>> (ref_v, ref_c,
    ...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
    ...                                                  ref_freq,
    ...                                                  est_time,
    ...                                                  est_freq)
    >>> raw_pitch = mir_eval.melody.raw_pitch_accuracy(ref_v, ref_c,
    ...                                                est_v, est_c)

    Parameters
    ----------
    ref_voicing : np.ndarray
        Reference voicing array. When this array is non-binary, it is treated
        as a 'reference reward', as in (Bittner & Bosch, 2019)
    ref_cent : np.ndarray
        Reference pitch sequence in cents
    est_voicing : np.ndarray
        Estimated voicing array
    est_cent : np.ndarray
        Estimated pitch sequence in cents
    cent_tolerance : float
        Maximum absolute deviation in cents for a frequency value to be
        considered correct
        (Default value = 50)

    Returns
    -------
    raw_pitch : float
        Raw pitch accuracy, the fraction of voiced frames in ref_cent for
        which est_cent provides a correct frequency value
        (within cent_tolerance cents).
    r   rP   )r   r   r   r   r
   logical_andr   )	r   r   r   r   cent_tolerancenonzero_freqsfreq_diff_centscorrect_frequenciesrpar   r   r   raw_pitch_accuracy*  s   
)


ra   c           
      C   s   t | | t| ||| | jdks!|  dks!|jdks!|jdkr#dS t|dk|dk}t|dkr5dS t|| | }dt|d d  }t|| |k }t| | | t|  }	|	S )a  Compute the raw chroma accuracy given two pitch (frequency) sequences
    in cents and matching voicing indicator sequences. The first pitch and
    voicing arrays are treated as the reference (truth), and the second two as
    the estimate (prediction).  All 4 sequences must be of the same length.

    Examples
    --------
    >>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
    >>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
    >>> (ref_v, ref_c,
    ...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
    ...                                                  ref_freq,
    ...                                                  est_time,
    ...                                                  est_freq)
    >>> raw_chroma = mir_eval.melody.raw_chroma_accuracy(ref_v, ref_c,
    ...                                                  est_v, est_c)

    Parameters
    ----------
    ref_voicing : np.ndarray
        Reference voicing array. When this array is non-binary, it is treated
        as a 'reference reward', as in (Bittner & Bosch, 2019)
    ref_cent : np.ndarray
        Reference pitch sequence in cents
    est_voicing : np.ndarray
        Estimated voicing array
    est_cent : np.ndarray
        Estimated pitch sequence in cents
    cent_tolerance : float
        Maximum absolute deviation in cents for a frequency value to be
        considered correct
        (Default value = 50)

    Returns
    -------
    raw_chroma : float
        Raw chroma accuracy, the fraction of voiced frames in ref_cent for
        which est_cent provides a correct frequency value (within
        cent_tolerance cents), ignoring octave errors

    r   rP   r   i  g      ?)r   r   r   r   r
   r[   r   r*   )
r   r   r   r   r\   r]   r^   octavecorrect_chromarcar   r   r   raw_chroma_accuracyn  s   
,


re   c                 C   s   t | | t| ||| | jdks |jdks |jdks |jdkr"dS t|dk|dk}t|| | }||k }| dkt}tt| }	t	| dkrPd}
n
t	|t	|  }
|
t	| | ||  |  t	d| d|   |	 }|S )a  Compute the overall accuracy given two pitch (frequency) sequences
    in cents and matching voicing indicator sequences. The first pitch and
    voicing arrays are treated as the reference (truth), and the second two
    as the estimate (prediction).  All 4 sequences must be of the same length.

    Examples
    --------
    >>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
    >>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
    >>> (ref_v, ref_c,
    ...  est_v, est_c) = mir_eval.melody.to_cent_voicing(ref_time,
    ...                                                  ref_freq,
    ...                                                  est_time,
    ...                                                  est_freq)
    >>> overall_accuracy = mir_eval.melody.overall_accuracy(ref_v, ref_c,
    ...                                                     est_v, est_c)

    Parameters
    ----------
    ref_voicing : np.ndarray
        Reference voicing array. When this array is non-binary, it is treated
        as a 'reference reward', as in (Bittner & Bosch, 2019)
    ref_cent : np.ndarray
        Reference pitch sequence in cents
    est_voicing : np.ndarray
        Estimated voicing array
    est_cent : np.ndarray
        Estimated pitch sequence in cents
    cent_tolerance : float
        Maximum absolute deviation in cents for a frequency value to be
        considered correct
        (Default value = 50)

    Returns
    -------
    overall_accuracy : float
        Overall accuracy, the total fraction of correctly estimated frames:
        voiced reference frames estimated as voiced with a correct frequency
        value (within cent_tolerance cents), plus unvoiced reference frames
        estimated as unvoiced.

    r   rP   g      ?)
r   r   r   r
   r[   r   r"   r#   lenr   )r   r   r   r   r\   r]   r^   r_   
ref_binaryn_framesratioaccuracyr   r   r   overall_accuracy  s:   
)



	rk   c                 K   s   t jt| |||||fi |\}}}}	t }
t jt||fi ||
d< t jt||fi ||
d< t jt||||	fi ||
d< t jt||||	fi ||
d< t jt	||||	fi ||
d< |
S )a  Evaluate two melody (predominant f0) transcriptions, where the first is
    treated as the reference (ground truth) and the second as the estimate to
    be evaluated (prediction).

    Examples
    --------
    >>> ref_time, ref_freq = mir_eval.io.load_time_series('ref.txt')
    >>> est_time, est_freq = mir_eval.io.load_time_series('est.txt')
    >>> scores = mir_eval.melody.evaluate(ref_time, ref_freq,
    ...                                   est_time, est_freq)

    Parameters
    ----------
    ref_time : np.ndarray
        Time of each reference frequency value

    ref_freq : np.ndarray
        Array of reference frequency values

    est_time : np.ndarray
        Time of each estimated frequency value

    est_freq : np.ndarray
        Array of estimated frequency values

    est_voicing : np.ndarray
        Estimated voicing confidence.
        Default None, which means the voicing is inferred from est_freq:

          - frames with frequency <= 0.0 are considered "unvoiced"
          - frames with frequency > 0.0 are considered "voiced"

    ref_reward : np.ndarray
        Reference pitch estimation reward.
        Default None, which means all frames are weighted equally.

    **kwargs
        Additional keyword arguments which will be passed to the
        appropriate metric or preprocessing functions.

    Returns
    -------
    scores : dict
        Dictionary of scores, where the key is the metric name (str) and
        the value is the (float) score achieved.

    References
    ----------
    .. [#] J. Salamon, E. Gomez, D. P. W. Ellis and G. Richard, "Melody
        Extraction from Polyphonic Music Signals: Approaches, Applications
        and Challenges", IEEE Signal Processing Magazine, 31(2):118-134,
        Mar. 2014.

    .. [#] G. E. Poliner, D. P. W. Ellis, A. F. Ehmann, E. Gomez, S.
        Streich, and B. Ong. "Melody transcription from music audio:
        Approaches and evaluation", IEEE Transactions on Audio, Speech, and
        Language Processing, 15(4):1247-1256, 2007.

    .. [#] R. Bittner and J. Bosch, "Generalized Metrics for Single-F0
        Estimation Evaluation", International Society for Music Information
        Retrieval Conference (ISMIR), 2019.

    zVoicing RecallzVoicing False AlarmzRaw Pitch AccuracyzRaw Chroma AccuracyzOverall Accuracy)
r   filter_kwargsrO   collectionsOrderedDictrT   rV   ra   re   rk   )rI   rJ   rK   rL   r   rM   kwargsr   r   r   scoresr   r   r   evaluate  sL   C







rq   )r   )N)r/   )NNr   Nr/   )rZ   )NN)__doc__numpyr
   scipy.interpolater9   rm   r    r   r   r   r!   r%   r.   rG   rO   rT   rV   rY   ra   re   rk   rq   r   r   r   r   <module>   s6   D

&
e
}%%
'E

EP