"""Vietnamese cleaner.

Created on 10:19 AM, 10/17/19
@author: ngunhuconchocon
@brief: Workers of the world, unite! Long live our Soviet motherland
        Vietnamese cleaner. This is a naive implementation which only separates punctuation
        and handles some abbreviations. See `regex_tokenize.py` for more details.
        This must be updated later for a "cleaner" cleaner
"""
from regex_tokenize import tokenize
from vietnameseNormUniStd import UniStd


def vietnamese_cleaner(text):
    """Perform Vietnamese cleaning.

    Handle the old Vietnamese style of placing tone marks (òa or oà, úy or uý, ...).
    This directly benefits the result if you train the model on letters.
    For phoneme training, this cleaner simplifies preparation of the
    (syllable->phonemes) dictionary.

    Many thanks to Thang Tat Vu and Thanh-Le Ha.

    """
    # Normalize tone placement first, then tokenize (call order inferred from the docstring).
    text = UniStd(text)
    return tokenize(text, format="text")
