o
    2wi                     @   s   d Z ddlZddlZddlmZ ddlmZ ddlmZm	Z	m
Z
mZ ddlmZ ddlmZmZmZ ddlmZmZ d	Zd
edfdede	e de	e defddZ		ddede	e de
e deeeeeeef f f fddZdS )uw  
This script creates the BUT Reverb DB dataset, which is available at:
https://speech.fit.vutbr.cz/software/but-speech-fit-reverb-database

The following description is taken from the official website:

This is the first release of the BUT Speech@FIT Reverb Database. The database is being built
to collect a large number of varied Room Impulse Responses (RIRs),
room environmental noises (or "silences"), retransmitted speech (for ASR and SID testing),
and meta-data (positions of microphones, speakers, etc.).

The goal is to provide the speech community with a dataset for data enhancement and distant
microphone or microphone array experiments in ASR and SID.

The BUT Speech@FIT Reverb Dataset consists of 9 rooms:

Room  Size [m x m x m]  Volume [m^3]  # RIRs   Ret.  Type          In RIR-Only set  In LibriSpeech-Only set
Q301  10.7x6.9x2.6      192           31 x 3   1     Office        Yes              Yes
L207  4.6x6.9x3.1       98            31 x 6   3     Office        Yes              Yes
L212  7.5x4.6x3.1       107           31 x 5   2     Office        Yes              Yes
L227  6.2x2.6x14.2      229           31 x 5   3     Stairs        Yes              Yes
R112  4.4x2.8x2.6*      ~40           31 x 5   0     Hotel room    Yes              No
CR2   28.2x11.1x3.3     1033          31 x 4   0     Conf. room    Yes              No
E112  11.5x20.1x4.8*    ~900          31 x 2   0     Lect. room    Yes              No
D105  17.2x22.8x6.9*    ~2000         31 x 6   1     Lect. room    Yes              Yes
C236  7.0x4.1x3.6       102           31 x 10  0     Meeting room  Yes              No

We placed 31 microphones in each room. The source (a Hi-Fi loudspeaker) was placed at 5
positions on average. We measured RIRs (using the exponential sine sweep method) for each
speaker position. Next, we recorded environmental noise (silence). In the office, a radio was
playing in the background at one speaker position.
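
As an illustration of the exponential sine sweep mentioned above, here is a minimal sketch of
how such an excitation signal can be generated. The function name and all parameter values are
assumptions chosen for the example; they are not the settings used to record the database.

```python
import math


def exponential_sine_sweep(f_start, f_end, duration, sample_rate):
    """Return samples of a sine sweep whose frequency grows exponentially.

    The instantaneous frequency rises from f_start (Hz) at t = 0
    to f_end (Hz) at t = duration (seconds).
    """
    # Time constant of the exponential frequency growth.
    growth = duration / math.log(f_end / f_start)
    num_samples = int(duration * sample_rate)
    samples = []
    for i in range(num_samples):
        t = i / sample_rate
        phase = 2.0 * math.pi * f_start * growth * (math.exp(t / growth) - 1.0)
        samples.append(math.sin(phase))
    return samples


# Hypothetical settings: a 0.5 s sweep from 50 Hz to 2 kHz at 8 kHz sampling.
sweep = exponential_sine_sweep(50.0, 2000.0, 0.5, 8000)
```

A RIR is then typically obtained by deconvolving the recorded response with the sweep; that
step is not shown here.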

All microphone positions are measured and stored in meta-files. We pre-calculated the positions
of microphones and speakers in Cartesian and polar coordinates, both absolute and relative to the speaker.
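
To make the relation between the coordinate systems concrete, the sketch below converts an
absolute Cartesian microphone position into polar coordinates relative to the speaker. The
function name, the axis orientation, and the angle conventions (azimuth in the x-y plane,
elevation above it, both in degrees) are assumptions for illustration and may differ from the
conventions used in the meta-files.

```python
import math


def relative_polar(mic_xyz, spk_xyz):
    """Return (distance, azimuth_deg, elevation_deg) of the mic as seen from the speaker."""
    # Offset of the microphone from the speaker in Cartesian coordinates.
    dx, dy, dz = (m - s for m, s in zip(mic_xyz, spk_xyz))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Azimuth: angle in the horizontal plane, measured from the x axis.
    azimuth = math.degrees(math.atan2(dy, dx))
    # Elevation: angle above the horizontal plane (0 if both positions coincide).
    elevation = math.degrees(math.asin(dz / distance)) if distance > 0 else 0.0
    return distance, azimuth, elevation


# Hypothetical positions in metres: speaker at the origin, microphone on the diagonal.
distance, azimuth, elevation = relative_polar((1.0, 1.0, 0.0), (0.0, 0.0, 0.0))
```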

The corpus can be cited as follows:
@ARTICLE{8717722,
  author={Szöke, Igor and Skácel, Miroslav and Mošner, Ladislav and Paliesek, Jakub and Černocký, Jan},
  journal={IEEE Journal of Selected Topics in Signal Processing},
  title={Building and evaluation of a real room impulse response dataset},
  year={2019},
  volume={13},
  number={4},
  pages={863-876},
  doi={10.1109/JSTSP.2019.2917582}}
    N)defaultdict)Path)DictOptionalSequenceUnion)tqdm)CutSet	RecordingRecordingSet)Pathlikeresumable_downloadzGhttp://merlin.fit.vutbr.cz/ReverbDB/BUT_ReverbDB_rel_19_06_RIR-Only.tgz.F
target_dirurlforce_downloadreturnc                 C   s   t | } | jddd d}| | }| r |s td| d t|||d | d }| sTtd| d	 t|}|j| d
 W d   |S 1 sOw   Y  |S )a'  
    Download and untar the BUT Reverb DB dataset.

    :param target_dir: Pathlike, the path of the dir to store the dataset.
    :param url: str, the URL of the archive, which is saved locally as BUT_ReverbDB.tgz.
    :param force_download: bool, if True, download the archive even if it already exists.
    Tparentsexist_okzBUT_ReverbDB.tgzz	Skipping z because file exists.)r   BUT_ReverbDBz
Untarring r   )pathN)	r   mkdirexistslogginginfor   tarfileopen
extractall)r   r   r   tgz_nametgz_pathtgz_dirtar r#   Y/home/ubuntu/sommelier/.venv/lib/python3.10/site-packages/lhotse/recipes/but_reverb_db.pydownload_but_reverb_db?   s    
r%   silencerir
corpus_dir
output_dirpartsc              
   C   s^  t | } |  sJ d|  |stdt|tr|g}tt}t| dD ]M}|j	j
 }||vr5q(|j	j	j	j	j	j}|j	j	j	j	j}|j	j	j	j}|j	j	j}	|jdd }
| d| d| d|	 d|
 	}tj||d}|| | q(tt}|D ]}t|| || d	< q||d
urt |}|jddd |D ]}|| d	 |d| d  q|S )z
    Prepare the BUT Speech@FIT Reverb Database corpus.

    :param corpus_dir: Pathlike, the path of the dir where the dataset is stored.
    :param output_dir: Pathlike, the path of the dir to write the manifests.
    zNo such directory: z,No parts specified for manifest preparation.z*.wavr   -z-v)recording_id
recordingsNTr   zbut-reverb-db_z_recordings.jsonl.gz)r   is_dir
ValueError
isinstancestrr   listr   rglobparentnamelowerstemsplitr
   	from_fileappenddictr   from_recordingsr   to_file)r)   r*   r+   r/   wav_filepartroom_idmic_idspk_iduidversionr.   	recording	manifestsr#   r#   r$   prepare_but_reverb_db[   s<   

 rI   )Nr&   )__doc__r   r   collectionsr   pathlibr   typingr   r   r   r   r   lhotser	   r
   r   lhotse.utilsr   r   BUT_REVERB_DB_URLr3   boolr%   rI   r#   r#   r#   r$   <module>   sD    .
