THE WAXHOLM CORPUS
The Waxholm corpus was collected in 1993 - 1994 at the department of Speech, Hearing and Music (TMH), KTH. It is described in several publications. Two are included in this archive. Publication of work using the Waxholm corpus should refer to either of these. More information on the Waxholm project can be found on the web page http://www.speech.kth.se/waxholm/waxholm2.html
FILE INFORMATION
SAMPLED FILES
The .smp files contain the speech signal. The identity of the speaker is coded by the two digits after 'fp20' in the file name. The smp file format was developed by TMH. Recording information is stored in a header as a 1024 byte text string. The speech signal in the Waxholm corpus is quantised into 16 bits, 2 bytes/sample and the byte order is big-endian (most significant byte first). The sampling frequency is 16 kHz. Here is an example of a file header:
>head -9 fp2001.1.01.smp
file=samp ; file type is sampled signal
msb=first ; byte order
sftot=16000 ; sampling frequency in Hz
nchans=1 ; number of channels
preemph=no ; no signal preemphasis during recording
view=-10,10
born=/o/libhex/ad_da.h25
range=-12303,11168 ; amplitude range
=
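The layout described above (a 1024-byte text header followed by 16-bit big-endian samples) can be read with a few lines of Python. This is only a minimal sketch, not a tool shipped with the corpus: the function names `parse_smp_header` and `read_smp` are hypothetical, the samples are assumed to be signed (as the negative values in the 'range' field suggest), and the header is assumed to consist of 'key=value ; comment' lines as in the example.

```python
import struct

HEADER_BYTES = 1024  # header size stated in the text above


def parse_smp_header(raw: bytes) -> dict:
    """Collect 'key=value ; comment' lines from the header text into a dict."""
    fields = {}
    for line in raw.decode("latin-1").splitlines():
        line = line.split(";", 1)[0].strip()  # drop the trailing comment
        key, sep, value = line.partition("=")
        if sep and key.strip():  # skips padding and the bare '=' line
            fields[key.strip()] = value.strip()
    return fields


def read_smp(path):
    """Return (header fields, tuple of samples) for one .smp file."""
    with open(path, "rb") as f:
        header = parse_smp_header(f.read(HEADER_BYTES))
        data = f.read()
    n = len(data) // 2  # 2 bytes per sample
    # '>' = big-endian (most significant byte first), 'h' = signed 16-bit
    samples = struct.unpack(f">{n}h", data[: 2 * n])
    return header, samples
```

Header values are returned as strings, so the sampling frequency would be obtained with something like `int(header["sftot"])`.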
LABEL FILES
Normally, each sample file has a label file. This has been produced in four steps. The first step was to manually enter the orthographic text by listening. From this text, a sequence of phonemes was produced by a rule-based text-to-phoneme module. The endpoint time positions of the phonemes were computed by an automatic alignment program, followed by manual correction. Some of the speech files have no label file, due to various problems in this process. These files should not be used for training or testing.
The labels are stored in .mix files. Below is an example of the beginning of a mix file.
>head -20 fp2001.1.01.smp.mix
CORRECTED: OK jesper Jesper Hogberg Thu Jun 22 13:26:26 EET 1995
AUTOLABEL: tony A. de Serpa-Leitao Mon Nov 15 13:44:30 MET 1993
Waxholm dialog. /u/wax/data/scenes/fp2001/fp2001.1.01.smp
TEXT:
jag vill }ka h{rifr}n .
J'A:+ V'IL+ "]:K'A H'[3RIFR]N.
CT 1
Labels: J'A: V'IL "]:KkA H'[3RIFR]N .
FR 11219 #J >pm #J >w jag 0.701 sec
FR 12565 $'A: >pm $'A:+ 0.785 sec
FR 13189 #V >pm #V >w vill 0.824 sec
FR 13895 $'I >pm $'I 0.868 sec
FR 14700 $L >pm $L+ 0.919 sec
The orthographic text follows the label 'TEXT:'. CT is the frame length in number of sample points (always 1 in Waxholm mix files). Each line starting with 'FR' contains up to three labels, at the phonetic, phonemic and word levels. FR is immediately followed by the frame number of the start of the segment; since CT = 1, this is the sample index in the file. If a segment has a duration of 0, its label was generated by the text-to-phoneme or the automatic alignment module, but the manual labeller judged it to be a non-pronounced segment and deleted it.
Column 3 in an FR line is the phonetic label. An initial '#' indicates word-initial position; '$' indicates any other position. The optional label '>pm' precedes the phonemic label, which has been generated by the text-to-phoneme rules. Often, the phonemic and the phonetic labels are identical. The optional '>w' is followed by the identity of the word beginning at this frame.
The phoneme symbol inventory is mainly STA, used by the KTH/TMH RULSYS system. It is specified in the included file 'sampa_latex_se.pdf'.
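As an illustration of the FR line layout, the following sketch splits one line into its fields and derives segment durations. It handles only whitespace-separated lines of the form shown in the example above; the names `Segment`, `parse_fr_line` and `durations` are hypothetical, and taking a segment's duration as the distance to the next FR frame is an assumption about how zero-length segments appear in the files.

```python
from typing import List, NamedTuple, Optional

SAMPLE_RATE = 16000  # Hz, from the .smp header; CT = 1, so frame == sample index


class Segment(NamedTuple):
    frame: int               # start of the segment, in samples
    phonetic: Optional[str]  # column 3, with its '#' or '$' prefix
    phonemic: Optional[str]  # label following '>pm', if present
    word: Optional[str]      # word following '>w', if present


def parse_fr_line(line: str) -> Segment:
    tokens = line.split()
    if len(tokens) < 2 or tokens[0] != "FR":
        raise ValueError(f"not an FR line: {line!r}")
    frame = int(tokens[1])
    rest = tokens[2:]
    if len(rest) >= 2 and rest[-1] == "sec":
        rest = rest[:-2]  # drop the trailing '<time> sec' pair
    phonetic = phonemic = word = None
    i = 0
    while i < len(rest):
        if rest[i] == ">pm" and i + 1 < len(rest):
            phonemic = rest[i + 1]
            i += 2
        elif rest[i] == ">w" and i + 1 < len(rest):
            word = rest[i + 1]
            i += 2
        else:
            if phonetic is None:
                phonetic = rest[i]
            i += 1
    return Segment(frame, phonetic, phonemic, word)


def durations(segments: List[Segment]) -> List[int]:
    """Duration in samples of every segment except the last one."""
    return [b.frame - a.frame for a, b in zip(segments, segments[1:])]
```

For instance, `parse_fr_line("FR 11219 #J >pm #J >w jag 0.701 sec").frame / SAMPLE_RATE` gives the start time in seconds, and a value of 0 in the list returned by `durations` would mark a segment deleted by the manual labeller under the assumption stated above.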
Some extra labels at the phonetic level have been defined. The most common ones are:
Label | Meaning
sm | lip or tongue opening
p: | silent interval
pa | aspirative sound from breathing
kl | click sound
v | short vocalic segment between consonants
upper case of stops | occlusion
lower case of stops | burst
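The extra labels listed above can be turned into a small lookup, sketched here. The stop inventory in `STOPS` is an assumption made for illustration (the corpus documentation itself only shows K/k, 2D/2d and 2T/2t); the authoritative symbol list is the one in 'sampa_latex_se.pdf'. The name `describe_label` is hypothetical.

```python
from typing import Optional

# Extra phonetic-level labels, copied from the list above.
EXTRA_LABELS = {
    "sm": "lip or tongue opening",
    "p:": "silent interval",
    "pa": "aspirative sound from breathing",
    "kl": "click sound",
    "v": "short vocalic segment between consonants",
}

# ASSUMPTION: stop symbols, written in their upper-case (occlusion) form.
STOPS = {"P", "T", "K", "B", "D", "G", "2T", "2D"}


def describe_label(label: str) -> Optional[str]:
    """Describe an extra label or stop phase; None for ordinary phones."""
    core = label.lstrip("#$")  # remove the word-position prefix
    if core in EXTRA_LABELS:
        return EXTRA_LABELS[core]
    if core in STOPS:
        return "occlusion"
    if core == core.lower() and core.upper() in STOPS:
        return "burst"
    return None
```

With this sketch, the 'K' and 'k' in the example 'Labels:' line are reported as occlusion and burst respectively, while ordinary phone labels such as '#V' return None.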
The line beginning with 'Labels:' before the FR lines is a text string assembled from the FR labels.
The mix files in this archive correspond to those with the name extension .mix.new in the original corpus. Besides a few other corrections, the main difference is that burst segments after retroflex stops were not labelled as retroflex in the original .mix files (d and t after 2D and 2T have been changed to 2d and 2t).
REFERENCES
Bertenstam, J., Blomberg, M., Carlson, R., Elenius, K., Granström, B., Gustafson, J., Hunnicutt, S., Högberg, J., Lindell, R., Neovius, L., Nord, L., de Serpa-Leitao, A., & Ström, N. (1995). Spoken dialogue data collected in the WAXHOLM project. STL-QPSR 1/1995, KTH/TMH, Stockholm.
Bertenstam, J., Blomberg, M., Carlson, R., Elenius, K., Granström, B., Gustafson, J., Hunnicutt, S., Högberg, J., Lindell, R., Neovius, L., de Serpa-Leitao, A., Nord, L., & Ström, N. (1995). The Waxholm application data-base. In Pardo, J.M. (Ed.), Proceedings Eurospeech 1995 (pp. 833-836). Madrid.
Comments and error reports are welcome. These should be sent to Mats Blomberg or Kjell Elenius.