Schott’s General American dictionary 0.2

May 1st, 2012

About two years ago, I published Ralf’s General American dictionary version 0.1.1. I decided to develop the next version 0.2 of this dictionary from scratch. The dictionary gets a new name: Schott's General American dictionary instead of Ralf's General American dictionary. This article explains the creation of the dictionary:

1. Get an American English spelling dictionary with 390,000 words.

2. License is GPL version 2.
3. Encoding of the files en_US.dic and en_US.aff is UTF-8.
4. Linux Mint terminal:

cd /home/ubuntu/Documents/american-english
unmunch en_US.dic en_US.aff > american-wordlist

5. Add speak tags at the beginning and the end of american-wordlist.
6. Linux Mint terminal:

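saxonb-xslt -ext:on -s:american-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-audio-elements.xsl' -o:american-speak-audio
espeak -f american-speak-audio -m -v en-us -q -x --phonout="american-espeak"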

7. Add <lexicon> tags to the file american-espeak (<lexicon> at the beginning of the file; </lexicon> at the end of the file).
8. Linux Mint terminal:

saxonb-xslt -ext:on -s:american-espeak -xsl:'http://spirit.blau.in/simon/files/2010/04/replace-newline-newline-space-by-phoneme-element.xsl' -o:american-phoneme-elements
mkdir espeak
paste american-speak-audio american-phoneme-elements > espeak/general-american-dictionary.xml

9. Download the dictionary (eSpeak edition).

10. I am planning to release an IPA version of this dictionary.
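
Steps 5 and 7 both wrap a file in a single root element by hand. Here is a minimal Python sketch of that step. The helper name wrap_in_root and the sample words are made up for illustration, and it assumes the word list contains no characters that would need XML escaping. Whether the stylesheets depend on the exact line layout is not checked here.

```python
def wrap_in_root(text: str, tag: str) -> str:
    """Return text enclosed in <tag> ... </tag>, each tag on its own line."""
    # Make sure the body ends with a newline so the closing tag starts a new line.
    body = text if text.endswith("\n") else text + "\n"
    return f"<{tag}>\n{body}</{tag}>\n"


# Made-up sample standing in for the first lines of american-wordlist.
sample = "abandon\nabandoned\n"
wrapped = wrap_in_root(sample, "speak")
print(wrapped, end="")
```

The same helper covers step 7 when it is called with "lexicon" instead of "speak".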

Ralf’s Canadian English dictionary 0.1

April 28th, 2012

This article explains the creation of Ralf’s Canadian English dictionary version 0.1.

1. Get a Canadian spelling dictionary with 390,000 words.
2. License is GPL.
3. Encoding of the files en_CA.dic and en_CA.aff is UTF-8.
4. Linux Mint terminal:

cd /home/ubuntu/Documents/canadian-english
unmunch en_CA.dic en_CA.aff > canadian-wordlist

5. Add speak tags at the beginning and at the end of canadian-wordlist.
6. Linux Mint terminal:

saxonb-xslt -ext:on -s:canadian-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-audio-elements.xsl' -o:canadian-speak-audio
espeak -f canadian-speak-audio -m -v en -q -x --phonout="canadian-espeak"

7. Add <lexicon> tags to the file canadian-espeak (<lexicon> at the beginning of the file; </lexicon> at the end of the file).

8. Create phoneme elements:

saxonb-xslt -ext:on -s:canadian-espeak -xsl:'http://spirit.blau.in/simon/files/2010/04/replace-newline-newline-space-by-phoneme-element.xsl' -o:canadian-phoneme-elements

9. Combine grapheme and phoneme elements:

mkdir espeak
paste canadian-speak-audio canadian-phoneme-elements > espeak/canadian-english-dictionary.xml

10. Download the dictionary (eSpeak edition).

I am planning to create an IPA version of this dictionary.
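
The paste command in the combining step joins the two files line by line with a tab between them. A rough Python equivalent is sketched below to show what that step does. The function name paste_lines and the sample lines are placeholders, not the real content of canadian-speak-audio or canadian-phoneme-elements.

```python
from itertools import zip_longest


def paste_lines(left: str, right: str) -> str:
    """Join corresponding lines of two texts with a tab, like paste(1)."""
    # zip_longest pads the shorter input with empty strings, as paste does.
    rows = zip_longest(left.splitlines(), right.splitlines(), fillvalue="")
    return "".join(f"{a}\t{b}\n" for a, b in rows)


# Placeholder lines; the real files hold one element per line.
left = "left line 1\nleft line 2\n"
right = "right line 1\nright line 2\n"
combined = paste_lines(left, right)
print(combined, end="")
```

The result is only useful if both files have the same number of lines in the same order, which is why the word list and the phoneme list must come from the same run.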

Ralf’s British English dictionary 0.1

April 28th, 2012

This article explains the creation of a British English pronunciation dictionary.

1. Download the 390,000-word version of the dictionary.

2. License is GPL.
3. Linux Mint terminal:

cd /home/ubuntu/Documents/british

4. Now I install Geany because I want to check the encoding of the files en_GB.aff and en_GB.dic: sudo apt-get install geany
The encoding of both files is UTF-8.

5. Linux Mint terminal:

sudo apt-get install hunspell-tools
cd /home/ubuntu/Documents/british
unmunch en_GB.dic en_GB.aff > british-wordlist
sudo apt-get install espeak

I need to know which voice I should use.
Linux Mint terminal:

espeak --voices

I will use en-uk. What is the proper command? When I generated the US English phonemes, the command was:

espeak -f english-grapheme -m -v en-us -q -x --phonout="english-espeak"

I will have to mark up the dictionary file with speak and audio tags.

6. Now I install saxonb-xslt with the following command:

sudo apt-get install libsaxonb-java

7. Add speak tags at the beginning and at the end of the file british-wordlist.
8. Linux Mint terminal:

saxonb-xslt -ext:on -s:british-wordlist -xsl:'http://spirit.blau.in/simon/files/2010/04/create-audio-elements.xsl' -o:british-speak-audio
espeak -f british-speak-audio -m -v en-uk -q -x --phonout="british-espeak"

9. Add <lexicon> tags to the file british-espeak (<lexicon> at the beginning of the file; </lexicon> at the end of the file).

10. Create phoneme elements (compare with this article):

saxonb-xslt -ext:on -s:british-espeak -xsl:'http://spirit.blau.in/simon/files/2010/04/replace-newline-newline-space-by-phoneme-element.xsl' -o:british-phoneme-elements

11. Combine grapheme elements with phoneme elements:

paste british-speak-audio british-phoneme-elements > british-dictionary-espeak.xml

12. Download the dictionary (eSpeak edition).

I am planning to create an IPA version of this PLS dictionary.
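
For orientation, here is a small Python sketch of the lexicon / lexeme / grapheme / phoneme nesting that a PLS dictionary uses. It is not the pipeline from the steps above. The function name build_lexicon is made up, the phoneme strings are illustrative and not taken from the dictionary, and the namespace and version attributes that a complete PLS file carries are left out.

```python
import xml.etree.ElementTree as ET


def build_lexicon(entries):
    """Build <lexicon><lexeme><grapheme/><phoneme/></lexeme>...</lexicon>."""
    lexicon = ET.Element("lexicon")
    for word, phonemes in entries:
        lexeme = ET.SubElement(lexicon, "lexeme")
        ET.SubElement(lexeme, "grapheme").text = word
        ET.SubElement(lexeme, "phoneme").text = phonemes
    return lexicon


# Illustrative entries only; the phoneme strings are assumptions.
entries = [("hello", "h@l'oU"), ("world", "w'3:ld")]
lexicon = build_lexicon(entries)
xml_text = ET.tostring(lexicon, encoding="unicode")
print(xml_text)
```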

Compare graphemes of two dictionaries

December 5th, 2011

Here is the concept of a script that can compare two dictionaries with each other. The first dictionary uses grapheme elements which are in upper case letters, the second dictionary distinguishes between upper case and lower case. If there is a corresponding entry in the second dictionary, the entry of the first dictionary will be set to lower case in the resulting output tree. Here is the script:

<?php
// Compare the grapheme elements of two dictionaries

if (file_exists('general-american-dictionary.xml')) {
    $xml = simplexml_load_file('general-american-dictionary.xml');
    $english = simplexml_load_file('english-dictionary.xml');
    if ($english === false) {
        exit('Failed to open english-dictionary.xml.');
    }

    foreach ($xml->lexeme as $lexeme) {
        // Cast to string so that plain strings are compared, not XML nodes
        $grapheme = (string) $lexeme->grapheme;
        foreach ($english->lexeme as $lexemeenglish) {
            $graphemeenglish = (string) $lexemeenglish->grapheme;
            // Take the spelling of the second dictionary if the upper case forms match
            if ($grapheme == strtoupper($graphemeenglish)) {
                $grapheme = $graphemeenglish;
                break; // first match wins; no need to scan the rest
            }
        }
        echo $grapheme, '    ', $lexeme->phoneme, PHP_EOL;
    }
} else {
    exit('Failed to open general-american-dictionary.xml.');
}
?>

Of course, this script is not yet finished. It runs very slowly because it scans the whole second dictionary for every entry of the first one, but never mind.
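
The same idea can be sketched in Python with a lookup table, so that the second dictionary is read only once. This is a sketch under assumptions, not a port of the script: the tiny XML samples and the function name restore_case are made up, and keeping the first matching spelling is an assumption about the intended behaviour.

```python
import xml.etree.ElementTree as ET

# Made-up miniature dictionaries; the phoneme values are placeholders.
UPPER_XML = """<lexicon>
<lexeme><grapheme>HELLO</grapheme><phoneme>P1</phoneme></lexeme>
<lexeme><grapheme>ZZZ</grapheme><phoneme>P2</phoneme></lexeme>
</lexicon>"""

MIXED_XML = """<lexicon>
<lexeme><grapheme>hello</grapheme><phoneme>Q1</phoneme></lexeme>
</lexicon>"""


def restore_case(upper_root, mixed_root):
    """Return (grapheme, phoneme) rows with the mixed-case spelling where known."""
    # Index the mixed-case spellings once by their upper-case form.
    by_upper = {}
    for lexeme in mixed_root.iter("lexeme"):
        spelling = lexeme.findtext("grapheme", default="")
        by_upper.setdefault(spelling.upper(), spelling)  # first spelling wins

    rows = []
    for lexeme in upper_root.iter("lexeme"):
        grapheme = lexeme.findtext("grapheme", default="")
        phoneme = lexeme.findtext("phoneme", default="")
        # Fall back to the upper-case spelling when there is no match.
        rows.append((by_upper.get(grapheme, grapheme), phoneme))
    return rows


rows = restore_case(ET.fromstring(UPPER_XML), ET.fromstring(MIXED_XML))
for grapheme, phoneme in rows:
    print(grapheme, phoneme, sep="    ")
```

With a table like this, the work grows with the sum of the two dictionary sizes instead of their product.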

What are amplitude, frequency, wavelength?

August 30th, 2009

I think that 16 kHz / 16 bit recordings should be sufficient for the development of a speech model. But what does that mean? A good article explains the difference between amplitude (16 bit recordings are more precise than 8 bit recordings) and frequency or wavelength. The human ear can hear up to about 20 kHz, and the sampling rate has to be twice the highest frequency you want to record. A 16 kHz recording therefore captures frequencies up to 8 kHz, which should be sufficient for speech.
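
The arithmetic behind these numbers is short enough to write down. The following Python lines are only a sketch of it; the function names are made up.

```python
def highest_frequency_hz(sample_rate_hz: int) -> float:
    """Highest frequency a recording can represent: half the sampling rate."""
    return sample_rate_hz / 2


def amplitude_levels(bits: int) -> int:
    """Number of distinct amplitude values at a given bit depth."""
    return 2 ** bits


print(highest_frequency_hz(16000))  # 8000.0
print(amplitude_levels(16))         # 65536
print(amplitude_levels(8))          # 256
```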

Replacing HTK by Sphinx?

August 28th, 2009

You need to install HTK if you want to run simon with its full functionality. HTK is not included; it has to be downloaded from a different source (registration required). From my point of view, the following information could point people who are familiar with Sphinx and Qt in the right direction:

“simon uses the NON-FREE HTK for that. Only _one_ class in simon comes into contact with the HTK. The model compilation manager. This class: http://speech2text.svn.sourceforge.net/viewvc/speech2text/trunk/simonlib…. Those 1200 lines (including other, julius related stuff) are everything that links simon to the HTK. The class could very, very easily be replaced with one that uses something else.”

simon should continue to make use of HTK because there are things that you never should do:

“They did it by making the single worst strategic mistake that any software company can make:

They decided to rewrite the code from scratch.”

Well, but maybe there is someone out there who would want to start a fork of simon, and replace HTK by Sphinx? Of course, this would be a completely different project.

I think that Sphinx could use a GUI.

Sound frames – a, e, i, o, u, b(e)

July 18th, 2009

Let’s take a look at a few sound frames:

sound-a

U+0061 (a), U+02D0
The sounds in this article correspond to the German pronunciation.

sound-e
U+0065 (e), U+02D0

sound-i
U+0069 (i), U+02D0

sound-o
U+006F (o), U+02D0

sound-u
U+0075 (u), U+02D0

sound-be
First part: U+0062 (b) – second part: U+0065 (e), U+02D0
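
The code points listed with the pictures can be turned into the actual symbols with a few lines of Python. This is only a small illustration; the variable names are made up.

```python
import unicodedata

LONG_MARK = "\u02d0"  # U+02D0, the length mark that follows each vowel

# Code points of a, e, i, o, u as listed above.
vowel_code_points = [0x61, 0x65, 0x69, 0x6F, 0x75]
symbols = [chr(cp) + LONG_MARK for cp in vowel_code_points]
print(" ".join(symbols))

# The last picture: b followed by a long e.
be = chr(0x62) + chr(0x65) + LONG_MARK
print(be, unicodedata.name(LONG_MARK))
```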

Julius package for Ubuntu

June 18th, 2009

Soon, there should be an updated Julius package for Ubuntu (4.0.2 -> 4.1.2).

Characteristics of the sound “a”

May 20th, 2009

Let’s take a look at the characteristics of the sound “a” (spoken as in father). Here is a screenshot of Audacity which shows the repetitive pattern of the sound “a”:

waveform-a

I have marked the different waves with numbers 1, 2, 3, 4, 5. The waves with the same number are slightly different one from another, but they are similar. It is a repetitive pattern. Let’s extract a complete frame of the sound “a”:

waveform-a-small

The above picture shows the first frame. Let’s compare the first frame with the second frame:

waveform-a-small2

Take a look at the area marked in yellow, and compare it with the corresponding area of the previous picture. It is slightly different.

This was a short introduction to signal processing. These sound waves can be analysed by software like HTK.
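
The length of one repeating frame can also be estimated by a program. The Python sketch below uses a synthetic signal (a fundamental plus one overtone) instead of the recording from the screenshots, and a plain autocorrelation search, which is one common way to find the period and not necessarily what HTK does. The sampling rate, the pitch of 200 Hz and all names are assumptions.

```python
import math

SAMPLE_RATE = 16000  # samples per second (assumed)
FUNDAMENTAL = 200    # Hz, made-up pitch of the synthetic sound


def synthetic_vowel(n_samples):
    """A fundamental plus one overtone: a simple periodic test signal."""
    out = []
    for n in range(n_samples):
        t = n / SAMPLE_RATE
        out.append(math.sin(2 * math.pi * FUNDAMENTAL * t)
                   + 0.5 * math.sin(2 * math.pi * 2 * FUNDAMENTAL * t))
    return out


def frame_length(signal, min_lag, max_lag):
    """Return the shift (in samples) at which the signal best matches itself."""
    def autocorr(lag):
        return sum(signal[n] * signal[n + lag] for n in range(len(signal) - lag))
    return max(range(min_lag, max_lag + 1), key=autocorr)


signal = synthetic_vowel(1600)          # 0.1 seconds of sound
period = frame_length(signal, 20, 200)  # search shifts of 20 to 200 samples
print(period, "samples per frame,", SAMPLE_RATE / period, "Hz")
```

For a real recording the frames are only similar, not identical, so the peak is less sharp than with this clean test signal.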

Installing simon-juliusd-0.1-alpha2.exe

June 22nd, 2008

I just downloaded the apparently recently released program simon-juliusd-0.1-alpha2.exe. Before running the program, I checked it with ClamWin (I always do that before I install new software). It is OK, so I will install this program on my computer. The program is licensed under the GPL. On my computer, the program simon.exe was installed in the location “H:\Program Files\simon\simon-0.1-alpha-2”.

Here is a screenshot:

Simon configuration

But now, it is beginning to get complicated. Take a look at the next screenshot:

Simon checklist

So to use this program successfully, several additional programs are needed. I need the HTK toolkit and Julius, and further components are necessary. I think I will stop the installation now. Or should I continue? At the moment, I am not sure. I think that I will hit the next button.

I won’t publish a screenshot of the next step. It is about the HTK programs HDMan, HCopy, and several others. I think (but I am not sure) that it is necessary to tell Simon the paths where those programs are installed. A few months ago, I took some first steps with HTK and Julius, but everything was pretty complicated. At the moment, I am reading a few pages of the HTK book, and everything is very abstract. It takes a lot of time to get into it. But it is possible! You just have to stay focused.

VoxForge dictionary isn’t encoded in UTF-8

June 21st, 2008

I just downloaded the VoxForge dictionary (2.6 MB), and opened it with Notepad++. Apparently, it is encoded in ANSI, not in UTF-8. That’s OK because it contains only standard characters. I am guessing that this dictionary is compatible with ASCII. But I would suggest that future versions should be published in UTF-8.
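
The guess about ASCII can be checked with a few lines of Python: if every byte of a file is below 128, the file reads the same as ASCII, as the Windows code page 1252 that Notepad++ usually labels ANSI, and as UTF-8. The sketch below uses made-up sample bytes instead of the real dictionary file, and the function name is an assumption.

```python
def is_pure_ascii(data: bytes) -> bool:
    """True if every byte is in the 7-bit ASCII range."""
    return all(byte < 0x80 for byte in data)


# Made-up samples; to test the real file, read it in binary mode instead.
ascii_sample = b"HELLO WORLD\n"
latin_sample = "Stra\u00dfe\n".encode("cp1252")  # contains one non-ASCII byte

print(is_pure_ascii(ascii_sample), is_pure_ascii(latin_sample))

# Pure ASCII bytes decode to the same text under both encodings.
same_text = ascii_sample.decode("utf-8") == ascii_sample.decode("cp1252")
print(same_text)
```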

Switching from Arpabet to IPA

June 21st, 2008

Obviously, the CMU pronouncing dictionary is using the Arpabet. The Arpabet has the advantage that it is possible

“to represent phonemes with ASCII characters.”

But today, the UTF-8 standard is becoming more and more common. In my opinion, there should be a discussion about switching from Arpabet/ASCII to IPA/UTF-8. The IPA is easier to read than the Arpabet. And UTF-8 should be backwards compatible with ASCII (at least, as far as I know).
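
To give an impression of what such a switch involves, here is a Python sketch that converts a few Arpabet symbols to IPA. The table covers only a handful of symbols, the stress digits are simply dropped, and the function name is made up, so this is an illustration and not a complete converter.

```python
# Small subset of the Arpabet; a real converter needs the full table.
ARPABET_TO_IPA = {
    "AE": "\u00e6",   # æ
    "IH": "\u026a",   # ɪ
    "IY": "i",
    "CH": "t\u0283",  # tʃ
    "NG": "\u014b",   # ŋ
    "K": "k",
    "S": "s",
    "T": "t",
    "Z": "z",
}


def arpabet_to_ipa(pronunciation: str) -> str:
    """Convert a space-separated Arpabet string, ignoring stress digits."""
    result = []
    for symbol in pronunciation.split():
        base = symbol.rstrip("012")  # drop the stress marker, if any
        result.append(ARPABET_TO_IPA[base])  # KeyError for unknown symbols
    return "".join(result)


print(arpabet_to_ipa("K AE1 T"))
print(arpabet_to_ipa("S IH1 NG"))
```

A full conversion would also have to turn the stress digits into IPA stress marks instead of dropping them.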

RFC 4267: VoiceXML, PLS, SSML, SRGS, CCXML

June 19th, 2008

Recently, I read the document RFC 4267. In my opinion, this framework is something very interesting.

learning sphinx automatic speech recognition

March 24th, 2008

You can learn to use the CMU Sphinx automatic speech recognition system. I followed several steps of this tutorial, but I didn’t succeed. I used Ubuntu Linux. What was the problem? Well, several smaller problems occurred. I could solve a few of them, but not all. I will try again.

2008-04-01: Doubt about sphinx3 installation

eSpeak’s German pronunciation rules

March 12th, 2008

Obviously, there are two separate methods for translating words into phonemes: pronunciation rules and dictionary. To build a pronunciation dictionary for the German language, it may be possible to use the pronunciation rules, and convert them via the script espeak2phones.pl.

How good is this approach?

How can I create an acoustic model with Sphinx?

March 11th, 2008

I have just read this article. It is a good idea to donate speech to the VoxForge project. But at the moment, I have a different question: how is it possible to create an acoustic model (after you have donated some speech)? At the moment, acoustic models are in development for English, Dutch, German and Russian. Apparently, all (or most) of them are created with CMU Sphinx. But how do you use CMU Sphinx? This is one of the main questions I have at the moment. It is not easy to find the answer, but I am trying. It should be possible to use Sphinx (and German or English speech files) to create an acoustic model (German or English). But how is it possible to achieve this goal? It would be helpful if there were good documentation out there. Probably there is, but it is not easy to get into it.

Extracting zipped HTK folder containing Windows binaries

February 23rd, 2008

This is a short flash tutorial (about 200 kilobytes) for new developers of speech recognition software that shows how to extract a zipped folder containing HTK Windows binaries.

analyzing waveforms with wavesurfer

February 18th, 2008

I just took a look at the Hieroglyphs (PDF). On page 80 of this PDF book I found a remark about the program wavesurfer, which is said to be great for analyzing waveforms. I also took a look at the wavesurfer user manual. This seems to be an interesting program because it might be helpful in analyzing the waveforms of sound waves.

Screencast: transcribing prompts, encoding into FLAC, creating ZIP archive, submitting to VoxForge

February 15th, 2008

I have just recorded a Screencast. This Screencast shows the following steps in the process of creating/submitting prompts in the German language to the VoxForge project. In detail:

- taking a look at the prompts.txt file using Notepad++,
- transcribing several wav files using Dragon NaturallySpeaking into text,
- comparing the transcribed text with the corresponding line in the prompts.txt file,
- encoding 99 wav files into the FLAC format using FLAC front-end (this step reduces the file size to about forty percent of the original),
- copying and pasting the content of the files “readme.txt”, “prompts.txt” and “license-english.txt” from Notepad++ into the web form of the VoxForge speech submission system,
- creating a zip archive using 7-Zip,
- and finally submitting the whole zip archive from the personal desktop to the VoxForge website.

Download the screencast (about 12 megabytes). This Screencast contains some explanations. It is not perfect, but you should get an impression of what to do if you want to contribute your speech to the VoxForge project.

You can see that it is not very easy to create those prompts. But it is not impossible. There are just a lot of steps that have to be done. And we are using very powerful tools to create the prompts.

How is the quality of the sound card?

December 25th, 2007

How is it possible to measure the quality of the sound card? The sound card should be able to record audio in high quality. I just found here that the software RightMark Audio Analyzer might be a good choice to check the quality of my onboard sound card and of my Andrea USB recording sound card. I am thinking about buying the M-Audio Transit USB sound card. But how much better would this sound card be than my current solution? There are a lot of values whose meaning I don’t understand. For example, what does signal-to-noise ratio mean? What does it mean if the signal-to-noise ratio is 100 dB? I am sure that the higher the value is, the better it is. But what is the signal-to-noise ratio of the Andrea USB sound card? I don’t know the answer. I could try to find the answer using the RightMark Audio Analyzer.
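
As a partial answer to the question about 100 dB: the decibel value is a logarithm of the ratio between signal and noise. The Python sketch below shows the standard formulas; the function names are made up, and the last function gives the theoretical limit of an ideal converter for a full-scale sine wave, not a measurement of any of the sound cards mentioned.

```python
import math


def snr_db_from_power(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in dB from a ratio of powers."""
    return 10 * math.log10(signal_power / noise_power)


def amplitude_ratio(snr_db: float) -> float:
    """How many times larger the signal amplitude is than the noise amplitude."""
    return 10 ** (snr_db / 20)


def ideal_quantization_snr_db(bits: int) -> float:
    """Theoretical limit for an ideal converter and a full-scale sine wave."""
    return 6.02 * bits + 1.76


print(amplitude_ratio(100))           # signal amplitude versus noise amplitude
print(ideal_quantization_snr_db(16))  # upper bound for 16 bit recordings
```

So 100 dB means that the signal amplitude is about one hundred thousand times larger than the noise amplitude, which is slightly more than an ideal 16 bit recording can offer.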

