What are amplitude, frequency, wavelength?

August 30th, 2009

I think that 16 kHz / 16 bit recordings should be sufficient for developing a speech model. But what does that mean? A good article explains the difference between amplitude and frequency (or wavelength). The bit depth determines how precisely the amplitude is stored: 16 bit recordings are more precise than 8 bit recordings. The sampling rate determines the highest frequency that can be captured. The human ear can hear up to about 20 kHz, and recording a frequency requires a sampling rate of twice that frequency. A 16 kHz recording therefore captures frequencies up to 8 kHz, which should be sufficient for speech.
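To make the numbers concrete, here is a minimal Python sketch of the two relationships mentioned above: the highest frequency a sampling rate can capture (half the sampling rate) and the number of amplitude steps a bit depth provides. The function names are my own and only serve as illustration.

```python
def nyquist_frequency_hz(sample_rate_hz: float) -> float:
    """Highest frequency that a given sampling rate can represent."""
    return sample_rate_hz / 2


def quantization_levels(bit_depth: int) -> int:
    """Number of distinct amplitude values available at a given bit depth."""
    return 2 ** bit_depth


# 16 kHz sampling captures frequencies up to 8 kHz.
print(nyquist_frequency_hz(16_000))

# 16 bit offers far more amplitude steps than 8 bit.
print(quantization_levels(8), quantization_levels(16))
```

Capturing the full range of human hearing (about 20 kHz) would need a sampling rate of at least 40 kHz by the same rule.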

Replacing HTK with Sphinx?

August 28th, 2009

You need to install HTK if you want to run simon with its full functionality. HTK is not included; it has to be downloaded from a different source (registration required). From my point of view, this information could point people who are familiar with Sphinx and Qt in the right direction:

“simon uses the NON-FREE HTK for that. Only _one_ class in simon comes into contact with the HTK. The model compilation manager. This class: http://speech2text.svn.sourceforge.net/viewvc/speech2text/trunk/simonlib…. Those 1200 lines (including other, julius related stuff) are everything that links simon to the HTK. The class could very, very easily be replaced with one that uses something else.”

simon should continue to make use of HTK because there are things that you should never do:

“They did it by making the single worst strategic mistake that any software company can make:

They decided to rewrite the code from scratch.”

Well, but maybe there is someone out there who would want to start a fork of simon and replace HTK with Sphinx? Of course, this would be a completely different project.

I think that Sphinx could use a GUI.

Sound frames - a, e, i, o, u, b(e)

July 18th, 2009

Let’s take a look at a few sound frames:

sound-a

U+0061 (a), U+02D0
The sounds in this article correspond to the German pronunciation.

sound-e
U+0065 (e), U+02D0

sound-i
U+0069 (i), U+02D0

sound-o
U+006F (o), U+02D0

sound-u
U+0075 (u), U+02D0

sound-be
First part: U+0062 (b) - second part: U+0065 (e), U+02D0
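The captions above give each sound as Unicode code points. As a small illustration, the following Python sketch turns such code points into the actual characters and looks up the official name of U+02D0 with the standard `unicodedata` module. The helper name `from_code_points` is my own.

```python
import unicodedata


def from_code_points(*code_points: int) -> str:
    """Build a string from a sequence of Unicode code points."""
    return "".join(chr(cp) for cp in code_points)


long_a = from_code_points(0x0061, 0x02D0)   # U+0061 (a), U+02D0
long_e = from_code_points(0x0065, 0x02D0)   # U+0065 (e), U+02D0

print(long_a, long_e)

# U+02D0 is the IPA length mark that follows a long vowel.
print(unicodedata.name(chr(0x02D0)))

# In UTF-8, "a" takes one byte and the length mark takes two.
print(len(long_a.encode("utf-8")))
```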

Julius package for Ubuntu

June 18th, 2009

Soon, there should be an updated Julius package for Ubuntu (4.0.2 -> 4.1.2).

Characteristics of the sound “a”

May 20th, 2009

Let’s take a look at the characteristics of the sound “a” (pronounced as in “father”). Here is a screenshot from Audacity that shows the repetitive pattern of the sound “a”:

waveform-a

I have marked the different waves with the numbers 1, 2, 3, 4, 5. Waves with the same number differ slightly from one another, but they are similar. It is a repetitive pattern. Let’s extract a complete frame of the sound “a”:

waveform-a-small

The above picture shows the first frame. Let’s compare the first frame with the second frame:

waveform-a-small2

Take a look at the area marked in yellow, and compare it with the corresponding area of the previous picture. It is slightly different.

This was a short introduction to signal processing. These sound waves can be analysed by software like HTK.
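The repetitive pattern can also be found by a program. Here is a minimal Python sketch that estimates the length of one repeating frame by autocorrelation: it shifts the signal against itself and picks the shift at which the two copies match best. This is a common textbook technique, not necessarily what HTK does internally. The signal here is synthetic (a 200 Hz tone with two overtones, sampled at 16 kHz), not my recording, and all names are my own.

```python
import math

SAMPLE_RATE_HZ = 16_000


def synthetic_vowel(f0_hz: float, num_samples: int) -> list[float]:
    """A made-up periodic signal: a fundamental plus two weaker overtones."""
    samples = []
    for i in range(num_samples):
        t = i / SAMPLE_RATE_HZ
        samples.append(
            math.sin(2 * math.pi * f0_hz * t)
            + 0.5 * math.sin(2 * math.pi * 2 * f0_hz * t)
            + 0.25 * math.sin(2 * math.pi * 3 * f0_hz * t)
        )
    return samples


def estimate_period(samples: list[float], min_lag: int, max_lag: int) -> int:
    """Return the lag (in samples) at which the signal best matches itself."""
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Multiply the signal with a copy of itself shifted by `lag` samples.
        score = sum(a * b for a, b in zip(samples, samples[lag:]))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


signal = synthetic_vowel(200.0, 1600)          # 0.1 seconds of sound
period = estimate_period(signal, 20, 200)      # search lags of 20..200 samples
print(period, "samples per frame,", SAMPLE_RATE_HZ / period, "Hz")
```

With a real recording the frames differ slightly from one another, as in the screenshots, so the best match is less clear-cut than with this clean synthetic tone.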

Installing simon-juliusd-0.1-alpha2.exe

June 22nd, 2008

I just downloaded the program simon-juliusd-0.1-alpha2.exe, which was apparently released recently. Before running the program, I checked it with ClamWin (I always do that before I install new software). It is OK, so I will install this program on my computer. The program is licensed under the GPL. On my computer, the program simon.exe was installed in the location “H:\Program Files\simon\simon-0.1-alpha-2”.

Here is a screenshot:

Simon configuration

But now, it is beginning to get complicated. Take a look at the next screenshot:

Simon checklist

So to use this program successfully, several additional programs are needed. I need HTK and Julius, and further components are necessary as well. I think I will stop the installation now. Or should I continue? At the moment, I am not sure. I think that I will click the Next button.

I won’t publish a screenshot of the next step. But it is about the HTK programs HDMan, HCopy, and several others. I think (but I am not sure) that it is necessary to tell simon the path where those programs are installed. A few months ago, I took some first steps with HTK and Julius, but everything was pretty complicated. At the moment, I am reading a few pages of the HTK book, and everything is very abstract. It takes a lot of time to get into it. But it is possible! You just have to stay focused.

VoxForge dictionary isn’t encoded in UTF-8

June 21st, 2008

I just downloaded the VoxForge dictionary (2.6 MB) and opened it with Notepad++. Apparently, it is encoded in ANSI, not in UTF-8. That’s OK because it contains only standard characters. I am guessing that this dictionary is pure ASCII, in which case the file would be byte-for-byte identical in UTF-8. Still, I would suggest that future versions be published in UTF-8.
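Instead of guessing, the question whether a file is pure ASCII can be checked with a few lines of Python. This is only a sketch: the sample bytes below are made up for illustration and are not taken from the VoxForge dictionary, and the function name is my own. For a real file you would read its content with `open(path, "rb").read()`.

```python
from typing import Optional


def first_non_ascii_offset(data: bytes) -> Optional[int]:
    """Return the position of the first byte above 0x7F, or None if all bytes are ASCII."""
    for offset, byte in enumerate(data):
        if byte > 0x7F:
            return offset
    return None


# Made-up sample content, for illustration only.
ascii_sample = b"HELLO  hh ax l ow\n"
ansi_sample = "M\u00fcller".encode("cp1252")   # "Müller" in Windows-1252 ("ANSI")

print(first_non_ascii_offset(ascii_sample))    # no non-ASCII byte found
print(first_non_ascii_offset(ansi_sample))     # the "ü" is a single byte above 0x7F

# Pure ASCII bytes decode to the same text as UTF-8 and as Windows-1252.
print(ascii_sample.decode("utf-8") == ascii_sample.decode("cp1252"))
```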

Switching from Arpabet to IPA

June 21st, 2008

Apparently, the CMU Pronouncing Dictionary uses the Arpabet. The Arpabet has the advantage that it is possible

“to represent phonemes with ASCII characters.”

But today, UTF-8 is becoming more and more common. In my opinion, there should be a discussion about switching from Arpabet/ASCII to IPA/UTF-8. The IPA is easier to read than the Arpabet. And UTF-8 is backwards compatible with ASCII: every ASCII text is also valid UTF-8.
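To give an impression of what such a switch would involve, here is a small Python sketch that converts an Arpabet transcription into IPA. The table is deliberately partial (a handful of symbols only) and ignores the stress digits, so it is an illustration of the idea and not a complete converter. The names are my own.

```python
# Partial Arpabet-to-IPA table, for illustration only.
ARPABET_TO_IPA = {
    "AA": "ɑ", "AE": "æ", "IY": "i", "UW": "u",
    "B": "b", "CH": "tʃ", "D": "d", "F": "f", "K": "k",
    "N": "n", "P": "p", "S": "s", "SH": "ʃ", "T": "t",
}


def arpabet_to_ipa(transcription: str) -> str:
    """Convert a space-separated Arpabet transcription into an IPA string."""
    ipa = []
    for symbol in transcription.split():
        base = symbol.rstrip("012")          # drop the stress digit, if any
        if base not in ARPABET_TO_IPA:
            raise ValueError(f"symbol not in the partial table: {symbol}")
        ipa.append(ARPABET_TO_IPA[base])
    return "".join(ipa)


print(arpabet_to_ipa("S P IY1 CH"))          # the word "speech"

# An ASCII transcription has exactly the same bytes in ASCII and in UTF-8.
text = "S P IY1 CH"
print(text.encode("ascii") == text.encode("utf-8"))
```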

RFC 4267: VoiceXML, PLS, SSML, SRGS, CCXML

June 19th, 2008

Recently, I read the document RFC 4267. In my opinion, this framework is something very interesting.

learning sphinx automatic speech recognition

March 24th, 2008

You can learn to use the CMU Sphinx automatic speech recognition system. I followed several steps of this tutorial, but I didn’t succeed. I used Ubuntu Linux. What was the problem? Well, several smaller problems occurred. I could solve a few of them, but not all. I will try again.

2008-04-01: Doubt about sphinx3 installation

eSpeak’s German pronunciation rules

March 12th, 2008

Apparently, there are two separate methods for translating words into phonemes: pronunciation rules and a dictionary. To build a pronunciation dictionary for the German language, it may be possible to take the pronunciation rules and convert them with the script espeak2phones.pl.

How good is this approach?

How can I create an acoustic model with Sphinx?

March 11th, 2008

I have just read this article. It is a good idea to donate speech to the VoxForge project. But at the moment, I have a different question: how do you create an acoustic model after you have donated some speech? Acoustic models are currently in development for English, Dutch, German and Russian. Apparently, all (or most) of them are created with CMU Sphinx. But how do you use CMU Sphinx? This is one of my main questions at the moment. It is not easy to find the answer, but I am trying. It should be possible to use Sphinx with German or English speech files to create a German or English acoustic model. But how? It would be helpful if there were good documentation out there. There probably is, but it is not easy to get into.

Extracting zipped HTK folder containing Windows binaries

February 23rd, 2008

This is a short flash tutorial (about 200 kilobytes) for new developers of speech recognition software that shows how to extract a zipped folder containing HTK Windows binaries.

analyzing waveforms with wavesurfer

February 18th, 2008

I just took a look at the Hieroglyphs (PDF). On page 80 of this PDF book I found a remark about the program wavesurfer, which is said to be great for analyzing waveforms. I took a look at the wavesurfer user manual. This seems to be an interesting program because it might be helpful in analyzing the waveforms of sound waves.

Screencast: transcribing prompts, encoding into FLAC, creating ZIP archive, submitting to VoxForge

February 15th, 2008

I have just recorded a screencast. It shows the following steps in the process of creating prompts in the German language and submitting them to the VoxForge project. In detail:

- taking a look at the prompts.txt file using Notepad++,
- transcribing several wav files into text using Dragon NaturallySpeaking,
- comparing the transcribed text with the corresponding line in the prompts.txt file,
- encoding 99 wav files into the FLAC format using FLAC front-end (this step reduces the file size to about forty percent of the original),
- copying and pasting the content of the files “readme.txt”, “prompts.txt” and “license-english.txt” from Notepad++ into the web form of the VoxForge speech submission system,
- creating a zip archive using 7-Zip,
- and finally submitting the whole zip archive from the personal desktop to the VoxForge website.

Download the screencast (about 12 megabytes). This screencast contains some explanations. It is not perfect, but you should get an impression of what to do if you want to contribute your speech to the VoxForge project.

You can see that it is not very easy to create those prompts. But it is not impossible. There are just a lot of steps that have to be done. And we are using very powerful tools to create the prompts.

How is the quality of the sound card?

December 25th, 2007

How is it possible to measure the quality of a sound card? The sound card should be able to record audio in high quality. I just found here that the software RightMark Audio Analyzer might be a good choice to check the quality of my onboard sound card and of my Andrea USB recording sound card. I am thinking about buying the M-Audio Transit USB sound card. But how much better would this sound card be than my current solution? There are a lot of values whose meaning I don’t understand. For example, what does signal-to-noise ratio mean? What does it mean if the signal-to-noise ratio is 100 dB? I am sure that the higher the value, the better. But what is the signal-to-noise ratio of the Andrea USB sound card? I don’t know the answer. I could try to find it using the RightMark Audio Analyzer.
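As a partial answer to the question about decibels: the signal-to-noise ratio compares the strength of the wanted signal with the strength of the background noise on a logarithmic scale, where every 20 dB means a factor of 10 in amplitude. The following Python sketch shows the conversion in both directions. The function names are my own, and the numbers are plain arithmetic, not measurements of any sound card.

```python
import math


def snr_db(signal_rms: float, noise_rms: float) -> float:
    """Signal-to-noise ratio in decibels from two RMS amplitudes."""
    return 20 * math.log10(signal_rms / noise_rms)


def amplitude_ratio(db: float) -> float:
    """How many times larger the signal amplitude is than the noise amplitude."""
    return 10 ** (db / 20)


# 100 dB means the signal amplitude is 100,000 times the noise amplitude.
print(amplitude_ratio(100))

# The other direction: a noise floor 100,000 times weaker than the signal.
print(snr_db(1.0, 0.00001))
```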

Julius 4.0 released

December 22nd, 2007

Apparently, Julius 4.0 was released a few days ago. Is there a good tutorial or screencast available that explains how to use this software with the English language?

Installing Cygwin, HTK, Julius

December 21st, 2007

A few months ago, I installed Cygwin, HTK, and Julius. Because I didn’t use those programs (it was too complicated for me), I decided to reinstall HTK and Julius. A few minutes ago, I followed the steps concerning HTK and Julius that are explained here. I didn’t reinstall Cygwin.

For the installation I downloaded the following files:

  1. htk-3.3-windows-binary.zip - it seems that this isn’t the current version. The current version is apparently 3.4. But I will try it with the previous version.
  2. HTK-samples-3.3.zip - I believe that this isn’t the current version. But in the tutorial this file was linked directly. So I downloaded it.
  3. julius-3.5.2-multipath-win32bin.zip - this is apparently not the current version. But I took this version because it was linked directly.

I made some changes in the file H:\cygwin\etc\bash.bashrc. It was necessary to add the following lines:

"PATH=/Julius/julius-3.5.2-multipath-win32bin/bin:/HTK/htk-3.3-windows-binary/htk:$PATH
export PATH"

I don’t understand what these lines mean. But there are two possibilities: it will work, or it won’t.
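As far as I can tell, the two lines put the Julius and HTK folders at the front of the list of folders in which the shell looks for programs, so that commands like HVite can be started without typing the full path. The following Python sketch imitates that lookup with a pretend set of files. It is a simplified model, not how bash is actually implemented, and it ignores that the files are called HVite.exe on Windows. All names are my own.

```python
JULIUS_DIR = "/Julius/julius-3.5.2-multipath-win32bin/bin"
HTK_DIR = "/HTK/htk-3.3-windows-binary/htk"


def prepend_to_path(current_path: str, new_dirs: list[str]) -> str:
    """Put new folders in front of an existing colon-separated PATH value."""
    return ":".join(new_dirs + [current_path])


def resolve(command: str, path_value: str, existing_files: set[str]):
    """Return the first PATH folder entry that contains the command, else None."""
    for folder in path_value.split(":"):
        candidate = folder.rstrip("/") + "/" + command
        if candidate in existing_files:
            return candidate
    return None


# Pretend file system, for illustration only.
files = {HTK_DIR + "/HVite", JULIUS_DIR + "/julian", "/usr/bin/ls"}

old_path = "/usr/bin"
new_path = prepend_to_path(old_path, [JULIUS_DIR, HTK_DIR])

print(resolve("HVite", old_path, files))   # not found before the change
print(resolve("HVite", new_path, files))   # found in the HTK folder afterwards
```

The second line in bash.bashrc, "export PATH", makes the changed list visible to programs started from the shell.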

Afterwards, I opened the Cygwin console window by clicking “Start>All Programs>Cygwin>Cygwin Bash Shell”.

Then I typed in “HVite”. The options that are available for this command were listed.

Afterwards, I typed in “julian”. It displayed that version 3.5.2 is available.

Until now everything seems to work properly. Now it would be necessary to install Audacity. But I have already installed Audacity, so I can skip this step.

So I can now begin with the data preparation. Apparently, five major steps are necessary. The first step is the task grammar. The second step is about the pronunciation dictionary. The third step is about recording the data. The fourth step is about creating the transcription files. And the fifth step is about coding the audio data. I think that I am familiar with step three (recording the data). Let’s see whether there will be some problems or not.

Maybe I will continue this article.

Building a dictionary for the German language

December 19th, 2007

To create speech recognition software for the German language, it is necessary to build a dictionary. Maybe the problem can be solved with the help of eSpeak.

Screencast (Beta) - how to create prompts

December 19th, 2007

Here is a short screencast [2007-12-21: New version of the screencast] on how to create prompts. The Macromedia Flash Player is necessary to view this screencast. This is my first screencast (created with Wink), and I am planning to create better ones.

To create the prompts, it is necessary to have not only the text but also line numbers. I create the text using DragonPad. Afterwards, I copy and paste the text from DragonPad into Notepad++. This screencast is about Notepad++, which I use to create the line numbers. The process is semi-automated. That means that there is a macro involved, but I have to type the specific number of each line.
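The numbering could also be fully automated with a short script. Here is a minimal Python sketch that puts a zero-padded number in front of every line of text. The number format (three digits, followed by a space) is only an assumption for illustration and may differ from what a particular prompts file needs. The function name and the sample sentences are my own.

```python
def number_prompts(lines: list[str], start: int = 1, width: int = 3) -> list[str]:
    """Prefix each non-empty line with a zero-padded running number."""
    numbered = []
    number = start
    for line in lines:
        text = line.strip()
        if not text:
            continue                      # skip blank lines
        numbered.append(f"{number:0{width}d} {text}")
        number += 1
    return numbered


# Made-up sample sentences, for illustration only.
prompts = ["Guten Morgen", "", "Wie geht es dir"]
for prompt in number_prompts(prompts):
    print(prompt)
```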

