The speech synthesis method

In order to record EVP, it seems necessary to provide some kind of "raw material" that serves as a "carrier" for the voices. In soundproof rooms without any acoustic source, no voices seem to form at all. The frequency spectrum of this raw material - whether it is a direct sound event or is modulated onto some carrier (radio waves, light, etc.) - apparently has to lie within the audible range. One of the most common views is that the voices result from paranormal transformations of the raw material. However, this assumption has not yet been clearly confirmed in experiments. Usually the voices are already contained in the raw material. I have set out some reflections on this topic in my article Hypotheses about the origin of the Electronic Voice Phenomenon.

There are various methods of generating, transmitting and recording the raw material. Many experimenters use a radio tuned to a foreign-language station, or to a mixture of several stations, as the raw material source. It is important that the experimenter does not understand the language of the raw material, because this would be distracting. Personally, I don't favor this method because it is too error-prone: it is very easy to be misled into interpreting something as a message when it is in fact an ordinary word in the foreign language. A raw material had to be found that is "unsuspicious" but as similar as possible to speech!

Fidelio Köberle, VTF e.V.

As early as 1988, when most people knew computers only from science fiction movies, Fidelio Köberle, the former president of the German Association For Transcommunication Research (VTF e.V.) [Link], had the idea "[...] to provide an artificially generated raw material, like for instance the often used noise from rippling water or Struck's "rubbing method". Ideal could be a continuously produced synthetic raw material which comes as close to speech as possible. As close as possible in order to allow the interlocutors on the other side to form real speech from it using as little energy as possible. Little energy, because we know that this works best (see transformations). The raw material must of course not already be speech but should be transformed into reasonable speech easily. It should bubble without periodicity. It should, like usual speech, contain pauses. Without the use of random generators this won't be managed. [...]". (Source: VTF-Post P 51, issue 2/88 - 1.4.1988, page 42)


Random controlled speech synthesis

Helmut Schmidt with his Psi testing device

Inspired by this suggestion, various ways of generating such raw material were developed. The electronics engineer Peter Stein (Denmark) [Link], for example, used two stereo cassette players (Walkmans) and switched continuously between their four audio tracks by means of an electronic switch. The switching speed could be varied. For me personally, however, this method wasn't flexible enough, and the generated raw material wasn't "dynamic" enough. Inspired by the psychokinesis experiments of the parapsychologist Helmut Schmidt [Link 1, Link 2], I wanted to use a true random number generator (RNG) [Link German] as the focus of potential paranormal influences. Each random number was to be assigned to a certain phoneme of the German language. While the RNG was running, the corresponding phonemes were to be played back through a speaker. With a little "practice", the originators of the voices should be able to synthesize their messages by influencing the RNG mentally. Since it would probably have been rather expensive to realize this with dedicated electronics, I decided to use a computer.
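The idea of assigning each random number to a phoneme can be illustrated with a minimal Python sketch. It is not the original program: the phoneme list is a hypothetical placeholder (a real set would hold recorded audio snippets), and Python's pseudo random generator merely stands in for a hardware RNG.

```python
import random

# Hypothetical phoneme inventory; a real set would hold recorded audio snippets.
PHONEMES = ["a", "e", "i", "o", "u", "m", "n", "s", "t", "k"]

def random_phoneme_sequence(rng, count):
    """Draw `count` random numbers and map each one to a phoneme."""
    return [PHONEMES[rng.randrange(len(PHONEMES))] for _ in range(count)]

rng = random.Random()  # pseudo random stand-in for a hardware RNG
sequence = random_phoneme_sequence(rng, 20)
print(" ".join(sequence))  # in the real setup each phoneme would be played back
```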


Audio-Ed

The C64 home computer (Photo: Boris Klug)

In those days (1989), PCs were still expensive, and sound cards weren't available anyway. The Amiga already had fantastic sound capabilities, but at that time I owned a C64 home computer, so I gave it a try. The C64 did have a sound chip that could produce sounds artificially, but it couldn't be used to synthesize speech. Then, by chance, I found in an electronics shop a kit for a so-called "Audio Interface" that could be connected to the User Port of the C64 to digitize audio signals, store them in the computer's memory and play them back later. The Audio Interface used "Delta Modulation", a 1-bit A/D and D/A conversion. It was very imprecise and sounded extremely noisy, but it occupied very little memory. Even so, the mere 48 kilobytes of memory were full after 13 seconds of recording. But this was sufficient for my purposes: it was possible to store about 100 phonemes (assuming an average phoneme duration of 0.1 to 0.2 seconds), and that was entirely sufficient to synthesize artificial speech.
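For readers unfamiliar with delta modulation, here is a minimal Python sketch of the principle: each sample is stored as a single bit saying whether the signal lies above or below a running estimate. The step size, the test tone and the reading of "48 kilobytes" as 48 x 1024 bytes are assumptions made for illustration; the actual Audio Interface hardware may have worked differently in detail.

```python
import math

def delta_encode(samples, step=0.05):
    """Encode each sample as one bit: 1 = estimate steps up, 0 = steps down."""
    bits, estimate = [], 0.0
    for s in samples:
        bit = 1 if s >= estimate else 0
        estimate += step if bit else -step
        bits.append(bit)
    return bits

def delta_decode(bits, step=0.05):
    """Rebuild the staircase approximation from the bit stream."""
    out, estimate = [], 0.0
    for bit in bits:
        estimate += step if bit else -step
        out.append(estimate)
    return out

# Bit rate implied by the figures above (assumption: 48 KB = 48 * 1024 bytes).
bit_rate = 48 * 1024 * 8 / 13  # roughly 30,000 bits per second

# Hypothetical test tone: a 200 Hz sine sampled once per stored bit.
signal = [0.5 * math.sin(2 * math.pi * 200 * n / bit_rate) for n in range(2000)]
bits = delta_encode(signal)
decoded = delta_decode(bits)
max_error = max(abs(a - b) for a, b in zip(signal, decoded))
print(f"bit rate ~ {bit_rate:.0f} bit/s, max error {max_error:.3f}")
```

The decoded signal is a staircase that hunts around the original, which is one way to picture the noisy sound described above.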

Screenshot: Audio-Ed (PC version)

To operate the Audio Interface, a small BASIC program was included (as a listing to be typed in, of course) which POKEd some assembler routines from DATA lines into main memory. I used a "Machine Language Monitor" (something similar to DEBUG under MS-DOS) to disassemble the program and to add further functions for editing, saving and loading the audio signal. I called the resulting program simply Audio-Ed. The "Ed" indicates that the program was able to edit an audio signal: you could cut out short sound sequences by ear and store them as individual segments on floppy disk. Once you had collected enough phonemes, you could compile a phoneme file from them. These phonemes could then be used to generate "random speech" which sounded like "gibberish". In addition, there was a function to control the choice of phonemes via the Game Port, so that an external random generator could be connected to the C64.

However, as far as the quantity and quality of the obtained EVP are concerned, it turned out in practice to be irrelevant whether "real" phonemes are used or whether small pieces (segments) of equal duration are simply cut automatically from the audio signal. Whether "real" random numbers or only "pseudo" random numbers are used to select the phonemes/segments also turned out to be insignificant. A little anecdote in this connection:
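The simpler variant, cutting the recording into segments of equal duration and stringing randomly chosen segments together, might be sketched in Python as follows. The function names and the placeholder sample data are hypothetical, and the sketch says nothing about how the original programs handled a leftover tail.

```python
import random

def cut_segments(samples, segment_len):
    """Split the sample list into consecutive chunks of equal length."""
    usable = len(samples) - len(samples) % segment_len  # drop the incomplete tail
    return [samples[i:i + segment_len] for i in range(0, usable, segment_len)]

def random_raw_material(segments, count, rng):
    """Concatenate `count` randomly chosen segments into one stream."""
    out = []
    for _ in range(count):
        out.extend(rng.choice(segments))
    return out

source = list(range(1000))             # placeholder for digitized speech samples
segments = cut_segments(source, 150)   # six full segments, 100-sample tail dropped
stream = random_raw_material(segments, 10, random.Random())
print(len(segments), len(stream))
```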

Once, during an EVP session using female speech as raw material, I received the quite distinct and striking voice "Computer ist kaputt!" (German for: "Computer is kaput!"). I didn't know what to make of it, because my device apparently worked very well. The next day I used the same raw material again for an EVP recording (I had stored the sample on a diskette). I was very surprised when I suddenly heard exactly the same voice, "Computer ist kaputt", again! What had happened?

Since the C64 had no built-in real-time clock, the RNG was initialized with the same starting values each time the computer was switched on - until then I hadn't been aware of this. But how could this be? Such a distinct voice from pseudo random numbers? Moreover, through this incident the voice acquired a certain meaning: it gave me to understand in coded form (which is often the case with EVP) that the RNG did not work the way I had assumed. Since the literature always spoke of a "random" number generator, I had wrongly assumed that it would produce "real" random numbers. My hypothesis had therefore been that this RNG could be influenced paranormally. But obviously this did not happen, since otherwise the same voice would not have been produced a second time.
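The effect is easy to reproduce with any pseudo random generator: if it starts from the same seed, it delivers the same sequence. The following Python sketch uses a hypothetical fixed seed in place of the C64's power-on state and is only meant to illustrate the principle.

```python
import random

def session(seed, n_segments=100, length=12):
    """Return the segment indices a pseudo random generator picks for one session."""
    rng = random.Random(seed)
    return [rng.randrange(n_segments) for _ in range(length)]

FIXED_SEED = 0  # hypothetical stand-in for the identical power-on state

day_one = session(FIXED_SEED)
day_two = session(FIXED_SEED)
print(day_one == day_two)  # identical seed, identical segment sequence
```

Seeding from a clock or another changing source would break this repetition, which is exactly what a machine without a real-time clock cannot do on its own.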

The conclusions from this incident were far-reaching: apparently, generating distinct and meaningful EVP does not require the devices to be influenced in any paranormal way! (My current theory is that EVP, like other paranormal phenomena, is based on some kind of "synchronicity".)

In 1990, Audio-Ed was rewritten for the PC (80286 CPU) under MS-DOS and extended with a "logging" feature. This allowed the generated sequence of random numbers to be written to a log file during the (pseudo) random controlled playback of the segments. Later, exactly the same "raw material" could be played back again in order to "repeat" an EVP session. In this way it was possible to check whether the same voices were obtained again. (In my own experiments this was always the case - thus no "transformations"!)
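The logging idea can be sketched in a few lines of Python: every drawn segment index is written to a log, and a later run reads the log instead of drawing new numbers. The one-index-per-line format and the function names are assumptions; the actual log file layout of Audio-Ed is not described here.

```python
import io
import random

def play_and_log(n_segments, count, rng, logfile):
    """Pick random segment indices and write each one to the log."""
    picks = []
    for _ in range(count):
        idx = rng.randrange(n_segments)
        logfile.write(f"{idx}\n")
        picks.append(idx)
    return picks

def replay(logfile):
    """Read the logged indices back so the same sequence can be played again."""
    return [int(line) for line in logfile if line.strip()]

log = io.StringIO()  # in-memory stand-in for a log file on disk
original = play_and_log(50, 30, random.Random(), log)
log.seek(0)
repeated = replay(log)
print(repeated == original)
```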


Audigit

A picture of the self-made sound digitizer

Since editing each individual phoneme and compiling a complete phoneme set was always very time-consuming, and because randomly cut segments work just as well as phonemes, the next program version no longer had any editing function at all and was therefore named "Audigit" [Link 1 German, Link 2]. This version - now written in C - worked with 8-bit D/A and A/D conversion. Sound cards were still nearly unaffordable, so I used a self-made "Sound Digitizer", which I built from construction plans printed in the October 1990 issue of the German computer magazine "DOS International" (now "PC Magazin") [Link German].

Screenshot: Audigit

Sound card support was added later. Since Audigit was a DOS program, the usable memory was still limited to 640 KB, which allowed only a short recording time (about 25 seconds at a sample frequency of 20 kHz and a resolution of 8 bits). Moreover, it only worked with 100% Sound Blaster-compatible sound cards. Therefore, in May/June 2000 I wrote a "successor", the program EVPmaker, which runs under Windows 9x and Windows NT, works with any sound card and has no memory limitations.
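The recording time follows from simple arithmetic, shown here as a short Python calculation. It assumes one byte per 8-bit sample and takes 640 KB as 640 x 1024 bytes; how much of that memory Audigit itself and DOS occupied is not stated, so the upper bound is only theoretical.

```python
SAMPLE_RATE_HZ = 20_000
BYTES_PER_SAMPLE = 1                   # 8-bit resolution
DOS_CONVENTIONAL_BYTES = 640 * 1024    # assumption: 640 KB = 655,360 bytes

def buffer_bytes(seconds):
    """Memory needed to hold `seconds` of audio at the settings above."""
    return seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE

needed = buffer_bytes(25)
ceiling_seconds = DOS_CONVENTIONAL_BYTES / (SAMPLE_RATE_HZ * BYTES_PER_SAMPLE)
print(needed, ceiling_seconds)
```

A 25-second buffer takes 500,000 bytes, so most of the 640 KB went to audio, with the rest left for the program and the operating system.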


EVPmaker

Compared to "Audigit", EVPmaker has several new features. If a sound editor such as Adobe Audition (formerly known as CoolEdit) is used to mark the phonemes contained in the source audio file, then "real" phonemes can be used to generate the EVP raw material instead of arbitrarily cut chunks. And if the phonemes are labeled with their phonetic symbols, the phonetic spelling of the EVP can also be displayed. So if you hear an EVP, you can additionally read it in phonetic transcription.

Screenshot: EVPmaker

Another new feature of EVPmaker is the ability to work with "EVP sessions". An EVP session can consist of any number of individual EVP. Each EVP is automatically stamped with the current date and time. A question and an interpretation can be entered for each EVP. EVP sessions can be saved to disk and loaded again later. The raw material sequence of each EVP can be repeated as often as you wish, saved as a WAV file, or loaded into a sound editor where it can be processed or examined in any way. Every single raw material segment from which the randomly generated sequence was composed is stored in a "cue list". If the sound editor is able to display such a cue list, these segments can be addressed and played back directly. In this way you can examine exactly which fragments a voice was composed of.
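What such a cue list records can be illustrated with a small Python sketch. It keeps, for every piece of the generated sequence, where it starts in the output and which source segment it came from. This is an in-memory illustration with hypothetical names, not EVPmaker's internal format and not the cue chunk layout of WAV files.

```python
import random
from dataclasses import dataclass

@dataclass
class Cue:
    output_start: int  # sample offset within the generated sequence
    source_index: int  # which source segment was used
    length: int        # number of samples in that segment

def generate_with_cues(segments, count, rng):
    """Build a random sequence and note where each segment landed."""
    stream, cues = [], []
    for _ in range(count):
        idx = rng.randrange(len(segments))
        cues.append(Cue(len(stream), idx, len(segments[idx])))
        stream.extend(segments[idx])
    return stream, cues

def cue_at(cues, position):
    """Find the cue covering a given sample position in the output."""
    for cue in cues:
        if cue.output_start <= position < cue.output_start + cue.length:
            return cue
    return None

# Placeholder segments of different lengths instead of real audio data.
segments = [[0] * 100, [1] * 80, [2] * 120]
stream, cues = generate_with_cues(segments, 8, random.Random())
hit = cue_at(cues, len(stream) // 2)
print(hit)
```

Looking up the cue at the position where a voice is heard shows which source fragments it was assembled from.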

Furthermore, there are different ways of generating the random numbers that are used to pick the segments/phonemes from the source audio file: in addition to pseudo random numbers, "true" random numbers can now also be used, for example by connecting a radio tuned to "white noise" to the line input of the sound card. However, this feature is so new that I can't yet say what impact it has on the number and quality of the EVP. Although I wasn't able to observe direct transformations of the raw material (as stated above), I hope for more meaningful voices, since I expect that analog devices using "true" random processes can "synchronize" with paranormal processes better than a purely logical machine like a computer.
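One common way to turn sampled noise into random numbers is to keep only the least significant bit of each sample, remove bias with the von Neumann method, and pack the remaining bits into indices. The Python sketch below shows this under clearly stated assumptions: it is not necessarily how EVPmaker does it, and the noise samples are simulated here because no sound card input is read.

```python
import random

def lsb_bits(samples):
    """Keep only the least significant bit of each integer sample."""
    return [s & 1 for s in samples]

def von_neumann(bits):
    """Debias a bit stream: pair 01 gives 0, pair 10 gives 1, 00 and 11 are dropped."""
    return [a for a, b in zip(bits[::2], bits[1::2]) if a != b]

def bits_to_indices(bits, n_segments, bits_per_index=8):
    """Pack groups of bits into numbers and map them onto segment indices."""
    indices = []
    for i in range(0, len(bits) - bits_per_index + 1, bits_per_index):
        value = 0
        for bit in bits[i:i + bits_per_index]:
            value = (value << 1) | bit
        # The modulo is only unbiased if n_segments divides 2**bits_per_index.
        indices.append(value % n_segments)
    return indices

# Simulated 16-bit noise samples standing in for the sound card's line input.
noise = [int(random.gauss(0, 2000)) for _ in range(4000)]
indices = bits_to_indices(von_neumann(lsb_bits(noise)), n_segments=64)
print(indices[:10])
```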

If you would like to experiment with this method, you can download EVPmaker together with an extensive manual and a quick start guide for beginners [Link]. Some EVP examples obtained with the speech synthesis method can be found on the Examples page.


