(...) Unfortunately many people, despite this lack of knowledge about the nature of speech, have strongly held beliefs about speech, which are often quite erroneous from a scientific point of view. Many of these inaccuracies have to do with the assumption that speech is like writing. This assumption is understandable when we consider that as children we were taught that letters represent sounds.

Nevertheless, there are many respects in which speech is not like writing. For example, there are no spaces between words in speech; the sounds of speech do not have a one-to-one correspondence with letters (even making allowances for the anomalies of the English alphabet); sounds are not discrete units like letters; the pronunciation of words can vary dramatically depending on the sentence they are spoken in; and so on.

It is a well-established principle of psycholinguistics that people can be influenced in their perception of speech by their expectations about what is being said (see any textbook on psycholinguistics, e.g. Massaro in reference list below).

The phonetic 'stuff' of speech under-determines its perception as meaningful language. It has been well known for many years, for example, that individual words excised from the flow of speech have very low intelligibility, even when they are perfectly clear in context.

In normal speech perception, the hearer 'fills in' information based on their knowledge of the language and their expectations and predictions in the particular context. This is generally understood as showing that speech perception involves both 'bottom-up' information, from the speech signal, and 'top-down' information from the hearer's knowledge system. Perception can therefore be influenced by the hearer's expectations, predictions or biases.

Well-known experiments have shown, for example, that if subjects are presented with poorly recorded speech, their perception can be altered depending on whether they are told it is a sentence about sport or a sentence about the weather.
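The interplay of bottom-up and top-down information described above can be illustrated with a small sketch. It assumes, purely for illustration, that the two sources are combined by Bayes' rule; the source does not say this is how listeners do it. The candidate words, the function name `posterior` and all the numbers are hypothetical.

```python
def posterior(likelihood, prior):
    """Combine bottom-up likelihoods with top-down priors (Bayes' rule)."""
    # Multiply acoustic evidence by contextual expectation for each word.
    unnormalised = {word: likelihood[word] * prior[word] for word in likelihood}
    total = sum(unnormalised.values())
    # Normalise so the values sum to 1.
    return {word: p / total for word, p in unnormalised.items()}

# Hypothetical numbers: a poor recording fits both words almost equally well.
acoustic_likelihood = {"goal": 0.45, "cold": 0.55}

# Hypothetical expectations set by what the listener was told beforehand.
sport_prior = {"goal": 0.9, "cold": 0.1}
weather_prior = {"goal": 0.1, "cold": 0.9}

heard_sport = posterior(acoustic_likelihood, sport_prior)
heard_weather = posterior(acoustic_likelihood, weather_prior)

print(max(heard_sport, key=heard_sport.get))      # word favoured under the sport expectation
print(max(heard_weather, key=heard_weather.get))  # word favoured under the weather expectation
```

With the same acoustic evidence, the word that comes out on top changes with the expectation, which is the pattern the experiments report.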

It is also a common aspect of everyday experience that speech can be misheard. Psycholinguists and phonologists study such mishearings as useful data on speech processing. As just one example from large corpora of perception errors, consider this one: 'That's unbelievable' heard as 'That's happened to me before'.

Your speech betrays you.

Your speech not only expresses the meaning of the words you are using. The listener can also tell from your speech whether you are a man or a woman, and even your approximate age. She/he may be able to hear what geographical area you are from, what social class you belong to, whether you have a cold or are tired, or whether you are happy or sad. In some courts, it is even used to help to decide whether a suspect is telling the truth. Given the enormous variability in speech behaviour between and within speakers it is important to realize that speech behaviour is only an indication of all these speaker properties. For example, a hoarse voice can indicate that the speaker has a night of heavy drinking behind him/her or that he/she smokes a lot, but it may also be his/her natural voice quality.

A speech sound is not a speech sound.

It is probably clear that the different voice qualities of different speakers also affect the individual speech sounds. But there are many other factors which play a role, such as the immediate context in which a speech sound is produced. For instance, in English or German the so-called voiceless plosives /p, t, k/ are aspirated at the beginning of a syllable (with a puff of breath immediately following the release of the plosive closure), unless they are preceded by /s/ in the same syllable. You can hear this if you compare the pronunciation of "park" versus "spark". If you record the word "spark" and cut off the /s/, you will not hear "park" but "bark" instead. The aspiration of syllable-initial plosives is stronger in accented than in unaccented syllables (compare "king" to "barking"). These are only a few of many possible influences on the way a speech sound is produced. Human listeners do not usually have problems with this variability; in fact, they can use it to extract information from the signal, not only about the sound's identity but also about the accentuation of the syllable in which it appears.

Your computer can hear.

Automatic speech recognition (ASR) is usually "just" the recognition of sequences of words from the speech signal by a computer, although there have been successful attempts at "understanding", i.e. at recognizing, for example, dialogue acts (checks, queries, etc.) from the speech pattern. Automatic speech recognition uses pattern recognition methodologies (neural networks and particularly hidden Markov modelling) to recognize sequences of speech sounds from the acoustic pattern. These are statistical techniques which have been designed to handle the variability in the speech signal which we touched on before, and they do so very successfully, as is demonstrated by the application of these technologies to dialogue systems such as automatic train timetable services.
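To give a flavour of hidden Markov modelling, here is a minimal sketch of Viterbi decoding, the standard way of finding the most likely sequence of hidden states in such a model. It is a toy under stated assumptions, not a description of any real recognizer: the two states ("p" and "b"), the two frame labels ("asp" for an aspirated frame, "plain" for an unaspirated one) and all probabilities are invented for illustration.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden state sequence and its log-probability."""
    # Each column maps state -> (best log-prob of a path ending there, previous state).
    columns = [{
        s: (math.log(start_p[s]) + math.log(emit_p[s][observations[0]]), None)
        for s in states
    }]
    for obs in observations[1:]:
        column = {}
        for s in states:
            # Pick the best predecessor state for s.
            score, prev = max(
                (columns[-1][p][0] + math.log(trans_p[p][s]), p) for p in states
            )
            column[s] = (score + math.log(emit_p[s][obs]), prev)
        columns.append(column)

    # Trace back from the best final state.
    last = max(states, key=lambda s: columns[-1][s][0])
    path = [last]
    for column in reversed(columns[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path, columns[-1][last][0]

# Hypothetical toy model: all values below are assumptions for illustration.
states = ["b", "p"]
start_p = {"b": 0.5, "p": 0.5}
trans_p = {"b": {"b": 0.8, "p": 0.2}, "p": {"b": 0.2, "p": 0.8}}
emit_p = {"b": {"asp": 0.1, "plain": 0.9}, "p": {"asp": 0.7, "plain": 0.3}}

path, logp = viterbi(["asp", "asp", "plain"], states, start_p, trans_p, emit_p)
print(path)
```

In this toy model the final unaspirated frame is still assigned to "p", because the preceding frames make a switch of state less likely than an atypical realization. That is one simple sense in which a statistical model can absorb variability in the signal.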

Speaker recognition uses the same techniques as speech recognition, but focuses on the speaker properties, not the sound structure. Speaker recognition is used for secure services, but also to verify a speaker's identity in forensic phonetics.

Within forensic phonetics, much research on speaker recognition by listeners has been undertaken in the field of voice line-ups - the aural equivalent of a visual line-up, the police identification parade. Topics include, for example, the effects of voice sample duration, retention interval and vowel variety on voice line-ups, and the difference in recognition between men and women. The question of how voice imitation can affect the accuracy of a voice line-up, however, has received little attention. This study asks whether an imitated voice can affect the accuracy of voice identification in the voice line-up.

For the investigation a recording of a well-known Swedish politician and an imitation of the same speech by a professional imitator were used. In addition to these two recordings eight other voices were used in the line-ups - the natural voice of the imitator, three amateur imitations and four further male voices. The text was the same in all cases.

The voice line-up is used in forensic phonetics to assess whether the victim is able to identify the criminal in a set of different voices. It has been demonstrated that a high-quality imitation of a voice can lead the victim to misidentify the person whose voice has been imitated, so the criminal succeeds in deflecting suspicion from themselves. The question examined in this paper is whether there is a change in the success level achieved by the imitator after greater exposure to the high-quality professional imitation. When the imitated voice was present, no significant change was detected after repeated exposure to the imitation. This was also the case when the imitated voice was absent, though there was an interesting change of voice selection in two of the sixteen line-ups.

