When we speak, we tend to speak in ‘chunks’ – small groups of words. Each chunk has its own change in intonation, and there is usually a brief pause before the next chunk. We call these chunks ‘tone units’, and they appear to be one of the basic building blocks of speech. A tone unit is the minimum unit of speech that can carry intonation. It is typically a few words long but can be a single word or syllable. Other terms for this unit are ‘intonational unit’ or ‘foot group’.

Worked example of L2 tone unit division

Here is a stretch of speech spoken by a Chinese student of English who was making an oral presentation in front of the class in an undergraduate module in the UK.

  • 15: er what is more  er these two words together  er is always ass- associated with images that a sturdy steed er gallop on the grasslands

We can divide this speech into tone units. Phonology software such as SpeechAnalyzer can help us with this segmentation. We will use the pipe character (|) to mark the boundaries of each tone unit.

1. | er what is more |
2. | er these two |
3. | words |
4. | together |
5. | er |
6. | is |
7. | always ass- |
8. | associated with |
9. | images |
10. | that a sturdy steed |
11. | er gallop on the |
12. | grasslands |

Some parts of this segmentation are unexpected. We might expect tone units 2 and 3 to form a single unit, ‘er these two words’, but the student separates them into two tone units. We can perceive this from the micro pause after ‘two’ and the additional intonation change on ‘words’. In effect, ‘er these two’ falls under one intonational contour and ‘words’ falls under a separate one, which splits a single semantic unit across two tone units. This may seem like a small feature, but it could miscue the listener.

Similarly, units 5, 6 and 7 are separate tone units even though we might expect them to belong together. ‘er’ is a hesitation marker, but because it carries its own intonational contour it forms its own tone unit. ‘is’ is separated from ‘always ass-’ by a slight pause and a separate intonational contour. The hesitation in unit 7, ‘ass-’, also means that the following words ‘associated with’ are cut off from ‘always’. Hesitations of this kind cause anomalous pauses and junctures, which only make the listening task harder. The hesitation here may have been caused by the polysyllabic nature of the word ‘associated’.

Nativelike (L1) Production

If a native speaker or expert presenter were to produce this speech, they might segment the discourse into longer tone units, each representing a group of words that forms a complete semantic unit, like this:

| what is more |
| these two words together |
| is always associated with images |
| that a sturdy steed |
| gallop on the grasslands |

We can reasonably assume that a listener will find this version more intelligible because its tone units coincide with natural semantic units. In the non-native version, the tone units are less natural and the miscues accumulate, reducing intelligibility.
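The difference in tone unit length between the two segmentations can be made concrete by counting units and words. The short Python sketch below is an illustration only. It reads the | notation used above and treats every whitespace-separated token as one word, including the filler ‘er’ and the fragment ‘ass-’. This is a crude proxy that ignores syllables, duration and pitch. The function names are hypothetical and are not features of SpeechAnalyzer or any other tool.

```python
# Illustrative sketch: compare tone unit counts and lengths for the two
# segmentations above. Assumption: one whitespace-separated token = one word,
# so fillers ('er') and fragments ('ass-') each count as a word.

L2_TRANSCRIPT = (
    "| er what is more | er these two | words | together | er | is "
    "| always ass- | associated with | images | that a sturdy steed "
    "| er gallop on the | grasslands |"
)
L1_TRANSCRIPT = (
    "| what is more | these two words together "
    "| is always associated with images | that a sturdy steed "
    "| gallop on the grasslands |"
)


def parse_tone_units(transcript: str) -> list[str]:
    """Split a transcript marked with | boundaries into tone units."""
    return [unit.strip() for unit in transcript.split("|") if unit.strip()]


def mean_unit_length(transcript: str) -> float:
    """Mean number of whitespace-separated words per tone unit."""
    units = parse_tone_units(transcript)
    total_words = sum(len(unit.split()) for unit in units)
    return total_words / len(units)


for label, transcript in [("L2", L2_TRANSCRIPT), ("L1", L1_TRANSCRIPT)]:
    units = parse_tone_units(transcript)
    print(f"{label} segmentation: {len(units)} tone units, "
          f"mean {mean_unit_length(transcript):.2f} words per unit")
# L2 segmentation: 12 tone units, mean 2.08 words per unit
# L1 segmentation: 5 tone units, mean 4.00 words per unit
```

On these transcriptions, the learner's version has 12 tone units averaging about two words each. The native-like version has 5 units averaging four words. A word count cannot show where the intonational contours fall or whether a unit matches a semantic unit, so the perceptual analysis above is still needed.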