It is relatively simple to measure speech rate and pause length in spoken output. Consider this audio sample:

Download the sample to your laptop and then start the WASP app in your browser:

First, upload the audio sample to WASP using this button:

Second, mark the left and right borders of the stretch containing the pause using the left and right mouse buttons. Then zoom in using the Zoom In button:

Then accurately mark the end of speech and the start of the next utterance. Remember to exclude fillers such as er and erm from the speech on either side, so that they fall inside the pause:

The pause length is now shown in the bottom left corner. In this case it is l=502ms, which is approximately 0.5 seconds.

Mark this in the transcript:

so (erm 0.5) I arrived in Rome

(You can also measure pauses with Audacity or other software.)
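If you are annotating many pauses, the conversion from the measured length to the transcript notation can be scripted. The following is only a minimal sketch: the function name pause_annotation and its filler parameter are made up for illustration and are not part of WASP or Audacity. It assumes you read the millisecond value off the screen yourself and round to one decimal place, as in the example above.

```python
def pause_annotation(duration_ms, filler=None):
    """Format a measured pause for the transcript.

    duration_ms: pause length in milliseconds, as read from the tool.
    filler: optional filler heard inside the pause, e.g. "er" or "erm".
    (Both names are illustrative assumptions, not a WASP feature.)
    """
    seconds = f"{duration_ms / 1000:.1f}"  # 502 ms -> "0.5"
    if filler:
        return f"({filler} {seconds})"
    return f"({seconds})"


# Rebuild the annotated line from the example above.
line = f"so {pause_annotation(502, filler='erm')} I arrived in Rome"
print(line)
```

A silent pause with no filler is written with the number alone, for example pause_annotation(300) gives (0.3).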

Speech Rate

Let’s measure the speech rate of two parts of the sample, in syllables per minute (spm):

so (erm 0.5) I arrived (0.3) in (0.2) Rome (1.1)

6 syllables / 4.4 seconds × 60 = 82 spm

I don’t know if you can get jet lag· going that way 
but I was certainly very very tired 
having spent forty odd hours on a plane 
or waiting in airports (er 1.2) and (0.7)

40 syllables / 8.0 seconds × 60 = 300 spm
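The two calculations above can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the syllable counts and durations are entered by hand from the measurements above, and the function names speech_rate_spm and total_pause_seconds are invented for this example. The pause total is an extra illustration of how the timed annotations can be read back out of a transcript line; it is not part of the calculation above.

```python
import re


def speech_rate_spm(syllables, duration_seconds):
    """Speech rate in syllables per minute (spm)."""
    return syllables / duration_seconds * 60


def total_pause_seconds(annotated_line):
    """Sum the timed pauses in a line, e.g. "(0.3)" or "(erm 0.5)".

    Assumes pauses are written as a number in round brackets,
    optionally preceded by a lower-case filler and a space.
    """
    pauses = re.findall(r"\((?:[a-z]+ )?(\d+(?:\.\d+)?)\)", annotated_line)
    return sum(float(p) for p in pauses)


first_part = "so (erm 0.5) I arrived (0.3) in (0.2) Rome (1.1)"

print(round(speech_rate_spm(6, 4.4)))             # 82
print(round(speech_rate_spm(40, 8.0)))            # 300
print(round(total_pause_seconds(first_part), 1))  # 2.1
```

On these figures, roughly 2.1 of the 4.4 seconds in the first part are pause time.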

The second part is delivered much faster than the first. What does that tell us? It may point to planning: in the first part the speaker is planning the discourse, which accounts for the slower delivery, and in the second part the cognitive load is reduced, so the speech can be delivered much more rapidly. This slow-fast pattern is seen throughout the extended turn. The explanation is speculative, since we cannot really know why the speaker varied the speech rate, but cognitive load is a probable factor.

Slow delivery also coincides with topic changes, so the speaker may be deliberately designing the speech for the audience, using slow delivery to show them where the topic changes are.

Here is the full transcript:

- so - I · arrived - in · Rome - - 
I don’t know if you can get jet lag· going that way 
but I was certainly very very tired · 
having spent forty odd hours on a plane or waiting in airports - - 

and - from somewhere where · it’s very very quiet 
to somewhere where the bustle and hustle of Rome and people speaking · 
a different language which I had learned · 
but of course you learn it from books and - written · thing 
and you don’t actually have to speak it to survive - - 

so I - - I arrived in Rome - - and - - - 
Rome airport is - not actually in Rome - 
I think it’s actually in a different country · to Rome 
because you have to spend about three or four hours on · buses 
and · trains planes in order to be · 
whereas to actually to get into · something 
that - most people would recognise as being Rome - - 

So I f-f-finally managed to get myself on a on a - a bus - - 
and then I actually worked · had to work out work out how to pay 
which was not entirely obvious - 
as anyone that’s travelled on foreign buses 
they all have different system - 

and - - I didn’t actually work out until - 
a couple of months later when I was actually living in Florence 
how you actually pay for buses in Italy 
which is you buy a ticket before you get on the bus - 
and then you click it once you’re on the bus 
so I could have - 
you know if someone had checked 
I probably would have got - 
slammed in jail · 
having just arrived in the country - - 
(h)after an hour or so

Machine Speech

Here is a machine reading the same text. Listen to the delivery in terms of speech rate and pauses. How does it differ from the original human delivery? Which sounds more natural?

Machine-read speech often sounds unnatural because of the flat, even delivery.