Chapter 9 Kaleidoscope

9.1 Converting a sound file to a wordcloud

Would it not be great to visualise the summary of a speech or lecture? All that is needed are two R packages and a few minutes of patience. Thanks to my work I found the website with the transcribeR vignette when I googled for how to make sound files from seismic records (by the way, that is another chapter).

Following this vignette we end up with a script like this one.

## load libraries
library("transcribeR")
library("wordcloud")

## set key and input data
API_KEY <- "61c3d762-3fdd-445e-841a-baa460f2b26c"
WAV_DIR <- "~/Desktop/SPEECH/input/"
CSV_LOCATION <- "~/Desktop/SPEECH/output/text_out.txt"

## send audio file to the web
sendAudioGetJobs(wav.dir = WAV_DIR,
                 api.key = API_KEY,
                 csv.location = CSV_LOCATION)

## be patient, actually much more than the 60 s suggest

## check/retrieve the transcribed text
x <- retrieveText(job.file = CSV_LOCATION,
                  api.key = API_KEY)

## make the wordcloud of the text
w <- wordcloud(words = x$TRANSCRIPT)
## Loading required package: tm
## Loading required package: NLP

Now let us pull this a bit apart. We need the libraries for sending and retrieving the words from speech recognition (library("transcribeR")) and for building a word cloud from a text file (library("wordcloud")).

transcribeR simply manages the task of sending a sound file in .wav format to the HPE website that does the conversion, querying whether the file has been processed, and then returning the result. In order to use transcribeR you need an account at the HPE website. With the account you will get the required API key.

Next, your .wav files to be processed must be present in an input directory. Note, you don’t specify a file but a directory. So make sure the directory only contains what you wish to be transcribed. If you only have .mp3 files… Well, there are many websites that handle the conversion job, or you can use whatever software is on your computer.
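
Because the whole directory is submitted, a quick look at its content before sending anything can save a wasted upload. The following is a minimal sketch in base R. The helper check_wav_dir() and the demo directory are made up for illustration and are not part of transcribeR.

```r
## hypothetical helper (not part of transcribeR): split the content of a
## directory into .wav files and everything else
check_wav_dir <- function(wav.dir) {
  files <- list.files(path.expand(wav.dir))
  is_wav <- grepl("\\.wav$", files, ignore.case = TRUE)
  list(wav = files[is_wav], other = files[!is_wav])
}

## demo with a temporary directory holding one .wav and one stray file
demo_dir <- file.path(tempdir(), "speech_input")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, c("speech.wav", "notes.txt"))))

content <- check_wav_dir(demo_dir)

## report anything that should probably be moved elsewhere first
if (length(content$other) > 0) {
  message("Not a .wav file: ", paste(content$other, collapse = ", "))
}
```

In a real session you would call check_wav_dir(WAV_DIR) and clean up the directory until the other element is empty.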

Next, specify a text file where the transcription output will be stored. Usually this will be an empty file. transcribeR will write some header information and then the transcribed words when you evaluate the two functions, sendAudioGetJobs() and retrieveText(), with a significant pause between them. Allow for at least as much time as the sound file plays, to cover upload, transcription, post-processing and so on.

Finally, the wordcloud can be created from the transcription part, which is isolated by x$TRANSCRIPT.

With more of their arguments written out, the two function calls look like this.

sendAudioGetJobs(wav.dir = WAV_DIR,
                 api.key = API_KEY,
                 interval = "-1",
                 encode = "multipart",
                 existing.csv = NULL,
                 csv.location = CSV_LOCATION,
                 language = "en-US",
                 verbose = TRUE)

x <- retrieveText(job.file = CSV_LOCATION,
                  api.key = API_KEY)

The function sendAudioGetJobs() can also transcribe languages other than American English. Use printLanguages() to see the supported languages. Reading the documentation on the usage of the argument interval might also be worth the time. The wordcloud can also be modified, e.g., in terms of colours, the minimum count of a word to be included, the overall number of words, etc. By the way, I used the brilliant speech of George W. Bush (wow, I made five typos while typing this name) about the ultimatum to Saddam Hussein for this wordcloud.
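
To give an idea of these modifications, here is a minimal sketch. The transcript is a made-up stand-in for x$TRANSCRIPT, the words are counted with base R instead of letting wordcloud() do it, and the colours and thresholds are arbitrary choices. The arguments min.freq, max.words, random.order and colors belong to wordcloud(); the call is only run if the package is installed.

```r
## made-up example text standing in for x$TRANSCRIPT
transcript <- "the time has come the time to act is now and the world will act"

## split into lower-case words and count them with base R
words <- strsplit(tolower(transcript), "[^a-z']+")[[1]]
words <- words[nchar(words) > 0]
freq <- sort(table(words), decreasing = TRUE)

## draw the cloud only if the wordcloud package is available
if (requireNamespace("wordcloud", quietly = TRUE)) {
  wordcloud::wordcloud(words = names(freq),
                       freq = as.vector(freq),
                       min.freq = 2,          # skip words that occur only once
                       max.words = 50,        # upper limit of words drawn
                       random.order = FALSE,  # most frequent words first
                       colors = c("grey50", "steelblue", "darkred"))
}
```

Passing the words together with their counts also makes it easy to drop unwanted words from freq before plotting.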

Now, the rest is imagination. For example, writing a wrapper that submits snippets of speech in 30 s slices and builds a wordcloud while a speech is being held. Or summarising a lecture of 90 minutes in just a few words. Or simply mapping the essence of a card game evening with friends, given they all speak clearly enough and in a language supported by transcribeR.
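
As a possible first step towards such a wrapper, the sketch below only works out where the slices would start and end. The helper slice_times() is hypothetical and uses base R only. Cutting the actual sound file is left out; one option, stated here as an assumption, would be to read each slice with tuneR::readWave(from = , to = , units = "seconds") and write it to the input directory with tuneR::writeWave().

```r
## hypothetical helper: start and end times (in seconds) of consecutive
## slices covering a recording of the given duration
slice_times <- function(duration, slice = 30) {
  from <- seq(0, duration, by = slice)
  from <- from[from < duration]   # drop a start that coincides with the end
  data.frame(from = from, to = pmin(from + slice, duration))
}

## a recording of 95 s cut into 30 s slices, the last one being shorter
s <- slice_times(duration = 95, slice = 30)
print(s)
```

Each row of s describes one snippet that could be written to its own .wav file and submitted with sendAudioGetJobs().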