Word clouds are a popular way of “summarizing” a large repository or corpus of text. In this tutorial we will take Winston Churchill’s speech “We Shall Fight on the Beaches” and create the word cloud pictured below.

End result from this tutorial

Creating the Word Cloud

To create the word cloud, we will use two libraries: Matplotlib and Wordcloud. If they are not installed in your environment, install them, for example with pip:

pip install matplotlib
pip install wordcloud

Now, in the IDE of your choice, we are going to import both of the libraries:

import matplotlib.pyplot as plt
from wordcloud import WordCloud, STOPWORDS

The first line imports pyplot from Matplotlib under the alias plt. The second imports two names from the wordcloud library: the WordCloud class and STOPWORDS, a built-in set of English stop words.

If you are new to natural language processing, you may be asking “what are stop words?”. Stop words are common words in a language that carry little to no meaning on their own. Examples of stop words are the, is, at, which, and on. These words occur often and would therefore make up most of our word cloud if we did not filter them out.
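To make the effect of stop words concrete, here is a minimal sketch that counts word frequencies with and without filtering. It uses only the standard library; the names top_words and SAMPLE_STOPWORDS are made up for this illustration, and the tiny stop-word set is hand-picked rather than the much longer STOPWORDS set that ships with the wordcloud library.

```python
import re
from collections import Counter

# A tiny hand-picked stop-word set, for illustration only.
# The wordcloud library's STOPWORDS set is much longer.
SAMPLE_STOPWORDS = {"the", "is", "at", "which", "on", "we", "shall", "in", "and"}


def top_words(text, stopwords=frozenset(), n=3):
    """Return the n most frequent lowercase words that are not in `stopwords`."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


sample = (
    "we shall fight on the beaches, we shall fight on the landing grounds, "
    "we shall fight in the fields and in the streets"
)

# Without filtering, a stop word ("the") is the most frequent word.
print(top_words(sample, n=1))
# With filtering, a meaningful word ("fight") comes out on top.
print(top_words(sample, SAMPLE_STOPWORDS, n=1))
```

Without the filter, the most common word in this excerpt is a stop word; with it, the word that actually characterizes the passage rises to the top, which is what we want the word cloud to show.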

Next we will define a function that creates the word cloud when it is called with the speech as a parameter.

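def show_wordcloud(text):
    # Build the word cloud from the text, skipping the library's default stop words
    wordcloud = WordCloud(
        background_color='white',
        stopwords=set(STOPWORDS),
        max_words=200,
        max_font_size=40,
        scale=3,
        random_state=1  # fixed seed for the random layout
    ).generate(str(text))

    # Draw the cloud as an image on a large figure with the axes hidden
    plt.figure(1, figsize=(20, 12))
    plt.axis('off')
    plt.imshow(wordcloud)
    plt.show()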

Let’s walk through the function we just created.

First we create an instance of the WordCloud class and call its generate method on the text, storing the result in a variable named wordcloud. We set the background color to white and use the stop words that ship with the wordcloud library. We cap the number of words at 200 and the maximum font size at 40, and set scale and random_state to 3 and 1 respectively. Scale enlarges the rendered image, and random_state is a seed for the random choices made while laying out the words, so a fixed value should give the same cloud on every run.

Next we create a Matplotlib figure that is 20 inches wide and 12 inches tall. We turn off the axes, draw the word cloud onto the figure as an image with plt.imshow(), and display it on screen with plt.show().

Next we need to call the function.

speech = "The speech goes here..."

show_wordcloud(speech)
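A full speech is awkward to paste into a string literal, so one option is to keep it in a text file and read it in. This is a small sketch using only the standard library; the helper name load_speech and the file name churchill_speech.txt are hypothetical, so substitute wherever you saved the text.

```python
from pathlib import Path


def load_speech(path):
    """Read an entire UTF-8 text file and return its contents as one string."""
    return Path(path).read_text(encoding="utf-8")


# Hypothetical usage, assuming the speech was saved as churchill_speech.txt:
# speech = load_speech("churchill_speech.txt")
# show_wordcloud(speech)
```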

You should now have successfully created a word cloud. Congratulations!

If you enjoyed this tutorial, then check out some of my other ones here.

I have half a decade of experience working with data science and data engineering in a variety of fields, both professionally and in academia. I have demonstrated advanced skills in developing machine learning algorithms, econometric models, intuitive visualizations and reporting dashboards in order to communicate data and technical terminology in an easy-to-understand manner for clients of varying backgrounds.
