Trump word association cloud in Tableau

The presidential election and Donald Trump are hot topics of discussion in the US right now, so wouldn’t it be interesting to visualize which words people associate with ‘trump’? There are only two publicly available large data sets that can be queried for this analysis: Twitter and Reddit comments. For this visualization, I decided to use the Reddit comments in BigQuery, because the data is readily available there and can be queried with SQL. BigQuery is Google’s cloud-based, massively parallel database with a SQL interface and response times measured in seconds.

A Reddit user who goes by fhoffa has made the word association SQL available in the /r/bigquery subreddit. The query returns the list of words that “trump” is associated with, compared to a baseline (in this case, the words “common” and “but”). The next task was to modify the query for a ‘trump’-specific analysis, export the results as CSV and start the Tableau magic.

How to create a word cloud in Tableau?

The steps required to create a word cloud in Tableau are demonstrated in the attached animated GIF.
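Before loading the export into Tableau, it can help to sanity-check the CSV. The following is a minimal Python sketch, not part of the original workflow. It assumes the exported file has a header row with columns named a_word, b_word, c and ratio; those header names are an assumption, so check the first line of your own export. The helper name top_words and the sample rows are made up for illustration.

```python
import csv
import io


def top_words(csv_file, target="trump", limit=100):
    """Return (word, count) pairs for `target`, largest count first.

    Assumes columns a_word, b_word and c exist in the CSV header
    (hypothetical names; adjust to match your export).
    """
    reader = csv.DictReader(csv_file)
    rows = [
        (row["b_word"], int(row["c"]))
        for row in reader
        if row["a_word"] == target
    ]
    rows.sort(key=lambda pair: pair[1], reverse=True)
    return rows[:limit]


# Made-up sample rows standing in for a real export.
sample = io.StringIO(
    "a_word,b_word,c,ratio\n"
    "trump,debate,80,0.40\n"
    "trump,donald,120,0.90\n"
)

for word, count in top_words(sample):
    print(word, count)
```

With a real export, pass an open file instead of the in-memory sample, for example open("results.csv", newline="").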

BigQuery SQL

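The following SQL counts how often each word appears in Reddit comments that mention “trump”, using the 2016_01 comments table. It is written in BigQuery legacy SQL, as the bracketed table names indicate.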

SELECT a.word, b.word, c, ratio
FROM (
  SELECT a.word, b.word, c, ratio, RANK() OVER(PARTITION BY a.word ORDER BY c DESC) rank
  FROM (
    SELECT a.word, b.word, COUNT(*) c, RATIO_TO_REPORT(c) OVER(PARTITION BY b.word) ratio
    FROM (
      # comments that mention the target word or one of the baseline words
      SELECT word, id
      FROM [fh-bigquery:reddit_comments.2016_01] a
      CROSS JOIN (SELECT word FROM (SELECT 'trump' word) # ***** REPLACE 'WORD' here!!!! ****
        ,(SELECT 'common' word),(SELECT 'but' word)) b
      WHERE author NOT IN ('AutoModerator')
      AND LOWER(body) CONTAINS word
      AND subreddit NOT IN ('leagueoflegends')
    ) a
    JOIN EACH (
      # every word of every comment that mentions one of the three words
      SELECT word, id FROM (
        SELECT SPLIT(LOWER(REGEXP_REPLACE(body, r'[\-/!\?\.\",*:()\[\]|\n]', ' ')), ' ') word, id
        FROM [fh-bigquery:reddit_comments.2016_01]
        WHERE REGEXP_MATCH(LOWER(body), 'trump|common|but') # ***** and here ****
        HAVING NOT word IN ('but','and','that')
      )
    ) b
    ON a.id=b.id
    WHERE a.word!=b.word
    GROUP EACH BY 1,2
  )
  WHERE ratio BETWEEN 0.15 AND 0.95
  AND a.word NOT IN ('common','but') AND b.word NOT IN ('common','but')
)
WHERE rank<120
ORDER BY a.word, c DESC
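To make the logic of the query easier to follow, here is a small Python sketch of the same idea on a handful of made-up comments. It is an illustration under stated assumptions, not the query itself: the function names (tokenize, associations, ratios) are hypothetical, matching a target word by substring in the comment text is an assumption about how comments are selected, and empty tokens are dropped for simplicity.

```python
import re
from collections import Counter

STOPWORDS = frozenset({"but", "and", "that"})


def tokenize(body):
    """Lowercase, replace punctuation with spaces, split on spaces."""
    cleaned = re.sub(r'[\-/!?.",*:()\[\]|\n]', " ", body.lower())
    return [w for w in cleaned.split(" ") if w]


def associations(comments, targets):
    """Count (target, word) pairs over comments that mention the target."""
    counts = Counter()
    for body in comments:
        lowered = body.lower()
        words = [w for w in tokenize(body) if w not in STOPWORDS]
        for target in targets:
            if target in lowered:  # assumed substring match on the comment text
                for word in words:
                    if word != target:
                        counts[(target, word)] += 1
    return counts


def ratios(counts):
    """Share of each word's co-occurrences that belong to each target.

    Mirrors RATIO_TO_REPORT(c) OVER(PARTITION BY b.word) in the query.
    """
    totals = Counter()
    for (_, word), c in counts.items():
        totals[word] += c
    return {pair: c / totals[pair[1]] for pair, c in counts.items()}


# Made-up comments, for illustration only.
comments = [
    "Trump wins the debate",
    "Trump debate tonight",
    "It is common to watch the debate",
    "common sense",
]
counts = associations(comments, ["trump", "common"])
for pair, ratio in sorted(ratios(counts).items()):
    print(pair, counts[pair], round(ratio, 2))
```

The ratio is what separates words specific to “trump” from words that show up everywhere: a word that co-occurs equally with the target and the baseline gets a middling ratio, while the BETWEEN 0.15 AND 0.95 filter in the query drops words at either extreme, including those that only ever appear alongside one of the words.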
