Tableau not only displays graphs, but also provides cluster analysis capabilities.
Python is free, and cluster analysis in Python is not difficult in itself. However, it requires preliminary work such as standardizing the data, plotting the elbow method, and deciding on the number of clusters. In addition, every time you swap variables or change a formula that uses them, the code must be modified, which can be a bit time-consuming.
In such cases, Tableau lets you analyze by trial and error through the GUI and handles the tedious adjustments automatically, making cluster analysis easier.
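For comparison, here is a rough sketch of the Python-side work mentioned above: standardizing the data and computing the values you would plot for the elbow method. It uses only the standard library so that it runs anywhere; in practice a library such as scikit-learn would be the usual choice. The data and all function names are made up for illustration.

```python
import random

def standardize(rows):
    """Z-score each column: subtract the mean, divide by the standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd-style k-means; returns (labels, inertia)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        labels = assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append([sum(c) / len(members) for c in zip(*members)]
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, inertia

# Made-up (calories, protein) rows, NOT the real menu data.
data = [[100, 2], [110, 3], [105, 2.5],
        [300, 20], [310, 22], [305, 21],
        [500, 5], [510, 6], [505, 5.5]]

scaled = standardize(data)
# Within-cluster sum of squares for each k: the curve drawn in an elbow plot.
inertias = {k: kmeans(scaled, k)[1] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 3))
```

You would then plot these values against k and look for the "elbow" by eye, and repeat the whole thing whenever the variables change. That is the part Tableau takes off your hands.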
Below are notes on how to use Tableau to perform common cluster analysis tasks.
Data preparation
To see how the data could be clustered, we used the following nutritional data from the McDonald's menu.
The nutritional components are "calories," "cholesterol," "protein," "carbohydrates," "fat," "salt," and "fiber," each converted to the amount contained per gram.
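As an illustration of the per-gram conversion, here is a minimal sketch. It assumes each row carries a serving weight in grams; the field names (`serving_g` and so on) and the values are hypothetical and not taken from the actual data set.

```python
# Hypothetical row: field names and values are illustrative only.
item = {"menu": "Sample burger", "serving_g": 200.0,
        "calories": 500.0, "protein": 25.0, "fat": 26.0}

def per_gram(row, weight_key="serving_g", skip=("menu",)):
    """Divide every nutrient value by the serving weight in grams."""
    weight = row[weight_key]
    return {key: (value if key in skip else value / weight)
            for key, value in row.items() if key != weight_key}

converted = per_gram(item)
print(converted)
```

Converting to per-gram values keeps large items from looking "high" in everything simply because they are large.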
Scatterplot matrix for rough correlations
First, we draw scatter plots and trend lines for each pair of variables to get a rough idea of the overall distribution.
Drop the nutritional components onto "Columns" and "Rows".
This produces a matrix of plots of the total value of each nutrient.
Turn this into a scatter plot:
Uncheck [Analysis]-[Aggregate Measures]
Also, drop "Category" and "Menu" onto "Label" on the "Marks" card so that each point shows which menu item it is.
- There is a group of menu items with high cholesterol.
- Many menu items are high in protein.
Observations like these give a rough picture of the overall distribution.
Next, let's add a trend line to the scatterplot.
From the Analytics pane on the left, drop [Model]-[Trend Line] into the graph
Visualizing correlation coefficients is tedious, so "confidence intervals" are used instead.
[Analysis]-[Trend Lines]-[Edit All Trend Lines] > check "Show confidence intervals"
The middle line is the fitted linear trend line, and the upper and lower lines show its confidence interval. As a rough guide, the narrower the confidence interval, the stronger the correlation.
- Items high in fat and carbohydrates seem to be high in calories.
- Salt is a seasoning, so perhaps it has little relation to the other nutrients.
In this way, the rough correlations become visible.
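If you do want the actual correlation coefficients rather than reading them off the confidence intervals, they are easy to compute outside Tableau. The following is a small standard-library sketch of the Pearson correlation for every pair of variables. The numbers are made up for illustration and are not the real menu data.

```python
from itertools import combinations

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Made-up per-gram values for five hypothetical menu items.
nutrients = {
    "calories": [2.0, 2.5, 3.0, 3.5, 4.0],
    "fat":      [0.08, 0.11, 0.15, 0.17, 0.22],
    "salt":     [0.010, 0.004, 0.012, 0.006, 0.009],
}

corr = {(a, b): pearson(nutrients[a], nutrients[b])
        for a, b in combinations(nutrients, 2)}
for pair, r in corr.items():
    print(pair, round(r, 2))
```

A value near 1 or -1 corresponds to a tight band around the trend line, and a value near 0 to a wide one.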
Cluster analysis
Next, let's move on to the main topic, cluster analysis.
Before doing so, uncheck [Analysis]-[Trend Lines]-[Show All Trend Lines]
to remove the trend lines displayed earlier, since they get in the way.
From the Analytics pane on the left, drop [Model]-[Cluster] into the graph
The "Cluster" dialog will appear, and you can specify the variables to be used in the cluster analysis. We will not use "Category" and "Menu" here, so remove them from the list.
The clusters are then created. That's it!
Tableau also normalizes the data automatically and picks a reasonable number of clusters. (This time Tableau suggested 5 clusters.)
As will be described later, changes in variables and the number of clusters can also be made via GUI, allowing for easy trial and error until a satisfactory classification is achieved.
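To give a feel for what "normalizing automatically and choosing the number of clusters" involves, here is a rough sketch. As far as I understand from Tableau's help, it uses k-means with min-max scaling and the Calinski-Harabasz criterion to pick the number of clusters; the code below only approximates that idea and is not Tableau's actual implementation. The data, the candidate labelings, and the function names are all made up.

```python
def min_max_scale(rows):
    """Rescale every column to the range 0..1."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    # Fall back to 1.0 when a column is constant, to avoid dividing by zero.
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]

def centroid(points):
    return [sum(c) / len(points) for c in zip(*points)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def calinski_harabasz(points, labels):
    """(between-cluster SS / (k - 1)) / (within-cluster SS / (n - k))."""
    groups = {}
    for p, lab in zip(points, labels):
        groups.setdefault(lab, []).append(p)
    n, k = len(points), len(groups)
    overall = centroid(points)
    between = within = 0.0
    for members in groups.values():
        c = centroid(members)
        between += len(members) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in members)
    return (between / (k - 1)) / (within / (n - k))

# Made-up data: two obvious groups on very different scales.
data = [[1, 10], [2, 12], [1.5, 11],
        [8, 100], [9, 110], [8.5, 105]]
scaled = min_max_scale(data)

# Hypothetical candidate labelings, e.g. k-means results for k = 2 and k = 3.
candidates = {2: [0, 0, 0, 1, 1, 1],
              3: [0, 0, 0, 1, 2, 1]}
scores = {k: calinski_harabasz(scaled, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

A higher score means clusters that are compact and well separated, so the k with the highest score is the one suggested.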
Guessing the cluster names
Deciding what each cluster means is a job for a human, so let's do that.
Guess by cluster centers
The center of each cluster can be seen via [Describe Clusters] on the cluster field under "Color".
Honestly, I can't tell much from this....
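For reference, a cluster center is simply the per-cluster average of each variable. The following is a minimal sketch with made-up rows and cluster labels, in case you want to reproduce such a table outside Tableau.

```python
from collections import defaultdict

def cluster_centers(rows, labels, names):
    """Average of every variable within each cluster."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {label: {name: sum(col) / len(col)
                    for name, col in zip(names, zip(*members))}
            for label, members in groups.items()}

# Made-up per-gram values and hypothetical cluster labels.
names = ["calories", "protein"]
rows = [[2.0, 0.10], [3.0, 0.20], [4.0, 0.02], [5.0, 0.04]]
labels = ["cluster 1", "cluster 1", "cluster 2", "cluster 2"]

centers = cluster_centers(rows, labels, names)
for label, center in centers.items():
    print(label, center)
```

With seven variables and five clusters, such a table is a wall of numbers, which is why the centers alone are hard to interpret.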
Guess by distribution
When you hover over a point on the scatter plot you just created, the "Category" and "Menu" for that point appear, so you can see which menu item it is and guess from there.
- The high-cholesterol items in the blue cluster are "Egg McMuffins" and the like, so perhaps the blue cluster is "egg-based"?
- The high-calorie, high-carbohydrate outliers in the green cluster are "chocolate chip cookies" and the like, so perhaps the green cluster is "flour-based"?
The other clusters overlap one another, so they seem hard to guess from the scatter plot....
Guess from other graphs
The clusters you created can be used to color-code other graphs, so you can infer from those as well.
Drag the "cluster" field from "Color" into the "Dimensions" area of the data pane on the left.
The clusters then become a dimension, so all that is left is to drop that dimension onto "Color" in whichever graph you want to apply it to.
Example: Cluster distribution by category
Example: Detailed menu list of clusters
From these, I guessed the following:
- Is the orange cluster "meat-based"?
- Is the red cluster "fried"?
- Is the light blue cluster "vegetable and dairy-based"?
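The "cluster distribution by category" view is essentially a cross-tabulation of category against cluster. A minimal sketch of the same count in Python might look like this. The (category, cluster) pairs are made up and do not reflect the actual results.

```python
from collections import Counter

# Hypothetical (category, cluster) pairs, one per menu item.
rows = [("Breakfast", "cluster 1"), ("Breakfast", "cluster 1"),
        ("Breakfast", "cluster 2"), ("Desserts", "cluster 2"),
        ("Desserts", "cluster 2"), ("Salads", "cluster 3")]

counts = Counter(rows)
categories = sorted({category for category, _ in rows})
clusters = sorted({cluster for _, cluster in rows})

# Print a small category x cluster table of item counts.
print("category".ljust(12) + "".join(c.ljust(12) for c in clusters))
for category in categories:
    cells = "".join(str(counts[(category, c)]).ljust(12) for c in clusters)
    print(category.ljust(12) + cells)
```

A cluster that is concentrated in one or two categories is usually the easiest to name.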
Expansion to other graphs
Now that the cluster names have been determined, we will apply the cluster analysis results by dropping "cluster" on the "color" of other graphs, as above.
For example, the menu classification results for McDonald's are shown below.
Adjustment of clusters
Changing the cluster variables or the number of clusters is done in the Cluster dialog.
The Cluster dialog opens from [Edit Clusters] on the cluster field under "Color" in the scatter plot created at the start.
Clusters are rarely settled on the first try, so expect a lot of trial and error here.
So if you want to do cluster analysis easily, I recommend Tableau, which makes this kind of experimentation easier than Python.