Welcome to new things

[Technical] [Electronic work] [Gadget] [Game] memo writing

Tableau is recommended over Python for easy cluster analysis

Tableau not only displays graphs, but also provides cluster analysis capabilities.

Python is free, and cluster analysis itself is easy enough to run in Python. In practice, though, it also requires preliminary work such as standardizing the data, plotting an elbow-method graph, and deciding on the number of clusters. On top of that, every time you swap variables or change a formula that uses them, the code has to be modified, which is somewhat time-consuming.
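To make the Python-side chores concrete, here is a minimal sketch of the three steps mentioned above: standardizing, running k-means, and collecting the within-cluster sum of squares (inertia) for an elbow plot. It uses only the standard library so it runs anywhere; in practice one would more likely reach for scikit-learn's `StandardScaler` and `KMeans`. The data is synthetic (three made-up blobs), not the McDonald's data, and the function names are my own.

```python
import math
import random


def standardize(rows):
    """Scale every column to mean 0 and (population) standard deviation 1."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns (labels, centers, inertia)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]

    def assign(cs):
        return [min(range(k), key=lambda j: sq_dist(p, cs[j])) for p in points]

    for _ in range(iters):
        labels = assign(centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # An empty cluster keeps its previous center.
            new_centers.append(
                [sum(c) / len(members) for c in zip(*members)] if members else centers[j]
            )
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(centers)
    inertia = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, inertia


# Synthetic example data: three blobs of 20 points each (made up for illustration).
rng = random.Random(42)
blob_centers = [(0.0, 0.0), (5.0, 5.0), (0.0, 5.0)]
raw = [[rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)]
       for cx, cy in blob_centers for _ in range(20)]

scaled = standardize(raw)

# Elbow method: look for the k where inertia stops dropping sharply.
inertias = {k: kmeans(scaled, k)[2] for k in range(1, 7)}
for k, value in inertias.items():
    print(k, round(value, 2))
```

Changing the variables means editing `raw` and rerunning everything, which is exactly the kind of loop that the GUI shortens.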

In such cases, Tableau lets you do trial-and-error analysis in the GUI and handles the tedious adjustments automatically, which makes cluster analysis easier.

Below are notes on how to use Tableau to perform common cluster analysis tasks.

Data preparation

To see how the data could be clustered, we used the following nutritional data from the McDonald's menu.

www.kaggle.com

The nutritional components are "calories," "cholesterol," "protein," "carbohydrates," "fat," "salt," and "fiber," each converted to the amount contained per gram of the menu item.
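The per-gram conversion can be sketched as below. This assumes each row carries a serving size in grams; the field name `serving_size_g`, the helper `per_gram`, and the example values are all hypothetical and not taken from the Kaggle data.

```python
def per_gram(item, nutrient_keys, size_key="serving_size_g"):
    """Return a copy of item with each nutrient divided by the serving size."""
    size = item[size_key]
    if size <= 0:
        raise ValueError("serving size must be positive")
    converted = dict(item)
    for key in nutrient_keys:
        converted[key] = item[key] / size
    return converted


# Placeholder row, made up for illustration only.
example = {"menu": "Example item", "serving_size_g": 200.0,
           "calories": 500.0, "protein": 20.0}

normalized = per_gram(example, ["calories", "protein"])
print(normalized)
```

Dividing by serving size keeps large items from dominating the clusters simply because they are large.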

Scatterplot matrix for a rough look at correlations

First, we draw a scatter plot and trend line for each pair of variables to get a rough idea of the overall distribution.

Drop the nutritional components onto both "Columns" and "Rows"

This produces a matrix of plots showing the summed value of each nutrient.

Next, turn this into scatter plots.

Uncheck [Analysis]-[Aggregate Measures]

Also, drop "Category" and "Menu" onto "Label" on the "Marks" card so that each point shows which menu item it is.

Looking at the plots, observations like the following give a rough feel for the overall distribution:

  • There is a group of menu items with high cholesterol.
  • Quite a few menu items are high in protein.

Next, let's add a trend line to the scatterplot.

From the Analytics pane on the left, drop [Model]-[Trend Line] into the graph

Displaying correlation coefficients takes extra work, so confidence intervals (Tableau's "confidence bands") are shown instead.

[Analysis]-[Trend Lines]-[Edit All Trend Lines], then check "Show Confidence Bands"

The middle line is the fitted linear trend line, and the upper and lower lines mark the confidence interval. Roughly speaking, the narrower the confidence interval, the stronger the correlation.
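If an actual number is wanted rather than an eyeballed band, the trend line and the Pearson correlation coefficient can be computed outside Tableau. The sketch below is a plain least-squares fit in standard-library Python; it is not how Tableau computes its trend lines internally, and the sample values are made up.

```python
import math


def linear_fit(xs, ys):
    """Least-squares line y = slope * x + intercept, plus Pearson's r."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("a constant variable has no defined correlation")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r


# Made-up values standing in for two nutrient columns.
fat = [1.0, 2.0, 3.0, 4.0, 5.0]
calories = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept, r = linear_fit(fat, calories)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

A value of `r` near 1 or -1 corresponds to the narrow bands in the Tableau view; a value near 0 corresponds to wide ones.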

From this, some correlations start to become visible, such as:

  • Fat and carbohydrates seem to go along with high calories.
  • Salt is a seasoning, so maybe that is why it has little relation to the other nutrients.

Cluster analysis

Next, let's get to the main topic, cluster analysis.

Before doing so, uncheck [Analysis]-[Trend Lines]-[Show All Trend Lines] to remove the trend lines we just displayed, since they get in the way.

From the Analytics pane on the left, drop [Model]-[Cluster] into the graph

The "Cluster" dialog appears, where you can specify the variables to use in the cluster analysis. We will not use "Category" and "Menu" here, so remove them from the list.

The clusters are then created. That's it!

Tableau also normalizes the data automatically and picks a reasonable number of clusters. (This time Tableau suggested 5 clusters.)
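For a feel of what "normalizing and picking a good number of clusters" can mean, here is a sketch of min-max scaling and the Calinski-Harabasz index, a common score for comparing clusterings (higher is better). As far as I know, Tableau's documentation describes an approach along these lines, but treat that as an assumption; this is not Tableau's implementation, and the points and labels are made up.

```python
def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def calinski_harabasz(points, labels):
    """Ratio of between-cluster to within-cluster dispersion (higher is better)."""
    n = len(points)
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    if k < 2 or k >= n:
        raise ValueError("need at least 2 clusters and fewer clusters than points")
    overall = [sum(c) / n for c in zip(*points)]
    between = 0.0
    within = 0.0
    for members in clusters.values():
        center = [sum(c) / len(members) for c in zip(*members)]
        between += len(members) * sum((a - b) ** 2 for a, b in zip(center, overall))
        within += sum(sum((a - b) ** 2 for a, b in zip(p, center)) for p in members)
    if within == 0:
        return float("inf")
    return (between / (k - 1)) / (within / (n - k))


# Made-up points: two obvious groups.
points = min_max_scale([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])

good_labels = [0, 0, 0, 1, 1, 1]   # matches the two groups
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the groups

good_score = calinski_harabasz(points, good_labels)
bad_score = calinski_harabasz(points, bad_labels)
print(good_score > bad_score)
```

Scoring several candidate values of k this way and keeping the best one is one way to automate the choice that the elbow method leaves to the eye.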

As described later, the variables and the number of clusters can also be changed in the GUI, so it is easy to keep experimenting until you reach a classification you are satisfied with.

Guess a name for each cluster

Deciding what each cluster means is a job for a human, so let's do that.

Guess from the cluster centers

The center of each cluster can be seen under [Describe Clusters] from "Clusters" on "Color".
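In k-means, a cluster's center is simply the per-variable mean of its members, so the same numbers can be reproduced from exported data. A minimal sketch, assuming rows of numeric values and one cluster label per row (the values and labels below are made up):

```python
from collections import defaultdict


def cluster_centers(rows, labels):
    """Return {label: per-column mean of the rows carrying that label}."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: [sum(col) / len(members) for col in zip(*members)]
        for label, members in groups.items()
    }


# Hypothetical per-gram values for two variables, with made-up cluster labels.
rows = [[1.0, 2.0], [3.0, 4.0], [10.0, 0.0]]
labels = ["Cluster 1", "Cluster 1", "Cluster 2"]

centers = cluster_centers(rows, labels)
for label, center in centers.items():
    print(label, center)
```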

Honestly, I can't tell much from the center values alone....

Guess from the distribution

When you mouse over a point on the scatter plot you just created, the "Category" and "Menu" values for that point appear, so you can see which menu item it actually is and guess from that.

  • The high-cholesterol items in the blue cluster are "Egg McMuffins" and the like, so maybe the blue cluster is "egg-based"?

  • The high-calorie, high-carbohydrate outliers in the green cluster are "chocolate chip cookies" and the like, so maybe the green cluster is "flour-based"?

The others overlap each other, so it seems difficult to guess from the scatterplot....

Guess from other graphs

Once the clusters have been created, you can use them to color-code other graphs and guess from those.

Drag "Clusters" from "Color" to the "Dimensions" area of the data pane on the left

The clusters then become a dimension, so all that is left is to drop that dimension onto "Color" in whichever graph you want to apply it to.

Example: Cluster distribution by category
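The same category-by-cluster breakdown can be sketched as a simple count table. The categories, cluster names, and records below are placeholders, not the actual results.

```python
from collections import Counter


def crosstab(records):
    """Count how many items fall into each (category, cluster) pair."""
    return Counter((category, cluster) for category, cluster in records)


# Placeholder (category, cluster) pairs, made up for illustration.
records = [
    ("Breakfast", "Cluster 1"),
    ("Breakfast", "Cluster 1"),
    ("Breakfast", "Cluster 2"),
    ("Desserts", "Cluster 3"),
]

counts = crosstab(records)
for (category, cluster), count in sorted(counts.items()):
    print(f"{category:10s} {cluster:10s} {count}")
```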

Example: Detailed menu list of clusters

From these, I guessed the following:

  • Is the orange cluster "meat-based"?
  • Is the red cluster "fried food"?
  • Is the light blue cluster "vegetable and dairy-based"?

Applying the clusters to other graphs

Now that the cluster names have been determined, we will apply the cluster analysis results by dropping "cluster" on the "color" of other graphs, as above.

For example, the menu classification results for McDonald's are shown below.

Adjusting the clusters

Changing the cluster variables and the number of clusters is done in the Cluster dialog.

To open the Cluster dialog, choose "Edit Clusters" from "Clusters" on "Color" in the scatter plot created at the start.

I don't think the clusters will be settled in one shot; it will take a lot of trial and error here.

That is why, if you want to do cluster analysis easily, I recommend Tableau, where experimenting is easier than in Python.

www.ekwbtblog.com
