Saturday May 27th 2017




Principal Component Analysis with RapidMiner

I read about Principal Component Analysis in a variety of sources and found the explanations overly complicated. Eventually I came across this resource, which provides easy-to-follow exercises on Principal Component Analysis in RapidMiner.

In part 1 they started off with a brief introduction to principal component analysis (PCA) and its application logic for a business analytics project. In this part, they start with a real data set and use RapidMiner 5.0 to perform the PCA. For illustrative reasons, they work with non-standardized (non-normalized) data. In the next part they standardize the data and explain why doing so can sometimes be important.

The dataset contains ratings and nutritional information for 77 breakfast cereals. There are 15 variables in total, 13 of which are numerical. The objective is to reduce this set of 13 numerical predictors to a much smaller list using PCA. The data comes from a publicly available statistical database and can also be downloaded at the end of this article.
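For readers who want to see the idea outside of RapidMiner, here is a minimal sketch of the same reduction in Python with NumPy. It is not the tutorial's method and it does not use the cereal data: the matrix X is a synthetic stand-in with the same shape (77 rows, 13 numerical columns) and deliberately mismatched column scales, and the function name pca, the standardize flag and the 95% variance threshold are all assumptions made for illustration.

```python
import numpy as np

def pca(X, standardize=False):
    """Eigen-decomposition PCA on the columns of X (illustrative helper)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                  # center each column
    if standardize:
        Xc = Xc / X.std(axis=0, ddof=1)      # optional: unit variance per column
    cov = np.cov(Xc, rowvar=False)           # 13 x 13 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigh: covariance is symmetric
    order = np.argsort(eigvals)[::-1]        # sort components by variance, descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()      # share of total variance per component
    scores = Xc @ eigvecs                    # data projected onto the components
    return eigvals, eigvecs, explained, scores

# Synthetic stand-in, NOT the cereal data: 77 rows, 13 columns,
# with column scales ranging from 1 to 1000.
rng = np.random.default_rng(0)
X = rng.normal(size=(77, 13)) * np.logspace(0, 3, 13)

_, _, explained_raw, scores = pca(X)                  # non-standardized
_, _, explained_std, _ = pca(X, standardize=True)     # standardized

# Smallest number of components whose cumulative variance share reaches 95%
# (the 95% cut-off is an arbitrary choice for this sketch).
k = int(np.searchsorted(np.cumsum(explained_raw), 0.95) + 1)

print("components kept (raw data):", k)
print("first component share, raw:", round(float(explained_raw[0]), 3))
print("first component share, standardized:", round(float(explained_std[0]), 3))
```

On non-standardized data the columns with the largest numeric range dominate the leading components, which is why the first component's share drops once the columns are standardized. To try it on the real data, replace X with the 13 numerical columns of the cereal file.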
