This section introduces the dataset I will be working with: where to find it (along with its license information), its attributes and their characteristics, and the reasons I chose it.
| Attribute | Characteristic |
|---|---|
| Type | Categorical |
| Fixed Acidity | Numerical |
| Volatile Acidity | Numerical |
| Citric Acid | Numerical |
| Residual Sugar | Numerical |
| Chlorides | Numerical |
| Free Sulfur Dioxide | Numerical |
| Total Sulfur Dioxide | Numerical |
| Density | Numerical |
| pH | Numerical |
| Sulphates | Numerical |
| Alcohol | Numerical |
| Quality | Numerical |
The original dataset came as two CSV files (although the fields are separated by ";" rather than commas), one for red wines and one for white wines. You can find the dataset and more information at this link. I used R to combine the two files, rename the columns, and add a column showing whether each wine is red or white. Below is a snapshot of the dataset:
| fixedAcidity | volatileAcidity | citricAcid | residualSugar | chlorides | freeSulfurDioxide | totalSulfurDioxide | density | pH | sulphates | alcohol | quality | type |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 7.0 | 0.270 | 0.36 | 20.70 | 0.045 | 45.0 | 170.0 | 1.00100 | 3.00 | 0.45 | 8.8 | 6 | white |
| 6.3 | 0.300 | 0.34 | 1.60 | 0.049 | 14.0 | 132.0 | 0.99400 | 3.30 | 0.49 | 9.5 | 6 | white |
| 8.1 | 0.280 | 0.40 | 6.90 | 0.050 | 30.0 | 97.0 | 0.99510 | 3.26 | 0.44 | 10.1 | 6 | white |
| 7.8 | 0.760 | 0.04 | 2.30 | 0.092 | 15.0 | 54.0 | 0.99700 | 3.26 | 0.65 | 9.8 | 5 | red |
| 7.8 | 0.880 | 0.00 | 2.60 | 0.098 | 25.0 | 67.0 | 0.99680 | 3.20 | 0.68 | 9.8 | 5 | red |
| 7.4 | 0.700 | 0.00 | 1.90 | 0.076 | 11.0 | 34.0 | 0.99780 | 3.51 | 0.56 | 9.4 | 5 | red |
You can view the entire dataset as well as the original datasets (split between red and white wine) in my repository here.
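The combining step described above (read two ";"-separated files, rename the columns, add a `type` column, and stack the rows) can be sketched roughly as follows. This is not my original R script; it is a minimal Python illustration using only the standard library. The spaced header names, the helper names `to_camel_case` and `read_wine`, and the two in-memory sample rows are assumptions made for the example.

```python
import csv
import io

# Assumed header of the source files: space-separated names, quoted, joined by ";".
ORIGINAL_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol", "quality",
]
HEADER = ";".join(f'"{name}"' for name in ORIGINAL_COLUMNS)

# One-row stand-ins for the two files, mirroring rows from the snapshot above.
WHITE_CSV = HEADER + "\n7;0.27;0.36;20.7;0.045;45;170;1.001;3;0.45;8.8;6\n"
RED_CSV = HEADER + "\n7.4;0.7;0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"


def to_camel_case(name):
    """Turn 'free sulfur dioxide' into 'freeSulfurDioxide'; 'pH' stays 'pH'."""
    words = name.split()
    return words[0] + "".join(word.capitalize() for word in words[1:])


def read_wine(file, wine_type):
    """Read one ';'-separated file, rename its columns, and tag each row."""
    rows = []
    for row in csv.DictReader(file, delimiter=";"):
        renamed = {to_camel_case(key): value for key, value in row.items()}
        renamed["type"] = wine_type  # new column: "red" or "white"
        rows.append(renamed)
    return rows


# Stack the white rows on top of the red rows, as in the snapshot.
combined = read_wine(io.StringIO(WHITE_CSV), "white") + read_wine(
    io.StringIO(RED_CSV), "red"
)

print(",".join(combined[0].keys()))
for row in combined:
    print(",".join(row.values()))
```

To run this on the real files, the `io.StringIO` objects could be swapped for file objects opened with `open()` and `newline=""`. Note that the `csv` module keeps every value as a string, so numeric columns would still need converting before any analysis.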
The main reason I chose this dataset is that I LOVE WINE. Aside from that, I thought it would be interesting to see how the attributes of each type of wine correlate with how much people like it. I also wanted to see whether there is a difference between red and white wines. A lot of components go into making a good wine, and I think visualizing them will help me understand it better.
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, Elsevier, 47(4):547-553, 2009.