Project Dataset

This page describes the dataset I will be working with: where to find it (along with its license information), its attributes and their characteristics, and why I chose it.

Wine Quality Dataset

Red and White variants of the Portuguese "Vinho Verde" wine

Type of Data

Attribute               Characteristic
Type                    Categorical
Fixed Acidity           Numerical
Volatile Acidity        Numerical
Citric Acid             Numerical
Residual Sugar          Numerical
Chlorides               Numerical
Free Sulfur Dioxide     Numerical
Total Sulfur Dioxide    Numerical
Density                 Numerical
pH                      Numerical
Sulphates               Numerical
Alcohol                 Numerical
Quality                 Numerical

Original Data Set

The original dataset consisted of two CSV files (delimited by ";" rather than commas), one for red wines and one for white wines. You can find the dataset and more information at this link. I used R to combine the two files, rename the columns, and add a column indicating whether each wine is red or white. Below is a snapshot of the dataset:

fixedAcidity volatileAcidity citricAcid residualSugar chlorides freeSulfurDioxide totalSulfurDioxide density pH sulphates alcohol quality type
7.0 0.270 0.36 20.70 0.045 45.0 170.0 1.00100 3.00 0.45 8.8 6 white
6.3 0.300 0.34 1.60 0.049 14.0 132.0 0.99400 3.30 0.49 9.5 6 white
8.1 0.280 0.40 6.90 0.050 30.0 97.0 0.99510 3.26 0.44 10.1 6 white
7.8 0.760 0.04 2.30 0.092 15.0 54.0 0.99700 3.26 0.65 9.8 5 red
7.8 0.880 0.00 2.60 0.098 25.0 67.0 0.99680 3.20 0.68 9.8 5 red
7.4 0.700 0.00 1.90 0.076 11.0 34.0 0.99780 3.51 0.56 9.4 5 red
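
The combining step described above can be sketched in R roughly as follows. This is a minimal illustration, not the exact script I ran: the function name `combine_wine` and the vector `new_names` are hypothetical, the renamed columns are taken from the snapshot above, and the sketch assumes both input files have the same twelve columns in the same order.

```r
# Column names as they appear in the snapshot above (assumed target names).
new_names <- c("fixedAcidity", "volatileAcidity", "citricAcid", "residualSugar",
               "chlorides", "freeSulfurDioxide", "totalSulfurDioxide", "density",
               "pH", "sulphates", "alcohol", "quality")

# Hypothetical helper: read both files, rename columns, tag each row, stack them.
combine_wine <- function(red_path, white_path) {
  # The source files are delimited by ";" rather than ",", so set sep explicitly.
  red   <- read.csv(red_path, sep = ";")
  white <- read.csv(white_path, sep = ";")

  # Replace the original headers with the camelCase names.
  names(red)   <- new_names
  names(white) <- new_names

  # Add the categorical column that records the wine type.
  red$type   <- "red"
  white$type <- "white"

  # Stack white rows first, then red, matching the order in the snapshot.
  rbind(white, red)
}
```

Calling `combine_wine()` with the paths to the red and white files returns a single data frame with thirteen columns, which can then be saved with `write.csv()` if a combined file is needed.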

You can view the entire dataset as well as the original datasets (split between red and white wine) in my repository here.

Why I Chose this Dataset

The main reason I chose this dataset is that I LOVE WINE. Aside from that, I thought it would be interesting to see how the attributes of the different types of wine correlate with how much people like each wine. I also wanted to see whether there is a difference between red and white wines. A lot of components go into making a good wine, and visualizing them should help in understanding what those are.

License Information

P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis.
Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009.