-
Stacked Horizontal Bar Chart
For this plot, I used only two columns of the data set: the type of wine and the quality. I created a new object that maps each quality score to the number of wine samples of each type. Each bar represents a quality score, with two stacked colors for the two types of wine. Originally, this was a vertical stacked bar chart. I thought that would be the easiest way to see the data because the dashboard was set up with the SVGs stacked on top of one another. When I was working with the different sizes, I realized it would be better to put the bar chart next to the scatterplot. That space was a little too narrow for vertical bars, which is why I decided to make the chart horizontal.
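The counting step described above can be sketched roughly as follows. This is a minimal Python illustration, not the dashboard's actual code; the field names `type` and `quality` and the sample rows are made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical rows: each wine sample has a type and a quality score.
samples = [
    {"type": "red", "quality": 5},
    {"type": "red", "quality": 5},
    {"type": "white", "quality": 5},
    {"type": "white", "quality": 6},
    {"type": "white", "quality": 6},
    {"type": "red", "quality": 6},
    {"type": "white", "quality": 7},
]


def count_by_quality_and_type(rows):
    """Map each quality score to the number of samples of each wine type."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["quality"]][row["type"]] += 1
    # Sort by quality so the bars come out in score order.
    return {quality: dict(by_type) for quality, by_type in sorted(counts.items())}


stacked = count_by_quality_and_type(samples)
```

Each key of `stacked` would become one bar, and each inner count one colored segment of that bar.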
For the evaluation of data density, lie factor, and data-ink ratio, I believe this plot is very strong in all three. The lie factor is probably right around 1: there are no 3D effects, and the bars are sized to fit the chart exactly. The size of the chart makes the data density very high, and I have shown only the axes and the bars to keep the data-ink ratio high as well. I don't believe anything could be changed in this plot without losing information. Instead of adding grid lines, I use a tooltip to show the number of wine samples in each bar.
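As a rough check on the lie factor, Tufte defines it as the size of the effect shown in the graphic divided by the size of the effect in the data. The sketch below applies that definition to two bars; the counts, the scale of 0.5 pixels per sample, and the truncated baseline are all hypothetical values chosen for illustration.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)


def lie_factor(data_first, data_second, drawn_first, drawn_second):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect_size(drawn_first, drawn_second) / effect_size(data_first, data_second)


# Hypothetical counts for two bars and a linear scale starting at zero.
PIXELS_PER_SAMPLE = 0.5
count_a, count_b = 200, 500
length_a = count_a * PIXELS_PER_SAMPLE
length_b = count_b * PIXELS_PER_SAMPLE

# Bar lengths proportional to the counts give a lie factor of exactly 1.
linear = lie_factor(count_a, count_b, length_a, length_b)

# The same bars drawn from a baseline of 100 samples (50 pixels) instead of zero.
truncated = lie_factor(count_a, count_b, length_a - 50, length_b - 50)
```

With a zero-based linear scale the drawn effect matches the data, while cutting off the baseline exaggerates the difference between the bars.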
This plot is a great overview of the data. I wanted to be able to see the distribution of the quality scores before looking at the details of the other columns in the dataset. I think it also explains why we see a lot of overlap in many of the other plots: a large share of the wines fall into the 5, 6, and 7 quality scores.
-
Scatterplot Matrix
For the scatterplot matrix, I used all of the columns in the dataset. Initially, I thought I would choose the most interesting metrics and plot only those, but because I decided to let the user choose which variables to see, I ended up including everything. From the five drop-down menus, I build a list of the five selected variables and use it with the dataset to fill in each of the plots. To keep the coloring consistent, the colors reflect the red and white wine types. I considered offering a choice between coloring by type and by quality score, but I thought that might get too messy with 10 different quality scores. I chose not to include grid lines to reduce clutter, and I placed the y axis on the right-hand side so that there wasn't a huge break between the two plots. I wanted to keep both of the axes on the outer ends. Because there were so many points on each of the plots, I made them semi-transparent so that some of the overlap is visible.
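The step from the five drop-down selections to the individual plots could look something like the sketch below. It assumes a full 5 x 5 grid in which every selected variable is paired with every other one; the actual layout of the dashboard may differ, and all variable names except Total Sulfur Dioxide are placeholders.

```python
# Hypothetical selections from the five drop-down menus.
selected = ["alcohol", "pH", "density", "residual sugar", "total sulfur dioxide"]


def matrix_cells(variables):
    """Describe one scatterplot per (row, column) pair of selected variables."""
    return [
        {"row": r, "col": c, "x": x_var, "y": y_var}
        for r, y_var in enumerate(variables)
        for c, x_var in enumerate(variables)
    ]


cells = matrix_cells(selected)
```

Each entry of `cells` says where a plot sits in the grid and which two columns of the dataset supply its x and y values, so changing one drop-down only requires rebuilding this list and redrawing.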
The data-ink ratio is very high on this chart, but I think the other two metrics, data density and lie factor, are weaker than I would like them to be. For the data-ink ratio, I think I included the necessary components without losing too much information. As I stated above, I excluded the grid lines to avoid showing too much. Instead, a tooltip shows the actual values when you hover over a circle. For the data density, I think the issue lies in the outliers of each plot. In some of the plots, a ton of points fall on top of each other and into one corner. This happens because one or two points lie outside the cluster, which pushes the max or min of the axis further out than we want. I do not know how to avoid this, because excluding those points would hurt my lie factor. For the lie factor, I think there is so much overlap that it is hard to see how many points lie in a given area, so the plot does not represent the data faithfully. Smaller points could reduce the overlap, but then I think it would be hard to see the points at all. The opacity helps a little, but when more than a couple of points lie on top of each other, it is difficult to see how many points there actually are.
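The limit of opacity can be made concrete with a small calculation. Under standard alpha compositing, n overlapping circles of the same color and opacity a combine to an opacity of 1 - (1 - a)^n. The sketch below evaluates this for an assumed per-point opacity of 0.3, which is an illustrative value and not necessarily the one used in the dashboard.

```python
ALPHA = 0.3  # assumed per-point opacity, for illustration only


def stacked_opacity(alpha, n):
    """Combined opacity of n same-colored points drawn on top of each other."""
    return 1 - (1 - alpha) ** n


# Combined opacity for a few stack sizes.
levels = {n: stacked_opacity(ALPHA, n) for n in (1, 2, 5, 10)}
```

The combined opacity climbs quickly toward 1, so a stack of five points and a stack of ten already look nearly the same, which matches the difficulty described above.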
Although this plot could definitely use some improvements to make the details easier to see, I think it is very good at showing the relationships between all of the variables. It shows us certain outliers, and with the tooltips we can see the quality score of those outliers. It also helps with the question we are trying to answer: which variables reflect the type and score of a wine. If we can find correlations between variables, we can understand the chemical components behind them. I think I chose a good number of variables to chart against each other, and with the drop-down menus you can see any variables you want.
-
Parallel Coordinates
For my final visualization, I chose a subset of the columns. The columns I left out had values that were all close to each other except for a couple of outliers, which made their axes a lot less interesting to look at. Each column is an axis. I chose to include the quality score as well as the type of wine as axes to make it easier to see where each of these falls. Although the colors already reflect the type of wine, I thought that including the type axis would help a bit more. Because I have implemented panning and brushing on this plot, we can filter and focus on certain values. I have also made the axes movable so that you can place certain variables next to each other. Rather than relying on the colors, you can use the type axis to see where the samples of each type lie on another variable.
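The filtering that brushing performs can be sketched independently of any charting library. In the minimal version below, each active brush is a (low, high) range on one axis and a sample stays highlighted only if it falls inside every brushed range. The rows, field names, and brush ranges are invented for the example.

```python
# Hypothetical samples with made-up values.
rows = [
    {"type": "red", "quality": 5, "total sulfur dioxide": 34.0},
    {"type": "white", "quality": 6, "total sulfur dioxide": 132.0},
    {"type": "white", "quality": 7, "total sulfur dioxide": 170.0},
    {"type": "red", "quality": 7, "total sulfur dioxide": 60.0},
]


def brushed(samples, brushes):
    """Keep the samples whose value lies inside every active brush range."""
    return [
        sample
        for sample in samples
        if all(low <= sample[axis] <= high for axis, (low, high) in brushes.items())
    ]


# Brush two axes at once: sulfur dioxide between 100 and 200, quality exactly 6.
selection = brushed(rows, {"total sulfur dioxide": (100, 200), "quality": (6, 6)})
```

With no active brushes the function returns every sample, so clearing the brushes restores the full plot.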
The evaluations for this plot are very similar to those for the scatterplot matrix. The data-ink ratio is high because I did not include anything that was not necessary. I have an appropriate number of labels on each of the axes, as well as a title and the lines. Everything on this plot is essential for understanding it. Unlike the scatterplot matrix, the data density is very high on this chart. We can still see a couple of the outliers, but for most variables the values range all over the axis. The variables for which they did not were not included in this plot. Lastly, the lie factor is similar to that of the scatterplot matrix. There are so many data points that I think some of the values get lost behind the huge clusters of data. This makes it difficult to see the real distribution of the variables. The only way to see this better would be to reduce the data even more than I already have.
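One way to put a number on whether values "range all over the axis" is to compare the span of the central bulk of the data with the full min-to-max span. The sketch below is a hypothetical criterion and not the method used to pick the columns for this plot; the 1% trim at each end is an arbitrary choice.

```python
def central_span_share(values, trim=0.01):
    """Share of the full axis range covered after trimming a fraction from each end."""
    ordered = sorted(values)
    n = len(ordered)
    k = int(n * trim)  # number of points dropped at each end
    full_span = ordered[-1] - ordered[0]
    if full_span == 0:
        return 1.0  # all values are identical, so nothing is compressed
    return (ordered[n - 1 - k] - ordered[k]) / full_span


# Made-up columns: one evenly spread, one with a single extreme outlier.
spread_out = list(range(100))
one_outlier = list(range(1, 100)) + [1000]

share_spread = central_span_share(spread_out)
share_outlier = central_span_share(one_outlier)
```

A share near 1 means the values use most of the axis, while a small share means a few outliers stretch the axis and squeeze everything else into a narrow band.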
This visualization really excels at showing the differences between red and white wines. Although the quality scores are a bit all over the place, on many of these axes we can see that the red and white wines lie in their own clusters. For example, on the Total Sulfur Dioxide axis we can see that the majority of the white wines lie above 100, whereas the red wines lie below. Because the axes can be moved, you can see the trends between the different wine samples. I think this is a great visualization for understanding the components of red and white wines and how they differ.