This, however, introduces some duplication in our code. Suppose you want to change the y-axis to show city mileage (cty) instead of highway mileage (hwy). You would need to change the variable in two places, and you might forget to update one. You can avoid this kind of repetition by passing a set of mappings to ggplot(). ggplot2 will treat these mappings as global mappings that apply to each geom in the graph. In other words, this code will produce the same graph as the previous code:
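A minimal sketch of such a call, assuming the earlier plot used the mpg data set with displ on the x-axis and hwy on the y-axis (the variable choice and the name p are assumptions made here for illustration):

```r
library(ggplot2)

# Global mappings: x and y are declared once in ggplot()
# and inherited by every geom added afterwards.
p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  geom_point() +   # uses the global x and y
  geom_smooth()    # uses the same global x and y
p
```

With the mappings stated only once, switching the y-axis to another variable means editing a single aes() call.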
If you place mappings in a geom function, ggplot2 will treat them as local mappings for that layer. It will use these mappings to extend or overwrite the global mappings for that layer only. This makes it possible to display different aesthetics in different layers.
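As an illustration, the following sketch adds a local color mapping to the point layer only. It assumes the mpg data set and its class variable; the name p is only a placeholder.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  # Local mapping: color extends the global x/y mappings for the points only.
  geom_point(mapping = aes(color = class)) +
  # No local mapping: the smooth line sees only the global x and y,
  # so a single line is drawn for all cars.
  geom_smooth()
p
```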
You can use the same idea to specify different data for each layer through the data argument. Here, our smooth line displays only a subset of the mpg data set: the subcompact cars. The local data argument in geom_smooth() overrides the global data argument in ggplot() for that layer only.
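A sketch of how this might look, assuming the mpg data set, dplyr's filter() for the subset, and se = FALSE to hide the confidence band (these details are assumptions for illustration):

```r
library(ggplot2)
library(dplyr)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy)) +
  # Points use the global data: every car in mpg.
  geom_point(mapping = aes(color = class)) +
  # Local data: the smooth line is fitted to subcompact cars only.
  geom_smooth(data = filter(mpg, class == "subcompact"), se = FALSE)
p
```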
(You’ll learn how filter() works in the next chapter; for now, just remember that this command selects only the subcompact cars.)
What geom would you use to draw a line chart? A boxplot? A histogram? An area chart?
Run this code in your head and predict what the output will look like. Then run the code in R and check your predictions.
What does show.legend = FALSE do? What happens if you remove it? Why do you think we used it earlier in the chapter?
Next, let’s take a look at a bar chart. Bar charts seem simple, but they are interesting because they reveal something subtle about plots. Consider a basic bar chart, as drawn with geom_bar(). The graph below shows the total number of diamonds in the diamonds data set, grouped by cut. The diamonds data set comes with ggplot2 and contains information on about 54,000 diamonds, including the price, carat, color, clarity, and cut of each diamond. The graph shows that more diamonds are available with high-quality cuts than with low-quality cuts.
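A minimal sketch of such a bar chart, assuming only the diamonds data set that ships with ggplot2 (the name p is a placeholder):

```r
library(ggplot2)

# Only x is mapped; geom_bar() computes the height of each bar itself.
p <- ggplot(data = diamonds) +
  geom_bar(mapping = aes(x = cut))
p
```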
On the x-axis, the graph shows cut, a variable from diamonds. On the y-axis, it shows count, but count is not a variable in diamonds! Where does count come from? Many graphs, such as scatterplots, plot the raw values of your data set. OR