This is my data:
A1 <- c(1,2,9,6,4)
A2 <- c(5,1,9,2,3)
A3 <- c(1,2,3,4,5)
B1 <- c(2,4,6,8,10)
B2 <- c(0,3,6,9,12)
B3 <- c(1,1,2,8,7)
DF2 <- data.frame(
x = c(c(A1, A2, A3), c(B1, B2, B3)),
y = rep(c("A", "B"), each = 15),
z = rep(rep(1:3, each=5), 2),
stringsAsFactors = FALSE
)
I'm creating grouped boxplots based on two variables (y and z) using ggplot2. When I use only the fill aesthetic, the plot correctly groups boxes by y, with colors representing z, as intended:
ggplot(DF2, aes(x = y, y = x, fill = factor(z))) +
geom_boxplot()
However, when I add the group = factor(z) aesthetic along with fill, the resulting plot changes drastically and doesn't reflect my expected grouping:
ggplot(DF2, aes(x = y, y = x, group = factor(z), fill = factor(z))) +
geom_boxplot()
My specific questions are:
Why does adding group = factor(z) together with fill = factor(z) alter the visualization significantly? When is it appropriate to use both group and fill aesthetics simultaneously?
By default, ggplot2 sets the group aesthetic to the interaction of all discrete variables in the plot (see the documentation for details).
You can better understand what is happening by reconstructing some of the examples from that page with your sample data.
In short, when you let ggplot2 guess the grouping, it uses interaction(y, z). When you override that with group = factor(z), you get the unexpected result because you've removed the interaction of factor(z) with y: each group now spans both levels of y, so ggplot2 draws three boxes instead of six.
library(ggplot2)
# Plot without mapping `z` variable
ggplot(DF2, aes(x = y, y = x)) +
geom_boxplot()
# Group boxplots by `factor(z)`
ggplot(DF2, aes(x = y, y = x, group = factor(z))) +
geom_boxplot()
# Group by the interaction of both discrete variables, `y` and `z`
ggplot(DF2, aes(x = y, y = x, group = interaction(y, z))) +
geom_boxplot()
# Compare overriding the group aesthetic with ggplot's default
library(patchwork)
(ggplot(DF2, aes(x = y, y = x, fill = factor(z))) +
geom_boxplot()) +
(ggplot(DF2, aes(x = y, y = x, fill = factor(z), group = interaction(y, z))) +
geom_boxplot())
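If you want to check the grouping without comparing plots by eye, one option is to count the boxes ggplot2 actually computes. The sketch below rebuilds the sample data from the question and uses layer_data(), which returns the computed data for a layer; the object names p_default, p_override, n_default and n_override are only illustrative.

```r
library(ggplot2)

# Rebuild the sample data from the question
DF2 <- data.frame(
  x = c(1, 2, 9, 6, 4, 5, 1, 9, 2, 3, 1, 2, 3, 4, 5,
        2, 4, 6, 8, 10, 0, 3, 6, 9, 12, 1, 1, 2, 8, 7),
  y = rep(c("A", "B"), each = 15),
  z = rep(rep(1:3, each = 5), 2),
  stringsAsFactors = FALSE
)

# Default grouping: ggplot2 derives the groups from the discrete variables
p_default <- ggplot(DF2, aes(x = y, y = x, fill = factor(z))) +
  geom_boxplot()

# Overridden grouping: groups are defined by factor(z) alone
p_override <- ggplot(DF2, aes(x = y, y = x, group = factor(z), fill = factor(z))) +
  geom_boxplot()

# For geom_boxplot(), the computed layer data has one row per box
n_default  <- nrow(layer_data(p_default))
n_override <- nrow(layer_data(p_override))

print(c(default = n_default, override = n_override))
```

With this data the default plot should report six boxes (one per combination of y and z) and the overridden plot three (one per level of z), which matches the difference you see between the two plots.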