R/ggplot2 non-trivial aggregation function using multiple columns

I would like to use ggplot2 (R) to draw a bar graph of aggregated values computed from multiple numeric columns of a table, plotted against a categorical column of that table (which is also the "group by" column).

df:

V1 V2 categorical
 1  1 c1
 2  1 c2
 1  3 c2
 2  3 c3

The effective aggregate function I am interested in is:

sum(V1 * V2) / sum(V2)
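Put differently, this aggregate is the V2-weighted mean of V1 within each category, so the target values can be checked in base R with stats::weighted.mean(). A minimal sketch, assuming df holds exactly the four rows shown above:

```r
# Rebuild the example table from the question
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# sum(V1 * V2) / sum(V2) is the V2-weighted mean of V1 within each category
groups <- split(df, df$categorical)
stat   <- sapply(groups, function(g) weighted.mean(g$V1, g$V2))
direct <- sapply(groups, function(g) sum(g$V1 * g$V2) / sum(g$V2))

# Both formulations agree: c1 = 1, c2 = 1.25, c3 = 2
stopifnot(isTRUE(all.equal(stat, direct)))
print(stat)
```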

I attempted this:

ggplot(df, aes(x = categorical)) +
  stat_summary_bin(aes(y = V1 * V2),
                   fun.args = list(d = df$V2),
                   fun.y = function(y, d) sum(y) / sum(d),
                   geom = "bar")

but the resulting bars were lower than expected: my desired result is c1: 1, c2: 1.25, c3: 2.

Best answer

The best way to create the desired plot is to compute the desired statistics manually before calling ggplot. Here is the code using tidyverse tools:

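library(tidyverse)

df %>%
  group_by(categorical) %>%
  # one row per category: sum(V1 * V2) / sum(V2)
  summarise(stat = sum(V1 * V2) / sum(V2)) %>%
  ggplot(aes(categorical, stat)) +
  # stat = "identity": use the precomputed values as bar heights
  geom_bar(stat = "identity")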
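If the tidyverse is not available, the same precompute step can be sketched in base R with aggregate(). The resulting data frame has one row per category and is what geom_bar(stat = "identity") would receive. The helper column name V1V2 is hypothetical, introduced only for this sketch.

```r
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

# Hypothetical helper column holding the per-row numerator term V1 * V2
df$V1V2 <- df$V1 * df$V2

# Sum numerator and denominator per category in one pass
agg <- aggregate(cbind(V1V2, V2) ~ categorical, data = df, FUN = sum)

# Ratio of the group sums, i.e. sum(V1 * V2) / sum(V2) per category
agg$stat <- agg$V1V2 / agg$V2
print(agg[, c("categorical", "stat")])
```

With ggplot2 loaded, ggplot(agg, aes(categorical, stat)) + geom_col() would then draw the bars; geom_col() is ggplot2's shorthand for geom_bar(stat = "identity").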

Notes:

With stat = "identity", geom_bar doesn't perform any computation and just plots the precomputed values. It was designed specifically for situations like yours.

The output for c2 should be 1.25, I presume.
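As for why the attempt in the question comes out low: as far as I can tell, fun.args = list(d = df$V2) is evaluated once, so d is the whole V2 column (sum 8) rather than the V2 values of the current category. The base-R sketch below reproduces that assumed arithmetic outside ggplot2, assuming one bar per category; it illustrates the suspected cause and is not ggplot2's internal code.

```r
df <- data.frame(
  V1 = c(1, 2, 1, 2),
  V2 = c(1, 1, 3, 3),
  categorical = c("c1", "c2", "c2", "c3")
)

numerator <- tapply(df$V1 * df$V2, df$categorical, sum)

# Assumed behaviour of the attempt: every category divided by sum(df$V2) = 8
attempted <- numerator / sum(df$V2)

# Intended: divide by the V2 sum of the same category
intended <- numerator / tapply(df$V2, df$categorical, sum)

print(rbind(attempted, intended))
```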