Wrapper function for eval parse for use in formulas

I have a function that inputs a data.frame and outputs the residual version of it with some chosen variable as predictor.

```r
residuals.DF = function(data, resid.var, suffix="") {
  lm_f = function(x) {
    x = residuals(lm(data=data, formula= x ~ eval(parse(text=resid.var))))
  }
  resid = data.frame(apply(data,2,lm_f))
  colnames(resid) = paste0(colnames(data),suffix)
  return(resid)
}

set.seed(31233)
df = data.frame(Age = c(1,3,6,7,3,8,4,3,2,6),
                Var1 = c(19,45,76,34,83,34,85,34,27,32),
                Var2 = round(rnorm(10)*100))

df.res = residuals.DF(df, "Age", ".test")
df.res
```

```
        Age.test   Var1.test  Var2.test
1  -1.696753e-17 -25.1351351  -90.20582
2  -1.318443e-19  -0.8108108   31.91892
3  -5.397735e-18  27.6756757   84.10603
4  -5.927747e-18 -15.1621622 -105.83160
5  -3.807699e-18  37.1891892  -57.08108
6  -6.457759e-18 -16.0000000  -25.76923
7   5.117344e-17  38.3513514  -65.01871
8  -3.807699e-18 -11.8108108   35.91892
9  -3.277687e-18 -17.9729730   97.85655
10 -5.397735e-18 -16.3243243   94.10603
```

This works fine. However, I often need to use the eval/parse combo when working with variable inputs to lm(), so I decided to write a wrapper function:

```r
# Wrapper function for convenience for evaluating strings
evalparse = function(string) {
  eval(parse(text=string))
}
```

This works fine when used alone, e.g.:

```r
> evalparse("5+5")
[1] 10
```

However, if one uses it in the above function (i.e. formula = x ~ evalparse(resid.var)), one gets:

```r
> df.res = residuals.DF(df, "Age", ".test")
Error in eval(expr, envir, enclos) : object 'Age' not found
```

I figure this is because the wrapper function means that the string gets evaluated in its own environment, where the chosen variable is missing. This does not happen when using the eval/parse combo directly, because the evaluation then happens in the lm() environment, where the chosen variable is not missing.
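One way to check this diagnosis (a sketch, not part of the original post) is to give the wrapper an envir argument that defaults to parent.frame(). That is the same default that eval() itself uses. When lm() builds its model frame, it evaluates the formula terms in an environment constructed from data. Inside the wrapper, parent.frame() then points at that environment, so Age is found. The envir argument is an addition to the question's evalparse and is hypothetical.

```r
# Hypothetical variant of the question's wrapper: evaluate the parsed string in
# the caller's environment (parent.frame(), the same default eval() uses)
# instead of in the wrapper's own frame, whose enclosure is the global env.
evalparse = function(string, envir = parent.frame()) {
  eval(parse(text = string), envir = envir)
}

# The question's function with eval(parse(...)) swapped for the wrapper
residuals.DF = function(data, resid.var, suffix = "") {
  lm_f = function(x) {
    residuals(lm(data = data, formula = x ~ evalparse(resid.var)))
  }
  resid = data.frame(apply(data, 2, lm_f))
  colnames(resid) = paste0(colnames(data), suffix)
  return(resid)
}

set.seed(31233)
df = data.frame(Age  = c(1, 3, 6, 7, 3, 8, 4, 3, 2, 6),
                Var1 = c(19, 45, 76, 34, 83, 34, 85, 34, 27, 32),
                Var2 = round(rnorm(10) * 100))

# No "object 'Age' not found" error: Age is looked up in the model-frame data
df.res = residuals.DF(df, "Age", ".test")
head(df.res, 3)
```

This keeps the eval/parse approach but only fixes where the lookup happens. It relies on lm() evaluating the term in a data-backed environment, so it is a workaround rather than a general way to build formulas.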

Is there some clever solution to this problem? A better way of using dynamic formulas in lm()? Otherwise I will have to keep typing eval(parse(text=object)).

Best answer

Any time you're trying to perform operations that modify the contents of a formula, you should use update, because it is designed for this purpose.

In your case, you want to modify your function as follows:

```r
residuals.DF = function(data, resid.var, suffix="") {
  lm_f = function(x) {
    x = residuals(lm(data=data, formula= update(x ~ 0, paste0("~",resid.var))))
  }
  resid = data.frame(apply(data,2,lm_f))
  colnames(resid) = paste0(colnames(data),suffix)
  return(resid)
}
```
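As a quick sanity check (a sketch, not part of the original answer), the update()-based version can be compared against a plain lm() call with the predictor typed out by hand. This works without any eval/parse because update.formula keeps the environment of its first argument. Here that is the formula x ~ 0 created inside lm_f, where x lives. The sketch drops the redundant x = assignment inside lm_f. That does not change the returned value.

```r
# update()-based version from the answer, laid out over several lines
residuals.DF = function(data, resid.var, suffix = "") {
  lm_f = function(x) {
    residuals(lm(data = data, formula = update(x ~ 0, paste0("~", resid.var))))
  }
  resid = data.frame(apply(data, 2, lm_f))
  colnames(resid) = paste0(colnames(data), suffix)
  return(resid)
}

set.seed(31233)
df = data.frame(Age  = c(1, 3, 6, 7, 3, 8, 4, 3, 2, 6),
                Var1 = c(19, 45, 76, 34, 83, 34, 85, 34, 27, 32),
                Var2 = round(rnorm(10) * 100))

update(x ~ 0, paste0("~", "Age"))   # x ~ Age

df.res = residuals.DF(df, "Age", ".test")

# Reference fit with the predictor written out explicitly
ref = residuals(lm(Var1 ~ Age, data = df))
all.equal(unname(df.res$Var1.test), unname(ref))   # TRUE
```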

Basically, update (or the update.formula method specifically) takes a formula as its first argument, and then allows for modifications based on its second argument. To get a handle on it, check out the following examples:

```r
f <- y ~ x
f                          # y ~ x
update(f, ~ z)             # y ~ z
update(f, x ~ y)           # x ~ y
update(f, "~ x + y")       # y ~ x + y
update(f, ~ . + z + w)     # y ~ x + z + w
x <- "x"
update(f, paste0("~",x))   # y ~ x
```

As you can see, the second argument can be a formula or character string containing one or more variables. This greatly simplifies the creation of a dynamically modified formula where you are only trying to change one part of the formula.
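The same idea extends to several predictors by collapsing a character vector into one right-hand side before handing it to update(). This is a sketch; the helper name with_rhs is hypothetical and not part of the answer. Because the new right-hand side contains no ".", it replaces the old one entirely. As a result, the 0 in Var2 ~ 0 disappears and the fitted model gets an intercept.

```r
# Hypothetical helper: build "~ a + b + ..." from a character vector of
# predictor names and splice it into an existing formula with update()
with_rhs = function(f, vars) {
  update(f, paste("~", paste(vars, collapse = " + ")))
}

with_rhs(y ~ x, c("Age", "Var1"))   # y ~ Age + Var1

set.seed(31233)
df = data.frame(Age  = c(1, 3, 6, 7, 3, 8, 4, 3, 2, 6),
                Var1 = c(19, 45, 76, 34, 83, 34, 85, 34, 27, 32),
                Var2 = round(rnorm(10) * 100))

# The right-hand side is replaced wholesale, so the model includes an intercept
fit = lm(with_rhs(Var2 ~ 0, c("Age", "Var1")), data = df)
names(coef(fit))   # "(Intercept)" "Age" "Var1"
```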
