How to initialize a variable/function in Spark before entering the interactive shell?

Is it possible with pyspark to initialize some variable x and define some function f(q) that makes use of x (and returns an RDD) before entering the interactive shell? I want to give another user in the shell access to this function f(q), but I don't want to expose the variable x to them. Would a possible solution be to attach this function to the Spark context variable? If that is not possible, how could one do it?

Best answer

It is perfectly possible, but it won't serve the intended purpose. You could, for example, use a modified shell script and further obfuscate the data by using native extensions, but this will only protect you from accidental exposure.
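
For example, here is a minimal sketch of the "modified shell script" idea: a startup file that builds x inside a closure and leaves only f visible. The file name init_shell.py, the value of secret_x, and the body of f are all made up for illustration, and whether the pyspark launcher re-runs a user PYTHONSTARTUP file after creating sc depends on the Spark version, so verify the launch command for your setup.

    # init_shell.py -- a minimal sketch of the "modified shell script" idea.
    # Hypothetical launch (check your Spark version; many pyspark launchers
    # re-execute the user's PYTHONSTARTUP file after creating `sc`):
    #   PYTHONSTARTUP=init_shell.py ./bin/pyspark

    def _make_f(sc):
        secret_x = 42                          # the value you want to keep out of sight
        def f(q):
            # uses secret_x via a closure and returns an RDD
            return sc.parallelize(range(q)).map(lambda v: v * secret_x)
        return f

    f = _make_f(sc)    # `sc` is the SparkContext the pyspark shell already created
    del _make_f        # leave only `f` as a top-level name; secret_x is not in globals()

The other user can then call f(10) and get an RDD back without x ever appearing in dir(), which is roughly what the question asks for; but as explained below, that hiding is only skin deep.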

As long as you give the user access to a fully functional Python environment, they can inspect existing objects, analyze closures, access the source, or invoke a debugger. So if you assume malicious intent, this is simply not the way to go. And this is only the tip of the iceberg: a user who has direct access to the Spark shell can execute arbitrary commands on the cluster, effectively limited only by the permissions granted to the Spark user.
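
To make that concrete, nothing Spark-specific is needed to defeat the sketch above; plain CPython introspection is enough. The session below assumes the hypothetical f from the earlier example:

    >>> # pair each free-variable name with the value captured in f's closure
    >>> dict(zip(f.__code__.co_freevars,
    ...          [cell.cell_contents for cell in f.__closure__]))
    {'sc': <SparkContext ...>, 'secret_x': 42}
    >>> import inspect
    >>> print(inspect.getsource(f))   # or simply read the source of f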
