Measuring Fairness in Machine Learning Models

In our previous article, we gave an in-depth review of how to explain biases in data. The next step in our fairness journey is to dig into how to detect biased machine learning models.

However, before detecting (un)fairness in machine learning, we first need to be able to define it. Fairness is an equivocal notion: it can be expressed in various ways to reflect the specific circumstances of the use case or the ethical perspectives of the stakeholders. Consequently, there is no consensus in research about what fairness in machine learning actually is.

In this article, we will explain the main fairness definitions used in research and highlight their practical limitations. We will also underscore the fact that those definitions are mutually exclusive and that, consequently, there is no “one-size-fits-all” fairness definition to use.

Notations

To simplify the exposition, we will consider a single protected attribute in a binary classification setting. This can be generalized to multiple protected attributes and all types of machine learning tasks.

Throughout the article, we will consider the identification of promising candidates for a job, using the following notations:

  • 𝑋 ∈ Rᵈ: the features of each candidate (level of education, high school, previous work, total experience, and so on)
  • 𝐴 ∈ {0; 1}: a binary indicator of the sensitive attribute
  • 𝐶 = 𝑐(𝑋, 𝐴) ∈ {0; 1}: the classifier output (0 for rejected, 1 for accepted)
  • 𝑌 ∈ {0; 1}: the target variable. Here, it is whether the candidate should be selected or not.
  • We denote by 𝐷 the distribution from which (𝑋, 𝐴, 𝑌) is sampled.
  • We will also write 𝑃₀(𝑐) = 𝑃(𝑐 | 𝐴 = 0) and, similarly, 𝑃₁(𝑐) = 𝑃(𝑐 | 𝐴 = 1).

The Many Definitions of Fairness

Unawareness

Unawareness defines fairness as the absence of the protected attribute from the model features.

Mathematically, the unawareness definition can be written as follows:

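A common way to state this in the notation above is that the classifier simply never uses 𝐴, so that its output is a function of 𝑋 alone:

𝑐(𝑋, 𝐴 = 0) = 𝑐(𝑋, 𝐴 = 1) for every 𝑋, i.e. 𝐶 = 𝑐(𝑋)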

Because of this simplicity, and because implementing it only requires removing the protected attribute from the data, it is an easy-to-use definition.

Limitations. Unawareness has many flaws in practice, which make it a poor fairness definition overall. It is far too weak to prevent bias. As explained in our previous article, removing the protected attribute doesn’t guarantee that all the information concerning this attribute is removed from the data. Moreover, unaware correction methods can even be less effective at improving fairness than aware methods.

Demographic Parity

Demographic parity stipulates that the distribution of predictions should be identical across subpopulations.

Mathematically speaking, demographic parity can be defined as 𝐶 being independent of 𝐴:

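In the notation introduced above, one standard way to write this requirement is:

𝑃₀(𝐶 = 𝑐) = 𝑃₁(𝐶 = 𝑐) for 𝑐 ∈ {0; 1}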

However, it is almost impossible to reach strict equality in practice. On top of that, the double condition on 𝐶 is not always necessary, as some applications only need to focus on the positive outcome (getting a job, being granted a loan, etc.). The following relaxed version of the demographic parity rule is used in practice:

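Assuming the usual formulation of this relaxation, it requires the acceptance rates of the two groups to stay within a factor 𝑝/100 of each other:

min( 𝑃₀(𝐶 = 1) / 𝑃₁(𝐶 = 1) , 𝑃₁(𝐶 = 1) / 𝑃₀(𝐶 = 1) ) ≥ 𝑝 / 100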

This relaxed version is called the “p%-rule” (p being a parameter), and was defined by Zafar, Valera, Rodriguez, and Gummadi as a generalization of the 80% rule, also known as the “Four-Fifths Rule” of the U.S. Uniform Guidelines on Employee Selection Procedures.

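As a concrete illustration, here is a minimal Python sketch of how the p%-rule could be checked on a model’s decisions; the arrays and the helper name are illustrative assumptions of ours, not material from the guidelines or the cited paper:

```python
import numpy as np

# Hypothetical classifier decisions (1 = accepted) and protected attribute (0/1)
C = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 1])
A = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

def p_percent_score(decisions: np.ndarray, groups: np.ndarray) -> float:
    """Return 100 * min(P0(C=1)/P1(C=1), P1(C=1)/P0(C=1))."""
    p0 = decisions[groups == 0].mean()  # acceptance rate for A = 0
    p1 = decisions[groups == 1].mean()  # acceptance rate for A = 1
    return 100 * min(p0 / p1, p1 / p0)

# A score of at least 80 corresponds to the four-fifths rule
print(f"p%-rule score: {p_percent_score(C, A):.0f}%")
```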

In practice, one of this definition’s advantages is that implementing demographic parity may help improve the professional image of the minority class in the long term. This improvement comes from the progressive establishment of a “positive feedback loop” and justifies implementing demographic parity policies over a short- to medium-term horizon.

A second advantage is technical: as this fairness definition is independent of 𝑌 (the target variable), there is no need to have access to its values to measure and correct bias. This makes the method particularly suitable for applications where the target is hard to qualify (employment qualification, credit default, justice, etc.).

Demographic parity (with laziness)

Limitations. On the other hand, demographic parity has various flaws. First, it can be applied in an inappropriate context, that is, one where a disproportion between groups genuinely exists and is independent of the protected attribute or of any proxy for it. In our example use case, enforcing demographic parity would then result in discrimination against some qualified candidates, which could itself be seen as unfair.

This first flaw underlines one of demographic parity’s problems: it only constrains the final outcome of the model and does not require equality of treatment. This absence of a fair-treatment requirement is demographic parity’s second problem, called laziness. Nothing would prevent using a trained model to select candidates from the majority group while selecting candidates from the minority group randomly with a coin toss, as long as the number of selected candidates from each group satisfies the parity constraint.

A third flaw is due to demographic parity’s independence from the target variable: when the fractions of suitable candidates in the two groups are not equal, which is almost always the case, demographic parity rejects the optimal classifier 𝐶 = 𝑌.

The last flaw is highlighted by Weisskopf: because demographic parity amounts to affirmative action, it attracts recurrent criticism. Such criticism can contribute to undermining the reputation of the minority group in the long term.

Equality of Odds, Equality of Opportunity

As demographic parity’s main flaws are all linked to the inequality of treatment it introduces among subpopulations, two research groups came up with similar definitions of fairness that take into account how each group is treated: Equality of Odds and Disparate Mistreatment. We will use the Equality of Odds name in this article.

Equality of Odds is defined as the conditional independence of 𝐶 and 𝐴 given 𝑌. In other words, a classifier treats every subpopulation the same way if it has the same error rates for each subpopulation.

The mathematical definition of equality of odds is:

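In the notation above, this reads:

𝑃₀(𝐶 = 1 | 𝑌 = 𝑦) = 𝑃₁(𝐶 = 1 | 𝑌 = 𝑦) for 𝑦 ∈ {0; 1}

that is, equal true positive rates (𝑦 = 1) and equal false positive rates (𝑦 = 0) across subpopulations.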

However, in the same way that reaching demographic parity is very hard in practice, finding a model that satisfies equality of odds is challenging and often comes at the price of low model performance.

Equality of Odds. Notice that, in order to satisfy this definition, more than half of the recruited candidates have to be unqualified

In the same way that a relaxed version was defined for demographic parity, Hardt, Price, and Srebro defined equality of opportunity as a weaker version of equality of odds. Equality of opportunity applies the equality of odds constraint only to the true positive rate, so that each subpopulation has the same opportunity to be granted the positive outcome.

Equality of opportunity is defined as:

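In the notation above:

𝑃₀(𝐶 = 1 | 𝑌 = 1) = 𝑃₁(𝐶 = 1 | 𝑌 = 1)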

To get back to our recruitment example, satisfying equality of opportunity means that we would recruit the same ratio of qualified candidates from each subpopulation.

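As an illustration, here is a minimal Python sketch of how these per-group rates might be compared in practice; the toy arrays are illustrative assumptions of ours, not data from the cited papers:

```python
import numpy as np

# Hypothetical labels (1 = qualified), decisions (1 = recruited), protected attribute
Y = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])
C = np.array([1, 0, 0, 1, 1, 1, 0, 0, 0, 1])
A = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

def error_rates(y: np.ndarray, c: np.ndarray):
    """True positive rate and false positive rate of decisions c against labels y."""
    return c[y == 1].mean(), c[y == 0].mean()

tpr0, fpr0 = error_rates(Y[A == 0], C[A == 0])
tpr1, fpr1 = error_rates(Y[A == 1], C[A == 1])

# Equality of opportunity only compares the TPRs; equality of odds compares both rates
print(f"TPR gap: {abs(tpr0 - tpr1):.2f}, FPR gap: {abs(fpr0 - fpr1):.2f}")
```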

Equality of Opportunity

In practice, as mentioned before, equality of odds and equality of opportunity can both guarantee equality of treatment among subpopulations. They thereby also penalize laziness, which was one of the flaws of the demographic parity definition. These definitions also avoid demographic parity’s rejection of the optimal classifier 𝐶 = 𝑌: for that classifier, both the false positive and false negative rates are 0% for the whole population, hence 0% for each subpopulation, which proves that the optimal classifier satisfies both equality of odds and equality of opportunity.

Limitations. However, there are two main flaws linked to equality of odds (and equality of opportunity), and both boil down to the fact that this definition might not help deal with unfairness problems in the long term. First, it doesn’t take into consideration possible discrimination outside of the model. For example, since protected attributes (race, gender, class, etc.) have more or less always had an impact on access to opportunities such as loans or education, such discrimination can result in an unbalanced ratio between the privileged and unprivileged classes, an imbalance that a model satisfying equality of opportunity will simply replicate.

So, in our recruiting case, if only 10% of our candidates are from the unprivileged class (because the job requires high qualifications and few people from the unprivileged class have had access to higher education), then only 10% of the candidates finally recruited will be from the unprivileged class.

The second flaw is a consequence of the first one when there is an extreme difference between the privileged and unprivileged groups. In that case, by preserving the gap, a model satisfying equality of opportunity might, in the long term, even increase this difference, resulting in a vicious circle.

Predictive Rate Parity

A somewhat similar fairness definition, predictive rate parity, was introduced by Dieterich, Mendoza, and Brennan.

A model satisfies predictive rate parity if the likelihood that the target variable is actually positive, among the people predicted to have the positive outcome, is independent of the subpopulation.

Mathematically speaking, predictive rate parity is defined as follows:

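In the notation above, this is commonly written as the independence of 𝑌 and 𝐴 conditionally on 𝐶:

𝑃₀(𝑌 = 1 | 𝐶 = 𝑐) = 𝑃₁(𝑌 = 1 | 𝐶 = 𝑐) for 𝑐 ∈ {0; 1}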

In practice, relaxed versions of this definition also exist and, depending on the value of interest of 𝑌, we can talk about positive predictive parity or negative predictive parity. Positive predictive parity is defined as follows:

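In the notation above:

𝑃₀(𝑌 = 1 | 𝐶 = 1) = 𝑃₁(𝑌 = 1 | 𝐶 = 1)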

In practical terms, among all the recruited candidates, the proportion of truly qualified applicants should be the same in each subpopulation. Equivalently, it means that recruitment errors are spread homogeneously among subpopulations. Thereby, predictive rate parity should guarantee that candidates are chosen by the model based on their real qualification for the job.

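Here is a minimal Python sketch of how positive predictive parity could be measured per subpopulation; again, the arrays are illustrative assumptions of ours:

```python
import numpy as np

# Hypothetical labels (1 = qualified), decisions (1 = recruited), protected attribute
Y = np.array([1, 1, 0, 0, 1, 1, 0, 1, 0, 0])
C = np.array([1, 0, 0, 1, 1, 1, 0, 1, 0, 1])
A = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

def positive_predictive_value(y: np.ndarray, c: np.ndarray) -> float:
    """P(Y = 1 | C = 1): share of recruited candidates who are actually qualified."""
    return y[c == 1].mean()

ppv0 = positive_predictive_value(Y[A == 0], C[A == 0])
ppv1 = positive_predictive_value(Y[A == 1], C[A == 1])

# Equal (or close) values across subpopulations indicate positive predictive parity
print(f"PPV (A = 0): {ppv0:.2f}, PPV (A = 1): {ppv1:.2f}")
```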

Positive predictive parity. Notice that it isn’t possible to satisfy predictive rate parity in this situation

As the notion is similar to equality of opportunity, one of its advantages in practice is that it validates the optimal classifier 𝐶 = 𝑌, as both of the required probabilities are equal to 1 in the case of perfect classification. Another advantage could be called inclusiveness, as predictive rate parity (and especially positive predictive parity) literally means that the chances for a recruited individual to succeed are the same no matter which subpopulation they belong to.

Limitations. However, this definition’s resemblance to equality of opportunity also makes its flaws similar: it doesn’t take into account unfairness preexisting among candidates, and simply replicates it. Models corrected under predictive rate parity can also amplify unfairness in the long term. In practice, this method has additional flaws: it needs access to the true value of the target, which is sometimes hard to define (for example, true qualification for a job), and it is very similar to equality of opportunity while being much more difficult to implement in practice.

No Free Lunch in Fairness

Now that we’ve explored the different types of fairness definitions, we have to highlight a property that is of crucial importance when correcting unfair algorithms in practice.

This property is called the Impossibility Theorem of Fairness and states the pairwise incompatibility of all group fairness definitions discussed here (demographic parity, equality of odds, and predictive rate parity).

It is impossible to satisfy all definitions of group fairness simultaneously, meaning that data scientists need to choose one to refer to when starting a fairness analysis.

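One standard way to see why is an identity, due to Chouldechova, that ties together a group’s false positive rate (FPR), false negative rate (FNR), positive predictive value (PPV), and prevalence 𝑝 = 𝑃(𝑌 = 1):

FPR = ( 𝑝 / (1 − 𝑝) ) × ( (1 − PPV) / PPV ) × (1 − FNR)

When the two subpopulations have different prevalences, which is almost always the case, an imperfect classifier cannot equalize both its error rates (equality of odds) and its predictive values (predictive rate parity) across them at the same time.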

Conclusion

On top of biased data issues (cf. our previous article), another obstacle stands in the way of correcting unfairness in practice: there is no consensus on the definition of fairness. Existing legal material is too vague to be used in machine learning, and there are currently six main fairness definitions across research papers on fairness: Unawareness, Demographic Parity, Equality of Odds (and of Opportunity), Predictive Rate Parity, Individual Fairness, and Counterfactual Fairness.

The Impossibility Theorem of Fairness proves that Demographic Parity, Equality of Odds, and Predictive Rate Parity are pairwise incompatible, which makes satisfying all fairness definitions impossible. Therefore, we face a practical dilemma when it comes to designing fair machine learning models — there’s no “best” answer.

Now that we’ve defined how to detect unfairness in machine learning models, the next article in our fairness blog series will focus on how to correct unfair models.

Translated from: https://medium/data-from-the-trenches/measuring-fairness-in-machine-learning-models-2be070fab712
