蛇爬树问题

Rather, how to get down from it. But first things first. This article stands out a bit of the usual format of articles from PVS-Studio. We often write about checking other projects, but almost never lift the veil on our inner workings. It's time to rectify this omission and talk about how the analyzer is built from the inside. More precisely, about the most important of its parts — the syntax tree. The article will focus on the part of PVS-Studio that relates to the C and C++ languages.

相反,如何摆脱困境。 但是首先是第一件事。 本文突出了PVS-Studio文章的常用格式。 我们经常写关于检查其他项目的文章,但几乎从来没有揭开我们内部工作的面纱。 现在该纠正这一遗漏,并从内部讨论分析仪的构造。 更确切地说,关于其最重要的部分-语法树。 本文将重点介绍与C和C ++语言有关的PVS-Studio部分。

第一件事 (First things first)

The syntax tree is the central part of any compiler. One way or another, the code needs to be presented in a form convenient for program handling, and it just so happens that the tree structure is best suited for this. I will not delve into the theory here — suffice it to say that the tree very well reflects the hierarchy of expressions and blocks in the code, and at the same time contains only the data necessary for work.

语法树是任何编译器的核心部分。 一种或另一种方式是,需要以一种便于程序处理的形式来呈现代码,而恰好恰恰是树结构最适合此形式。 我不会在这里深入研究该理论-足以说这棵树很好地反映了代码中表达式和块的层次结构,并且同时仅包含工作所需的数据。

What does the compiler have to do with the static analyzer? The fact is that these two tools have a lot in common. At the initial stage of parsing the code, they do the same job. First, the code is divided into a stream of tokens, which is fed to the parser. Then, in the process of synthetic and semantic analysis, tokens are organized into a tree, which is sent further along the pipeline. At this stage, compilers can perform intermediate optimizations before generating binary code, static analyzers begin traversing nodes and launching various checks.

编译器与静态分析器有什么关系? 事实是这两个工具有很多共同点。 在解析代码的初始阶段,它们执行相同的工作。 首先,将代码分为令牌流,该令牌流被馈送到解析器。 然后,在综合和语义分析过程中,令牌被组织成一棵树,然后沿着管道进一步发送。 在此阶段,编译器可以在生成二进制代码之前执行中间优化,静态分析器开始遍历节点并启动各种检查。

In the PVS-Studio analyzer with a tree built, several things happen:

在带有树的PVS-Studio分析仪中,发生了几件事:

  • For each declaration, types are determined. A declaration can be a variable, function, class, type alias definition via using or typedef, and so on. In brief, any declaration. All this is entered up in the table for the current scope;

    对于每个声明,确定类型。 声明可以是变量,函数,类, 使用usingtypedef的类型别名定义,等等。 简而言之,任何声明。 所有这些都输入到当前范围的表中;

  • Expressions are processed and variable values are calculated. Information that the analyzer uses for symbolic calculations and data flow analysis is stored;

    处理表达式并计算变量值。 存储分析仪用于符号计算和数据流分析的信息;
  • Overloads of the called functions are selected, predefined annotations are applied to them, and if they are absent, then whenever possible they are deduced automatically;

    选择被调用函数的重载,对它们应用预定义的注释,如果不存在,则在可能的情况下自动推断出它们;
  • The data flow is analyzed. To do this, the analyzer stores the value of each variable (if it can be calculated at compile time). In addition to the values, known data about their state is attached to the variables. For example, let's say that a function starts with a check of a pointer for nullptr followed by exiting the function if the pointer is null. In this case it will be considered valid further along the code. This data is also used in interprocedural analysis;

    分析数据流。 为此,分析器将存储每个变量的值(如果可以在编译时进行计算)。 除了这些值之外,有关其状态的已知数据还附加到变量中。 例如,假设一个函数从检查指针是否为nullptr开始,如果指针为null,则退出该函数。 在这种情况下,它将在代码中进一步有效。 该数据还用于过程间分析;

  • Diagnostic rules are run. Depending on the logic of their work, they can do an additional traversal of the tree. For different types of expressions, their own sets of diagnostics are launched, which sometimes may intersect.

    运行诊断规则。 根据工作的逻辑,他们可以对树进行其他遍历。 对于不同类型的表达式,将启动它们自己的诊断程序集,有时可能会相交。

If you are interested in the details of how the analysis works, I recommend reading the article "Technologies used in the PVS-Studio code analyzer for finding bugs and potential vulnerabilities". Some points from the list are covered there in detail.

如果您对分析的工作方式的细节感兴趣,我建议您阅读文章“ PVS-Studio代码分析器中用于查找错误和潜在漏洞的技术 ”。 列表中的一些要点在此处详细介绍。

We will look in more detail what happens to the tree inside the analyzer, and how it looks in general. At this point a brief introduction is over, it's time to get to the crux of the matter.

我们将更详细地分析分析器内部的树会发生什么,以及它的总体外观。 至此,简短的介绍已经结束,现在是解决问题的关键了。

这个怎么运作 (How it works)

Historically, PVS-Studio uses a binary tree to represent code. This classic data structure is familiar to everyone — we have a node that generally refers to two child ones. I will call nodes that are not supposed to have descendants — terminals, all others — nonterminals. A nonterminal may in some cases not have child nodes, but its key difference from the terminal is that descendants are fundamentally allowed for it. Terminal nodes (or leaves) lack the ability to refer to something other than the parent.

从历史上看,PVS-Studio使用二叉树表示代码。 每个人都熟悉这种经典的数据结构-我们有一个通常引用两个子节点的节点。 我将不应该具有后代的节点称为终端,而所有其他节点都称为非终端。 非终端在某些情况下可能没有子节点,但与终端的主要区别在于从根本上允许后代。 终端节点(或叶子)缺乏引用父节点以外的东西的能力。

The structure used in PVS-Studio is slightly different from the classical binary tree — this is necessary for convenience. Terminal nodes usually correspond to keywords, variable names, literals, and so on. Non-terminals — various types of expressions, blocks of code, lists, and alike constituent elements of a tree.

PVS-Studio中使用的结构与经典的二叉树略有不同-为方便起见,这是必需的。 终端节点通常对应于关键字,变量名,文字等。 非终结符-各种类型的表达式,代码块,列表以及树的类似组成元素。

With regard to compilers design, everything here is pretty standard. I encourage all interested to check out the iconic "Dragon Book".

关于编译器设计,这里的所有内容都是非常标准的。 我鼓励所有感兴趣的人阅读标志性的“ 龙书 ”。

As for us, we move on. Let's look at a simple code example and how the analyzer perceives it. Further there will be many pictures from our internal tree visualization utility.

至于我们,我们继续前进。 让我们看一个简单的代码示例,以及分析器如何看待它。 此外,我们的内部树可视化实用程序将提供许多图片。

So here is the example:

因此,这是示例:

int f(int a, int b)
{
  return a + b;
}

Being handled by the parser this simple function will look like this (non-terminal nodes are highlighted in yellow):

由解析器处理的这个简单函数将如下所示(非终端节点以黄色突出显示):

Such representation has its pros and cons. Cons, in my opinion, outnumber the pros. Anyway, let's look at the tree itself. I hasten to say that it is rather redundant, for example, as it contains punctuation and parentheses. The compiler considers it as superfluous garbage, but the analyzer might need this information for some diagnostic rules. In other words, the analyzer does not work with the abstract syntax tree (AST), but with the derivation tree (DT).

这样的代表有其优点和缺点。 我认为缺点多于优点。 无论如何,让我们看一下树本身。 我不得不说这是多余的,例如,因为它包含标点符号和括号。 编译器认为它是多余的垃圾,但是分析器可能需要此信息来用于某些诊断规则。 换句话说,分析器不使用抽象语法树(AST),而是使用派生树 (DT)。

The tree grows from left to right and from top to bottom. Left child nodes always contain something meaningful, such as declarators. If we look at the right part of it, we'll see intermediate nonterminals marked by the word NonLeaf. They are only needed for the free to retain its structure. Such nodes don't convey any informational load for the analysis needs.

这棵树从左到右,从上到下生长。 左子节点始终包含有意义的内容,例如声明符。 如果我们看一下它的正确部分,我们会看到中间的非终结符,并标有单词NonLeaf 。 仅需要它们即可自由保留其结构。 这样的节点不传达任何分析需求的信息负载。

At this point, we're interested in the left part of the tree. Here it is in a larger closeup:

此时,我们对树的左侧感兴趣。 这是一个更大的特写:

This is a function declaration. The PtreeDeclarator parent node is an object through which you can access nodes with the name of the function and its parameters. It also stores the encoded signature for the type system. It seems to me that this picture is pretty self-explanatory, and it's pretty easy to compare the elements of the tree with the code.

这是一个函数声明。 PtreeDeclarator父节点是一个对象,您可以通过该对象访问带有函数名称及其参数的节点。 它还存储类型系统的编码签名。 在我看来,这张图片是不言自明的,并且很容易将树的元素与代码进行比较。

Looks simple, right?

看起来很简单,对吧?

For more clarity, let's take a simpler example. Imagine that we have the code that calls our f function:

为了更清楚,让我们举一个简单的例子。 假设我们有调用f函数的代码:

f(42, 23);

The function call in the tree will look like this:

树中的函数调用将如下所示:

The structure is very similar, only here we see the function call instead of its declaration. Now suppose we wanted to go through all the arguments and do something with each of them. This is a real task that is often found in analyzer code. Needless to say, all this doesn't revolve around arguments, so different types of nodes have to be traversed. But right now we'll consider this specific example.

结构非常相似,仅在这里我们看到函数调用而不是其声明。 现在假设我们想遍历所有参数并对每个参数进行一些处理。 这是一个真正的任务,通常可以在分析器代码中找到。 不用说,所有这些都不会围绕参数展开,因此必须遍历不同类型的节点。 但是现在,我们将考虑这个特定示例。

Suppose we only have a pointer to the parent FUNCALL node. From any nonterminal, we can get the left and right child nodes. The type of each of them is known. We know the structure of the tree, therefore we can immediately get to the node with the list of arguments, which is the NonLeaf, from which the terminal 42 grows (as shown in the picture). We don't know the number of arguments in advance, and there are commas in the list that in this case are absolutely not of interest to us.

假设我们只有一个指向父FUNCALL节点的指针。 从任何非终端,我们都可以获取左右子节点。 每个类型都是已知的。 我们知道树的结构,因此我们可以立即进入带有参数列表的节点,即NonLeaf ,终端42从该节点增长(如图所示)。 我们事先不知道参数的数量,列表中有逗号,在这种情况下我们绝对不感兴趣。

How will we do this? Keep reading.

我们将如何做? 继续阅读。

车轮发明实验室 (Wheel invention lab)

It would seem that iterating along the tree is quite simple. You just need to write a function that will do just that, and use it everywhere. Perhaps, also pass it a lambda as an argument to handle each element. It really would be so, if not for a couple of nuances.

沿着树迭代似乎很简单。 您只需要编写一个可以执行此操作的函数,然后在任何地方使用它即可。 也许也将lambda作为参数传递给它来处理每个元素。 如果不是有一些细微差别,那的确会是这样。

Firstly, every time traversing the tree has to be a little different. The logic of handling each node is different, as well as the logic of working with the entire list. Say, in one case, we want to go through the list of arguments and pass each of them to a certain function for handling. In another, we want to select and return one argument that meets some requirements. Or filter the list and discard any uninteresting elements from it.

首先,每次遍历树都必须有所不同。 处理每个节点的逻辑以及处理整个列表的逻辑是不同的。 假设,在一种情况下,我们要遍历参数列表,并将每个参数传递给某个函数进行处理。 在另一种情况下,我们希望选择并返回一个满足某些要求的参数。 或过滤列表,并从列表中丢弃所有无关的元素。

Secondly, sometimes you need to know the index of the current element. For example, we want to handle only the first two arguments and stop.

其次,有时您需要知道当前元素的索引。 例如,我们只想处理前两个参数并停止。

Third, let's digress from the function example. Let's say we have a code fragment like this:

第三,让我们脱离函数示例。 假设我们有一个这样的代码片段:

int f(int a, int b)
{
  int c = a + b;
  c *= 2;
  if (c < 42) return c;
  return 42;
}

I know, this code is dullish, but let's concentrate now on how the tree looks. We have already seen the function declaration, here we need its body:

我知道,这段代码有些呆板,但现在让我们集中讨论树的外观。 我们已经看过函数声明,在这里我们需要它的主体:

This case is like a list of arguments, but you might notice some difference. Take another look at the picture from the previous section.

这种情况就像一个参数列表,但是您可能会注意到一些区别。 再看看上一节中的图片。

Did you notice something?

你有注意到吗?

That's right, there are no commas in this list, which means that you can process it in a row and not worry about skipping separators.

没错,此列表中没有逗号,这意味着您可以连续处理它,而不必担心跳过分隔符。

In total, we have at least two cases:

总共,我们至少有两种情况:

  • The list with separators.

    带分隔符的列表。
  • The homogeneous list.

    同类列表。

Now let's see how all this operates in the analyzer code. Here is an example of traversing the list of arguments. This is a simplified version of one of the functions in the translator.

现在,让我们看看所有这些如何在分析器代码中运行。 这是遍历参数列表的示例。 这是翻译器中功能之一的简化​​版本。

void ProcessArguments(Ptree* arglist)
{
  if (!arglist) return;

  Ptree* args = Second(arglist);
  while (args)
  {
    Ptree* p = args->Car();
    if (!Eq(p, ','))
    {
      ProcessArg(p);
    }

    args = args->Cdr();
  }
}

If I were paid a dollar every time I see such code, I would already get rich.

如果我每次看到这样的代码时都得到一美元的报酬,那我已经很富有了。

Let's see what is happening here. I should warn you, this is very old code written long before even C++11, not to mention more modern standards. I guess, I was specifically looking for a fragment of the times of ancient civilizations.

让我们看看这里发生了什么。 我应该警告您,这是非常古老的代码,甚至在C ++ 11之前就已经编写了,更不用说更现代的标准了。 我想,我特别在寻找古代文明时代的片段。

So, firstly, this function accepts the list of arguments in parentheses as input. Something like that:

因此,首先,此函数接受括号中的参数列表作为输入。 像这样:

(42, 23)

(42,23)

The Second function is called here to get the contents of the parentheses. All that it does is shifting once to the right and then once to the left through the binary tree. Next, the loop sequentially gets the elements: 42, then a comma, then 23, and in the next step, the args pointer becomes null, because we get to the end of the branch. The loop, of course, skips uninteresting commas.

在这里调用Second函数来获取括号的内容。 它所做的一切只是在二叉树中向右移动一次,然后向左移动一次。 接下来,循环依次获取元素:42,逗号,23,然后在下一步中, args指针变为空,因为我们到达了分支的末尾。 当然,该循环会跳过无趣的逗号。

Similar functions with slightly changed logic can be found in many places, especially in the old code.

在许多地方,尤其是在旧代码中,都可以找到逻辑稍有变化的类似功能。

Another example. How do I know if there is a call to a certain function in a certain block of code? Somehow as follows:

另一个例子。 我怎么知道某个代码块中是否有对某个函数的调用? 如下所示:

bool IsFunctionCalled(const Ptree* body, std::string_view name)
{
  if (!arglist) return;

  const Ptree* statements = body;
  while (statements)
  {
    const Ptree* cur = statements->Car();
    if (IsA(cur, ntExprStatement) && IsA(cur->Car(), ntFuncallExpr))
    {
      const Ptree* funcName = First(cur->Car());
      if (Eq(funcName, name))
        return true;
    }

    statements = statements->Cdr();
  }
  return false;
}

Note. An attentive reader might have noticed something. So where is it old? There is std::string_view sticking out. It's plain and simple, even the oldest code is gradually refactored and eventually nothing of this kind will remain.

注意。 细心的读者可能已经注意到了一些东西。 那哪里老了? 有std :: string_view突出显示。 它简单明了,即使最古老的代码也逐渐被重构,最终将一无所有。

It would be nice to use something more elegant here, right? Well, for example, the standard find_if algorithm. In fact, even a regular range-based for would greatly improve readability and facilitate the maintaining of such code, not to mention the algorithm.

在这里使用更优雅的东西会很好,对吗? 好吧,例如,标准的find_if算法。 实际上,即使基于规则范围的for也会大大提高可读性并促进此类代码的维护,更不用说算法了。

Let's try to achieve this.

让我们尝试实现这一目标。

把树放在盒子里 (Put the tree in the box)

Our goal is to make the tree behave like an STL container. In doing so, we should not care about the internal structure of the lists, we want to uniformly iterate through the nodes, for example, like this:

我们的目标是使树的行为类似于STL容器。 这样做时,我们不必关心列表的内部结构,而希望统一遍历节点,例如,像这样:

void DoSomethingWithTree(const Ptree* tree)
{
  ....
  for (auto cur : someTreeContainer)
  {
    ....
  }
}

As you can see, here we have a certain entity called someTreeContainer, which we do not know about yet. Such a container should have at least begin and end methods that return iterators. Speaking of iterators, they should also behave like standard ones. Let's start right with them.

如您所见,这里有一个称为someTreeContainer的特定实体,我们尚不知道。 这样的容器至少应具有返回迭代器的beginend方法。 说到迭代器,它们的行为也应该像标准迭代器一样。 让我们从他们开始。

In the simplest case, the iterator looks like this:

在最简单的情况下,迭代器如下所示:

template <typename Node_t,
          std::enable_if_t<std::is_base_of_v<Node_t, Ptree>, int>>
class PtreeIterator
{
public:
  using value_type = Node_t;
  using dereference_type = value_type;
  using reference = std::add_lvalue_reference_t<value_type>;
  using pointer   = std::add_pointer_t<value_type>;
  using difference_type =
    decltype(std::declval<pointer>() - std::declval<pointer>());
  using iterator_category = std::forward_iterator_tag;

public:
  PtreeIterator(Node_t* node) noexcept : m_node{ node } {}
  ....

  PtreeIterator& operator++() noexcept
  {
    m_node = Rest(m_node);
    return *this;
  }
  dereference_type operator*() const noexcept
  {
    return static_cast<dereference_type>(First(m_node));
  }

private:
  Node_t* m_node = nullptr;
};

In order not to clutter the code, I removed some details. The key points here are the dereferencing and the increment. The template is needed so that the iterator can work with both constant and non-constant data.

为了不使代码混乱,我删除了一些细节。 这里的关键点是取消引用和增量。 需要模板,以便迭代器可以处理常量和非常量数据。

Now we will write the container in which we will place the tree node. Here is the simplest option:

现在,我们将编写用于放置树节点的容器。 这是最简单的选择:

template <typename Node_t>
class PtreeContainer
{
public:
  using Iterator = PtreeIterator<Node_t>;
  using value_type = typename Iterator::dereference_type;
  using size_type  = size_t;
  using difference_type =
        typename Iterator::difference_type;

public:
  PtreeContainer(Node_t* nodes) :
    m_nodes{ nodes }
  {
    if (IsLeaf(m_nodes))
    {
      m_nodes = nullptr;
    }
  }

  ....

  Iterator begin() const noexcept
  { 
    return m_nodes;
  }
  Iterator end() const noexcept
  { 
    return nullptr; 
  }
  bool empty() const noexcept
  {
    return begin() == end();
  }

private:
  Node_t* m_nodes = nullptr;
};

Ok, we are done, we can all rest easy, thanks for your attention.

好的,我们完成了,我们都可以安枕无忧,感谢您的关注。

No, hold on. It can't be that simple, right? Let's go back to our two list variants — with and without separators. Here, when incrementing, we simply take the right node of the tree, so this does not solve the problem. We still need to skip commas if we want to work only with data.

不,等一下 不可能那么简单,对吧? 让我们回到我们的两个列表变体–带和不带分隔符。 在这里,当增加时,我们仅取树的右节点,因此不能解决问题。 如果我们只想处理数据,我们仍然需要跳过逗号。

Not a problem, we just add an additional template parameter to the iterator. For instance, as follows:

没问题,我们只需向迭代器添加一个额外的模板参数即可。 例如,如下:

enum class PtreeIteratorTag : uint8_t
{
  Statement,
  List
};

template <typename Node_t, PtreeIteratorTag tag,
  std::enable_if_t<std::is_base_of_v<Node_t, Ptree>, int> = 0>
class PtreeIterator { .... };

How can this help us? As easy as pie. We will check this parameter in the increment operator and behave accordingly. Fortunately, in C++ 17 we can solve this at compile time using the if constexpr construct:

这对我们有什么帮助? 像馅饼一样容易。 我们将在增量运算符中检查此参数并相应地执行操作。 幸运的是,在C ++ 17中,我们可以使用if constexpr构造在编译时解决此问题:

PtreeIterator& operator++() noexcept
{
  if constexpr (tag == PtreeIteratorTag::Statement)
  {
    m_node = Rest(m_node);
  }
  else
  {
    m_node = RestRest(m_node);
  }
  return *this;
}

That's better, now we can choose an iterator to meet our needs. What shall we do with containers? You can, for example, do something like this:

更好,现在我们可以选择一个迭代器来满足我们的需求。 我们该如何处理容器? 例如,您可以执行以下操作:

template <typename Node_t, PtreeIteratorTag tag>
class PtreeContainer
{
public:
  using Iterator = PtreeIterator<Node_t, tag>;
  ....
};

Ok, are we done yet? Actually, not really.

好的,我们完成了吗? 其实不是。

但这还不是终点 (But this is not the end)

Let's look at this code:

让我们看一下这段代码:

void ProcessEnum(Ptree* argList, Ptree* enumPtree)
{
  const ptrdiff_t argListLen = Length(argList);
  if (argListLen < 0) return;

  for (ptrdiff_t i = 0; i < argListLen; ++i)
  {
    std::string name;
    Ptree* elem;

    const EGetEnumElement r = GetEnumElementInfo(enumPtree, i, elem, name);
    ....
  }
}

I really don't like a lot in this code, starting from the loop with a counter, and ending with the fact that the GetEnumElementInfo function looks very suspicious. At the moment it remains a black box for us, but we can assume that it gets the enum element by index and returns its name and node in the tree via out-parameters. The return value is also a bit strange. Let's get rid of it at all — it's an ideal job for our list iterator:

我真的不喜欢这段代码,从带有计数器的循环开始,到以GetEnumElementInfo函数看起来非常可疑为结尾 。 目前,它仍然是我们的黑匣子,但是我们可以假定它按索引获取枚举元素,并通过输出参数在树中返回其名称和节点。 返回值也有点奇怪。 让我们完全摆脱它-这对于我们的列表迭代器来说是一项理想的工作:

void ProcessEnum(const Ptree* argList)
{
  for (auto elem : PtreeContainer<const Ptree, PtreeIteratorTag::List>(argList))
  {
    auto name = PtreeToString(elem);
    ....
  }
}

Not bad. The snag is that the code doesn't compile. Why? Because the index we removed was used in the body of the loop below the GetEnumElementInfo call. I will not say here exactly how it was used, because it is not crucial now. Suffice it to say that an index is needed.

不错。 问题是代码无法编译。 为什么? 因为我们删除的索引用于GetEnumElementInfo调用下面的循环主体中。 在这里我不会确切地说出它的用法,因为它现在并不重要。 只需说需要一个索引即可。

Well, let's add a variable and mess up our beautiful code:

好吧,让我们添加一个变量并弄乱我们漂亮的代码:

void ProcessEnum(const Ptree* argList)
{
  size_t i = 0;
  for (auto elem : PtreeContainer<const Ptree, PtreeIteratorTag::List>(argList))
  {
    auto name = PtreeToString(elem);
    ....
    UseIndexSomehow(i++);
  }
}

Still a working option, but this is how I personally react to something like this:

仍然是一个可行的选择,但这是我个人对以下内容的React:

Well, let's try to solve this problem. We need something that can count elements automatically. Let's add an iterator with a counter. I again skipped extra details for brevity:

好吧,让我们尝试解决这个问题。 我们需要可以自动计数元素的东西。 让我们添加一个带有计数器的迭代器。 为了简洁起见,我再次跳过了其他详细信息:

template <typename Node_t, PtreeIteratorTag tag,
  std::enable_if_t<std::is_base_of_v<Node_t, Ptree>, int>>
class PtreeCountingIterator
{
public:
  using size_type = size_t;
  using value_type = Node_t;
  using dereference_type = std::pair<value_type, size_type>;
  using reference = std::add_lvalue_reference_t<value_type>;
  using pointer = std::add_pointer_t<value_type>;
  using difference_type =
        decltype(std::declval<pointer>() - std::declval<pointer>());
  using iterator_category = std::forward_iterator_tag;

public:
  PtreeCountingIterator(Node_t* node) noexcept : m_node{ node } {}
  ....

  PtreeCountingIterator& operator++() noexcept
  {
    if constexpr (tag == PtreeIteratorTag::Statement)
    {
      m_node = Rest(m_node);
    }
    else
    {
      m_node = RestRest(m_node);
    }

    ++m_counter;
    return *this;
  }

  dereference_type operator*() const noexcept
  {
    return { static_cast<value_type>(First(m_node)), counter() };
  }

private:
  Node_t* m_node = nullptr;
  size_type m_counter = 0;
};

Now we can write such code, right?

现在我们可以编写这样的代码了,对吧?

void ProcessEnum(const Ptree* argList)
{
  for (auto [elem, i] :
            PtreeCountedContainer<const Ptree, PtreeIteratorTag::List>(argList))
  {
    auto name = PtreeToString(elem);
    ....
    UseIndexSomehow(i);
  }
}

Generally speaking, we definitely can, but there is still one problem. If you look at this code, you may notice that we introduced yet another entity — something named PtreeCountedContainer. It seems that the situation is getting more sophisticated. What I really don't want to do is to juggle with different types of containers and given that they are the same inside, the hand itself reaches for the Occam's razor.

一般来说,我们当然可以,但是仍然存在一个问题。 如果看一下这段代码,您可能会注意到我们引入了另一个实体–名为PtreeCountedContainer的实体。 情况似乎越来越复杂。 我真的不想做的就是弄乱不同类型的容器,并且由于它们的内部相同,所以手本身就可以拿到Occam的剃须刀了。

We'll have to use the iterator as a template parameter for the container, but more on that later.

我们将必须使用迭代器作为容器的模板参数,但稍后会介绍更多。

类型的动物园 (Zoo of types)

Let's distract from counters, types, and iterators for a minute. In pursuit of a universal traverse of nodes, we forgot about the most significant thing — the tree itself.

让我们暂时分散计数器,类型和迭代器的注意力。 在追求节点的通用遍历时,我们忘记了最重要的事情-树本身。

Take a look at this code:

看一下这段代码:

int a, b, c = 0, d;

What we see in the tree:

我们在树中看到的是:

Let's now iterate over the list of declarators, but first I will tell you something else about the tree. All the time before that, we were dealing with a pointer to the Ptree class. This is the base class from which all other types of nodes are inherited. Through their interfaces we can get additional information. In particular, the topmost node in the picture can return to us the list of declarators without using utility functions such as First and Second. Also, we won't need Car and Cdr low-level methods (hi to fans of the Lisp language). This is good news, since in diagnostics we can ignore the implementation of the tree. I think everyone agrees that leaking abstractions are very bad.

现在让我们遍历声明器的列表,但是首先,我将告诉您有关树的其他信息。 在此之前的所有时间里,我们一直在处理指向Ptree类的指针。 这是继承所有其他类型节点的基类。 通过它们的界面,我们可以获得更多信息。 特别是,图片中最顶层的节点可以在不使用诸如FirstSecond之类的实用函数的情况下向我们返回声明器列表。 另外,我们将不需要CarCdr低级方法(向Lisp语言的迷们致敬)。 这是个好消息,因为在诊断中我们可以忽略树的实现。 我认为每个人都同意泄漏抽象是非常糟糕的。

This is how traversing of all declarators looks like:

这是遍历所有声明符的样子:

void ProcessDecl(const PtreeDeclarator* decl) { .... }

void ProcessDeclarators(const PtreeDeclaration* declaration)
{
  for (auto decl : declaration->GetDeclarators())
  {
    ProcessDecl(static_cast<const PtreeDeclarator*>(decl));
  }
}

The GetDeclarators method returns an iterable container. In this case, its type is PtreeContainer<const Ptree, PtreeIteratorTag::List>.

GetDeclarators方法返回一个可迭代的容器。 在这种情况下,其类型为PtreeContainer <const Ptree,PtreeIteratorTag :: List>

All fine and dandy, except for the cast. The fact is that the ProcessDecl function wants a pointer to a class derived from Ptree, but our iterators know nothing about it. I'd like to avoid converting types manually.

除演员外,其他一切都很好。 事实是, ProcessDecl函数想要一个指向从Ptree派生的类的指针,但是我们的迭代器对此一无所知。 我想避免手动转换类型。

Seems like it's time we changed the iterator and added it the ability to cast.

好像是时候了,我们更改了迭代器,并为其添加了投射功能。

template <typename Node_t, typename Deref_t, PtreeIteratorTag tag,
  std::enable_if_t<std::is_base_of_v<Node_t, Ptree>, int>>
class PtreeIterator
{
public:
  using value_type = Deref_t;
  using dereference_type = value_type;
  using reference = std::add_lvalue_reference_t<value_type>;
  using pointer = std::add_pointer_t<value_type>;
  using difference_type =
        decltype(std::declval<pointer>() - std::declval<pointer>());
  using iterator_category = std::forward_iterator_tag;
  ....
}

In order not to write all these template arguments manually each time, we will add several aliases for all occasions:

为了避免每次都手动编写所有这些模板参数,我们将为所有情况添加几个别名:

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeStatementIterator =
PtreeIterator<Node_t, Deref_t, PtreeIteratorTag::Statement>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeListIterator =
PtreeIterator<Node_t, Deref_t, PtreeIteratorTag::List>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeStatementCountingIterator =
PtreeCountingIterator<Node_t, Deref_t, PtreeIteratorTag::Statement>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeListCountingIterator =
PtreeCountingIterator<Node_t, Deref_t, PtreeIteratorTag::List>;

That's better. Now, if we don't need the cast, we can specify only the first template argument. We also don't have to cram our heads with the value of the tag parameter.

这样更好 现在,如果我们不需要强制转换,则只能指定第一个模板参数。 我们也不必使用tag参数的值。

What shall we do with containers? To recap, we want to have only one universal class that is suitable for any iterator. What we have here is a ridiculously great number of different combinations, whereas we need simplicity. Something like this:

我们该如何处理容器? 回顾一下,我们只想拥有一个适用于任何迭代器的通用类。 我们这里所拥有的是大量的不同组合,而我们需要简单。 像这样:

That is, we want a single container class to be able to support all types of our iterators and be able to tell them which type to return when dereferencing. Then, in the code, we simply create the container we need and start working with it without a thought of which iterators we need.

也就是说,我们希望单个容器类能够支持所有类型的迭代器,并能够告诉它们在取消引用时返回哪种类型。 然后,在代码中,我们仅创建所需的容器并开始使用它,而无需考虑我们需要哪些迭代器。

We will address this question in the next section.

我们将在下一部分中解决这个问题。

模板魔术 (Template magic)

So here is what we need:

所以这是我们需要的:

  • One container that can work universally with any iterator.

    一个可以与任何迭代器通用的容器。
  • An iterator, which, depending on the list of nodes, can work both with each element, and through one.

    一个迭代器,它取决于节点列表,可以与每个元素一起使用,也可以与一个元素一起使用。
  • The same iterator, but with a counter.

    相同的迭代器,但带有计数器。
  • Both iterators should be able to cast when dereferencing, if the type is additionally specified.

    如果另外指定了类型,则两个迭代器都应能够在取消引用时进行转换。

First of all, we need to somehow bind the container type to the iterator type through template parameters. Here's what we finally got:

首先,我们需要通过模板参数以某种方式将容器类型绑定到迭代器类型。 这是我们最终得到的:

template <template <typename, typename> typename FwdIt,
          typename Node_t,
          typename Deref_t = std::add_pointer_t<Node_t>>
class PtreeContainer
{
public:
  using Iterator = FwdIt<Node_t, Deref_t>;
  using value_type = typename Iterator::dereference_type;
  using size_type  = size_t;
  using difference_type = typename Iterator::difference_type;

public:
  PtreeContainer(Node_t* nodes) :
    m_nodes{ nodes }
  {
    if (IsLeaf(m_nodes))
    {
      m_nodes = nullptr;
    }
  }

  ....
  Iterator begin() const noexcept
  { 
    return m_nodes;
  }
  Iterator end() const noexcept
  { 
    return nullptr; 
  }
  bool empty() const noexcept
  {
    return begin() == end();
  }
  ....

private:
  Node_t* m_nodes = nullptr;
};

Also, you can add more methods in the container. For example, this is how we can find out the number of elements:

另外,您可以在容器中添加更多方法。 例如,这是我们找出元素数量的方法:

difference_type count() const noexcept
{
  return std::distance(begin(), end());
}

Or here is the indexing operator:

或这里是索引运算符:

value_type operator[](size_type index) const noexcept
{
  size_type i = 0;
  for (auto it = begin(); it != end(); ++it)
  {
    if (i++ == index)
    {
      return *it;
    }
  }

  return value_type{};
}

Clearly, one has to handle such methods carefully because of their linear complexity, but sometimes they are useful.

显然,由于它们的线性复杂性,必须谨慎处理这些方法,但有时它们很有用。

For ease of use, we'll add aliases:

为了易于使用,我们将添加别名:

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeStatementList =
PtreeContainer<PtreeStatementIterator, Node_t, Deref_t>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeItemList =
PtreeContainer<PtreeListIterator, Node_t, Deref_t>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeCountedStatementList =
PtreeContainer<PtreeStatementCountingIterator, Node_t, Deref_t>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeCountedItemList =
PtreeContainer<PtreeListCountingIterator, Node_t, Deref_t>;

Now we can create containers easily. Say, in the already mentioned PtreeDeclaration class, we want to get a container from the GetDeclarators method, the iterator of which skips separators, while there is no counter in it, and when dereferenced, it returns a value of the PtreeDeclarator type. Here is the declaration of such a container:

现在,我们可以轻松创建容器。 假设在已经提到的PtreeDeclaration类中,我们想从GetDeclarators方法中获取一个容器,该容器的迭代器跳过分隔符,而其中没有计数器,并且在取消引用时,它将返回PtreeDeclarator类型的值。 这是这样一个容器的声明:

using DeclList =
      Iterators::PtreeItemList<Ptree, PtreeDeclarator*>;
using ConstDeclList =
      Iterators::PtreeItemList<const Ptree, const PtreeDeclarator*>;

Now we can write such code and not think about the type of a list, or casts:

现在我们可以编写这样的代码,而不必考虑列表或类型转换的类型:

void ProcessDecl(const PtreeDeclarator* decl) { .... }

void ProcessDeclarators(const PtreeDeclaration* declaration)
{
  for (auto decl : declaration->GetDeclarators())
  {
    ProcessDecl(decl);
  }
}

And finally, since type inference for aliases will appear only in C++ 20, in order to more conveniently create containers in the code, we added such functions:

最后,由于别名的类型推断将仅在C ++ 20中出现,为了更方便地在代码中创建容器,我们添加了以下功能:

template <typename Node_t>
PtreeStatementList<Node_t> MakeStatementList(Node_t* node)
{
  return { node };
}

template <typename Node_t>
PtreeItemList<Node_t> MakeItemList(Node_t* node)
{
  return { node };
}

template <typename Node_t>
PtreeCountedStatementList<Node_t> MakeCountedStatementList(Node_t* node)
{
  return { node };
}

template <typename Node_t>
PtreeCountedItemList<Node_t> MakeCountedItemList(Node_t* node)
{
  return { node };
}

Let's recall the function that worked with enums. Now we can write it like this:

让我们回想一下枚举所使用的函数。 现在我们可以这样写:

void ProcessEnum(const Ptree* argList)
{
  for (auto [elem, i] : MakeCountedItemList(argList))
  {
    auto name = PtreeToString(elem);
    ....
    UseIndexSomehow(i);
  }
}

Compare with the original version. It seems to me, it has become a way better:

与原始版本比较。 在我看来,它已经变得更好:

void ProcessEnum(Ptree* argList, Ptree* enumPtree)
{
  const ptrdiff_t argListLen = Length(argList);
  if (argListLen < 0) return;

  for (ptrdiff_t i = 0; i < argListLen; ++i)
  {
    std::string name;
    Ptree* elem;

    const EGetEnumElement r = GetEnumElementInfo(enumPtree, i, elem, name);
    ....
    UseIndexSomehow(i);
  }
}

就这样,伙计们 (That's all, folks)

That's all for me, thank you for your attention. I hope you found out something interesting or even useful.

这就是我的全部,谢谢您的关注。 希望您发现了一些有趣甚至有用的东西。

From the content of the article, it may seem that I am scolding the code of our analyzer and want to say that everything is bad there. But it is not so. Like any project with a history, our analyzer is full of geological deposits that have remained from past eras. Consider that we have just excavated, pulled out the artifacts of ancient civilization from underground and carried out restoration to make them look good on a shelf.

从文章的内容来看,似乎我在骂我们的分析器代码,并想说那里的一切都不好。 但事实并非如此。 像任何具有历史意义的项目一样,我们的分析仪充满了过去时代遗留下来的地质沉积物。 考虑到我们刚刚发掘,从地下挖出了古代文明的文物,并进行了修复,使它们在架子上看起来不错。

聚苯乙烯 (P.S.)

There will be a lot of code here. I doubted whether to include the implementation of iterators here or not, and in the end I decided to include it so as not to leave anything behind the scenes. If you are not interested in reading the code, here I will say goodbye to you. I wish the rest of you have a good time with templates.

这里将有很多代码。 我怀疑是否在这里包含迭代器的实现,最后我决定包括它,以便不遗漏任何东西。 如果您对阅读代码不感兴趣,在这里我将对您说再见。 我希望其余的人在模板方面过得愉快。

(Code)

常规迭代器 (Regular iterator)

template <typename Node_t, typename Deref_t, PtreeIteratorTag tag,
std::enable_if_t<std::is_base_of_v<Node_t, Ptree>, int> = 0>
class PtreeIterator
{
public:
  using value_type = Deref_t;
  using dereference_type = value_type;
  using reference = std::add_lvalue_reference_t<value_type>;
  using pointer = std::add_pointer_t<value_type>;
  using difference_type =
        decltype(std::declval<pointer>() - std::declval<pointer>());
  using iterator_category = std::forward_iterator_tag;

public:
  PtreeIterator(Node_t* node) noexcept : m_node{ node } {}
  PtreeIterator() = delete;
  PtreeIterator(const PtreeIterator&) = default;
  PtreeIterator& operator=(const PtreeIterator&) = default;
  PtreeIterator(PtreeIterator&&) = default;
  PtreeIterator& operator=(PtreeIterator&&) = default;

  bool operator==(const PtreeIterator & other) const noexcept
  {
    return m_node == other.m_node;
  }
  bool operator!=(const PtreeIterator & other) const noexcept
  {
    return !(*this == other);
  }
  PtreeIterator& operator++() noexcept
  {
    if constexpr (tag == PtreeIteratorTag::Statement)
    {
      m_node = Rest(m_node);
    }
    else
    {
      m_node = RestRest(m_node);
    }
    return *this;
  }
  PtreeIterator operator++(int) noexcept
  {
    auto tmp = *this;
    ++(*this);
    return tmp;
  }
  dereference_type operator*() const noexcept
  {
    return static_cast<dereference_type>(First(m_node));
  }
  pointer operator->() const noexcept
  {
    return &(**this);
  }

  Node_t* get() const noexcept
  {
    return m_node;
  }

private:
  Node_t* m_node = nullptr;
};

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeStatementIterator =
PtreeIterator<Node_t, Deref_t, PtreeIteratorTag::Statement>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeListIterator =
PtreeIterator<Node_t, Deref_t, PtreeIteratorTag::List>;

带计数器的迭代器 (Iterator with counter)

template <typename Node_t, typename Deref_t, PtreeIteratorTag tag,
std::enable_if_t<std::is_base_of_v<Node_t, Ptree>, int> = 0>
class PtreeCountingIterator
{
public:
  using size_type = size_t;
  using value_type = Deref_t;
  using dereference_type = std::pair<value_type, size_type>;
  using reference = std::add_lvalue_reference_t<value_type>;
  using pointer = std::add_pointer_t<value_type>;
  using difference_type =
        decltype(std::declval<pointer>() - std::declval<pointer>());
  using iterator_category = std::forward_iterator_tag;

 public:
  PtreeCountingIterator(Node_t* node) noexcept : m_node{ node } {}
  PtreeCountingIterator() = delete;
  PtreeCountingIterator(const PtreeCountingIterator&) = default;
  PtreeCountingIterator& operator=(const PtreeCountingIterator&) = default;
  PtreeCountingIterator(PtreeCountingIterator&&) = default;
  PtreeCountingIterator& operator=(PtreeCountingIterator&&) = default;

  bool operator==(const PtreeCountingIterator& other) const noexcept
  {
    return m_node == other.m_node;
  }
  bool operator!=(const PtreeCountingIterator& other) const noexcept
  {
    return !(*this == other);
  }
  PtreeCountingIterator& operator++() noexcept
  {
    if constexpr (tag == PtreeIteratorTag::Statement)
    {
      m_node = Rest(m_node);
    }
    else
    {
      m_node = RestRest(m_node);
    }

    ++m_counter;
    return *this;
  }
  PtreeCountingIterator operator++(int) noexcept
  {
    auto tmp = *this;
    ++(*this);
    return tmp;
  }
  dereference_type operator*() const noexcept
  {
    return { static_cast<value_type>(First(m_node)), counter() };
  }
  value_type operator->() const noexcept
  {
    return (**this).first;
  }

  size_type counter() const noexcept
  {
    return m_counter;
  }
  Node_t* get() const noexcept
  {
    return m_node;
  }

private:
  Node_t* m_node = nullptr;
  size_type m_counter = 0;
};

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeStatementCountingIterator =
PtreeCountingIterator<Node_t, Deref_t, PtreeIteratorTag::Statement>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeListCountingIterator =
PtreeCountingIterator<Node_t, Deref_t, PtreeIteratorTag::List>;

通用容器 (Generic container)

template <template <typename, typename> typename FwdIt,
          typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
class PtreeContainer
{
public:
  using Iterator = FwdIt<Node_t, Deref_t>;
  using value_type = typename Iterator::dereference_type;
  using size_type  = size_t;
  using difference_type = typename Iterator::difference_type;

public:
  PtreeContainer(Node_t* nodes) :
    m_nodes{ nodes }
  {
    if (IsLeaf(m_nodes))
    {
      m_nodes = nullptr;
    }
  }

  PtreeContainer() = default;
  PtreeContainer(const PtreeContainer&) = default;
  PtreeContainer& operator=(const PtreeContainer&) = default;
  PtreeContainer(PtreeContainer&&) = default;
  PtreeContainer& operator=(PtreeContainer&&) = default;

  bool operator==(std::nullptr_t) const noexcept
  {
    return empty();
  }
  bool operator!=(std::nullptr_t) const noexcept
  {
    return !(*this == nullptr);
  }
  bool operator==(Node_t* node) const noexcept
  {
    return get() == node;
  }
  bool operator!=(Node_t* node) const noexcept
  {
    return !(*this == node);
  }
  bool operator==(PtreeContainer other) const noexcept
  {
    return get() == other.get();
  }
  bool operator!=(PtreeContainer other) const noexcept
  {
    return !(*this == other);
  }
  value_type operator[](size_type index) const noexcept
  {
    size_type i = 0;
    for (auto it = begin(); it != end(); ++it)
    {
      if (i++ == index)
      {
        return *it;
      }
    }

    return value_type{};
  }

  Iterator begin() const noexcept
  { 
    return m_nodes;
  }
  Iterator end() const noexcept
  { 
    return nullptr; 
  }
  bool empty() const noexcept
  {
    return begin() == end();
  }

  value_type front() const noexcept
  {
    return (*this)[0];
  }
  value_type back() const noexcept
  {
    value_type last{};
    for (auto cur : *this)
    {
      last = cur;
    }

    return last;
  }
  Node_t* get() const noexcept
  {
    return m_nodes;
  }

  difference_type count() const noexcept
  {
    return std::distance(begin(), end());
  }
  bool has_at_least(size_type n) const noexcept
  {
    size_type counter = 0;
    for (auto it = begin(); it != end(); ++it)
    {
      if (++counter == n)
      {
        return true;
      }
    }
    return false;
  }

private:
  Node_t* m_nodes = nullptr;
};

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeStatementList =
PtreeContainer<PtreeStatementIterator, Node_t, Deref_t>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeItemList =
PtreeContainer<PtreeListIterator, Node_t, Deref_t>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeCountedStatementList =
PtreeContainer<PtreeStatementCountingIterator, Node_t, Deref_t>;

template <typename Node_t, typename Deref_t = std::add_pointer_t<Node_t>>
using PtreeCountedItemList =
PtreeContainer<PtreeListCountingIterator, Node_t, Deref_t>;

翻译自: https://habr/en/company/pvs-studio/blog/502516/

蛇爬树问题

更多推荐

蛇爬树问题_如何爬树