Idiomatic tree and graph like structures in Rust

Posted on December 20, 2016 by Rust Leipzig

If you want to create data structures which can be modified during runtime, a possible solution could lead into tree or graph like structures. Writing tree structures in Rust is no trivial problem. Nevertheless there are some common idiomatic ways how to handle lifetime and borrowing issues. In other languages like C/C++ we would simply use pointers to create graphs or trees. This is also possible in Rust, but the thing is that this would kill every benefit from the borrow checker and could lead into the same pitfalls like in other languages.

I tried a lots of implementations of tree like structures, which should be dynamically modifiable during runtime. One of the first implementations used interior mutability with data structures like:

use std::rc::Rc;
use std::cell::RefCell;
struct Node<T> {
previous: Rc<RefCell<Box<Node<T>>>>,
//        ^  ^       ^   ^
//        |  |       |   |
//        |  |       |   - The next Node with generic T
//        |  |       |
//        |  |       - Heap allocated memory, needed
//        |  |         if T is a trait object.
//        |  |
//        |  - A mutable memory location with
//        |    dynamically checked borrow rules.
//        |    Needed because Box is immutable.
//        |
//        - Reference counted pointer, will be
//          dropped when every reference is gone.
//          Needed to create multiple node references.
<span class="n">next</span><span class="p">:</span> <span class="nb">Vec</span><span class="o">&lt;</span><span class="nb">Rc</span><span class="o">&lt;</span><span class="n">RefCell</span><span class="o">&lt;</span><span class="nb">Box</span><span class="o">&lt;</span><span class="n">T</span><span class="o">&gt;&gt;&gt;&gt;</span><span class="p">,</span>
<span class="n">data</span><span class="p">:</span> <span class="n">T</span><span class="p">,</span>
<span class="c">// ...</span>

}

This is on one hand hard to understand and will on the other hand lead into runtime borrow checks, which are not that idiomatic Rust. Furthermore back references are simply not possible with this approach, since cyclic references are not allowed within Rust. The main reason is that cyclic references could lead very fast into memory leaks if we are not really careful. Also, this approach is not multi threading capable, but we could use an Arc instead of Rc, which will decrease the overall performance.

To reduce the complexity of the problem we need to reduce the lifetime complexity within the structures. A good approach is to use a memory arena for the Nodes, because this implies that every element within the arena has the same lifetime. How would such an arena look like? What about a simple vector:

pub struct Arena<T> {
    nodes: Vec<Node<T>>,
}

A node could then look like this:

pub struct Node<T> {
    parent: Option<NodeId>,
    previous_sibling: Option<NodeId>,
    next_sibling: Option<NodeId>,
    first_child: Option<NodeId>,
    last_child: Option<NodeId>,
<span class="c">/// The actual data which will be stored within the tree</span>
<span class="k">pub</span> <span class="n">data</span><span class="p">:</span> <span class="n">T</span><span class="p">,</span>

}
pub struct NodeId {
index: usize,
}

This means we store the actual Node within the Vec, but for creating the tree, we simply use the indexes from the vector. To create a new node we can use this method:

pub fn new_node(&mut self, data: T) -> NodeId {
    // Get the next free index
    let next_index = self.nodes.len();
<span class="c">// Push the node into the arena</span>
<span class="k">self</span><span class="py">.nodes</span><span class="nf">.push</span><span class="p">(</span><span class="n">Node</span> <span class="p">{</span>
    <span class="n">parent</span><span class="p">:</span> <span class="nb">None</span><span class="p">,</span>
    <span class="n">first_child</span><span class="p">:</span> <span class="nb">None</span><span class="p">,</span>
    <span class="n">last_child</span><span class="p">:</span> <span class="nb">None</span><span class="p">,</span>
    <span class="n">previous_sibling</span><span class="p">:</span> <span class="nb">None</span><span class="p">,</span>
    <span class="n">next_sibling</span><span class="p">:</span> <span class="nb">None</span><span class="p">,</span>
    <span class="n">data</span><span class="p">:</span> <span class="n">data</span><span class="p">,</span>
<span class="p">});</span>

<span class="c">// Return the node identifier</span>
<span class="n">NodeId</span> <span class="p">{</span> <span class="n">index</span><span class="p">:</span> <span class="n">next_index</span> <span class="p">}</span>

}

This approach makes creating graph like structures very easy, where they can contain any data. A general multi processing is also possible since parts of a vector can shared across threads safely.

To try it out, simple use the indextree crate and create graph like structures for nearly everything.