Javascript Dependency Tree

Introduction

In this post I will outline how to run tasks/jobs that have dependencies on other tasks such as builds or testing. When one task is dependent on another it means that the task cannot be be started or executed until it has the results from a previous task/job.

Imagine you plan on making an omelette, this task requires that you have eggs among other things. We can say then that this task, “make-omelette”, has a dependency on eggs. If you have eggs then the task of making the omelette can be started, otherwise you will have to delay the execution of that task until you have acquired eggs; going forward with this analogy we can introduce other tasks as well such as finding your keys and wallet before leaving your home, driving to the nearest grocery store, getting gas, etc.

(For those of you that are RTS gamers think about your tech tree build order)

As is the case with everyday life, many automated software systems are comprised of one or more jobs, and often times these jobs depend on the results of other jobs. It is therefore useful to have an idea of how to think about these dependencies and at the very least understand the essential algorithms used to work with them.

Problem Statement

In the version of the job shop scheduling problem solved by the Coffman–Graham algorithm, one is given a set of n jobs J1, J2, …, Jn, together with a system of precedence constraints Ji < Jj requiring that job Ji be completed before job Jj begins. Each job is assumed to take unit time to complete. The scheduling task is to assign each of these jobs to time slots on a system of W identical processors, minimizing the makespan of the assignment (the time from the beginning of the first job until the completion of the final job). Abstractly, the precedence constraints define a partial order on the jobs, so the problem can be rephrased as one of assigning the elements of this partial order to levels (time slots) in such a way that each time slot has at most as many jobs as processors (at most W elements per level), respecting the precedence constraints.

Before we continue to the real meat of this writeup, lets take a naive approach at solving this sort of problem without a structured approach. That way we can identify what it truly is that the Coffman-Graham algorithm solves and appreciate the value in having such a procedure as well as an understanding of when to use it.

Consider the following code:

function simulatedTask(timeToCompletion = 500, jobName) {
    let resolve;
    const p = new Promise((res) => {
        resolve = res;
    });
    setTimeout(() => {
        console.log('task "', jobName, '" completed.')
        resolve();
    }, timeToCompletion);
    return p;
}

var demoTasks = {
    a: () => {
        return simulatedTask(500, 'a').then(() => {
            return 'a';
        });
    },
    b: () => {
        return simulatedTask(1600, 'b').then(() => {
            return 'b';
        });
    },
    c: () => {
        return simulatedTask(1300, 'c').then(() => {
            return 'c';
        });
    },
    d: () => {
        return simulatedTask(850, 'd').then(() => {
            return 'd';
        });
    },
    e: () => {
        return simulatedTask(5000, 'e').then(() => {
            return 'e';
        });
    },
    f: () => {
        return simulatedTask(10, 'f').then(() => {
            return 'f';
        });
    },
};

var demoTree = {
    d: ['b', 'c', 'e'], // we expect this one to finish last
    a: [],
    b: ['a'],
    c: ['a'],
    e: [],
    f: ['e'],
};

The demoTree variable stores a set an unordered set of tasks or jobs. First we have must determine wich job(s) are the root nodes, meaning the jobs that have no dependencies. At a glance we can see that the jobs ‘a’ and ‘e’ don’t depend on anything so we can assign those as our root nodes/jobs. We’re going to build a tree (a graph) that allows us to visually see which jobs depened on which. Once we have this tree we can begin to implement some logic that will iteratively queue up the jobs and execute them one after another.

(This task is called graph drawing)

What we are doing is performing a topological sort.

class JobNode {
    constructor(name, parents = [], childs = []) {
        this.name = name;
        this.parents = parents;
        this.childs = childs;
    }
}

function buildDepsTree(tree) {
    const depsTree = {};
    const rootNodes = [];
    Object.keys(tree).forEach((jobName) => {
        const jobDeps = tree[jobName];
        const node = depsTree[jobName] || new JobNode(jobName);
        if (jobDeps.length === 0) {
            rootNodes.push(node);
            depsTree[jobName] = node;
            return;
        }
        jobDeps.forEach((parentJobName) => {
            const parentNode = depsTree[parentJobName] || new JobNode(parentJobName);
            parentNode.childs.push(node)
            node.parents.push(parentNode);
            depsTree[parentJobName] = parentNode;
        });
    });
    depsTree._sudoRoot = new JobNode('init jobs', [], rootNodes);
    return depsTree;
}

Now that we have some code that performs the sort we can then traverse the tree that results from the execution of this sort. Essentially we are chaining each job such that once all dependencies (parents) of a given node/job are finished we immediately kick off it’s execution.

function executeBuildWithDeps(tree = demoTree) {
    const depsTree = buildDepsTree(tree);
    const runningTasks = {};
    function traverse(node, cb) {
        if (node.childs.length === 0) {
            return;
        }
        node.childs.forEach((childNode) => {
            // start every task at a given level
            // in the deps tree at the same time
            cb(childNode);
        })
        node.childs.forEach((childNode) => {
            traverse(childNode, cb);
        });
    }
    traverse(depsTree._sudoRoot, (node) => {
        // avoid re-calling on nodes
        // with more than one parent dependency
        if (runningTasks[node.name]) {
            return;
        }
        const pendingParentJobs = [];
        node.parents.forEach((parentNode) => {
            pendingParentJobs.push(runningTasks[parentNode.name]);
        })
        runningTasks[node.name] = Promise.all(pendingParentJobs).then((results) => {
            if (results && results.length > 0) {
                console.log('results: ', results, ' passed to task "', node.name, '"');
            }
            if (!demoTasks[node.name]) {
                return;
            }
            return demoTasks[node.name](results)
        });
    });
}

executeBuildWithDeps();

Take a moment to consider what that means, namely we provide no constraint on the number processesors being used to execute these jobs. If the jobs you’re kicking off are relatively in expensive in both time and money, then this procedure works just fine; there’s no need to limit how many concurrent tasks can be running at a given time. This however is an entirely different problem from the case where we have a constraint on the number of tasks that can be running in parallel to eachother.

In our analogy with the shopping list, this means we could summon any number of people to go out and independently aquire whatever ingredients are necessary for the omelette we are preparing.

Consider our first step; drawing the graph - By placing the constraint ‘W’ as mentioned in the problem statement we have fundementally introduced an underlying structure to the tree we need to draw such that it cannot have any level with more than w nodes present in it. Imagine you’re setting up for an event such as a birthday party or a wedding. This constrait W represents the number of people you have on hand to help you execute tasks independently of eachother.

Enter: The Coffman-Graham Algorithm

Wikipedia:

In job shop scheduling and graph drawing, the Coffman–Graham algorithm is an algorithm, named after Edward G. Coffman, Jr. and Ronald Graham, for arranging the elements of a partially ordered set into a sequence of levels.

Each ‘level’ here represents a unit of time, where zero represents the moment this entire set of jobs is started and the last level represents the time when everything is finished. Each level can be thought of as some integer value on a vertical y-axis.

Let’s get acquainted with some terminology:

Directed Acyclic Graph

a directed acyclic graph is a finite directed graph with no directed cycles. That is, it consists of finitely many vertices and edges (also called arcs), with each edge directed from one vertex to another, such that there is no way to start at any vertex v and follow a consistently-directed sequence of edges that eventually loops back to v again.

Feedback Arc Set

A feedback arc set (FAS) or feedback edge set is a set of edges which, when removed from the graph, leave a DAG. Put another way, it’s a set containing at least one edge of every cycle in the graph.

Please take the time to familiarize yourself with these terms if you don’t already know them.

The Coffman-Graham algorithm is defined by the following procedure:

Represent the partial order by its transitive reduction or covering relation, a directed acyclic graph G that has an edge from x to y whenever x < y and there does not exist any third element z of the partial order for which x < z < y.

In the graph drawing applications of the Coffman–Graham algorithm, the resulting directed acyclic graph may not be the same as the graph being drawn, and in the scheduling applications it may not have an edge for every precedence constraint of the input: in both cases, the transitive reduction removes redundant edges that are not necessary for defining the partial order.

Construct a topological ordering of G in which the vertices are ordered lexicographically by the set of positions of their incoming neighbors. To do so, add the vertices one at a time to the ordering, at each step choosing a vertex v to add such that the incoming neighbors of v are all already part of the partial ordering, and such that the most recently added incoming neighbor of v is earlier than the most recently added incoming neighbor of any other vertex that could be added in place of v.

If two vertices have the same most recently added incoming neighbor, the algorithm breaks the tie in favor of the one whose second most recently added incoming neighbor is earlier, etc.

Assign the vertices of G to levels in the reverse of the topological ordering constructed in the previous step. For each vertex v, add v to a level that is at least one step higher than the highest level of any outgoing neighbor of v, that does not already have W elements assigned to it, and that is as low as possible subject to these two constraints.