Dataflow systems

An alternative to conventional programming languages and architectures, in which values rather than value containers are dealt with, and all processing is achieved by applying functions to values to produce new values. These systems can realize large amounts of parallelism (present in many applications) and effectively utilize very-large-scale integration (VLSI) technology.

Dataflow systems use an underlying execution model which differs substantially from the conventional one. The model deals with values, not names of value containers. There is no notion of assigning different values to an object which is held in a global, updatable memory location. A statement such as X := B + C in a dataflow language is only syntactically similar to an assignment statement. The meaning of X := B + C in dataflow is to compute the value B + C and bind this value to the name X. Other operators can use this value by referring to the name X, and the statement has a precise mathematical meaning defining X. This definition remains constant within the scope in which the statement occurs. Languages with this property are sometimes referred to as single-assignment languages.

The second property of the model is that all processing is achieved by applying functions to values to produce new values. The inputs and results are clearly defined, and there are no side effects. Languages with this property are called applicative. Value-oriented, applicative languages do not impose any sequencing constraints beyond the basic data dependencies present in the algorithm. Functions must wait for all input values to be computed, but the order in which the functions are evaluated does not affect the final results. There is no notion of a central controller which initiates one statement at a time sequentially. The model described above can be applied to both languages and architectures.
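The following Python sketch is only an illustration of these two properties, not an example from any actual dataflow language: each name is bound to a value exactly once, and all work is done by applying side-effect-free functions to values, so independent results could be computed in either order (or in parallel) without changing the outcome.

```python
def add(a, b):        # pure function: the result depends only on its inputs
    return a + b

def mul(a, b):
    return a * b

def compute(b, c, d):
    # X := B + C  --  X names the value B + C for the rest of this scope;
    # it is never updated in place.
    x = add(b, c)
    # Y := X * D and Z := X + D depend only on X and D.  Because there are
    # no side effects, they may be evaluated in any order, or concurrently,
    # without affecting the final results.
    y = mul(x, d)
    z = add(x, d)
    return y, z

print(compute(2, 3, 4))   # (20, 9)
```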

The computation specified by a program in a dataflow language can be represented as a data dependence graph, in which each node is a function and each arc carries a value. Very efficient execution of a dataflow program can be achieved on a stored-program computer which has the properties of the dataflow model. The machine language for such a computer is a dependence graph rather than the conventional sequence of instructions. There is no program counter in a dataflow computer. Instead, a mechanism is provided to detect when an instruction is enabled, that is, when all required input values are present. Enabled instructions, together with input values and destination addresses (for the result), are sent to processing elements. Results are routed to destinations, which may enable other instructions. This mode of execution is called data-driven.
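A minimal simulation of data-driven execution is sketched below, under simplified assumptions; the graph encoding and token format are illustrative choices, not those of any particular dataflow machine. Each node fires only when all of its input values have arrived, and its result is routed to the destinations listed for it in the graph, which may in turn enable further nodes.

```python
import operator

# Dependence graph for:  X := B + C ; Y := X * D
# Each node: (function, number of inputs, destinations), where a destination
# is (node_name, input_slot).
graph = {
    "add1": (operator.add, 2, [("mul1", 0)]),
    "mul1": (operator.mul, 2, [("out", 0)]),
}

def run(initial_tokens):
    # inputs[node] collects values until the node has all of its operands
    inputs = {name: {} for name in graph}
    results = {}
    ready = list(initial_tokens)          # tokens: (node, slot, value)
    while ready:
        node, slot, value = ready.pop()
        if node == "out":                 # final result arrives here
            results[slot] = value
            continue
        inputs[node][slot] = value
        fn, arity, dests = graph[node]
        if len(inputs[node]) == arity:    # instruction is now enabled
            args = [inputs[node][i] for i in range(arity)]
            out = fn(*args)
            for dest in dests:            # route result to its destinations
                ready.append((dest[0], dest[1], out))
    return results

# B = 2, C = 3, D = 4  ->  X = 5, Y = 20
print(run([("add1", 0, 2), ("add1", 1, 3), ("mul1", 1, 4)]))
```

Note that nothing in the loop corresponds to a program counter; the only thing that drives execution is the arrival of values at instruction inputs.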

Dataflow systems can overcome many of the disadvantages of conventional approaches. In principle, all the parallelism in the algorithm is exposed in the program, and thus the programmer does not have to deal with parallelism explicitly.

Several important problems remain to be solved in dataflow systems. The handling of complex data structures as values is inefficient, and there is no complete solution to this problem yet. Dataflow computers tend to have long pipelines, and this causes degraded performance if the application does not have sufficient parallelism. Since the programmer does not have explicit control over memory, separate “garbage collection” mechanisms must be implemented. The space-time overheads of managing low-level parallelism have not been quantified. Thus, though the parallelism is exposed to the hardware, it has not been demonstrated that it can be effectively realized.