Proposal N3554 (A Parallel Algorithms Library) for C++14, proposes (among other things), what seem to be parallel versions of the current `std::partial_sum`

, e.g.:

```
template<
class ExecutionPolicy,
class InputIterator,
class OutputIterator,
class BinaryOperation>
OutputIterator inclusive_scan(
ExecutionPolicy &&exec,
InputIterator first,
InputIterator last,
OutputIterator result,
BinaryOperation binary_op);
```

With the explanation

Effects: For each iterator i in [result,result + (last - first)), performs *i = prefix_sum, where prefix_sum is the result of the corresponding sum init + *iter_0 + *iter_1 + *iter_2 + ... or binary_op(init, binary_op(*iter_0, binary_op(*iter_1, binary_op(*iter_2, ...))) for every iterator iter_j in the range [first,first + (i - result) - 1) ... The order of operands of the sum is unspecified.

How could this operation be made parallel? It seems like, almost by definition, each output prefix_sum must be calculated for the immediate next one to be calculated - essentially leading to a serial operation.

**Edit** Many thanks to Aasmund Eldhuset for his answer. Personally, I found "Prefix Sums and Their Applications" by Guy E. Blelloch to be very useful.