template< class ExecutionPolicy, class InputIterator, class OutputIterator, class BinaryOperation> OutputIterator inclusive_scan( ExecutionPolicy &&exec, InputIterator first, InputIterator last, OutputIterator result, BinaryOperation binary_op);
With the explanation
Effects: For each iterator i in [result,result + (last - first)), performs *i = prefix_sum, where prefix_sum is the result of the corresponding sum init + *iter_0 + *iter_1 + *iter_2 + ... or binary_op(init, binary_op(*iter_0, binary_op(*iter_1, binary_op(*iter_2, ...))) for every iterator iter_j in the range [first,first + (i - result) - 1) ... The order of operands of the sum is unspecified.
How could this operation be made parallel? It seems like, almost by definition, each output prefix_sum must be calculated for the immediate next one to be calculated - essentially leading to a serial operation.
Edit Many thanks to Aasmund Eldhuset for his answer. Personally, I found "Prefix Sums and Their Applications" by Guy E. Blelloch to be very useful.