Advanced
Generic Pipeline
This program builds and executes generic task chains directly from the command line interface (CLI). There are three types of tasks that can be instantiated:
- First task: a valid chain must start with a task that has no input socket. Consequently, there is only one first task, and it can be either a read or an initialize task.
- Middle task: a task that has both an input and an output socket. The user decides how many middle tasks to use and in which combination. The available middle tasks are relay, relayf, increment and incrementf.
- Last task: a valid chain must always end with a task that has no output socket. Consequently, there is only one last task, and it can be either a write or a finalize task.
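The first/middle/last rule above can be expressed as a small check. The following Python sketch is purely illustrative: validate_chain and the FIRST, MIDDLE and LAST sets are hypothetical names, not StreamPU code. The aliases (init, incr, fin, ...) are taken from the task summary on this page, and allowing a chain with zero middle tasks is an assumption.

```python
# Hypothetical helper (not part of StreamPU): checks the first/middle/last
# rule described above. Task names and aliases come from this page.
FIRST = {"read", "initialize", "init"}
MIDDLE = {"relay", "relayf", "increment", "incr", "incrementf", "incrf"}
LAST = {"write", "finalize", "fin"}


def validate_chain(tasks):
    """Return a list of error messages; an empty list means the chain is valid."""
    if len(tasks) < 2:
        return ["a chain needs at least a first task and a last task"]
    errors = []
    if tasks[0] not in FIRST:
        errors.append(f"first task must be one of {sorted(FIRST)}, got {tasks[0]!r}")
    # Every task strictly between the first and the last must be a middle task.
    for pos, name in enumerate(tasks[1:-1], start=1):
        if name not in MIDDLE:
            errors.append(f"task {pos} ({name!r}) is not a middle task")
    if tasks[-1] not in LAST:
        errors.append(f"last task must be one of {sorted(LAST)}, got {tasks[-1]!r}")
    return errors


print(validate_chain(["init", "incrf", "relay", "incr", "fin"]))  # []
print(validate_chain(["read", "write", "fin"]))  # ["task 1 ('write') is not a middle task"]
```

A write task placed in the middle is rejected because it has no output socket, which is exactly why it can only close a chain.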
Here is a summary of the available tasks and their behavior:
- read: Reads data from a binary file and writes the read bytes to its output socket.
- initialize or init: Initializes the data in its output socket (useful for benchmarking and validation).
- relay: Copies the data from its input socket into its output socket.
- relayf: Variant of the relay task that uses a forward socket; consequently, this task does NOTHING.
- increment or incr: Increments (+1) the data of its input socket and writes the result to its output socket.
- incrementf or incrf: Variant of the increment task that uses a forward socket to write the result in place.
- write: Writes the contents of its input socket into a binary file. It expects only 0 or 1 values in its input socket to work correctly.
- finalize or fin: Memorizes (= copies) the input data for later validation (if validation is performed).
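To make the difference between input/output sockets and forward sockets concrete, here is a behavioural model in plain Python. It is not how StreamPU implements these tasks: sockets are modelled as bytearray objects, and the zero fill in initialize and the 8-bit wrap-around in increment are assumptions. The point it illustrates is that the forward variants (relayf, incrementf) hand back the very same buffer, while relay and increment produce a new one.

```python
# Behavioural model only (not StreamPU code). Sockets are bytearrays; the zero
# fill in initialize and the uint8 wrap-around in increment are assumptions.

def initialize(n, value=0):
    # Output socket only: produce a fresh buffer of n bytes.
    return bytearray([value] * n)

def relay(inp):
    # Input/output sockets: copy the input into a distinct output buffer.
    return bytearray(inp)

def relayf(fwd):
    # Forward socket: the same buffer goes in and out, so there is nothing to do.
    return fwd

def increment(inp):
    # Input/output sockets: write inp + 1 into a new output buffer.
    return bytearray((b + 1) % 256 for b in inp)

def incrementf(fwd):
    # Forward socket: increment in place and pass the same buffer on.
    for i, b in enumerate(fwd):
        fwd[i] = (b + 1) % 256
    return fwd

def finalize(inp, memory):
    # Memorize (copy) the input so that it can be validated later.
    memory.append(bytes(inp))


memory = []
buf = initialize(4)
out = incrementf(buf)   # in place: out is buf
out = relay(out)        # copy: a new buffer
out = increment(out)    # new buffer holding +1
finalize(relayf(out), memory)
print(memory)  # [b'\x02\x02\x02\x02']
```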
There are three main ways of describing a processing chain:
- Specification of homogeneous task types per stage. This is done by combining the -R (or --tsk-types-sta) and -n (or --tsk-per-sta) CLI parameters. -R gives the task type of each stage (example of a 4-stage pipeline: -R "(read,incr,relayf,write)") and -n gives how many tasks of that type are created in each stage. For instance, combining -R "(read,incr,relayf,write)" and -n "1,2,3,1" produces a 4-stage pipeline with the following sequence of tasks: read → incr → incr → relayf → relayf → relayf → write.
- Specification of heterogeneous task types per stage. This is done with the -r (or --tsk-types) CLI parameter. For instance, -r "((init),(incrf,relay,incr),(fin))" produces a 3-stage pipeline with the following sequence of tasks: init → incrf → relay → incr → fin.
- Use of a scheduler to decompose the chain into pipeline stages automatically. This is done with the -C (or --chain) CLI parameter. For instance, -C "(init,relayf_15,incrementf_S_60,relay_15,fin)" defines a chain that starts with an initialize task, followed by a relayf task of 15 microseconds, then an incrementf task of 60 microseconds (the _S means that the scheduler considers this task sequential and "non-replicable"). Finally, a 15-microsecond relay task and a finalize task are executed. By default, the scheduler assumes that the number of resources R equals the number of CPU hardware threads, but this can be overridden with the -t (or --n-threads) CLI parameter. The scheduler algorithm can be chosen with the -S (or --sched) CLI parameter. For now, the available schedulers are OTAC and FILE (see the note below about the latter).
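The expansion performed by the first two notations can be sketched as follows. The parsing code is a hypothetical reimplementation that only reproduces the two examples above; it is not StreamPU's actual argument parser and may not accept every syntax the real CLI does.

```python
import re


def expand_homogeneous(types_sta, tasks_per_sta):
    # -R lists one task type per stage, -n how many copies of it each stage holds.
    types = [t.strip() for t in types_sta.strip().strip("()").split(",")]
    counts = [int(c) for c in tasks_per_sta.split(",")]
    if len(types) != len(counts):
        raise ValueError("-R and -n must describe the same number of stages")
    return [[t] * c for t, c in zip(types, counts)]


def parse_heterogeneous(tsk_types):
    # -r lists every stage explicitly: each innermost "(...)" group is one stage.
    groups = re.findall(r"\(([^()]*)\)", tsk_types)
    return [[t.strip() for t in g.split(",")] for g in groups]


def flatten(stages):
    # The task sequence of the chain, ignoring the stage boundaries.
    return [task for stage in stages for task in stage]


print(" -> ".join(flatten(expand_homogeneous("(read,incr,relayf,write)", "1,2,3,1"))))
# read -> incr -> incr -> relayf -> relayf -> relayf -> write
print(" -> ".join(flatten(parse_heterogeneous("((init),(incrf,relay,incr),(fin))"))))
# init -> incrf -> relay -> incr -> fin
```

Keeping the stages as nested lists (instead of flattening immediately) preserves the stage decomposition, which is what the pipeline uses by default.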
The first notation is a compact way to describe chains of tasks. By default, the chain is split into pipeline stages according to the given decomposition (with -R or -r) and each stage runs on a separate thread. It is also possible to run the chain as a sequence (with the -q or --force-sequence CLI parameter). In this case, the given stage decomposition is ignored and all the tasks of the chain are run by the same thread.
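The difference between the default pipelined execution and -q can be illustrated with Python threads and bounded queues. This is a simplified model under stated assumptions: frames are fed by the caller instead of being produced by a first task, each stage is a list of Python callables, and buffer_size only mimics the role of the -u buffer between stages. Both modes return the same results; only the threading differs.

```python
import queue
import threading

STOP = object()  # end-of-stream marker


def run_sequence(stages, frames):
    # -q: a single thread runs all the tasks of all the stages, frame by frame.
    results = []
    for frame in frames:
        for stage in stages:
            for task in stage:
                frame = task(frame)
        results.append(frame)
    return results


def run_pipeline(stages, frames, buffer_size=16):
    # Default mode: one thread per stage; consecutive stages exchange frames
    # through bounded FIFO buffers (buffer_size mirrors the -u parameter).
    # The final output queue is unbounded so that the caller can push every
    # frame before draining the results without deadlocking.
    queues = [queue.Queue(maxsize=buffer_size) for _ in stages] + [queue.Queue()]

    def stage_worker(stage, q_in, q_out):
        while True:
            frame = q_in.get()
            if frame is STOP:
                q_out.put(STOP)  # propagate the end of stream downstream
                return
            for task in stage:
                frame = task(frame)
            q_out.put(frame)

    threads = [
        threading.Thread(target=stage_worker, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for frame in frames:
        queues[0].put(frame)  # blocks while the first buffer is full
    queues[0].put(STOP)
    results = []
    while (frame := queues[-1].get()) is not STOP:
        results.append(frame)
    for t in threads:
        t.join()
    return results


# A 2-stage chain: stage 1 = [+1], stage 2 = [*2, -3].
stages = [[lambda x: x + 1], [lambda x: x * 2, lambda x: x - 3]]
print(run_pipeline(stages, range(5)))  # [-1, 1, 3, 5, 7]
```

Because each stage is served by a single thread reading a FIFO, frames leave the pipeline in the order they entered it, which is why the pipelined and sequential results are identical.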
Note
The -r and -R parameters are mutually exclusive: they cannot be used at the same time.
Note
If StreamPU has been compiled with the CMake -DSPU_LINK_HWLOC=ON option, a pinning policy can be specified with the -P or --pinning-policy CLI argument.
Tip
For the initialize, increment, incrementf, relay, relayf and finalize tasks, the duration can be specified. For instance, relay_12 means that the relay task spends 12 microseconds in active waiting. This differs from the -s CLI parameter, which sets the same duration for all of the tasks listed above.
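The task tokens accepted by -C, and the duration suffix described in this tip, can be parsed with a regular expression. The token layout below is inferred from the examples on this page (relay_12, relayf_15, incrementf_S_60) and parse_chain is a hypothetical helper, not StreamPU's parser. How StreamPU combines a missing duration with -s is not modelled here, so such tasks simply get None.

```python
import re

# Token layout inferred from the examples on this page: a task name, an
# optional "_S" flag (sequential, non-replicable), then an optional
# "_<microseconds>" duration.
TOKEN = re.compile(r"(?P<name>[a-z]+)(?P<seq>_S)?(?:_(?P<us>\d+))?")


def parse_chain(chain):
    """Parse a -C style string such as "(init,relayf_15,fin)" into task dicts."""
    tasks = []
    for token in chain.strip().strip("()").split(","):
        m = TOKEN.fullmatch(token.strip())
        if m is None:
            raise ValueError(f"cannot parse task token {token!r}")
        tasks.append({
            "name": m["name"],
            "sequential": m["seq"] is not None,
            # None means that the token carries no explicit duration.
            "duration_us": int(m["us"]) if m["us"] is not None else None,
        })
    return tasks


for task in parse_chain("(init,relayf_15,incrementf_S_60,relay_15,fin)"):
    print(task)
```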
Note
The FILE scheduler reads the schedule from a JSON file whose path is set with the -F (or --sched-file) parameter.
The expected JSON file looks like the following:
{
"platform": "x7ti",
"resources": {
"p-core": {
"node-list": ["core0-5"],
"cluster-size": 1,
"smt": 2
},
"e-core": {
"node-list": ["core6-13"],
"cluster-size": 4,
"smt": 1
}
},
"scheduler_name": "HeRAD",
"date": "2025-07-23",
"schedule": [
{ "tasks": 5, "threads": 1, "core-type": "p-core", "pinning-policy": "packed", "sync_buff_size": 1, "sync_waiting_type": "active" },
{ "tasks": 1, "threads": 1, "core-type": "p-core", "pinning-policy": "packed", "sync_buff_size": 8, "sync_waiting_type": "passive" },
{ "tasks": 6, "threads": 1, "core-type": "p-core", "pinning-policy": "packed" },
{ "tasks": 4, "threads": 2, "core-type": "p-core", "pinning-policy": "guided", "sync_buff_size": 1 },
{ "tasks": 3, "threads": 7, "core-type": "e-core", "pinning-policy": "distant", "sync_waiting_type": "active" },
{ "tasks": 4, "threads": 2, "core-type": "p-core", "pinning-policy": "guided" }
]
}
In the schedule field, each line corresponds to one pipeline stage: the tasks field gives the number of consecutive tasks in the current stage, while the threads field gives the number of threads used for the current stage. Four policies are available for the pinning-policy field: loose (do not pin), guided (pin to the core type), packed (pin following the ascending order of the core ids given by the resources field) or distant (pin in a round-robin way across the clusters and packages). When no pinning policy is specified, the loose policy is applied. Finally, sync_buff_size and sync_waiting_type are optional and can help fine-tune the synchronizations between pipeline stages. The last stage should not contain the sync_buff_size or sync_waiting_type fields. Using the FILE scheduler overrides the following parameters: -u (or --buffer-size) and -w (or --active-waiting).
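Before handing a file to the FILE scheduler, a few of the rules above can be checked in Python. check_schedule is a hypothetical helper: the allowed pinning policies, the loose default and the last-stage rule come from this page, while the accepted sync_waiting_type values (only those used in the example), the core-type check and the optional n_tasks comparison are assumptions. Note that Python's json module follows strict JSON and rejects trailing commas.

```python
import json

# Values listed on this page; treating anything else as an error is an
# assumption of this sketch, not documented StreamPU behaviour.
POLICIES = {"loose", "guided", "packed", "distant"}
WAITING_TYPES = {"active", "passive"}  # the two values used in the example
SYNC_FIELDS = ("sync_buff_size", "sync_waiting_type")


def check_schedule(doc, n_tasks=None):
    """Return a list of problems found in a FILE-scheduler JSON document."""
    stages = doc.get("schedule") or []
    if not stages:
        return ["'schedule' is missing or empty"]
    core_types = set(doc.get("resources", {}))
    problems = []
    for i, st in enumerate(stages):
        if st.get("tasks", 0) < 1 or st.get("threads", 0) < 1:
            problems.append(f"stage {i}: 'tasks' and 'threads' must be >= 1")
        core = st.get("core-type")
        if core is not None and core not in core_types:
            problems.append(f"stage {i}: unknown core-type {core!r}")
        policy = st.get("pinning-policy", "loose")  # loose is the default
        if policy not in POLICIES:
            problems.append(f"stage {i}: unknown pinning-policy {policy!r}")
        waiting = st.get("sync_waiting_type")
        if waiting is not None and waiting not in WAITING_TYPES:
            problems.append(f"stage {i}: unknown sync_waiting_type {waiting!r}")
    if any(field in stages[-1] for field in SYNC_FIELDS):
        problems.append("the last stage should not set sync_buff_size or sync_waiting_type")
    # Each stage takes 'tasks' consecutive tasks, so the counts should cover the chain.
    if n_tasks is not None and sum(st.get("tasks", 0) for st in stages) != n_tasks:
        problems.append("stage task counts do not add up to the chain length")
    return problems


example = json.loads("""
{"resources": {"p-core": {"node-list": ["core0-5"], "cluster-size": 1, "smt": 2}},
 "schedule": [
   {"tasks": 2, "threads": 1, "core-type": "p-core", "sync_buff_size": 1},
   {"tasks": 3, "threads": 2, "core-type": "p-core", "pinning-policy": "guided"}]}
""")
print(check_schedule(example, n_tasks=5))  # []
```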
Moreover, for each stage it is possible to specify the number of replications
(= number of threads that will execute the stage) with the -t
(or --n-threads) CLI parameter. Here are some examples of generated pipelines:
Figure: test-generic-pipeline: input/output sockets & 3-stage pipeline.
Figure: test-generic-pipeline: forward sockets & 3-stage pipeline.
Figure: test-generic-pipeline: hybrid in/out and forward sockets & 3-stage pipeline.
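Stage replication with -t can be pictured as one stage executed by several threads. The sketch below assumes a round-robin dispatch of frames to the replicas and a round-robin collection of their results, which keeps frames in their original order; run_replicated_stage is a hypothetical name and the actual StreamPU mechanism may differ. Replication only makes sense for replicable tasks, which is why the scheduler treats tasks marked _S as non-replicable.

```python
import queue
import threading

STOP = object()  # end-of-stream marker, one per replica


def run_replicated_stage(task, frames, n_replicas):
    # Hypothetical model of one stage replicated on n_replicas threads.
    # Frame i is dispatched to replica i % n_replicas and results are collected
    # in the same round-robin order, so the output order matches the input order.
    frames = list(frames)
    inputs = [queue.Queue() for _ in range(n_replicas)]
    outputs = [queue.Queue() for _ in range(n_replicas)]

    def replica(k):
        while (frame := inputs[k].get()) is not STOP:
            outputs[k].put(task(frame))

    workers = [threading.Thread(target=replica, args=(k,)) for k in range(n_replicas)]
    for w in workers:
        w.start()
    for i, frame in enumerate(frames):
        inputs[i % n_replicas].put(frame)
    for q in inputs:
        q.put(STOP)
    results = [outputs[i % n_replicas].get() for i in range(len(frames))]
    for w in workers:
        w.join()
    return results


print(run_replicated_stage(lambda x: x * x, range(6), n_replicas=3))  # [0, 1, 4, 9, 16, 25]
```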
Command Line Arguments
The following is a verbatim copy of the -h output (stdout):
usage: ./bin/test-generic-pipeline [options]
-t, --n-threads Number of threads to run in parallel for each stage [empty]
-f, --n-inter-frames Number of frames to process in one task [1]
-s, --sleep-time Sleep time duration in one task (microseconds) [5]
-d, --data-length Size of data to process in one task (in bytes) [2048]
-e, --n-exec Number of executions (0 means -> never stop because of this counter) [0]
-l, --n-exec-pro Number of executions during the scheduler profiling phase [100]
-u, --buffer-size Size of the buffer between the different stages of the pipeline [16]
-o, --dot-filepath Path to dot output file [empty]
-i, --in-filepath Path to the input file (used to generate bits of the chain) [empty]
-j, --out-filepath Path to the output file (written at the end of the chain) ["file.out"]
-c, --copy-mode Enable to copy data in sequence (performance will be reduced) [false]
-b, --step-by-step Enable step-by-step sequence execution (performance will be reduced) [false]
-p, --print-stats Enable to print per task statistics (performance will be reduced) [false]
-g, --debug Enable task debug mode (print socket data) [false]
-q, --force-sequence Force sequence instead of pipeline [false]
-w, --active-waiting Enable active waiting in the pipeline synchronizations [false]
-n, --tsk-per-sta The number of tasks on each stage of the pipeline [empty]
-r, --tsk-types The socket type of each task (SFWD or SIO) [empty]
-R, --tsk-types-sta The socket type of tasks on each stage (SFWD or SIO) [empty]
-C, --chain Description of the tasks chain (to be combined with '-S' param) [empty]
-S, --sched Scheduler algorithm for the pipeline creation ('OTAC', 'FILE') ["OTAC"]
-F, --sched-file File that contains the scheduling, to combine with 'FILE' scheduler ["sched.json"]
-v, --verbose Show information about the scheduling choices [false]
-h, --help This help [false]