Journal of Systems Architecture 160 (2025) 103349




                                                         Journal of Systems Architecture
                                                         journal homepage: www.elsevier.com/locate/sysarc


Real-time scheduling for multi-object tracking tasks in regions with different
criticalities
Donghwa Kang a , Jinkyu Lee b ,∗, Hyeongboo Baek c ,∗
a Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
b Sungkyunkwan University (SKKU), Suwon, South Korea
c University of Seoul (UOS), Seoul, South Korea


ARTICLE INFO

Keywords:
Multi-object tracking
Real-time scheduling
Timing guarantee
Criticality-awareness
Autonomous driving

ABSTRACT

Autonomous vehicles (AVs) utilize sensors such as LiDAR and cameras to iteratively perform sensing, decision-making, and actions. Multi-object tracking (MOT) systems are employed in the sensing stage of AVs, using these sensors to detect and track objects like pedestrians and vehicles, thereby enhancing situational awareness. These systems must handle regions of varying criticality and dynamically shifting locations, all within limited computing resources. Previous DNN-based MOT approaches primarily focused on tracking accuracy, but timing guarantees are becoming increasingly vital for autonomous driving. Although recent studies have introduced MOT scheduling frameworks with timing guarantees, they are either restricted to single-camera systems or fail to prioritize safety-critical regions in the input images. We propose CA-MOT, a Criticality-Aware MOT execution and scheduling framework for multiple cameras. CA-MOT provides a control knob that balances tracking accuracy in safety-critical regions and timing guarantees. By effectively utilizing this control knob, CA-MOT achieves both high accuracy and timing guarantees. We evaluated CA-MOT’s performance using a GPU-enabled embedded board commonly employed in AVs, with data from real-world autonomous driving scenarios.


1. Introduction

    Autonomous vehicles (AVs) are systems that iteratively perform sensing, decision-making, and actions using various sensors such as LiDAR, radar, inertial measurement units (IMUs), and cameras [1]. Multi-object tracking (MOT) systems, used in the perception stage of AVs, track objects like pedestrians and cars, enhancing situational awareness. Since MOT information is periodically transferred to control tasks, timely execution must be guaranteed to ensure safety and prevent severe accidents [2–4]. Low accuracy, despite timely execution, may result in missed objects, thus compromising AVs’ safety [2,4,5]. Therefore, AV MOT systems should ensure timing guarantees with maximized accuracy.
    Tracking-by-detection [6,7] is widely used due to its high accuracy and ability to leverage state-of-the-art DNN-based detection models (e.g., YOLO series [8], Faster R-CNN [9]). For each input image from each camera, tracking-by-detection performs two tasks: detection and association. Detection uses DNN-based models to sense the motion information of objects, such as location and velocity, while association matches objects between frames based on extracted feature information (also called feature vectors or feature maps obtained through pooling and convolutional layers) using CNN (convolutional neural network)-based models (e.g., OSNet [10]). For unmatched objects, location-based methods like intersection over union (IoU) are applied.
    MOT input images exhibit two key characteristics: (i) regions with varying levels of criticality and (ii) dynamically shifting locations. With limited computing resources in AVs, it is crucial to deliver different levels of service quality based on criticality. Safety-critical regions, where objects with a short time-to-collision (e.g., under 2 s) cluster, must be prioritized. If multiple clusters exist, the broader area encompassing them is considered the safety-critical region, as defined in DNN-SAM [5]. Established methods compute time-to-collision using LiDAR and IMU data; we follow the approach from DNN-SAM. This leads to two requirements for criticality-aware MOT systems: (R1) accuracy maximization for safety-critical regions and (R2) timing guarantees.
    Most existing DNN-based MOT approaches focus on accuracy [7,11,12], but timing guarantees are increasingly critical in autonomous driving. Recent research has proposed MOT resource scheduling frameworks that guarantee timing for every MOT execution [2,4]. However, [2] overlooks safety-criticality, while [4] focuses on a single task. We address safety-criticality across multiple tasks, raising the following


     ∗ Corresponding authors.
       E-mail addresses: anima0729@kaist.ac.kr (D. Kang), jinkyu.lee@skku.edu (J. Lee), hbbaek359@gmail.com (H. Baek).

https://doi.org/10.1016/j.sysarc.2025.103349
Received 22 September 2024; Received in revised form 13 December 2024; Accepted 20 January 2025
Available online 28 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
D. Kang et al.                                                                                                      Journal of Systems Architecture 160 (2025) 103349


challenges:

 C1. How to balance R1 and R2 to efficiently use limited computing resources.
 C2. How to achieve both R1 and R2 by effectively using the control knob developed from C1.

    In this paper, we propose CA-MOT, a Criticality-Aware MOT execution and scheduling framework for multiple MOT tasks. To address C1, CA-MOT offers three execution options (low, middle, and high workloads) to balance R1 and R2 for both detection and association. To address C2, CA-MOT introduces the notion of aging for detection and association sub-tasks, estimating the reliability of motion and feature information over time. Balancing the aging of these tasks is essential to achieve R1 and R2 with limited resources (to be discussed in Section 3.4). Based on this, CA-MOT develops two scheduling algorithms: EDF-BE and EDF-Slack. EDF-BE increases the workload of tasks waiting in the ready queue for execution (referred to as active tasks) without compromising the R2 bound when no other tasks are pending. In contrast, EDF-Slack is designed to handle scenarios with multiple active tasks.
    To validate CA-MOT’s performance in meeting R1 and R2, we conducted extensive experiments on an NVIDIA Jetson Xavier using the KITTI dataset [13]. Additionally, we applied three detectors in our experiments: YOLOv5 [14], YOLOX [8], and Faster R-CNN [9].
    The contributions of this paper are as follows:

    • We motivate the importance of balancing the aging of detection and association to achieve R1 and R2 (Section 2).
    • We propose a new system design, CA-MOT, that addresses R1 and R2 considering varying levels of criticality in different regions for multiple MOT tasks (Section 3).
    • We develop new scheduling algorithms to effectively achieve R1 and R2 by balancing the aging of detection and association for each MOT task (Section 4).
    • We demonstrate the effectiveness of CA-MOT in achieving R1 and R2 using a real-world self-driving dataset (Section 5).

2. Motivation

    This section presents target systems and motivates the system design of CA-MOT to address C1 and C2 based on measurement-based observations.

2.1. Target system

    CA-MOT targets 2D MOT systems on AVs equipped with multiple camera sensors. Each MOT task performs MOT execution on consecutive input frames received from the corresponding camera sensor at a predetermined period. As this recurring task is required to complete a job within a specified deadline, each MOT task is considered a real-time task with a period and deadline. CA-MOT employs tracking-by-detection comprising two steps of MOT execution: detection and association. The front-end detector performs detection by exploiting an existing stand-alone DNN-based detector to identify the position and class of objects in the input image. Using the locations of detected objects, the feature extractor (e.g., a deployed CNN model such as OSNet) extracts features (i.e., feature vectors or feature maps) for each object. These features capture the visual characteristics of each object. The back-end tracker compares the feature similarities between objects in the current frame and the previous frame, matching objects with high similarity. For any remaining unmatched objects, a location-based matching method such as IoU is applied. The tracker then stores the motion information (position and velocity) and the features of each object in preparation for the next frame.
    We assume a system in which each camera independently tracks objects moving within its field of view. While it is possible to consider a system that tracks specific objects moving between the fields of view of multiple cameras (called hand-over), this is beyond the scope of our work. This paper focuses on dividing the multi-object tracking task into two subtasks (i.e., detection and association) and using DNN-based MOT-specific properties (i.e., reuse of motion and feature information) to achieve R1 and R2 under limited resources.

Fig. 1. Tracking accuracy and execution time on different execution options of detection and association.

2.2. Trade-off between accuracy and execution time

    To address C1, we consider two factors: (i) the input image size and detection within the safety-critical region, and (ii) the number of objects used for feature extraction during association across all detected objects in each frame.
    Fig. 1(a) compares the multi-object tracking accuracy (MOTA) [15] for the overall and safety-critical regions (referred to as overall and critical accuracy) and the execution time for a single MOT task using three input image sizes (256 × 256, 416 × 416, 672 × 672). Overall accuracy considers all objects, while critical accuracy focuses on the safety-critical region. YOLOv5 [14] is used for detection, and features are extracted for all detected objects. The KITTI dataset [13] is used. For image sizes 256 × 256 and 416 × 416, detection is performed on a cropped region of interest (RoI) that includes the safety-critical region. If the safety-critical region is larger than the RoI size, the region is cropped and resized to that size; otherwise, an RoI of that size containing the region is cropped. The safety-critical region will be defined in Section 3. For the 672 × 672 size (i.e., the original input size), detection occurs without cropping.
    As shown in Fig. 1(a), reducing image size leads to a notable decrease in overall accuracy, while critical accuracy decreases less significantly due to prioritization of the safety-critical region in the RoI. Additionally, execution time decreases as the image size is reduced, demonstrating a trade-off between R1 and R2 when focusing on the critical region.
    Fig. 1(b) shows the impact of varying the number of objects used for feature extraction on accuracy and execution time, with the image size fixed at 672 × 672. The number of objects is set to zero, three, or more than three. OSNet [10] is used for feature extraction. As shown in Fig. 1(b), as the number of objects with feature extraction increases, both overall and critical accuracy improve, but this also leads to increased execution time. This highlights a trade-off between R1 and R2 based on the number of objects considered for feature extraction.
    Section 3.3 details the MOT execution pipeline of CA-MOT, which leverages these observations to effectively address C1.
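
    The RoI selection used for the 256 × 256 and 416 × 416 detection sizes in Section 2.2 can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the authors' implementation: the names `Box` and `select_roi` are hypothetical, the frame is assumed to be a square 672 × 672 image, and centering the crop window on the critical region and clamping it to the image borders is one possible placement policy (the text only requires that the RoI contain the safety-critical region).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned pixel box covering [x0, x1) x [y0, y1). Hypothetical helper type."""
    x0: int
    y0: int
    x1: int
    y1: int

    @property
    def width(self) -> int:
        return self.x1 - self.x0

    @property
    def height(self) -> int:
        return self.y1 - self.y0


def clamp(value: int, low: int, high: int) -> int:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(value, high))


def select_roi(critical: Box, roi_size: int, image_size: int = 672):
    """Return (crop_box, needs_resize) for one detection execution option.

    Assumption: `critical` lies inside the image_size x image_size frame.
    """
    # High workload: detect on the whole frame, no cropping.
    if roi_size >= image_size:
        return Box(0, 0, image_size, image_size), False

    # Critical region does not fit into the RoI: crop the region itself;
    # the caller must then resize the crop to roi_size x roi_size.
    if critical.width > roi_size or critical.height > roi_size:
        return critical, True

    # Critical region fits: place a roi_size window around its center
    # (assumed policy) and shift the window back inside the frame if needed.
    cx = (critical.x0 + critical.x1) // 2
    cy = (critical.y0 + critical.y1) // 2
    x0 = clamp(cx - roi_size // 2, 0, image_size - roi_size)
    y0 = clamp(cy - roi_size // 2, 0, image_size - roi_size)
    return Box(x0, y0, x0 + roi_size, y0 + roi_size), False
```

    With a frame stored as a row-major array, the returned box would be applied as `frame[box.y0:box.y1, box.x0:box.x1]`, followed by a resize only when `needs_resize` is true. Objects outside the returned box are not detected in that frame, which is why the paper falls back on predicted positions from the previous frame for them.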



Fig. 2. System design of CA-MOT: the key features are (a) an aging-aware scheduler that provides timing guarantees and a criticality-aware flexible MOT execution pipeline including (b) a detection module that accommodates varying input sizes and (c) an association module that handles a varying number of objects for feature extraction.

2.3. Different combinations of detection and association

    Toward addressing C2, Fig. 1(c) reveals an intriguing observation: different combinations of image sizes and the number of feature extractions yield distinct effects on accuracy and execution time. The experiment was conducted over 100 consecutive frames.
    In Fig. 1(c), the execution of detection or association is denoted by $P$ or $F$. $P$ represents partial computation, where detection is performed only on the region of interest (RoI) at a size of 256 × 256, including a safety-critical region, and association is limited to location-based association without feature extraction. $F$ represents full computation, where detection is performed on the entire image at a size of 672 × 672, and association includes feature extraction for all objects. The number in the upper right of the notation indicates how many times the combination of detection and association has been performed. For example, the notation $FF^{50}PP^{50}$ indicates that we use $F$ for both the detection and association steps in the first 50 frames and $P$ for both steps in the remaining 50 frames. To mitigate the issue of objects outside the critical region not being detected due to cropping, which can decrease the accuracy of the non-critical region, we utilize the position information of objects from the previous frame as predicted position information for the current frame using a prediction model such as the Kalman filter [16] during the execution of $P$ in the detection step. Except for $PP^{100}$ and $FF^{100}$, all combinations have the same proportion of $F$ and $P$ over all frames.
    As shown in Fig. 1(c), although $FF^{50}PP^{50}$ and $PF^{100}$ have execution times similar to those of $(PF + FP)^{100}$ and $(FF + PP)^{100}$, they show lower tracking accuracy. This indicates that different combinations of $F$ and $P$ can have a varying impact on accuracy. The observation in Fig. 1(c) necessitates a new scheduler that is capable of obtaining high tracking accuracy by capturing an MOT-specific property referred to as aging, which will be detailed in Section 3.4.

3. System design of CA-MOT

    This section presents the goal and design of CA-MOT to address C1 and C2.

3.1. System overview

    CA-MOT utilizes a tracking-by-detection approach consisting of two steps, detection and association: the front-end detector employs a pre-existing DNN-based detector to detect and classify objects in the input image, and the back-end tracker associates objects across frames, matching any that remain unmatched with a location-based method like IoU. CA-MOT aims at providing prioritized tracking accuracy for the safety-critical region with a timing guarantee for every MOT execution on limited computing resources by addressing C1 and C2 discussed in Section 1. It has the following design goals.

    • It provides different execution options not only for detection but also for association, considering the different criticality of regions in input images.
    • It provides a timing guarantee while providing prioritized tracking accuracy for the safety-critical regions by exploiting an MOT-specific property.

    To address the first goal, CA-MOT implements a criticality-aware flexible MOT execution pipeline in which detection and association are performed with different execution options by leveraging the observations discussed in Section 2.2. To address the second goal, CA-MOT develops an aging-aware task-level scheduler, built upon the MOT execution pipeline, that maximizes accuracy while providing a timing guarantee by exploiting the observations discussed in Section 2.3. The MOT execution pipeline and the scheduler are implemented as separate threads, and they communicate through shared memory.
    CA-MOT does not require modifications to existing DNN models (e.g., detectors and feature extractors), which allows for reusing most (if not all) stand-alone detectors and feature extractors. Notably, state-of-the-art detectors like YOLOv5 are inherently designed to handle varying input image sizes, and all CNN models can perform batch execution on multiple images (each corresponding to a different object). As shown in Fig. 2, the key features of CA-MOT are: (a) a scheduling policy that selects one input per camera, provides timing guarantees, and adjusts the workload for detection and association; (b) a module that processes detection with inputs of varying sizes; and (c) a module that extracts features from a pre-determined number of detected objects.

3.2. Workflow

    Fig. 2 presents the workflow of CA-MOT. During system operation, the task scheduler maintains a queue to store images periodically received from each camera sensor (①). Then, the task scheduler determines the following for tasks in the queue: (a) the task to be scheduled, (b) the execution option for the detector, and (c) the execution option for the association (②). After an image moves to the MOT execution pipeline, the critical region identification module identifies a safety-critical region from the image and crops (or not) an RoI including the safety-critical region according to the execution option for the detector (③). Depending on the execution option, the cropped RoI or entire image is processed for detection (④). Furthermore, depending on the number of objects for which features are extracted, CA-MOT selectively extracts features for detected objects (⑤). All detected objects are then matched with the tracked objects from the previous frame. If both the detected and tracked objects have feature vectors, they are associated through feature-based matching. Otherwise, they are associated solely based on their locations (⑥).

3.3. Criticality-aware flexible MOT execution pipeline

    The MOT execution pipeline conducts detection and association sequentially. CA-MOT can employ any existing stand-alone DNN-based detector as long as it can accommodate different sizes of input images (e.g., YOLO series) and offers a clear trade-off between accuracy and execution time. For each input image with a size of 672 × 672, the



detector performs the detection to identify the location and class of multiple objects in the image. Once the scheduler determines the task (associated with an input image) to be scheduled and the execution option for the task, the detection is performed for the task according to the execution option. CA-MOT provides three execution options (i.e., low, middle, and high workloads) that offer a trade-off between execution time and accuracy. For low and middle workload detections, CA-MOT first identifies the RoI with sizes of 256 × 256 and 416 × 416, respectively, and then detection is performed on the cropped RoI, which includes the safety-critical region. The area outside the RoI is not subject to detection, and the motion information (e.g., size, position, velocity, direction) of objects detected in the previous frame is used in prediction models such as the Kalman filter to obtain the estimated information of objects in the current frame. On the other hand, high-workload detection is performed on the original image with a size of 672 × 672.
    We define the area that encompasses all safety-critical objects, which are objects with a time-to-collision of less than two seconds, as the safety-critical area. If the safety-critical area exceeds the input size for the detector, as determined by the detection execution option (e.g., 256 × 256 or 416 × 416), the safety-critical area is cropped and resized to the corresponding dimensions before being fed into the detector model. The locations of safety-critical objects are determined based on their most recently computed positions, without projecting future safety-critical regions from them. There are numerous existing approaches that calculate time-to-collision based on the relative positions of objects and the ego vehicle given LiDAR and IMU data, and we assume the use of one such method. It is also important to note that the KITTI dataset provides both LiDAR and IMU data. For example, areas where objects with a time-to-collision of less than 2 s congregate can be defined as safety-critical regions, and if multiple such areas exist, the encompassing area that includes all of them would be considered the safety-critical region. We adhere to the definition of the safety-critical region given in DNN-SAM [5]. It is assumed that the critical region is pre-calculated from external sensors such as LiDAR and IMU and provided to CA-MOT. If an input image does not have a critical region, the entire frame is considered a critical region. As seen in Fig. 2, the GPU is used only for the inference of DNN models, such as the detector (e.g., YOLOv5) and feature extractor (e.g., OSNet), while all other execution is performed on the CPU.
    For the association, the MOT system uses the two-step approach [7]. Initially, a CNN-based model (e.g., OSNet [10]) is employed by the tracker to extract features from the detected objects. The tracker then compares these features between the current and previous frames to identify object pairs with the highest feature similarity. For the remaining objects that are not matched based on feature comparison, a location-based matching method such as IoU (intersection over union) is used. CA-MOT also provides three execution options (i.e., low, middle, and high workloads) for the association. For the middle and high workload associations, the tracker extracts features from some (e.g., three) of the detected objects or all of the detected objects, respectively, and then performs feature-based matching followed by location-based matching. On the other hand, low-workload association performs location-based matching only. Depending on the execution option, CA-MOT may extract features from only a subset or all of the detected objects, which means that the feature information of objects may not be updated every time. Therefore, during feature-based matching, the algorithm compares the features extracted from the objects in the current frame with the closest previously extracted features of the tracked objects and matches the two objects with the highest feature similarity.

3.4. Aging-aware task scheduler

    CA-MOT implements a thread-level task scheduler to determine the task to be scheduled and execution options for detection and association for each task at every scheduling decision. The scheduler manages MOT tasks using a single queue and is triggered when a task completes its execution or a new task is released. As three execution options (i.e., low, middle, and high workloads) are provided for each of detection and association under CA-MOT, the scheduler decides the image size (e.g., 256 × 256, 416 × 416, and 672 × 672) for detection and feature size (e.g., zero, three, and more than three) for association according to the scheduling algorithms (to be presented in Section 4).
    As discussed in Section 2.3, various combinations of image sizes and the quantity of feature extractions result in different impacts on tracking accuracy. This is due to an important property of the MOT system, which involves supplementing non-updated motion or feature information during detection and association in the current frame by utilizing information from the previous frame. For example, in scenarios with low and middle workload detection, the detection process does not cover the area outside the RoI. Instead, the motion information of objects detected in the previous frame, such as their size, position, velocity, and direction, is leveraged to estimate the corresponding information for objects in the current frame. Moreover, during the association step, if the feature extracted from the immediately preceding frame is unavailable due to low- or middle-workload associations, the feature-based matching algorithm compares the features extracted from objects in the current frame with the features extracted from the nearest past frames. Therefore, the tracking accuracy is determined by the reliability of the reused motion and feature information of the objects. To capture the reliability of the motion and feature information of objects, we propose a new notion of aging that specifies the number of middle- or high-workload executions of detection and association conducted from the beginning of the MOT task, respectively. In order to update the motion and feature information as frequently as possible using limited computing resources, it is necessary to balance the aging of detection and association for each task. Note that increasing the aging of detection and association for all tasks simultaneously in every MOT execution is generally not feasible due to limited computing resources. Therefore, a mechanism is required to balance the aging of detection and association for all tasks while providing timing guarantees under constrained resources. To this end, we propose new scheduling algorithms that will be detailed in the next section.

4. Scheduling algorithm

    This section presents a task model and proposes new scheduling algorithms building upon CA-MOT.

4.1. Task model

    Targeting MOT systems in AVs that involve $n$ camera sensors, we consider a set $\tau$ consisting of $n$ MOT tasks denoted as $\tau_i \in \tau$. Each MOT task $\tau_i$ is responsible for conducting MOT execution using input images provided periodically by the corresponding camera sensor. As we employ the methodology of tracking-by-detection, an MOT task consists of detection and association sub-tasks. Thus, the specification of each MOT task $\tau_i$ is given as $\tau_i = (T_i, C_i(s_i, f_i), D_i)$, where $T_i$ represents the period (or the minimum inter-arrival time), $C_i(s_i, f_i)$ denotes the worst-case execution time (WCET) based on the execution options (i.e., low, middle, and high-workload execution) for detection and association sub-tasks, respectively, and $D_i$ indicates the relative deadline. The execution time of the detection sub-task depends on the image size $s_i \in S_i = \{L, M, H\}$, where $L$, $M$, $H$ are 256 × 256, 416 × 416, and 672 × 672, respectively. Note that CA-MOT supports arbitrary non-decreasing sizes for $S_i = \{L, M, H\}$. On the other hand, the execution time of the association sub-task depends on the feature size $f_i \in F_i = \{L, M, H\}$, where $L$, $M$, $H$ are zero, from one to three, and more than three, respectively. Note that the tracking-by-detection methodology performs the association phase sequentially through feature-based matching followed by location-based matching using IoU. If $f_i$ is equal


Table 1
Notations used in the scheduling algorithms.

 Symbol                       Description
 𝜏𝑖                           Task 𝑖 in the system
 𝑇𝑖                           Period of task 𝜏𝑖 (minimum inter-arrival time)
 𝐷𝑖                           Relative deadline of task 𝜏𝑖
 𝐶𝑖 (𝑋 , 𝑌 )                  Worst-case execution time (WCET) of task 𝑖.
                              𝑋: image size for detection (𝐿, 𝑀 , 𝐻)
                              𝑌 : feature size for association (𝐿, 𝑀 , 𝐻)
 𝑠𝑖                           Image size for the detection sub-task of task 𝑖
 𝑓𝑖                           Feature size for the association sub-task of task 𝑖
 𝑆𝑖                           Set of image size options for task 𝑖 (𝑆𝑖 = {𝐿, 𝑀 , 𝐻})
 𝐹𝑖                           Set of feature size options for task 𝑖 (𝐹𝑖 = {𝐿, 𝑀 , 𝐻})
 𝐿                            Low workload execution
 𝑀                            Middle workload execution
 𝐻                            High workload execution
 𝐶𝑖𝐷 (𝑠𝑖 )                    WCET of the detection sub-task of task 𝑖, based on image size 𝑠𝑖
 𝐶𝑖𝐴 (𝑓𝑖 )                    WCET of the association sub-task of task 𝑖, based on feature size 𝑓𝑖
 𝑅𝐶𝑖 (𝐿, 𝐿)                   Remaining execution time for the minimum execution of task 𝑖
 $age_i^D$                    Aging value of the detection sub-task of task 𝑖
 $age_i^A$                    Aging value of the association sub-task of task 𝑖
 $slack_i^{t_{cur}}$          Slack time available for task 𝑖 at the current time 𝑡𝑐 𝑢𝑟
 𝑞𝑖                           Minimum execution time of task 𝑖 in the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )]
 𝑝                            Sum of the minimum execution times for all tasks
 𝑑1 (𝑡𝑐 𝑢𝑟 )                  Earliest deadline or future release time at time instant 𝑡𝑐 𝑢𝑟
 $slack_i^{D-}$               Remaining slack after executing high-workload detection for task 𝑖
 $slack_i^{A-}$               Remaining slack after executing high-workload association for task 𝑖

On the other hand, the worst-case execution time 𝐶𝑖𝐴 (𝑓𝑖 ) of the association
task depends on the feature size 𝑓𝑖 . It is calculated by considering
the time required for extracting features from detected objects and
performing matching methods such as feature-based and IoU-based
matching. An MOT task 𝜏𝑖 is considered schedulable if every job 𝐽𝑖
(invoked by 𝜏𝑖 ) completes its execution within the relative deadline 𝐷𝑖 .
The overall schedulability of the system is determined by ensuring that
every task 𝜏𝑖 ∈ 𝜏 is schedulable.

4.2. EDF best-effort

    Building upon the system design of CA-MOT presented in Section 3,
we develop two scheduling algorithms that aim to provide not only
high tracking accuracy for the safety-critical regions but also a timing
guarantee for every MOT execution. To this end, the proposed
scheduling algorithms have the following two features: (F1) an offline
timing guarantee for the minimum execution (i.e., low-workload execution
for both detection and association) of every MOT execution and
(F2) an online policy to maximize tracking accuracy by systematically
increasing workload (i.e., middle- or high-workload execution) of an
MOT execution using notions of slack and aging without compromising
timing guarantee.
    The proposed scheduling algorithms are based on the non-preemptive
earliest deadline first (EDF) scheduling algorithm, which assigns
higher priority to jobs with earlier deadlines without allowing any
preemption. To provide the first feature F1, CA-MOT employs the
existing schedulability analysis developed for non-preemptive EDF as
follows.

Lemma 1. For a set 𝜏 of MOT tasks scheduled by non-preemptive EDF,
minimum execution 𝐶𝑖 (𝐿, 𝐿) of every task 𝜏𝑖 ∈ 𝜏 can be executed without
                                                                                                    deadline miss as long as the following holds for every task 𝜏𝑖 ∈ 𝜏.
                                                                                                    $$\frac{\max_{\tau_i \in \tau} C_i(L, L)}{\min_{\tau_i \in \tau} T_i} + \sum_{\tau_i \in \tau} \frac{C_i(L, L)}{T_i} \le 1.0 \qquad (2)$$
to 𝐿, this indicates that no feature extraction has been performed for
the frame, and thus, feature-based matching is skipped, proceeding
directly to location-based matching. In the case of 𝐻, we employ the                                Proof. The lemma presents a schedulability condition for non-preem-
maximum number of objects as defined by the environment (for the                                    ptive EDF, and its proof is outlined as follows. Let us target 𝜏𝑘 ∈ 𝜏;
dataset considered, this is based on values measured across all videos),                            also, consider a virtual task 𝜏𝑥 ∉ 𝜏, whose 𝑇𝑥 and 𝐶𝑥 (𝐿, 𝐿) are set
for example, 10. Then, the worst-case execution time 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) of each                          to min𝜏𝑖 ∈𝜏 𝑇𝑖 and max𝜏𝑖 ∈𝜏 𝐶𝑖 (𝐿, 𝐿), respectively. Now, we compare the
MOT task 𝜏𝑖 is derived as follows.                                                                  finishing time of a job of 𝜏𝑘 when (Case 1) 𝜏 is scheduled by non-
                                                                                                    preemptive EDF, and (Case 2) 𝜏 ∪ {𝜏𝑥 } is scheduled by preemptive EDF.
𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) = 𝐶𝑖𝐷 (𝑠𝑖 ) + 𝐶𝑖𝐴 (𝑓𝑖 ),                                                    (1)
                                                                                                    Since at most one lower-priority job can block a high-priority job under
where 𝐶𝑖𝐷 (𝑠𝑖 ) and 𝐶𝑖𝐴 (𝑓𝑖 ) are the worst-case execution times of detection                       non-preemptive scheduling, 𝜏𝑘 can be blocked by at most one lower-
and association sub-tasks according to 𝑠𝑖 and 𝑓𝑖 , respectively. As shown                           priority job under Case 1; obviously, the WCET of the lower-priority job
in Fig. 2, both the detection and the association sub-tasks involve GPU                             is upper-bounded by max𝜏𝑖 ∈𝜏 𝐶𝑖 (𝐿, 𝐿). Also, to block all the following
operations, with their respective WCETs including the communication                                 jobs of 𝜏𝑘 , the blocking frequency should be no smaller than 𝑇𝑘 , which
costs between the CPU and GPU. Note that the detection sub-task,                                    is lower-bounded by min𝜏𝑖 ∈𝜏 𝑇𝑖 . Therefore, the finishing time of a job of
denoted as 𝜏𝑖𝐷 , and the association sub-task, denoted as 𝜏𝑖𝐴 , are executed                        𝜏𝑘 under Case 1 is no later than that under Case 2. Once we apply the
consecutively without any preemption while sharing the same period                                  well-known schedulability condition for preemptive EDF to Case 2, the
and relative deadline. Similarly, when an active task is running, it                                condition is the same as Eq. (2), which proves the lemma. □
executes without any interruptions, while other tasks wait in the queue.
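
The condition of Lemma 1 can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names `wcet` and `satisfies_eq2` and the dictionary layout are hypothetical, and the numbers are the WCET values of the two-task example used for Fig. 3 (𝐶𝑖𝐷 = {5, 9, 12}, 𝐶𝑖𝐴 = {3, 8, 13}, 𝑇𝑖 = 25).

```python
def wcet(c_det, c_assoc, s, f):
    """Eq. (1): C_i(s_i, f_i) = C_i^D(s_i) + C_i^A(f_i)."""
    return c_det[s] + c_assoc[f]


def satisfies_eq2(tasks):
    """Eq. (2) for tasks given as (C_i(L, L), T_i) pairs."""
    # Blocking term: largest minimum execution over the smallest period.
    blocking = max(c for c, _ in tasks) / min(t for _, t in tasks)
    # Utilization term: sum of C_i(L, L) / T_i over all tasks.
    utilization = sum(c / t for c, t in tasks)
    return blocking + utilization <= 1.0


c_det = {"L": 5, "M": 9, "H": 12}      # C_i^D values from the Fig. 3 example
c_assoc = {"L": 3, "M": 8, "H": 13}    # C_i^A values from the Fig. 3 example

c_min = wcet(c_det, c_assoc, "L", "L")
c_max = wcet(c_det, c_assoc, "H", "H")
print(satisfies_eq2([(c_min, 25), (c_min, 25)]))   # minimum execution for both tasks
print(satisfies_eq2([(c_max, 25), (c_max, 25)]))   # high workload for both tasks
```

With these values the minimum executions give 8/25 + 16/25 = 0.96, so Eq. (2) holds, whereas running every job at high workload does not satisfy it. This is why the offline guarantee (F1) is stated for 𝐶𝑖 (𝐿, 𝐿) only.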
                                                                                                        Note that the proof is self-contained, but a different proof for
In addition, each task runs on an environment where non-preemption
between the GPU and CPU is guaranteed. To ensure this, while the                                    Lemma 1 can be found in [5,17].
CPU is running, the GPU waits for input from the CPU. Once the GPU                                      To provide the second feature F2, the proposed scheduling al-
receives the input and is activated, the CPU waits until it receives the                            gorithms (i) dynamically increase the workload of each MOT task
results from the GPU, as illustrated in Fig. 2. As seen in Fig. 2, GPU                              (e.g., from low workload to middle or high workload) without compromising
is used only for the inference of DNN models, such as the detector                                  the timing guarantee and (ii) balance the aging of detection
(e.g., YOLOv5) and feature extractor (e.g., OSNet), while all other                                 and association of every task. We propose two scheduling algorithms
execution is performed on the CPU. Also, CA-MOT does not allow                                      that simultaneously provide (i) and (ii) in different ways: EDF-BE
parallel execution for multiple MOT executions (see Table 1).                                       (EDF Best-Effort) and EDF-Slack (EDF with Slack reclamation), adapted
    The worst-case execution time 𝐶𝑖𝐷 (𝑠𝑖 ) of the detection sub-task is                            from [5]. EDF-BE and EDF-Slack utilize slacks defined differently, but
determined by the sum of various components, including preprocessing                                use the same mechanism (in Algorithm 2) to decide on the execution
time (such as cropping and resizing the input image), image transfer                                option that employs a notion of aging.
time from CPU memory to GPU memory, model inference time to                                             Let 𝑑1 (𝑡𝑐 𝑢𝑟 ) be the earliest deadline or future release time among
obtain candidate objects, and postprocessing time (e.g., applying                                   all tasks at a time instant 𝑡𝑐 𝑢𝑟 . The slack $slack_i^{t_{cur}}$ of task 𝜏𝑖 at 𝑡𝑐 𝑢𝑟
non-maximum suppression) to extract the final objects from the candidates.                          under the EDF-BE is defined as the expected remaining time up to



                                                                                          Algorithm 1 Slack calculation for 𝜏𝑘 at 𝑡𝑐 𝑢𝑟 under EDF-Slack
                                                                                          Input: 𝜏, 𝑡𝑐 𝑢𝑟
                                                                                          Output: $slack_k^{t_{cur}}$
                                                                                           1: 𝑝 = 0, 𝑈 = the left-hand side of Equation (2)
                                                                                           2: for 𝑖 = 𝑛 to 1, 𝜏𝑖 ∈ {𝜏1 , ..., 𝜏𝑛 | 𝑑1 (𝑡𝑐 𝑢𝑟 ) ≤ ⋯ ≤ 𝑑𝑛 (𝑡𝑐 𝑢𝑟 )} do
                                                                                           3:   𝑈 = 𝑈 − 𝐶𝑖 (𝐿, 𝐿) / 𝑇𝑖
                                                                                           4:   𝑞𝑖 = max(0, 𝑅𝐶𝑖 (𝐿, 𝐿) − (1 − 𝑈 ) ⋅ (𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) − 𝑑1 (𝑡𝑐 𝑢𝑟 )))
                                                                                           5:   𝑈 = min(1.0, 𝑈 + (𝑅𝐶𝑖 (𝐿, 𝐿) − 𝑞𝑖 ) / (𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) − 𝑑1 (𝑡𝑐 𝑢𝑟 )))
                                                                                           6:   𝑝 = 𝑝 + 𝑞𝑖
                                                                                           7: end for
                                                                                           8: return $slack_k^{t_{cur}}$ = 𝑑1 (𝑡𝑐 𝑢𝑟 ) − 𝑡𝑐 𝑢𝑟 − 𝑝
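
The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative transcription under stated assumptions, not the authors' code: the `Job` record and the names `lhs_eq2` and `edf_slack` are hypothetical, and the update of 𝑈 in Line 5 is skipped when 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) = 𝑑1 (𝑡𝑐 𝑢𝑟 ), since the pseudocode would otherwise divide by zero for the earliest-deadline task (the paper does not state how this case is handled).

```python
from dataclasses import dataclass


@dataclass
class Job:
    c_min: float     # C_i(L, L): WCET of the minimum execution
    period: float    # T_i
    rc_min: float    # RC_i(L, L): remaining minimum execution at t_cur
    deadline: float  # d_i(t_cur): deadline (or release time) seen at t_cur


def lhs_eq2(jobs):
    """Left-hand side of Eq. (2) for the given task parameters."""
    blocking = max(j.c_min for j in jobs) / min(j.period for j in jobs)
    return blocking + sum(j.c_min / j.period for j in jobs)


def edf_slack(jobs, t_cur):
    """Slack for the earliest-deadline job, following Algorithm 1."""
    ordered = sorted(jobs, key=lambda j: j.deadline)
    d1 = ordered[0].deadline
    p = 0.0
    u = lhs_eq2(ordered)                                  # Line 1
    for job in reversed(ordered):                         # Line 2: latest deadline first
        u -= job.c_min / job.period                       # Line 3
        window = job.deadline - d1
        q = max(0.0, job.rc_min - (1.0 - u) * window)     # Line 4
        if window > 0:                                    # assumption: guard d_i == d_1
            u = min(1.0, u + (job.rc_min - q) / window)   # Line 5
        p += q                                            # Line 6
    return d1 - t_cur - p                                 # Line 8


# Hypothetical input: tau_1 pending with deadline 25, tau_2 not yet released
# (modelled here as zero remaining minimum execution, deadline 38).
jobs = [Job(8, 25, 8, 25), Job(8, 25, 0, 38)]
print(edf_slack(jobs, t_cur=0))
```

With this input the sketch returns 17, the slack value quoted for 𝜏1 at 𝑡𝑐 𝑢𝑟 = 0 in the Fig. 3(c) example. Treating the unreleased job as having 𝑅𝐶𝑖 (𝐿, 𝐿) = 0 and taking 𝑑1 as the deadline of 𝜏1 are modelling assumptions made for this illustration.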


                                                                                          that does not exceed the earliest deadline or future release, ensuring
Fig. 3. Execution timeline of multiple MOT tasks under (a) baseline (non-preemptive       execution without deadline misses. The following theorem presents the
EDF), (b) EDF-BE, and (c) EDF-Slack scheduling policies.
                                                                                          timing guarantee of EDF-BE.

                                                                                          Theorem 1. A task set 𝜏 that satisfies the condition in Eq. (2) is
𝑑1 (𝑡𝑐 𝑢𝑟 ) after the execution of 𝐶𝑖 (𝐿, 𝐿) is completed, which is calculated            schedulable by EDF-BE.
by 𝑑1 (𝑡𝑐 𝑢𝑟 ) − 𝑡𝑐 𝑢𝑟 − 𝐶𝑖 (𝐿, 𝐿). This slack value is only valid when there
are fewer than two tasks in the waiting queue at time 𝑡𝑐 𝑢𝑟 and no                        Proof. According to Lemma 1, for a task set 𝜏 that satisfies Eq. (2),
future releases within the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )). Using the slack value          the minimum execution time 𝐶𝑖 (𝐿, 𝐿) of all tasks 𝜏𝑖 ∈ 𝜏 guarantees
conditionally provided at a scheduling decision, EDF-BE can perform                       execution without deadline misses. At each scheduling decision at 𝑡
middle- or high-workload execution for detection and/or association.                      under the online policy of EDF-BE, the execution of a job exploiting
                                                                                          any slack value does not impose additional interference on any other job.
Example. Figs. 3(a) and (b) present a scheduling scenario of the                          This guarantees that all tasks 𝜏𝑖 receive no more interference than what
baseline algorithm (i.e., non-preemptive EDF) and EDF-BE with an                          they would receive under non-preemptive EDF scheduling. Thus, this
example task set. We consider an example task set 𝜏 = {𝜏1 , 𝜏2 } of                       theorem holds. □
which 𝐶𝑖 = 𝐶𝑖𝐷 (𝐻) + 𝐶𝑖𝐴 (𝐻) = 25, 𝑇𝑖 = 25, 𝐶𝑖𝐷 (𝑠𝑖 ) = {5, 9, 12}, and
𝐶𝑖𝐴 (𝑓𝑖 ) = {3, 8, 13} hold for 𝜏𝑖 ∈ 𝜏. As shown in Figs. 3(a) and (b), the               4.3. EDF with slack reclamation
first jobs of 𝜏1 and 𝜏2 are released at 𝑡 = 0 and 𝑡 = 13, respectively. In
the baseline algorithm, the first job of 𝜏1 executes for 25 time units,                       In the case of EDF-BE, more workload than the minimum execution
and then the first job of 𝜏2 starts its execution at 𝑡 = 25 resulting in                  can only be processed when there is a single job in the waiting queue at
a deadline miss at 𝑡 = 38. Let $age_i^D$ and $age_i^A$ be the aging values of             a given time 𝑡𝑐 𝑢𝑟 and no additional releases occur until 𝑑1 (𝑡𝑐 𝑢𝑟 ). This creates
detection and association of 𝜏𝑖 . The aging value is an integer satisfying                a limited opportunity for MOT tasks in CA-MOT to perform more
$age_i^D, age_i^A \ge 0$, and $age_i^D$ and $age_i^A$ for all tasks 𝜏𝑖 ∈ 𝜏 are set to zero   workload than the minimum execution, thus restricting the potential
at the beginning of the system. Then, $age_i^D$ (and $age_i^A$) increases by one          to improve tracking accuracy. To address this limitation, we integrate
at each time when a detection (and association) is run with middle- or                    the approach presented in [5] into EDF-Slack, allowing it to compute
high-workload. In other words, the aging value refers to the number of                    slack in a different way than EDF-BE.
executions excluding those with low workloads. By adjusting the aging                         Let 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) denote the 𝑖th earliest deadline or release time at 𝑡𝑐 𝑢𝑟 , and
value, a balance is maintained so that neither detection nor association                  𝑅𝐶𝑖 (𝐿, 𝐿) represent the remaining execution time required to complete
becomes disproportionately large.                                                         the minimum execution 𝐶𝑖 (𝐿, 𝐿). Algorithm 1 outlines the calculation
                                                                                          of the slack value $slack_k^{t_{cur}}$ for task 𝜏𝑘 at 𝑡𝑐 𝑢𝑟 within the EDF-Slack
    Compared to EDF-Slack, EDF-BE is a simpler algorithm that utilizes
                                                                                          algorithm, triggered at each scheduling decision. Since EDF is a job-
as many resources as possible, executing a job for greater than 𝐶𝑖 (𝐿, 𝐿)
                                                                                          level fixed-priority scheduling policy, wherein the priority of a job
up to its closest future release only when there is exactly one job in the
                                                                                          remains constant throughout its execution, scheduling decisions under
waiting queue. EDF-BE naturally guarantees no deadline misses in any
                                                                                          EDF occur either at the commencement of a job’s execution or upon its
job execution. This is because, as stated in Lemma 1, the execution of
                                                                                          completion. In the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )], EDF-Slack processes tasks in
𝐶𝑖 (𝐿, 𝐿) without deadline misses for all jobs is guaranteed under EDF.
                                                                                          reverse EDF order, starting from the task with the latest deadline. Job
Furthermore, when a job executes for more than 𝐶𝑖 (𝐿, 𝐿) under EDF-
                                                                                          𝐽𝑘 of 𝜏𝑘 has the highest priority at 𝑡𝑐 𝑢𝑟 , with 𝑑1 (𝑡𝑐 𝑢𝑟 ) being its deadline,
BE, there is only one active job in the waiting queue at that time. In
                                                                                          as EDF-Slack follows the EDF policy. The goal of the slack calculation
the case of EDF-BE, when the first job of task 𝜏1 starts its execution
                                                                                          in Algorithm 1 is to delay the execution of all other tasks 𝜏𝑖 ∈ 𝜏 ⧵ 𝜏𝑘
at 𝑡 = 0, it executes the minimum execution 𝐶1 (𝐿, 𝐿) = 8 until the
                                                                                          beyond 𝑑1 (𝑡𝑐 𝑢𝑟 ) while ensuring that future deadlines are met. This is
earliest deadline or future release at 𝑡 = 13, resulting in a slack of five.              repeated for all tasks in the waiting queue. To ensure 𝜏𝑖 completes
Utilizing this slack, the task 𝜏1 then executes 𝐶1 (𝑀 , 𝐿), and the aging                 𝐶𝑖 (𝐿, 𝐿) before 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ), EDF-Slack calculates the maximum execution
factor $age_i^D$ increases by one. For the first job of task 𝜏2 , released at             time in the interval [𝑑1 (𝑡𝑐 𝑢𝑟 ), 𝑑𝑖 (𝑡𝑐 𝑢𝑟 )], which is (1 − 𝑈 ) ⋅ (𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) − 𝑑1 (𝑡𝑐 𝑢𝑟 )),
𝑡 = 13, there is a slack of 4 until the earliest deadline or future release               where 𝑈 denotes the left-hand side of Eq. (2).
at 𝑡 = 25. Thus, 𝐶2 (𝑀 , 𝐿) is executed, and $age_i^D$ increases by one. The                  The key steps in the slack calculation are as follows:
second job of task 𝜏1 , released at 𝑡 = 25, has a slack of 5 until the earliest
deadline or future release at 𝑡 = 38. To balance $age_i^D$ and $age_i^A$, 𝐶1 (𝐿, 𝑀)             • 𝑞𝑖 is computed as the minimum execution of 𝜏𝑖 in the interval
is executed, and $age_i^A$ increases by one. The details of the online policy                     [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )] (Lines 3–4).
that effectively balances the aging of detection and association for each                       • 𝑅𝐶𝑖 (𝐿, 𝐿) is either zero or 𝐶𝑖 (𝐿, 𝐿), since scheduling decisions
task will be provided at the end of this section. As can be observed                              are only made upon job completion or release in non-preemptive
from the figure, at each scheduling decision, an execution is performed                           scheduling (Line 4).



    • The execution rate of 𝜏𝑖 in the interval [𝑑1 (𝑡𝑐 𝑢𝑟 ), 𝑑𝑖 (𝑡𝑐 𝑢𝑟 )] is calculated
      and recorded (Line 5).
    • 𝑝 is set as the sum of the minimum execution times of all tasks
      𝜏𝑖 ∈ 𝜏 (Line 6).
    • The slack is then determined as the remaining time slots, excluding
      𝑝 (i.e., the sum of 𝑞𝑖 ), within the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )] (Line 8).

Example. Fig. 3(c) illustrates a scheduling scenario of EDF-Slack using
the same example tasks as shown in Figs. 3(a) and (b). The initial jobs
of 𝜏1 and 𝜏2 are released at 𝑡 = 0 and 𝑡 = 13, respectively. Applying
Algorithm 1, the calculated slack value for 𝜏1 at 𝑡𝑐 𝑢𝑟 = 0 is 17, allowing
the first job of 𝜏1 to execute for 𝐶1 (𝐻 , 𝐻) until 𝑡 = 25. Furthermore,
$age_1^D$ and $age_1^A$ increment by one. Subsequently, the first job of 𝜏2 begins
its execution at 𝑡 = 25, executing for 𝐶2 (𝑀 , 𝐿) while increasing $age_2^D$
by one. Finally, the second job of 𝜏1 starts its execution at 𝑡 = 37.

Comparing Fig. 3(b) that represents EDF-BE with Fig. 3(c) depicting
EDF-Slack, we observe that the aging of 𝜏1 and 𝜏2 increases by the same
amount in both cases. However, the key difference lies in the execution
of the first job of 𝜏1 . Under EDF-Slack, this job is able to execute with a
high-workload execution, while under EDF-BE, it can only execute with
a middle-workload execution, so higher tracking accuracy can be
expected under EDF-Slack.

Algorithm 2 Determination of execution options
Input: 𝜏, 𝑡𝑐 𝑢𝑟 , $slack_k^{t_{cur}}$
Output: (𝑠𝑘 , 𝑓𝑘 )
 1: if $slack_k^{t_{cur}}$ ≤ 0 then
 2:    return (𝐿, 𝐿)
 3: else
 4:    if $age_k^D$ ≤ $age_k^A$ then
 5:       $slack_k^{D-}$ = $slack_k^{t_{cur}}$ − (𝐶𝑘𝐷 (𝐻) − 𝐶𝑘𝐷 (𝐿))
 6:       if $slack_k^{D-}$ ≥ 0 then
 7:          return (𝐻 , 𝑓𝑘 ($slack_k^{D-}$ + 𝐶𝑘𝐴 (𝐿)))
 8:       else
 9:          return (𝑠𝑘 ($slack_k^{t_{cur}}$ + 𝐶𝑘𝐷 (𝐿)), 𝐿)
10:       end if
11:    else
12:       $slack_k^{A-}$ = $slack_k^{t_{cur}}$ − (𝐶𝑘𝐴 (𝐻) − 𝐶𝑘𝐴 (𝐿))
13:       if $slack_k^{A-}$ ≥ 0 then
14:          return (𝑠𝑘 ($slack_k^{A-}$ + 𝐶𝑘𝐷 (𝐿)), 𝐻)
15:       else
16:          return (𝐿, 𝑓𝑘 ($slack_k^{t_{cur}}$ + 𝐶𝑘𝐴 (𝐿)))
17:       end if
18:    end if
19: end if
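
The branch structure of Algorithm 2 can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: all function names are hypothetical, and the mappings 𝑓𝑘 (𝑥) and 𝑠𝑘 (𝑥) are assumed to pick the heaviest option whose WCET does not exceed 𝑥, which agrees with the one case given in the text (𝐿 for 𝑥 < 𝐶𝑘𝐴 (𝑀)) but is otherwise an assumption. The EDF-BE slack helper follows the definition 𝑑1 (𝑡𝑐 𝑢𝑟 ) − 𝑡𝑐 𝑢𝑟 − 𝐶𝑖 (𝐿, 𝐿).

```python
def edf_be_slack(d1, t_cur, c_min):
    """EDF-BE slack: time left before d1 once C_i(L, L) has run."""
    return d1 - t_cur - c_min


def pick_option(budget, wcet):
    """Heaviest option among L, M, H whose WCET fits in budget (assumed rule)."""
    choice = "L"
    for name in ("M", "H"):
        if wcet[name] <= budget:
            choice = name
    return choice


def execution_options(slack, age_d, age_a, c_det, c_assoc):
    """Return (s_k, f_k) following the branch structure of Algorithm 2."""
    if slack <= 0:                                            # Lines 1-2
        return ("L", "L")
    if age_d <= age_a:                                        # Line 4
        slack_d = slack - (c_det["H"] - c_det["L"])           # Line 5
        if slack_d >= 0:                                      # Lines 6-7
            return ("H", pick_option(slack_d + c_assoc["L"], c_assoc))
        return (pick_option(slack + c_det["L"], c_det), "L")  # Line 9
    slack_a = slack - (c_assoc["H"] - c_assoc["L"])           # Line 12
    if slack_a >= 0:                                          # Lines 13-14
        return (pick_option(slack_a + c_det["L"], c_det), "H")
    return ("L", pick_option(slack + c_assoc["L"], c_assoc))  # Line 16


c_det = {"L": 5, "M": 9, "H": 12}      # C^D values from the Fig. 3 example
c_assoc = {"L": 3, "M": 8, "H": 13}    # C^A values from the Fig. 3 example

slack = edf_be_slack(d1=13, t_cur=0, c_min=8)
print(execution_options(slack, 0, 0, c_det, c_assoc))  # first job of tau_1 under EDF-BE
print(execution_options(17, 0, 0, c_det, c_assoc))     # slack of 17 under EDF-Slack
```

Under these assumptions the sketch selects (𝑀, 𝐿) for the EDF-BE slack of 5 and (𝐻, 𝐻) for the EDF-Slack slack of 17, matching the options described in the examples for Fig. 3(b) and Fig. 3(c).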

    The following proves the timing guarantee of EDF-Slack.
Theorem 2. A task set 𝜏 that satisfies Eq. (2) is schedulable under
EDF-Slack.                                                                                       • If the slack is less than or equal to zero, the algorithm returns 𝐿
                                                                                                   and 𝐿 (Lines 1–2).
Proof. We prove this by contradiction. Assume, for the sake of contradiction,                    • Otherwise, the algorithm compares the ages of the detection step
that the task set 𝜏 satisfies Eq. (2), but is not schedulable                                      ($age_k^D$) and the association step ($age_k^A$) (Lines 3–4).
under EDF-Slack. This implies that at some time 𝑡, the total utilization
exceeds 1.0, and hence a deadline miss occurs for some job 𝐽𝑖 in 𝜏.                                    – If $age_k^D$ is smaller than or equal to $age_k^A$, indicating the detection step
Let 𝑡𝑚𝑖𝑠𝑠 denote the earliest such time at which a deadline miss occurs,                                 requires more resources, the algorithm calculates $slack_k^{D-}$,
i.e., 𝑡𝑚𝑖𝑠𝑠 = 𝑑𝑖 , where 𝑑𝑖 is the deadline of 𝐽𝑖 . By the definition of                                 representing the remaining slack after executing
EDF-Slack, at each time 𝑡, the slack time for each task is computed based                                high-workload detection (Line 5).
on the highest-priority job 𝐽1 (𝑡𝑐 𝑢𝑟 ), where 𝑡𝑐 𝑢𝑟 denotes the current time.                         – If $slack_k^{D-}$ is greater than or equal to zero, the high-workload
Since no tasks are released in the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )], the slack time                        detection is followed by middle- or high-workload association
ensures that lower-priority tasks cannot block the execution of 𝐽1 . As a                                depending on $slack_k^{D-}$ (Lines 6–7). In this case, 𝑓𝑘 (𝑥) is
result, the blocking term in Eq. (2) remains valid during this interval.                                 set as follows:
     Now, since EDF-Slack is based on EDF scheduling, the total utiliza-                                       ∗ 𝐿 for 𝑥 < 𝐶𝑘𝐴 (𝑀),
tion 𝑈 (𝑡) at any time 𝑡 can be expressed as:
          ∑                                                                                                    ∗ 𝑀 for 𝐶𝑘𝐴 (𝑀) ≤ 𝑥 < 𝐶𝑘𝐴 (𝐻),
                   𝐶𝑖
𝑈 (𝑡) =                    + 𝐵(𝑡),                                                                             ∗ 𝐻 for 𝑥 ≥ 𝐶𝑘𝐴 (𝐻).
                𝑑
         𝐽 ∈𝜏(𝑡) 𝑖
                   − 𝑡𝑐 𝑢𝑟
         𝑖

where 𝐶𝑖 is the remaining execution time of task 𝐽𝑖 , and 𝐵(𝑡) is the                                  – If 𝑠𝑙𝑎𝑐 𝑘𝐷−
                                                                                                                  𝑘
                                                                                                                        is less than zero, the algorithm determines if
blocking term. According to Eq. (2), 𝑈 (𝑡) ≤ 1.0 for all 𝑡. Since 𝑡𝑚𝑖𝑠𝑠 is                               middle- or high-workload detection can be performed based
                                                                                                                    𝑡
the earliest time a deadline miss occurs, we must have 𝑈 (𝑡𝑚𝑖𝑠𝑠 ) > 1.0.                                 on 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 , followed by low-workload association (Lines
However, by Eq. (2), we know that 𝑈 (𝑡) ≤ 1.0 for all 𝑡 ≥ 𝑡𝑐 𝑢𝑟 , including                              8–10). In this case, 𝑠𝑘 (𝑥) is set as follows:
𝑡𝑚𝑖𝑠𝑠 . This leads to a contradiction, as the assumption that 𝑈 (𝑡𝑚𝑖𝑠𝑠 ) >                                     ∗ 𝐿 for 𝑥 < 𝐶𝑘𝐷 (𝑀),
1.0 contradicts the fact that 𝑈 (𝑡) ≤ 1.0 holds at all times. Therefore,
                                                                                                               ∗ 𝑀 for 𝐶𝑘𝐷 (𝑀) ≤ 𝑥 < 𝐶𝑘𝐷 (𝐻),
no deadline miss can occur, and the task set 𝜏 is schedulable under
EDF-Slack. □                                                                                                   ∗ 𝐻 for 𝑥 ≥ 𝐶𝑘𝐷 (𝐻).

    Note that the proof is self-contained, but a different proof can be                          • Lines 11–18 follow a similar procedure for determining the ex-
found in [5].                                                                                      ecution options, giving preference to the association step. Here,
    Determination of execution options. EDF-BE and EDF-Slack use                                   𝑠𝑙𝑎𝑐 𝑘𝐷− represents the remaining slack after executing the high-
                                                                                                         𝑘
different slack concepts to ensure timely execution of tasks while im-                             workload association.
proving tracking accuracy by executing beyond the minimum
                                                                                                According to the definition of aging, 𝑎𝑔 𝑒𝐷 𝑘
                                                                                                                                               (and 𝑎𝑔 𝑒𝐴
                                                                                                                                                        𝑘
                                                                                                                                                          ) increase
(i.e., 𝐶𝑖 (𝐿, 𝐿)). As shown in Figs. 3(b) and (c), both EDF-BE and
                                                                                            by one when middle- or high-workload detection (and association) is
EDF-Slack enhance the aging of detection and association through
                                                                                            performed.
predefined mechanisms. The goal of these mechanisms is to balance the
                                                                                                DNN-SAM proposed in [10] introduces two scheduling algorithms:
aging of detection and association, minimizing continuous omissions in
                                                                                            EDF-MandFirst and EDF-Slack. Unlike CA-MOT, both DNN-SAM al-
updating motion and feature information, thereby maximizing tracking
                                                                                            gorithms target multi-object detection (MOD) tasks. The primary dis-
accuracy.                                                                                   tinction between MOT and MOD lies in the presence or absence of
    Algorithm 2 outlines the process for determining the execution                          dependencies between consecutive frames. In MOD, the detection oper-
options for the detection and association steps of task 𝜏𝑘 at time 𝑡𝑐 𝑢𝑟                    ation for a given frame does not utilize any information from previous
                          𝑡
based on the slack 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 calculated in Algorithm 1.                                   frames. Therefore, techniques that rely on previous frame information,



such as aging-aware methods, cannot be employed in the DNN-SAM algorithms. Another key difference is that
DNN-SAM is responsible solely for detection execution and does not handle the association task. Both
DNN-SAM and CA-MOT algorithms are based on EDF and prioritize executing jobs with the earliest deadlines
among the released tasks. However, in contrast to CA-MOT, DNN-SAM splits each job at release into a
mandatory job, responsible for execution in the safety-critical area, and an optional job, responsible for
execution in non-critical areas. When any mandatory job is present in the waiting queue, it is always
executed first using the EDF algorithm. The distinction between EDF-MandFirst and EDF-Slack arises from
whether the execution of an optional job may interfere with the execution of a mandatory job.
Specifically, the scheduling behaviors of the DNN-SAM and CA-MOT algorithms are as follows:

    • EDF-MandFirst in [5]: Any mandatory job in the waiting queue has a higher priority than optional
      jobs and is scheduled using EDF. If no mandatory jobs are in the queue, optional jobs are executed
      using EDF, ensuring that they do not interfere with the execution of future-release jobs of
      mandatory tasks.
    • EDF-Slack in [5]: Any mandatory job in the waiting queue has a higher priority than optional jobs
      and is scheduled using EDF. If no mandatory jobs are in the queue, optional jobs are executed using
      EDF, potentially interfering with the execution of future-release mandatory jobs, within the slack
      calculated from the job's runtime.
    • EDF-BE of CA-MOT: A job is not split and has three execution options for both detection and
      association. It is executed with the maximum workload option that does not interfere with the
      execution of future-release jobs, but only when exactly one job is present in the waiting queue.
      The aging of detection and association tasks is considered for accuracy maximization.
    • EDF-Slack of CA-MOT: A job is not split and has three execution options for both detection and
      association. Regardless of the number of jobs in the waiting queue, the job is executed with the
      maximum workload option, potentially interfering with the execution of future-release jobs based on
      the slack calculated from its runtime. The aging of detection and association tasks is considered
      for accuracy maximization.

Fig. 4. Comparison for two tasks with the periods (equal to the relative deadlines) of 180 ms and 270 ms.

      the ground truth of objects in all frames with the tracking results obtained from the given
      techniques to measure accuracy. The KITTI dataset consists solely of data captured from
      forward-facing cameras and does not utilize different cameras, meaning there is no overlap in the
      areas they cover. Additionally, as it does not assume simultaneous capture by each camera, there
      are no synchronization issues. CA-MOT aims to maximize the average accuracy for the MOT tasks
      corresponding to all given cameras without missing any deadlines. This assumes that CA-MOT operates
      independently of camera interdependencies, with all cameras receiving the same forward-facing
      camera feed.
    • Execution time measurement: To obtain the WCET of different execution options for detection and
      association, we measured the execution time by iterating 1000 times for each sub-task with three
      different execution options of an MOT task and then took the largest value. We also measured the
      worst-case time required for slack calculation and scheduling decisions such as Algorithms 1 and 2.
      Table 2 shows the measurement results.

Table 2
Execution time measurement (average and maximum) in terms of image size, feature size, and scheduling
overhead.

 Time (ms)    $C_i^D$                      $C_i^A$                      $C_i^{sche}$
              L        M        H          L        M        H
 Average      28.0     30.6     36.7       8.3      63.4     74.4       0.3
 Maximum      43.6     53.5     67.6       11.3     74.0     125.2      0.6
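
The execution time measurement described above can be sketched as a small timing harness. This is a generic illustration, not the authors' tooling: `measure_ms` and `dummy_detection` are hypothetical names, and the largest observed time is only an empirical estimate of the WCET. For GPU work, the timed callable would also need to wait for the device to finish (in PyTorch, by calling `torch.cuda.synchronize()`), which is omitted here to keep the sketch free of dependencies.

```python
import time


def measure_ms(step, iterations=1000):
    """Time `step` repeatedly; return (average_ms, maximum_ms)."""
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        step()
        samples.append((time.perf_counter() - start) * 1000.0)
    return sum(samples) / len(samples), max(samples)


def dummy_detection():
    # Stand-in workload; a real run would invoke one sub-task
    # (detection or association) with a fixed execution option.
    sum(i * i for i in range(10_000))


average_ms, maximum_ms = measure_ms(dummy_detection, iterations=100)
print(f"average {average_ms:.3f} ms, maximum {maximum_ms:.3f} ms")
```

Reporting both values mirrors the two rows of Table 2; only the maximum would feed a schedulability test.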
5. Evaluation

   This section evaluates the effectiveness of CA-MOT in achieving R1 and R2 for multiple MOT tasks.

5.1. Experiment setting

    • Software: CA-MOT employs the tracking-by-detection paradigm, in which the detector is YOLOv5 [14],
      one of the most popular detectors, and the tracker is StrongSORT [7]. We confirmed that other
      detectors (i.e., YOLOX and Faster-RCNN) exhibit a similar trend to YOLOv5 in terms of MOTA and
      execution time, as shown in Fig. 7. For feature extraction conducted as a part of association, we
      used OS-Net [10]. The YOLOv5 model was pretrained on the COCO Dataset [18], while OS-Net was
      pretrained on the MSMT Dataset [19]. The experimental environment consists of Ubuntu 18.04.6 LTS,
      CUDA 11.4, and PyTorch 1.12.
    • Hardware: We consider the NVIDIA Jetson Xavier as a GPU-enabled embedded board [20]. The NVIDIA
      Jetson Xavier features a 64-bit 8-core CPU, 32 GB Memory, and 512-core Volta GPU. We utilized the
      MAXN mode provided by the NVIDIA Jetson Xavier.

5.2. Experiment result

    We consider task sets in which schedulability is not guaranteed with the high-workload execution for
detection and association (i.e., $C_i(H, H)$) for all tasks but is guaranteed with the minimum execution
($C_i(L, L)$) according to Eq. (2). Note that the schedulability with $C_i(x, y)$ for
$x, y \in \{L, M, H\}$ can be judged with Eq. (2) by substituting $C_i(x, y)$ for $C_i(L, L)$. To evaluate
the effectiveness of CA-MOT, we consider the following approaches, including a baseline and our two
proposed approaches.

    • Detection first (DF): non-preemptive EDF in which the execution option of all tasks
      $\tau_i \in \tau$ is equally fixed to the rightmost one among
      $\{C_i(L, L), C_i(M, L), C_i(H, L), C_i(H, M), C_i(H, H)\}$ that satisfies the schedulability
      condition in Eq. (2).
    • EDF-BE: the EDF-BE algorithm proposed in Section 4.2, applied to task sets that pass the
      schedulability condition in Eq. (2).
    • EDF-Slack: the EDF-Slack algorithm proposed in Section 4.3, applied to task sets that pass the
      schedulability condition in Eq. (2).
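
The option selection of the DF baseline can be illustrated with a short sketch. Eq. (2) is not reproduced here, so the schedulability test is passed in as a callable; the `utilization_test` below is only a stand-in (total utilization at most 1, ignoring the blocking term) and is not the paper's condition. The helper names, and the pairing of the maximum times from Table 2 with periods of 180 ms and 270 ms, are illustrative assumptions.

```python
# Maximum measured times from Table 2 (ms), per execution option.
C_DET = {"L": 43.6, "M": 53.5, "H": 67.6}
C_ASSOC = {"L": 11.3, "M": 74.0, "H": 125.2}

# Candidate options (detection, association), ordered left to right.
DF_OPTIONS = [("L", "L"), ("M", "L"), ("H", "L"), ("H", "M"), ("H", "H")]


def wcet(option):
    """C_i(x, y): detection time plus association time for one job."""
    det, assoc = option
    return C_DET[det] + C_ASSOC[assoc]


def utilization_test(option, periods):
    """STAND-IN for Eq. (2): total utilization must not exceed 1."""
    return sum(wcet(option) / period for period in periods) <= 1.0


def select_df_option(periods, is_schedulable=utilization_test):
    """Return the rightmost option that passes the test, else None."""
    for option in reversed(DF_OPTIONS):
        if is_schedulable(option, periods):
            return option
    return None


choice = select_df_option([180.0, 270.0])
print(choice, round(wcet(choice), 1))
```

Under this stand-in test, the two-task example rejects (H, H) and (H, M) and settles on (H, L); the actual outcome under Eq. (2) may differ because that condition also accounts for blocking.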
    • Dataset and performance metric: We used the KITTI Dataset [13], which contains data collected from
      autonomous vehicle driving. To evaluate the accuracy of each region, we measured the MOTA [15] as
      the most well-known performance metric for tracking accuracy for critical and entire regions.
      MOTA compares

    Fig. 4 represents the tracking accuracy and the proportion of three execution options (i.e., $L$,
$M$, and $H$) selected during detection and association for two tasks with different periods: 180 and
270 ms (milliseconds). As shown in Fig. 4(a), for overall accuracy, EDF-BE and EDF-Slack achieve 20.2%
and 26.6%, respectively, while DF achieves



13.4%, which demonstrates the effectiveness of slack utilization and balancing aging of detection and
association in increasing tracking accuracy. We observe that the slack reclamation performed by
Algorithm 1 in EDF-Slack is significantly more effective in achieving high tracking accuracy than that
in EDF-BE, which has limitations in obtaining a substantial amount of slack. For critical accuracy,
EDF-BE and EDF-Slack achieve much higher accuracies, which are 28.3% and 32.2%, respectively, compared to
15.4% of DF. Based on this observation, we can interpret that even though EDF-BE obtains a smaller amount
of slack compared to EDF-Slack, it efficiently performs tracking for the critical region with limited
computing resources. On the other hand, EDF-Slack provides high tracking accuracy not only for the entire
region but also for the safety-critical region, thanks to its efficient slack reclamation. As seen in
Fig. 4, EDF-Slack exhibits a significantly higher proportion of high-workload execution and
middle-workload execution for detection and association compared to other execution options. On the other
hand, EDF-BE shows a slight proportion of middle-workload execution, while the majority of cases involve
low-workload execution.

Fig. 5. Comparison for four tasks with the same period (equal to the relative deadline) of 400 ms.

    Fig. 5 depicts the results of another experiment involving three different sets of tasks, with the
number of tasks ranging from two to four, all having the same periods (i.e., 400 ms with a guaranteed
minimum execution $C_i(L, L)$, but no guaranteed maximum execution $C_i(H, H)$ for $\tau_i \in \tau$). In
Fig. 5(a), the tracking accuracy of the evaluated approaches is shown as the number of tasks increases.
For the case of two tasks, EDF-Slack achieves an overall accuracy of 41.8% and a critical accuracy of
41.4%, while EDF-BE achieves an overall accuracy of 24.3% and a critical accuracy of 27.2%. In contrast,
DF achieves lower accuracy, with an overall accuracy of 18.0% and a critical accuracy of 18.7%. As the
number of tasks increases, both EDF-BE and EDF-Slack experience a decrease in accuracy, but they generally
still outperform DF in terms of tracking accuracy. With four tasks, however, EDF-BE yields lower overall
accuracy than DF, as it can only detect part of the image when selecting the high workload option.
Nevertheless, by prioritizing computations in critical regions at low and medium workloads, EDF-BE attains
higher critical accuracy than DF. Fig. 5(b) presents the distribution of execution options for EDF-BE and
EDF-Slack when there are three tasks. Similar to Fig. 4, it is evident that both EDF-BE and EDF-Slack
allocate the workload between the detection and association steps in a balanced manner using the ages.
Additionally, EDF-Slack can reclaim more slack compared to EDF-BE.

Fig. 6. Visualization on KITTI dataset for three tasks with the periods of 400 ms. (For interpretation of
the references to color in this figure legend, the reader is referred to the web version of this article.)

    Fig. 6 presents the tracking outcomes of a single task within a set of three tasks, each with a
period of 400 ms, comparing (a) the DF algorithm and (b) EDF-Slack. In the visualization, each tracked
object is represented by a unique color and ID within a bounding box, with the symbol "#" indicating the
frame number. In the DF scenario, each task executes $C_i(H, L)$, leading to insufficient computational
resources for proper association. This inadequacy results in DF's failure to track two objects within the
safety-critical region in the 167th frame and causes an ID switch from 4 to 10 in the subsequent 168th
frame, as illustrated in Fig. 6. Conversely, EDF-Slack leverages aging and slack techniques to allocate
sufficient computational resources for both detection and association tasks, enabling accurate tracking
of all objects in the safety-critical region.

Fig. 7. MOTA and execution time on other detectors.

    Additional experiments are conducted to ascertain if CA-MOT exhibits comparable behavioral patterns
across a range of detectors, including YOLOv5, which was evaluated previously. Fig. 7 displays the MOTA
and execution time for various contemporary detectors, analyzed according to their workload. Modern
detectors are generally classified into one-stage and two-stage categories based on their architecture
and further into anchor-free and anchor-based types, contingent on their use of predefined anchors for
object detection. Our study incorporated YOLOv5, a standard one-stage anchor-based detector. We also
investigate the performance of the two-stage anchor-based detector Faster-RCNN [9] and the one-stage
anchor-free detector YOLOX [21], to verify the consistency of results. Faster-RCNN utilized ResNet-50 as
its backbone network, while YOLOX was configured with a small version model. Both models were trained
using the COCO dataset. Despite minor discrepancies in specific ratios, the results consistently
demonstrate that both MOTA and execution time escalate in conjunction with increasing workload, as shown
in Figs. 7(a) and (b). The runtime trend of YOLOX is particularly noteworthy, as it closely mirrors that
of YOLOv5. This pattern indicates that similar outcomes may be expected from other detectors akin to
YOLOv5.

6. Related work

    The tracking-by-detection model is a commonly used method in the MOT field. It has shown significant
progress and enhanced performance recently, largely thanks to the evolution of deep neural networks
(DNNs). A well-recognized model in this field, SORT (simple online and real-time tracking) [22], does its
matching based mainly on where objects are located, using detection tools to achieve this. To push this
model further, DeepSORT [6] builds on the SORT model by adding a DNN-based re-identification model. This
allows for the extraction of object features. By adding this layer, DeepSORT utilizes both the object's
location and its visual information, leading to a stronger performance. Recent work in this area, such as
Deep OC-SORT [23] and StrongSORT [7], is geared towards enhancing the accuracy of these models even more,
focusing especially on refining and improving the matching algorithms used in these systems. However, it
is critical to understand that these approaches are mainly designed for situations where there are plenty
of computing resources. Therefore, they might struggle to meet the timing needs in systems that are
restricted in resources, like



the embedded systems in self-driving vehicles where resources may be scarce.
    Considering self-driving vehicles, which are fundamentally systems where safety is critical, even the
smallest delays or slight drops in accuracy can lead to significant and potentially dangerous risks. Some
research such as DNN-SAM [5] has tried to tackle these problems by suggesting frameworks that concentrate
specifically on safety-critical areas. These frameworks give priority to critical accuracy and use
uncertainty handling to ensure the highest safety standards. However, these research studies and their
related approaches are mostly designed for multi-object detection systems and may not directly apply to
or be effective in multi-object tracking. Likewise, another study, RT-MOT [2], aims to maximize the
overall accuracy of multi-object tracking and ensure on-time execution, but it overlooks the importance
of individual objects in its approach. To address these limitations, our suggested framework, known as
CA-MOT, aims to confront these challenges directly. By leveraging the unique traits of multi-object
tracking in safety-critical systems, CA-MOT ensures on-time execution and boosts tracking accuracy for
objects that could potentially be dangerous to the system. It builds on previous work while addressing
its weaknesses to create a safer and more efficient tracking system.

7. Discussion

    A limitation of CA-MOT is its exclusive reliance on a single CPU and GPU, which restricts
scalability. A recent approach, Batch-MOT [3], addresses this limitation by processing input images from
multiple cameras through a shared queue, distributing CPU operations across multiple CPUs, and employing
batch processing on a single GPU. However, this approach may introduce additional communication overhead
among CPUs, potentially degrading its overall efficiency. The primary contribution of Batch-MOT lies in
its online schedulability analysis, which dynamically determines the maximum number of images that can be
batch-processed without violating their deadlines. Nonetheless, unlike CA-MOT, Batch-MOT lacks support
for multiple execution strategies during the association phase, resulting in suboptimal resource
utilization for individual MOT tasks. Enhancing CA-MOT by incorporating batch processing capabilities to
address these shortcomings presents a promising avenue for future research. Furthermore, as highlighted
in previous studies, deploying CA-MOT on real-world platforms, such as the F1/10 autonomous driving
platform [5], offers significant potential for further investigation and practical validation.

8. Conclusion

    In this paper, we proposed CA-MOT, a new criticality-aware MOT execution and scheduling framework.
Aiming at achieving critical-accuracy maximization and timing guarantee, CA-MOT first proposes a new
system design to offer a control knob between tracking accuracy and timing guarantee to efficiently
utilize limited computing resources. Then, CA-MOT develops two scheduling algorithms to effectively
utilize the system design while using the notions of slack and aging of detection and association. Using
various task sets and real-world autonomous driving data, we demonstrated that CA-MOT can obtain high
tracking accuracy of entire and safety-critical regions while ensuring the timely execution of all MOT
tasks.

Declaration of competing interest

    The authors declare the following financial interests/personal relationships which may be considered
as potential competing interests: Hyeongboo Baek reports financial support was provided by National
Research Foundation of Korea. If there are other authors, they declare that they have no known competing
financial interests or personal relationships that could have appeared to influence the work reported in
this paper.

Acknowledgments

    This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea
government (MSIT) (RS-2023-00250742, 2022R1A4A3018824, RS-2024-00438248). This work was partly supported
by the Institute of Information & Communications Technology Planning & Evaluation (IITP)-ITRC
(Information Technology Research Center) grant funded by the Korea government (MSIT)
(IITP-2025-RS-2023-00259061).

Data availability

    Data will be made available on request.

References

 [1] M. Yang, S. Wang, J. Bakita, T. Vu, F.D. Smith, J.H. Anderson, J.-M. Frahm, Re-thinking CNN
     frameworks for time-sensitive autonomous-driving applications: Addressing an industrial challenge,
     in: Proceedings of IEEE Real-Time Technology and Applications Symposium, IEEE, 2019, pp. 305–317.
 [2] D. Kang, S. Lee, H.S. Chwa, S.-H. Bae, C.M. Kang, J. Lee, H. Baek, RT-MOT: Confidence-aware
     real-time scheduling framework for multi-object tracking tasks, in: Proceedings of IEEE Real-Time
     Systems Symposium, IEEE, 2022, pp. 318–330.
 [3] D. Kang, S. Lee, C.-H. Hong, J. Lee, H. Baek, Batch-MOT: Batch-enabled real-time scheduling for
     multi-object tracking tasks, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. (2024).
 [4] S. Liu, X. Fu, M. Wigness, P. David, S. Yao, L. Sha, T. Abdelzaher, Self-cueing real-time attention
     scheduling in criticality-aware visual machine perception, in: Proceedings of IEEE Real-Time
     Technology and Applications Symposium, IEEE, 2022, pp. 173–186.
 [5] W. Kang, S. Chung, J.Y. Kim, Y. Lee, K. Lee, J. Lee, K.G. Shin, H.S. Chwa, DNN-SAM: Split-and-merge
     DNN execution for real-time object detection, in: Proceedings of IEEE Real-Time Technology and
     Applications Symposium, 2022, URL https://rtcl.dgist.ac.kr/index.php/publication-2/.
 [6] N. Wojke, A. Bewley, D. Paulus, Simple online and realtime tracking with a deep association metric,
     in: Proceedings of the IEEE International Conference on Image Processing, IEEE, 2017,
     pp. 3645–3649.
 [7] Y. Du, Y. Song, B. Yang, Y. Zhao, StrongSORT: Make deepsort great again, 2022, arXiv preprint
     arXiv:2202.13514.
 [8] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-time object
     detection, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition,
     2016, pp. 779–788.
 [9] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference on Computer Vision,
     2015, pp. 1440–1448.
[10] K. Zhou, Y. Yang, A. Cavallaro, T. Xiang, Omni-scale feature learning for person re-identification,
     in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 3702–3712.
[11] Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, X. Wang, ByteTrack: Multi-object
     tracking by associating every detection box, in: Proceedings of the European Conference on Computer
     Vision, Springer, 2022, pp. 1–21.
[12] Y. Zhang, C. Wang, X. Wang, W. Zeng, W. Liu, FairMOT: On the fairness of detection and
     re-identification in multiple object tracking, Int. J. Comput. Vis. 129 (11) (2021) 3069–3087.
[13] A. Geiger, P. Lenz, R. Urtasun, Are we ready for autonomous driving? The
                                                                                      KITTI vision benchmark suite, in: Proceedings of the IEEE/CVF Conference on
CRediT authorship contribution statement                                              Computer Vision and Pattern Recognition, 2012.
                                                                                 [14] ultralytics, YOLOv5 [Online], 2022, Available: https://github.com/ultralytics/
                                                                                      yolov5.
    Donghwa Kang: Writing – original draft, Software, Methodology,
                                                                                 [15] K. Bernardin, R. Stiefelhagen, Evaluating multiple object tracking performance:
Formal analysis. Jinkyu Lee: Writing – review & editing, Valida-                      the clear mot metrics, EURASIP J. Image Video Process. 2008 (2008) 1–10.
tion, Formal analysis. Hyeongboo Baek: Writing – review & editing,               [16] G. Welch, G. Bishop, et al., An introduction to the Kalman filter, ACM SIGGRAPH
Supervision, Funding acquisition, Formal analysis, Conceptualization.                 (1995).




[17] T.P. Baker, A stack-based resource allocation policy for realtime processes, in: Proceedings of IEEE Real-Time Systems Symposium, IEEE, 1990, pp. 191–200.
[18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, C.L. Zitnick, Microsoft COCO: Common objects in context, in: Proceedings of the European Conference on Computer Vision, Springer, 2014, pp. 740–755.
[19] L. Wei, S. Zhang, W. Gao, Q. Tian, Person transfer gan to bridge domain gap for person re-identification, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018, pp. 79–88.
[20] NVIDIA, NVIDIA Xavier Developer Kit [Online], 2022, Available: https://www.nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier.
[21] Z. Ge, S. Liu, F. Wang, Z. Li, J. Sun, Yolox: Exceeding yolo series in 2021, 2021, arXiv preprint arXiv:2107.08430.
[22] A. Bewley, Z. Ge, L. Ott, F. Ramos, B. Upcroft, Simple online and realtime tracking, in: Proceedings of the IEEE International Conference on Image Processing, IEEE, 2016, pp. 3464–3468.
[23] G. Maggiolino, A. Ahmad, J. Cao, K. Kitani, Deep oc-sort: Multi-pedestrian tracking by adaptive re-identification, 2023, arXiv preprint arXiv:2302.11813.

Donghwa Kang is a Ph.D. student in the School of Computing, Korea Advanced Institute of Science and Technology (KAIST), South Korea. He received the BS and MS degrees in computer science from Incheon National University (INU) in 2022 and 2024, respectively. His research interests include artificial intelligence, autonomous systems, and real-time embedded systems.

Jinkyu Lee is an associate professor in the Department of Computer Science and Engineering at Sungkyunkwan University (SKKU), South Korea, which he joined in 2014. He received the BS, MS, and Ph.D. degrees in computer science from the Korea Advanced Institute of Science and Technology (KAIST), South Korea, in 2004, 2006, and 2011, respectively. He was a research fellow/visiting scholar in the Department of Electrical Engineering and Computer Science, University of Michigan until 2014. His research interests include system design and analysis with timing guarantees, QoS support, and resource management in real-time embedded systems and cyber–physical systems. He won the Best Student Paper Award from the 17th IEEE Real-Time and Embedded Technology and Applications Symposium (RTAS) in 2011 and the Best Paper Award from the 33rd IEEE Real-Time Systems Symposium (RTSS) in 2012.

Hyeongboo Baek is an associate professor in the Department of Artificial Intelligence, University of Seoul (UOS), South Korea. He received the BS degree in Computer Science and Engineering from Konkuk University, South Korea, in 2010 and the MS and Ph.D. degrees in Computer Science from KAIST, South Korea, in 2012 and 2016, respectively. His research interests include cyber–physical systems, real-time embedded systems, and system security. He won the Best Paper Award from the 33rd IEEE Real-Time Systems Symposium (RTSS) in 2012.

