901 lines
106 KiB
Plaintext
901 lines
106 KiB
Plaintext
Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
|
||
Contents lists available at ScienceDirect
|
||
|
||
|
||
Journal of Systems Architecture
|
||
journal homepage: www.elsevier.com/locate/sysarc
|
||
|
||
|
||
|
||
|
||
Real-time scheduling for multi-object tracking tasks in regions with different
|
||
criticalities
|
||
Donghwa Kang a , Jinkyu Lee b ,∗, Hyeongboo Baek c ,∗
|
||
a Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
|
||
b
|
||
Sungkyunkwan University (SKKU), Suwon, South Korea
|
||
c
|
||
University of Seoul (UOS), Seoul, South Korea
|
||
|
||
|
||
|
||
ARTICLE INFO ABSTRACT
|
||
|
||
Keywords: Autonomous vehicles (AVs) utilize sensors such as LiDAR and cameras to iteratively perform sensing, decision-
|
||
Multi-object tracking making, and actions. Multi-object tracking (MOT) systems are employed in the sensing stage of AVs, using these
|
||
Real-time scheduling sensors to detect and track objects like pedestrians and vehicles, thereby enhancing situational awareness.
|
||
Timing guarantee
|
||
These systems must handle regions of varying criticality and dynamically shifting locations, all within limited
|
||
Criticality-awareness
|
||
computing resources. Previous DNN-based MOT approaches primarily focused on tracking accuracy, but timing
|
||
Autonomous driving
|
||
guarantees are becoming increasingly vital for autonomous driving. Although recent studies have introduced
|
||
MOT scheduling frameworks with timing guarantees, they are either restricted to single-camera systems or
|
||
fail to prioritize safety-critical regions in the input images. We propose CA-MOT, a Criticality-Aware MOT
|
||
execution and scheduling framework for multiple cameras. CA-MOT provides a control knob that balances
|
||
tracking accuracy in safety-critical regions and timing guarantees. By effectively utilizing this control knob,
|
||
CA-MOT achieves both high accuracy and timing guarantees. We evaluated CA-MOT’s performance using a
|
||
GPU-enabled embedded board commonly employed in AVs, with data from real-world autonomous driving
|
||
scenarios.
|
||
|
||
|
||
|
||
1. Introduction pooling and convolutional layers) using CNN (convolutional neural
|
||
network)-based models (e.g., OS-Net [10]). For unmatched objects,
|
||
Autonomous vehicles (AVs) are systems that iteratively perform location-based methods like intersection over union (IoU) are applied.
|
||
sensing, decision-making, and actions using various sensors such as MOT input images exhibit two key characteristics: (i) regions with
|
||
LiDAR, radar, inertial measurement units (IMU), and cameras [1]. varying levels of criticality and (ii) dynamically shifting locations. With
|
||
Multi-object tracking (MOT) systems, used in the perception stage limited computing resources in AVs, it is crucial to deliver different
|
||
of AVs, track objects like pedestrians and cars, enhancing situational levels of service quality based on criticality. Safety-critical regions,
|
||
awareness. Since MOT information is periodically transferred to control where objects with a short time-to-collision (e.g., under 2 s) cluster,
|
||
tasks, timely execution must be guaranteed to ensure safety and prevent must be prioritized. If multiple clusters exist, the broader area en-
|
||
severe accidents [2–4]. Low accuracy, despite timely execution, may compassing them is considered the safety-critical region, as defined in
|
||
result in missed objects, thus compromising AVs’ safety [2,4,5]. There-
|
||
DNN-SAM [5]. Established methods compute time-to-collision using Li-
|
||
fore, AV MOT systems should ensure timing guarantees with maximized
|
||
DAR and IMU data; we follow the approach from DNN-SAM. This leads
|
||
accuracy.
|
||
to two requirements for criticality-aware MOT systems: (R1) accuracy
|
||
Tracking-by-detection [6,7] is widely used due to its high accuracy
|
||
maximization for safety-critical regions and (R2) timing guarantees.
|
||
and ability to leverage state-of-the-art DNN-based detection models
|
||
Most existing DNN-based MOT approaches focus on accuracy [7,
|
||
(e.g., YOLO series [8], Faster R-CNN [9]). For each input image from
|
||
each camera, tracking-by-detection performs two tasks: detection and 11,12], but timing guarantees are increasingly critical in autonomous
|
||
association. Detection uses DNN-based models to sense the motion driving. Recent research has proposed MOT resource scheduling frame-
|
||
information of objects, such as location and velocity, while association works that guarantee timing for every MOT execution [2,4]. How-
|
||
matches objects between frames based on extracted feature informa- ever, [2] overlooks safety-criticality, while [4] focuses on a single task.
|
||
tion (also called feature vectors or feature maps obtained through We address safety-criticality across multiple tasks, raising the following
|
||
|
||
|
||
∗ Corresponding authors.
|
||
E-mail addresses: anima0729@kaist.ac.kr (D. Kang), jinkyu.lee@skku.edu (J. Lee), hbbaek359@gmail.com (H. Baek).
|
||
|
||
https://doi.org/10.1016/j.sysarc.2025.103349
|
||
Received 22 September 2024; Received in revised form 13 December 2024; Accepted 20 January 2025
|
||
Available online 28 January 2025
|
||
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
|
||
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
|
||
challenges:
|
||
|
||
C1. How to balance R1 and R2 to efficiently use limited computing
|
||
resources.
|
||
C2. How to achieve both R1 and R2 by effectively using the control
|
||
knob developed from C1.
|
||
In this paper, we propose CA-MOT, a Criticality-Aware MOT exe-
|
||
cution and scheduling framework for multiple MOT tasks. To address
|
||
C1, CA-MOT offers three execution options (low, middle, and high
|
||
workloads) to balance R1 and R2 for both detection and association.
|
||
To address C2, CA-MOT introduces the notion of aging for detection
|
||
and association sub-tasks, estimating the reliability of motion and
|
||
feature information over time. Balancing the aging of these tasks is
|
||
essential to achieve R1 and R2 with limited resources (to be discussed
|
||
in Section 3.4). Based on this, CA-MOT develops two scheduling algo-
|
||
rithms: EDF-BE and EDF-Slack. EDF-BE increases the workload of tasks
|
||
waiting in the ready queue for execution (referred to as active tasks)
|
||
without compromising the R2 bound when no other tasks are pending.
|
||
In contrast, EDF-Slack is designed to handle scenarios with multiple
|
||
active tasks. Fig. 1. Tracking accuracy and execution time on different execution options of
|
||
detection and association.
|
||
To validate CA-MOT’s performance in meeting R1 and R2, we
|
||
conducted extensive experiments on an NVIDIA Jetson Xavier using
|
||
the KITTI Dataset [13]. Additionally, we applied three detectors in our
|
||
experiments: YOLOv5 [14], YOLOX [8], and Faster-RCNN [9]. a system that tracks specific objects moving between the fields of view
|
||
The contributions of this paper are as follows: of multiple cameras (called hand-over), this is beyond the scope of
|
||
our work. This paper focuses on dividing the multi-object tracking task
|
||
• We motivate the importance of balancing between aging of de- into two subtasks (i.e., detection and association) and using DNN-based
|
||
tection and association to achieve R1 and R2 (Section 2). MOT-specific properties (i.e., reuse of motion and feature information)
|
||
• We propose a new system design, CA-MOT that addresses R1 and to achieve R1 and R2 under limited resources.
|
||
R2 considering varying levels of criticality in different regions for
|
||
multiple MOT tasks (Section 3). 2.2. Trade-off between accuracy and execution time
|
||
• We develop new scheduling algorithms to effectively achieve R1
|
||
and R2 by balancing between aging of detection and association
|
||
To address C1, we consider two factors: (i) the input image size
|
||
for each MOT task (Section 4).
|
||
and detection within the safety-critical region, and (ii) the number of
|
||
• We demonstrate the effectiveness of CA-MOT in achieving R1 and
|
||
objects used for feature extraction during association across all detected
|
||
R2 using a real-world self-driving dataset (Section 5).
|
||
objects in each frame.
|
||
Fig. 1(a) compares the multi-object tracking accuracy (MOTA) [15]
|
||
2. Motivation for the overall and safety-critical regions (referred to as overall and
|
||
critical accuracy) and the execution time for a single MOT task using
|
||
This section presents target systems and motivates the system de- three input image sizes (256 × 256, 416 × 416, 672 × 672). Overall
|
||
sign of CA-MOT to address C1 and C2 based on measurement-based accuracy considers all objects, while critical accuracy focuses on the
|
||
observations. safety-critical region. YOLOv5 [14] is used for detection, and features
|
||
are extracted for all detected objects. The KITTI dataset [13] is used.
|
||
2.1. Target system
|
||
For image sizes 256 × 256 and 416 × 416, detection is performed
|
||
on a cropped region of interest (RoI) that includes the safety-critical
|
||
CA-MOT targets 2D MOT systems on AVs equipped with multiple
|
||
region. If the RoI is smaller, it is resized to include the critical region;
|
||
camera sensors. Each MOT task performs MOT execution on consecu-
|
||
otherwise, the critical region is cropped accordingly. The safety-critical
|
||
tive input frames received from the corresponding camera sensor at a
|
||
region will be defined in Section 3. For the 672 × 672 size (i.e., the
|
||
predetermined period. As this recurring task is required to complete
|
||
original input size), detection occurs without cropping.
|
||
a job within a specified deadline, each MOT task is considered a
|
||
As shown in Fig. 1(a), reducing image size leads to a notable
|
||
real-time task with a period and deadline. CA-MOT employs tracking-
|
||
decrease in overall accuracy, while critical accuracy decreases less
|
||
by-detection comprising two steps of MOT execution: detection and
|
||
association. The front-end detector performs detection by exploiting significantly due to prioritization of the safety-critical region in the RoI.
|
||
the existing stand-alone DNN-based detector to identify the position Additionally, execution time decreases as the image size is reduced,
|
||
and class of objects in the input image. Using the locations of detected demonstrating a trade-off between R1 and R2 when focusing on the
|
||
objects, the feature extractor (e.g., the deployed CNN model such as critical region.
|
||
OSNet) extracts features (i.e., feature vectors or feature maps) for each Fig. 1(b) shows the impact of varying the number of objects used
|
||
object. These features capture the visual characteristics of each object. for feature extraction on accuracy and execution time, with the image
|
||
The back-end tracker compares the feature similarities between objects size fixed at 672 × 672. The number of objects ranges from zero, three,
|
||
in the current frame and the previous frame, matching objects with and more than three. OS-Net [10] is used for feature extraction. As
|
||
high similarity. For any remaining unmatched objects, a location-based shown in Fig. 1(b), as the number of objects with feature extraction
|
||
matching method such as IoU is applied. The tracker then stores the increases, both overall and critical accuracy improve, but this also leads
|
||
motion information (position and velocity) and the features of each to increased execution time. This highlights a trade-off between R1 and
|
||
object in preparation for the next frame. R2 based on the number of objects considered for feature extraction.
|
||
We assume a system in which each camera independently tracks Section 3.3 details the MOT execution pipeline of CA-MOT, which
|
||
objects moving within its field of view. While it is possible to consider leverages these observations to effectively address C1.
|
||
|
||
2
|
||
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
|
||
|
||
|
||
Fig. 2. System design of CA-MOT: the key features are (a) an aging-aware scheduler that provides timing guarantees and a criticality-aware flexible MOT execution pipeline
|
||
including (b) a detection module that accommodates varying input sizes and (c) an association module that handles a varying number of objects for feature extraction.
|
||
|
||
|
||
2.3. Different combination of detection and association • It provides a timing guarantee while providing prioritized track-
|
||
ing accuracy for the safety-critical regions by exploiting an MOT-
|
||
To address C2, Fig. 1(c) reveals an intriguing observation that differ- specific property.
|
||
ent combinations of image sizes and the number of feature extractions
|
||
To address the first goal, CA-MOT implements a criticality-aware
|
||
yield distinct effects on accuracy and execution time. The experiment
|
||
flexible MOT execution pipeline in which detection and association
|
||
was conducted over 100 consecutive frames.
|
||
are performed with different execution option by leveraging the ob-
|
||
In Fig. 1(c), the execution of detection or association is denoted by
|
||
servations discussed in Section 2.2. To address the second goal, CA-
|
||
𝑃 or 𝐹 . 𝑃 represents partial computation, where detection is performed MOT develops an aging-aware task-level scheduler to provide accuracy
|
||
only on the region of interest (RoI) at a size of 256 × 256, including maximization while providing a timing guarantee by exploiting the
|
||
a safety-critical region, and association is limited to location-based observations discussed in Section 2.3 building upon the MOT execution
|
||
association without feature extraction. 𝐹 represents full computation, pipeline. The MOT execution pipeline and the scheduler are imple-
|
||
where detection is performed on the entire image at a size of 672 × 672, mented as separate threads, and they are communicated with shared
|
||
and association includes feature extraction for all objects. The number memory.
|
||
in the upper right of the notation indicates how many times the combi- CA-MOT does not require modifications to existing DNN models
|
||
nation of detection and association has been performed. For example, (e.g., detectors and feature extractors), which allows for reusing most
|
||
the notation 𝐹 𝐹 50 𝑃 𝑃 50 indicates that we use 𝐹 for both the detection (if not all) stand-alone detectors and feature extractors. Notably, state-
|
||
and association steps in the first 50 frames and 𝑃 for both phases in the of-the-art detectors like YOLOv5 are inherently designed to handle
|
||
remaining 50 frames. To mitigate the issue of objects outside the critical varying input image sizes, and all CNN models can perform batch exe-
|
||
region not being detected due to cropping, which can decrease the cution on multiple images (each corresponding to a different object). As
|
||
accuracy of the non-critical region, we utilize the position information shown in Fig. 2, the key features of CA-MOT are: (a) a scheduling policy
|
||
of objects from the previous frame as predicted position information for that selects one input per camera, provides timing guarantees, and
|
||
the current frame using a prediction model such as Kalman filter [16] adjusts the workload for detection and association; (b) a module that
|
||
during the execution of 𝑃 in the detection step. Except for 𝑃 𝑃 100 and processes detection with inputs of varying sizes; and (c) a module that
|
||
𝐹 𝐹 100 , all combinations have the same proportion of 𝐹 and 𝑃 for the extracts features from a pre-determined number of detected objects.
|
||
entire frames.
|
||
As shown in Fig. 1(c), although 𝐹 𝐹 50 𝑃 𝑃 50 and 𝑃 𝐹 100 have similar 3.2. Workflow
|
||
execution times with (𝑃 𝐹 + 𝐹 𝑃 )100 and (𝐹 𝐹 + 𝑃 𝑃 )100 , they show lower
|
||
tracking accuracy. This indicates that different combinations of 𝐹 and Fig. 2 presents the workflow of CA-MOT. During system operation,
|
||
𝑃 can have a varying impact on accuracy. The observation in Fig. 1(c) the task scheduler maintains a queue to store images periodically
|
||
necessitates a new scheduler that is capable of obtaining high tracking received from each camera sensor (⃝). 1 Then, the task scheduler deter-
|
||
accuracy by capturing an MOT-specific property referred to as aging, mines the following for tasks in the queue: (a) the task to be scheduled,
|
||
which will be detailed in Section 3.4. (b) the execution option for the detector, and (c) the execution option
|
||
for the association (⃝).
|
||
2 After an image moves to the MOT execution
|
||
3. System design of CA-MOT pipeline, the critical region identification module identifies a safety-
|
||
critical region from the image and crops (or not) a RoI including the
|
||
This section presents the goal and design of CA-MOT to address C1 safety-critical region according to the execution option for the detector
|
||
and C2. (⃝).
|
||
3 Depending on the execution option, the cropped RoI or entire
|
||
image is processed for detection (⃝).4 Furthermore, depending on the
|
||
3.1. System overview number of objects for which features are extracted, CA-MOT selectively
|
||
extracts features for detected objects (⃝).
|
||
5 All detected objects are then
|
||
CA-MOT utilizes a tracking-by-detection approach, consisting of matched with the tracked objects from the previous frame. If both the
|
||
two steps: detection and association, where the front-end detector em- detected and tracked objects have feature vectors, they are associated
|
||
ploys a pre-existing DNN-based detector to detect and classify objects through feature-based matching. Otherwise, they are associated solely
|
||
in the input image, and the unmatched objects are matched using a based on their locations (⃝).
|
||
6
|
||
|
||
location-based method like IoU. CA-MOT aims at providing prioritized
|
||
tracking accuracy for the safety-critical region with a timing guarantee 3.3. Criticality-aware flexible MOT execution pipeline
|
||
for every MOT execution on limited computing resources by addressing
|
||
C1 and C2 discussed in Section 1, which has the following design goals. The MOT execution pipeline conducts detection and association
|
||
sequentially. CA-MOT can employ any existing stand-alone DNN-based
|
||
• It provides different execution options not only for detection but detectors as long as it can accommodate different sizes of input images
|
||
also association considering different criticality of regions in input (e.g., YOLO series) and offer a clear trade-off between accuracy and
|
||
images. execution time. For each input image with a size of 672 × 672, the
|
||
|
||
3
|
||
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
|
||
detector performs the detection to identify the location and class of association for each task at every scheduling decision. The scheduler
|
||
multiple objects in the image. Once the scheduler determines the task manages MOT tasks using a single queue and is triggered when a task
|
||
(associated with an input image) to be scheduled and the execution completes its execution or a new task is released. As three execution
|
||
option for the tasks, the detection is performed for the task according options (e.g., low, middle, and high workloads) are provided for each
|
||
to the execution option. CA-MOT provides three execution options detection and association under CA-MOT, the scheduler decides the
|
||
(i.e., low, middle, and high workloads, respectively) providing a trade- image size (e.g., 256 × 256, 416 × 416, and 672 × 672) for detection
|
||
off between execution time and accuracy. For low and middle workload and feature size (e.g., zero, three, and more than three) for association
|
||
detections, CA-MOT first identifies the RoI with sizes of 256 × 256 and according to the scheduling algorithms (to be presented in Section 4).
|
||
416 × 416, respectively, and then detection is performed on cropped As discussed in Section 2.3, various combinations of image sizes
|
||
RoI, which includes the safety-critical region. The area outside the and the quantity of feature extractions result in different impacts on
|
||
RoI is not subject to detection, and the motion information (e.g., size, tracking accuracy. This is due to an important property of the MOT
|
||
position, velocity, direction) of objects detected in the previous frame system, which involves supplementing non-updated motion or feature
|
||
is used in the prediction models such as the Kalman filter to obtain information during detection and association in the current frame by
|
||
the estimated information of objects in the current frame. On the other utilizing information from the previous frame. For example, in scenar-
|
||
hand, high-workload detection is performed on the original image with ios with low and middle workload detection, the detection process does
|
||
a size of 672 × 672. not cover the area outside the RoI. Instead, the motion information
|
||
We define the area that encompasses all safety-critical objects, of objects detected in the previous frame, such as their size, position,
|
||
which are objects with a time-to-collision of less than two seconds, as velocity, and direction, is leveraged to estimate the corresponding
|
||
the safety-critical area. If the safety-critical area exceeds the input size information for objects in the current frame. Moreover, during the as-
|
||
for the detector, as determined by the detection process (e.g., 256 × 256 sociation step, if the feature extracted from the immediately preceding
|
||
or 416 × 416), the safety-critical area is cropped and resized to the frame is unavailable due to low- and middle-workload associations,
|
||
corresponding dimensions before being fed into the detector model. the feature-based matching algorithm compares the features extracted
|
||
The locations of safety-critical objects are determined based on their from objects in the current frame with the features extracted from the
|
||
most recently computed positions, without projecting future safety- nearest past frames. Therefore, the tracking accuracy is determined
|
||
critical regions from them. There are numerous existing approaches by the reliability of the reused motion and feature information of the
|
||
that calculate time-to-collision based on the relative positions of objects objects. To capture the reliability of the motion and feature information
|
||
and the ego vehicle given LiDAR and IMU data, and we assume the use of objects, we propose a new notion of aging that specifies the number
|
||
of one such method. It is also important to note that the KITTI dataset of middle- or high-workload executions of detection and association
|
||
provides both LiDAR and IMU data. For example, areas where objects conducted from the beginning of the MOT task, respectively. In order
|
||
with a time-to-collision of less than 2 s congregate can be defined as to update the motion and feature information as frequently as possi-
|
||
safety-critical regions, and if multiple such areas exist, the encompassing ble using limited computing resources, it is necessary to balance the
|
||
area that includes all of them would be considered the safety-critical aging of detection and association for each task. Note that increasing
|
||
region. Please note that we adhere to the definition of the safety-critical the aging of detection and association for all tasks simultaneously in
|
||
region as defined in the existing paper DNN-SAM in [5]. It is assumed every MOT execution is generally not feasible due to limited com-
|
||
that the critical region is pre-calculated by external sensors such as puting resources. Therefore, a mechanism is required to balance the
|
||
LiDAR and IMU and provided to CA-MOT. If an input image does not aging of detection and association for all tasks while providing timing
|
||
have a critical region, the entire frame is considered a critical region. guarantees under constrained resources. To this end, we propose new
|
||
As seen in Fig. 2, GPU is used only for the inference of DNN models, scheduling algorithms that will be detailed in the next section.
|
||
such as the detector (e.g., YOLOv5) and feature extractor (e.g., OSNet),
|
||
while all other execution is performed on the CPU. 4. Scheduling algorithm
|
||
For the association, the MOT system uses the two-step approach [7].
|
||
Initially, a CNN-based model (e.g., OS-Net [10]) is employed by the This section presents a task model and proposes new scheduling
|
||
tracker to extract features from the detected objects. The tracker then algorithms building upon CA-MOT.
|
||
compares these features between the current and previous frames to
|
||
identify object pairs with the highest feature similarity. For the re- 4.1. Task model
|
||
maining objects that are not matched based on feature comparison, a
|
||
location-based matching method such as IoU (intersection over union) Targeting MOT systems in AVs that involve 𝑛 camera sensors, we
|
||
is used. CA-MOT also provides three execution options (i.e., low, consider a set 𝜏 consisting of 𝑛 MOT tasks denoted as 𝜏𝑖 ∈ 𝜏. Each
|
||
middle, and high workloads, respectively) for the association. When MOT task 𝜏𝑖 is responsible for conducting MOT execution using input
|
||
it comes to the middle and high workload associations, the tracker images provided periodically by each camera sensor. As we employ
|
||
extracts features from some (e.g., three) of the detected objects or the methodology of tracking-by-detection, an MOT task consists of
|
||
all of the detected objects, and then performs consecutive feature- detection and association sub-tasks. Thus, the specification of each
|
||
based and location-based matchings. On the other hand, low-workload MOT task 𝜏𝑖 is given as 𝜏𝑖 = (𝑇𝑖 , 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ), 𝐷𝑖 ), where 𝑇𝑖 represents the
|
||
association performs location-based matching only. Depending on the period (or the minimum inter-arrival time), 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) denotes the worst-
|
||
execution option, CA-MOT may extract features from only a subset or case execution time (WCET) based on the execution options (i.e., low,
|
||
all of the detected objects, which means that the feature information middle, and high-workload execution) for detection and association
|
||
of objects may not be updated every time. Therefore, during feature- sub-tasks, respectively, and 𝐷𝑖 indicates the relative deadline. The
|
||
based matching, the algorithm compares the features extracted from execution time of the detection sub-task depends on the image size
|
||
the objects in the current frame with the closest previously extracted 𝑠𝑖 ∈ 𝑆𝑖 = {𝐿, 𝑀 , 𝐻}, where 𝐿, 𝑀 , 𝐻 are 256 × 256, 416 × 416, and
|
||
features of the tracked objects and matches the two objects with the 672 × 672, respectively. Note that CA-MOT supports arbitrary non-
|
||
highest feature similarity. decreasing sizes for 𝑆𝑖 = {𝐿, 𝑀 , 𝐻}. On the other hand, the execution
|
||
time of the association sub-task depends on the feature size 𝑓𝑖 ∈
|
||
3.4. Aging-aware task scheduler 𝐹𝑖 = {𝐿, 𝑀 , 𝐻}, where 𝐿, 𝑀 , 𝐻 are zero, from one to three, and more
|
||
than three, respectively. Note that the tracking-by-detection methodol-
|
||
The CA-MOT implements a thread-level task scheduler to determine ogy performs the association phase sequentially through feature-based
|
||
the task to be scheduled and execution options for detection and matching followed by location-based matching using IoU. If 𝑓𝑖 is equal
|
||
|
||
4
|
||
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
Table 1 On the other hand, the worst-case execution time 𝐶𝑖𝐴 (𝑓𝑖 ) of the associa-
|
||
Notations used in the scheduling algorithms.
|
||
tion task depends on the feature size 𝑓𝑖 . It is calculated by considering
|
||
Symbol Description the time required for extracting features from detected objects and
|
||
𝜏𝑖 Task 𝑖 in the system performing matching methods such as feature-based and IoU-based
|
||
𝑇𝑖 Period of task 𝜏𝑖 (minimum inter-arrival time)
|
||
matching. An MOT task 𝜏𝑖 is considered schedulable if every job 𝐽𝑖
|
||
𝐷𝑖 Relative deadline of task 𝜏𝑖
|
||
(invoked by 𝜏𝑖 ) completes its execution within the relative deadline 𝐷𝑖 .
|
||
𝐶𝑖 (𝑋 , 𝑌 ) Worst-case execution time (WCET) of task 𝑖.
|
||
The overall schedulability of the system is determined by ensuring that
|
||
𝑋: image size for detection (𝐿, 𝑀 , 𝐻)
|
||
𝑌 : feature size for association (𝐿, 𝑀 , 𝐻) every task 𝜏𝑖 ∈ 𝜏 is schedulable.
|
||
𝑠𝑖 Image size for the detection sub-task of task 𝑖
|
||
𝑓𝑖 Feature size for the association sub-task of task 𝑖 4.2. EDF best-effort
|
||
𝑆𝑖 Set of image size options for task 𝑖 (𝑆𝑖 = {𝐿, 𝑀 , 𝐻})
|
||
𝐹𝑖 Set of feature size options for task 𝑖 (𝐹𝑖 = {𝐿, 𝑀 , 𝐻}) Building upon the system design of CA-MOT presented in Section 3,
|
||
𝐿 Low workload execution
|
||
we develop two scheduling algorithms that aim to provide not only
|
||
𝑀 Middle workload execution
|
||
𝐻 High workload execution high tracking accuracy for the safety-critical regions but also a tim-
|
||
𝐶𝑖𝐷 (𝑠𝑖 ) WCET of the detection sub-task of task 𝑖, based on image
|
||
ing guarantee for every MOT execution. To this end, the proposed
|
||
size 𝑠𝑖 scheduling algorithms have the following two features: (F1) an offline
|
||
𝐶𝑖𝐴 (𝑓𝑖 ) WCET of the association sub-task of task 𝑖, based on timing guarantee for the minimum execution (i.e., low-workload exe-
|
||
feature size 𝑓𝑖 cution for both detection and association) of every MOT execution and
|
||
𝑅𝐶𝑖 (𝐿, 𝐿) Remaining execution time for the minimum execution of (F2) an online policy to maximize tracking accuracy by systematically
|
||
task 𝑖 increasing workload (i.e., middle- or high-workload execution) of an
|
||
𝑎𝑔 𝑒𝐷 Aging value of the detection sub-task of task 𝑖 MOT execution using notions of slack and aging without compromising
|
||
𝑖
|
||
𝑎𝑔 𝑒𝐴
|
||
𝑖
|
||
Aging value of the association sub-task of task 𝑖 timing guarantee.
|
||
𝑠𝑙𝑎𝑐 𝑘𝑖𝑡 Slack time available for task 𝑖 at the current time 𝑡𝑐 𝑢𝑟 The proposed scheduling algorithms are based on the non-preem-
|
||
𝑐 𝑢𝑟
|
||
|
||
𝑞𝑖 Minimum execution time of task 𝑖 in the interval ptive earliest deadline first (EDF) scheduling algorithm, which assigns
|
||
[𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )] higher priority to jobs with earlier deadlines without allowing any
|
||
𝑝 Sum of the minimum execution times for all tasks preemption. To provide the first feature F1, CA-MOT employs the
|
||
𝑑1 (𝑡𝑐 𝑢𝑟 ) Earliest deadline or future release time at time instant 𝑡𝑐 𝑢𝑟 existing schedulability analysis developed for non-preemptive EDF as
|
||
𝑠𝑙𝑎𝑐 𝑘𝐷− Remaining slack after executing high-workload detection follows.
|
||
𝑖
|
||
for task 𝑖
|
||
𝑠𝑙𝑎𝑐 𝑘𝐴− Remaining slack after executing high-workload Lemma 1. For a set 𝜏 of MOT tasks scheduled by non-preemptive EDF,
|
||
𝑖
|
||
association for task 𝑖 minimum execution 𝐶𝑖 (𝐿, 𝐿) of every task 𝜏𝑖 ∈ 𝜏 can be executed without
|
||
deadline miss as long as the following holds for every task 𝜏𝑖 ∈ 𝜏.
|
||
max𝜏𝑖 𝐶𝑖 (𝐿, 𝐿) ∑ 𝐶𝑖 (𝐿, 𝐿)
|
||
+ ≤ 1.0 (2)
|
||
to 𝐿, this indicates that no feature extraction has been performed for min𝜏𝑖 𝑇𝑖 𝜏 ∈𝜏
|
||
𝑇𝑖
|
||
𝑖
|
||
the frame, and thus, feature-based matching is skipped, proceeding
|
||
directly to location-based matching. In the case of 𝐻, we employ the Proof. The lemma presents a schedulability condition for non-preem-
|
||
maximum number of objects as defined by the environment (for the ptive EDF, and its proof is outlined as follows. Let us target 𝜏𝑘 ∈ 𝜏;
|
||
dataset considered, this is based on values measured across all videos), also, consider a virtual task 𝜏𝑥 ∉ 𝜏, whose 𝑇𝑥 and 𝐶𝑥 (𝐿, 𝐿) are set
|
||
for example, 10. Then, the worst-case execution time 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) of each to min𝜏𝑖 ∈𝜏 𝑇𝑖 and max𝜏𝑖 ∈𝜏 𝐶𝑖 (𝐿, 𝐿), respectively. Now, we compare the
|
||
MOT task 𝜏𝑖 is derived as follows. finishing time of a job of 𝜏𝑘 when (Case 1) 𝜏 is scheduled by non-
|
||
preemptive EDF, and (Case 2) 𝜏 ∪ {𝜏𝑥 } is scheduled by preemptive EDF.
|
||
𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) = 𝐶𝑖𝐷 (𝑠𝑖 ) + 𝐶𝑖𝐴 (𝑓𝑖 ), (1)
|
||
Since at most one lower-priority job can block a high-priority job under
|
||
where 𝐶𝑖𝐷 (𝑠𝑖 ) and 𝐶𝑖𝐴 (𝑓𝑖 ) are the worst-case execution times of detection non-preemptive scheduling, 𝜏𝑘 can be blocked by at most one lower-
|
||
and association sub-tasks according to 𝑠𝑖 and 𝑓𝑖 , respectively. As shown priority job under Case 1; obviously, the WCET of the lower-priority job
|
||
in Fig. 2, both the detection and the association sub-tasks involve GPU is upper-bounded by max𝜏𝑖 ∈𝜏 𝐶𝑖 (𝐿, 𝐿). Also, to block all the following
|
||
operations, with their respective WCETs including the communication jobs of 𝜏𝑘 , the blocking frequency should be no smaller than 𝑇𝑘 , which
|
||
costs between the CPU and GPU. Note that the detection sub-task, is lower-bounded by min𝜏𝑖 ∈𝜏 𝑇𝑖 . Therefore, the finishing time of a job of
|
||
denoted as 𝜏𝑖𝐷 , and the association sub-task, denoted as 𝜏𝑖𝐴 , are executed 𝜏𝑖 under Case 1 is no later than that under Case 2. Once we apply the
|
||
consecutively without any preemption while sharing the same period well-known schedulability condition for preemptive EDF to Case 2, the
|
||
and relative deadline. Similarly, when an active task is running, it condition is the same as Eq. (2), which proves the lemma. □
|
||
executes without any interruptions, while other tasks wait in the queue.
|
||
Note that the proof is self-contained, but a different proof for
|
||
In addition, each task runs on an environment where non-preemption
|
||
between the GPU and CPU is guaranteed. To ensure this, while the Lemma 1 can be found in [5,17].
|
||
CPU is running, the GPU waits for input from the CPU. Once the GPU To provide the second feature F2, the proposed scheduling al-
|
||
receives the input and is activated, the CPU waits until it receives the gorithms (i) dynamically increase the workload of each MOT task
|
||
results from the GPU, as illustrated in Fig. 2. As seen Fig. 2, GPU (e.g., from low workload to middle or high workload) without compro-
|
||
is used only for the inference of DNN models, such as the detector mising the timing guarantee while (ii) balance the aging of detection
|
||
(e.g., YOLOv5) and feature extractor (e.g., OSNet), while all other and association of every task. We propose two scheduling algorithms
|
||
execution is performed on the CPU. Also, CA-MOT does not allow that simultaneously provide (i) and (ii) in different ways: EDF-BE
|
||
parallel execution for multiple MOT executions (see Table 1). (EDF Best-Effort) and EDF-Slack (EDF with Slack reclamation), adapted
|
||
The worst-case execution time 𝐶𝑖𝐷 (𝑠𝑖 ) of the detection sub-task is from [5]. EDF-BE and EDF-Slack utilize slacks defined differently, but
|
||
determined by the sum of various components, including preprocessing use the same mechanism (in Algorithm 2) to decide on the execution
|
||
time (such as cropping and resizing the input image), image transfer option that employs a notion of aging.
|
||
time from CPU memory to GPU memory, model inference time to Let 𝑑1 (𝑡𝑐 𝑢𝑟 ) be the earliest deadline or future release time among
|
||
𝑡
|
||
obtain candidate objects, and postprocessing time (e.g., applying non- all tasks at a time instant 𝑡𝑐 𝑢𝑟 . The slack 𝑠𝑙𝑎𝑐 𝑘𝑖𝑐 𝑢𝑟 of task 𝜏𝑖 at 𝑡𝑐 𝑢𝑟
|
||
maximum suppression) to extract the final objects from the candidates. under the EDF-BE is defined as the expected remaining time up to
|
||
|
||
5
|
||
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
|
||
Algorithm 1 Slack calculation for 𝜏𝑘 at 𝑡𝑐 𝑢𝑟 under EDF-Slack
|
||
Input: 𝜏, 𝑡𝑐 𝑢𝑟
|
||
𝑡
|
||
Output: 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟
|
||
1: 𝑝 = 0, 𝑈 = the left-hand-side of Equation (2)
|
||
2: for 𝑖 = 𝑛 to 1, 𝜏𝑖 ∈ {𝜏1 , ..., 𝜏𝑛 |𝑑1 (𝑡𝑐 𝑢𝑟 ) ≤ ⋯ ≤ 𝑑𝑛 (𝑡𝑐 𝑢𝑟 )} do
|
||
𝐶𝑖 (𝐿, 𝐿)
|
||
3: 𝑈 =𝑈−
|
||
𝑇𝑖
|
||
4: 𝑞𝑖 = max(0, 𝑅𝐶𝑖 (𝐿, 𝐿) − (1 − 𝑈 ) ⋅ (𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) − 𝑑1 (𝑡𝑐 𝑢𝑟 )))
|
||
( 𝑅𝐶𝑖 (𝐿, 𝐿) − 𝑞𝑖 )
|
||
5: 𝑈 = min 1.0, 𝑈 +
|
||
𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) − 𝑑1 (𝑡𝑐 𝑢𝑟 )
|
||
6: 𝑝 = 𝑝 + 𝑞𝑖
|
||
7: end for
|
||
𝑡
|
||
8: return 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 = 𝑑1 (𝑡𝑐 𝑢𝑟 ) − 𝑡𝑐 𝑢𝑟 − 𝑝
|
||
|
||
|
||
|
||
|
||
that does not exceed the earliest deadline or future release, ensuring
|
||
Fig. 3. Execution timeline of multiple MOT tasks under (a) baseline (non-preemptive execution without deadline misses. The following lemma present the
|
||
EDF), (b) EDF-BE, and (c) EDF-Slack scheduling policies.
|
||
timing guarantee of EDF-BE.
|
||
|
||
Theorem 1. A task set 𝜏 that satisfies the condition in Eq. (2) is
|
||
𝑑1 (𝑡𝑐 𝑢𝑟 ) after the execution of 𝐶𝑖 (𝐿, 𝐿) is completed, which is calculated schedulable by EDF-BE .
|
||
by 𝑑1 (𝑡𝑐 𝑢𝑟 ) − 𝑡𝑐 𝑢𝑟 − 𝐶𝑖 (𝐿, 𝐿). This slack value is only valid when there
|
||
are no more than two tasks in the waiting queue at time 𝑡𝑐 𝑢𝑟 and no Proof. According to Lemma 1, for a task set 𝜏 that satisfies Eq. (2),
|
||
future releases within the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )). Using the slack value the minimum execution time 𝐶𝑖 (𝐿, 𝐿) of all tasks 𝜏𝑖 ∈ 𝜏 guarantees
|
||
conditionally provided at a scheduling decision, EDF-BE can perform execution without deadline misses. At each scheduling decision at 𝑡
|
||
middle- or high-workload execution for detection and/or association. under the online policy of EDF-BE, the execution of a job exploiting
|
||
any slack value does not impose additional inference on any other job.
|
||
Example. Figs. 3(a) and (b) present a scheduling scenario of the This guarantees that all tasks 𝜏𝑖 receive no more interference than what
|
||
baseline algorithm (i.e., non-preemptive EDF) and EDF-BE with an they would receive under non-preemptive EDF scheduling. Thus, this
|
||
example task set. We consider an example task set 𝜏 = {𝜏1 , 𝜏2 } of theorem holds. □
|
||
which 𝐶𝑖 = 𝐶𝑖𝐷 (𝐻) + 𝐶𝑖𝐴 (𝐻) = 25, 𝑇𝑖 = 25, 𝐶𝑖𝐷 (𝑠𝑖 ) = {5, 9, 12}, and
|
||
𝐶𝑖𝐴 (𝑓𝑖 ) = {3, 8, 13} hold for 𝜏𝑖 ∈ 𝜏. As shown in Figs. 3(a) and (b), each 4.3. EDF with slack reclamation
|
||
first job of 𝜏1 and 𝜏2 are released at 𝑡 = 0 and 𝑡 = 13, respectively. In
|
||
the baseline algorithm, the first job of 𝜏1 executes for 25 time units, In the case of EDF-BE, more workload than the minimum execution
|
||
and then the first job of 𝜏2 starts its execution at 𝑡 = 25 resulting in can only be processed when there is a single job in the waiting queue at
|
||
a deadline miss at 𝑡 = 38. Let 𝑎𝑔 𝑒𝐷 𝐴
|
||
𝑖 and 𝑎𝑔 𝑒𝑖 be the aging value of a given time 𝑡𝑐 𝑢𝑟 and no additional releases occur until 𝑑1 (𝑡𝑐 𝑢𝑟 ). This cre-
|
||
detection and association of 𝜏𝑖 . The aging value is an integer satisfying ates a limited opportunity for MOT tasks in CA-MOT to perform more
|
||
𝑎𝑔 𝑒𝐷 𝐴 𝐷 𝐴
|
||
𝑖 , 𝑎𝑔 𝑒𝑖 ≥ 0, and 𝑎𝑔 𝑒𝑖 and 𝑎𝑔 𝑒𝑖 for all task 𝜏𝑖 ∈ 𝜏 are set to zero workload than the minimum execution, thus restricting the potential
|
||
at the beginning of the system. The, 𝑎𝑔 𝑒𝐷 𝐴
|
||
𝑖 (and 𝑎𝑔 𝑒𝑖 ) increases by one to improve tracking accuracy. To address this limitation, we integrate
|
||
at each time when a detection (and association) is run with middle- or the approach presented in [5] into EDF-Slack, allowing it to compute
|
||
high-workload. In other words, the aging value refers to the number of slack in a different way than EDF-BE.
|
||
executions excluding those with low workloads. By adjusting the aging Let 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) denote the 𝑖th earliest deadline or release time at 𝑡𝑐 𝑢𝑟 , and
|
||
value, a balance is maintained so that neither detection nor association 𝑅𝐶𝑖 (𝐿, 𝐿) represent the remaining execution time required to complete
|
||
becomes disproportionately large. the minimum execution 𝐶𝑖 (𝐿, 𝐿). Algorithm 1 outlines the calculation
|
||
𝑡
|
||
of the slack value 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 for task 𝜏𝑘 at 𝑡𝑐 𝑢𝑟 within the EDF-Slack
|
||
Compared to EDF-Slack, EDF-BE is a simpler algorithm that utilizes
|
||
algorithm, triggered at each scheduling decision. Since EDF is a job-
|
||
as many resources as possible, executing a job for greater than 𝐶𝑖 (𝐿, 𝐿)
|
||
level fixed-priority scheduling policy, wherein the priority of a job
|
||
up to its closest future release only when there is exactly one job in the
|
||
remains constant throughout its execution, scheduling decisions under
|
||
waiting queue. EDF-BE naturally guarantees no deadline misses in any
|
||
EDF occur either at the commencement of a job’s execution or upon its
|
||
job execution. This is because, as stated in Lemma 1, the execution of
|
||
completion. In the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )], EDF-Slack processes tasks in
|
||
𝐶𝑖 (𝐿, 𝐿) without deadline misses for all jobs is guaranteed under EDF.
|
||
reverse EDF order, starting from the task with the latest deadline. Job
|
||
Furthermore, when a job executes for more than 𝐶𝑖 (𝐿, 𝐿) under EDF-
|
||
𝐽𝑘 of 𝜏𝑘 has the highest priority at 𝑡𝑐 𝑢𝑟 , with 𝑑1 (𝑡𝑐 𝑢𝑟 ) being its deadline,
|
||
BE, there is only one active job in the waiting queue at that time. In
|
||
as EDF-Slack follows the EDF policy. The goal of the slack calculation
|
||
the case of EDF-BE, when the first job of task 𝜏1 starts its execution
|
||
in Algorithm 1 is to delay the execution of all other tasks 𝜏𝑖 ∈ 𝜏 ⧵ 𝜏𝑘
|
||
at 𝑡 = 0, it executes the minimum execution 𝐶1 (𝐿, 𝐿) = 8 until the
|
||
beyond 𝑑1 (𝑡𝑐 𝑢𝑟 ) while ensuring that future deadlines are met. This is
|
||
earliest deadline or future release at 𝑡 = 13, resulting in a slack of five. repeated for all tasks in the waiting queue. To ensure 𝜏𝑖 completes
|
||
Utilizing this slack, the task 𝜏1 then executes 𝐶1 (𝑀 , 𝐿), and the aging 𝐶𝑖 (𝐿, 𝐿) before 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ), EDF-Slack calculates the maximum execution
|
||
factor 𝑎𝑔 𝑒𝐷
|
||
𝑖 increases by one. For the first job of task 𝜏2 , released at time in the interval 𝑑1 (𝑡𝑐 𝑢𝑟 ), 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ), which is (1 − 𝑈 ) ⋅ (𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) − 𝑑1 (𝑡𝑐 𝑢𝑟 )),
|
||
𝑡 = 13, there is a slack of 4 until the earliest deadline or future release where 𝑈 denote the left-hand-side of Eq. (2).
|
||
at 𝑡 = 25. Thus, 𝐶2 (𝑀 , 𝐿) is executed, and 𝑎𝑔 𝑒𝐷 𝑖 increases by one. The The key steps in the slack calculation are as follows:
|
||
second job of task 𝜏1 , released at 𝑡 = 13, has a slack of 5 until the earliest
|
||
deadline or future release at 𝑡 = 38. To balance 𝑎𝑔 𝑒𝐷 𝐴
|
||
𝑖 and 𝑎𝑔 𝑒𝑖 , 𝐶1 (𝐿, 𝑀) • 𝑞𝑖 is computed as the minimum execution of 𝜏𝑖 in the interval
|
||
𝐴
|
||
is executed, and 𝑎𝑔 𝑒𝑖 increases by one. The details of the online policy 𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 ) (Lines 3–4).
|
||
that effectively balances the aging of detection and association for each • 𝑅𝐶𝑖 (𝐿, 𝐿) is either zero or 𝐶𝑖 (𝐿, 𝐿), since scheduling decisions
|
||
task will be provided at the end of this section. As can be observed are only made upon job completion or release in non-preemptive
|
||
from the figure, at each scheduling decision, an execution is performed scheduling (Line 4).
|
||
|
||
6
|
||
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
|
||
• The execution rate of 𝜏𝑖 in the interval 𝑑1 (𝑡𝑐 𝑢𝑟 ), 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) is calculated Algorithm 2 Determination of execution options
|
||
and recorded (Line 5). 𝑡
|
||
Input: 𝜏, 𝑡𝑐 𝑢𝑟 , 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟
|
||
• 𝑝 is set as the sum of the minimum execution times of all tasks
|
||
Output: (𝑠𝑘 , 𝑓𝑘 )
|
||
𝜏𝑖 ∈ 𝜏 (Line 6). 𝑡
|
||
1: if 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 ≤ 0 then
|
||
• The slack is then determined as the remaining time slots, exclud-
|
||
2: return (𝐿, 𝐿)
|
||
ing 𝑝 (i.e., the sum of 𝑞𝑖 ), within the interval 𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 ) (Line
|
||
3: else
|
||
7).
|
||
4: if 𝑎𝑔 𝑒𝐷
|
||
𝑘
|
||
≤ 𝑎𝑔 𝑒𝐴
|
||
𝑘
|
||
then
|
||
𝑡
|
||
Example. Fig. 3(c) illustrates a scheduling scenario of EDF-Slack using 5: 𝑠𝑙𝑎𝑐 𝑘𝐷−
|
||
𝑘
|
||
= 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 − (𝐶𝑘𝐷 (𝐻) − 𝐶𝑘𝐷 (𝐿))
|
||
the same example tasks as shown in Figs. 3(a) and (b). The initial jobs 6: if 𝑠𝑙𝑎𝑐 𝑘𝐷−
|
||
𝑘
|
||
≥ 0 then
|
||
of 𝜏1 and 𝜏2 are released at 𝑡 = 0 and 𝑡 = 13, respectively. Applying 7: return (𝐻 , 𝑓𝑘 (𝑠𝑙𝑎𝑐 𝑘𝐷− 𝑘
|
||
+ 𝐶𝑘𝐴 (𝐿)))
|
||
Algorithm 1, the calculated slack value for 𝜏1 at 𝑡𝑐 𝑢𝑟 = 0 is 17, allowing 8: else
|
||
𝑡
|
||
the first job of 𝜏1 to execute for 𝐶1 (𝐻 , 𝐻) until 𝑡 = 25. Furthermore, 9: return (𝑠𝑘 (𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 + 𝐶𝑘𝐷 (𝐿)), 𝐿)
|
||
𝑎𝑔 𝑒𝐷
|
||
1
|
||
and 𝑎𝑔 𝑒𝐴1
|
||
increment by one. Subsequently, the first job of 𝜏2 begins 10: end if
|
||
its execution at 𝑡 = 25, executing for 𝐶2 (𝑀 , 𝐿) while increasing 𝑎𝑔 𝑒𝐷 2
|
||
11: else
|
||
𝑡
|
||
by one. Finally, the second job of 𝜏2 starts its execution at 𝑡 = 37. 12: 𝑠𝑙𝑎𝑐 𝑘𝐴−
|
||
𝑘
|
||
= 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 − (𝐶𝑘𝐴 (𝐻) − 𝐶𝑘𝐴 (𝐿))
|
||
𝐴−
|
||
if 𝑠𝑙𝑎𝑐 𝑘𝑘 ≥ 0 then
|
||
Comparing Fig. 3(b) that represents EDF-BE with Fig. 3(c) depicting 13:
|
||
EDF-Slack, we observe that the aging of 𝜏1 and 𝜏2 increases in the same 14: return (𝑠𝑘 (𝑠𝑙𝑎𝑐 𝑘𝐴− 𝑘
|
||
+ 𝐶𝑘𝐷 (𝐿)), 𝐻)
|
||
amount in both cases. However, the key difference lies in the execution 15: else
|
||
𝑡
|
||
of the first job of 𝜏1 . Under EDF-Slack, this job is able to execute with a 16: return (𝐿, 𝑓𝑘 (𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 + 𝐶𝑘𝐴 (𝐿)))
|
||
high-workload execution, while under EDF-BE, it can only execute with 17: end if
|
||
a middle-workload execution, which allows for higher expectations of 18: end if
|
||
tracking accuracy in EDF-Slack. 19: end if
|
||
|
||
The following proves the timing guarantee of EDF-Slack.
|
||
Theorem 2. A task set 𝜏 that satisfies Eq. (2) is schedulable under
|
||
EDF-Slack . • If the slack is less than or equal to zero, the algorithm returns 𝐿
|
||
and 𝐿 (Lines 1–2).
|
||
Proof. We prove this by contradiction. Assume, for the sake of con- • Otherwise, the algorithm compares the ages of the detection step
|
||
tradiction, that the task set 𝜏 satisfies Eq. (2), but is not schedulable (𝑎𝑔 𝑒𝐷
|
||
𝑘
|
||
) and the association step (𝑎𝑔 𝑒𝐴
|
||
𝑘
|
||
) (Lines 3–4).
|
||
under EDF-Slack. This implies that at some time 𝑡, the total utilization
|
||
exceeds 1.0, and hence a deadline miss occurs for some job 𝐽𝑖 in 𝜏. – If 𝑎𝑔 𝑒𝐷
|
||
𝑘
|
||
is smaller than 𝑎𝑔 𝑒𝐴 𝑘
|
||
, indicating the detection step
|
||
Let 𝑡𝑚𝑖𝑠𝑠 denote the earliest such time at which a deadline miss occurs, requires more resources, the algorithm calculates 𝑠𝑙𝑎𝑐 𝑘𝐷− 𝑘
|
||
,
|
||
i.e., 𝑡𝑚𝑖𝑠𝑠 = 𝑑𝑖 , where 𝑑𝑖 is the deadline of 𝐽𝑖 . By the definition of EDF- representing the remaining slack after executing
|
||
Slack, at each time 𝑡, the slack time for each task is computed based high-workload detection (Line 5).
|
||
on the highest-priority job 𝐽1 (𝑡𝑐 𝑢𝑟 ), where 𝑡𝑐 𝑢𝑟 denotes the current time. – If 𝑠𝑙𝑎𝑐 𝑘𝐷−
|
||
𝑘
|
||
is greater than or equal to zero, the high-workload
|
||
Since no tasks are released in the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )], the slack time detection is followed by middle- or high-workload associa-
|
||
ensures that lower-priority tasks cannot block the execution of 𝐽1 . As a tion depending on 𝑠𝑙𝑎𝑐 𝑘𝐷− 𝑘
|
||
(Lines 6–7). In this case, 𝑓𝑘 (𝑥) is
|
||
result, the blocking term in Eq. (2) remains valid during this interval. set as follows:
|
||
Now, since EDF-Slack is based on EDF scheduling, the total utiliza- ∗ 𝐿 for 𝑥 < 𝐶𝑘𝐴 (𝑀),
|
||
tion 𝑈 (𝑡) at any time 𝑡 can be expressed as:
|
||
∑ ∗ 𝑀 for 𝐶𝑘𝐴 (𝑀) ≤ 𝑥 < 𝐶𝑘𝐴 (𝐻),
|
||
𝐶𝑖
|
||
𝑈 (𝑡) = + 𝐵(𝑡), ∗ 𝐻 for 𝑥 ≥ 𝐶𝑘𝐴 (𝐻).
|
||
𝑑
|
||
𝐽 ∈𝜏(𝑡) 𝑖
|
||
− 𝑡𝑐 𝑢𝑟
|
||
𝑖
|
||
|
||
where 𝐶𝑖 is the remaining execution time of task 𝐽𝑖 , and 𝐵(𝑡) is the – If 𝑠𝑙𝑎𝑐 𝑘𝐷−
|
||
𝑘
|
||
is less than zero, the algorithm determines if
|
||
blocking term. According to Eq. (2), 𝑈 (𝑡) ≤ 1.0 for all 𝑡. Since 𝑡𝑚𝑖𝑠𝑠 is middle- or high-workload detection can be performed based
|
||
𝑡
|
||
the earliest time a deadline miss occurs, we must have 𝑈 (𝑡𝑚𝑖𝑠𝑠 ) > 1.0. on 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 , followed by low-workload association (Lines
|
||
However, by Eq. (2), we know that 𝑈 (𝑡) ≤ 1.0 for all 𝑡 ≥ 𝑡𝑐 𝑢𝑟 , including 8–10). In this case, 𝑠𝑘 (𝑥) is set as follows:
|
||
𝑡𝑚𝑖𝑠𝑠 . This leads to a contradiction, as the assumption that 𝑈 (𝑡𝑚𝑖𝑠𝑠 ) > ∗ 𝐿 for 𝑥 < 𝐶𝑘𝐷 (𝑀),
|
||
1.0 contradicts the fact that 𝑈 (𝑡) ≤ 1.0 holds at all times. Therefore,
|
||
∗ 𝑀 for 𝐶𝑘𝐷 (𝑀) ≤ 𝑥 < 𝐶𝑘𝐷 (𝐻),
|
||
no deadline miss can occur, and the task set 𝜏 is schedulable under
|
||
EDF-Slack. □ ∗ 𝐻 for 𝑥 ≥ 𝐶𝑘𝐷 (𝐻).
|
||
|
||
Note that the proof is self-contained, but a different proof can be • Lines 11–18 follow a similar procedure for determining the ex-
|
||
found in [5]. ecution options, giving preference to the association step. Here,
|
||
Determination of execution options. EDF-BE and EDF-Slack use 𝑠𝑙𝑎𝑐 𝑘𝐷− represents the remaining slack after executing the high-
|
||
𝑘
|
||
different slack concepts to ensure timely execution of tasks while im- workload association.
|
||
proving tracking accuracy by executing beyond the minimum
|
||
According to the definition of aging, 𝑎𝑔 𝑒𝐷 𝑘
|
||
(and 𝑎𝑔 𝑒𝐴
|
||
𝑘
|
||
) increase
|
||
(i.e., 𝐶𝑖 (𝐿, 𝐿)). As shown in Figs. 3(b) and (c), both EDF-BE and
|
||
by one when middle- or high-workload detection (and association) is
|
||
EDF-Slack enhance the aging of detection and association through
|
||
performed.
|
||
predefined mechanisms. The goal of these mechanisms is to balance the
|
||
DNN-SAM proposed in [10] introduces two scheduling algorithms:
|
||
aging of detection and association, minimizing continuous omissions in
|
||
EDF-MandFirst and EDF-Slack. Unlike CA-MOT, both DNN-SAM al-
|
||
updating motion and feature information, thereby maximizing tracking
|
||
gorithms target multi-object detection (MOD) tasks. The primary dis-
|
||
accuracy. tinction between MOT and MOD lies in the presence or absence of
|
||
Algorithm 2 outlines the process for determining the execution dependencies between consecutive frames. In MOD, the detection oper-
|
||
options for the detection and association steps of task 𝜏𝑘 at time 𝑡𝑐 𝑢𝑟 ation for a given frame does not utilize any information from previous
|
||
𝑡
|
||
based on the slack 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 calculated in Algorithm 1. frames. Therefore, techniques that rely on previous frame information,
|
||
|
||
7
|
||
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
|
||
such as aging-aware methods, cannot be employed in the DNN-SAM al- Table 2
|
||
gorithms. Another key difference is that DNN-SAM is responsible solely Execution time measurement (average and maximum) in terms of image size, feature
|
||
size, and scheduling overhead.
|
||
for detection execution and does not handle the association task. Both
|
||
Time (ms) 𝐶𝑖𝐷 𝐶𝑖𝐴 𝐶𝑖𝑠𝑐 ℎ𝑒
|
||
DNN-SAM and CA-MOT algorithms are based on EDF and prioritize
|
||
executing jobs with the earliest deadlines among the released tasks. L M H L M H
|
||
|
||
However, in contrast to CA-MOT, DNN-SAM splits each job at release Average 28.0 30.6 36.7 8.3 63.4 74.4 0.3
|
||
Maximum 43.6 53.5 67.6 11.3 74.0 125.2 0.6
|
||
into a mandatory job, responsible for execution in the safety-critical
|
||
area, and an optional job, responsible for execution in non-critical
|
||
areas. When any mandatory job is present in the waiting queue, it
|
||
is always executed first using the EDF algorithm. The distinction be-
|
||
tween MandFirst and EDF-Slack arises from whether the execution
|
||
of an optional job may interfere with the execution of a mandatory
|
||
job. Specifically, the scheduling behavior of DNN-SAM and EDF-Slack
|
||
operates as follows:
|
||
|
||
• EDF-MandFirst in [10]: Any mandatory job in the waiting queue
|
||
has a higher priority than optional jobs and is scheduled using
|
||
EDF. If no mandatory jobs are in the queue, optional jobs are
|
||
executed using EDF, ensuring that they do not interfere with the Fig. 4. Comparison for two tasks with the periods (equal to the relative deadlines) of
|
||
execution of future release jobs of mandatory tasks. 180 ms and 270 ms.
|
||
• EDF-Slack in [10]: Any mandatory job in the waiting queue has
|
||
a higher priority than optional jobs and is scheduled using EDF.
|
||
If no mandatory jobs are in the queue, optional jobs are executed the ground truth of objects in all frames with the tracking results
|
||
using EDF, potentially interfering with the execution of future- obtained from the given techniques to measure accuracy. The
|
||
release mandatory jobs, within the slack calculated from the job’s KITTI dataset consists solely of data captured from forward-facing
|
||
runtime. cameras and does not utilize different cameras, meaning there
|
||
• EDF-BE of CA-MOT: A job is not split and has three execu- is no overlap in the areas they cover. Additionally, as it does
|
||
tion options for both detection and association. It is executed not assume simultaneous capture by each camera, there are no
|
||
with the maximum workload option to avoid interfering with synchronization issues. CA-MOT aims to maximize the average
|
||
the execution of future release jobs, but only when exactly one accuracy for the MOT tasks corresponding to all given cameras
|
||
job is present in the waiting queue. The aging of detection and without missing any deadlines. This assumes that CA-MOT oper-
|
||
association tasks is considered for accuracy maximization. ates independently of camera interdependencies, with all cameras
|
||
• EDF-Slack of CA-MOT: A job is not split and has three execution receiving the same forward-facing camera feed.
|
||
options for both detection and association. Regardless of the • Execution time measurement: To obtain the WCET of different
|
||
number of jobs in the waiting queue, the job is executed with execution options for detection and association, we measured the
|
||
the maximum workload option, potentially interfering with the execution time by iterating 1000 times for each sub-tasks with
|
||
execution of future release jobs based on the slack calculated three different execution options of an MOT task and then took
|
||
from its runtime. The aging of detection and association tasks is the largest value. We also measured the worst-case time required
|
||
considered for accuracy maximization. for slack calculation and scheduling decisions such as Algorithms
|
||
1 and 2. Table 2 shows the measurement results.
|
||
5. Evaluation
|
||
5.2. Experiment result
|
||
This section evaluates the effectiveness of CA-MOT in achieving R1
|
||
and R2 for multiple MOT tasks.
|
||
We consider task sets in which schedulability is not guaranteed with
|
||
the high-workload execution for detection and association
|
||
5.1. Experiment setting
|
||
(i.e., 𝐶𝑖 (𝐻 , 𝐻)) for all tasks but is guaranteed with the minimum
|
||
execution (𝐶𝑖 (𝐿, 𝐿)) according to Eq. (2). Note that the schedulability
|
||
• Software: CA-MOT employs the tracking-by-detection of which
|
||
with 𝐶𝑖 (𝑥, 𝑦) for 𝑥, 𝑦 ∈ {𝐿, 𝑀 , 𝐻} can be judged with Eq. (2) by
|
||
the detector is one of the most popular detectors, YOLOv5 [14]
|
||
substituting 𝐶𝑖 (𝐿, 𝐿) to 𝐶𝑖 (𝑥, 𝑦). To evaluate the effectiveness of CA-
|
||
model, and tracker is StrongSORT [7]. We confirmed that other
|
||
MOT we consider the following including a baseline and our two
|
||
detectors (i.e. YOLOX, Faster-RCNN) exhibit a similar trend to
|
||
proposed approaches.
|
||
YOLOv5 in terms of MOTA and execution time, as shown in Fig. 7.
|
||
For feature extraction conducted as a part of association, we used • Detection first (DF): non-preemptive EDF in which the execution
|
||
OS-Net [10]. The YOLOv5 model was pretrained on the COCO option of all tasks 𝜏𝑖 ∈ 𝜏 is equally fixed to the rightmost
|
||
Dataset [18], while OS-Net was pretrained on the MSMT Dataset
|
||
one among {𝐶𝑖 (𝐿, 𝐿), 𝐶𝑖 (𝑀 , 𝐿), 𝐶𝑖 (𝐻 , 𝐿), 𝐶𝑖 (𝐻 , 𝑀), 𝐶𝑖 (𝐻 , 𝐻)} that
|
||
[19]. The experimental environment is with Ubuntu 18.04.6 LTS,
|
||
satisfies the schedulability condition in Eq. (2).
|
||
CUDA 11.4, and PyTorch 1.12.
|
||
• EDF-BE: EDF-BE of which task set passes the schedulability con-
|
||
• Hardware: We consider the NVIDIA Jetson Xavier as a GPU-
|
||
dition in Eq. (2), which is proposed in Section 4.2.
|
||
enabled embedded board [20]. The NVIDIA Jetson Xavier features
|
||
• EDF-Slack: EDF-Slack of which task set passes the schedulability
|
||
a 64-bit 8-core CPU, 32 GB Memory, and 512-core Volta GPU. We
|
||
condition in Eq. (2), which is proposed in Section 4.3.
|
||
utilized the MAXN mode provided by the NVIDIA Jetson Xavier.
|
||
• Dataset and performance metric: We used the KITTI Dataset Fig. 4 represents the tracking accuracy and the proportion of three
|
||
[13], which contains data collected from autonomous vehicle execution options (i.e., 𝐿, 𝑀, and 𝐻) selected during detection and
|
||
driving. To evaluate the accuracy of each region, we measured association for two tasks with different periods: 180 and 270 ms
|
||
the MOTA [15] as the most well-known performance metric for (milliseconds). As shown in Fig. 4(a), for overall accuracy, EDF-BE and
|
||
tracking accuracy for critical and entire regions. MOTA compares EDF-Slack achieve 20.2% and 26.6%, respectively, while DF achieves
|
||
|
||
8
|
||
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
|
||
|
||
|
||
Fig. 5. Comparison for four tasks with the same period (equal to the relative deadline) Fig. 6. Visualization on KITTI dataset for three tasks with the periods of 400 ms. (For
|
||
of 400 ms. interpretation of the references to color in this figure legend, the reader is referred to
|
||
the web version of this article.)
|
||
|
||
|
||
13.4%, which demonstrate the effectiveness of slack utilization and
|
||
balancing aging of detection and association in increasing tracking
|
||
accuracy. We observe that the slack reclamation performed by Algo-
|
||
rithm 1 in EDF-Slack is significantly more effective in achieving high
|
||
tracking accuracy than in EDF-BE which has limitations in obtaining
|
||
a substantial amount of slack. For critical accuracy, EDF-BE and EDF-
|
||
Slack achieve much higher accuracies, which are 28.3% and 32.2%,
|
||
respectively, compared to 15.4% of DF. Based on this observation,
|
||
we can interpret that even though EDF-BE obtains a smaller amount
|
||
Fig. 7. MOTA and execution time on other detectors.
|
||
of slack compared to EDF-Slack, it efficiently performs tracking for
|
||
the critical region with limited computing resources. On the other
|
||
hand, EDF-Slack provides high tracking accuracy not only for the entire
|
||
region but also for the safety-critical region, thanks to its efficient slack Additional experiments are conducted to ascertain if CA-MOT ex-
|
||
reclamation. As seen in Fig. 4, EDF-Slack exhibits a significantly higher hibits comparable behavioral patterns across a range of detectors,
|
||
proportion of high-workload execution and middle-workload execution including YOLOv5, which was evaluated previously. Fig. 7 displays
|
||
for detection and association compared to other execution options. On the MOTA and execution time for various contemporary detectors,
|
||
the other hand, EDF-BE shows a slight proportion of middle-workload analyzed according to their workload. Modern detectors are generally
|
||
execution, while the majority of cases involve low-workload execution. classified into one-stage and two-stage categories based on their archi-
|
||
Fig. 5 depicts the results of another experiment involving three tecture and further into anchor-free and anchor-based types, contingent
|
||
different sets of tasks, with the number of tasks ranging from two on their use of predefined anchors for object detection. Our study
|
||
to four, all having the same periods (i.e., 400 ms with a guaranteed incorporated YOLOv5, a standard one-stage anchor-based detector. We
|
||
minimum execution 𝐶𝑖 (𝐿, 𝐿), but no guaranteed maximum execution also investigate the performance of the two-stage anchor-based detector
|
||
𝐶𝑖 (𝐻 , 𝐻) for 𝜏𝑖 ∈ 𝜏). In Fig. 5(a), the tracking accuracy of the evaluated Faster-RCNN [9] and the one-stage anchor-free detector YOLOX [21],
|
||
approaches is shown as the number of tasks increases. For the case to verify the consistency of results. Faster-RCNN utilized ResNet-50
|
||
of two tasks, EDF-Slack achieves an overall accuracy of 41.8% and a as its backbone network, while YOLOX was configured with a small
|
||
critical accuracy of 41.4%, while EDF-BE achieves an overall accuracy version model. Both models were trained using the COCO dataset.
|
||
of 24.3% and a critical accuracy of 27.2%. In contrast, DF achieves Despite minor discrepancies in specific ratios, the results consistently
|
||
lower accuracy, with an overall accuracy of 18.0% and a critical demonstrate that both MOTA and execution time escalate in conjunc-
|
||
accuracy of 18.7%. As the number of tasks increases, both EDF-BE and tion with increasing workload, as shown in Figs. 7(a) and (b). The
|
||
EDF-Slack experience a decrease in accuracy, but they still outperform runtime trend of YOLOX is particularly noteworthy, which closely
|
||
DF in terms of tracking accuracy. Even with only four tasks, EDF- mirrors that of YOLOv5. This pattern indicates that similar outcomes
|
||
BE yields lower overall accuracy than DF, as it can only detect part may be expected from other detectors akin to YOLOv5.
|
||
of the image when selecting the high workload option. Nevertheless,
|
||
by prioritizing computations in critical regions at low and medium
|
||
6. Related work
|
||
workloads, EDF-BE attains higher critical accuracy than DF. Fig. 5(b)
|
||
presents the distribution of execution options for EDF-BE and EDF-Slack
|
||
The tracking-by-detection model is a commonly used method in
|
||
when there are three tasks. Similar to Fig. 4, it is evident that both
|
||
the MOT field. It has shown significant progress and enhanced perfor-
|
||
EDF-BE and EDF-Slack allocate the workload between the detection and
|
||
association steps in a balanced manner using the ages. Additionally, mance recently, largely thanks to the evolution of deep neural networks
|
||
EDF-Slack can reclaim more slack compared to EDF-BE. (DNNs). A well-recognized model in this field, SORT (simple online
|
||
Fig. 6 presents the tracking outcomes of a single task within a set and real-time tracking) [22], does its matching based mainly on where
|
||
of three tasks, each with a period of 400 ms, comparing (a) the DF objects are located, using detection tools to achieve this. To push this
|
||
algorithm and (b) EDF-Slack. In the visualization, each tracked object model further, DeepSORT [6] builds on the SORT model by adding a
|
||
is represented by a unique color and ID within a bounding box, with DNN-based re-identification model. This allows for the extraction of ob-
|
||
the symbol ‘‘#’’ indicating the frame number. In the DF scenario, each ject features. By adding this layer, DeepSORT utilizes both the object’s
|
||
task executes 𝐶𝑖 (𝐻 , 𝐿), leading to insufficient computational resources location and its visual information, leading to a stronger performance.
|
||
for proper association. This inadequacy results in DF’s failure to track Recent work in this area, such as Deep OC-SORT [23] and Strong-
|
||
two objects within the safety-critical region in the 167th frame and SORT [7], is geared towards enhancing the accuracy of these models
|
||
causes an ID switch from 4 to 10 in the subsequent 168th frame, even more, focusing especially on refining and improving the matching
|
||
as illustrated in Fig. 6. Conversely, EDF-Slack leverages aging and algorithms used in these systems. However, it is critical to understand
|
||
slack techniques to allocate sufficient computational resources for both that these approaches are mainly designed for situations where there
|
||
detection and association tasks, enabling accurate tracking of all objects are plenty of computing resources. Therefore, they might struggle to
|
||
in the safety-critical region. meet the timing needs in systems that are restricted in resources, like
|
||
|
||
9
|
||
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
|
||
the embedded systems in self-driving vehicles where resources may be Declaration of competing interest
|
||
scarce.
|
||
Considering self-driving vehicles, which are fundamentally systems The authors declare the following financial interests/personal rela-
|
||
where safety is critical, even the smallest delays or slight drops in tionships which may be considered as potential competing interests:
|
||
accuracy can lead to significant and potentially dangerous risks. Some Hyeongboo Baek reports financial support was provided by National
|
||
research such as DNN-SAM [5] has tried to tackle these problems by Research Foundation of Korea. If there are other authors, they declare
|
||
suggesting frameworks that concentrate specifically on safety-critical that they have no known competing financial interests or personal
|
||
areas. These frameworks give priority to critical accuracy and use relationships that could have appeared to influence the work reported
|
||
uncertainty handling to ensure the highest safety standards. However, in this paper.
|
||
these research studies and their related approaches are mostly designed
|
||
for multi-object detection systems and may not directly apply to or be Acknowledgments
|
||
effective in multi-object tracking. Likewise, another study, RT-MOT [2],
|
||
aims to maximize the overall accuracy of multi-object tracking and This work was supported by the National Research Foundation of
|
||
ensure on-time execution, but it overlooks the importance of individual Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-
|
||
objects in its approach. To address these limitations, our suggested 00250742, 2022R1A4A3018824, RS-2024-00438248). This work was
|
||
framework, known as CA-MOT, aims to confront these challenges partly supported by the Institute of Information & Communications
|
||
directly. By leveraging the unique traits of multi-object tracking in Technology Planning & Evaluation(IITP)-ITRC(Information Technol-
|
||
safety-critical systems, CA-MOT ensures on-time execution and boosts ogy Research Center) grant funded by the Korea government(MSIT)
|
||
tracking accuracy for objects that could potentially be dangerous to the (IITP-2025-RS-2023-00259061).
|
||
system. It builds on previous work while addressing their weaknesses
|
||
to create a safer and more efficient tracking system. Data availability
|
||
|
||
Data will be made available on request.
|
||
7. Discussion
|
||
|
||
A limitation of CA-MOT is its exclusive reliance on a single CPU and References
|
||
GPU, which restricts scalability. A recent approach, Batch-MOT [3],
|
||
addresses this limitation by processing input images from multiple [1] M. Yang, S. Wang, J. Bakita, T. Vu, F.D. Smith, J.H. Anderson, J.-M. Frahm,
|
||
Re-thinking CNN frameworks for time-sensitive autonomous-driving applications:
|
||
cameras through a shared queue, distributing CPU operations across Addressing an industrial challenge, in: Proceedings of IEEE Real-Time Technology
|
||
multiple CPUs, and employing batch processing on a single GPU. How- and Applications Symposium, IEEE, 2019, pp. 305–317.
|
||
ever, this approach may introduce additional communication overhead [2] D. Kang, S. Lee, H.S. Chwa, S.-H. Bae, C.M. Kang, J. Lee, H. Baek, RT-MOT:
|
||
among CPUs, potentially determining its overall efficiency. The primary Confidence-aware real-time scheduling framework for multi-object tracking tasks,
|
||
in: Proceedings of IEEE Real-Time Systems Symposium, IEEE, 2022, pp. 318–330.
|
||
contribution of Batch-MOT lies in its online schedulability analysis,
|
||
[3] D. Kang, S. Lee, C.-H. Hong, J. Lee, H. Baek, Batch-MOT: Batch-enabled real-
|
||
which dynamically determines the maximum number of images that time scheduling for multi-object tracking tasks, IEEE Trans. Comput.-Aided Des.
|
||
can be batch-processed without violating their deadlines. Nonethe- Integr. Circuits Syst. (2024).
|
||
less, unlike CA-MOT, Batch-MOT lacks support for multiple execution [4] S. Liu, X. Fu, M. Wigness, P. David, S. Yao, L. Sha, T. Abdelzaher, Self-cueing
|
||
real-time attention scheduling in criticality-aware visual machine perception, in:
|
||
strategies during the association phase, resulting in suboptimal resource
|
||
Proceedings of IEEE Real-Time Technology and Applications Symposium, IEEE,
|
||
utilization for individual MOT tasks. Enhancing CA-MOT by incor- 2022, pp. 173–186.
|
||
porating batch processing capabilities to address these shortcomings [5] W. Kang, S. Chung, J.Y. Kim, Y. Lee, K. Lee, J. Lee, K.G. Shin, H.S. Chwa,
|
||
presents a promising avenue for future research. Furthermore, as high- DNN-SAM: Split-and-merge DNN execution for real-time object detection, in:
|
||
Proceedings of IEEE Real-Time Technology and Applications Symposium, 2022,
|
||
lighted in previous studies, deploying CA-MOT on real-world platforms,
|
||
URL https://rtcl.dgist.ac.kr/index.php/publication-2/.
|
||
such as the F1/10 autonomous driving platform [5], offers significant [6] N. Wojke, A. Bewley, D. Paulus, Simple online and realtime tracking with a
|
||
potential for further investigation and practical validation. deep association metric, in: Proceedings of the IEEE International Conference on
|
||
Image Processing, IEEE, 2017, pp. 3645–3649.
|
||
[7] Y. Du, Y. Song, B. Yang, Y. Zhao, StrongSORT: Make deepsort great again, 2022,
|
||
8. Conclusion arXiv preprint arXiv:2202.13514.
|
||
[8] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-
|
||
time object detection, in: Proceedings of the IEEE/CVF Conference on Computer
|
||
In this paper, we proposed, CA-MOT, a new criticality-aware MOT
|
||
Vision and Pattern Recognition, 2016, pp. 779–788.
|
||
execution and scheduling framework. Aiming at achieving critical- [9] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference
|
||
accuracy maximization and timing guarantee, CA-MOT first proposes on Computer Vision, 2015, pp. 1440–1448.
|
||
a new system design to offer a control knob between tracking accuracy [10] K. Zhou, Y. Yang, A. Cavallaro, T. Xiang, Omni-scale feature learning for
|
||
person re-identification, in: Proceedings of the IEEE International Conference
|
||
and timing guarantee to efficiently utilize limited computing resources.
|
||
on Computer Vision, 2019, pp. 3702–3712.
|
||
Then, CA-MOT develops two scheduling algorithms to effectively uti- [11] Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, X.
|
||
lize the system design while using the notions of slack and aging of Wang, ByteTrack: Multi-object tracking by associating every detection box, in:
|
||
detection and association. Using various task sets and real-world au- Proceedings of the European Conference on Computer Vision, Springer, 2022,
|
||
pp. 1–21.
|
||
tonomous driving data, we demonstrated that CA-MOT can obtain high
|
||
[12] Y. Zhang, C. Wang, X. Wang, W. Zeng, W. Liu, FairMOT: On the fairness of
|
||
tracking accuracy of entire and safety-critical regions while ensuring detection and re-identification in multiple object tracking, Int. J. Comput. Vis.
|
||
the timely execution of all MOT tasks. 129 (11) (2021) 3069–3087.
|
||
[13] A. Geiger, P. Lenz, R. Urtasun, Are we ready for autonomous driving? The
|
||
KITTI vision benchmark suite, in: Proceedings of the IEEE/CVF Conference on
|
||
CRediT authorship contribution statement Computer Vision and Pattern Recognition, 2012.
|
||
[14] ultralytics, YOLOv5 [Online], 2022, Available: https://github.com/ultralytics/
|
||
yolov5.
|
||
Donghwa Kang: Writing – original draft, Software, Methodology,
|
||
[15] K. Bernardin, R. Stiefelhagen, Evaluating multiple object tracking performance:
|
||
Formal analysis. Jinkyu Lee: Writing – review & editing, Valida- the clear mot metrics, EURASIP J. Image Video Process. 2008 (2008) 1–10.
|
||
tion, Formal analysis. Hyeongboo Baek: Writing – review & editing, [16] G. Welch, G. Bishop, et al., An introduction to the Kalman filter, ACM SIGGRAPH
|
||
Supervision, Funding acquisition, Formal analysis, Conceptualization. (1995).
|
||
|
||
|
||
10
|
||
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
|
||
|
||
|
||
[17] T.P. Baker, A stack-based resource allocation policy for realtime processes, in: Jinkyu Lee is an associate professor in the Department
|
||
Proceedings of IEEE Real-Time Systems Symposium, IEEE, 1990, pp. 191–200. of Computer Science and Engineering at Sungkyunkwan
|
||
[18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, University (SKKU), South Korea, where he joined in 2014.
|
||
C.L. Zitnick, Microsoft COCO: Common objects in context, in: Proceedings of the He received the BS, MS, and Ph.D. degrees in computer
|
||
European Conference on Computer Vision, Springer, 2014, pp. 740–755. science from the Korea Advanced Institute of Science and
|
||
[19] L. Wei, S. Zhang, W. Gao, Q. Tian, Person transfer gan to bridge domain gap Technology (KAIST), South Korea, in 2004, 2006, and 2011,
|
||
for person re-identification, in: Proceedings of the IEEE/CVF Conference on respectively. He has been a research fellow/visiting scholar
|
||
Computer Vision and Pattern Recognition, 2018, pp. 79–88. in the Department of Electrical Engineering and Computer
|
||
[20] NVIDIA, NVIDIA Xavier Developer Kit. [Online], 2022, Available: https://www. Science, University of Michigan until 2014. His research
|
||
nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier. interests include system design and analysis with timing
|
||
[21] Z. Ge, S. Liu, F. Wang, Z. Li, J. Sun, Yolox: Exceeding yolo series in 2021, 2021, guarantees, QoS support, and resource management in real-
|
||
arXiv preprint arXiv:2107.08430. time embedded systems and cyber–physical systems. He won
|
||
[22] A. Bewley, Z. Ge, L. Ott, F. Ramos, B. Upcroft, Simple online and realtime track- the best student paper award from the 17th IEEE Real-Time
|
||
ing, in: Proceedings of the IEEE International Conference on Image Processing, and Embedded Technology and Applications Symposium
|
||
IEEE, 2016, pp. 3464–3468. (RTAS) in 2011 and the Best Paper Award from the 33rd
|
||
[23] G. Maggiolino, A. Ahmad, J. Cao, K. Kitani, Deep oc-sort: Multi-pedestrian IEEE Real-Time Systems Symposium (RTSS) in 2012.
|
||
tracking by adaptive re-identification, 2023, arXiv preprint arXiv:2302.11813.
|
||
|
||
Hyeongboo Baek is an associate professor in the Depart-
|
||
Dongwha Kang is a Ph.D. course student in the School ment of Artificial Intelligence, University of Seoul (UOS),
|
||
of Computing, Korea Advanced Institute of Science and South Korea. He received the BS degree in Computer Science
|
||
Technology (KAIST), South Korea. He received a BS and and Engineering from Konkuk University, South Korea, in
|
||
MS degree in computer science from Incheon National Uni- 2010 and the MS and Ph.D. degrees in Computer Science
|
||
versity (INU) in 2022 and 2024 respectively. His research from KAIST, South Korea, in 2012 and 2016, respectively.
|
||
interests include artificial intelligence, autonomous systems, His research interests include cyber–physical systems, real-
|
||
and real-time embedded systems. time embedded systems, and system security. He won the
|
||
best paper award from the 33rd IEEE Real-Time Systems
|
||
Symposium (RTSS) in 2012.
|
||
|
||
|
||
|
||
|
||
11
|
||
|