Files
opaque-lattice/papers_txt/Real-time-scheduling-for-multi-object-tracking-tasks-_2025_Journal-of-System.txt
2026-01-06 12:49:26 -07:00

901 lines
106 KiB
Plaintext
Raw Permalink Blame History

This file contains invisible Unicode characters
This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
Journal of Systems Architecture 160 (2025) 103349
Contents lists available at ScienceDirect
Journal of Systems Architecture
journal homepage: www.elsevier.com/locate/sysarc
Real-time scheduling for multi-object tracking tasks in regions with different
criticalities
Donghwa Kang a , Jinkyu Lee b ,, Hyeongboo Baek c ,
a Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea
b
Sungkyunkwan University (SKKU), Suwon, South Korea
c
University of Seoul (UOS), Seoul, South Korea
ARTICLE INFO ABSTRACT
Keywords: Autonomous vehicles (AVs) utilize sensors such as LiDAR and cameras to iteratively perform sensing, decision-
Multi-object tracking making, and actions. Multi-object tracking (MOT) systems are employed in the sensing stage of AVs, using these
Real-time scheduling sensors to detect and track objects like pedestrians and vehicles, thereby enhancing situational awareness.
Timing guarantee
These systems must handle regions of varying criticality and dynamically shifting locations, all within limited
Criticality-awareness
computing resources. Previous DNN-based MOT approaches primarily focused on tracking accuracy, but timing
Autonomous driving
guarantees are becoming increasingly vital for autonomous driving. Although recent studies have introduced
MOT scheduling frameworks with timing guarantees, they are either restricted to single-camera systems or
fail to prioritize safety-critical regions in the input images. We propose CA-MOT, a Criticality-Aware MOT
execution and scheduling framework for multiple cameras. CA-MOT provides a control knob that balances
tracking accuracy in safety-critical regions and timing guarantees. By effectively utilizing this control knob,
CA-MOT achieves both high accuracy and timing guarantees. We evaluated CA-MOTs performance using a
GPU-enabled embedded board commonly employed in AVs, with data from real-world autonomous driving
scenarios.
1. Introduction pooling and convolutional layers) using CNN (convolutional neural
network)-based models (e.g., OS-Net [10]). For unmatched objects,
Autonomous vehicles (AVs) are systems that iteratively perform location-based methods like intersection over union (IoU) are applied.
sensing, decision-making, and actions using various sensors such as MOT input images exhibit two key characteristics: (i) regions with
LiDAR, radar, inertial measurement units (IMU), and cameras [1]. varying levels of criticality and (ii) dynamically shifting locations. With
Multi-object tracking (MOT) systems, used in the perception stage limited computing resources in AVs, it is crucial to deliver different
of AVs, track objects like pedestrians and cars, enhancing situational levels of service quality based on criticality. Safety-critical regions,
awareness. Since MOT information is periodically transferred to control where objects with a short time-to-collision (e.g., under 2 s) cluster,
tasks, timely execution must be guaranteed to ensure safety and prevent must be prioritized. If multiple clusters exist, the broader area en-
severe accidents [24]. Low accuracy, despite timely execution, may compassing them is considered the safety-critical region, as defined in
result in missed objects, thus compromising AVs safety [2,4,5]. There-
DNN-SAM [5]. Established methods compute time-to-collision using Li-
fore, AV MOT systems should ensure timing guarantees with maximized
DAR and IMU data; we follow the approach from DNN-SAM. This leads
accuracy.
to two requirements for criticality-aware MOT systems: (R1) accuracy
Tracking-by-detection [6,7] is widely used due to its high accuracy
maximization for safety-critical regions and (R2) timing guarantees.
and ability to leverage state-of-the-art DNN-based detection models
Most existing DNN-based MOT approaches focus on accuracy [7,
(e.g., YOLO series [8], Faster R-CNN [9]). For each input image from
each camera, tracking-by-detection performs two tasks: detection and 11,12], but timing guarantees are increasingly critical in autonomous
association. Detection uses DNN-based models to sense the motion driving. Recent research has proposed MOT resource scheduling frame-
information of objects, such as location and velocity, while association works that guarantee timing for every MOT execution [2,4]. How-
matches objects between frames based on extracted feature informa- ever, [2] overlooks safety-criticality, while [4] focuses on a single task.
tion (also called feature vectors or feature maps obtained through We address safety-criticality across multiple tasks, raising the following
Corresponding authors.
E-mail addresses: anima0729@kaist.ac.kr (D. Kang), jinkyu.lee@skku.edu (J. Lee), hbbaek359@gmail.com (H. Baek).
https://doi.org/10.1016/j.sysarc.2025.103349
Received 22 September 2024; Received in revised form 13 December 2024; Accepted 20 January 2025
Available online 28 January 2025
1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
challenges:
C1. How to balance R1 and R2 to efficiently use limited computing
resources.
C2. How to achieve both R1 and R2 by effectively using the control
knob developed from C1.
In this paper, we propose CA-MOT, a Criticality-Aware MOT exe-
cution and scheduling framework for multiple MOT tasks. To address
C1, CA-MOT offers three execution options (low, middle, and high
workloads) to balance R1 and R2 for both detection and association.
To address C2, CA-MOT introduces the notion of aging for detection
and association sub-tasks, estimating the reliability of motion and
feature information over time. Balancing the aging of these tasks is
essential to achieve R1 and R2 with limited resources (to be discussed
in Section 3.4). Based on this, CA-MOT develops two scheduling algo-
rithms: EDF-BE and EDF-Slack. EDF-BE increases the workload of tasks
waiting in the ready queue for execution (referred to as active tasks)
without compromising the R2 bound when no other tasks are pending.
In contrast, EDF-Slack is designed to handle scenarios with multiple
active tasks. Fig. 1. Tracking accuracy and execution time on different execution options of
detection and association.
To validate CA-MOTs performance in meeting R1 and R2, we
conducted extensive experiments on an NVIDIA Jetson Xavier using
the KITTI Dataset [13]. Additionally, we applied three detectors in our
experiments: YOLOv5 [14], YOLOX [8], and Faster-RCNN [9]. a system that tracks specific objects moving between the fields of view
The contributions of this paper are as follows: of multiple cameras (called hand-over), this is beyond the scope of
our work. This paper focuses on dividing the multi-object tracking task
• We motivate the importance of balancing between aging of de- into two subtasks (i.e., detection and association) and using DNN-based
tection and association to achieve R1 and R2 (Section 2). MOT-specific properties (i.e., reuse of motion and feature information)
• We propose a new system design, CA-MOT that addresses R1 and to achieve R1 and R2 under limited resources.
R2 considering varying levels of criticality in different regions for
multiple MOT tasks (Section 3). 2.2. Trade-off between accuracy and execution time
• We develop new scheduling algorithms to effectively achieve R1
and R2 by balancing between aging of detection and association
To address C1, we consider two factors: (i) the input image size
for each MOT task (Section 4).
and detection within the safety-critical region, and (ii) the number of
• We demonstrate the effectiveness of CA-MOT in achieving R1 and
objects used for feature extraction during association across all detected
R2 using a real-world self-driving dataset (Section 5).
objects in each frame.
Fig. 1(a) compares the multi-object tracking accuracy (MOTA) [15]
2. Motivation for the overall and safety-critical regions (referred to as overall and
critical accuracy) and the execution time for a single MOT task using
This section presents target systems and motivates the system de- three input image sizes (256 × 256, 416 × 416, 672 × 672). Overall
sign of CA-MOT to address C1 and C2 based on measurement-based accuracy considers all objects, while critical accuracy focuses on the
observations. safety-critical region. YOLOv5 [14] is used for detection, and features
are extracted for all detected objects. The KITTI dataset [13] is used.
2.1. Target system
For image sizes 256 × 256 and 416 × 416, detection is performed
on a cropped region of interest (RoI) that includes the safety-critical
CA-MOT targets 2D MOT systems on AVs equipped with multiple
region. If the RoI is smaller, it is resized to include the critical region;
camera sensors. Each MOT task performs MOT execution on consecu-
otherwise, the critical region is cropped accordingly. The safety-critical
tive input frames received from the corresponding camera sensor at a
region will be defined in Section 3. For the 672 × 672 size (i.e., the
predetermined period. As this recurring task is required to complete
original input size), detection occurs without cropping.
a job within a specified deadline, each MOT task is considered a
As shown in Fig. 1(a), reducing image size leads to a notable
real-time task with a period and deadline. CA-MOT employs tracking-
decrease in overall accuracy, while critical accuracy decreases less
by-detection comprising two steps of MOT execution: detection and
association. The front-end detector performs detection by exploiting significantly due to prioritization of the safety-critical region in the RoI.
the existing stand-alone DNN-based detector to identify the position Additionally, execution time decreases as the image size is reduced,
and class of objects in the input image. Using the locations of detected demonstrating a trade-off between R1 and R2 when focusing on the
objects, the feature extractor (e.g., the deployed CNN model such as critical region.
OSNet) extracts features (i.e., feature vectors or feature maps) for each Fig. 1(b) shows the impact of varying the number of objects used
object. These features capture the visual characteristics of each object. for feature extraction on accuracy and execution time, with the image
The back-end tracker compares the feature similarities between objects size fixed at 672 × 672. The number of objects ranges from zero, three,
in the current frame and the previous frame, matching objects with and more than three. OS-Net [10] is used for feature extraction. As
high similarity. For any remaining unmatched objects, a location-based shown in Fig. 1(b), as the number of objects with feature extraction
matching method such as IoU is applied. The tracker then stores the increases, both overall and critical accuracy improve, but this also leads
motion information (position and velocity) and the features of each to increased execution time. This highlights a trade-off between R1 and
object in preparation for the next frame. R2 based on the number of objects considered for feature extraction.
We assume a system in which each camera independently tracks Section 3.3 details the MOT execution pipeline of CA-MOT, which
objects moving within its field of view. While it is possible to consider leverages these observations to effectively address C1.
2
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
Fig. 2. System design of CA-MOT: the key features are (a) an aging-aware scheduler that provides timing guarantees and a criticality-aware flexible MOT execution pipeline
including (b) a detection module that accommodates varying input sizes and (c) an association module that handles a varying number of objects for feature extraction.
2.3. Different combination of detection and association • It provides a timing guarantee while providing prioritized track-
ing accuracy for the safety-critical regions by exploiting an MOT-
To address C2, Fig. 1(c) reveals an intriguing observation that differ- specific property.
ent combinations of image sizes and the number of feature extractions
To address the first goal, CA-MOT implements a criticality-aware
yield distinct effects on accuracy and execution time. The experiment
flexible MOT execution pipeline in which detection and association
was conducted over 100 consecutive frames.
are performed with different execution option by leveraging the ob-
In Fig. 1(c), the execution of detection or association is denoted by
servations discussed in Section 2.2. To address the second goal, CA-
𝑃 or 𝐹 . 𝑃 represents partial computation, where detection is performed MOT develops an aging-aware task-level scheduler to provide accuracy
only on the region of interest (RoI) at a size of 256 × 256, including maximization while providing a timing guarantee by exploiting the
a safety-critical region, and association is limited to location-based observations discussed in Section 2.3 building upon the MOT execution
association without feature extraction. 𝐹 represents full computation, pipeline. The MOT execution pipeline and the scheduler are imple-
where detection is performed on the entire image at a size of 672 × 672, mented as separate threads, and they are communicated with shared
and association includes feature extraction for all objects. The number memory.
in the upper right of the notation indicates how many times the combi- CA-MOT does not require modifications to existing DNN models
nation of detection and association has been performed. For example, (e.g., detectors and feature extractors), which allows for reusing most
the notation 𝐹 𝐹 50 𝑃 𝑃 50 indicates that we use 𝐹 for both the detection (if not all) stand-alone detectors and feature extractors. Notably, state-
and association steps in the first 50 frames and 𝑃 for both phases in the of-the-art detectors like YOLOv5 are inherently designed to handle
remaining 50 frames. To mitigate the issue of objects outside the critical varying input image sizes, and all CNN models can perform batch exe-
region not being detected due to cropping, which can decrease the cution on multiple images (each corresponding to a different object). As
accuracy of the non-critical region, we utilize the position information shown in Fig. 2, the key features of CA-MOT are: (a) a scheduling policy
of objects from the previous frame as predicted position information for that selects one input per camera, provides timing guarantees, and
the current frame using a prediction model such as Kalman filter [16] adjusts the workload for detection and association; (b) a module that
during the execution of 𝑃 in the detection step. Except for 𝑃 𝑃 100 and processes detection with inputs of varying sizes; and (c) a module that
𝐹 𝐹 100 , all combinations have the same proportion of 𝐹 and 𝑃 for the extracts features from a pre-determined number of detected objects.
entire frames.
As shown in Fig. 1(c), although 𝐹 𝐹 50 𝑃 𝑃 50 and 𝑃 𝐹 100 have similar 3.2. Workflow
execution times with (𝑃 𝐹 + 𝐹 𝑃 )100 and (𝐹 𝐹 + 𝑃 𝑃 )100 , they show lower
tracking accuracy. This indicates that different combinations of 𝐹 and Fig. 2 presents the workflow of CA-MOT. During system operation,
𝑃 can have a varying impact on accuracy. The observation in Fig. 1(c) the task scheduler maintains a queue to store images periodically
necessitates a new scheduler that is capable of obtaining high tracking received from each camera sensor (⃝). 1 Then, the task scheduler deter-
accuracy by capturing an MOT-specific property referred to as aging, mines the following for tasks in the queue: (a) the task to be scheduled,
which will be detailed in Section 3.4. (b) the execution option for the detector, and (c) the execution option
for the association (⃝).
2 After an image moves to the MOT execution
3. System design of CA-MOT pipeline, the critical region identification module identifies a safety-
critical region from the image and crops (or not) a RoI including the
This section presents the goal and design of CA-MOT to address C1 safety-critical region according to the execution option for the detector
and C2. (⃝).
3 Depending on the execution option, the cropped RoI or entire
image is processed for detection (⃝).4 Furthermore, depending on the
3.1. System overview number of objects for which features are extracted, CA-MOT selectively
extracts features for detected objects (⃝).
5 All detected objects are then
CA-MOT utilizes a tracking-by-detection approach, consisting of matched with the tracked objects from the previous frame. If both the
two steps: detection and association, where the front-end detector em- detected and tracked objects have feature vectors, they are associated
ploys a pre-existing DNN-based detector to detect and classify objects through feature-based matching. Otherwise, they are associated solely
in the input image, and the unmatched objects are matched using a based on their locations (⃝).
6
location-based method like IoU. CA-MOT aims at providing prioritized
tracking accuracy for the safety-critical region with a timing guarantee 3.3. Criticality-aware flexible MOT execution pipeline
for every MOT execution on limited computing resources by addressing
C1 and C2 discussed in Section 1, which has the following design goals. The MOT execution pipeline conducts detection and association
sequentially. CA-MOT can employ any existing stand-alone DNN-based
• It provides different execution options not only for detection but detectors as long as it can accommodate different sizes of input images
also association considering different criticality of regions in input (e.g., YOLO series) and offer a clear trade-off between accuracy and
images. execution time. For each input image with a size of 672 × 672, the
3
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
detector performs the detection to identify the location and class of association for each task at every scheduling decision. The scheduler
multiple objects in the image. Once the scheduler determines the task manages MOT tasks using a single queue and is triggered when a task
(associated with an input image) to be scheduled and the execution completes its execution or a new task is released. As three execution
option for the tasks, the detection is performed for the task according options (e.g., low, middle, and high workloads) are provided for each
to the execution option. CA-MOT provides three execution options detection and association under CA-MOT, the scheduler decides the
(i.e., low, middle, and high workloads, respectively) providing a trade- image size (e.g., 256 × 256, 416 × 416, and 672 × 672) for detection
off between execution time and accuracy. For low and middle workload and feature size (e.g., zero, three, and more than three) for association
detections, CA-MOT first identifies the RoI with sizes of 256 × 256 and according to the scheduling algorithms (to be presented in Section 4).
416 × 416, respectively, and then detection is performed on cropped As discussed in Section 2.3, various combinations of image sizes
RoI, which includes the safety-critical region. The area outside the and the quantity of feature extractions result in different impacts on
RoI is not subject to detection, and the motion information (e.g., size, tracking accuracy. This is due to an important property of the MOT
position, velocity, direction) of objects detected in the previous frame system, which involves supplementing non-updated motion or feature
is used in the prediction models such as the Kalman filter to obtain information during detection and association in the current frame by
the estimated information of objects in the current frame. On the other utilizing information from the previous frame. For example, in scenar-
hand, high-workload detection is performed on the original image with ios with low and middle workload detection, the detection process does
a size of 672 × 672. not cover the area outside the RoI. Instead, the motion information
We define the area that encompasses all safety-critical objects, of objects detected in the previous frame, such as their size, position,
which are objects with a time-to-collision of less than two seconds, as velocity, and direction, is leveraged to estimate the corresponding
the safety-critical area. If the safety-critical area exceeds the input size information for objects in the current frame. Moreover, during the as-
for the detector, as determined by the detection process (e.g., 256 × 256 sociation step, if the feature extracted from the immediately preceding
or 416 × 416), the safety-critical area is cropped and resized to the frame is unavailable due to low- and middle-workload associations,
corresponding dimensions before being fed into the detector model. the feature-based matching algorithm compares the features extracted
The locations of safety-critical objects are determined based on their from objects in the current frame with the features extracted from the
most recently computed positions, without projecting future safety- nearest past frames. Therefore, the tracking accuracy is determined
critical regions from them. There are numerous existing approaches by the reliability of the reused motion and feature information of the
that calculate time-to-collision based on the relative positions of objects objects. To capture the reliability of the motion and feature information
and the ego vehicle given LiDAR and IMU data, and we assume the use of objects, we propose a new notion of aging that specifies the number
of one such method. It is also important to note that the KITTI dataset of middle- or high-workload executions of detection and association
provides both LiDAR and IMU data. For example, areas where objects conducted from the beginning of the MOT task, respectively. In order
with a time-to-collision of less than 2 s congregate can be defined as to update the motion and feature information as frequently as possi-
safety-critical regions, and if multiple such areas exist, the encompassing ble using limited computing resources, it is necessary to balance the
area that includes all of them would be considered the safety-critical aging of detection and association for each task. Note that increasing
region. Please note that we adhere to the definition of the safety-critical the aging of detection and association for all tasks simultaneously in
region as defined in the existing paper DNN-SAM in [5]. It is assumed every MOT execution is generally not feasible due to limited com-
that the critical region is pre-calculated by external sensors such as puting resources. Therefore, a mechanism is required to balance the
LiDAR and IMU and provided to CA-MOT. If an input image does not aging of detection and association for all tasks while providing timing
have a critical region, the entire frame is considered a critical region. guarantees under constrained resources. To this end, we propose new
As seen in Fig. 2, GPU is used only for the inference of DNN models, scheduling algorithms that will be detailed in the next section.
such as the detector (e.g., YOLOv5) and feature extractor (e.g., OSNet),
while all other execution is performed on the CPU. 4. Scheduling algorithm
For the association, the MOT system uses the two-step approach [7].
Initially, a CNN-based model (e.g., OS-Net [10]) is employed by the This section presents a task model and proposes new scheduling
tracker to extract features from the detected objects. The tracker then algorithms building upon CA-MOT.
compares these features between the current and previous frames to
identify object pairs with the highest feature similarity. For the re- 4.1. Task model
maining objects that are not matched based on feature comparison, a
location-based matching method such as IoU (intersection over union) Targeting MOT systems in AVs that involve 𝑛 camera sensors, we
is used. CA-MOT also provides three execution options (i.e., low, consider a set 𝜏 consisting of 𝑛 MOT tasks denoted as 𝜏𝑖 ∈ 𝜏. Each
middle, and high workloads, respectively) for the association. When MOT task 𝜏𝑖 is responsible for conducting MOT execution using input
it comes to the middle and high workload associations, the tracker images provided periodically by each camera sensor. As we employ
extracts features from some (e.g., three) of the detected objects or the methodology of tracking-by-detection, an MOT task consists of
all of the detected objects, and then performs consecutive feature- detection and association sub-tasks. Thus, the specification of each
based and location-based matchings. On the other hand, low-workload MOT task 𝜏𝑖 is given as 𝜏𝑖 = (𝑇𝑖 , 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ), 𝐷𝑖 ), where 𝑇𝑖 represents the
association performs location-based matching only. Depending on the period (or the minimum inter-arrival time), 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) denotes the worst-
execution option, CA-MOT may extract features from only a subset or case execution time (WCET) based on the execution options (i.e., low,
all of the detected objects, which means that the feature information middle, and high-workload execution) for detection and association
of objects may not be updated every time. Therefore, during feature- sub-tasks, respectively, and 𝐷𝑖 indicates the relative deadline. The
based matching, the algorithm compares the features extracted from execution time of the detection sub-task depends on the image size
the objects in the current frame with the closest previously extracted 𝑠𝑖𝑆𝑖 = {𝐿, 𝑀 , 𝐻}, where 𝐿, 𝑀 , 𝐻 are 256 × 256, 416 × 416, and
features of the tracked objects and matches the two objects with the 672 × 672, respectively. Note that CA-MOT supports arbitrary non-
highest feature similarity. decreasing sizes for 𝑆𝑖 = {𝐿, 𝑀 , 𝐻}. On the other hand, the execution
time of the association sub-task depends on the feature size 𝑓𝑖
3.4. Aging-aware task scheduler 𝐹𝑖 = {𝐿, 𝑀 , 𝐻}, where 𝐿, 𝑀 , 𝐻 are zero, from one to three, and more
than three, respectively. Note that the tracking-by-detection methodol-
The CA-MOT implements a thread-level task scheduler to determine ogy performs the association phase sequentially through feature-based
the task to be scheduled and execution options for detection and matching followed by location-based matching using IoU. If 𝑓𝑖 is equal
4
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
Table 1 On the other hand, the worst-case execution time 𝐶𝑖𝐴 (𝑓𝑖 ) of the associa-
Notations used in the scheduling algorithms.
tion task depends on the feature size 𝑓𝑖 . It is calculated by considering
Symbol Description the time required for extracting features from detected objects and
𝜏𝑖 Task 𝑖 in the system performing matching methods such as feature-based and IoU-based
𝑇𝑖 Period of task 𝜏𝑖 (minimum inter-arrival time)
matching. An MOT task 𝜏𝑖 is considered schedulable if every job 𝐽𝑖
𝐷𝑖 Relative deadline of task 𝜏𝑖
(invoked by 𝜏𝑖 ) completes its execution within the relative deadline 𝐷𝑖 .
𝐶𝑖 (𝑋 , 𝑌 ) Worst-case execution time (WCET) of task 𝑖.
The overall schedulability of the system is determined by ensuring that
𝑋: image size for detection (𝐿, 𝑀 , 𝐻)
𝑌 : feature size for association (𝐿, 𝑀 , 𝐻) every task 𝜏𝑖 ∈ 𝜏 is schedulable.
𝑠𝑖 Image size for the detection sub-task of task 𝑖
𝑓𝑖 Feature size for the association sub-task of task 𝑖 4.2. EDF best-effort
𝑆𝑖 Set of image size options for task 𝑖 (𝑆𝑖 = {𝐿, 𝑀 , 𝐻})
𝐹𝑖 Set of feature size options for task 𝑖 (𝐹𝑖 = {𝐿, 𝑀 , 𝐻}) Building upon the system design of CA-MOT presented in Section 3,
𝐿 Low workload execution
we develop two scheduling algorithms that aim to provide not only
𝑀 Middle workload execution
𝐻 High workload execution high tracking accuracy for the safety-critical regions but also a tim-
𝐶𝑖𝐷 (𝑠𝑖 ) WCET of the detection sub-task of task 𝑖, based on image
ing guarantee for every MOT execution. To this end, the proposed
size 𝑠𝑖 scheduling algorithms have the following two features: (F1) an offline
𝐶𝑖𝐴 (𝑓𝑖 ) WCET of the association sub-task of task 𝑖, based on timing guarantee for the minimum execution (i.e., low-workload exe-
feature size 𝑓𝑖 cution for both detection and association) of every MOT execution and
𝑅𝐶𝑖 (𝐿, 𝐿) Remaining execution time for the minimum execution of (F2) an online policy to maximize tracking accuracy by systematically
task 𝑖 increasing workload (i.e., middle- or high-workload execution) of an
𝑎𝑔 𝑒𝐷 Aging value of the detection sub-task of task 𝑖 MOT execution using notions of slack and aging without compromising
𝑖
𝑎𝑔 𝑒𝐴
𝑖
Aging value of the association sub-task of task 𝑖 timing guarantee.
𝑠𝑙𝑎𝑐 𝑘𝑖𝑡 Slack time available for task 𝑖 at the current time 𝑡𝑐 𝑢𝑟 The proposed scheduling algorithms are based on the non-preem-
𝑐 𝑢𝑟
𝑞𝑖 Minimum execution time of task 𝑖 in the interval ptive earliest deadline first (EDF) scheduling algorithm, which assigns
[𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )] higher priority to jobs with earlier deadlines without allowing any
𝑝 Sum of the minimum execution times for all tasks preemption. To provide the first feature F1, CA-MOT employs the
𝑑1 (𝑡𝑐 𝑢𝑟 ) Earliest deadline or future release time at time instant 𝑡𝑐 𝑢𝑟 existing schedulability analysis developed for non-preemptive EDF as
𝑠𝑙𝑎𝑐 𝑘𝐷 Remaining slack after executing high-workload detection follows.
𝑖
for task 𝑖
𝑠𝑙𝑎𝑐 𝑘𝐴 Remaining slack after executing high-workload Lemma 1. For a set 𝜏 of MOT tasks scheduled by non-preemptive EDF,
𝑖
association for task 𝑖 minimum execution 𝐶𝑖 (𝐿, 𝐿) of every task 𝜏𝑖 ∈ 𝜏 can be executed without
deadline miss as long as the following holds for every task 𝜏𝑖 ∈ 𝜏.
max𝜏𝑖 𝐶𝑖 (𝐿, 𝐿) ∑ 𝐶𝑖 (𝐿, 𝐿)
+ ≤ 1.0 (2)
to 𝐿, this indicates that no feature extraction has been performed for min𝜏𝑖 𝑇𝑖 𝜏 ∈𝜏
𝑇𝑖
𝑖
the frame, and thus, feature-based matching is skipped, proceeding
directly to location-based matching. In the case of 𝐻, we employ the Proof. The lemma presents a schedulability condition for non-preem-
maximum number of objects as defined by the environment (for the ptive EDF, and its proof is outlined as follows. Let us target 𝜏𝑘 ∈ 𝜏;
dataset considered, this is based on values measured across all videos), also, consider a virtual task 𝜏𝑥 ∉ 𝜏, whose 𝑇𝑥 and 𝐶𝑥 (𝐿, 𝐿) are set
for example, 10. Then, the worst-case execution time 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) of each to min𝜏𝑖 ∈𝜏 𝑇𝑖 and max𝜏𝑖 ∈𝜏 𝐶𝑖 (𝐿, 𝐿), respectively. Now, we compare the
MOT task 𝜏𝑖 is derived as follows. finishing time of a job of 𝜏𝑘 when (Case 1) 𝜏 is scheduled by non-
preemptive EDF, and (Case 2) 𝜏 {𝜏𝑥 } is scheduled by preemptive EDF.
𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) = 𝐶𝑖𝐷 (𝑠𝑖 ) + 𝐶𝑖𝐴 (𝑓𝑖 ), (1)
Since at most one lower-priority job can block a high-priority job under
where 𝐶𝑖𝐷 (𝑠𝑖 ) and 𝐶𝑖𝐴 (𝑓𝑖 ) are the worst-case execution times of detection non-preemptive scheduling, 𝜏𝑘 can be blocked by at most one lower-
and association sub-tasks according to 𝑠𝑖 and 𝑓𝑖 , respectively. As shown priority job under Case 1; obviously, the WCET of the lower-priority job
in Fig. 2, both the detection and the association sub-tasks involve GPU is upper-bounded by max𝜏𝑖 ∈𝜏 𝐶𝑖 (𝐿, 𝐿). Also, to block all the following
operations, with their respective WCETs including the communication jobs of 𝜏𝑘 , the blocking frequency should be no smaller than 𝑇𝑘 , which
costs between the CPU and GPU. Note that the detection sub-task, is lower-bounded by min𝜏𝑖 ∈𝜏 𝑇𝑖 . Therefore, the finishing time of a job of
denoted as 𝜏𝑖𝐷 , and the association sub-task, denoted as 𝜏𝑖𝐴 , are executed 𝜏𝑖 under Case 1 is no later than that under Case 2. Once we apply the
consecutively without any preemption while sharing the same period well-known schedulability condition for preemptive EDF to Case 2, the
and relative deadline. Similarly, when an active task is running, it condition is the same as Eq. (2), which proves the lemma. □
executes without any interruptions, while other tasks wait in the queue.
Note that the proof is self-contained, but a different proof for
In addition, each task runs on an environment where non-preemption
between the GPU and CPU is guaranteed. To ensure this, while the Lemma 1 can be found in [5,17].
CPU is running, the GPU waits for input from the CPU. Once the GPU To provide the second feature F2, the proposed scheduling al-
receives the input and is activated, the CPU waits until it receives the gorithms (i) dynamically increase the workload of each MOT task
results from the GPU, as illustrated in Fig. 2. As seen Fig. 2, GPU (e.g., from low workload to middle or high workload) without compro-
is used only for the inference of DNN models, such as the detector mising the timing guarantee while (ii) balance the aging of detection
(e.g., YOLOv5) and feature extractor (e.g., OSNet), while all other and association of every task. We propose two scheduling algorithms
execution is performed on the CPU. Also, CA-MOT does not allow that simultaneously provide (i) and (ii) in different ways: EDF-BE
parallel execution for multiple MOT executions (see Table 1). (EDF Best-Effort) and EDF-Slack (EDF with Slack reclamation), adapted
The worst-case execution time 𝐶𝑖𝐷 (𝑠𝑖 ) of the detection sub-task is from [5]. EDF-BE and EDF-Slack utilize slacks defined differently, but
determined by the sum of various components, including preprocessing use the same mechanism (in Algorithm 2) to decide on the execution
time (such as cropping and resizing the input image), image transfer option that employs a notion of aging.
time from CPU memory to GPU memory, model inference time to Let 𝑑1 (𝑡𝑐 𝑢𝑟 ) be the earliest deadline or future release time among
𝑡
obtain candidate objects, and postprocessing time (e.g., applying non- all tasks at a time instant 𝑡𝑐 𝑢𝑟 . The slack 𝑠𝑙𝑎𝑐 𝑘𝑖𝑐 𝑢𝑟 of task 𝜏𝑖 at 𝑡𝑐 𝑢𝑟
maximum suppression) to extract the final objects from the candidates. under the EDF-BE is defined as the expected remaining time up to
5
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
Algorithm 1 Slack calculation for 𝜏𝑘 at 𝑡𝑐 𝑢𝑟 under EDF-Slack
Input: 𝜏, 𝑡𝑐 𝑢𝑟
𝑡
Output: 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟
1: 𝑝 = 0, 𝑈 = the left-hand-side of Equation (2)
2: for 𝑖 = 𝑛 to 1, 𝜏𝑖 ∈ {𝜏1 , ..., 𝜏𝑛 |𝑑1 (𝑡𝑐 𝑢𝑟 ) ≤ ⋯ ≤ 𝑑𝑛 (𝑡𝑐 𝑢𝑟 )} do
𝐶𝑖 (𝐿, 𝐿)
3: 𝑈 =𝑈
𝑇𝑖
4: 𝑞𝑖 = max(0, 𝑅𝐶𝑖 (𝐿, 𝐿) (1 𝑈 ) ⋅ (𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) 𝑑1 (𝑡𝑐 𝑢𝑟 )))
( 𝑅𝐶𝑖 (𝐿, 𝐿) 𝑞𝑖 )
5: 𝑈 = min 1.0, 𝑈 +
𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) 𝑑1 (𝑡𝑐 𝑢𝑟 )
6: 𝑝 = 𝑝 + 𝑞𝑖
7: end for
𝑡
8: return 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 = 𝑑1 (𝑡𝑐 𝑢𝑟 ) 𝑡𝑐 𝑢𝑟 𝑝
that does not exceed the earliest deadline or future release, ensuring
Fig. 3. Execution timeline of multiple MOT tasks under (a) baseline (non-preemptive execution without deadline misses. The following lemma present the
EDF), (b) EDF-BE, and (c) EDF-Slack scheduling policies.
timing guarantee of EDF-BE.
Theorem 1. A task set 𝜏 that satisfies the condition in Eq. (2) is
𝑑1 (𝑡𝑐 𝑢𝑟 ) after the execution of 𝐶𝑖 (𝐿, 𝐿) is completed, which is calculated schedulable by EDF-BE .
by 𝑑1 (𝑡𝑐 𝑢𝑟 ) 𝑡𝑐 𝑢𝑟 𝐶𝑖 (𝐿, 𝐿). This slack value is only valid when there
are no more than two tasks in the waiting queue at time 𝑡𝑐 𝑢𝑟 and no Proof. According to Lemma 1, for a task set 𝜏 that satisfies Eq. (2),
future releases within the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )). Using the slack value the minimum execution time 𝐶𝑖 (𝐿, 𝐿) of all tasks 𝜏𝑖 ∈ 𝜏 guarantees
conditionally provided at a scheduling decision, EDF-BE can perform execution without deadline misses. At each scheduling decision at 𝑡
middle- or high-workload execution for detection and/or association. under the online policy of EDF-BE, the execution of a job exploiting
any slack value does not impose additional inference on any other job.
Example. Figs. 3(a) and (b) present a scheduling scenario of the This guarantees that all tasks 𝜏𝑖 receive no more interference than what
baseline algorithm (i.e., non-preemptive EDF) and EDF-BE with an they would receive under non-preemptive EDF scheduling. Thus, this
example task set. We consider an example task set 𝜏 = {𝜏1 , 𝜏2 } of theorem holds. □
which 𝐶𝑖 = 𝐶𝑖𝐷 (𝐻) + 𝐶𝑖𝐴 (𝐻) = 25, 𝑇𝑖 = 25, 𝐶𝑖𝐷 (𝑠𝑖 ) = {5, 9, 12}, and
𝐶𝑖𝐴 (𝑓𝑖 ) = {3, 8, 13} hold for 𝜏𝑖 ∈ 𝜏. As shown in Figs. 3(a) and (b), each 4.3. EDF with slack reclamation
first job of 𝜏1 and 𝜏2 are released at 𝑡 = 0 and 𝑡 = 13, respectively. In
the baseline algorithm, the first job of 𝜏1 executes for 25 time units, In the case of EDF-BE, more workload than the minimum execution
and then the first job of 𝜏2 starts its execution at 𝑡 = 25 resulting in can only be processed when there is a single job in the waiting queue at
a deadline miss at 𝑡 = 38. Let 𝑎𝑔 𝑒𝐷 𝐴
𝑖 and 𝑎𝑔 𝑒𝑖 be the aging value of a given time 𝑡𝑐 𝑢𝑟 and no additional releases occur until 𝑑1 (𝑡𝑐 𝑢𝑟 ). This cre-
detection and association of 𝜏𝑖 . The aging value is an integer satisfying ates a limited opportunity for MOT tasks in CA-MOT to perform more
𝑎𝑔 𝑒𝐷 𝐴 𝐷 𝐴
𝑖 , 𝑎𝑔 𝑒𝑖 ≥ 0, and 𝑎𝑔 𝑒𝑖 and 𝑎𝑔 𝑒𝑖 for all task 𝜏𝑖 ∈ 𝜏 are set to zero workload than the minimum execution, thus restricting the potential
at the beginning of the system. The, 𝑎𝑔 𝑒𝐷 𝐴
𝑖 (and 𝑎𝑔 𝑒𝑖 ) increases by one to improve tracking accuracy. To address this limitation, we integrate
at each time when a detection (and association) is run with middle- or the approach presented in [5] into EDF-Slack, allowing it to compute
high-workload. In other words, the aging value refers to the number of slack in a different way than EDF-BE.
executions excluding those with low workloads. By adjusting the aging Let 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) denote the 𝑖th earliest deadline or release time at 𝑡𝑐 𝑢𝑟 , and
value, a balance is maintained so that neither detection nor association 𝑅𝐶𝑖 (𝐿, 𝐿) represent the remaining execution time required to complete
becomes disproportionately large. the minimum execution 𝐶𝑖 (𝐿, 𝐿). Algorithm 1 outlines the calculation
𝑡
of the slack value 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 for task 𝜏𝑘 at 𝑡𝑐 𝑢𝑟 within the EDF-Slack
Compared to EDF-Slack, EDF-BE is a simpler algorithm that utilizes
algorithm, triggered at each scheduling decision. Since EDF is a job-
as many resources as possible, executing a job for greater than 𝐶𝑖 (𝐿, 𝐿)
level fixed-priority scheduling policy, wherein the priority of a job
up to its closest future release only when there is exactly one job in the
remains constant throughout its execution, scheduling decisions under
waiting queue. EDF-BE naturally guarantees no deadline misses in any
EDF occur either at the commencement of a jobs execution or upon its
job execution. This is because, as stated in Lemma 1, the execution of
completion. In the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )], EDF-Slack processes tasks in
𝐶𝑖 (𝐿, 𝐿) without deadline misses for all jobs is guaranteed under EDF.
reverse EDF order, starting from the task with the latest deadline. Job
Furthermore, when a job executes for more than 𝐶𝑖 (𝐿, 𝐿) under EDF-
𝐽𝑘 of 𝜏𝑘 has the highest priority at 𝑡𝑐 𝑢𝑟 , with 𝑑1 (𝑡𝑐 𝑢𝑟 ) being its deadline,
BE, there is only one active job in the waiting queue at that time. In
as EDF-Slack follows the EDF policy. The goal of the slack calculation
the case of EDF-BE, when the first job of task 𝜏1 starts its execution
in Algorithm 1 is to delay the execution of all other tasks 𝜏𝑖 ∈ 𝜏 𝜏𝑘
at 𝑡 = 0, it executes the minimum execution 𝐶1 (𝐿, 𝐿) = 8 until the
beyond 𝑑1 (𝑡𝑐 𝑢𝑟 ) while ensuring that future deadlines are met. This is
earliest deadline or future release at 𝑡 = 13, resulting in a slack of five. repeated for all tasks in the waiting queue. To ensure 𝜏𝑖 completes
Utilizing this slack, the task 𝜏1 then executes 𝐶1 (𝑀 , 𝐿), and the aging 𝐶𝑖 (𝐿, 𝐿) before 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ), EDF-Slack calculates the maximum execution
factor 𝑎𝑔 𝑒𝐷
𝑖 increases by one. For the first job of task 𝜏2 , released at time in the interval 𝑑1 (𝑡𝑐 𝑢𝑟 ), 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ), which is (1 𝑈 ) ⋅ (𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) 𝑑1 (𝑡𝑐 𝑢𝑟 )),
𝑡 = 13, there is a slack of 4 until the earliest deadline or future release where 𝑈 denote the left-hand-side of Eq. (2).
at 𝑡 = 25. Thus, 𝐶2 (𝑀 , 𝐿) is executed, and 𝑎𝑔 𝑒𝐷 𝑖 increases by one. The The key steps in the slack calculation are as follows:
second job of task 𝜏1 , released at 𝑡 = 13, has a slack of 5 until the earliest
deadline or future release at 𝑡 = 38. To balance 𝑎𝑔 𝑒𝐷 𝐴
𝑖 and 𝑎𝑔 𝑒𝑖 , 𝐶1 (𝐿, 𝑀) • 𝑞𝑖 is computed as the minimum execution of 𝜏𝑖 in the interval
𝐴
is executed, and 𝑎𝑔 𝑒𝑖 increases by one. The details of the online policy 𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 ) (Lines 34).
that effectively balances the aging of detection and association for each • 𝑅𝐶𝑖 (𝐿, 𝐿) is either zero or 𝐶𝑖 (𝐿, 𝐿), since scheduling decisions
task will be provided at the end of this section. As can be observed are only made upon job completion or release in non-preemptive
from the figure, at each scheduling decision, an execution is performed scheduling (Line 4).
6
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
• The execution rate of 𝜏𝑖 in the interval 𝑑1 (𝑡𝑐 𝑢𝑟 ), 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) is calculated Algorithm 2 Determination of execution options
and recorded (Line 5). 𝑡
Input: 𝜏, 𝑡𝑐 𝑢𝑟 , 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟
𝑝 is set as the sum of the minimum execution times of all tasks
Output: (𝑠𝑘 , 𝑓𝑘 )
𝜏𝑖 ∈ 𝜏 (Line 6). 𝑡
1: if 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 ≤ 0 then
• The slack is then determined as the remaining time slots, exclud-
2: return (𝐿, 𝐿)
ing 𝑝 (i.e., the sum of 𝑞𝑖 ), within the interval 𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 ) (Line
3: else
7).
4: if 𝑎𝑔 𝑒𝐷
𝑘
𝑎𝑔 𝑒𝐴
𝑘
then
𝑡
Example. Fig. 3(c) illustrates a scheduling scenario of EDF-Slack using 5: 𝑠𝑙𝑎𝑐 𝑘𝐷
𝑘
= 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 (𝐶𝑘𝐷 (𝐻) 𝐶𝑘𝐷 (𝐿))
the same example tasks as shown in Figs. 3(a) and (b). The initial jobs 6: if 𝑠𝑙𝑎𝑐 𝑘𝐷
𝑘
≥ 0 then
of 𝜏1 and 𝜏2 are released at 𝑡 = 0 and 𝑡 = 13, respectively. Applying 7: return (𝐻 , 𝑓𝑘 (𝑠𝑙𝑎𝑐 𝑘𝐷 𝑘
+ 𝐶𝑘𝐴 (𝐿)))
Algorithm 1, the calculated slack value for 𝜏1 at 𝑡𝑐 𝑢𝑟 = 0 is 17, allowing 8: else
𝑡
the first job of 𝜏1 to execute for 𝐶1 (𝐻 , 𝐻) until 𝑡 = 25. Furthermore, 9: return (𝑠𝑘 (𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 + 𝐶𝑘𝐷 (𝐿)), 𝐿)
𝑎𝑔 𝑒𝐷
1
and 𝑎𝑔 𝑒𝐴1
increment by one. Subsequently, the first job of 𝜏2 begins 10: end if
its execution at 𝑡 = 25, executing for 𝐶2 (𝑀 , 𝐿) while increasing 𝑎𝑔 𝑒𝐷 2
11: else
𝑡
by one. Finally, the second job of 𝜏2 starts its execution at 𝑡 = 37. 12: 𝑠𝑙𝑎𝑐 𝑘𝐴
𝑘
= 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 (𝐶𝑘𝐴 (𝐻) 𝐶𝑘𝐴 (𝐿))
𝐴
if 𝑠𝑙𝑎𝑐 𝑘𝑘 ≥ 0 then
Comparing Fig. 3(b) that represents EDF-BE with Fig. 3(c) depicting 13:
EDF-Slack, we observe that the aging of 𝜏1 and 𝜏2 increases in the same 14: return (𝑠𝑘 (𝑠𝑙𝑎𝑐 𝑘𝐴 𝑘
+ 𝐶𝑘𝐷 (𝐿)), 𝐻)
amount in both cases. However, the key difference lies in the execution 15: else
𝑡
of the first job of 𝜏1 . Under EDF-Slack, this job is able to execute with a 16: return (𝐿, 𝑓𝑘 (𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 + 𝐶𝑘𝐴 (𝐿)))
high-workload execution, while under EDF-BE, it can only execute with 17: end if
a middle-workload execution, which allows for higher expectations of 18: end if
tracking accuracy in EDF-Slack. 19: end if
The following proves the timing guarantee of EDF-Slack.
Theorem 2. A task set 𝜏 that satisfies Eq. (2) is schedulable under
EDF-Slack . • If the slack is less than or equal to zero, the algorithm returns 𝐿
and 𝐿 (Lines 12).
Proof. We prove this by contradiction. Assume, for the sake of con- • Otherwise, the algorithm compares the ages of the detection step
tradiction, that the task set 𝜏 satisfies Eq. (2), but is not schedulable (𝑎𝑔 𝑒𝐷
𝑘
) and the association step (𝑎𝑔 𝑒𝐴
𝑘
) (Lines 34).
under EDF-Slack. This implies that at some time 𝑡, the total utilization
exceeds 1.0, and hence a deadline miss occurs for some job 𝐽𝑖 in 𝜏. If 𝑎𝑔 𝑒𝐷
𝑘
is smaller than 𝑎𝑔 𝑒𝐴 𝑘
, indicating the detection step
Let 𝑡𝑚𝑖𝑠𝑠 denote the earliest such time at which a deadline miss occurs, requires more resources, the algorithm calculates 𝑠𝑙𝑎𝑐 𝑘𝐷 𝑘
,
i.e., 𝑡𝑚𝑖𝑠𝑠 = 𝑑𝑖 , where 𝑑𝑖 is the deadline of 𝐽𝑖 . By the definition of EDF- representing the remaining slack after executing
Slack, at each time 𝑡, the slack time for each task is computed based high-workload detection (Line 5).
on the highest-priority job 𝐽1 (𝑡𝑐 𝑢𝑟 ), where 𝑡𝑐 𝑢𝑟 denotes the current time. If 𝑠𝑙𝑎𝑐 𝑘𝐷
𝑘
is greater than or equal to zero, the high-workload
Since no tasks are released in the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )], the slack time detection is followed by middle- or high-workload associa-
ensures that lower-priority tasks cannot block the execution of 𝐽1 . As a tion depending on 𝑠𝑙𝑎𝑐 𝑘𝐷 𝑘
(Lines 67). In this case, 𝑓𝑘 (𝑥) is
result, the blocking term in Eq. (2) remains valid during this interval. set as follows:
Now, since EDF-Slack is based on EDF scheduling, the total utiliza- 𝐿 for 𝑥 < 𝐶𝑘𝐴 (𝑀),
tion 𝑈 (𝑡) at any time 𝑡 can be expressed as:
𝑀 for 𝐶𝑘𝐴 (𝑀) ≤ 𝑥 < 𝐶𝑘𝐴 (𝐻),
𝐶𝑖
𝑈 (𝑡) = + 𝐵(𝑡), 𝐻 for 𝑥𝐶𝑘𝐴 (𝐻).
𝑑
𝐽 ∈𝜏(𝑡) 𝑖
𝑡𝑐 𝑢𝑟
𝑖
where 𝐶𝑖 is the remaining execution time of task 𝐽𝑖 , and 𝐵(𝑡) is the If 𝑠𝑙𝑎𝑐 𝑘𝐷
𝑘
is less than zero, the algorithm determines if
blocking term. According to Eq. (2), 𝑈 (𝑡) ≤ 1.0 for all 𝑡. Since 𝑡𝑚𝑖𝑠𝑠 is middle- or high-workload detection can be performed based
𝑡
the earliest time a deadline miss occurs, we must have 𝑈 (𝑡𝑚𝑖𝑠𝑠 ) > 1.0. on 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 , followed by low-workload association (Lines
However, by Eq. (2), we know that 𝑈 (𝑡) ≤ 1.0 for all 𝑡𝑡𝑐 𝑢𝑟 , including 810). In this case, 𝑠𝑘 (𝑥) is set as follows:
𝑡𝑚𝑖𝑠𝑠 . This leads to a contradiction, as the assumption that 𝑈 (𝑡𝑚𝑖𝑠𝑠 ) > 𝐿 for 𝑥 < 𝐶𝑘𝐷 (𝑀),
1.0 contradicts the fact that 𝑈 (𝑡) ≤ 1.0 holds at all times. Therefore,
𝑀 for 𝐶𝑘𝐷 (𝑀) ≤ 𝑥 < 𝐶𝑘𝐷 (𝐻),
no deadline miss can occur, and the task set 𝜏 is schedulable under
EDF-Slack. □ 𝐻 for 𝑥𝐶𝑘𝐷 (𝐻).
Note that the proof is self-contained, but a different proof can be • Lines 1118 follow a similar procedure for determining the ex-
found in [5]. ecution options, giving preference to the association step. Here,
Determination of execution options. EDF-BE and EDF-Slack use 𝑠𝑙𝑎𝑐 𝑘𝐷 represents the remaining slack after executing the high-
𝑘
different slack concepts to ensure timely execution of tasks while im- workload association.
proving tracking accuracy by executing beyond the minimum
According to the definition of aging, 𝑎𝑔 𝑒𝐷 𝑘
(and 𝑎𝑔 𝑒𝐴
𝑘
) increase
(i.e., 𝐶𝑖 (𝐿, 𝐿)). As shown in Figs. 3(b) and (c), both EDF-BE and
by one when middle- or high-workload detection (and association) is
EDF-Slack enhance the aging of detection and association through
performed.
predefined mechanisms. The goal of these mechanisms is to balance the
DNN-SAM proposed in [10] introduces two scheduling algorithms:
aging of detection and association, minimizing continuous omissions in
EDF-MandFirst and EDF-Slack. Unlike CA-MOT, both DNN-SAM al-
updating motion and feature information, thereby maximizing tracking
gorithms target multi-object detection (MOD) tasks. The primary dis-
accuracy. tinction between MOT and MOD lies in the presence or absence of
Algorithm 2 outlines the process for determining the execution dependencies between consecutive frames. In MOD, the detection oper-
options for the detection and association steps of task 𝜏𝑘 at time 𝑡𝑐 𝑢𝑟 ation for a given frame does not utilize any information from previous
𝑡
based on the slack 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 calculated in Algorithm 1. frames. Therefore, techniques that rely on previous frame information,
7
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
such as aging-aware methods, cannot be employed in the DNN-SAM al- Table 2
gorithms. Another key difference is that DNN-SAM is responsible solely Execution time measurement (average and maximum) in terms of image size, feature
size, and scheduling overhead.
for detection execution and does not handle the association task. Both
Time (ms) 𝐶𝑖𝐷 𝐶𝑖𝐴 𝐶𝑖𝑠𝑐 𝑒
DNN-SAM and CA-MOT algorithms are based on EDF and prioritize
executing jobs with the earliest deadlines among the released tasks. L M H L M H
However, in contrast to CA-MOT, DNN-SAM splits each job at release Average 28.0 30.6 36.7 8.3 63.4 74.4 0.3
Maximum 43.6 53.5 67.6 11.3 74.0 125.2 0.6
into a mandatory job, responsible for execution in the safety-critical
area, and an optional job, responsible for execution in non-critical
areas. When any mandatory job is present in the waiting queue, it
is always executed first using the EDF algorithm. The distinction be-
tween MandFirst and EDF-Slack arises from whether the execution
of an optional job may interfere with the execution of a mandatory
job. Specifically, the scheduling behavior of DNN-SAM and EDF-Slack
operates as follows:
• EDF-MandFirst in [10]: Any mandatory job in the waiting queue
has a higher priority than optional jobs and is scheduled using
EDF. If no mandatory jobs are in the queue, optional jobs are
executed using EDF, ensuring that they do not interfere with the Fig. 4. Comparison for two tasks with the periods (equal to the relative deadlines) of
execution of future release jobs of mandatory tasks. 180 ms and 270 ms.
• EDF-Slack in [10]: Any mandatory job in the waiting queue has
a higher priority than optional jobs and is scheduled using EDF.
If no mandatory jobs are in the queue, optional jobs are executed the ground truth of objects in all frames with the tracking results
using EDF, potentially interfering with the execution of future- obtained from the given techniques to measure accuracy. The
release mandatory jobs, within the slack calculated from the jobs KITTI dataset consists solely of data captured from forward-facing
runtime. cameras and does not utilize different cameras, meaning there
• EDF-BE of CA-MOT: A job is not split and has three execu- is no overlap in the areas they cover. Additionally, as it does
tion options for both detection and association. It is executed not assume simultaneous capture by each camera, there are no
with the maximum workload option to avoid interfering with synchronization issues. CA-MOT aims to maximize the average
the execution of future release jobs, but only when exactly one accuracy for the MOT tasks corresponding to all given cameras
job is present in the waiting queue. The aging of detection and without missing any deadlines. This assumes that CA-MOT oper-
association tasks is considered for accuracy maximization. ates independently of camera interdependencies, with all cameras
• EDF-Slack of CA-MOT: A job is not split and has three execution receiving the same forward-facing camera feed.
options for both detection and association. Regardless of the • Execution time measurement: To obtain the WCET of different
number of jobs in the waiting queue, the job is executed with execution options for detection and association, we measured the
the maximum workload option, potentially interfering with the execution time by iterating 1000 times for each sub-tasks with
execution of future release jobs based on the slack calculated three different execution options of an MOT task and then took
from its runtime. The aging of detection and association tasks is the largest value. We also measured the worst-case time required
considered for accuracy maximization. for slack calculation and scheduling decisions such as Algorithms
1 and 2. Table 2 shows the measurement results.
5. Evaluation
5.2. Experiment result
This section evaluates the effectiveness of CA-MOT in achieving R1
and R2 for multiple MOT tasks.
We consider task sets in which schedulability is not guaranteed with
the high-workload execution for detection and association
5.1. Experiment setting
(i.e., 𝐶𝑖 (𝐻 , 𝐻)) for all tasks but is guaranteed with the minimum
execution (𝐶𝑖 (𝐿, 𝐿)) according to Eq. (2). Note that the schedulability
• Software: CA-MOT employs the tracking-by-detection of which
with 𝐶𝑖 (𝑥, 𝑦) for 𝑥, 𝑦 ∈ {𝐿, 𝑀 , 𝐻} can be judged with Eq. (2) by
the detector is one of the most popular detectors, YOLOv5 [14]
substituting 𝐶𝑖 (𝐿, 𝐿) to 𝐶𝑖 (𝑥, 𝑦). To evaluate the effectiveness of CA-
model, and tracker is StrongSORT [7]. We confirmed that other
MOT we consider the following including a baseline and our two
detectors (i.e. YOLOX, Faster-RCNN) exhibit a similar trend to
proposed approaches.
YOLOv5 in terms of MOTA and execution time, as shown in Fig. 7.
For feature extraction conducted as a part of association, we used • Detection first (DF): non-preemptive EDF in which the execution
OS-Net [10]. The YOLOv5 model was pretrained on the COCO option of all tasks 𝜏𝑖 ∈ 𝜏 is equally fixed to the rightmost
Dataset [18], while OS-Net was pretrained on the MSMT Dataset
one among {𝐶𝑖 (𝐿, 𝐿), 𝐶𝑖 (𝑀 , 𝐿), 𝐶𝑖 (𝐻 , 𝐿), 𝐶𝑖 (𝐻 , 𝑀), 𝐶𝑖 (𝐻 , 𝐻)} that
[19]. The experimental environment is with Ubuntu 18.04.6 LTS,
satisfies the schedulability condition in Eq. (2).
CUDA 11.4, and PyTorch 1.12.
• EDF-BE: EDF-BE of which task set passes the schedulability con-
• Hardware: We consider the NVIDIA Jetson Xavier as a GPU-
dition in Eq. (2), which is proposed in Section 4.2.
enabled embedded board [20]. The NVIDIA Jetson Xavier features
• EDF-Slack: EDF-Slack of which task set passes the schedulability
a 64-bit 8-core CPU, 32 GB Memory, and 512-core Volta GPU. We
condition in Eq. (2), which is proposed in Section 4.3.
utilized the MAXN mode provided by the NVIDIA Jetson Xavier.
• Dataset and performance metric: We used the KITTI Dataset Fig. 4 represents the tracking accuracy and the proportion of three
[13], which contains data collected from autonomous vehicle execution options (i.e., 𝐿, 𝑀, and 𝐻) selected during detection and
driving. To evaluate the accuracy of each region, we measured association for two tasks with different periods: 180 and 270 ms
the MOTA [15] as the most well-known performance metric for (milliseconds). As shown in Fig. 4(a), for overall accuracy, EDF-BE and
tracking accuracy for critical and entire regions. MOTA compares EDF-Slack achieve 20.2% and 26.6%, respectively, while DF achieves
8
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
Fig. 5. Comparison for four tasks with the same period (equal to the relative deadline) Fig. 6. Visualization on KITTI dataset for three tasks with the periods of 400 ms. (For
of 400 ms. interpretation of the references to color in this figure legend, the reader is referred to
the web version of this article.)
13.4%, which demonstrate the effectiveness of slack utilization and
balancing aging of detection and association in increasing tracking
accuracy. We observe that the slack reclamation performed by Algo-
rithm 1 in EDF-Slack is significantly more effective in achieving high
tracking accuracy than in EDF-BE which has limitations in obtaining
a substantial amount of slack. For critical accuracy, EDF-BE and EDF-
Slack achieve much higher accuracies, which are 28.3% and 32.2%,
respectively, compared to 15.4% of DF. Based on this observation,
we can interpret that even though EDF-BE obtains a smaller amount
Fig. 7. MOTA and execution time on other detectors.
of slack compared to EDF-Slack, it efficiently performs tracking for
the critical region with limited computing resources. On the other
hand, EDF-Slack provides high tracking accuracy not only for the entire
region but also for the safety-critical region, thanks to its efficient slack Additional experiments are conducted to ascertain if CA-MOT ex-
reclamation. As seen in Fig. 4, EDF-Slack exhibits a significantly higher hibits comparable behavioral patterns across a range of detectors,
proportion of high-workload execution and middle-workload execution including YOLOv5, which was evaluated previously. Fig. 7 displays
for detection and association compared to other execution options. On the MOTA and execution time for various contemporary detectors,
the other hand, EDF-BE shows a slight proportion of middle-workload analyzed according to their workload. Modern detectors are generally
execution, while the majority of cases involve low-workload execution. classified into one-stage and two-stage categories based on their archi-
Fig. 5 depicts the results of another experiment involving three tecture and further into anchor-free and anchor-based types, contingent
different sets of tasks, with the number of tasks ranging from two on their use of predefined anchors for object detection. Our study
to four, all having the same periods (i.e., 400 ms with a guaranteed incorporated YOLOv5, a standard one-stage anchor-based detector. We
minimum execution 𝐶𝑖 (𝐿, 𝐿), but no guaranteed maximum execution also investigate the performance of the two-stage anchor-based detector
𝐶𝑖 (𝐻 , 𝐻) for 𝜏𝑖 ∈ 𝜏). In Fig. 5(a), the tracking accuracy of the evaluated Faster-RCNN [9] and the one-stage anchor-free detector YOLOX [21],
approaches is shown as the number of tasks increases. For the case to verify the consistency of results. Faster-RCNN utilized ResNet-50
of two tasks, EDF-Slack achieves an overall accuracy of 41.8% and a as its backbone network, while YOLOX was configured with a small
critical accuracy of 41.4%, while EDF-BE achieves an overall accuracy version model. Both models were trained using the COCO dataset.
of 24.3% and a critical accuracy of 27.2%. In contrast, DF achieves Despite minor discrepancies in specific ratios, the results consistently
lower accuracy, with an overall accuracy of 18.0% and a critical demonstrate that both MOTA and execution time escalate in conjunc-
accuracy of 18.7%. As the number of tasks increases, both EDF-BE and tion with increasing workload, as shown in Figs. 7(a) and (b). The
EDF-Slack experience a decrease in accuracy, but they still outperform runtime trend of YOLOX is particularly noteworthy, which closely
DF in terms of tracking accuracy. Even with only four tasks, EDF- mirrors that of YOLOv5. This pattern indicates that similar outcomes
BE yields lower overall accuracy than DF, as it can only detect part may be expected from other detectors akin to YOLOv5.
of the image when selecting the high workload option. Nevertheless,
by prioritizing computations in critical regions at low and medium
6. Related work
workloads, EDF-BE attains higher critical accuracy than DF. Fig. 5(b)
presents the distribution of execution options for EDF-BE and EDF-Slack
The tracking-by-detection model is a commonly used method in
when there are three tasks. Similar to Fig. 4, it is evident that both
the MOT field. It has shown significant progress and enhanced perfor-
EDF-BE and EDF-Slack allocate the workload between the detection and
association steps in a balanced manner using the ages. Additionally, mance recently, largely thanks to the evolution of deep neural networks
EDF-Slack can reclaim more slack compared to EDF-BE. (DNNs). A well-recognized model in this field, SORT (simple online
Fig. 6 presents the tracking outcomes of a single task within a set and real-time tracking) [22], does its matching based mainly on where
of three tasks, each with a period of 400 ms, comparing (a) the DF objects are located, using detection tools to achieve this. To push this
algorithm and (b) EDF-Slack. In the visualization, each tracked object model further, DeepSORT [6] builds on the SORT model by adding a
is represented by a unique color and ID within a bounding box, with DNN-based re-identification model. This allows for the extraction of ob-
the symbol # indicating the frame number. In the DF scenario, each ject features. By adding this layer, DeepSORT utilizes both the objects
task executes 𝐶𝑖 (𝐻 , 𝐿), leading to insufficient computational resources location and its visual information, leading to a stronger performance.
for proper association. This inadequacy results in DFs failure to track Recent work in this area, such as Deep OC-SORT [23] and Strong-
two objects within the safety-critical region in the 167th frame and SORT [7], is geared towards enhancing the accuracy of these models
causes an ID switch from 4 to 10 in the subsequent 168th frame, even more, focusing especially on refining and improving the matching
as illustrated in Fig. 6. Conversely, EDF-Slack leverages aging and algorithms used in these systems. However, it is critical to understand
slack techniques to allocate sufficient computational resources for both that these approaches are mainly designed for situations where there
detection and association tasks, enabling accurate tracking of all objects are plenty of computing resources. Therefore, they might struggle to
in the safety-critical region. meet the timing needs in systems that are restricted in resources, like
9
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
the embedded systems in self-driving vehicles where resources may be Declaration of competing interest
scarce.
Considering self-driving vehicles, which are fundamentally systems The authors declare the following financial interests/personal rela-
where safety is critical, even the smallest delays or slight drops in tionships which may be considered as potential competing interests:
accuracy can lead to significant and potentially dangerous risks. Some Hyeongboo Baek reports financial support was provided by National
research such as DNN-SAM [5] has tried to tackle these problems by Research Foundation of Korea. If there are other authors, they declare
suggesting frameworks that concentrate specifically on safety-critical that they have no known competing financial interests or personal
areas. These frameworks give priority to critical accuracy and use relationships that could have appeared to influence the work reported
uncertainty handling to ensure the highest safety standards. However, in this paper.
these research studies and their related approaches are mostly designed
for multi-object detection systems and may not directly apply to or be Acknowledgments
effective in multi-object tracking. Likewise, another study, RT-MOT [2],
aims to maximize the overall accuracy of multi-object tracking and This work was supported by the National Research Foundation of
ensure on-time execution, but it overlooks the importance of individual Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-
objects in its approach. To address these limitations, our suggested 00250742, 2022R1A4A3018824, RS-2024-00438248). This work was
framework, known as CA-MOT, aims to confront these challenges partly supported by the Institute of Information & Communications
directly. By leveraging the unique traits of multi-object tracking in Technology Planning & Evaluation(IITP)-ITRC(Information Technol-
safety-critical systems, CA-MOT ensures on-time execution and boosts ogy Research Center) grant funded by the Korea government(MSIT)
tracking accuracy for objects that could potentially be dangerous to the (IITP-2025-RS-2023-00259061).
system. It builds on previous work while addressing their weaknesses
to create a safer and more efficient tracking system. Data availability
Data will be made available on request.
7. Discussion
A limitation of CA-MOT is its exclusive reliance on a single CPU and References
GPU, which restricts scalability. A recent approach, Batch-MOT [3],
addresses this limitation by processing input images from multiple [1] M. Yang, S. Wang, J. Bakita, T. Vu, F.D. Smith, J.H. Anderson, J.-M. Frahm,
Re-thinking CNN frameworks for time-sensitive autonomous-driving applications:
cameras through a shared queue, distributing CPU operations across Addressing an industrial challenge, in: Proceedings of IEEE Real-Time Technology
multiple CPUs, and employing batch processing on a single GPU. How- and Applications Symposium, IEEE, 2019, pp. 305317.
ever, this approach may introduce additional communication overhead [2] D. Kang, S. Lee, H.S. Chwa, S.-H. Bae, C.M. Kang, J. Lee, H. Baek, RT-MOT:
among CPUs, potentially determining its overall efficiency. The primary Confidence-aware real-time scheduling framework for multi-object tracking tasks,
in: Proceedings of IEEE Real-Time Systems Symposium, IEEE, 2022, pp. 318330.
contribution of Batch-MOT lies in its online schedulability analysis,
[3] D. Kang, S. Lee, C.-H. Hong, J. Lee, H. Baek, Batch-MOT: Batch-enabled real-
which dynamically determines the maximum number of images that time scheduling for multi-object tracking tasks, IEEE Trans. Comput.-Aided Des.
can be batch-processed without violating their deadlines. Nonethe- Integr. Circuits Syst. (2024).
less, unlike CA-MOT, Batch-MOT lacks support for multiple execution [4] S. Liu, X. Fu, M. Wigness, P. David, S. Yao, L. Sha, T. Abdelzaher, Self-cueing
real-time attention scheduling in criticality-aware visual machine perception, in:
strategies during the association phase, resulting in suboptimal resource
Proceedings of IEEE Real-Time Technology and Applications Symposium, IEEE,
utilization for individual MOT tasks. Enhancing CA-MOT by incor- 2022, pp. 173186.
porating batch processing capabilities to address these shortcomings [5] W. Kang, S. Chung, J.Y. Kim, Y. Lee, K. Lee, J. Lee, K.G. Shin, H.S. Chwa,
presents a promising avenue for future research. Furthermore, as high- DNN-SAM: Split-and-merge DNN execution for real-time object detection, in:
Proceedings of IEEE Real-Time Technology and Applications Symposium, 2022,
lighted in previous studies, deploying CA-MOT on real-world platforms,
URL https://rtcl.dgist.ac.kr/index.php/publication-2/.
such as the F1/10 autonomous driving platform [5], offers significant [6] N. Wojke, A. Bewley, D. Paulus, Simple online and realtime tracking with a
potential for further investigation and practical validation. deep association metric, in: Proceedings of the IEEE International Conference on
Image Processing, IEEE, 2017, pp. 36453649.
[7] Y. Du, Y. Song, B. Yang, Y. Zhao, StrongSORT: Make deepsort great again, 2022,
8. Conclusion arXiv preprint arXiv:2202.13514.
[8] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real-
time object detection, in: Proceedings of the IEEE/CVF Conference on Computer
In this paper, we proposed, CA-MOT, a new criticality-aware MOT
Vision and Pattern Recognition, 2016, pp. 779788.
execution and scheduling framework. Aiming at achieving critical- [9] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference
accuracy maximization and timing guarantee, CA-MOT first proposes on Computer Vision, 2015, pp. 14401448.
a new system design to offer a control knob between tracking accuracy [10] K. Zhou, Y. Yang, A. Cavallaro, T. Xiang, Omni-scale feature learning for
person re-identification, in: Proceedings of the IEEE International Conference
and timing guarantee to efficiently utilize limited computing resources.
on Computer Vision, 2019, pp. 37023712.
Then, CA-MOT develops two scheduling algorithms to effectively uti- [11] Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, X.
lize the system design while using the notions of slack and aging of Wang, ByteTrack: Multi-object tracking by associating every detection box, in:
detection and association. Using various task sets and real-world au- Proceedings of the European Conference on Computer Vision, Springer, 2022,
pp. 121.
tonomous driving data, we demonstrated that CA-MOT can obtain high
[12] Y. Zhang, C. Wang, X. Wang, W. Zeng, W. Liu, FairMOT: On the fairness of
tracking accuracy of entire and safety-critical regions while ensuring detection and re-identification in multiple object tracking, Int. J. Comput. Vis.
the timely execution of all MOT tasks. 129 (11) (2021) 30693087.
[13] A. Geiger, P. Lenz, R. Urtasun, Are we ready for autonomous driving? The
KITTI vision benchmark suite, in: Proceedings of the IEEE/CVF Conference on
CRediT authorship contribution statement Computer Vision and Pattern Recognition, 2012.
[14] ultralytics, YOLOv5 [Online], 2022, Available: https://github.com/ultralytics/
yolov5.
Donghwa Kang: Writing original draft, Software, Methodology,
[15] K. Bernardin, R. Stiefelhagen, Evaluating multiple object tracking performance:
Formal analysis. Jinkyu Lee: Writing review & editing, Valida- the clear mot metrics, EURASIP J. Image Video Process. 2008 (2008) 110.
tion, Formal analysis. Hyeongboo Baek: Writing review & editing, [16] G. Welch, G. Bishop, et al., An introduction to the Kalman filter, ACM SIGGRAPH
Supervision, Funding acquisition, Formal analysis, Conceptualization. (1995).
10
D. Kang et al. Journal of Systems Architecture 160 (2025) 103349
[17] T.P. Baker, A stack-based resource allocation policy for realtime processes, in: Jinkyu Lee is an associate professor in the Department
Proceedings of IEEE Real-Time Systems Symposium, IEEE, 1990, pp. 191200. of Computer Science and Engineering at Sungkyunkwan
[18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, University (SKKU), South Korea, where he joined in 2014.
C.L. Zitnick, Microsoft COCO: Common objects in context, in: Proceedings of the He received the BS, MS, and Ph.D. degrees in computer
European Conference on Computer Vision, Springer, 2014, pp. 740755. science from the Korea Advanced Institute of Science and
[19] L. Wei, S. Zhang, W. Gao, Q. Tian, Person transfer gan to bridge domain gap Technology (KAIST), South Korea, in 2004, 2006, and 2011,
for person re-identification, in: Proceedings of the IEEE/CVF Conference on respectively. He has been a research fellow/visiting scholar
Computer Vision and Pattern Recognition, 2018, pp. 7988. in the Department of Electrical Engineering and Computer
[20] NVIDIA, NVIDIA Xavier Developer Kit. [Online], 2022, Available: https://www. Science, University of Michigan until 2014. His research
nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier. interests include system design and analysis with timing
[21] Z. Ge, S. Liu, F. Wang, Z. Li, J. Sun, Yolox: Exceeding yolo series in 2021, 2021, guarantees, QoS support, and resource management in real-
arXiv preprint arXiv:2107.08430. time embedded systems and cyberphysical systems. He won
[22] A. Bewley, Z. Ge, L. Ott, F. Ramos, B. Upcroft, Simple online and realtime track- the best student paper award from the 17th IEEE Real-Time
ing, in: Proceedings of the IEEE International Conference on Image Processing, and Embedded Technology and Applications Symposium
IEEE, 2016, pp. 34643468. (RTAS) in 2011 and the Best Paper Award from the 33rd
[23] G. Maggiolino, A. Ahmad, J. Cao, K. Kitani, Deep oc-sort: Multi-pedestrian IEEE Real-Time Systems Symposium (RTSS) in 2012.
tracking by adaptive re-identification, 2023, arXiv preprint arXiv:2302.11813.
Hyeongboo Baek is an associate professor in the Depart-
Dongwha Kang is a Ph.D. course student in the School ment of Artificial Intelligence, University of Seoul (UOS),
of Computing, Korea Advanced Institute of Science and South Korea. He received the BS degree in Computer Science
Technology (KAIST), South Korea. He received a BS and and Engineering from Konkuk University, South Korea, in
MS degree in computer science from Incheon National Uni- 2010 and the MS and Ph.D. degrees in Computer Science
versity (INU) in 2022 and 2024 respectively. His research from KAIST, South Korea, in 2012 and 2016, respectively.
interests include artificial intelligence, autonomous systems, His research interests include cyberphysical systems, real-
and real-time embedded systems. time embedded systems, and system security. He won the
best paper award from the 33rd IEEE Real-Time Systems
Symposium (RTSS) in 2012.
11