Journal of Systems Architecture 160 (2025) 103349 Contents lists available at ScienceDirect Journal of Systems Architecture journal homepage: www.elsevier.com/locate/sysarc Real-time scheduling for multi-object tracking tasks in regions with different criticalities Donghwa Kang a , Jinkyu Lee b ,∗, Hyeongboo Baek c ,∗ a Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea b Sungkyunkwan University (SKKU), Suwon, South Korea c University of Seoul (UOS), Seoul, South Korea ARTICLE INFO ABSTRACT Keywords: Autonomous vehicles (AVs) utilize sensors such as LiDAR and cameras to iteratively perform sensing, decision- Multi-object tracking making, and actions. Multi-object tracking (MOT) systems are employed in the sensing stage of AVs, using these Real-time scheduling sensors to detect and track objects like pedestrians and vehicles, thereby enhancing situational awareness. Timing guarantee These systems must handle regions of varying criticality and dynamically shifting locations, all within limited Criticality-awareness computing resources. Previous DNN-based MOT approaches primarily focused on tracking accuracy, but timing Autonomous driving guarantees are becoming increasingly vital for autonomous driving. Although recent studies have introduced MOT scheduling frameworks with timing guarantees, they are either restricted to single-camera systems or fail to prioritize safety-critical regions in the input images. We propose CA-MOT, a Criticality-Aware MOT execution and scheduling framework for multiple cameras. CA-MOT provides a control knob that balances tracking accuracy in safety-critical regions and timing guarantees. By effectively utilizing this control knob, CA-MOT achieves both high accuracy and timing guarantees. We evaluated CA-MOT’s performance using a GPU-enabled embedded board commonly employed in AVs, with data from real-world autonomous driving scenarios. 1. Introduction pooling and convolutional layers) using CNN (convolutional neural network)-based models (e.g., OS-Net [10]). For unmatched objects, Autonomous vehicles (AVs) are systems that iteratively perform location-based methods like intersection over union (IoU) are applied. sensing, decision-making, and actions using various sensors such as MOT input images exhibit two key characteristics: (i) regions with LiDAR, radar, inertial measurement units (IMU), and cameras [1]. varying levels of criticality and (ii) dynamically shifting locations. With Multi-object tracking (MOT) systems, used in the perception stage limited computing resources in AVs, it is crucial to deliver different of AVs, track objects like pedestrians and cars, enhancing situational levels of service quality based on criticality. Safety-critical regions, awareness. Since MOT information is periodically transferred to control where objects with a short time-to-collision (e.g., under 2 s) cluster, tasks, timely execution must be guaranteed to ensure safety and prevent must be prioritized. If multiple clusters exist, the broader area en- severe accidents [2–4]. Low accuracy, despite timely execution, may compassing them is considered the safety-critical region, as defined in result in missed objects, thus compromising AVs’ safety [2,4,5]. There- DNN-SAM [5]. Established methods compute time-to-collision using Li- fore, AV MOT systems should ensure timing guarantees with maximized DAR and IMU data; we follow the approach from DNN-SAM. This leads accuracy. to two requirements for criticality-aware MOT systems: (R1) accuracy Tracking-by-detection [6,7] is widely used due to its high accuracy maximization for safety-critical regions and (R2) timing guarantees. and ability to leverage state-of-the-art DNN-based detection models Most existing DNN-based MOT approaches focus on accuracy [7, (e.g., YOLO series [8], Faster R-CNN [9]). For each input image from each camera, tracking-by-detection performs two tasks: detection and 11,12], but timing guarantees are increasingly critical in autonomous association. Detection uses DNN-based models to sense the motion driving. Recent research has proposed MOT resource scheduling frame- information of objects, such as location and velocity, while association works that guarantee timing for every MOT execution [2,4]. How- matches objects between frames based on extracted feature informa- ever, [2] overlooks safety-criticality, while [4] focuses on a single task. tion (also called feature vectors or feature maps obtained through We address safety-criticality across multiple tasks, raising the following ∗ Corresponding authors. E-mail addresses: anima0729@kaist.ac.kr (D. Kang), jinkyu.lee@skku.edu (J. Lee), hbbaek359@gmail.com (H. Baek). https://doi.org/10.1016/j.sysarc.2025.103349 Received 22 September 2024; Received in revised form 13 December 2024; Accepted 20 January 2025 Available online 28 January 2025 1383-7621/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies. D. Kang et al. Journal of Systems Architecture 160 (2025) 103349 challenges: C1. How to balance R1 and R2 to efficiently use limited computing resources. C2. How to achieve both R1 and R2 by effectively using the control knob developed from C1. In this paper, we propose CA-MOT, a Criticality-Aware MOT exe- cution and scheduling framework for multiple MOT tasks. To address C1, CA-MOT offers three execution options (low, middle, and high workloads) to balance R1 and R2 for both detection and association. To address C2, CA-MOT introduces the notion of aging for detection and association sub-tasks, estimating the reliability of motion and feature information over time. Balancing the aging of these tasks is essential to achieve R1 and R2 with limited resources (to be discussed in Section 3.4). Based on this, CA-MOT develops two scheduling algo- rithms: EDF-BE and EDF-Slack. EDF-BE increases the workload of tasks waiting in the ready queue for execution (referred to as active tasks) without compromising the R2 bound when no other tasks are pending. In contrast, EDF-Slack is designed to handle scenarios with multiple active tasks. Fig. 1. Tracking accuracy and execution time on different execution options of detection and association. To validate CA-MOT’s performance in meeting R1 and R2, we conducted extensive experiments on an NVIDIA Jetson Xavier using the KITTI Dataset [13]. Additionally, we applied three detectors in our experiments: YOLOv5 [14], YOLOX [8], and Faster-RCNN [9]. a system that tracks specific objects moving between the fields of view The contributions of this paper are as follows: of multiple cameras (called hand-over), this is beyond the scope of our work. This paper focuses on dividing the multi-object tracking task • We motivate the importance of balancing between aging of de- into two subtasks (i.e., detection and association) and using DNN-based tection and association to achieve R1 and R2 (Section 2). MOT-specific properties (i.e., reuse of motion and feature information) • We propose a new system design, CA-MOT that addresses R1 and to achieve R1 and R2 under limited resources. R2 considering varying levels of criticality in different regions for multiple MOT tasks (Section 3). 2.2. Trade-off between accuracy and execution time • We develop new scheduling algorithms to effectively achieve R1 and R2 by balancing between aging of detection and association To address C1, we consider two factors: (i) the input image size for each MOT task (Section 4). and detection within the safety-critical region, and (ii) the number of • We demonstrate the effectiveness of CA-MOT in achieving R1 and objects used for feature extraction during association across all detected R2 using a real-world self-driving dataset (Section 5). objects in each frame. Fig. 1(a) compares the multi-object tracking accuracy (MOTA) [15] 2. Motivation for the overall and safety-critical regions (referred to as overall and critical accuracy) and the execution time for a single MOT task using This section presents target systems and motivates the system de- three input image sizes (256 × 256, 416 × 416, 672 × 672). Overall sign of CA-MOT to address C1 and C2 based on measurement-based accuracy considers all objects, while critical accuracy focuses on the observations. safety-critical region. YOLOv5 [14] is used for detection, and features are extracted for all detected objects. The KITTI dataset [13] is used. 2.1. Target system For image sizes 256 × 256 and 416 × 416, detection is performed on a cropped region of interest (RoI) that includes the safety-critical CA-MOT targets 2D MOT systems on AVs equipped with multiple region. If the RoI is smaller, it is resized to include the critical region; camera sensors. Each MOT task performs MOT execution on consecu- otherwise, the critical region is cropped accordingly. The safety-critical tive input frames received from the corresponding camera sensor at a region will be defined in Section 3. For the 672 × 672 size (i.e., the predetermined period. As this recurring task is required to complete original input size), detection occurs without cropping. a job within a specified deadline, each MOT task is considered a As shown in Fig. 1(a), reducing image size leads to a notable real-time task with a period and deadline. CA-MOT employs tracking- decrease in overall accuracy, while critical accuracy decreases less by-detection comprising two steps of MOT execution: detection and association. The front-end detector performs detection by exploiting significantly due to prioritization of the safety-critical region in the RoI. the existing stand-alone DNN-based detector to identify the position Additionally, execution time decreases as the image size is reduced, and class of objects in the input image. Using the locations of detected demonstrating a trade-off between R1 and R2 when focusing on the objects, the feature extractor (e.g., the deployed CNN model such as critical region. OSNet) extracts features (i.e., feature vectors or feature maps) for each Fig. 1(b) shows the impact of varying the number of objects used object. These features capture the visual characteristics of each object. for feature extraction on accuracy and execution time, with the image The back-end tracker compares the feature similarities between objects size fixed at 672 × 672. The number of objects ranges from zero, three, in the current frame and the previous frame, matching objects with and more than three. OS-Net [10] is used for feature extraction. As high similarity. For any remaining unmatched objects, a location-based shown in Fig. 1(b), as the number of objects with feature extraction matching method such as IoU is applied. The tracker then stores the increases, both overall and critical accuracy improve, but this also leads motion information (position and velocity) and the features of each to increased execution time. This highlights a trade-off between R1 and object in preparation for the next frame. R2 based on the number of objects considered for feature extraction. We assume a system in which each camera independently tracks Section 3.3 details the MOT execution pipeline of CA-MOT, which objects moving within its field of view. While it is possible to consider leverages these observations to effectively address C1. 2 D. Kang et al. Journal of Systems Architecture 160 (2025) 103349 Fig. 2. System design of CA-MOT: the key features are (a) an aging-aware scheduler that provides timing guarantees and a criticality-aware flexible MOT execution pipeline including (b) a detection module that accommodates varying input sizes and (c) an association module that handles a varying number of objects for feature extraction. 2.3. Different combination of detection and association • It provides a timing guarantee while providing prioritized track- ing accuracy for the safety-critical regions by exploiting an MOT- To address C2, Fig. 1(c) reveals an intriguing observation that differ- specific property. ent combinations of image sizes and the number of feature extractions To address the first goal, CA-MOT implements a criticality-aware yield distinct effects on accuracy and execution time. The experiment flexible MOT execution pipeline in which detection and association was conducted over 100 consecutive frames. are performed with different execution option by leveraging the ob- In Fig. 1(c), the execution of detection or association is denoted by servations discussed in Section 2.2. To address the second goal, CA- 𝑃 or 𝐹 . 𝑃 represents partial computation, where detection is performed MOT develops an aging-aware task-level scheduler to provide accuracy only on the region of interest (RoI) at a size of 256 × 256, including maximization while providing a timing guarantee by exploiting the a safety-critical region, and association is limited to location-based observations discussed in Section 2.3 building upon the MOT execution association without feature extraction. 𝐹 represents full computation, pipeline. The MOT execution pipeline and the scheduler are imple- where detection is performed on the entire image at a size of 672 × 672, mented as separate threads, and they are communicated with shared and association includes feature extraction for all objects. The number memory. in the upper right of the notation indicates how many times the combi- CA-MOT does not require modifications to existing DNN models nation of detection and association has been performed. For example, (e.g., detectors and feature extractors), which allows for reusing most the notation 𝐹 𝐹 50 𝑃 𝑃 50 indicates that we use 𝐹 for both the detection (if not all) stand-alone detectors and feature extractors. Notably, state- and association steps in the first 50 frames and 𝑃 for both phases in the of-the-art detectors like YOLOv5 are inherently designed to handle remaining 50 frames. To mitigate the issue of objects outside the critical varying input image sizes, and all CNN models can perform batch exe- region not being detected due to cropping, which can decrease the cution on multiple images (each corresponding to a different object). As accuracy of the non-critical region, we utilize the position information shown in Fig. 2, the key features of CA-MOT are: (a) a scheduling policy of objects from the previous frame as predicted position information for that selects one input per camera, provides timing guarantees, and the current frame using a prediction model such as Kalman filter [16] adjusts the workload for detection and association; (b) a module that during the execution of 𝑃 in the detection step. Except for 𝑃 𝑃 100 and processes detection with inputs of varying sizes; and (c) a module that 𝐹 𝐹 100 , all combinations have the same proportion of 𝐹 and 𝑃 for the extracts features from a pre-determined number of detected objects. entire frames. As shown in Fig. 1(c), although 𝐹 𝐹 50 𝑃 𝑃 50 and 𝑃 𝐹 100 have similar 3.2. Workflow execution times with (𝑃 𝐹 + 𝐹 𝑃 )100 and (𝐹 𝐹 + 𝑃 𝑃 )100 , they show lower tracking accuracy. This indicates that different combinations of 𝐹 and Fig. 2 presents the workflow of CA-MOT. During system operation, 𝑃 can have a varying impact on accuracy. The observation in Fig. 1(c) the task scheduler maintains a queue to store images periodically necessitates a new scheduler that is capable of obtaining high tracking received from each camera sensor (⃝). 1 Then, the task scheduler deter- accuracy by capturing an MOT-specific property referred to as aging, mines the following for tasks in the queue: (a) the task to be scheduled, which will be detailed in Section 3.4. (b) the execution option for the detector, and (c) the execution option for the association (⃝). 2 After an image moves to the MOT execution 3. System design of CA-MOT pipeline, the critical region identification module identifies a safety- critical region from the image and crops (or not) a RoI including the This section presents the goal and design of CA-MOT to address C1 safety-critical region according to the execution option for the detector and C2. (⃝). 3 Depending on the execution option, the cropped RoI or entire image is processed for detection (⃝).4 Furthermore, depending on the 3.1. System overview number of objects for which features are extracted, CA-MOT selectively extracts features for detected objects (⃝). 5 All detected objects are then CA-MOT utilizes a tracking-by-detection approach, consisting of matched with the tracked objects from the previous frame. If both the two steps: detection and association, where the front-end detector em- detected and tracked objects have feature vectors, they are associated ploys a pre-existing DNN-based detector to detect and classify objects through feature-based matching. Otherwise, they are associated solely in the input image, and the unmatched objects are matched using a based on their locations (⃝). 6 location-based method like IoU. CA-MOT aims at providing prioritized tracking accuracy for the safety-critical region with a timing guarantee 3.3. Criticality-aware flexible MOT execution pipeline for every MOT execution on limited computing resources by addressing C1 and C2 discussed in Section 1, which has the following design goals. The MOT execution pipeline conducts detection and association sequentially. CA-MOT can employ any existing stand-alone DNN-based • It provides different execution options not only for detection but detectors as long as it can accommodate different sizes of input images also association considering different criticality of regions in input (e.g., YOLO series) and offer a clear trade-off between accuracy and images. execution time. For each input image with a size of 672 × 672, the 3 D. Kang et al. Journal of Systems Architecture 160 (2025) 103349 detector performs the detection to identify the location and class of association for each task at every scheduling decision. The scheduler multiple objects in the image. Once the scheduler determines the task manages MOT tasks using a single queue and is triggered when a task (associated with an input image) to be scheduled and the execution completes its execution or a new task is released. As three execution option for the tasks, the detection is performed for the task according options (e.g., low, middle, and high workloads) are provided for each to the execution option. CA-MOT provides three execution options detection and association under CA-MOT, the scheduler decides the (i.e., low, middle, and high workloads, respectively) providing a trade- image size (e.g., 256 × 256, 416 × 416, and 672 × 672) for detection off between execution time and accuracy. For low and middle workload and feature size (e.g., zero, three, and more than three) for association detections, CA-MOT first identifies the RoI with sizes of 256 × 256 and according to the scheduling algorithms (to be presented in Section 4). 416 × 416, respectively, and then detection is performed on cropped As discussed in Section 2.3, various combinations of image sizes RoI, which includes the safety-critical region. The area outside the and the quantity of feature extractions result in different impacts on RoI is not subject to detection, and the motion information (e.g., size, tracking accuracy. This is due to an important property of the MOT position, velocity, direction) of objects detected in the previous frame system, which involves supplementing non-updated motion or feature is used in the prediction models such as the Kalman filter to obtain information during detection and association in the current frame by the estimated information of objects in the current frame. On the other utilizing information from the previous frame. For example, in scenar- hand, high-workload detection is performed on the original image with ios with low and middle workload detection, the detection process does a size of 672 × 672. not cover the area outside the RoI. Instead, the motion information We define the area that encompasses all safety-critical objects, of objects detected in the previous frame, such as their size, position, which are objects with a time-to-collision of less than two seconds, as velocity, and direction, is leveraged to estimate the corresponding the safety-critical area. If the safety-critical area exceeds the input size information for objects in the current frame. Moreover, during the as- for the detector, as determined by the detection process (e.g., 256 × 256 sociation step, if the feature extracted from the immediately preceding or 416 × 416), the safety-critical area is cropped and resized to the frame is unavailable due to low- and middle-workload associations, corresponding dimensions before being fed into the detector model. the feature-based matching algorithm compares the features extracted The locations of safety-critical objects are determined based on their from objects in the current frame with the features extracted from the most recently computed positions, without projecting future safety- nearest past frames. Therefore, the tracking accuracy is determined critical regions from them. There are numerous existing approaches by the reliability of the reused motion and feature information of the that calculate time-to-collision based on the relative positions of objects objects. To capture the reliability of the motion and feature information and the ego vehicle given LiDAR and IMU data, and we assume the use of objects, we propose a new notion of aging that specifies the number of one such method. It is also important to note that the KITTI dataset of middle- or high-workload executions of detection and association provides both LiDAR and IMU data. For example, areas where objects conducted from the beginning of the MOT task, respectively. In order with a time-to-collision of less than 2 s congregate can be defined as to update the motion and feature information as frequently as possi- safety-critical regions, and if multiple such areas exist, the encompassing ble using limited computing resources, it is necessary to balance the area that includes all of them would be considered the safety-critical aging of detection and association for each task. Note that increasing region. Please note that we adhere to the definition of the safety-critical the aging of detection and association for all tasks simultaneously in region as defined in the existing paper DNN-SAM in [5]. It is assumed every MOT execution is generally not feasible due to limited com- that the critical region is pre-calculated by external sensors such as puting resources. Therefore, a mechanism is required to balance the LiDAR and IMU and provided to CA-MOT. If an input image does not aging of detection and association for all tasks while providing timing have a critical region, the entire frame is considered a critical region. guarantees under constrained resources. To this end, we propose new As seen in Fig. 2, GPU is used only for the inference of DNN models, scheduling algorithms that will be detailed in the next section. such as the detector (e.g., YOLOv5) and feature extractor (e.g., OSNet), while all other execution is performed on the CPU. 4. Scheduling algorithm For the association, the MOT system uses the two-step approach [7]. Initially, a CNN-based model (e.g., OS-Net [10]) is employed by the This section presents a task model and proposes new scheduling tracker to extract features from the detected objects. The tracker then algorithms building upon CA-MOT. compares these features between the current and previous frames to identify object pairs with the highest feature similarity. For the re- 4.1. Task model maining objects that are not matched based on feature comparison, a location-based matching method such as IoU (intersection over union) Targeting MOT systems in AVs that involve 𝑛 camera sensors, we is used. CA-MOT also provides three execution options (i.e., low, consider a set 𝜏 consisting of 𝑛 MOT tasks denoted as 𝜏𝑖 ∈ 𝜏. Each middle, and high workloads, respectively) for the association. When MOT task 𝜏𝑖 is responsible for conducting MOT execution using input it comes to the middle and high workload associations, the tracker images provided periodically by each camera sensor. As we employ extracts features from some (e.g., three) of the detected objects or the methodology of tracking-by-detection, an MOT task consists of all of the detected objects, and then performs consecutive feature- detection and association sub-tasks. Thus, the specification of each based and location-based matchings. On the other hand, low-workload MOT task 𝜏𝑖 is given as 𝜏𝑖 = (𝑇𝑖 , 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ), 𝐷𝑖 ), where 𝑇𝑖 represents the association performs location-based matching only. Depending on the period (or the minimum inter-arrival time), 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) denotes the worst- execution option, CA-MOT may extract features from only a subset or case execution time (WCET) based on the execution options (i.e., low, all of the detected objects, which means that the feature information middle, and high-workload execution) for detection and association of objects may not be updated every time. Therefore, during feature- sub-tasks, respectively, and 𝐷𝑖 indicates the relative deadline. The based matching, the algorithm compares the features extracted from execution time of the detection sub-task depends on the image size the objects in the current frame with the closest previously extracted 𝑠𝑖 ∈ 𝑆𝑖 = {𝐿, 𝑀 , 𝐻}, where 𝐿, 𝑀 , 𝐻 are 256 × 256, 416 × 416, and features of the tracked objects and matches the two objects with the 672 × 672, respectively. Note that CA-MOT supports arbitrary non- highest feature similarity. decreasing sizes for 𝑆𝑖 = {𝐿, 𝑀 , 𝐻}. On the other hand, the execution time of the association sub-task depends on the feature size 𝑓𝑖 ∈ 3.4. Aging-aware task scheduler 𝐹𝑖 = {𝐿, 𝑀 , 𝐻}, where 𝐿, 𝑀 , 𝐻 are zero, from one to three, and more than three, respectively. Note that the tracking-by-detection methodol- The CA-MOT implements a thread-level task scheduler to determine ogy performs the association phase sequentially through feature-based the task to be scheduled and execution options for detection and matching followed by location-based matching using IoU. If 𝑓𝑖 is equal 4 D. Kang et al. Journal of Systems Architecture 160 (2025) 103349 Table 1 On the other hand, the worst-case execution time 𝐶𝑖𝐴 (𝑓𝑖 ) of the associa- Notations used in the scheduling algorithms. tion task depends on the feature size 𝑓𝑖 . It is calculated by considering Symbol Description the time required for extracting features from detected objects and 𝜏𝑖 Task 𝑖 in the system performing matching methods such as feature-based and IoU-based 𝑇𝑖 Period of task 𝜏𝑖 (minimum inter-arrival time) matching. An MOT task 𝜏𝑖 is considered schedulable if every job 𝐽𝑖 𝐷𝑖 Relative deadline of task 𝜏𝑖 (invoked by 𝜏𝑖 ) completes its execution within the relative deadline 𝐷𝑖 . 𝐶𝑖 (𝑋 , 𝑌 ) Worst-case execution time (WCET) of task 𝑖. The overall schedulability of the system is determined by ensuring that 𝑋: image size for detection (𝐿, 𝑀 , 𝐻) 𝑌 : feature size for association (𝐿, 𝑀 , 𝐻) every task 𝜏𝑖 ∈ 𝜏 is schedulable. 𝑠𝑖 Image size for the detection sub-task of task 𝑖 𝑓𝑖 Feature size for the association sub-task of task 𝑖 4.2. EDF best-effort 𝑆𝑖 Set of image size options for task 𝑖 (𝑆𝑖 = {𝐿, 𝑀 , 𝐻}) 𝐹𝑖 Set of feature size options for task 𝑖 (𝐹𝑖 = {𝐿, 𝑀 , 𝐻}) Building upon the system design of CA-MOT presented in Section 3, 𝐿 Low workload execution we develop two scheduling algorithms that aim to provide not only 𝑀 Middle workload execution 𝐻 High workload execution high tracking accuracy for the safety-critical regions but also a tim- 𝐶𝑖𝐷 (𝑠𝑖 ) WCET of the detection sub-task of task 𝑖, based on image ing guarantee for every MOT execution. To this end, the proposed size 𝑠𝑖 scheduling algorithms have the following two features: (F1) an offline 𝐶𝑖𝐴 (𝑓𝑖 ) WCET of the association sub-task of task 𝑖, based on timing guarantee for the minimum execution (i.e., low-workload exe- feature size 𝑓𝑖 cution for both detection and association) of every MOT execution and 𝑅𝐶𝑖 (𝐿, 𝐿) Remaining execution time for the minimum execution of (F2) an online policy to maximize tracking accuracy by systematically task 𝑖 increasing workload (i.e., middle- or high-workload execution) of an 𝑎𝑔 𝑒𝐷 Aging value of the detection sub-task of task 𝑖 MOT execution using notions of slack and aging without compromising 𝑖 𝑎𝑔 𝑒𝐴 𝑖 Aging value of the association sub-task of task 𝑖 timing guarantee. 𝑠𝑙𝑎𝑐 𝑘𝑖𝑡 Slack time available for task 𝑖 at the current time 𝑡𝑐 𝑢𝑟 The proposed scheduling algorithms are based on the non-preem- 𝑐 𝑢𝑟 𝑞𝑖 Minimum execution time of task 𝑖 in the interval ptive earliest deadline first (EDF) scheduling algorithm, which assigns [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )] higher priority to jobs with earlier deadlines without allowing any 𝑝 Sum of the minimum execution times for all tasks preemption. To provide the first feature F1, CA-MOT employs the 𝑑1 (𝑡𝑐 𝑢𝑟 ) Earliest deadline or future release time at time instant 𝑡𝑐 𝑢𝑟 existing schedulability analysis developed for non-preemptive EDF as 𝑠𝑙𝑎𝑐 𝑘𝐷− Remaining slack after executing high-workload detection follows. 𝑖 for task 𝑖 𝑠𝑙𝑎𝑐 𝑘𝐴− Remaining slack after executing high-workload Lemma 1. For a set 𝜏 of MOT tasks scheduled by non-preemptive EDF, 𝑖 association for task 𝑖 minimum execution 𝐶𝑖 (𝐿, 𝐿) of every task 𝜏𝑖 ∈ 𝜏 can be executed without deadline miss as long as the following holds for every task 𝜏𝑖 ∈ 𝜏. max𝜏𝑖 𝐶𝑖 (𝐿, 𝐿) ∑ 𝐶𝑖 (𝐿, 𝐿) + ≤ 1.0 (2) to 𝐿, this indicates that no feature extraction has been performed for min𝜏𝑖 𝑇𝑖 𝜏 ∈𝜏 𝑇𝑖 𝑖 the frame, and thus, feature-based matching is skipped, proceeding directly to location-based matching. In the case of 𝐻, we employ the Proof. The lemma presents a schedulability condition for non-preem- maximum number of objects as defined by the environment (for the ptive EDF, and its proof is outlined as follows. Let us target 𝜏𝑘 ∈ 𝜏; dataset considered, this is based on values measured across all videos), also, consider a virtual task 𝜏𝑥 ∉ 𝜏, whose 𝑇𝑥 and 𝐶𝑥 (𝐿, 𝐿) are set for example, 10. Then, the worst-case execution time 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) of each to min𝜏𝑖 ∈𝜏 𝑇𝑖 and max𝜏𝑖 ∈𝜏 𝐶𝑖 (𝐿, 𝐿), respectively. Now, we compare the MOT task 𝜏𝑖 is derived as follows. finishing time of a job of 𝜏𝑘 when (Case 1) 𝜏 is scheduled by non- preemptive EDF, and (Case 2) 𝜏 ∪ {𝜏𝑥 } is scheduled by preemptive EDF. 𝐶𝑖 (𝑠𝑖 , 𝑓𝑖 ) = 𝐶𝑖𝐷 (𝑠𝑖 ) + 𝐶𝑖𝐴 (𝑓𝑖 ), (1) Since at most one lower-priority job can block a high-priority job under where 𝐶𝑖𝐷 (𝑠𝑖 ) and 𝐶𝑖𝐴 (𝑓𝑖 ) are the worst-case execution times of detection non-preemptive scheduling, 𝜏𝑘 can be blocked by at most one lower- and association sub-tasks according to 𝑠𝑖 and 𝑓𝑖 , respectively. As shown priority job under Case 1; obviously, the WCET of the lower-priority job in Fig. 2, both the detection and the association sub-tasks involve GPU is upper-bounded by max𝜏𝑖 ∈𝜏 𝐶𝑖 (𝐿, 𝐿). Also, to block all the following operations, with their respective WCETs including the communication jobs of 𝜏𝑘 , the blocking frequency should be no smaller than 𝑇𝑘 , which costs between the CPU and GPU. Note that the detection sub-task, is lower-bounded by min𝜏𝑖 ∈𝜏 𝑇𝑖 . Therefore, the finishing time of a job of denoted as 𝜏𝑖𝐷 , and the association sub-task, denoted as 𝜏𝑖𝐴 , are executed 𝜏𝑖 under Case 1 is no later than that under Case 2. Once we apply the consecutively without any preemption while sharing the same period well-known schedulability condition for preemptive EDF to Case 2, the and relative deadline. Similarly, when an active task is running, it condition is the same as Eq. (2), which proves the lemma. □ executes without any interruptions, while other tasks wait in the queue. Note that the proof is self-contained, but a different proof for In addition, each task runs on an environment where non-preemption between the GPU and CPU is guaranteed. To ensure this, while the Lemma 1 can be found in [5,17]. CPU is running, the GPU waits for input from the CPU. Once the GPU To provide the second feature F2, the proposed scheduling al- receives the input and is activated, the CPU waits until it receives the gorithms (i) dynamically increase the workload of each MOT task results from the GPU, as illustrated in Fig. 2. As seen Fig. 2, GPU (e.g., from low workload to middle or high workload) without compro- is used only for the inference of DNN models, such as the detector mising the timing guarantee while (ii) balance the aging of detection (e.g., YOLOv5) and feature extractor (e.g., OSNet), while all other and association of every task. We propose two scheduling algorithms execution is performed on the CPU. Also, CA-MOT does not allow that simultaneously provide (i) and (ii) in different ways: EDF-BE parallel execution for multiple MOT executions (see Table 1). (EDF Best-Effort) and EDF-Slack (EDF with Slack reclamation), adapted The worst-case execution time 𝐶𝑖𝐷 (𝑠𝑖 ) of the detection sub-task is from [5]. EDF-BE and EDF-Slack utilize slacks defined differently, but determined by the sum of various components, including preprocessing use the same mechanism (in Algorithm 2) to decide on the execution time (such as cropping and resizing the input image), image transfer option that employs a notion of aging. time from CPU memory to GPU memory, model inference time to Let 𝑑1 (𝑡𝑐 𝑢𝑟 ) be the earliest deadline or future release time among 𝑡 obtain candidate objects, and postprocessing time (e.g., applying non- all tasks at a time instant 𝑡𝑐 𝑢𝑟 . The slack 𝑠𝑙𝑎𝑐 𝑘𝑖𝑐 𝑢𝑟 of task 𝜏𝑖 at 𝑡𝑐 𝑢𝑟 maximum suppression) to extract the final objects from the candidates. under the EDF-BE is defined as the expected remaining time up to 5 D. Kang et al. Journal of Systems Architecture 160 (2025) 103349 Algorithm 1 Slack calculation for 𝜏𝑘 at 𝑡𝑐 𝑢𝑟 under EDF-Slack Input: 𝜏, 𝑡𝑐 𝑢𝑟 𝑡 Output: 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 1: 𝑝 = 0, 𝑈 = the left-hand-side of Equation (2) 2: for 𝑖 = 𝑛 to 1, 𝜏𝑖 ∈ {𝜏1 , ..., 𝜏𝑛 |𝑑1 (𝑡𝑐 𝑢𝑟 ) ≤ ⋯ ≤ 𝑑𝑛 (𝑡𝑐 𝑢𝑟 )} do 𝐶𝑖 (𝐿, 𝐿) 3: 𝑈 =𝑈− 𝑇𝑖 4: 𝑞𝑖 = max(0, 𝑅𝐶𝑖 (𝐿, 𝐿) − (1 − 𝑈 ) ⋅ (𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) − 𝑑1 (𝑡𝑐 𝑢𝑟 ))) ( 𝑅𝐶𝑖 (𝐿, 𝐿) − 𝑞𝑖 ) 5: 𝑈 = min 1.0, 𝑈 + 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) − 𝑑1 (𝑡𝑐 𝑢𝑟 ) 6: 𝑝 = 𝑝 + 𝑞𝑖 7: end for 𝑡 8: return 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 = 𝑑1 (𝑡𝑐 𝑢𝑟 ) − 𝑡𝑐 𝑢𝑟 − 𝑝 that does not exceed the earliest deadline or future release, ensuring Fig. 3. Execution timeline of multiple MOT tasks under (a) baseline (non-preemptive execution without deadline misses. The following lemma present the EDF), (b) EDF-BE, and (c) EDF-Slack scheduling policies. timing guarantee of EDF-BE. Theorem 1. A task set 𝜏 that satisfies the condition in Eq. (2) is 𝑑1 (𝑡𝑐 𝑢𝑟 ) after the execution of 𝐶𝑖 (𝐿, 𝐿) is completed, which is calculated schedulable by EDF-BE . by 𝑑1 (𝑡𝑐 𝑢𝑟 ) − 𝑡𝑐 𝑢𝑟 − 𝐶𝑖 (𝐿, 𝐿). This slack value is only valid when there are no more than two tasks in the waiting queue at time 𝑡𝑐 𝑢𝑟 and no Proof. According to Lemma 1, for a task set 𝜏 that satisfies Eq. (2), future releases within the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )). Using the slack value the minimum execution time 𝐶𝑖 (𝐿, 𝐿) of all tasks 𝜏𝑖 ∈ 𝜏 guarantees conditionally provided at a scheduling decision, EDF-BE can perform execution without deadline misses. At each scheduling decision at 𝑡 middle- or high-workload execution for detection and/or association. under the online policy of EDF-BE, the execution of a job exploiting any slack value does not impose additional inference on any other job. Example. Figs. 3(a) and (b) present a scheduling scenario of the This guarantees that all tasks 𝜏𝑖 receive no more interference than what baseline algorithm (i.e., non-preemptive EDF) and EDF-BE with an they would receive under non-preemptive EDF scheduling. Thus, this example task set. We consider an example task set 𝜏 = {𝜏1 , 𝜏2 } of theorem holds. □ which 𝐶𝑖 = 𝐶𝑖𝐷 (𝐻) + 𝐶𝑖𝐴 (𝐻) = 25, 𝑇𝑖 = 25, 𝐶𝑖𝐷 (𝑠𝑖 ) = {5, 9, 12}, and 𝐶𝑖𝐴 (𝑓𝑖 ) = {3, 8, 13} hold for 𝜏𝑖 ∈ 𝜏. As shown in Figs. 3(a) and (b), each 4.3. EDF with slack reclamation first job of 𝜏1 and 𝜏2 are released at 𝑡 = 0 and 𝑡 = 13, respectively. In the baseline algorithm, the first job of 𝜏1 executes for 25 time units, In the case of EDF-BE, more workload than the minimum execution and then the first job of 𝜏2 starts its execution at 𝑡 = 25 resulting in can only be processed when there is a single job in the waiting queue at a deadline miss at 𝑡 = 38. Let 𝑎𝑔 𝑒𝐷 𝐴 𝑖 and 𝑎𝑔 𝑒𝑖 be the aging value of a given time 𝑡𝑐 𝑢𝑟 and no additional releases occur until 𝑑1 (𝑡𝑐 𝑢𝑟 ). This cre- detection and association of 𝜏𝑖 . The aging value is an integer satisfying ates a limited opportunity for MOT tasks in CA-MOT to perform more 𝑎𝑔 𝑒𝐷 𝐴 𝐷 𝐴 𝑖 , 𝑎𝑔 𝑒𝑖 ≥ 0, and 𝑎𝑔 𝑒𝑖 and 𝑎𝑔 𝑒𝑖 for all task 𝜏𝑖 ∈ 𝜏 are set to zero workload than the minimum execution, thus restricting the potential at the beginning of the system. The, 𝑎𝑔 𝑒𝐷 𝐴 𝑖 (and 𝑎𝑔 𝑒𝑖 ) increases by one to improve tracking accuracy. To address this limitation, we integrate at each time when a detection (and association) is run with middle- or the approach presented in [5] into EDF-Slack, allowing it to compute high-workload. In other words, the aging value refers to the number of slack in a different way than EDF-BE. executions excluding those with low workloads. By adjusting the aging Let 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) denote the 𝑖th earliest deadline or release time at 𝑡𝑐 𝑢𝑟 , and value, a balance is maintained so that neither detection nor association 𝑅𝐶𝑖 (𝐿, 𝐿) represent the remaining execution time required to complete becomes disproportionately large. the minimum execution 𝐶𝑖 (𝐿, 𝐿). Algorithm 1 outlines the calculation 𝑡 of the slack value 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 for task 𝜏𝑘 at 𝑡𝑐 𝑢𝑟 within the EDF-Slack Compared to EDF-Slack, EDF-BE is a simpler algorithm that utilizes algorithm, triggered at each scheduling decision. Since EDF is a job- as many resources as possible, executing a job for greater than 𝐶𝑖 (𝐿, 𝐿) level fixed-priority scheduling policy, wherein the priority of a job up to its closest future release only when there is exactly one job in the remains constant throughout its execution, scheduling decisions under waiting queue. EDF-BE naturally guarantees no deadline misses in any EDF occur either at the commencement of a job’s execution or upon its job execution. This is because, as stated in Lemma 1, the execution of completion. In the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )], EDF-Slack processes tasks in 𝐶𝑖 (𝐿, 𝐿) without deadline misses for all jobs is guaranteed under EDF. reverse EDF order, starting from the task with the latest deadline. Job Furthermore, when a job executes for more than 𝐶𝑖 (𝐿, 𝐿) under EDF- 𝐽𝑘 of 𝜏𝑘 has the highest priority at 𝑡𝑐 𝑢𝑟 , with 𝑑1 (𝑡𝑐 𝑢𝑟 ) being its deadline, BE, there is only one active job in the waiting queue at that time. In as EDF-Slack follows the EDF policy. The goal of the slack calculation the case of EDF-BE, when the first job of task 𝜏1 starts its execution in Algorithm 1 is to delay the execution of all other tasks 𝜏𝑖 ∈ 𝜏 ⧵ 𝜏𝑘 at 𝑡 = 0, it executes the minimum execution 𝐶1 (𝐿, 𝐿) = 8 until the beyond 𝑑1 (𝑡𝑐 𝑢𝑟 ) while ensuring that future deadlines are met. This is earliest deadline or future release at 𝑡 = 13, resulting in a slack of five. repeated for all tasks in the waiting queue. To ensure 𝜏𝑖 completes Utilizing this slack, the task 𝜏1 then executes 𝐶1 (𝑀 , 𝐿), and the aging 𝐶𝑖 (𝐿, 𝐿) before 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ), EDF-Slack calculates the maximum execution factor 𝑎𝑔 𝑒𝐷 𝑖 increases by one. For the first job of task 𝜏2 , released at time in the interval 𝑑1 (𝑡𝑐 𝑢𝑟 ), 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ), which is (1 − 𝑈 ) ⋅ (𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) − 𝑑1 (𝑡𝑐 𝑢𝑟 )), 𝑡 = 13, there is a slack of 4 until the earliest deadline or future release where 𝑈 denote the left-hand-side of Eq. (2). at 𝑡 = 25. Thus, 𝐶2 (𝑀 , 𝐿) is executed, and 𝑎𝑔 𝑒𝐷 𝑖 increases by one. The The key steps in the slack calculation are as follows: second job of task 𝜏1 , released at 𝑡 = 13, has a slack of 5 until the earliest deadline or future release at 𝑡 = 38. To balance 𝑎𝑔 𝑒𝐷 𝐴 𝑖 and 𝑎𝑔 𝑒𝑖 , 𝐶1 (𝐿, 𝑀) • 𝑞𝑖 is computed as the minimum execution of 𝜏𝑖 in the interval 𝐴 is executed, and 𝑎𝑔 𝑒𝑖 increases by one. The details of the online policy 𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 ) (Lines 3–4). that effectively balances the aging of detection and association for each • 𝑅𝐶𝑖 (𝐿, 𝐿) is either zero or 𝐶𝑖 (𝐿, 𝐿), since scheduling decisions task will be provided at the end of this section. As can be observed are only made upon job completion or release in non-preemptive from the figure, at each scheduling decision, an execution is performed scheduling (Line 4). 6 D. Kang et al. Journal of Systems Architecture 160 (2025) 103349 • The execution rate of 𝜏𝑖 in the interval 𝑑1 (𝑡𝑐 𝑢𝑟 ), 𝑑𝑖 (𝑡𝑐 𝑢𝑟 ) is calculated Algorithm 2 Determination of execution options and recorded (Line 5). 𝑡 Input: 𝜏, 𝑡𝑐 𝑢𝑟 , 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 • 𝑝 is set as the sum of the minimum execution times of all tasks Output: (𝑠𝑘 , 𝑓𝑘 ) 𝜏𝑖 ∈ 𝜏 (Line 6). 𝑡 1: if 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 ≤ 0 then • The slack is then determined as the remaining time slots, exclud- 2: return (𝐿, 𝐿) ing 𝑝 (i.e., the sum of 𝑞𝑖 ), within the interval 𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 ) (Line 3: else 7). 4: if 𝑎𝑔 𝑒𝐷 𝑘 ≤ 𝑎𝑔 𝑒𝐴 𝑘 then 𝑡 Example. Fig. 3(c) illustrates a scheduling scenario of EDF-Slack using 5: 𝑠𝑙𝑎𝑐 𝑘𝐷− 𝑘 = 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 − (𝐶𝑘𝐷 (𝐻) − 𝐶𝑘𝐷 (𝐿)) the same example tasks as shown in Figs. 3(a) and (b). The initial jobs 6: if 𝑠𝑙𝑎𝑐 𝑘𝐷− 𝑘 ≥ 0 then of 𝜏1 and 𝜏2 are released at 𝑡 = 0 and 𝑡 = 13, respectively. Applying 7: return (𝐻 , 𝑓𝑘 (𝑠𝑙𝑎𝑐 𝑘𝐷− 𝑘 + 𝐶𝑘𝐴 (𝐿))) Algorithm 1, the calculated slack value for 𝜏1 at 𝑡𝑐 𝑢𝑟 = 0 is 17, allowing 8: else 𝑡 the first job of 𝜏1 to execute for 𝐶1 (𝐻 , 𝐻) until 𝑡 = 25. Furthermore, 9: return (𝑠𝑘 (𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 + 𝐶𝑘𝐷 (𝐿)), 𝐿) 𝑎𝑔 𝑒𝐷 1 and 𝑎𝑔 𝑒𝐴1 increment by one. Subsequently, the first job of 𝜏2 begins 10: end if its execution at 𝑡 = 25, executing for 𝐶2 (𝑀 , 𝐿) while increasing 𝑎𝑔 𝑒𝐷 2 11: else 𝑡 by one. Finally, the second job of 𝜏2 starts its execution at 𝑡 = 37. 12: 𝑠𝑙𝑎𝑐 𝑘𝐴− 𝑘 = 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 − (𝐶𝑘𝐴 (𝐻) − 𝐶𝑘𝐴 (𝐿)) 𝐴− if 𝑠𝑙𝑎𝑐 𝑘𝑘 ≥ 0 then Comparing Fig. 3(b) that represents EDF-BE with Fig. 3(c) depicting 13: EDF-Slack, we observe that the aging of 𝜏1 and 𝜏2 increases in the same 14: return (𝑠𝑘 (𝑠𝑙𝑎𝑐 𝑘𝐴− 𝑘 + 𝐶𝑘𝐷 (𝐿)), 𝐻) amount in both cases. However, the key difference lies in the execution 15: else 𝑡 of the first job of 𝜏1 . Under EDF-Slack, this job is able to execute with a 16: return (𝐿, 𝑓𝑘 (𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 + 𝐶𝑘𝐴 (𝐿))) high-workload execution, while under EDF-BE, it can only execute with 17: end if a middle-workload execution, which allows for higher expectations of 18: end if tracking accuracy in EDF-Slack. 19: end if The following proves the timing guarantee of EDF-Slack. Theorem 2. A task set 𝜏 that satisfies Eq. (2) is schedulable under EDF-Slack . • If the slack is less than or equal to zero, the algorithm returns 𝐿 and 𝐿 (Lines 1–2). Proof. We prove this by contradiction. Assume, for the sake of con- • Otherwise, the algorithm compares the ages of the detection step tradiction, that the task set 𝜏 satisfies Eq. (2), but is not schedulable (𝑎𝑔 𝑒𝐷 𝑘 ) and the association step (𝑎𝑔 𝑒𝐴 𝑘 ) (Lines 3–4). under EDF-Slack. This implies that at some time 𝑡, the total utilization exceeds 1.0, and hence a deadline miss occurs for some job 𝐽𝑖 in 𝜏. – If 𝑎𝑔 𝑒𝐷 𝑘 is smaller than 𝑎𝑔 𝑒𝐴 𝑘 , indicating the detection step Let 𝑡𝑚𝑖𝑠𝑠 denote the earliest such time at which a deadline miss occurs, requires more resources, the algorithm calculates 𝑠𝑙𝑎𝑐 𝑘𝐷− 𝑘 , i.e., 𝑡𝑚𝑖𝑠𝑠 = 𝑑𝑖 , where 𝑑𝑖 is the deadline of 𝐽𝑖 . By the definition of EDF- representing the remaining slack after executing Slack, at each time 𝑡, the slack time for each task is computed based high-workload detection (Line 5). on the highest-priority job 𝐽1 (𝑡𝑐 𝑢𝑟 ), where 𝑡𝑐 𝑢𝑟 denotes the current time. – If 𝑠𝑙𝑎𝑐 𝑘𝐷− 𝑘 is greater than or equal to zero, the high-workload Since no tasks are released in the interval [𝑡𝑐 𝑢𝑟 , 𝑑1 (𝑡𝑐 𝑢𝑟 )], the slack time detection is followed by middle- or high-workload associa- ensures that lower-priority tasks cannot block the execution of 𝐽1 . As a tion depending on 𝑠𝑙𝑎𝑐 𝑘𝐷− 𝑘 (Lines 6–7). In this case, 𝑓𝑘 (𝑥) is result, the blocking term in Eq. (2) remains valid during this interval. set as follows: Now, since EDF-Slack is based on EDF scheduling, the total utiliza- ∗ 𝐿 for 𝑥 < 𝐶𝑘𝐴 (𝑀), tion 𝑈 (𝑡) at any time 𝑡 can be expressed as: ∑ ∗ 𝑀 for 𝐶𝑘𝐴 (𝑀) ≤ 𝑥 < 𝐶𝑘𝐴 (𝐻), 𝐶𝑖 𝑈 (𝑡) = + 𝐵(𝑡), ∗ 𝐻 for 𝑥 ≥ 𝐶𝑘𝐴 (𝐻). 𝑑 𝐽 ∈𝜏(𝑡) 𝑖 − 𝑡𝑐 𝑢𝑟 𝑖 where 𝐶𝑖 is the remaining execution time of task 𝐽𝑖 , and 𝐵(𝑡) is the – If 𝑠𝑙𝑎𝑐 𝑘𝐷− 𝑘 is less than zero, the algorithm determines if blocking term. According to Eq. (2), 𝑈 (𝑡) ≤ 1.0 for all 𝑡. Since 𝑡𝑚𝑖𝑠𝑠 is middle- or high-workload detection can be performed based 𝑡 the earliest time a deadline miss occurs, we must have 𝑈 (𝑡𝑚𝑖𝑠𝑠 ) > 1.0. on 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 , followed by low-workload association (Lines However, by Eq. (2), we know that 𝑈 (𝑡) ≤ 1.0 for all 𝑡 ≥ 𝑡𝑐 𝑢𝑟 , including 8–10). In this case, 𝑠𝑘 (𝑥) is set as follows: 𝑡𝑚𝑖𝑠𝑠 . This leads to a contradiction, as the assumption that 𝑈 (𝑡𝑚𝑖𝑠𝑠 ) > ∗ 𝐿 for 𝑥 < 𝐶𝑘𝐷 (𝑀), 1.0 contradicts the fact that 𝑈 (𝑡) ≤ 1.0 holds at all times. Therefore, ∗ 𝑀 for 𝐶𝑘𝐷 (𝑀) ≤ 𝑥 < 𝐶𝑘𝐷 (𝐻), no deadline miss can occur, and the task set 𝜏 is schedulable under EDF-Slack. □ ∗ 𝐻 for 𝑥 ≥ 𝐶𝑘𝐷 (𝐻). Note that the proof is self-contained, but a different proof can be • Lines 11–18 follow a similar procedure for determining the ex- found in [5]. ecution options, giving preference to the association step. Here, Determination of execution options. EDF-BE and EDF-Slack use 𝑠𝑙𝑎𝑐 𝑘𝐷− represents the remaining slack after executing the high- 𝑘 different slack concepts to ensure timely execution of tasks while im- workload association. proving tracking accuracy by executing beyond the minimum According to the definition of aging, 𝑎𝑔 𝑒𝐷 𝑘 (and 𝑎𝑔 𝑒𝐴 𝑘 ) increase (i.e., 𝐶𝑖 (𝐿, 𝐿)). As shown in Figs. 3(b) and (c), both EDF-BE and by one when middle- or high-workload detection (and association) is EDF-Slack enhance the aging of detection and association through performed. predefined mechanisms. The goal of these mechanisms is to balance the DNN-SAM proposed in [10] introduces two scheduling algorithms: aging of detection and association, minimizing continuous omissions in EDF-MandFirst and EDF-Slack. Unlike CA-MOT, both DNN-SAM al- updating motion and feature information, thereby maximizing tracking gorithms target multi-object detection (MOD) tasks. The primary dis- accuracy. tinction between MOT and MOD lies in the presence or absence of Algorithm 2 outlines the process for determining the execution dependencies between consecutive frames. In MOD, the detection oper- options for the detection and association steps of task 𝜏𝑘 at time 𝑡𝑐 𝑢𝑟 ation for a given frame does not utilize any information from previous 𝑡 based on the slack 𝑠𝑙𝑎𝑐 𝑘𝑘𝑐 𝑢𝑟 calculated in Algorithm 1. frames. Therefore, techniques that rely on previous frame information, 7 D. Kang et al. Journal of Systems Architecture 160 (2025) 103349 such as aging-aware methods, cannot be employed in the DNN-SAM al- Table 2 gorithms. Another key difference is that DNN-SAM is responsible solely Execution time measurement (average and maximum) in terms of image size, feature size, and scheduling overhead. for detection execution and does not handle the association task. Both Time (ms) 𝐶𝑖𝐷 𝐶𝑖𝐴 𝐶𝑖𝑠𝑐 ℎ𝑒 DNN-SAM and CA-MOT algorithms are based on EDF and prioritize executing jobs with the earliest deadlines among the released tasks. L M H L M H However, in contrast to CA-MOT, DNN-SAM splits each job at release Average 28.0 30.6 36.7 8.3 63.4 74.4 0.3 Maximum 43.6 53.5 67.6 11.3 74.0 125.2 0.6 into a mandatory job, responsible for execution in the safety-critical area, and an optional job, responsible for execution in non-critical areas. When any mandatory job is present in the waiting queue, it is always executed first using the EDF algorithm. The distinction be- tween MandFirst and EDF-Slack arises from whether the execution of an optional job may interfere with the execution of a mandatory job. Specifically, the scheduling behavior of DNN-SAM and EDF-Slack operates as follows: • EDF-MandFirst in [10]: Any mandatory job in the waiting queue has a higher priority than optional jobs and is scheduled using EDF. If no mandatory jobs are in the queue, optional jobs are executed using EDF, ensuring that they do not interfere with the Fig. 4. Comparison for two tasks with the periods (equal to the relative deadlines) of execution of future release jobs of mandatory tasks. 180 ms and 270 ms. • EDF-Slack in [10]: Any mandatory job in the waiting queue has a higher priority than optional jobs and is scheduled using EDF. If no mandatory jobs are in the queue, optional jobs are executed the ground truth of objects in all frames with the tracking results using EDF, potentially interfering with the execution of future- obtained from the given techniques to measure accuracy. The release mandatory jobs, within the slack calculated from the job’s KITTI dataset consists solely of data captured from forward-facing runtime. cameras and does not utilize different cameras, meaning there • EDF-BE of CA-MOT: A job is not split and has three execu- is no overlap in the areas they cover. Additionally, as it does tion options for both detection and association. It is executed not assume simultaneous capture by each camera, there are no with the maximum workload option to avoid interfering with synchronization issues. CA-MOT aims to maximize the average the execution of future release jobs, but only when exactly one accuracy for the MOT tasks corresponding to all given cameras job is present in the waiting queue. The aging of detection and without missing any deadlines. This assumes that CA-MOT oper- association tasks is considered for accuracy maximization. ates independently of camera interdependencies, with all cameras • EDF-Slack of CA-MOT: A job is not split and has three execution receiving the same forward-facing camera feed. options for both detection and association. Regardless of the • Execution time measurement: To obtain the WCET of different number of jobs in the waiting queue, the job is executed with execution options for detection and association, we measured the the maximum workload option, potentially interfering with the execution time by iterating 1000 times for each sub-tasks with execution of future release jobs based on the slack calculated three different execution options of an MOT task and then took from its runtime. The aging of detection and association tasks is the largest value. We also measured the worst-case time required considered for accuracy maximization. for slack calculation and scheduling decisions such as Algorithms 1 and 2. Table 2 shows the measurement results. 5. Evaluation 5.2. Experiment result This section evaluates the effectiveness of CA-MOT in achieving R1 and R2 for multiple MOT tasks. We consider task sets in which schedulability is not guaranteed with the high-workload execution for detection and association 5.1. Experiment setting (i.e., 𝐶𝑖 (𝐻 , 𝐻)) for all tasks but is guaranteed with the minimum execution (𝐶𝑖 (𝐿, 𝐿)) according to Eq. (2). Note that the schedulability • Software: CA-MOT employs the tracking-by-detection of which with 𝐶𝑖 (𝑥, 𝑦) for 𝑥, 𝑦 ∈ {𝐿, 𝑀 , 𝐻} can be judged with Eq. (2) by the detector is one of the most popular detectors, YOLOv5 [14] substituting 𝐶𝑖 (𝐿, 𝐿) to 𝐶𝑖 (𝑥, 𝑦). To evaluate the effectiveness of CA- model, and tracker is StrongSORT [7]. We confirmed that other MOT we consider the following including a baseline and our two detectors (i.e. YOLOX, Faster-RCNN) exhibit a similar trend to proposed approaches. YOLOv5 in terms of MOTA and execution time, as shown in Fig. 7. For feature extraction conducted as a part of association, we used • Detection first (DF): non-preemptive EDF in which the execution OS-Net [10]. The YOLOv5 model was pretrained on the COCO option of all tasks 𝜏𝑖 ∈ 𝜏 is equally fixed to the rightmost Dataset [18], while OS-Net was pretrained on the MSMT Dataset one among {𝐶𝑖 (𝐿, 𝐿), 𝐶𝑖 (𝑀 , 𝐿), 𝐶𝑖 (𝐻 , 𝐿), 𝐶𝑖 (𝐻 , 𝑀), 𝐶𝑖 (𝐻 , 𝐻)} that [19]. The experimental environment is with Ubuntu 18.04.6 LTS, satisfies the schedulability condition in Eq. (2). CUDA 11.4, and PyTorch 1.12. • EDF-BE: EDF-BE of which task set passes the schedulability con- • Hardware: We consider the NVIDIA Jetson Xavier as a GPU- dition in Eq. (2), which is proposed in Section 4.2. enabled embedded board [20]. The NVIDIA Jetson Xavier features • EDF-Slack: EDF-Slack of which task set passes the schedulability a 64-bit 8-core CPU, 32 GB Memory, and 512-core Volta GPU. We condition in Eq. (2), which is proposed in Section 4.3. utilized the MAXN mode provided by the NVIDIA Jetson Xavier. • Dataset and performance metric: We used the KITTI Dataset Fig. 4 represents the tracking accuracy and the proportion of three [13], which contains data collected from autonomous vehicle execution options (i.e., 𝐿, 𝑀, and 𝐻) selected during detection and driving. To evaluate the accuracy of each region, we measured association for two tasks with different periods: 180 and 270 ms the MOTA [15] as the most well-known performance metric for (milliseconds). As shown in Fig. 4(a), for overall accuracy, EDF-BE and tracking accuracy for critical and entire regions. MOTA compares EDF-Slack achieve 20.2% and 26.6%, respectively, while DF achieves 8 D. Kang et al. Journal of Systems Architecture 160 (2025) 103349 Fig. 5. Comparison for four tasks with the same period (equal to the relative deadline) Fig. 6. Visualization on KITTI dataset for three tasks with the periods of 400 ms. (For of 400 ms. interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.) 13.4%, which demonstrate the effectiveness of slack utilization and balancing aging of detection and association in increasing tracking accuracy. We observe that the slack reclamation performed by Algo- rithm 1 in EDF-Slack is significantly more effective in achieving high tracking accuracy than in EDF-BE which has limitations in obtaining a substantial amount of slack. For critical accuracy, EDF-BE and EDF- Slack achieve much higher accuracies, which are 28.3% and 32.2%, respectively, compared to 15.4% of DF. Based on this observation, we can interpret that even though EDF-BE obtains a smaller amount Fig. 7. MOTA and execution time on other detectors. of slack compared to EDF-Slack, it efficiently performs tracking for the critical region with limited computing resources. On the other hand, EDF-Slack provides high tracking accuracy not only for the entire region but also for the safety-critical region, thanks to its efficient slack Additional experiments are conducted to ascertain if CA-MOT ex- reclamation. As seen in Fig. 4, EDF-Slack exhibits a significantly higher hibits comparable behavioral patterns across a range of detectors, proportion of high-workload execution and middle-workload execution including YOLOv5, which was evaluated previously. Fig. 7 displays for detection and association compared to other execution options. On the MOTA and execution time for various contemporary detectors, the other hand, EDF-BE shows a slight proportion of middle-workload analyzed according to their workload. Modern detectors are generally execution, while the majority of cases involve low-workload execution. classified into one-stage and two-stage categories based on their archi- Fig. 5 depicts the results of another experiment involving three tecture and further into anchor-free and anchor-based types, contingent different sets of tasks, with the number of tasks ranging from two on their use of predefined anchors for object detection. Our study to four, all having the same periods (i.e., 400 ms with a guaranteed incorporated YOLOv5, a standard one-stage anchor-based detector. We minimum execution 𝐶𝑖 (𝐿, 𝐿), but no guaranteed maximum execution also investigate the performance of the two-stage anchor-based detector 𝐶𝑖 (𝐻 , 𝐻) for 𝜏𝑖 ∈ 𝜏). In Fig. 5(a), the tracking accuracy of the evaluated Faster-RCNN [9] and the one-stage anchor-free detector YOLOX [21], approaches is shown as the number of tasks increases. For the case to verify the consistency of results. Faster-RCNN utilized ResNet-50 of two tasks, EDF-Slack achieves an overall accuracy of 41.8% and a as its backbone network, while YOLOX was configured with a small critical accuracy of 41.4%, while EDF-BE achieves an overall accuracy version model. Both models were trained using the COCO dataset. of 24.3% and a critical accuracy of 27.2%. In contrast, DF achieves Despite minor discrepancies in specific ratios, the results consistently lower accuracy, with an overall accuracy of 18.0% and a critical demonstrate that both MOTA and execution time escalate in conjunc- accuracy of 18.7%. As the number of tasks increases, both EDF-BE and tion with increasing workload, as shown in Figs. 7(a) and (b). The EDF-Slack experience a decrease in accuracy, but they still outperform runtime trend of YOLOX is particularly noteworthy, which closely DF in terms of tracking accuracy. Even with only four tasks, EDF- mirrors that of YOLOv5. This pattern indicates that similar outcomes BE yields lower overall accuracy than DF, as it can only detect part may be expected from other detectors akin to YOLOv5. of the image when selecting the high workload option. Nevertheless, by prioritizing computations in critical regions at low and medium 6. Related work workloads, EDF-BE attains higher critical accuracy than DF. Fig. 5(b) presents the distribution of execution options for EDF-BE and EDF-Slack The tracking-by-detection model is a commonly used method in when there are three tasks. Similar to Fig. 4, it is evident that both the MOT field. It has shown significant progress and enhanced perfor- EDF-BE and EDF-Slack allocate the workload between the detection and association steps in a balanced manner using the ages. Additionally, mance recently, largely thanks to the evolution of deep neural networks EDF-Slack can reclaim more slack compared to EDF-BE. (DNNs). A well-recognized model in this field, SORT (simple online Fig. 6 presents the tracking outcomes of a single task within a set and real-time tracking) [22], does its matching based mainly on where of three tasks, each with a period of 400 ms, comparing (a) the DF objects are located, using detection tools to achieve this. To push this algorithm and (b) EDF-Slack. In the visualization, each tracked object model further, DeepSORT [6] builds on the SORT model by adding a is represented by a unique color and ID within a bounding box, with DNN-based re-identification model. This allows for the extraction of ob- the symbol ‘‘#’’ indicating the frame number. In the DF scenario, each ject features. By adding this layer, DeepSORT utilizes both the object’s task executes 𝐶𝑖 (𝐻 , 𝐿), leading to insufficient computational resources location and its visual information, leading to a stronger performance. for proper association. This inadequacy results in DF’s failure to track Recent work in this area, such as Deep OC-SORT [23] and Strong- two objects within the safety-critical region in the 167th frame and SORT [7], is geared towards enhancing the accuracy of these models causes an ID switch from 4 to 10 in the subsequent 168th frame, even more, focusing especially on refining and improving the matching as illustrated in Fig. 6. Conversely, EDF-Slack leverages aging and algorithms used in these systems. However, it is critical to understand slack techniques to allocate sufficient computational resources for both that these approaches are mainly designed for situations where there detection and association tasks, enabling accurate tracking of all objects are plenty of computing resources. Therefore, they might struggle to in the safety-critical region. meet the timing needs in systems that are restricted in resources, like 9 D. Kang et al. Journal of Systems Architecture 160 (2025) 103349 the embedded systems in self-driving vehicles where resources may be Declaration of competing interest scarce. Considering self-driving vehicles, which are fundamentally systems The authors declare the following financial interests/personal rela- where safety is critical, even the smallest delays or slight drops in tionships which may be considered as potential competing interests: accuracy can lead to significant and potentially dangerous risks. Some Hyeongboo Baek reports financial support was provided by National research such as DNN-SAM [5] has tried to tackle these problems by Research Foundation of Korea. If there are other authors, they declare suggesting frameworks that concentrate specifically on safety-critical that they have no known competing financial interests or personal areas. These frameworks give priority to critical accuracy and use relationships that could have appeared to influence the work reported uncertainty handling to ensure the highest safety standards. However, in this paper. these research studies and their related approaches are mostly designed for multi-object detection systems and may not directly apply to or be Acknowledgments effective in multi-object tracking. Likewise, another study, RT-MOT [2], aims to maximize the overall accuracy of multi-object tracking and This work was supported by the National Research Foundation of ensure on-time execution, but it overlooks the importance of individual Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023- objects in its approach. To address these limitations, our suggested 00250742, 2022R1A4A3018824, RS-2024-00438248). This work was framework, known as CA-MOT, aims to confront these challenges partly supported by the Institute of Information & Communications directly. By leveraging the unique traits of multi-object tracking in Technology Planning & Evaluation(IITP)-ITRC(Information Technol- safety-critical systems, CA-MOT ensures on-time execution and boosts ogy Research Center) grant funded by the Korea government(MSIT) tracking accuracy for objects that could potentially be dangerous to the (IITP-2025-RS-2023-00259061). system. It builds on previous work while addressing their weaknesses to create a safer and more efficient tracking system. Data availability Data will be made available on request. 7. Discussion A limitation of CA-MOT is its exclusive reliance on a single CPU and References GPU, which restricts scalability. A recent approach, Batch-MOT [3], addresses this limitation by processing input images from multiple [1] M. Yang, S. Wang, J. Bakita, T. Vu, F.D. Smith, J.H. Anderson, J.-M. Frahm, Re-thinking CNN frameworks for time-sensitive autonomous-driving applications: cameras through a shared queue, distributing CPU operations across Addressing an industrial challenge, in: Proceedings of IEEE Real-Time Technology multiple CPUs, and employing batch processing on a single GPU. How- and Applications Symposium, IEEE, 2019, pp. 305–317. ever, this approach may introduce additional communication overhead [2] D. Kang, S. Lee, H.S. Chwa, S.-H. Bae, C.M. Kang, J. Lee, H. Baek, RT-MOT: among CPUs, potentially determining its overall efficiency. The primary Confidence-aware real-time scheduling framework for multi-object tracking tasks, in: Proceedings of IEEE Real-Time Systems Symposium, IEEE, 2022, pp. 318–330. contribution of Batch-MOT lies in its online schedulability analysis, [3] D. Kang, S. Lee, C.-H. Hong, J. Lee, H. Baek, Batch-MOT: Batch-enabled real- which dynamically determines the maximum number of images that time scheduling for multi-object tracking tasks, IEEE Trans. Comput.-Aided Des. can be batch-processed without violating their deadlines. Nonethe- Integr. Circuits Syst. (2024). less, unlike CA-MOT, Batch-MOT lacks support for multiple execution [4] S. Liu, X. Fu, M. Wigness, P. David, S. Yao, L. Sha, T. Abdelzaher, Self-cueing real-time attention scheduling in criticality-aware visual machine perception, in: strategies during the association phase, resulting in suboptimal resource Proceedings of IEEE Real-Time Technology and Applications Symposium, IEEE, utilization for individual MOT tasks. Enhancing CA-MOT by incor- 2022, pp. 173–186. porating batch processing capabilities to address these shortcomings [5] W. Kang, S. Chung, J.Y. Kim, Y. Lee, K. Lee, J. Lee, K.G. Shin, H.S. Chwa, presents a promising avenue for future research. Furthermore, as high- DNN-SAM: Split-and-merge DNN execution for real-time object detection, in: Proceedings of IEEE Real-Time Technology and Applications Symposium, 2022, lighted in previous studies, deploying CA-MOT on real-world platforms, URL https://rtcl.dgist.ac.kr/index.php/publication-2/. such as the F1/10 autonomous driving platform [5], offers significant [6] N. Wojke, A. Bewley, D. Paulus, Simple online and realtime tracking with a potential for further investigation and practical validation. deep association metric, in: Proceedings of the IEEE International Conference on Image Processing, IEEE, 2017, pp. 3645–3649. [7] Y. Du, Y. Song, B. Yang, Y. Zhao, StrongSORT: Make deepsort great again, 2022, 8. Conclusion arXiv preprint arXiv:2202.13514. [8] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: Unified, real- time object detection, in: Proceedings of the IEEE/CVF Conference on Computer In this paper, we proposed, CA-MOT, a new criticality-aware MOT Vision and Pattern Recognition, 2016, pp. 779–788. execution and scheduling framework. Aiming at achieving critical- [9] R. Girshick, Fast R-CNN, in: Proceedings of the IEEE International Conference accuracy maximization and timing guarantee, CA-MOT first proposes on Computer Vision, 2015, pp. 1440–1448. a new system design to offer a control knob between tracking accuracy [10] K. Zhou, Y. Yang, A. Cavallaro, T. Xiang, Omni-scale feature learning for person re-identification, in: Proceedings of the IEEE International Conference and timing guarantee to efficiently utilize limited computing resources. on Computer Vision, 2019, pp. 3702–3712. Then, CA-MOT develops two scheduling algorithms to effectively uti- [11] Y. Zhang, P. Sun, Y. Jiang, D. Yu, F. Weng, Z. Yuan, P. Luo, W. Liu, X. lize the system design while using the notions of slack and aging of Wang, ByteTrack: Multi-object tracking by associating every detection box, in: detection and association. Using various task sets and real-world au- Proceedings of the European Conference on Computer Vision, Springer, 2022, pp. 1–21. tonomous driving data, we demonstrated that CA-MOT can obtain high [12] Y. Zhang, C. Wang, X. Wang, W. Zeng, W. Liu, FairMOT: On the fairness of tracking accuracy of entire and safety-critical regions while ensuring detection and re-identification in multiple object tracking, Int. J. Comput. Vis. the timely execution of all MOT tasks. 129 (11) (2021) 3069–3087. [13] A. Geiger, P. Lenz, R. Urtasun, Are we ready for autonomous driving? The KITTI vision benchmark suite, in: Proceedings of the IEEE/CVF Conference on CRediT authorship contribution statement Computer Vision and Pattern Recognition, 2012. [14] ultralytics, YOLOv5 [Online], 2022, Available: https://github.com/ultralytics/ yolov5. Donghwa Kang: Writing – original draft, Software, Methodology, [15] K. Bernardin, R. Stiefelhagen, Evaluating multiple object tracking performance: Formal analysis. Jinkyu Lee: Writing – review & editing, Valida- the clear mot metrics, EURASIP J. Image Video Process. 2008 (2008) 1–10. tion, Formal analysis. Hyeongboo Baek: Writing – review & editing, [16] G. Welch, G. Bishop, et al., An introduction to the Kalman filter, ACM SIGGRAPH Supervision, Funding acquisition, Formal analysis, Conceptualization. (1995). 10 D. Kang et al. Journal of Systems Architecture 160 (2025) 103349 [17] T.P. Baker, A stack-based resource allocation policy for realtime processes, in: Jinkyu Lee is an associate professor in the Department Proceedings of IEEE Real-Time Systems Symposium, IEEE, 1990, pp. 191–200. of Computer Science and Engineering at Sungkyunkwan [18] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, University (SKKU), South Korea, where he joined in 2014. C.L. Zitnick, Microsoft COCO: Common objects in context, in: Proceedings of the He received the BS, MS, and Ph.D. degrees in computer European Conference on Computer Vision, Springer, 2014, pp. 740–755. science from the Korea Advanced Institute of Science and [19] L. Wei, S. Zhang, W. Gao, Q. Tian, Person transfer gan to bridge domain gap Technology (KAIST), South Korea, in 2004, 2006, and 2011, for person re-identification, in: Proceedings of the IEEE/CVF Conference on respectively. He has been a research fellow/visiting scholar Computer Vision and Pattern Recognition, 2018, pp. 79–88. in the Department of Electrical Engineering and Computer [20] NVIDIA, NVIDIA Xavier Developer Kit. [Online], 2022, Available: https://www. Science, University of Michigan until 2014. His research nvidia.com/en-us/autonomous-machines/embedded-systems/jetson-agx-xavier. interests include system design and analysis with timing [21] Z. Ge, S. Liu, F. Wang, Z. Li, J. Sun, Yolox: Exceeding yolo series in 2021, 2021, guarantees, QoS support, and resource management in real- arXiv preprint arXiv:2107.08430. time embedded systems and cyber–physical systems. He won [22] A. Bewley, Z. Ge, L. Ott, F. Ramos, B. Upcroft, Simple online and realtime track- the best student paper award from the 17th IEEE Real-Time ing, in: Proceedings of the IEEE International Conference on Image Processing, and Embedded Technology and Applications Symposium IEEE, 2016, pp. 3464–3468. (RTAS) in 2011 and the Best Paper Award from the 33rd [23] G. Maggiolino, A. Ahmad, J. Cao, K. Kitani, Deep oc-sort: Multi-pedestrian IEEE Real-Time Systems Symposium (RTSS) in 2012. tracking by adaptive re-identification, 2023, arXiv preprint arXiv:2302.11813. Hyeongboo Baek is an associate professor in the Depart- Dongwha Kang is a Ph.D. course student in the School ment of Artificial Intelligence, University of Seoul (UOS), of Computing, Korea Advanced Institute of Science and South Korea. He received the BS degree in Computer Science Technology (KAIST), South Korea. He received a BS and and Engineering from Konkuk University, South Korea, in MS degree in computer science from Incheon National Uni- 2010 and the MS and Ph.D. degrees in Computer Science versity (INU) in 2022 and 2024 respectively. His research from KAIST, South Korea, in 2012 and 2016, respectively. interests include artificial intelligence, autonomous systems, His research interests include cyber–physical systems, real- and real-time embedded systems. time embedded systems, and system security. He won the best paper award from the 33rd IEEE Real-Time Systems Symposium (RTSS) in 2012. 11