Computer Standards & Interfaces 97 (2026) 104098
Contents lists available at ScienceDirect
Computer Standards & Interfaces
journal homepage: www.elsevier.com/locate/csi
An autonomous deep reinforcement learning-based approach for memory
configuration in serverless computing
Zahra Shojaee Rad, Mostafa Ghobaei-Arani *, Reza Ahsan
Department of Computer Engineering, Qo.C., Islamic Azad University, Qom, Iran
ARTICLE INFO

Keywords: Serverless computing; Memory configuration; Deep reinforcement learning; Autonomous computing; Function-as-a-service

ABSTRACT

Serverless computing has become very popular in recent years due to its cost savings and flexibility. It is a cloud computing model that allows developers to create and deploy code without having to manage the infrastructure, and it has been embraced for its scalability, cost savings, and ease of use. However, memory configuration is one of the important challenges in serverless computing due to the transient nature of serverless functions, which are stateless and ephemeral. In this paper, we propose an autonomous approach for memory configuration, called Auto Opt Mem, that uses deep reinforcement learning and a reward mechanism. In the Auto Opt Mem mechanism, the system learns to allocate memory resources to serverless functions in a way that balances overall performance and minimizes wastage of resources. Finally, we validate the effectiveness of our solution; the findings reveal that the Auto Opt Mem mechanism enhances resource utilization, reduces operating cost and latency, and improves quality of service (QoS). Our experiments demonstrate that the Auto Opt Mem mechanism achieves 16.8 % lower latency and an 11.8 % cost reduction compared to static allocation, and a 6.8 % improvement in QoS, resource utilization, and memory-allocation efficiency compared with baseline methods.
* Corresponding author.
E-mail address: mo.ghobaei@iau.ac.ir (M. Ghobaei-Arani).
https://doi.org/10.1016/j.csi.2025.104098
Received 2 May 2025; Received in revised form 16 November 2025; Accepted 17 November 2025
Available online 19 November 2025
0920-5489/© 2025 Elsevier B.V. All rights are reserved, including those for text and data mining, AI training, and similar technologies.

1. Introduction

Serverless computing has emerged as an extended cloud computing model that offers many advantages in flexibility, scalability, and cost efficiency [1]. By separating out the management of the underlying infrastructure, developers can focus on writing and deploying code without worrying about server provisioning or maintenance. There has been a lot of progress in various areas related to serverless computing [2]. One aspect is Function as a Service (FaaS), which has increasingly been associated with a variety of applications, including video streaming platforms [3], multimedia processing [4], Continuous Integration/Continuous Deployment (CI/CD) pipelines [5], Artificial Intelligence/Machine Learning (AI/ML) inference tasks [6], and query processing for Large Language Models (LLMs) [7]. FaaS is a serverless cloud computing model that allows developers to run small, manageable services in isolated environments called function instances [8].

Despite these advantages, memory configuration in serverless environments is a complex challenge due to the transient and stateless nature of serverless functions. Choosing the right amount of memory, or resource size, is important and challenging because it determines execution times and costs. A recent survey found that 47 % of serverless functions in production use the default memory size, indicating that developers often overlook the importance of resource sizing [9]. Traditional memory configuration methods are often manual settings or static allocation, which may lead to inefficiencies such as overprovisioning or underutilization. These inefficiencies can increase costs or decrease performance and affect the effectiveness of the serverless function. By employing deep learning models, memory configuration can be automated, leading to an efficient solution. Deep learning models analyze historical data to dynamically predict optimal memory settings, adapt to varying workloads, and minimize latency and cost. This approach uses the ability of deep learning to identify complex patterns and relationships in data, enabling more accurate and efficient resource management.

Recent research shows the importance of intelligent and autonomous systems in different domains. For instance, Arduino-based IoT automation systems have shown how lightweight and adaptive architectures can improve efficiency and minimize manual intervention in constrained environments [10]. Likewise, autonomous AI frameworks for fraud detection on the Dark Web demonstrate the ability of self-learning mechanisms to adapt to dynamic and unpredictable conditions [11]. These advances further motivate the need for AI-driven autonomic
solutions, such as our proposed Auto Opt Mem framework for serverless memory configuration.

1.1. Research gap and motivation

Previous approaches are often static or limited to a specific platform. They do not meet the need for real-time adaptability to changing workloads, relying solely on statistical modeling and lacking the ability to adapt in real time. The direct relationship between cost, latency, and QoS has rarely been considered in a comprehensive framework. Our work addresses these research gaps by introducing Auto Opt Mem, based on the MAPE loop and DRL.

The motivation for this study is that many functions still use default memory values, resulting in wasted resources, increased costs, and reduced quality of service (QoS). For example, 47 % of users rely on default memory settings, which emphasizes the importance of intelligent and adaptive optimization [9]. Addressing this gap is essential to improve performance and cost-effectiveness in serverless environments.

1.2. Our approach

In this paper, we propose an autonomous memory configuration with a deep learning model to predict memory settings, based on a combination of the concepts of autonomic computing and deep reinforcement learning (DRL), with the aim of increasing performance and cost-effectiveness. To realize autonomic computing, IBM has introduced a reference framework for autonomic control loops known as the MAPE (Monitor, Analyze, Plan, Execute) loop [12,13]. This MAPE control loop resembles the general agent model put forth by Russell and Norvig [14], where an intelligent agent observes its surroundings through sensors and utilizes these observations to decide on actions to take within that environment. The proposed approach follows the MAPE control loop, which consists of four phases: monitoring (M), analysis (A), planning (P), and execution (E). First, in the monitoring phase, the system observes and collects the current state of the environment (memory in serverless functions). In the analysis phase, the agent, which is a deep neural network, analyzes the observed state and updates its policy based on it and the reward received. In the planning phase, the agent schedules an action (i.e., a memory configuration) based on the learned policy. In the execution phase, the scheduled action is applied to the environment. We utilize Deep Reinforcement Learning (DRL) [15,16] as a decision-making tool that leverages the predicted outcomes from the analysis phase to determine the best memory configuration during the planning phase. Reinforcement Learning (RL) is a self-learning approach that enhances its effectiveness by continuously interacting with the cloud environment.

1.3. Main contributions

The main contributions of this research can be summarized as follows:

• We propose an autonomic method using deep reinforcement learning to predict memory configuration; this method operates based on a reward mechanism.
• We designed a multi-objective reward normalization mechanism that simultaneously balances latency, cost, utilization, and QoS.
• We integrated the MAPE-K control loop with deep reinforcement learning (DRL) to enable closed-loop online adaptation.
• Auto Opt Mem supports real-time continuous learning across varying workloads, which clearly differentiates it from static or offline ML-based predictors.
• Experiments validate the effectiveness of the proposed method and demonstrate performance improvements in metrics such as latency and cost.

1.4. Paper organization

This paper is organized into several sections: Section 2 reviews related memory configuration methods. Section 3 offers background information. Section 4 presents a comprehensive explanation of the proposed solution. Section 5 assesses and discusses the experimental results. Section 6 presents the discussion. Section 7 presents the conclusions with our findings and outlines future research directions.

2. Related works

This section discusses memory configuration approaches in serverless computing. These approaches are categorized into three main groups: machine learning-based approaches, heuristic-based approaches, and framework-based approaches.

2.1. Machine learning-based approaches

Simon Eismann et al. [17] have presented an approach called "Sizeless" for predicting the optimal resource size for serverless functions in cloud computing, based on monitoring data from a single memory size. It highlights the challenges developers face in selecting resource sizes and shows that the method can achieve an average prediction error of 15.3 %, optimizing memory allocation for 79 % of functions, resulting in a 39.7 % speedup and a 2.6 % cost reduction. Anshul Jindal et al. [18] have presented a tool called FnCapacitor for estimating the Function Capacity (FC) of Function-as-a-Service (FaaS) functions in serverless computing environments. It overcomes performance challenges due to system abstractions and dependencies between functions. Through load testing and modeling, FnCapacitor provides accurate FC predictions using statistical methods and deep learning, demonstrating effectiveness on platforms like AWS Lambda and Google Cloud Functions. Djob Mvondo et al. [19] have presented OFC, an in-memory caching system designed for Functions-as-a-Service (FaaS) platforms to improve performance by reducing latency during data access. It leverages machine learning to predict memory requirements for function invocations, utilizing otherwise wasted memory from over-provisioning and idle sandboxes. OFC demonstrates significant execution time improvements for both single-stage and pipelined functions, enhancing efficiency without requiring changes to existing application code. Myung-Hyun Kim et al. [20] have introduced ShmFaas, a serverless platform designed to improve memory utilization for deep neural network (DNN) inference by sharing models in-memory across containers. They address data duplication and cold start issues, particularly in resource-constrained edge cloud environments. Experimental results show that ShmFaas reduces memory usage by over 29.4 % compared to common systems, while maintaining negligible latency overhead and enhancing throughput. Siddharth Agarwal et al. [21] have presented MemFigLess, an input-aware memory allocation framework for serverless computing functions, designed to optimize resource usage and reduce costs. Using a multi-output Random Forest Regression model, it correlates the input features of the function with memory requirements, leading to accurate memory configuration. The evaluation shows that MemFigLess can significantly decrease resource allocation and save on runtime costs. Finally, Table 1 shows a comparison of machine learning-based approaches from related studies.

Table 1
Comparison of machine learning-based approaches.

[17]
  Metric: Execution time; Resource consumption; Performance overhead
  Method: Multi-target regression model; Monitoring data
  Advantage: Reduces execution time; Decreases cost; Requires no performance test
  Disadvantage: Limited to specific cloud providers
  Tools: AWS Lambda; Node.js

[18]
  Metric: Function Capacity (FC)
  Method: Statistical and deep learning approaches
  Advantage: Accurate FC predictions
  Disadvantage: Limited to specific FaaS platforms
  Tools: Python; FnCapacitor; Google Cloud Functions (GCF); AWS Lambda

[19]
  Metric: Function Capacity (FC)
  Method: OFC tool uses machine learning
  Advantage: Reduction in execution time; Cost-effective; Utilizes idle memory; Transparent
  Disadvantage: Overhead from cache management
  Tools: Python; Java; OFC; AWS

[20]
  Metric: Efficiency of memory usage
  Method: ShmFaas system shares DNN models in-memory
  Advantage: Reduces memory usage; Minimizes cold start delays; Minimal code changes
  Disadvantage: Complexity
  Tools: Python; ShmFaas; Kubernetes

[21]
  Metric: Memory utilization
  Method: Uses input-aware Random Forest Regression
  Advantage: Reduces memory allocation; Reduces costs
  Disadvantage: Limited to specific platforms; Overhead from monitoring
  Tools: AWS Lambda; Python; AWS CloudWatch

2.2. Heuristic-based approaches

Goor-Safarian et al. [22] have presented SLAM, a tool for optimizing memory settings for serverless applications consisting of multiple Function-as-a-Service (FaaS) functions. It addresses the issues in balancing cost and performance while meeting Service Level Objectives (SLOs). By utilizing distributed tracing, SLAM estimates execution times under various memory settings and identifies optimal configurations. Robert Cordingly et al. [23] presented a method called CPU Time Accounting Memory Selection (CPU-TAMS) to optimize memory configurations for serverless Function-as-a-Service (FaaS) platforms. CPU-TAMS uses CPU time accounting and regression modeling methods to provide recommendations that reduce execution time and costs. Tetiana Zubko et al. [24] presented MAFF (Memory Allocation Framework for FaaS functions), which is a framework to optimize memory allocation for serverless functions automatically. MAFF adapts memory settings based on function requirements and employs various algorithms to minimize costs and execution duration. The framework was tested on AWS Lambda, demonstrating improved performance compared to existing memory optimization tools. Josef Spillner [25] discussed resource management for serverless functions, focusing on memory tracking, profiling, and automatic tuning. The author outlines the issues that developers face in determining memory allocation due to coarse-grained configurations from cloud providers, and proposes tools to measure memory consumption and dynamically adjust allocations to reduce waste and costs and improve performance in Function-as-a-Service (FaaS) environments.

Zengpeng Li et al. [26] have presented algorithms for optimizing memory configuration in serverless workflow applications, specifically a heuristic urgency-based algorithm (UWC) and a meta-heuristic hybrid algorithm (BPSO). These algorithms aim to balance execution time and cost for serverless applications, to solve the challenges posed by memory allocation and performance modeling. Andrea Sabioni et al. [27] have introduced a shared-memory approach for function chaining on serverless platforms and proposed a container-based architecture that increases the efficiency of function composition on the same host. By using a message-oriented middleware that operates over shared memory, this approach reduces response latency and improves resource utilization. The results show performance improvements in request completion rates and reduced latency during function execution. Aakanksha Saha et al. [28] have presented EMARS, an efficient resource management system designed for serverless cloud computing, focusing on optimizing memory allocation for containers. Built on the OpenLambda platform, EMARS uses predictive models based on application workloads to adjust memory limits dynamically, enhancing resource utilization and reducing latency. Experiments demonstrate that tailored memory settings improve performance in serverless functions. Amit Samanta et al. [29] have discussed the issues and opportunities of integrating persistent memory (PM) into serverless computing. They show how PM's unique characteristics, such as its direct load/store access, can increase performance but also lead to bottlenecks when multiple threads concurrently write to it. They propose a PM-aware scheduling system for serverless workloads that optimizes job completion time by managing concurrent access and improving efficiency while ensuring fairness among applications. Meenakshi Sethunath et al. [30] have proposed a joint function warm-up and request routing scheme for serverless computing that optimally utilizes both edge and cloud resources. It addresses issues like high latency and cold-start delays by maximizing the hit ratio of requests, and it reduces latency while considering memory and budget constraints. Anisha Kumari et al. [31] have proposed a performance model for optimizing resource allocation in serverless applications, addressing issues like cost estimation and performance evaluation. It introduces a greedy optimization algorithm to improve end-to-end response time while considering budget constraints. They utilize serverless applications on AWS to analyze the trade-offs between performance and cost, demonstrating the model's effectiveness in finding optimal resource configurations. Finally, Table 2 shows a comparison of heuristic-based approaches from related studies.

Table 2
Comparison of heuristic-based approaches.

[22]
  Metric: Memory configuration effectiveness based on Service Level Objectives (SLOs)
  Method: Distributed tracing; Max-heap-based optimization algorithm
  Advantage: Balances cost and performance
  Disadvantage: Limited to specific platforms
  Tools: Python; AWS; SLAM

[23]
  Metric: Efficiency
  Method: CPU time accounting; Regression modeling
  Advantage: Reduces runtime; Reduces cost
  Disadvantage: Limited to specific FaaS platforms
  Tools: Python; AWS; GCF

[24]
  Metric: Effectiveness for FaaS functions
  Method: Linear, Binary, and Gradient Descent algorithms for self-adaptive memory optimization
  Advantage: Lower cost; Faster execution
  Disadvantage: Requires specific function profiling
  Tools: Python; AWS

[25]
  Metric: Efficiency; Cost optimization; Autoscaling resources
  Method: Utilizes memory tracing, profiling, and autotuning tools
  Advantage: Reduces cost; Improves performance
  Disadvantage: Requires extensive profiling data
  Tools: AWS; Functracer; Autotuner; costcalculator

[26]
  Metric: Efficiency
  Method: Heuristic (UWC) and meta-heuristic (BPSO) algorithms
  Advantage: Time-cost tradeoff; Optimal workflow
  Disadvantage: Complexity; Computational overhead
  Tools: AWS; Python; UWC; BPSO

[27]
  Metric: Response latency; Resource usage
  Method: Shared memory, using a message-oriented middleware
  Advantage: Improves request completion rates; Improves response time; Optimized resource usage
  Disadvantage: Limited to co-located functions; Complexity in managing shared memory
  Tools: Message-oriented middleware

[28]
  Metric: Memory allocation efficiency; Latency
  Method: Workload-based model; Memory-based model
  Advantage: Optimizes resource usage; Reduces latency
  Disadvantage: Complexity
  Tools: OpenLambda; Python

[29]
  Metric: Throughput; Job completion time (JCT)
  Method: Performance modeling and admission control for concurrent access
  Advantage: Improves throughput; Reduces latency
  Disadvantage: Complexity
  Tools: OpenFaaS; Intel Optane DCPMM

[30]
  Metric: Latency reduction; Request hit ratio
  Method: Joint function warm-up; Routing for edge and cloud collaboration
  Advantage: Reduces cold-start latency; Optimizes performance
  Disadvantage: Complexity; Dependent on accurate profiling
  Tools: AWS; Azure Functions

[31]
  Metric: End-to-end response time; Cost
  Method: Greedy optimization algorithm
  Advantage: Reduces latency; Reduces cost; Handles cold start delay
  Disadvantage: Complexity; Dependent on accurate profiling
  Tools: Amazon Web Services (AWS)

2.3. Framework-based approaches

Anjo Vahldiek-Oberwagner et al. [32] have proposed a Memory-Safe Software and Hardware Architecture (MeSHwA) to enhance serverless computing and microservices by leveraging memory-safe languages like Rust and WebAssembly. It aims to reduce infrastructure overheads associated with cloud architectures while improving performance and security through a unified runtime environment that isolates services effectively. Divyanshu Saxena et al. [33] have presented Medes, a serverless computing framework that improves performance and resource efficiency by introducing a deduplicated sandbox state. This state reduces memory usage by removing redundant memory chunks across sandboxes, allowing for faster function startups and improved management of warm and cold states. Experiments show that Medes under memory pressure can reduce end-to-end latency and cold starts. Ji Li et al. [34] designed TETRIS, a memory-efficient serverless platform for deep learning inference. It addresses memory overconsumption in serverless systems by implementing tensor sharing and runtime optimization, reducing memory usage while increasing function density. TETRIS automates memory sharing and instance scheduling,
providing efficient resource utilization without compromising performance.

Dmitrii Ustiugov et al. [35] have discussed cold-start latency in serverless computing and introduced vHive, an open-source framework for experimentation. It shows the inefficiencies of snapshot-based function invocations, which can result in high execution times due to frequent page faults. They propose REAP, which prefetches memory pages to reduce cold-start delays by 3.7 times, improving the performance of serverless functions. Ao Wang et al. [36] have presented INFINICACHE, an innovative in-memory object caching system leveraging ephemeral serverless functions to provide a cost-effective solution for large-object caching in web applications. It shows the system's ability to achieve cost savings while maintaining high data availability and performance through techniques like erasure coding and intelligent resource management. Anurag Khandelwal et al. [37] have presented Jiffy, an elastic far-memory system for serverless analytics. It overcomes the limitations of existing memory allocation systems by enabling fine-grained, block-level resource allocation, allowing jobs to meet their real-time memory needs. Jiffy minimizes performance degradation and resource underutilization by dynamically managing memory for individual tasks. Orestis Lagkas Nikolos et al. [38] have introduced HotMem, a mechanism designed to enhance memory reclamation in serverless computing environments for Function-as-a-Service (FaaS) models using microVMs. It addresses the issues of memory elasticity during scaling down by segregating memory allocations for individual function instances, thereby enabling rapid and efficient reclamation without the overhead of page migrations. HotMem improves memory management performance, maintaining low latency for function execution. Finally, Table 3 shows a comparison of framework-based approaches from related studies.

Table 3
Comparison of framework-based approaches.

[32]
  Metric: Performance and resource efficiency; Resource sharing
  Method: MeSHwA, a memory-safe software and hardware architecture
  Advantage: Increases security and performance
  Disadvantage: Complexity
  Tools: Python; Rust; Wasm

[33]
  Metric: End-to-end latency; Memory usage
  Method: Medes, a framework utilizing memory deduplication to create a new deduplicated sandbox
  Advantage: Reduces cold starts; Increases flexibility; Optimal memory
  Disadvantage: Complexity; Overhead from deduplication
  Tools: AWS Lambda; Python; OpenWhisk; OpenFaaS; FunctionBench; CRIU

[34]
  Metric: Function startup latency
  Method: Runtime sharing to reduce memory consumption
  Advantage: Memory savings; Reduces cold starts
  Disadvantage: Complexity; Tensor management overhead
  Tools: OpenFaaS; TensorFlow

[35]
  Metric: Cold-start latency; Memory efficiency
  Method: REAP, a mechanism that records and prefetches a function's working set
  Advantage: Reduces cold-start latency
  Disadvantage: Overhead on the first invocation
  Tools: Python; Kubernetes

[36]
  Metric: Latency of function invocation; Data availability
  Method: INFINICACHE, caching through erasure coding
  Advantage: Cost saving
  Disadvantage: Limited to large-object caching; Relies on the limitations of serverless architecture
  Tools: AWS Lambda

[37]
  Metric: Execution time; Memory utilization
  Method: Jiffy, an elastic far-memory system
  Advantage: Improves execution time; Increases resource utilization
  Disadvantage: Complexity; Relies on a specific serverless architecture
  Tools: Amazon EC2; Python; C++; Java

[38]
  Metric: Memory reclamation speed; Tail latency
  Method: HotMem, a memory management framework with rapid reclamation of hotplugged memory
  Advantage: Faster memory reclamation
  Disadvantage: Challenges in memory management
  Tools: OpenWhisk; Azure

3. Background

In this section, we explain the concepts of serverless computing and then provide explanations about memory configuration in serverless computing.

3.1. Serverless computing

Serverless computing enables developers to concentrate on coding without the necessity of managing or provisioning servers, which is why it is termed "serverless." Serverless computing provides an efficient and scalable solution for running programs. Its ease of management and lightweight features have made it popular as an implementation model
in cloud computing [39]. Serverless computing is an abstraction of cloud computing infrastructure: a cloud computing model in which a cloud provider or a third-party vendor manages the company's servers. The company does not need to purchase, install, host, and manage the servers; instead, the cloud provider supplies all of these services.

Serverless computing, also known as Function-as-a-Service (FaaS), ensures that the code exposed to the developer consists of simple, event-driven functions. As a result, developers can focus more on writing code and delivering innovative solutions, without the hassle of creating test environments and provisioning and managing servers for web-based applications [40]. FaaS and the term "serverless" can be used interchangeably, with serverless computing being FaaS. This is because the FaaS platform automatically configures and maintains the context of the functions and connects them to cloud services without the need for developers to provide a server [41,42].

Features of serverless computing:

• Pay-per-use: In serverless computing, users pay only for the time their code uses dedicated CPU and storage. Pay-per-use pricing models reduce costs, whereas in traditional cloud services users pay for over-provisioned resources like storage and CPU time, which often sit idle [39,43].
• Speed: Teams can move from idea to market more quickly because they can focus entirely on coding, testing, and iterating without the operational costs of server management. There is no need to update fundamental infrastructure like operating systems or other software patches, allowing teams to concentrate on building the best possible features without worrying about the underlying infrastructure and resources.
• Scalability and elasticity: The cloud vendor is responsible for automatically scaling up capabilities and technologies to meet customer demands. Serverless functions should automatically scale down when there are fewer concurrent users. This elasticity transforms serverless computing models into a pay-as-value billing model [39].
• Efficiency and performance: Developers do not need to perform complex tasks like multi-threading, HTTP request handling, etc. FaaS forces developers to focus on building the application rather than configuring it.
• Programming languages: Serverless computing supports many programming languages, including JavaScript, Java, Python, Go, C#, and Swift [44]. This versatility allows developers to choose the language that best suits their project needs and expertise, increasing productivity and enabling rapid application development.

Challenges of serverless computing:

• Vendor lock-in: Vendors typically use proprietary technologies to enable their serverless services. This can create problems for users who want to migrate their workloads to another platform. When migrating to another provider, changes to the code and application architecture are inevitable [45].
• Cold start latency: Serverless services can experience a latency known as "cold start." When the service is first started, it takes some time for the service to respond. The reason for this is the initial configuration by the cloud service provider, resource allocation, and initialization of the infrastructure. This initial delay can be a concern in systems that respond to many requests per second. Methods and techniques are therefore important for mitigating the cold start problem [46-48].
• Debugging complexity: Debugging serverless functions is difficult due to their transient nature. The reason for this problem is that serverless functions typically do not maintain the state of previous calls; they are stateless by design, which can complicate application state management. Also, reports on serverless function calls should be sent to the developer. These
reports should include detailed stack traces so that developers can identify the cause of an error. Stack tracing is currently not available for serverless computing, meaning developers cannot easily identify the cause of an error [49-51].
• Architectural complexity: A serverless application may consist of multiple functions. The more functions there are, the more complex the architecture becomes, because each function must be properly linked to other functions and services. Also, managing this large number of functions and services can be difficult [52].
• Long-running tasks: Serverless computing executes functions within a limited and short execution time, while some tasks may require a long execution time. Serverless functions do not support long-running execution because they are stateless, meaning that if a function is stopped, it cannot be resumed [53].

3.2. Memory configuration

Memory configuration in serverless computing is an important aspect that influences application performance, resource efficiency, and cost management. Understanding how to optimally allocate memory for serverless functions is important for developers who want to maximize the benefits of serverless computing [40]. In serverless environments, developers deploy small units of code, known as functions, which are executed in response to specific events. Each function operates in a stateless manner and is allocated a certain amount of memory at runtime. The memory configuration affects several factors:

• Performance: The memory allocated to a function determines its execution time. The more memory, the faster the function runs, as it allows the CPU to perform better and cold start latency to decrease. But too little memory can make execution very slow and cause downtime.
• Cost efficiency: The pricing of serverless computing is pay-per-use; costs are determined by the amount of resources utilized during execution. Memory configuration can help prevent over-allocation of memory, which leads to high costs. However, a lack of memory will lead to performance issues, so the two need to be balanced [40].

3.3. Deep reinforcement learning

Neural networks were combined with reinforcement learning for the first time in 1991, when Gerald Tesauro used reinforcement learning to train a neural network to play backgammon at the master level [54]. Deep reinforcement learning (DRL) is a combination of reinforcement learning and deep neural networks. In reinforcement learning, an agent, while interacting with its environment, incrementally learns a policy that enables it to maximize long-term rewards. Combined with deep neural networks, this approach enables the learning of much more complicated policies that are suitable for high-dimensional problems. Fig. 1 shows the elements in a reinforcement learning model.

Deep reinforcement learning is used in various fields such as computer games, robotics, resource management in distributed systems, and performance optimization of cloud computing systems. Deep learning techniques can be used for memory configuration in serverless computing, predicting the amount of memory required to run a serverless application.

Serverless computing is a computing model in which the cloud service provider manages the infrastructure and server resources. A developer just needs to write their code and upload it onto the serverless platform. The advantages of serverless computing include automatic scalability, cost per resource used, and ease of management. Deep learning can be used to improve memory configuration in serverless computing: it can automatically identify memory usage patterns in applications and allocate the required memory based on them. Benefits of using deep learning for automatic memory configuration in serverless computing applications:

• Automation: With deep learning, the technology can easily identify patterns concerning memory usage in an application and automate the memory allocation needed. Consequently, this will reduce time spent on memory configuration.
• Optimization: Deep learning can learn how to optimize memory usage by managing how memory is allocated. It helps to decrease computing costs and enhances the performance of applications.
• Scalability: Memory configuration is performed in such a way that • Flexibility: Deep learning can learn from changes in memory usage
serverless applications will be able to scale seamlessly according to patterns. It can help in the improvement of an application over time.
variable workloads. Since the demand fluctuates, dynamically allo­
cating memory helps achieve good performance without incurring Deep learning to automate the configuration of memory in serverless
excessive costs. computing faces some challenges. First, there is a need for training data.
Deep learning requires enough training data in order to identify patterns
Memory configuration optimization and automated, data-driven in memory usage. Another challenge is the complexity of deep learning,
approaches to memory management in serverless computing increase which also requires a lot of time to learn. However, using deep learning
performance and scalability, and help save costs. These methods can for automating the configuration of memory in serverless computing
allow developers to minimize the challenges of manual memory offers numerous benefits. It will improve memory configuration tech­
configuration. niques and reduce computational costs.
Fig. 1. Elements in a reinforcement learning model.
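The agent-environment loop shown in Fig. 1, applied to memory selection, can be sketched in a few lines. This is a toy illustration only: the latency and cost models, the memory sizes, and the epsilon-greedy action-value update are all illustrative assumptions, not the paper's implementation.

```python
import random

# Illustrative assumption: the "action" is picking one of a few memory sizes.
MEMORY_SIZES_MB = [128, 256, 512, 1024, 2048]

def step(memory_mb: int) -> float:
    """Hypothetical environment: reward trades off latency against cost."""
    latency = 100.0 / memory_mb   # assumed: more memory -> lower latency
    cost = 0.001 * memory_mb      # assumed: more memory -> higher cost
    return -(latency + cost)      # higher reward = better trade-off

# The agent keeps a simple action-value estimate per memory size and
# updates it from the reward signal (epsilon-greedy bandit, for illustration).
q_values = {m: 0.0 for m in MEMORY_SIZES_MB}
alpha, epsilon = 0.1, 0.2
random.seed(0)

for _ in range(500):
    if random.random() < epsilon:
        action = random.choice(MEMORY_SIZES_MB)   # explore
    else:
        action = max(q_values, key=q_values.get)  # exploit
    reward = step(action)
    q_values[action] += alpha * (reward - q_values[action])

best = max(q_values, key=q_values.get)
print(best)
```

A full DRL approach replaces the tabular estimate with policy and value networks, as described in Section 4, but the observe-act-reward cycle is the same.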
Z. Shojaee Rad et al. Computer Standards & Interfaces 97 (2026) 104098
3.4. Autonomic computing

The MAPE-K model provides a framework for the management of autonomous and self-adaptive systems [55–57]. The model comprises five major components: Monitoring, Analysis, Planning, Execution, and Knowledge Management, illustrated in Fig. 2.
Monitoring is where data is gathered on an ongoing basis to measure the system's current state and functionality relative to predetermined goals. The subsequent Analysis stage examines this data to bring to light any differences between the present condition and desired outcomes, giving the information necessary for adjustment. After problem detection, the Planning stage creates schemes to amend these disparities, specifying how the system needs to modify its behavior. The Execution stage then enacts these strategies, modifying the system's behavior to attain the desired results. Knowledge Management serves as a repository for the important information shared among the various stages. Overall, the MAPE-K model offers a feedback loop that allows autonomous systems to adapt to new situations and ensure optimum performance by constantly monitoring and adjusting; this model is essential for building resilient and efficient autonomous systems. In addition, autonomous control systems greatly improve the efficiency and reliability of industrial processes by minimizing human error and ensuring consistent performance. Autonomous control systems are closed-loop controls that are preprogrammed to operate without the intervention of an operator, and they ensure predictable results in complex industrial settings. The integration of automated control systems and sophisticated monitoring techniques therefore makes operations simpler and more reliable, and such systems have become a necessity in contemporary manufacturing plants [58].

Fig. 2. Autonomic computing (MAPE-K loop) [57].

4. Proposed approach

In this section, we explain our proposed approach in more detail. First, we introduce the framework that utilizes machine learning for memory configuration in serverless computing, followed by the presentation of a formulation and an algorithm. Finally, we describe the autonomous memory configuration that uses deep learning to predict memory settings for serverless computing.

4.1. Proposed framework

In the proposed solution, Auto Opt Mem introduces an autonomous, learning-based memory configuration approach for serverless computing. The goal of Auto Opt Mem is to address the challenge of efficiently distributing serverless functions (SFs) in a serverless environment while considering a number of real-world parameters. One of the key parameters within Auto Opt Mem is the memory configuration, which determines how much of the available memory resources is allocated to a serverless function. Memory configuration has a direct impact on the performance and resource utilization of these functions. Auto Opt Mem utilizes Deep Reinforcement Learning (DRL), a branch of artificial intelligence, to learn an optimal memory configuration policy. DRL enables the system to learn from experience and make decisions based on a reward mechanism [15,16]. In this regard, Auto Opt Mem learns to allocate memory resources to SFs for maximum performance while minimizing waste of resources.
By employing DRL, Auto Opt Mem takes into account resource constraints in the serverless environment, resource requirements of functions, energy consumption, latency, and deployment costs. The system learns the optimal memory configuration policy through a training process that involves interacting with the environment and receiving feedback in the form of rewards. The automatic memory configuration learning approach in Auto Opt Mem enables the system to dynamically adapt to changing conditions and optimize memory resource allocation based on the specific needs of each SF. This approach contributes to efficient resource utilization and improved performance in serverless computing environments. This work proposes a framework called Auto Opt Mem for automatic optimal memory configuration that uses a deep learning agent to automatically optimize the memory allocated to each serverless function with respect to computational resources. The primary goal of Auto Opt Mem, utilizing DRL, is to optimize memory allocation for serverless functions; in other words, to dynamically adjust memory configurations so that they become highly performant, cost-effective, and efficient regarding resource utilization. It seeks to optimize memory for serverless functions to minimize costs and reduce latency while maintaining or improving Quality of Service (QoS).

4.1.1. Key components
This section describes the elements of the Auto Opt Mem framework that are essential to the automatic memory configuration process.
Environment
The environment is the serverless platform, such as AWS Lambda or Google Cloud Functions, where a set of functions is deployed and executed. It reflects resource usage information, performance metrics, and all other relevant status information.
Agent
The agent makes decisions about the memory size to be assigned to each function. It uses a DRL model that learns through interactions with the environment.
State
The state St at time t includes:
• Current memory allocation for each function.
• Performance metrics (e.g., execution time, latency).
• Number of requests received.
• Cost metrics.
• Specific performance requirements.
Action
The action At at time t is the choice of a memory size from a predefined set of configurations for a particular function.
Reward
The reward Rt at time t is the feedback signal that controls learning. It is defined with the aim of:
• Encouraging efficient memory usage.
• Improving performance metrics.
• Maintaining or increasing Quality of Service (QoS).
• Minimizing operational costs.
Policy
The policy is the strategy by which the agent selects actions according to the current state. The DRL model implements the policy.
Fig. 3 shows the iterative loop diagram in deep reinforcement learning algorithms.

Fig. 3. Iterative loop diagram in deep reinforcement learning algorithms.
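The key components listed above (state, action, reward) can be sketched as simple data structures. This is a minimal illustration under stated assumptions: the field names, the example function and its values, and the unit reward weights are all hypothetical, not part of the paper's code.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class FunctionState:
    memory_mb: int       # current memory allocation for the function
    latency: float       # execution-time / latency metric
    cost: float          # cost metric
    utilization: float   # fraction of allocated memory actually used
    qos: float           # quality-of-service score
    requests: int        # number of requests received

@dataclass
class SystemState:
    functions: Dict[str, FunctionState] = field(default_factory=dict)

# Action: choose the next memory size from a predefined configuration set.
MEMORY_CHOICES_MB: List[int] = [128, 256, 512, 1024, 2048]

def reward(state: SystemState, alpha: float = 1.0, beta: float = 1.0,
           gamma: float = 1.0, delta: float = 1.0) -> float:
    """Reward shaped as described in the text: penalize latency and cost,
    reward efficient utilization and higher QoS (inputs assumed normalized)."""
    total = 0.0
    for f in state.functions.values():
        total += -(alpha * f.latency + beta * f.cost
                   - gamma * f.utilization) + delta * f.qos
    return total

# Hypothetical example: one function with normalized metrics.
state = SystemState({"ml-inference": FunctionState(512, 0.2, 0.1, 0.8, 0.9, 100)})
print(reward(state))  # approximately 1.4
```

In a full implementation the policy network would map such a state vector to a distribution over MEMORY_CHOICES_MB; here the structures only make the interfaces concrete.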
4.2. Problem formulation

This section presents a mathematical model for the memory configuration problem in serverless computing. In the following, we describe the symbols used in more detail.

4.2.1. Notation

• F: Set of serverless functions
• Mi: Memory allocated to function fi ∈ F
• Li(Mi): Latency of function fi with memory Mi
• Ci(Mi): Cost of executing function fi with memory Mi
• Ui(Mi): Memory utilization for function fi with memory Mi
• Qi(Mi): Quality of Service metric for function fi with memory Mi
• Rt: Reward at time t

The list of mathematical symbols is summarized in Table 4.

Table 4
List of Mathematical Symbols.

Definition | Notation
Set of serverless functions | F
Memory allocated to function fi ∈ F | Mi
Latency of function fi with memory Mi | Li(Mi)
Cost of executing function fi with memory Mi | Ci(Mi)
Memory usage for function fi with memory Mi | Ui(Mi)
Quality of service criterion for function fi with memory Mi | Qi(Mi)
Reward at time t | Rt
State space | S
State at time t | St
Action space | A
Action at time t | At
Current memory allocation for function fi | Mi,t
Latency | Li,t
Cost | Ci,t
Quality of service | Qi,t
Weighting factor for latency | α
Weighting factor for cost | β
Weighting factor for utilization | γ
Weighting factor for quality of service | δ
Minimum memory size that can be allocated to a function | Mi,min
Maximum memory size that can be allocated to a function | Mi,max
Minimum acceptable QoS for function fi | Qi,min
Expected value | E
Policy | π
Value function | Vπ
Policy network parameters | θ
Value network parameters | ϕ
Action-value function | Qπ(s,a)

The framework and automated approach for learning-based memory configuration in serverless computing are as follows:
State space (S)
The state St at time t includes:
• Current memory allocation Mi,t for each function fi.
• Performance metrics such as latency Li,t, cost Ci,t, and Quality of Service Qi,t.
• The number of requests received.
Action space (A)
The action At at time t involves selecting the memory size Mi,t+1 for each function fi.
Reward function (R)
The reward function is an essential component of reinforcement learning, guiding the agent toward desired behaviors by associating rewards or penalties based on the outcome of actions. The reward function Rt is designed to:
• Reduce latency and cost.
• Decrease memory utilization and increase QoS.

This is expressed by Eq. (1):

Rt = Σi∈F [ −(αLi(Mi,t) + βCi(Mi,t) − γUi(Mi,t)) + δQi(Mi,t) ]    (1)

where (α, β, γ, δ) are the weighting factors for latency, cost, utilization, and QoS, respectively. These weights adjust the relative importance of each component in the reward function. For example, if reducing latency is more important than cost, α can be increased relative to β.
Lower latency is desirable; therefore, −αLi(Mi,t) penalizes higher latencies.
Lower costs are also preferable; thus, −βCi(Mi,t) penalizes higher expenses.
Memory utilization is desired to be efficient. Over-allocation of memory leads to resource waste, while under-allocation can degrade performance. +γUi(Mi,t) rewards the agent for optimal memory usage.
Higher QoS is preferred; therefore, +δQi(Mi,t) rewards better service quality.
To normalize the reward function for optimizing memory configuration in serverless computing, all components (cost, latency, utilization, and QoS) must be on a common scale, because they have different units (for example, cost in dollars, latency in milliseconds). We use min-max normalization for each component. The min-max normalization formula is defined by Eq. (2):

x' = (x − xmin) / (xmax − xmin)    (2)

Where:
• x is the original value,
• xmin is the minimum value for that metric,
• xmax is the maximum value for that metric.

After normalization, the metrics can be combined into the reward function without unit conflict, and the final reward formula is expressed by Eq. (3):

Rt = Σi∈F [ −α (Li(Mi,t) − Lmin)/(Lmax − Lmin) − β (Ci(Mi,t) − Cmin)/(Cmax − Cmin) + γ (Ui(Mi,t) − Umin)/(Umax − Umin) + δ (Qi(Mi,t) − Qmin)/(Qmax − Qmin) ]    (3)

Normalization maps each component into the range [0, 1], making the components comparable even though they come from different units, so they can be combined meaningfully during the optimization process. This method increases the robustness of the model and enhances convergence in reinforcement learning algorithms.

4.2.2. Optimization problem
First, it is necessary to formulate the problem mathematically in terms of optimization and reinforcement learning for the Auto Opt Mem algorithm and its solution. The optimization problem is expressed by Eq. (4):

minM Σt=0..T Rt    (4)

with the memory constraints expressed by Eq. (5):

Mi,min ≤ Mi,t ≤ Mi,max   for all i ∈ F    (5)

and the Quality of Service constraints expressed by Eq. (6):

Qi(Mi,t) ≥ Qi,min   for all i ∈ F    (6)

Mi,min and Mi,max are the minimum and maximum memory sizes that can be allocated to the function fi. Qi,min is the minimum acceptable QoS for the function fi; this ensures that the QoS does not fall below a certain threshold.

4.2.3. Reinforcement learning formulas
This section explains the mathematical formulas used in the reinforcement learning component of the Auto Opt Mem framework.
Policy (π)
The policy π(a|s) is the probability distribution over actions given the current state. It is the strategy that the decision-maker follows for its next action in response to the current state.
Value function (Vπ)
The value function refers to the expected long-term discounted reward, which contrasts with the short-term reward (Rt) as it focuses on the long term. It represents the expected return, in the long term, resulting from the current state s under policy π. The value function is an important concept in reinforcement learning: it represents the expected return (cumulative reward) starting from state s and following policy π, and is defined by Eq. (7):

Vπ(s) = E[ Σt=0..∞ γ^t Rt | S0 = s, π ]    (7)

Where γ is the discount factor.
• Expected value (E): Due to the random nature of the environment, it is the average or expected return over all possible future states.
• Sum of rewards: Σt=0..∞ γ^t Rt represents the sum of rewards over time. Rewards are discounted by a factor of γ^t to prioritize immediate rewards over distant future rewards.
• Discount factor (γ): The discount factor γ (where 0 < γ < 1) determines the present value of future rewards. A higher γ makes future rewards more significant, while a lower γ emphasizes immediate rewards.
• Policy (π): The policy π is the decision rule that gives the probability of executing action a from a state s.
The value function is important since it is used to measure how good a given state is under a given policy. It is also a basis for policy comparison and for determining optimal actions that maximize the long-term reward.
Bellman equation
The Bellman equation provides a recursive decomposition for the value function, making it a powerful tool for solving reinforcement learning problems. The Bellman equation for the value function is defined by Eq. (8):

Vπ(s) = Ea∼π[ Rt + γVπ(St+1) | St = s, At = a ]    (8)

Reasons for using the Bellman equation:
• Recursive nature: The Bellman equation breaks down the value of a state into the immediate reward Rt plus the discounted value of the next state γVπ(St+1). The recursion makes computation efficient and forms the foundation for dynamic programming techniques.
• Policy evaluation: By repeatedly updating the value function using the recursive relationship, it aids in evaluating the expected return of a policy π.
• Optimality principle: For finding the optimal policy, the Bellman optimality equation expresses the relationship between the value of a state and the values of subsequent states. This is used in algorithms like value iteration and Q-learning for finding the optimal value function.
• Simplification of complexity: The use of a recursive approach allows the Bellman equation to simplify the calculation of the value
function for all states, which otherwise would be computationally infeasible in large state spaces.
Policy gradient
The policy gradient method is used to optimize the policy π, and is defined by Eq. (9):

∇θ J(πθ) = Es∼dπ, a∼πθ[ ∇θ log πθ(a|s) Qπ(s, a) ]    (9)

where θ are the parameters of the policy and Qπ(s,a) is the action-value function.
The action-value function represents the expected return of taking action a in state s and then following policy π, providing a basis for policy improvement.
The Bellman equation and the value function are the basis for studying and quantifying the long-term impact of these decisions, and they guide the policy toward optimal behavior.

4.3. Proposed algorithm

The proposed Auto Opt Mem algorithm uses deep reinforcement learning to learn how to assign serverless functions to compute resources efficiently. Incorporating the MAPE loop, the Auto Opt Mem algorithm continuously monitors, analyzes, plans, and executes actions to optimize memory allocation in an autonomous and adaptive manner. The process of the algorithm is as follows:

4.3.1. Initialization
This step is the first stage of the deep reinforcement learning (DRL) process for memory configuration. First, two main networks are prepared: the Policy Network, which is responsible for choosing the action (e.g., the amount of memory to allocate) in each state. This network initially starts with random parameters, because it has no knowledge at the beginning, and it gradually learns to make optimal decisions. The Value Network estimates the long-term value of a state based on the sum of future rewards; this network also starts with random parameters. Then, the initial state of the environment (S0) is defined, which includes the memory configuration of each function, performance indicators (latency, cost, QoS, and resource utilization), and the rate of incoming requests. This step is the basis of training and provides the foundation for the agent's interaction with the environment.
This initialization is very important, as it sets the starting point for training the networks to learn optimal memory allocation for serverless functions, with the aim of minimizing cost and latency and maintaining high QoS. This is shown in Algorithm 1.

Algorithm 1
Pseudo code for the initialization phase.
1: Input: Set of serverless functions F
2: Output: Initialized policy network πθ, value network Vϕ, and initial state S0
3: Initialize policy network πθ with random parameters θ
4: Initialize value network Vϕ with random parameters ϕ
5: Define initial system state S0 including:
6: - Memory allocation for each fi ∈ F
7: - Performance metrics: Latency Li(Mi), Cost Ci(Mi), Utilization Ui(Mi), QoS Qi(Mi)
8: - Incoming request rate
9: Return (πθ, Vϕ, S0)

4.3.2. MAPE loop
In this section, we describe an autonomous memory configuration process that includes four phases: monitoring, analysis, planning, and execution.

4.3.2.1. Monitor. In the monitoring phase, the system observes the current state of the environment at all times, so that it is always aware of the state of the environment. This includes monitoring the memory allocated to each serverless function, obtaining performance data such as latency, cost, utilization, and quality of service (QoS), and monitoring the number of incoming requests for each function. The monitoring phase is important because it records real-time information that indicates system performance and resource utilization. In short:
• Continuously observe the current state St, including memory allocation, performance metrics (latency Li,t, cost Ci,t, utilization Ui,t, QoS Qi,t), and the incoming request rate.
• Collect data from serverless functions and the environment.

4.3.2.2. Analyze. In the analysis phase, the system uses the performance metrics collected during the monitoring phase. It examines the current performance of the serverless functions based on the collected data and calculates a reward value that controls the learning. The reward function is usually based on latency, cost, utilization, and QoS, thus exposing any inefficiencies or issues that need to be addressed in subsequent phases. In short:
• Evaluate performance metrics Li(Mi), Ci(Mi), Ui(Mi), Qi(Mi).
• Calculate reward Rt based on the current state, which is expressed by Eq. (10):

Rt = Σi∈F [ −α (Li(Mi,t) − Lmin)/(Lmax − Lmin) − β (Ci(Mi,t) − Cmin)/(Cmax − Cmin) + γ (Ui(Mi,t) − Umin)/(Umax − Umin) + δ (Qi(Mi,t) − Qmin)/(Qmax − Qmin) ]    (10)

4.3.2.3. Plan. In the planning phase, the system decides the next actions based on the analysis observations. In this stage, the best memory allocation decisions are selected based on the policies learned by the deep reinforcement learning model, and the policy network parameters are updated for enhanced future decision-making. This planning helps the system adjust its memory allocation policies effectively based on the current situation and performance analysis. In short:
• Use the policy network πθ to determine the optimal action At (memory allocation for the next time step).
• Update the policy and value networks using reinforcement learning techniques. This typically involves:
○ Calculating the policy gradient using the policy gradient method.
○ Updating the policy network parameters θ.
○ Using the Bellman equation to update the value network parameters ϕ.

4.3.2.4. Execution. The execution phase is responsible for carrying out the actions that have been planned. It allocates memory to every function according to the decisions of the planning phase and transitions to the next state, ready to trigger the MAPE cycle once more. Its role is to realize the changes from planning and analysis, optimize resource utilization, and improve the system as a whole. In summary:
• Apply the selected action to adjust the memory allocation for each function.
• Go to the next state St+1 and repeat the loop.

The MAPE loop enables a dynamic and automated way of managing memory in serverless computing systems, where the system can learn and adapt to new situations on a continuous basis, eventually resulting in enhanced performance and resource efficiency, as shown in Algorithm 2.

4.3.3. Training loop
The training loop is an important part of the Auto Opt Mem algorithm, which uses deep reinforcement learning to improve how memory
is allocated for serverless functions. The process starts by resetting the environment to its initial state, called S0. For each time step t, the algorithm goes through the MAPE loop. It begins by checking the current state St, gathering data about memory usage, performance metrics, and other relevant information. After collecting this data, it analyzes it to see how well the system is performing and calculates a reward that helps guide future learning. Next, the algorithm plans the next action by choosing the best way to allocate memory using its policy network; this decision is based on the insights gained from the analysis. Once it decides on an action, the system carries it out, adjusting the memory for each function as needed. Finally, the algorithm moves to the next state St+1 and prepares to start the loop again. This ongoing process allows the system to learn and adapt continuously, refining its memory allocation strategies based on real-time feedback. Each cycle helps improve the overall performance and efficiency of memory management in the serverless environment. In summary:

1. Environment reset: Reset the environment to its initial state S0.
2. MAPE execution: For each time step t:
• Monitor: Observe the current state St.
• Analyze: Evaluate performance and calculate the reward.
• Plan: Select an action using the policy network.
• Execute: Execute the action and move to the next state St+1.

Algorithm 3 shows the execution phase of the MAPE loop during the training process.

5. Performance evaluation

In this section, we present the performance evaluation of the novel automatic deep learning-based approach (Auto Opt Mem) for memory configuration in serverless computing. We describe the experimental setup, the performance metrics, and the results of the experiments.

5.1. Experimental setup

We carried out the experimental analysis in this study on a Windows 11 64-bit computer with an Intel Core i7 processor. The evaluation uses a serverless simulation environment, where memory sizes between 128 MB and 2048 MB are modeled to study their impact on latency, cost, and quality of service (QoS). A virtual CPU (vCPU) model with burst behavior and dynamic workload scaling is incorporated into the simulation. Python was used to implement the deep reinforcement learning algorithms. The Proximal Policy Optimization (PPO) algorithm was used to train the deep reinforcement learning agent, as it is known to be stable and efficient in policy optimization procedures. PPO was chosen because it is widely suitable for continuous control and policy optimization problems in environments with large dynamic state spaces, such as serverless environments, and it offers higher training stability than algorithms such as REINFORCE or Vanilla Policy Gradient. It is able to control the trade-off between exploration and exploitation in dynamic environments where the workload changes randomly, and it is suitable and scalable in environments with high-dimensional state spaces where multiple parameters such as latency, cost, utilization, and quality of service (QoS) need to be optimized simultaneously. The workloads used in our evaluation consisted of four modeled serverless scenarios: ML inference, API aggregation, data preprocessing (ETL), and video processing. These workloads are designed to emulate the performance behavior of serverless applications while being executed in a fully simulated environment. Furthermore, all experiments are implemented in the Python language, and the source code of the simulation can be downloaded from the GitHub repository.¹ Although AWS Lambda and CloudWatch were used conceptually to structure the resource and metric model, all experiments were executed in a simulated environment, as shown in Table 5.
The experiments involved tuning a set of hyperparameters. The parameters were:
Learning rate: Different learning rates, such as 0.01, 0.001, and 0.0001, were tried during training.
Discount factor (γ): A discount factor of 0.99 was chosen to give higher importance to long-term rewards in the reinforcement learning.
Batch size: A batch size of 32 was used in training the deep learning models, a trade-off between training time and model accuracy.
Number of episodes: Training was carried out over 100 episodes, during which the agent learns through experience interacting with the environment and tunes its memory allocation policies.
Both the policy and value networks are implemented as multilayer perceptrons (MLPs). The input layer receives the state vector, which includes memory allocation, latency, cost, utilization, QoS, and request rate. Each network has two fully connected hidden layers with 128 and 64 neurons, respectively, and ReLU activation functions. The policy network ends with a softmax output layer producing a probability distribution over possible memory allocations, while the value network has a single linear output estimating the state value. The generated experience data was split into 70 % for training and 30 % for evaluation, which is standard in DRL-based optimization studies.

¹ https://github.com/zahrashj-rad/Auto-Opt-Mem

5.2. Performance metrics

To evaluate the effectiveness of the proposed approach, we utilize several performance metrics, including:
Latency: The time taken for a function to execute, measured in milliseconds. Lower latency indicates better performance. It is expressed by Eq. (11):

L = (Tend − Tstart) / N    (11)

where Tstart and Tend are the start and end times of execution, and N is the number of function invocations.
Cost: The total cost incurred during function execution, measured in US dollars. Our goal is to minimize this cost. It is expressed by Eq. (12):

C = Σi=1..N (Mi × Pmem + Ti × Pexec)    (12)

where Mi is the allocated memory, Pmem is the price per MB, Ti is the execution time, and Pexec is the execution price per second.
Quality of Service (QoS): A composite score reflecting the reliability and user satisfaction of the service, based on latency and availability, expressed by Eq. (13):

QoS = 1 / (L + δ(1 − A))    (13)

where A is the availability factor (the percentage of successful executions), which lets the quality of service take into account the impact of availability on overall system performance, and δ is a weighting parameter.
Utilization: The efficiency of memory usage during function execution, aiming for optimal allocation without wastage. It is expressed by Eq. (14):

U = (Mused / Mallocated) × 100    (14)

where Mused is the actual memory usage and Mallocated is the assigned memory. A higher value indicates optimal management of memory resources.
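The four metrics above follow directly from Eqs. (11)-(14). A minimal sketch is given below; the default prices p_mem and p_exec in `cost` are hypothetical placeholders, not the prices used in the paper's experiments.

```python
def latency(t_start: float, t_end: float, n_invocations: int) -> float:
    """Eq. (11): average execution time per invocation."""
    return (t_end - t_start) / n_invocations

def cost(mem_mb, exec_s, p_mem: float = 2e-7, p_exec: float = 1e-5) -> float:
    """Eq. (12): total cost over all executions (per-MB and per-second prices)."""
    return sum(m * p_mem + t * p_exec for m, t in zip(mem_mb, exec_s))

def qos(lat: float, availability: float, delta: float = 1.0) -> float:
    """Eq. (13): composite score from latency and availability (0..1)."""
    return 1.0 / (lat + delta * (1.0 - availability))

def utilization(mem_used: float, mem_allocated: float) -> float:
    """Eq. (14): percentage of allocated memory actually used."""
    return mem_used / mem_allocated * 100.0

print(latency(0.0, 10.0, 100))   # 0.1
print(utilization(256, 512))     # 50.0
```

Note that qos grows without bound as both latency and unavailability approach zero, which is consistent with Eq. (13): perfect availability leaves latency as the only denominator term.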
Algorithm 2
Pseudo code for the MAPE loop phase.
1: Input: Current state St, networks (πθ, Vϕ)
2: Output: Updated state St+1
3: Monitor:
4: Collect metrics (Latency Li,t, Cost Ci,t, Utilization Ui,t, QoS Qi,t, Requests)
5: Analyze:
6: Compute reward Rt using the reward function
7: Rt = −(αLi(Mi,t) + βCi(Mi,t) − γUi(Mi,t)) + δQi(Mi,t)
8: Normalize all components (cost, latency, utilization, QoS)
9: using:
10: X' = (X − Xmin) / (Xmax − Xmin)
11: Plan:
12: Select action At = πθ(St)
13: Update policy network πθ using the policy gradient
14: ∇θ J(πθ) = Es∼dπ, a∼πθ[ ∇θ log πθ(a|s) Qπθ(s,a) ]
15: Update value network Vϕ using the Bellman equation
16: Vπ(St) = Ea∼π[ Rt + γVπ(St+1) | St = s, At = a ]
17: Execute:
18: Apply action At (update memory allocation Mi,t+1)
19: Transition to next state St+1
20: Return St+1

the model's weights. We tested different learning rates to examine their impact on Auto Opt Mem's efficiency.
In this experiment, different learning rates were applied to the problem of prediction and memory regulation in serverless environments, and a reinforcement learning agent based on the policy gradient was implemented. The goal was to show the impact of the learning rate on memory allocation policy learning and, ultimately, on the performance metrics latency, cost, QoS, and utilization. Three learning rate values were tested: 0.01, 0.001, and 0.0001; each experiment was performed over 100 episodes. The environment model and the relationships between memory and the metrics were implemented with a noisy function to represent the overall system behavior and the impact of memory selection on the metrics.
The learning rate can affect the performance metrics as follows:
Latency: A very high learning rate may destabilize training, preventing the model from converging to an optimal policy. This instability leads to fluctuations in memory allocation and, consequently, higher execution latency. Conversely, a moderate learning rate supports stable convergence, resulting in lower latency.
Cost: If memory allocation decisions are unstable due to an excessively high learning rate, resource usage can increase, raising the execution cost. However, in some cases, a higher learning rate can accelerate convergence to an efficient allocation, thus reducing costs. The impact on cost is therefore dual, depending on whether training stabilizes.
Quality of Service (QoS): High learning rates may cause instability in memory allocation policies, leading to inconsistent performance and

Algorithm 3
Pseudo code for the MAPE execution phase.
1: Input: Environment, networks (πθ, Vϕ)
2: Output: Optimized memory allocation policy
3: Initialize environment and obtain initial state S0
4: For each episode do
5: For each time step t do
5: For each time step t do
6: Run MAPE Loop with input St degraded QoS. In contrast, a well-tuned moderate learning rate achieves
7: Observe reward Rt and next state St+1 more stable optimization and improved QoS.
8: Store tuple (St, At, Rt, St+1) Utilization: Rapid but unstable adjustments caused by a high
9: Update πθ and Vϕ based on (St, At, Rt, St+1) learning rate can lead to inefficient memory allocations, resulting in
10: Policy network update
11: ∇θ J(πθ) = E₍s-dπ, a-πθ₎ [ ∇θ log πθ(a|s) Qπθ(s,a) ]
either under-utilization or over-utilization of resources. A balanced
12: Value network update learning rate is more likely to achieve efficient utilization. Fig. 4 shows
13: Vπ(S) = E a-π [ Rₜ + γ Vπ (S + 1) | Sₜ = s, Aₜ = a ] the learning rates and results.
14: Set St ← St+1 The cost and quality of service results obtained from training the
15: End For
agent with different learning rates showed:
16: End For
17: Return optimized policy πθ The intermediate learning rate LR = 0.001 shows the lowest latency
and highest QoS among the three values. In contrast, the cost increases
significantly and the utilization decreases. In fact, the agent with LR =
0.001 learns faster and more significantly policies that favor higher
Table 5
Tools and technologies.
memory (or policies that make allocations that reduce latency) this leads
to improved QoS but increases the cost per execution unit. The inter­
Component Description
mediate rate allows the agent to accept weight changes strongly enough
Cloud Provider AWS Lambda (conceptual model), implemented as a to reach regions with lower latency (but may be cost-ineffective).
simulated environment in code. The larger learning rate LR = 0.01 shows very low cost and very high
Programming Language Python, for implementing the deep reinforcement
learning algorithms.
utilization, but latency and QoS remain at moderate levels. In fact, high
Deep Learning Framework TensorFlow/Keras, for building and training the deep learning rates usually make large updates; In this simulation, the agent
learning models. has arrived at a policy that keeps the cost low (e.g., choosing low or
Reinforcement Learning Proximal Policy Optimization (PPO) average memories) while maintaining high utilization. This could mean
Algorithm
learning a cost-saving policy; however, this policy may cause fluctua­
Monitoring Tools Simulated monitoring module (MAPE-K), no real
CloudWatch data used. tions and not reach the optimal latency. It is also possible for the agent to
Data Management Pandas, for data manipulation and analysis of the get stuck in a local boundary (with low cost) under noisy gradients.
performance metrics. A smaller learning rate of LR = 0.0001 had intermediate results
Development Environment Jupyter Notebook / Google Colab for interactive (latency and QoS close to LR=0.01 but average cost). In fact, a too small
development and experimentation.
rate leads to slow and stable learning; the agent may not have fully
converged yet and not have seen significant improvement by the end.
5.3. Experimental results Table 6 shows impact of learning rate effects on performance
metrics.
We evaluated our proposed approach with baseline methods,
including machine learning-based approaches, the impact of learning 5.3.2. Second scenario: reward function formula based on MAPE loop
rate and reward function. To further analyze the impact of Auto Opt Mem, we conducted eight
experiments and calculated the reward function. As explained in the
5.3.1. First scenario: impact of learning rate on optimization Section 4, the reward function is calculated from the following formula.
One of the important hyperparameters in deep reinforcement
learning is the learning rate (LR), which controls the extent of updates to
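Algorithms 2 and 3 and the learning-rate sweep above can be sketched end to end. The block below is a deliberately simplified, self-contained stand-in: a stateless softmax policy trained with a REINFORCE-style update in place of the paper's PPO networks, over a toy noisy environment model. The memory sizes, metric model, normalization bounds, and the step size (the learning rate discussed above) are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Candidate memory sizes (MB); illustrative, not from the paper.
MEMORY_SIZES = np.array([128.0, 256.0, 512.0, 1024.0, 2048.0])
LEARNING_RATE = 0.05  # step size; the paper sweeps 0.01 / 0.001 / 0.0001

rng = np.random.default_rng(0)
theta = np.zeros(len(MEMORY_SIZES))  # logits of a stateless softmax policy


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def monitor(mem_mb):
    """Monitor phase: noisy metrics for a memory size (toy environment model)."""
    latency = 100.0 * np.sqrt(256.0 / mem_mb) + rng.normal(0.0, 2.0)  # ms
    cost = (mem_mb / 1024.0) * 1.67e-5 * latency                      # GB-second style
    utilization = min(100.0, 100.0 * 180.0 / mem_mb)                  # % of allocation
    qos = 100.0 - latency / 3.0                                       # toy QoS score
    return latency, cost, utilization, qos


def analyze(latency, cost, utilization, qos):
    """Analyze phase: min-max-normalized multi-objective reward, unit weights."""
    l_hat = (latency - 50.0) / (150.0 - 50.0)
    c_hat = cost / 0.01
    u_hat = utilization / 100.0
    q_hat = qos / 100.0
    return -(l_hat + c_hat - u_hat) + q_hat


for episode in range(200):
    probs = softmax(theta)
    action = rng.choice(len(MEMORY_SIZES), p=probs)   # Plan: sample a memory size
    reward = analyze(*monitor(MEMORY_SIZES[action]))  # Execute, then Monitor/Analyze
    grad = -probs.copy()
    grad[action] += 1.0                               # d/d(theta) of log pi(action)
    theta += LEARNING_RATE * reward * grad            # REINFORCE policy update
```

Each iteration walks the MAPE phases: Plan samples a memory size from the policy, Execute applies it, Monitor draws noisy metrics, and Analyze folds them into the normalized multi-objective reward that scales the policy-gradient step.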
5.3.2. Second scenario: reward function formula based on MAPE loop

To further analyze the impact of Auto Opt Mem, we conducted eight experiments and calculated the reward function. As explained in Section 4, the reward function is calculated from the following formula:

Rt = Σ_{i∈f} [ −( α·(Li(Mi,t) − Lmin)/(Lmax − Lmin) + β·(Ci(Mi,t) − Cmin)/(Cmax − Cmin) − γ·(Ui(Mi,t) − Umin)/(Umax − Umin) ) + δ·(Qi(Mi,t) − Qmin)/(Qmax − Qmin) ]

Where α, β, γ, and δ are the weights for each variable. The negative sign at the beginning of the formula shows that we include the negative impact of cost and latency in the reward function. Quality of service (Qi) is considered positive, and its positive impact is included in the reward function. By summing over the metrics, we consider the combined influence of all functions. To calculate the reward function, the minimum and maximum values are given below:

Lmin = 80, Lmax = 100
Cmin = 0.15, Cmax = 0.50
Umin = 60, Umax = 80
Qmin = 40, Qmax = 70

The reward function values for the different experimental data are shown in Table 7 (latency in ms, cost in USD, QoS in %, utilization in %, reward as a normalized score).

We calculate the reward function for each experiment.

Experiment 1:

R1 = −( (90 − 80)/(100 − 80) + (0.17 − 0.15)/(0.50 − 0.15) − (75 − 60)/(80 − 60) ) + (66 − 40)/(70 − 40)
R1 = −(0.5 + 0.1333 − 0.75) + 0.8667 = −(−0.1167) = 0.1167

The reward function for the remaining experiments is calculated similarly.

These findings strengthen the case that Auto Opt Mem can optimize execution time while reducing costs, giving it better performance. The reward function results are analyzed as follows:

• Increasing QoS and utilization increases the reward, because the system achieves better efficiency.
• Reducing latency and cost has a positive effect on the reward, which indicates more optimal performance.
• The highest reward value is observed in Experiment 8 (0.91), which indicates the optimal balance between QoS, utilization, latency, and cost.
• The lowest reward value is recorded in Experiment 6 (0.03), which is due to the increase in latency and the decrease in system efficiency (utilization).

Fig. 4. Impact of learning rate on performance metrics.
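The reward calculation can be reproduced directly in code. Note that the weight values α, β, γ, and δ are not reported in the paper; the sketch below assumes unit weights, under which Experiment 1 evaluates to roughly 1.06 rather than the 0.11 reported in Table 7, so it illustrates the formula rather than reproducing the paper's exact settings.

```python
def normalize(x, lo, hi):
    """Min-max normalization to the [lo, hi] bounds given in the paper."""
    return (x - lo) / (hi - lo)

# Bounds from the paper: Lmin/Lmax, Cmin/Cmax, Umin/Umax, Qmin/Qmax.
BOUNDS = {"latency": (80, 100), "cost": (0.15, 0.50),
          "util": (60, 80), "qos": (40, 70)}

def reward(latency, cost, util, qos, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """-(alpha*L + beta*C - gamma*U) + delta*Q over normalized metrics."""
    l = normalize(latency, *BOUNDS["latency"])
    c = normalize(cost, *BOUNDS["cost"])
    u = normalize(util, *BOUNDS["util"])
    q = normalize(qos, *BOUNDS["qos"])
    return -(alpha * l + beta * c - gamma * u) + delta * q

# Experiment 1 from Table 7: 90 ms, $0.17, 75 % utilization, 66 % QoS.
r1 = reward(latency=90, cost=0.17, util=75, qos=66)
```

As expected from the analysis, the better-balanced Experiment 8 inputs (85 ms, $0.14, 82 %, 72 %) yield a higher reward than Experiment 1 under the same assumed weights.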
Table 6
Comparison of learning rate effects on performance metrics.

Learning Rate (LR) | Latency | Cost | QoS | Utilization
0.01 (high) | Relatively high, unstable, converges to a moderate level | Lowest cost | Moderate QoS | Highest utilization
0.001 (medium) | Lowest latency (best) | Highest cost | Highest QoS | Reduced utilization
0.0001 (low) | Moderate, slow convergence | Moderate cost | Moderate QoS | Moderate utilization

Table 7
Reward function values in different experiments.

Experiment | Latency (ms) | Cost ($) | QoS ( %) | Utilization ( %) | Reward
1 | 90 | 0.17 | 66 | 75 | 0.11
2 | 88 | 0.16 | 63 | 78 | 0.43
3 | 92 | 0.18 | 67 | 74 | 0.8
4 | 87 | 0.15 | 70 | 80 | 0.65
5 | 89 | 0.17 | 64 | 76 | 0.21
6 | 91 | 0.16 | 68 | 73 | 0.03
7 | 86 | 0.15 | 71 | 79 | 0.65
8 | 85 | 0.14 | 72 | 82 | 0.91

Fig. 5 shows the reward function graph (X-axis: experiment ID, 1-8; Y-axis: normalized reward value). Fig. 6 shows the comparison of the reward function with the other metrics across the different experiments.

• Comparison of latency and reward: Fig. 6a shows that decreasing latency generally leads to increasing reward. The reward value is higher at lower latencies (such as 84 and 85 milliseconds) but decreases at higher latencies.
• Comparison of cost and reward: Fig. 6b shows that while costs may be decreasing, the rewards are increasing, indicating a potentially favorable outcome for the experiments.
• Comparison of quality of service and reward: Fig. 6c shows that increasing QoS usually increases reward. Quality of service in the range of 70-72 % has the highest reward value.
• Comparison of utilization and reward: Fig. 6d shows that higher utilization usually leads to increased reward. The reward value increases sharply for values of 80 % and above.

These graphs show that the optimized system tends to reduce latency, reduce cost, increase quality of service, and improve efficiency (utilization) to obtain the maximum reward. The suggested reward function, which incorporates delay, cost, utilization, and QoS, can effectively balance the system's various goals. Experiment 8, with the highest reward of 0.91, depicts the optimal balance between all objectives. Experiment 6, with the lowest reward of 0.03, is worse due to its high delay and low utilization. The comparison graphs in Fig. 6 clearly show that improving any of the criteria, such as reducing delay or cost, or increasing QoS or utilization, leads to an increase in reward. This means that the reward function design is suitable and can be used as a criterion for multi-objective optimization.

5.3.3. Third scenario: comparison with previous studies

To validate our results, we compared Auto Opt Mem with two papers, [17] and [18], that tackled memory optimization in serverless computing. Compared to the works of Eismann et al. [17] and Jindal et al. [18], Auto Opt Mem achieves a more substantial reduction in execution latency, a greater cost reduction, and a significant improvement in QoS. Unlike previous studies that primarily relied on static function profiling or statistical estimations, Auto Opt Mem continuously learns and adapts to varying workloads, and thus is more adaptive and scalable. Eismann et al. [17] developed a model for resource prediction based on monitoring a single memory size, achieving performance gains but with limited adaptability to dynamic workloads. Jindal et al. [18] introduced a statistical and deep learning approach to estimate function capacity, improving resource efficiency but lacking real-time optimization capabilities. Auto Opt Mem integrates reinforcement learning to dynamically adjust memory allocations, ensuring optimal performance across diverse execution environments without requiring manual intervention.

This section assesses the proposed Auto Opt Mem in a realistic serverless simulation environment. Four workloads were modeled as representatives, including ML inference, API gateway, data processing, and video processing, each independently defined by its memory range, baseline latency, and cost functions. A neural network with two hidden layers (32 and 16 neurons) was used to train the PPO agent for 80 episodes. The reward function incorporates normalized latency, cost, and QoS terms that drive the policy to achieve a balanced optimization. An autonomic MAPE-K (Monitor, Analyze, Plan, Execute, Knowledge) loop was implemented at runtime to make the system self-adaptive. MAPE continuously monitors recent performance, analyzes QoS/cost trends, plans corrective actions such as tuning the memory, and executes them through the adjustment of PPO policy parameters. The results are summarized in Table 8.

Auto Opt Mem demonstrates superior improvements across all metrics compared to existing research, establishing its effectiveness. Fig. 7 shows the comparison with previous studies.

Table 8 reports the average performance achieved by Auto Opt Mem across all benchmark functions in terms of latency reduction, cost savings, and QoS improvement. These values are directly compared with the results of Sizeless [17] and FnCapacitor [18]. The results demonstrate that, on average, the proposed Auto Opt Mem achieves 25-30 % less latency, 15-18 % less cost, and 10-12 % more QoS than the Sizeless [17] baseline, while outperforming FnCapacitor [18] in all key metrics. This shows that the PPO-based MAPE-K autonomic loop can dynamically adjust to workload variability and provide optimized resource allocation in serverless environments.

The results of 8 experiments conducted with different metrics are presented in Table 9. Synthetic yet reproducible data were used, and the dataset is openly provided for replication.

Table 10 summarizes the statistical comparison of Auto Opt Mem with the baseline methods. For each metric, the mean, standard deviation, and 95 % confidence interval were calculated. For instance, Auto Opt Mem has a latency of 267.92 ms ± 1.65 with a 95 % CI of (266.74, 269.10), outperforming both Sizeless [17] and FnCapacitor [18].

For statistical significance, we use an independent two-sample t-test. Our results indicate that the improvements of Auto Opt Mem over Sizeless [17] are statistically significant for all metrics (p < 0.02). For FnCapacitor [18], the cost differences are significant with p = 0.035. The statistical analysis confirms that Auto Opt Mem provides consistently better performance, with several improvements being significant at the 95 % confidence level.

The average latency, average cost, and average quality of service of the different methods are shown in Fig. 8.

In Fig. 8a, Auto Opt Mem (green line) has the lowest latency in all tests, indicating better optimization in memory allocation and faster execution of serverless functions. In Fig. 8b, Auto Opt Mem minimizes the cost of executing functions in all experiments; the method of Jindal et al. [18] reduces the cost compared to that of Eismann et al. [17] but is still higher than Auto Opt Mem. In Fig. 8c, Auto Opt Mem (green line) has the highest quality of service (QoS), indicating increased reliability and optimal performance under different workload conditions. The method of Jindal et al. [18] provides better QoS than that of Eismann et al. [17] but still falls short of Auto Opt Mem's.
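The independent two-sample t-test behind Table 10 can be checked against the raw Table 9 latency samples. The sketch below computes Welch's t statistic (the unequal-variance form of the test) using only the standard library; `scipy.stats.ttest_ind(..., equal_var=False)` would additionally return the p-value.

```python
from math import sqrt
from statistics import mean, stdev

# Latency samples (ms) from Table 9.
sizeless = [362.85, 370.14, 355.92, 380.18, 364.77, 359.60, 372.31, 368.54]
auto_opt_mem = [267.91, 270.34, 265.10, 268.05, 266.83, 269.14, 267.54, 266.41]

def welch_t(a, b):
    """Welch's two-sample t statistic (no equal-variance assumption)."""
    return (mean(a) - mean(b)) / sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))

t_latency = welch_t(sizeless, auto_opt_mem)  # large |t| -> significant difference
```

With a roughly 99 ms gap between the means and small within-group spread, the statistic is very large, consistent with the p = 0.002 reported for latency in Table 10.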
Fig. 5. Reward function values for eight experimental runs.

Table 11 further illustrates the differences between our approach and previous methods.

Compared to static allocation strategies, Auto Opt Mem avoids a great deal of resource wastage by allocating memory according to actual function requirements rather than over-provisioning. Such dynamic allocation leads to substantial cost savings, especially with varying workloads or mixed kinds of functions. Moreover, the ability of Auto Opt Mem to minimize latency leads to improved application performance and user experience. Through efficient memory management and reduced cold start time, Auto Opt Mem ensures that functions execute both quickly and reliably, even in high-demand situations. In contrast to heuristic methods, which typically operate on oversimplified assumptions and struggle with dynamic states, Auto Opt Mem employs deep reinforcement learning to derive decisions from the actual current state. Such responsiveness is important in numerous serverless computing applications. While machine learning-based methods are somewhat flexible, Auto Opt Mem is better at balancing conflicting objectives, e.g., cost minimization and optimal performance. By means of a reward function that considers several parameters, Auto Opt Mem provides an overall optimization of the system.

6. Discussion

In this section, we discuss the AWS Lambda platform and potential application scenarios, and then provide a comparative analysis of memory configuration in serverless computing.

6.1. Use of AWS

The experiments were conducted in a simulated environment; the workload behavior, latency patterns, and cost models were derived from documented AWS Lambda configuration rules. This ensures that the characteristics of a real AWS serverless environment are captured while still allowing full reproducibility.

AWS Lambda serves as the conceptual reference platform due to its very high market share and representative resource allocation model with regard to, for example, memory-to-vCPU coupling and pricing scheme, which closely aligns with Azure Functions and Google Cloud Functions. Auto Opt Mem is provider-agnostic since platform-specific constraints, such as memory ranges and CPU scaling rules, are embedded in the state representation. Thus, even though execution was simulated, the framework is transferable to real AWS Lambda deployments and other cloud providers as well.

6.2. Potential application scenarios

Due to resource and space limitations, we used controlled micro-benchmarks. Still, Auto Opt Mem is not restricted to this setup and can also be used in real applications. For example, it helps with machine learning inference tasks, such as running image or text classification models, where reducing cost and latency is important. It is also useful for video transcoding, since converting formats with tools such as ffmpeg usually takes a lot of CPU and memory. Another case is data preprocessing, where large datasets need to be read, compressed, or filtered. Auto Opt Mem also works well in API aggregation, when several external services are called at the same time and the process is mostly I/O-bound.

These examples illustrate that Auto Opt Mem is workload-agnostic and can be applied to real-world scenarios. A full-scale practical evaluation is considered for future research. Table 12 shows real-world scenarios where Auto Opt Mem can be used.

6.3. Comparative analysis

Table 13 summarizes recent research in intelligent cloud computing that focuses on automation, optimization, and deep learning. This table shows how the proposed Auto Opt Mem framework aligns with these studies and focuses on dynamic memory and performance optimization in serverless environments.

7. Conclusions

Memory configuration in serverless computing can be challenging
Fig. 6. Comparison of reward function and other metrics.

Table 8
Comparison with previous studies.

Approach | Latency Reduction ( %) | Cost Reduction ( %) | QoS Improvement ( %)
Auto Opt Mem vs Sizeless [17] | 25-30 % | 15-18 % | 10-12 %
Auto Opt Mem vs FnCapacitor [18] | 5-7 % | 6-8 % | 2-3 %

due to the ephemeral nature of serverless functions, which are short-lived and stateless. This research examines memory configuration mechanisms and classifies them into three main approaches: machine learning-based, exploration-based, and framework-based. The advantages and disadvantages of each mechanism, as well as the challenges and performance metrics affecting their effectiveness, are discussed. Memory configuration is one of the important challenges in serverless computing; in this paper, we propose an autonomous deep learning-based memory optimization system for serverless computing, referred to as Auto Opt Mem. The results show that Auto Opt Mem optimizes resource utilization, decreases operation costs and latency, and enhances quality of service (QoS), and hence it can be a perfect fit for developers in serverless systems. Auto Opt Mem shows noticeable improvements over previous methods. Compared to Sizeless, it reduces latency by 25-30 %, lowers cost by 15-18 %, and improves QoS by 10-12 %. Against FnCapacitor, it achieves 5-7 % latency reduction, 6-8 % cost reduction, and 2-3 % QoS improvement. Our experiments demonstrate that Auto Opt Mem on average provides 16.8 % lower latency, 11.8 % cost reduction, and 6.8 % QoS improvement across both methods.

In our approach, one of the important hyperparameters in deep reinforcement learning is the learning rate, which controls the extent of updates to the model's weights. A high learning rate typically causes the model to update its weights more aggressively, and may cause inefficient memory usage because of unstable learning, leading to suboptimal allocation. Conversely, a very small learning rate can lead to very slow training and slow convergence, delaying memory optimization. In short, a high learning rate can achieve convergence quickly, but it may pass the
Fig. 7. Comparison with previous studies.

Table 9
Comparison of approaches across 8 experiments.

Approach | Latency (ms) | Cost ($) | QoS ( %)
Sizeless [17] | 362.85, 370.14, 355.92, 380.18, 364.77, 359.60, 372.31, 368.54 | 0.00243, 0.00236, 0.00248, 0.00244, 0.00241, 0.00237, 0.00246, 0.00242 | 80.15, 81.24, 79.87, 80.45, 81.06, 80.83, 79.92, 80.61
FnCapacitor [18] | 278.42, 271.66, 282.10, 276.74, 274.89, 280.13, 273.80, 275.55 | 0.00256, 0.00252, 0.00257, 0.00255, 0.00251, 0.00253, 0.00259, 0.00254 | 89.44, 90.22, 88.97, 89.75, 90.13, 89.61, 89.88, 90.06
Auto Opt Mem | 267.91, 270.34, 265.10, 268.05, 266.83, 269.14, 267.54, 266.41 | 0.00221, 0.00219, 0.00223, 0.00220, 0.00218, 0.00222, 0.00221, 0.00220 | 91.08, 91.46, 90.83, 91.21, 91.37, 90.97, 91.32, 91.15

Table 10
Statistical comparison of Auto Opt Mem with baseline methods (p-values are vs. Auto Opt Mem).

Approach | Metric | Average | Std. Dev. | 95 % CI | p-value
Sizeless [17] | Latency (ms) | 366.29 | 8.18 | (360.43, 372.15) | 0.002
Sizeless [17] | Cost ($) | 0.00243 | 0.00004 | (0.00240, 0.00246) | 0.018
Sizeless [17] | QoS ( %) | 80.52 | 0.43 | (80.22, 80.82) | 0.001
FnCapacitor [18] | Latency (ms) | 276.29 | 3.29 | (273.97, 278.61) | 0.120
FnCapacitor [18] | Cost ($) | 0.00255 | 0.00003 | (0.00253, 0.00257) | 0.035
FnCapacitor [18] | QoS ( %) | 89.88 | 0.41 | (89.59, 90.17) | 0.280
Auto Opt Mem | Latency (ms) | 267.92 | 1.65 | (266.74, 269.10) |
Auto Opt Mem | Cost ($) | 0.00221 | 0.00002 | (0.00220, 0.00222) |
Auto Opt Mem | QoS ( %) | 91.17 | 0.24 | (91.00, 91.34) |

optimal point. As a result, it leads to irregular updates and requires additional iterations to correct errors, which increases memory consumption. A low learning rate, in contrast, yields more stable and gradual convergence, but may require more episodes to converge, which can increase the memory load due to long-term storage of intermediate states. Future research directions include extending Auto Opt Mem to multi-cloud environments and stress-testing Auto Opt Mem's real-time adaptability under more diverse and large-scale workloads. Additionally, the scalability of Auto Opt Mem can be investigated on serverless frameworks other than AWS Lambda.

Funding

No funding was received for this work.

Intellectual property

We confirm that we have given due consideration to the protection of intellectual property associated with this work and that there are no impediments to publication, including the timing of publication, with respect to intellectual property. In so doing we confirm that we have followed the regulations of our institutions concerning intellectual property.

Fig. 8. Compare performance metrics.

Authorship

All listed authors meet the ICMJE criteria. We attest that all authors contributed significantly to the creation of this manuscript, each having fulfilled criteria as established by the ICMJE. We confirm that the manuscript has been read and approved by all named authors, and that the order of authors listed in the manuscript has been approved by all named authors. The Corresponding Author is Mostafa Ghobaei-Arani, who is the sole contact for the editorial process and is responsible for communicating with the other authors about progress, submission of revisions, and final approval of proofs.
Table 11
Differentiation of our approach from previous studies.

Aspect | Sizeless [17] | FnCapacitor [18] | Our Work (Auto Opt Mem)
Methodology | Multi-target regression using monitoring data from a single memory size | Sandboxing, performance tests, and statistical/DNN modeling | Deep reinforcement learning with MAPE control loop
Decision Type | Predicts execution time and cost for other memory sizes | Estimates function capacity (max concurrency under SLO) | Learns and selects optimal memory configuration
Adaptivity | Static once trained, no continuous learning | Requires offline profiling for changes | Dynamic and continuous adaptation at runtime
Focus | Memory-performance trade-offs | Function capacity and concurrency | Memory optimization balancing latency, cost, QoS, and utilization
Limitation | No runtime adaptability | Re-profiling needed for new workloads |
Innovation | Efficient prediction with limited input | Accurate FC estimation for functions | Self-adaptive and autonomous optimization

Table 12
Application scenarios for Auto Opt Mem.

Scenario | Type of Workload | Role of Auto Opt Mem
ML inference | CPU-bound | Optimizes latency and cost by adjusting memory/CPU
Video transcoding | CPU & memory-intensive | Balances higher memory cost with faster execution time
Data preprocessing (ETL) | Mixed (CPU + I/O) | Adapts memory allocation based on input size
API aggregation | I/O-bound | Keeps memory low while ensuring QoS in parallel API calls

Table 13
Comparative analysis with recent studies in the field of intelligent cloud computing.

Ref | Focus Area | Technique | Key Contribution | Relation to Present Work
[59] | Secure data deduplication | Convergent encryption | Reduces redundancy and ensures secure cloud storage | Our DRL approach similarly targets resource efficiency, but in serverless memory optimization
[60] | Cloud key management | Machine learning-based security framework | Intelligent key lifecycle management | Both emphasize intelligent automation for secure and efficient cloud operations
[61] | Deep learning for cloud/edge/fog/IoT | Deep learning models survey | Provides insight into intelligent distributed learning paradigms | Inspires our DL-driven resource optimization framework
[62] | Cloud security and privacy | Deep learning-based attack detection | Enhances privacy and adaptive threat response | Our work extends this intelligence toward performance optimization and QoS improvement in serverless systems

CRediT authorship contribution statement

Zahra Shojaee Rad: Resources, Methodology, Investigation, Funding acquisition, Formal analysis, Data curation, Conceptualization. Mostafa Ghobaei-Arani: Writing - review & editing, Writing - original draft, Visualization, Validation, Supervision, Software, Resources, Project administration. Reza Ahsan: Writing - review & editing, Writing - original draft.

Declaration of competing interest

No conflict of interest exists. The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

References

[1] Ioana Baldini, Paul Castro, Kerry Chang, Perry Cheng, Stephen Fink, Vatche Ishakian, Nick Mitchell, et al., Serverless computing: current trends and open problems, Res. Adv. Cloud Comput. (2017) 1-20.
[2] A. Ebrahimi, M. Ghobaei-Arani, H. Saboohi, Cold start latency mitigation mechanisms in serverless computing: taxonomy, review, and future directions, J. Syst. Archit. 151 (2024) 103115.
[3] AWS, ServerlessVideo: connect with users around the world!, https://serverlessland.com/, 2023.
[4] AWS, Serverless case study - Netflix, https://dashbird.io/blog/serverless-case-study-netflix/, 2020.
[5] CapitalOne, Capital One saves developer time and reduces costs by going serverless on AWS, https://aws.amazon.com/solutions/case-studies/capital-one-lambda-ecs-case-study/, 2023.
[6] E. Johnson, Deploying ML models with serverless templates, https://aws.amazon.com/blogs/compute/deploying-machine-learning-models-with-serverless-templates/, 2021.
[7] A. Sojasingarayar, Build and deploy LLM application in AWS, https://medium.com/@abonia/build-and-deploy-llm-application-in-aws-cca46c662749, 2024.
[8] A. Gholami, M. Ghobaei-Arani, A trust model based on quality of service in cloud computing environment, Int. J. Database Theor. Appl. 8 (5) (2015) 161-170, https://doi.org/10.14257/ijdta.2015.8.5.13.
[9] DataDog, The State of Serverless, 2020, https://www.datadoghq.com/state-of-serverless/.
[10] M. Tari, M. Ghobaei-Arani, J. Pouramini, M. Ghorbian, Auto-scaling mechanisms in serverless computing: a comprehensive review, Comput. Sci. Rev. 53 (2024) 100650, https://doi.org/10.1016/j.cosrev.2024.100650.
[11] M. Ghorbian, M. Ghobaei-Arani, R. Asadolahpour-Karimi, Function placement approaches in serverless computing: a survey, J. Syst. Archit. 157 (2024) 103291.
[12] B. Jacob, R. Lanyon-Hogg, D.K. Nadgir, A.F. Yassin, A Practical Guide to the IBM Autonomic Computing Toolkit, IBM International Technical Support Organization, 2004.
[13] Michael Maurer, Ivan Breskovic, Vincent C. Emeakaroha, Ivona Brandic, Revealing the MAPE loop for the autonomic management of cloud infrastructures, in: 2011 IEEE Symposium on Computers and Communications (ISCC), IEEE, 2011, pp. 147-152.
[14] Stuart J. Russell, Peter Norvig, Artificial Intelligence: A Modern Approach, Pearson, 2016.
[15] Leslie Pack Kaelbling, Michael L. Littman, Andrew W. Moore, Reinforcement learning: a survey, J. Artif. Intell. Res. 4 (1996) 237-285.
[16] Rajkumar Rajavel, Mala Thangarathanam, Adaptive probabilistic behavioural learning system for the effective behavioural decision in cloud trading negotiation market, Fut. Gener. Comput. Syst. 58 (2016) 2941.
[17] Simon Eismann, Long Bui, Johannes Grohmann, Cristina Abad, Nikolas Herbst, Samuel Kounev, Sizeless: predicting the optimal size of serverless functions, in: Proceedings of the 22nd International Middleware Conference, 2021, pp. 248259.
[18] Anshul Jindal, Mohak Chadha, Shajulin Benedict, Michael Gerndt, Estimating the capacities of function-as-a-service functions, in: Proceedings of the 14th IEEE/ACM International Conference on Utility and Cloud Computing Companion, 2021, pp. 18.
[19] Djob Mvondo, Mathieu Bacou, Kevin Nguetchouang, Lucien Ngale, Stéphane Pouget, Josiane Kouam, Renaud Lachaize, et al., OFC: an opportunistic caching system for FaaS platforms, in: Proceedings of the Sixteenth European Conference on Computer Systems, 2021, pp. 228244.
[20] Myung-Hyun Kim, Jaehak Lee, Heonchang Yu, Eunyoung Lee, Improving memory utilization by sharing DNN models for serverless inference, in: 2023 IEEE International Conference on Consumer Electronics (ICCE), IEEE, 2023, pp. 16.
[21] Siddharth Agarwal, Maria A. Rodriguez, Rajkumar Buyya, Input-based ensemble-learning method for dynamic memory configuration of serverless computing functions, arXiv preprint arXiv:2411.07444, 2024.
[22] Gor Safaryan, Anshul Jindal, Mohak Chadha, Michael Gerndt, SLAM: SLO-aware memory optimization for serverless applications, in: 2022 IEEE 15th International Conference on Cloud Computing (CLOUD), IEEE, 2022, pp. 3039.
[23] Robert Cordingly, Sonia Xu, Wes Lloyd, Function memory optimization for heterogeneous serverless platforms with CPU time accounting, in: 2022 IEEE International Conference on Cloud Engineering (IC2E), IEEE, 2022, pp. 104115.
[24] Tetiana Zubko, Anshul Jindal, Mohak Chadha, Michael Gerndt, MAFF: self-adaptive memory optimization for serverless functions, in: European Conference on Service-Oriented and Cloud Computing, Springer International Publishing, Cham, 2022, pp. 137154.
[25] Josef Spillner, Resource management for cloud functions with memory tracing, profiling and autotuning, in: Proceedings of the 2020 Sixth International Workshop on Serverless Computing, 2020, pp. 1318.
[26] Zengpeng Li, Huiqun Yu, Guisheng Fan, Time-cost efficient memory configuration for serverless workflow applications, Concurr. Comput.: Pract. Exp. 34 (27) (2022) e7308.
[27] Andrea Sabbioni, Lorenzo Rosa, Armir Bujari, Luca Foschini, Antonio Corradi, A shared memory approach for function chaining in serverless platforms, in: 2021 IEEE Symposium on Computers and Communications (ISCC), IEEE, 2021, pp. 16.
[28] Aakanksha Saha, Sonika Jindal, EMARS: efficient management and allocation of resources in serverless, in: 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), IEEE, 2018, pp. 827830.
[29] Amit Samanta, Faraz Ahmed, Lianjie Cao, Ryan Stutsman, Puneet Sharma, Persistent memory-aware scheduling for serverless workloads, in: 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), IEEE, 2023, pp. 615621.
[30] Meenakshi Sethunath, Yang Peng, A joint function warm-up and request routing scheme for performing confident serverless computing, High-Confidence Comput. 2 (3) (2022) 100071.
[31] Anisha Kumari, Manoj Kumar Patra, Bibhudatta Sahoo, Ranjan Kumar Behera, Resource optimization in performance modeling for serverless application, Int. J. Inf. Technol. 14 (6) (2022) 28672875.
[32] Anjo Vahldiek-Oberwagner, Mona Vij, Meshwa: the case for a memory-safe software and hardware architecture for serverless computing, arXiv preprint arXiv:2211.08056, 2022.
[33] Divyanshu Saxena, Tao Ji, Arjun Singhvi, Junaid Khalid, Aditya Akella, Memory deduplication for serverless computing with Medes, in: Proceedings of the Seventeenth European Conference on Computer Systems, 2022, pp. 714729.
[34] Jie Li, Laiping Zhao, Yanan Yang, Kunlin Zhan, Keqiu Li, Tetris: memory-efficient serverless inference through tensor sharing, in: 2022 USENIX Annual Technical Conference (USENIX ATC 22), 2022.
[35] Dmitrii Ustiugov, Plamen Petrov, Marios Kogias, Edouard Bugnion, Boris Grot, Benchmarking, analysis, and optimization of serverless function snapshots, in: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, 2021, pp. 559572.
[36] Ao Wang, Jingyuan Zhang, Xiaolong Ma, Ali Anwar, Lukas Rupprecht, Dimitrios Skourtis, Vasily Tarasov, Feng Yan, Yue Cheng, InfiniCache: exploiting ephemeral serverless functions to build a cost-effective memory cache, in: 18th USENIX Conference on File and Storage Technologies (FAST 20), 2020, pp. 267281.
[37] Anurag Khandelwal, Yupeng Tang, Rachit Agarwal, Aditya Akella, Ion Stoica, Jiffy: elastic far-memory for stateful serverless analytics, in: Proceedings of the Seventeenth European Conference on Computer Systems, 2022, pp. 697713.
[38] Orestis Lagkas Nikolos, Chloe Alverti, Stratos Psomadakis, Georgios Goumas, Nectarios Koziris, Fast and efficient memory reclamation for serverless MicroVMs, arXiv preprint arXiv:2411.12893, 2024.
[39] Zahra Shojaee Rad, Mostafa Ghobaei-Arani, Data pipeline approaches in serverless computing: a taxonomy, review, and research trends, J. Big Data 11 (1) (2024) 142.
[40] Zahra Shojaee Rad, Mostafa Ghobaei-Arani, Reza Ahsan, Memory orchestration mechanisms in serverless computing: a taxonomy, review and future directions, Cluster Comput. (2024) 127.
[41] R. Wolski, C. Krintz, F. Bakir, G. George, W.-T. Lin, CSPOT: portable, multi-scale functions-as-a-service for IoT, in: Proceedings of the 4th ACM/IEEE Symposium on Edge Computing (SEC '19), Association for Computing Machinery, New York, 2019, pp. 236249, https://doi.org/10.1145/3318216.3363314.
[42] V. Yussupov, U. Breitenbücher, F. Leymann, M. Wurster, A systematic mapping study on engineering function-as-a-service platforms and tools, in: Proceedings of the 12th IEEE/ACM International Conference on Utility and Cloud Computing (UCC '19), Association for Computing Machinery, New York, 2019, pp. 229240, https://doi.org/10.1145/3344341.3368803.
[43] Zahra Shojaee Rad, Mostafa Ghobaei-Arani, Federated serverless cloud approaches: a comprehensive review, Comput. Electric. Eng. 124 (2025) 110372.
[44] Ioana Baldini, Paul Castro, Kerry Chang, Perry Cheng, Stephen Fink, Vatche Ishakian, Nick Mitchell, et al., Serverless computing: current trends and open problems, Res. Adv. Cloud Comput. (2017) 120.
[45] M. Elsakhawy, M. Bauer, FaaS2F: a framework for defining execution-SLA in serverless computing, in: 2020 IEEE Cloud Summit, 2020, pp. 5865, https://doi.org/10.1109/IEEECloudSummit48914.2020.00015.
[46] A.U. Gias, G. Casale, COCOA: cold start aware capacity planning for function-as-a-service platforms, in: 2020 28th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS), 2020, pp. 18, https://doi.org/10.1109/MASCOTS50786.2020.9285966.
[47] C.K. Dehury, S.N. Srirama, T.R. Chhetri, CCoDaMiC: a framework for coherent coordination of data migration and computation platforms, Futur. Gener. Comput. Syst. 109 (2020) 116, https://doi.org/10.1016/j.future.2020.03.029.
[48] A. Tariq, A. Pahl, S. Nimmagadda, E. Rozner, S. Lanka, Sequoia: enabling quality-of-service in serverless computing, in: Proceedings of the 11th ACM Symposium on Cloud Computing (SoCC '20), Association for Computing Machinery, New York, 2020, pp. 311327, https://doi.org/10.1145/3419111.3421306.
[49] J. Manner, S. Kolb, G. Wirtz, Troubleshooting serverless functions: a combined monitoring and debugging approach, SICS Softw.-Intensiv. Cyber-Phys. Syst. 34 (2) (2019) 99104, https://doi.org/10.1007/s00450-019-00398-6.
[50] J. Nupponen, D. Taibi, Serverless: what it is, what to do and what not to do, in: 2020 IEEE International Conference on Software Architecture Companion (ICSA-C), 2020, pp. 4950, https://doi.org/10.1109/ICSA-C50368.2020.00016.
[51] G. Cordasco, M. D'Auria, A. Negro, V. Scarano, C. Spagnuolo, Fly: a domain-specific language for scientific computing on FaaS, in: U. Schwardmann, C. Boehme, D.B. Heras, V. Cardellini, E. Jeannot, A. Salis, C. Schifanella, R.R. Manumachu, D. Schwamborn, L. Ricci, O. Sangyoon, T. Gruber, L. Antonelli, S.L. Scott (Eds.), Euro-Par 2019: Parallel Processing Workshops, Springer, Cham, 2020, pp. 531544.
[52] B. Jambunathan, K. Yoganathan, Architecture decision on using microservices or serverless functions with containers, in: 2018 International Conference on Current Trends Towards Converging Technologies (ICCTCT), 2018, pp. 17, https://doi.org/10.1109/ICCTCT.2018.8551035.
[53] A. Keshavarzian, S. Sharifian, S. Seyedin, Modified deep residual network architecture deployed on serverless framework of IoT platform based on human activity recognition application, Futur. Gener. Comput. Syst. 101 (2019) 1428, https://doi.org/10.1016/j.future.2019.06.009.
[54] Gerald Tesauro, Temporal difference learning and TD-Gammon, Commun. ACM 38 (3) (1995) 5868.
[55] Eric Rutten, Nicolas Marchand, Daniel Simon, Feedback control as MAPE-K loop in autonomic computing, in: Software Engineering for Self-Adaptive Systems III. Assurances: International Seminar, Dagstuhl Castle, Germany, December 1519, 2013, Revised Selected and Invited Papers, Springer International Publishing, Cham, 2018, pp. 349373.
[56] Evangelina Lara, Leocundo Aguilar, Mauricio A. Sanchez, Jesús A. García, Adaptive security based on MAPE-K: a survey, in: Applied Decision-Making: Applications in Computer Sciences and Engineering, Springer International Publishing, Cham, 2019, pp. 157183.
[57] Jeffrey O. Kephart, David M. Chess, The vision of autonomic computing, Computer (Long Beach, Calif.) 36 (1) (2003) 4150.
[58] Alistair McLean, Roy Sterritt, Autonomic computing in the cloud: an overview of past, present and future trends, in: The 2023 IARIA Annual Congress on Frontiers in Science, Technology, Services, and Applications: Technical Advances and Human Consequences, 2023.
[59] Shahnawaz Ahmad, Mohd Arif, Javed Ahmad, Mohd Nazim, Shabana Mehfuz, Convergent encryption enabled secure data deduplication algorithm for cloud environment, Concurr. Computat.: Pract. Exp. 36 (21) (2024) e8205.
[60] Shahnawaz Ahmad, Shabana Mehfuz, Shabana Urooj, Najah Alsubaie, Machine learning-based intelligent security framework for secure cloud key management, Cluster Comput. 27 (5) (2024) 59535979.
[61] Shahnawaz Ahmad, Iman Shakeel, Shabana Mehfuz, Javed Ahmad, Deep learning models for cloud, edge, fog, and IoT computing paradigms: survey, recent advances, and future directions, Comput. Sci. Rev. 49 (2023) 100568.
[62] Shahnawaz Ahmad, Mohd Arif, Shabana Mehfuz, Javed Ahmad, Mohd Nazim, Deep learning-based cloud security: innovative attack detection and privacy focused key management, IEEE Trans. Comput. (2025).