Computer Standards & Interfaces 97 (2026) 104122
A multi-criteria process for IT project success evaluation: Addressing a critical gap in standard practices
João Carlos Lourenço a, João Varajão b,*
a CEGIST, Instituto Superior Técnico, Universidade de Lisboa, Av. Rovisco Pais 1, 1049-001 Lisboa, Portugal
b Centro ALGORITMI, Universidade do Minho, Campus de Azurém, 4804-533 Guimarães, Portugal
ARTICLE INFO

Keywords: Project success; Project evaluation; Multi-criteria evaluation; MACBETH; Process; Methodology

ABSTRACT

The evaluation of project success is widely recognised as valuable for improving IT (Information Technology) project performance and impact. However, many processes fail to adequately address the requirements for a sound evaluation due to their inherent complexity or by not complying with fundamental practical and theoretical concepts. This paper presents a process that combines a problem structuring method with a multi-criteria decision analysis approach to evaluate the success of IT projects. Put into practice in the context of a software development project developed for a leading global supplier of technology and services, it offers a new way of creating a model for evaluating project success and tackling uncertainty, bringing clarity and consistency to the overall assessment process. A strong advantage of this process is that it is theoretically sound and can be easily applied to other evaluation problems involving other criteria. It also serves as a call to action for the development of formal standards in evaluation processes. Practical pathways to achieve such standardization include collaboration through industry consortia, development and adoption of ISO frameworks, and embedding evaluation processes within established maturity models. These pathways can foster consistency, comparability, and continuous improvement across organizations, paving the way for more robust and transparent evaluation practices.
1. Introduction

The sustainable success of virtually any organisation is strongly associated with the success of its projects [1]. A key factor for project success is that project managers clearly understand what success means [2], which is usually not the case [3]. Despite different notions about what constitutes “project success” and the many criteria that can be used for evaluation (e.g., cost, time, and performance, among others) [4], a project must satisfy its clients to be considered successful [5-8].
Given the importance and complexity of the evaluation of projects, companies should define and implement systematic processes for evaluating success to improve project management performance and the impact of deliverables [9]. However, despite the models and techniques that are currently available for assessing project success, they are typically challenging to implement for a variety of reasons, notably the complexity caused by using multiple and often conflicting objectives (e.g., minimise cost and maximise quality), the scarcity of empirical studies reporting their genuine use in projects [10], and the fact that practices employed in companies are generally informal and simplistic [11].
Additionally, several errors identified by the decision analysis literature [12,13] are often made, generating meaningless project success evaluations [14]. Some common mistakes involve not including relevant criteria in the evaluation model, not distinguishing the performance of a project from its value, assigning weights to evaluation criteria without considering the ranges of variation of their performance scales, and making calculations that violate measurement scale properties. In other words, such evaluations are inconsistent with multi-attribute value theory (MAVT) and value measurement foundations.
Considering these limitations, this research proposes a process that combines a problem structuring method with a multi-criteria approach for evaluating the success of information technology (IT) projects, supported by a real-world case. This process was developed and applied in the context of a project of GlobalSysMakers (for confidentiality reasons, the name of the company herein is anonymized), a leading global supplier of technology and services.
In the GlobalSysMakers project, the need for a new process arose because the project management team felt that the scoring model initially defined for success assessment, while helpful, lacked accuracy.
* Corresponding author.
E-mail address: varajao@dsi.uminho.pt (J. Varajão).
https://doi.org/10.1016/j.csi.2025.104122
Received 12 August 2025; Received in revised form 7 November 2025; Accepted 23 December 2025
Available online 24 December 2025
0920-5489/© 2025 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
Following an appraisal of several methodological alternatives, a new multi-criteria approach combined with a problem structuring method was shown to be the best solution, providing the required precision and transparency to the process, along with a better understanding of the real meaning of the relative importance of each evaluation criterion. This paper describes the process developed in detail so that it can be replicated in other projects. Also, the results are presented and discussed, including contributions to theory and practice.
The proposed process, which combines a problem structuring method with a multi-criteria approach for evaluating IT project success, offers several theoretical implications. First, it advances the conceptualization of project success by integrating both subjective stakeholder perspectives and objective performance criteria, addressing the multidimensional and context-dependent nature of success in IT projects. Second, it contributes to decision theory and project management literature by demonstrating how problem structuring methods—typically underutilized in IT evaluation—can enhance the clarity and relevance of criteria selection and prioritization. Third, the integration of these methodologies provides a foundation for developing more robust, transparent, and adaptable evaluation frameworks, which can inform future theoretical models and empirical studies. Ultimately, this research supports the movement toward standardization by offering a replicable and theoretically grounded process that can be refined and generalized across different organizational and project contexts.
The remainder of this paper is organised as follows. Section 2 briefly reviews previous related work on project evaluation methods, cases, and multi-criteria evaluation methods. Section 3 describes the case context and the development of the success evaluation model using a process that combines a problem structuring model with a multi-criteria decision analysis approach. Section 4 discusses the results obtained. Finally, Section 5 presents the conclusions and avenues for further work.

2. Previous related work

2.1. Success of projects

Evaluation can be defined as the assessment and analysis of the efficiency and effectiveness of the project’s activities and results. The evaluation looks at what is planned to do, what has been achieved, and how it has been achieved [15]. Kahan and Goodstadt [16] conceive evaluation as a set of questions and methods properly articulated to review processes, activities, and strategies to achieve better results. Therefore, the purpose of an evaluation is not just to find out what happened but to use that information to make the project better [17,18].
There are several evaluation approaches in the literature, some considerably complex regarding their practical operationalisation and use. Varajão et al. [10] present a comprehensive review of models and methods for evaluating information systems project success. Some examples are described and analysed next.
Bannerman and Thorogood [19] propose a framework for defining IT project success that provides a common language for communication and compares what stakeholders perceive as important. The authors list the criteria that should be used to assess the success of a project within five domains (process, project management, product, business, and strategy). However, they do not explain how to consider these domains and criteria together.
Barclay and Osei-Bryson [20] describe a structured framework named Project Objectives Measurement Model (POMM) to identify the criteria for evaluating an information system (IS) project and assigning a performance measure to each criterion. POMM applies value-focused thinking principles [21] and goal question metric methods [22]. An illustrative case is presented in which the importance of each criterion is directly assessed using an average of the stakeholders’ answers based on a 5-point Likert scale. However, despite its virtues, this operation is neither quantitatively nor substantively meaningful [23], respectively, because a Likert scale is an ordinal scale [24,25] and averaging the weights of several stakeholders without a discussion obliterates their individual differences [26]. Additionally, the “importance of the criteria” should consider their respective performance ranges; otherwise, the resulting weights would be arbitrary [27].
Basar [28] proposes a methodology to evaluate the performance of IT projects in a fuzzy environment. She first identifies the evaluation criteria using the balanced scorecard method. Second, she determines the criteria weights with expert judgments and hesitant fuzzy weights. Then, the weights are used to evaluate the performance of IT projects in a Turkish company. The weighting process described in this paper is difficult for a non-expert evaluator to understand. Additionally, the quantitative performances of projects on the criteria are systematically normalised to scores between 0 and 1 with a linear transformation that may not correspond to the preferences of evaluators (which may be non-linear). The paper does not explain how to address the evaluation of the qualitative criteria.
Ismail [29] applies the Delphi method and conducts a seminar with experts to identify a construction project’s potential evaluation criteria and group them into clusters. A relative importance index is calculated for each criterion with a weighted average of the responses to a survey expressed on a Likert scale. In a subsequent step, the experts 1) reduced the number of clusters and criteria and 2) assigned the same weight to the latter. Then, a priority index was calculated for each criterion with the Priority Evaluation Model (PEM) [30], which combines the “satisfaction” rate (assigned by the experts) and the “importance” of the criterion. The overall project success is obtained with a weighted sum of the averages of the priority indexes obtained on each cluster and the clusters’ weights. However, the paper does not explain how these weights were assessed. Additionally, the Likert scale classifications cannot be used for calculating averages or other arithmetic operations.
Nguvulu et al. [31] use a Deep Belief Network (DBN) to evaluate eight IT projects’ performances after training the DBN with five projects of 12 months’ duration. The DBN automatically assigned weights and scores to the criteria, considering possible interactions between them. The authors stress the advantage of this approach by not considering human subjectivity. However, from our point of view, this is a weakness because the subjective preferences of project managers, clients, and other stakeholders should be considered in an evaluation process to avoid arbitrary results generated by inadequate analytical approaches.
Wohlin and Andrews [32] apply principal component analysis and subjective evaluation factors to estimate which projects are successful or unsuccessful out of a set of projects. This statistical approach may be used to identify key project characteristics, but it does not allow for evaluating the projects’ success according to stakeholders’ preferences.
Yan [33] suggests the combined use of the balanced scorecard (BSC) [34], the Analytic Hierarchy Process (AHP), and the Fuzzy Comprehensive Analysis method (FCA), respectively, to construct a performance criteria system, assess the criteria weights, and obtain an overall evaluation score. The author explains how to obtain the performance criteria system, but does not explain the weighting and scoring components.
Yang et al. [35] apply a multi-criteria model for evaluating a software development project’s success using the Analytical Network Process (ANP) [36] to assess the criteria weights at several hierarchical levels. The scores of a project on a given criterion were obtained by calculating the average of the scores assigned by five experts using a 5-point Likert scale. Note that, as mentioned above, averages should not be calculated with ordinal scales. In addition, ANP is based on AHP, a method with known issues that affect the validity of the criteria weights (see, e.g., [37-39]).
Section 2.2 reviews important concepts and methods related to multi-criteria evaluation that are needed to create a proper value measurement model [40,41] to assess the success of a project.

2.2. Multi-criteria evaluation

In a multi-criteria value model, the measure of success of a project is
given by the additive value function model:

V(x_1, x_2, \ldots, x_n) = \sum_{j=1}^{n} w_j v_j(x_j), \quad \text{with} \quad \sum_{j=1}^{n} w_j = 1 \ \text{and} \ w_j > 0, \ \forall j \qquad (1)

where V is the overall value score of the success of the project, w_j is the weight of criterion j, v_j(x_j) is the value score on criterion j of the performance x_j, and n represents the number of evaluation criteria.
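To make the arithmetic concrete, a minimal sketch of model (1) in Python (the two criteria, weights, and scores below are hypothetical, not the case model of Section 3):

def overall_value(weights, value_scores):
    """Model (1): V = sum_j w_j * v_j(x_j)."""
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    assert all(w > 0 for w in weights), "weights must be positive"
    return sum(w * v for w, v in zip(weights, value_scores))

# two hypothetical criteria, e.g. cost (w = 0.6) and quality (w = 0.4),
# with value scores already obtained from the criteria value functions
print(overall_value([0.6, 0.4], [50.0, 100.0]))  # 70.0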
Despite being straightforward in form, this model is often poorly applied. We highlight that the criteria weights w_j are scaling constants [42], which represent trade-offs between criteria and not the erroneous notion of criteria’s “measures of importance” [21]. In addition, v_j is a measurable value function, which represents both a preference order between performances on criterion j and a strength-of-preference order on differences of performances [43]. Moreover, the model requires the criteria to be mutually preferentially independent [44], which entails special care during the model structuring phase.
There are some fundamental aspects to note regarding the desired properties for each evaluation criterion and also for the whole set of criteria [45]. Each criterion should be essential for the evaluation and controllable in the sense that the performance of the project influences the degree to which the criterion is satisfied, independently of other additional decisions. Also, a family of evaluation criteria should be: complete (the set of criteria should represent all of the relevant consequences of the project); nonredundant (the criteria should not repeat the same concerns); concise (the number of criteria should be kept to the necessary minimum to evaluate the project); specific (each criterion should be able to assess the consequences of the project, instead of being so broad that it compromises this purpose); and understandable (the evaluation criteria should be clear in the eyes of any interested individual).
Depending on their ability to use numerical principles or their fluency in expressing themselves in words, evaluators may prefer to apply a numerical method or a non-numerical one [46]. In light of this, the remainder of this section focuses on quantitative and qualitative techniques tailored for these two types of evaluators. Specifically, we delve into methods for criteria weighting and for building a value scale for each criterion.
2.2.1. Weighting methods
A theoretically sound weighting method must consider the performance ranges defined by two fixed references on each criterion. Common references are, for example, the “worst” and the “best” performances [39] or the “neutral” and “good” performances [47]. Below, we briefly describe two quantitative weighting procedures and one qualitative.
Keeney and Raiffa [48] developed the trade-off procedure, which is a numerical method that requires establishing indifferences between two fictitious projects using two criteria at a time. After establishing n - 1 indifference relationships for the n criteria, a system of equations is solved, including one equation in which the sum of the weights equals 1, to obtain the criteria weights.
Edwards and Barron [49] created the swing weighting method, which is a numerical method that involves measuring the relative importance of the improvements (swings) that can be achieved on the criteria, considering a change from the “worst” to the “best” performance on each of them.
Bana e Costa and Vansnick [50] developed MACBETH [51] to weight the criteria. This procedure requires ranking the worst-best swings and judging them using the qualitative scale of difference in attractiveness: no (difference), very weak, weak, moderate, strong, very strong, or extreme. This qualitative scale is also used to judge the difference in attractiveness between two swings at a time. The elicited judgments are used to fill in the upper triangular part of a matrix in the software tool M-MACBETH, which validates each judgment’s consistency with those previously inputted (see [52], pp. 425-443). Then, the software tool generates a proposal of weights compatible with the inputted qualitative judgments by solving the linear programming problem described in Bana e Costa et al. [52]. The evaluators should validate the proposed weighting scale and adjust it if needed.
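To convey the flavour of this qualitative-to-numerical step, the sketch below solves a deliberately simplified linear program, not the exact M-MACBETH formulation of [52]: given a ranking of four hypothetical criteria swings and invented category codes for the differences between consecutive swings, it returns one compatible weighting scale.

from scipy.optimize import linprog

# Swings ranked from most to least preferred (hypothetical criteria).
criteria = ["A", "B", "C", "D"]
# Qualitative category of the difference between consecutive swings,
# coded 1 (very weak) .. 6 (extreme); hypothetical judgments.
gaps = [3, 1, 2]
step = 0.02   # weight units per category step (model resolution)

n = len(criteria)
A_ub, b_ub = [], []
for i, g in enumerate(gaps):
    row = [0.0] * n
    row[i], row[i + 1] = -1.0, 1.0   # encodes w_i - w_{i+1} >= g * step
    A_ub.append(row)
    b_ub.append(-g * step)
res = linprog(c=[0.0] * n,                      # pure feasibility problem
              A_ub=A_ub, b_ub=b_ub,
              A_eq=[[1.0] * n], b_eq=[1.0],     # weights sum to 1
              bounds=[(0.01, 1.0)] * n)         # strictly positive weights
print(dict(zip(criteria, res.x.round(3))))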
2.2.2. Methods to build value scales
We must assign fixed scores to the previously defined references to build a criterion value scale. For example, we may assign 100 and 0 value units to the “best” and the “worst” performances in each criterion, respectively, although two other scores could be used, so that the highest score is assigned to the most preferred reference. This arbitrary assignment of scores leads to obtaining interval value scales [25]. Additionally, the score of a project on a given criterion should consider the preferences expressed by the evaluators upon performance ranges within the criterion [43] (e.g., the difference in value between performances A and B is worth twice the difference between C and D). Hereinafter, we present two numerical scoring methods and a qualitative one.
Edwards [53] presents the direct rating method. This numerical procedure first requires evaluators to rank the project performances in order of decreasing attractiveness. The highest score (100 units) is assigned to the “best” performance and the lowest score (0 units) to the “worst”. Intermediate scores are assigned to other performance levels considering the intensities of preferences between each two of them, knowing that the difference between the “best” and the “worst” is worth 100 value units. This method allows scoring a project directly or indirectly using a performance measure (e.g., quantitative continuous, quantitative discrete, or qualitative).
von Winterfeldt and Edwards [54] describe the bisection method, also known as the mid-value splitting technique [55], to create a value scale for a criterion. This numerical method assigns the highest score (100) to the “best” performance on the criterion and the lowest score (zero) to the “worst”. Then, it is asked which performance p has a value equally distant from the “best” and the “worst” performances, which means that the ranges “p to best” and “p to worst” have the same strength-of-preference. Therefore, the performance p would get a midpoint score of 50. Similar midpoint questions are asked to identify other points that can be used to form a piecewise linear value function or a curve. This method allows the creation of value functions upon a quantitative and continuous performance measure on the criterion.
Bana e Costa and Vansnick [50] developed MACBETH [51] to create a value scale for a criterion (and to weight criteria, as described in the preceding section). Still, contrary to the above-mentioned methods, it needs only to elicit qualitative judgments. An evaluator judges the difference in attractiveness between two performances at a time, using the qualitative scale presented in the previous section, and inputs them into the software tool M-MACBETH. This tool verifies the consistency of the inputted judgments and generates a proposal of a value scale compatible with them and with the scores assigned to the reference performances “best” and “worst” (or “good” and “neutral”) [52]. In the final step, the evaluator must validate and adjust the proposed value scale if needed. As in direct rating, this method allows scoring a project directly or indirectly using any performance measure.
ness between two swings at a time. The elicited judgments are used to fill known flaws, and normalisation procedures that do not consider
in the upper triangular part of a matrix in the software tool non-linear preferences). Furthermore, as far as we know, there is no
M-MACBETH, which validates each judgments consistency with those description of a formal process that may guide the evaluators from
previously inputted (see [52], pp. 425443). Then, the software tool beginning to end, i.e., from identifying the evaluation criteria until
reaching an overall measure of project success. Therefore, a gap in the IT project literature needs to be addressed, which will be done by applying multi-criteria evaluation principles.
Given the characteristics of the evaluators, the simplicity of use of the MACBETH method and its software tool M-MACBETH, including its ability to validate the consistency of the value judgments expressed by evaluators and to work with any performance measure (be it qualitative or quantitative, continuous or discrete), this was the approach selected to weight the criteria and build a value function for each criterion in the real-world case described in this paper.

3. Model development

3.1. Research setting

GlobalSysMakers develops solutions in four business areas: mobility solutions, industrial technology, consumer goods, and energy and building technology. It has several divisions, including automobile multimedia, automobile accessories, electric tools, heating and hot water, and home appliances. It employs roughly 410,000 associates worldwide, has about 440 subsidiaries and regional companies in 60 countries, and employs nearly 70,000 associates in research and development at 125 locations.
The target project, here identified as PROJRD, was part of an R&D program that had the participation of GlobalSysMakers and a university. The project had as its primary goal the development of a software tool to automate the assessment of printed circuit board (PCB) designs. PCBs are essentially boards that connect electronic components used in all (but the simplest) electronic products, such as household appliances or vehicles. In addition to the software tool, the project deliverables included technical specifications, prototypes, and presentations.
The software development process adopted was based on a hybrid/agile methodology supported by SCRUM [59]. Agile methods for software development have been increasingly used in the IT sector [60] and are now mainstream [61]. In this project, agility enabled greater adaptability of the development phases according to the company’s needs and requirements, which evolved along with the project lifecycle. Thus, it was possible to deal with changes in the requirements that were reflected in the final deliverables during the project development. In a later phase of the project, SCRUM was coupled with a waterfall process since the objectives stabilised without needing a periodic update. The project team was multidisciplinary, incorporating engineers from GlobalSysMakers (TEAMGSM) and researchers from the university (TEAMUNI). Together, the teams (TEAMGSM and TEAMUNI) had electronics, software engineering, and project management skills.
On average, the team allocated 1040 h per month to the project (approximately 6.5 Full-Time Equivalents), distributed by the different tasks of the project and according to the functions performed by each element (three of the team members were not full-time in the project). The project had a duration of 36 months.
The project’s overall success was first assessed using a simple grid scoring model built by non-specialists in evaluation, which directly scored the project on several criteria and assigned importance weights. However, the project management team felt the need for a more advanced model to improve confidence in the evaluation. More in-depth research on multi-criteria evaluation revealed some misinterpretations in that process, which ultimately led to the development of a new model in line with decision analysis principles. This paper describes the new evaluation model.

3.2. Development tasks

The model development process started by asking the project manager to identify the members who should form the decision-making group [62], i.e., the group in charge of developing the model to evaluate the project’s success. It was recommended to select members with different roles in the project; all of them were somehow interested in the project’s outcomes. The group had three members: two from TEAMGSM and TEAMUNI, and one external consultant. The team members were selected considering their managerial responsibilities and to ensure representativeness of all the involved parties. All the members agreed to be involved in the model development tasks. Note that larger groups require different group processes, typically having separate meetings with stakeholders of different areas of interest to develop parts of the model, and with merge meetings gathering higher-level representatives of the client to validate the work done by the stakeholders and to finish the overall model [63].

Fig. 1. Model development tasks.

Fig. 1 depicts the model development tasks. The first task involves identifying the aspects of interest for evaluating the project’s success (“problem structuring”, described in Section 3.3). This is a critical task because it is not possible to develop a proper evaluation model without understanding the problem, which is the reason why several publications have been devoted to identifying the fundamental evaluation concerns to be addressed (e.g., [28,64]). Second, all the relevant evaluation criteria should be included in the model, and a descriptor of performance should be identified for each of them, enabling the assessment of the extent to which each criterion is met (“model structuring”, Section 3.4). Third, the evaluation component of the model must be built (“value model building”, Section 3.5), which includes the construction of a value function for each criterion to transform the performances of the project into value scores (Section 3.5.1), and the weighting of the criteria to depict their trade-offs (Section 3.5.2). Last, the evaluation model should be tested for adequacy and consistency (Section 4.1).

3.3. Problem structuring

The problem structuring task aims to identify the fundamental objectives [45] that determine the project’s success from the client’s perspective. Such objectives are essential reasons for the project’s success. Therefore, they should be used as criteria in the evaluation model.
However, the identification of these objectives in ill-structured problems may not be easy, which is why we opted to apply a problem structuring method (PSM) known as group map [65], which can be used in combination with a multi-criteria decision analysis approach [66].
To begin structuring the problem, the decision-making group was asked to say which aspects or concerns were relevant to evaluate the project’s success. Then, for each of the concerns expressed, it was asked, “Why is that important?” or “What would be the consequences of doing that?”, which allowed us to identify other aspects.
Fig. 2. Group map.

Fig. 2 depicts the complete group causal map built with the answers of the elements of the group using the software tool “Decision Explorer” (from Banxia Software Ltd., https://banxia.com/dexplore), which automatically numbered the concerns for identification purposes. This map results from several iterations, adding some aspects and removing others. Note that a specific concern may be expressed by one statement (e.g., “(33) good requirements definition”) or by two statements separated by an ellipsis, which depicts a positive pole and a negative one to clarify the meaning of the concern (e.g., “(15) time fulfilment… time exceeded”). An arrow between two concerns indicates the direction of causality. When an arrow points to a concern with two poles, it means that the concern affected is the one at the positive pole (e.g., a “(29) good contract management” contributes to the positive pole of “(1) cost fulfilment… cost exceeded”; in the reverse case, the arrow would have a negative sign near its head).
In Fig. 2, it is possible to identify chains of means-ends objectives. For example, an “(31) effective change management” contributes to the “(36) deliverables use”, which respectively allows to “(41) reduce users’ repetitive work”, which contributes to “(39) increase users’ satisfaction”. Although the “(41) reduce users’ repetitive work” is a means-objective to the end-objective “(39) increase users’ satisfaction”, the group considered the former a fundamental objective because it is important in itself and not because of its contribution to the latter. Therefore, “(41) reduce users’ repetitive work” will be used as an evaluation criterion. Objective “(39) increase users’ satisfaction” was considered too broad to evaluate the project’s success and thus will not be used.
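Structurally, a group map is a small directed graph, which makes the means-ends chains easy to enumerate. A sketch using only the concerns and arrows quoted above:

# Nodes are concern numbers (as in Fig. 2); arrows point from means to ends.
edges = {
    31: [36],   # effective change management -> deliverables use
    36: [41],   # deliverables use -> reduce users' repetitive work
    41: [39],   # reduce users' repetitive work -> increase users' satisfaction
    29: [1],    # good contract management -> cost fulfilment
}

def chains(node, graph, path=()):
    """Enumerate means-ends chains starting at a concern."""
    path = path + (node,)
    ends = graph.get(node, [])
    if not ends:
        yield path
    for nxt in ends:
        yield from chains(nxt, graph, path)

print(list(chains(31, edges)))  # [(31, 36, 41, 39)]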
3.4. Model structuring

3.4.1. Evaluation criteria
Fig. 3 depicts the seven evaluation criteria that emerged from the concerns highlighted in bold in the group causal map developed in the problem structuring task.

Fig. 3. Project’s success evaluation criteria.

The concerns represented by these criteria are as follows:

• Scope/quality fulfilment (ScoQual)—the extent to which the planned (functional and non-functional) requirements were fulfilled (this criterion resulted from concern 14 in Fig. 2).

The prime deliverable of the project is a software tool to support the PCB design assessment, the other deliverables being subsidiary to this tool. In the end, if the software tool does not comply with a minimum set of planned requirements, it will not be able to assess the PCB designs and will compromise the investment objectives.

• Cost fulfilment (Cost)—the extent to which the planned cost was fulfilled (this criterion resulted from concern 1 in Fig. 2).

The budget defined for the project needs to be carefully managed due to being financed by an external R&D entity with a very narrow margin of deviation.
• Time fulfilment (Time)—the extent to which the planned time was fulfilled (this criterion resulted from concern 15 in Fig. 2).

Since this project is part of a large program, time fulfilment is a significant management aspect because all the program’s projects must be finished simultaneously due to the program’s constraints. In other words, not meeting the deadline in this project would mean completing it in whatever form it is in when the program reaches its end, complying or not with the scope, and delivering or not what was planned.

• Increase of the number and type of errors identified in each verification cycle (IncNoType)—the extent to which the number and type of errors identified in each PCB verification cycle increase (this criterion resulted from concern 43 in Fig. 2).

Before the project was implemented in the company, the PCB designs had been checked mainly in a semi-automatic way by specialised engineers. Due to the many PCB components, details, and rules to review, it was virtually impossible to check all of the required features. The consequence was the late detection of some errors in more advanced stages of the projects, or, in other words, in later verification cycles. This accounts for the importance of the new software tool to increase the number and type of errors identified early on in each verification cycle, thereby reducing the design costs.

• Reduction of the number of verification cycles (RNVC)—the extent to which the number of verification cycles is reduced (this criterion resulted from concern 37 in Fig. 2).

A PCB typically needs to go through several verification cycles until it is free from errors and ready for production. When errors are detected in a verification cycle, the PCB design needs to be corrected and tested again, possibly requiring a new verification cycle. Each verification cycle of a PCB design implies high costs. Furthermore, there is the risk of detecting errors only at the production stage, with even more severe consequences. A primary expected result of the new software tool is to reduce the number of verification cycles by enabling the early detection of errors.

• Improve efficiency (ImpEff)—the extent to which the number of verified rules increases in each verification cycle without increasing the involved human resources (this criterion resulted from concern 42 in Fig. 2).

Since the process for verifying the PCB design rules is semi-automatic, with a substantial part of manual labour, the current number of specialised engineers can only check some of the relevant aspects. With the new software tool, it is expected that the same number of engineers can check a greater number of design rules, not spending more time doing it.

• Reduction of the repetitive work of the users (RRWU)—the extent to which the number of rules manually verified is reduced in each verification cycle (this criterion resulted from concern 41 in Fig. 2).

In the semi-automatic verification of PCB design rules, manual labour is repetitive and prone to errors due to the fatigue of specialists. Automating most of the rules assessment is expected to reduce the repetitive work of these specialists and free them to perform other tasks.

3.4.2. Descriptors of performance
In this task, we associate a descriptor of performance with each evaluation criterion to measure how much the project satisfies the criterion. According to Keeney [45], a descriptor should be unambiguous (it describes the performances on the associated criterion clearly), comprehensive (it covers the range of possible performances on the criterion), direct (the descriptor levels directly describe the performances on the corresponding criterion), operational (the information concerning the performances of the project can be obtained and value judgments can be made), and understandable (performances and value judgments made using the descriptor can be clearly understood and communicated).
Table 1 presents the list of all the descriptors created to measure the performance of the project, as well as two reference performance levels, “neutral” and “good”, for each of them. Note that the definition of two reference performance levels is required to weight the criteria, allowing comparisons between criteria preference ranges and defining two fixed anchors for the value scales (see Section 2.2). Furthermore, the use of a “neutral” performance level (which corresponds to a performance that is neither positive nor negative on the criterion) and of a “good” performance level (which corresponds to a very positive performance on the criterion) increases the understandability of the criterion; these references are thus preferable to the “worst” and the “best” references used as examples in Section 2.2.

Table 1. Descriptors of performance.
Criterion | Descriptor | Neutral | Good
Scope/quality fulfilment (ScoQual) | Constructed descriptor (see Table 2) | L3 | L2
Cost fulfilment (Cost) | Cost of the project (k€) | Planned cost (k€ 500) | 95 % of the planned cost (k€ 450)
Time fulfilment (Time) | Project duration (weeks) | Planned time (96 weeks) | 95 % of the planned time (90 weeks)
Increase in the number and type of errors identified in each verification cycle (IncNoType) | Constructed descriptor (see Table 3) | E5 T0 | E10 T5
Reduction of the number of verification cycles (RNVC) | Number of verification cycles decreased | 1 cycle | 2 cycles
Improve efficiency (ImpEff) | Number of verified rules increased (%) | 0 % | 40 %
Reduction of the repetitive work of the users (RRWU) | Number of rules manually verified reduced (%) | 0 % | 10 %

As shown in Table 1, the criteria scope/quality fulfilment and increase in the number and type of errors identified in each verification cycle do not have direct descriptors of performance. For these criteria, constructed descriptors were developed combining the characteristics inherent to those criteria, as explained next (Bana e Costa et al. [67] describe a detailed procedure for creating constructed descriptors).
To measure the performance of the project on the scope/quality fulfilment criterion, several requirements that deliver different contributions to the project’s success were considered, following the MoSCoW method principles [68]. These requirements were classified into three types (“must have”, “important to have”, and “nice to have”) and combined to obtain the performance levels of the descriptor presented in Table 2.

Table 2. Scale for the “scope/quality fulfilment” criterion.
The project… | Performance level
…satisfied all the requirements “must have” and “important to have” and most of the “nice to have” | L1
…satisfied all the requirements “must have” and at least 85 % of the “important to have” and at least 20 % of the “nice to have” (or an equivalent performance on the requirements “important to have” and “nice to have”) | L2 = Good
…satisfied all the requirements “must have” and at least 60 % of the “important to have” and at least 20 % of the “nice to have” (or an equivalent performance on the requirements “important to have” and “nice to have”) | L3 = Neutral
…did not satisfy one requirement “must have”, or satisfied less than 60 % of the requirements “important to have” | L4
…did not satisfy more than one requirement “must have” | L5

To measure the performance of the project on the increase of the number and type of errors identified in each verification cycle criterion, several combinations of the number and type of errors identified at each verification cycle (based on a past project) need to be considered (see Table 3). For example, a “5 % increase in the number of identified errors” and a “10 % increase in the type of identified errors” is a performance depicted as level “E5 T10”. A verification cycle includes a series of tests to check whether there are errors in the PCB design or whether it is ready for production (free from errors).

Table 3. Constructed scale for the “increase of the number and type of errors identified in each verification cycle” criterion.
Increase in the number of identified errors (E) | Increase in the type of identified errors (T) | Level
10 % | 10 % | E10 T10
10 % | 5 % | E10 T5 = Good
10 % | 0 % | E10 T0
5 % | 10 % | E5 T10
5 % | 5 % | E5 T5
5 % | 0 % | E5 T0 = Neutral
0 % | 0 % | E0 T0
We note that the indicators used in the constructed scales presented in Tables 2 and 3 cannot be considered in isolation, as they are mutually preferentially dependent. For example, in Table 3, an increase of 10 % in the number of identified errors (E) is valued more highly when the percentage increase in the type of identified errors (T) is greater. Otherwise, the number and the type of identified errors could have been used as indicators for two separate evaluation criteria.
After the seven criteria had been clearly identified and their descriptors of performance established, the decision-making group was asked whether there was any additional aspect that might be considered in assessing the project’s success. The negative response indicated that this set of criteria was exhaustive and, consequently, that the value tree presented in Fig. 3 could be considered complete.

3.5. Value model building

3.5.1. Value functions
As previously described, a descriptor of performance provides a way of measuring the project’s performance on its associated criterion. However, to build a value model, we also need to obtain the value of each plausible performance of the project (in the form of a value scale or value function), which requires knowing the preferences of the evaluators upon differences in performances on the corresponding criterion.
For that purpose, we applied the MACBETH method [51]. As described in Section 2.2, the questioning procedure of MACBETH requires the evaluators to answer questions of difference in attractiveness between two performance levels at a time, using the qualitative scale: no (difference in attractiveness), very weak, weak, moderate, strong, very strong, and extreme. The answers provided are used for filling in a matrix of judgments in the M-MACBETH software tool, which analyses the consistency of the answers as soon as they are inserted, and then generates (by linear programming) a proposal of a value scale which is compatible with the answers provided, given the fixed value scores assigned to the “neutral” and the “good” performances (0 and 100 value units, respectively).
We present two examples of applying the MACBETH method to build value functions for criteria with different descriptors of performance: the scope/quality fulfilment criterion with a discrete descriptor, and the time fulfilment criterion with a continuous descriptor.
Fig. 4 presents the matrix of judgments for the scope/quality fulfilment criterion. Table 2 shows the constructed descriptor for this criterion, where L1 means “the project satisfied all the requirements must have and important to have and the majority of the nice to have”, L2 means “the project satisfied all the requirements must have and at least 85 % of the important to have and at least 20 % of the nice to have (or an equivalent performance)”, and L3 means “the project satisfied all the requirements must have and at least 60 % of the important to have and at least 20 % of the nice to have (or an equivalent performance)”. We can see in Fig. 4 that the difference in attractiveness between “L1” and “L2 = Good” was deemed weak by the evaluators, whereas the difference in attractiveness between “L2 = Good” and “L3 = Neutral” was considered moderate. Therefore, the difference in value between “L1” and “L2 = Good” should be lower than the difference between “L2 = Good” and “L3 = Neutral”, which can be confirmed in the value scale presented in Fig. 6a, where the former difference corresponds to 65 value units and the latter to 100.
The time fulfilment criterion has the descriptor of performance “project duration (in weeks)” with the references “96 weeks = Neutral” and “90 weeks = Good”. To build a value function for this criterion, first, we created three more equally spaced performance levels: one worse than “neutral” (99 weeks), one between “neutral” and “good” (93 weeks), and one better than “good” (87 weeks). Then, the evaluators judged the differences in attractiveness between each two of these levels, together with the “neutral” and the “good” levels, resulting in the matrix of judgments presented in Fig. 5.
Looking at the diagonal (above the grey shaded cells) of the matrix in Fig. 5, we see that the intensities of the differences in attractiveness between each two consecutive levels increase when the number of weeks exceeds 93: the evaluators considered weak the differences in attractiveness between “87” and “90 = Good” (and also between “90 = Good” and “93”), whereas they considered moderate the difference in attractiveness between “93” and “96 = Neutral”, and very strong the difference between “96 = Neutral” and “99”. Therefore, the difference in value between “87” and “90 = Good” (and also between “90 = Good” and “93”) should be lower than the difference in value between “93” and “96 = Neutral”, and the latter should also be lower than the difference in value between “96 = Neutral” and “99”, which can be confirmed in the value function presented in Fig. 6c (each of the first two intervals corresponds to 40 value units, whereas the third and fourth equal 60 and 160 value units, respectively). Therefore, this function shows that the evaluators considered that increments in time after 93 weeks are increasingly penalizing for the project’s success.
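Because the anchor points fully determine a piecewise linear function, the Fig. 6c scale can be evaluated directly; a sketch assuming numpy, with the anchor values implied by the text above (v(87) = 140, v(90) = 100, v(93) = 60, v(96) = 0, v(99) = -160):

import numpy as np

weeks = np.array([87.0, 90.0, 93.0, 96.0, 99.0])
values = np.array([140.0, 100.0, 60.0, 0.0, -160.0])

def v_time(duration_weeks: float) -> float:
    # np.interp expects increasing x; weeks already increase, so this is
    # linear interpolation between the judged anchor points
    return float(np.interp(duration_weeks, weeks, values))

print(v_time(96))    # 0.0  -> the "neutral" reference
print(v_time(94.5))  # 30.0 -> halfway between v(93) = 60 and v(96) = 0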
We emphasize that the decision group made these judgments for each criterion independently of the performance levels or the differences in attractiveness on the remaining criteria, thereby supporting the assumption of mutual preferential independence between criteria.
Fig. 6 (6a-6g) presents the value functions of all the evaluation criteria.

3.5.2. Criteria weighting
Weighting requires establishing trade-offs between criteria, which is typically demanding because it implies comparing performance improvements on different criteria. The improvements (swings) are defined between the two predefined performance references, “neutral” and “good”, in each criterion.
According to the MACBETH weighting procedure, the first step was to rank the “neutral-good” swings in order of decreasing preference (Fig. 7). The evaluators considered the swing from “1 to 2 verification cycles decreased” as the most important one (1st in Fig. 7), which implies that the criterion “reduction of the number of verification cycles (RNVC)” will have the highest weight. In contrast, the criterion “reduction of repetitive work of the users (RRWU)” will obtain the lowest weight because it has the least important “neutral-good” swing (7th in Fig. 7).
Fig. 4. MACBETH judgment matrix for the “scope/quality fulfilment” criterion.
Fig. 5. MACBETH judgment matrix for the “time fulfilment” criterion.

In the second step, the improvements provided by the criteria swings were judged qualitatively using the MACBETH semantic scale (Fig. 8), which allowed filling in the rightmost column in Fig. 9. For example, the improvement provided by the most important swing [RNVC] was considered extreme, whereas the least important “neutral-good” swing [RRWU] was judged weak.
Then, the differences in attractiveness between each two “neutral-good” swings were assessed to fill in the remaining cells of the first row of the weighting matrix and the diagonal above the shaded cells in Fig. 9. For example, Fig. 10 depicts the comparison of the “neutral-good” swings in the reduction of the number of verification cycles (RNVC) criterion and in the increase in the number and type of errors identified in each verification cycle (IncNoType) criterion, which was deemed very strong (“v. strong” in Fig. 9). The other cells with no judgments were filled in automatically (by transitivity) with “P” (positive) judgments by M-MACBETH.
Finally, the software tool applied the linear programming model described in Bana e Costa et al. [51] to generate a proposal of a weighting scale consistent with the qualitative judgments expressed in the weighting matrix, which was subsequently validated by the evaluators (with some minor adjustments), resulting in the weights presented in Fig. 11.

4. Results and discussion

4.1. Model testing and results

At this point, the actual performances of the project are already known for most of the criteria, but not for the reduction of the number of verification cycles (RNVC) criterion, which will only be identified in the long term. Therefore, three alternative scenarios were created with hypothetical future performances on RNVC: no reduction at all (“PCB no red cycles”), a decrease of one verification cycle (“PCB red 1 cycle”), and a decrease of two verification cycles (“PCB red 2 cycles”). The performances of these scenarios are shown in Table 4.
Applying the value functions previously defined for each criterion to the performances presented in Table 4, and using the previously assessed criteria weights, we obtain the partial and the overall value scores of the three scenarios shown in Table 5.
As seen in Table 5, the most advantageous scenario corresponds to “PCB red 2 cycles” with 94.60 overall value units, followed by “PCB red 1 cycle” with 49.60, and “PCB no red of cycles” with -6.65.
Scenarios “PCB red 2 cycles” and “PCB red 1 cycle” undoubtedly denote a successful project independently of the weights assigned to the criteria, because their performances are not worse than “neutral” on any of the criteria and are better than it on several criteria. Therefore, both scenarios dominate [69] a “neutral project”. Additionally, we may see that scenario “PCB red 2 cycles” has an overall score very close to that of a “good project” (100 units), whereas the value of scenario “PCB red 1 cycle” is almost mid-distance between a “neutral project” and a “good project”.
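The dominance claim can be checked mechanically: with the “neutral” reference anchored at 0 on every criterion, a scenario dominates a “neutral project” whenever none of its partial values is negative and at least one is positive, whatever positive weights are used. A sketch using the partial values from Table 5:

def dominates_neutral(partial_values):
    # all partial values of a "neutral project" are 0 by construction
    return (all(v >= 0 for v in partial_values)
            and any(v > 0 for v in partial_values))

print(dominates_neutral([100, 40, 0, 115, 0, 150, 140]))     # True  (red 1 cycle)
print(dominates_neutral([100, 40, 0, 115, -125, 150, 140]))  # False (no reduction)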
However, it is not robust to say that the scenario “PCB no red of cycles” corresponds to an unsuccessful project by looking only at its overall value score. We must determine whether its overall result will always be worse than that of a “neutral project” in the face of the uncertainty defined for the model parameters (i.e., the value scores and criteria weights). In fact, the evaluators considered it plausible that: a) each criterion weight (w_j, j = 1, \ldots, 7) may vary within an interval defined by the lower and upper limits (\underline{w}_j \le w_j \le \overline{w}_j, j = 1, \ldots, 7) shown in Table 6; and b) the value scores of the scenario “PCB no red of cycles” may vary by plus or minus 5 value units (the resulting scores are denoted by \overline{v}_j(y_j) and \underline{v}_j(y_j), j = 1, \ldots, 7) on all the criteria for which this scenario has a performance different from “neutral” and “good”; otherwise, the scores keep the values 0 and 100, respectively.
The linear programming (LP) problem (2) was then used to test whether a “neutral project” additively dominates [70] the scenario “PCB no red of cycles”, which would require a negative maxD:

\max D = \sum_{j=1}^{7} w_j \left[ \overline{v}_j(y_j) - v_j(\mathrm{neutral}_j) \right] \qquad (2)

subject to:

\sum_{j=1}^{7} w_j = 1, \qquad \underline{w}_j \le w_j \le \overline{w}_j, \ j = 1, \ldots, 7

The result maxD = 9.575 denotes that there is at least one combination of plausible scores and weights for which scenario “PCB no red of cycles” has a higher overall value than that of a “neutral project”.
The worst possible overall value for scenario “PCB no red of cycles” was also calculated, with the LP problem (3), resulting in minD = -14.10:

\min D = \sum_{j=1}^{7} w_j \left[ \underline{v}_j(y_j) - v_j(\mathrm{neutral}_j) \right] \qquad (3)

subject to:

\sum_{j=1}^{7} w_j = 1, \qquad \underline{w}_j \le w_j \le \overline{w}_j, \ j = 1, \ldots, 7

Therefore, in the face of the uncertainty, the overall value score of scenario “PCB no red of cycles” may vary between -14.10 and 9.575.
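Problems (2) and (3) are ordinary linear programs; the sketch below, assuming SciPy is available, reproduces the reported bounds. Since v_j(neutral_j) = 0 on every criterion, D is the weighted sum of the scenario’s perturbed scores; ScoQual (at “good”, 100) and Time (at “neutral”, 0) stay fixed, the other scores move by plus or minus 5, and the weight bounds come from Table 6:

from scipy.optimize import linprog

v_up = [100, 45, 0, 120, -120, 155, 145]   # upper scores, per rule b) above
v_lo = [100, 35, 0, 110, -130, 145, 135]   # lower scores, per rule b) above
bounds = [(0.12, 0.18), (0.05, 0.07), (0.08, 0.10), (0.19, 0.25),
          (0.40, 0.45), (0.03, 0.04), (0.02, 0.025)]   # Table 6 intervals
A_eq, b_eq = [[1.0] * 7], [1.0]            # weights sum to 1

# (2): maximise the weighted sum -> minimise its negation
max_d = -linprog(c=[-v for v in v_up], A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun
# (3): minimise the weighted sum directly
min_d = linprog(c=v_lo, A_eq=A_eq, b_eq=b_eq, bounds=bounds).fun

print(round(max_d, 3), round(min_d, 2))    # 9.575 -14.1, matching the text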
Fig. 6. Value functions of criteria: (a) scope/quality fulfilment, (b) cost fulfilment, (c) time fulfilment, (d) increase in the number and type of errors identified in each
verification cycle, (e) reduction of the number of verification cycles, (f) improve efficiency, (g) reduction of the repetitive work of the users.
Fig. 7. Neutral-good swings ranking.
Fig. 8. Neutral-good swings weighting judgments.
Fig. 9. MACBETH weighting matrix (the P and I within the matrix respectively mean positive difference in attractiveness and indifference).
Fig. 10. Assessment of the difference in attractiveness between the “neutral-good” swings in RNVC and IncNoType.
Fig. 11. Criteria weights.

After concluding the robustness analysis, the evaluation group revisited the model and considered that it could deal with all the plausible performances and adequately considered the value judgments of its members. Therefore, the model has a form and content sufficient to evaluate the project’s success [71].

5. Discussion

The absence of a formal evaluation of project success results in the waste of relevant lessons that could be used to enhance project management practices [9,72]. This is a strong reason for implementing well-structured processes to evaluate project success.
Any evaluation process should start by identifying the success criteria according to the decision-makers’ preferences and systems of values, which are inherently subjective. We underscore that an evaluation model has an objective component (factual data) and a subjective one (value judgments), which should be independently addressed. Therefore, subjectivity is a key component in an evaluation process, but it should not be confused with ambiguity, which should be avoided. That is why the success evaluation criteria should be carefully identified, and a measure of the performance of a project on each of those criteria must be operationalised. The “neutral” and “good” references of intrinsic value allow identifying the project’s success level.
Throughout the development of the evaluation model, the members of the decision-making group were encouraged to engage in open discussion whenever differences of opinion arose. This approach enabled a better understanding of their points of view and helped the group reach an agreement on the way forward.
In the case described herein, the success of the project may depend on the future performance on the reduction of the number of verification cycles (RNVC) criterion. With “no reduction of verification cycles”, the project may be unsuccessful, with -6.65 overall value units, caused by its low performance and corresponding negative score (-125 value units) on this criterion. However, as we have seen, given the uncertainty defined for the partial value scores and the criteria weights, this scenario is not guaranteed to correspond to a negative evaluation. In fact, its overall value may vary between -14.10 and 9.575 units.
With a “reduction of 1 verification cycle”, the project would obtain 49.60 overall value units, which is nearly a mid-distance evaluation between a “good project” and a “neutral project”. With a “reduction of 2 verification cycles”, the project would obtain 94.60 overall value units, which is very close to that of a “good project”.
Developing a transparent evaluation process, such as the one described here, will promote the decision-making group’s understanding and acceptance of the results. The participation of the decision-makers in all of the process phases is a key element for this purpose, which will allow them to develop a sense of ownership of the model [63]. However, this is not a practice found in the literature related to evaluating project success, which offers an opportunity for improvement.
Table 4. Performance profiles of the project’s success for the three scenarios.
Scenario / Criterion | ScoQual | Cost (k€) | Time (weeks) | IncNoType | RNVC | ImpEff (%) | RRWU (%)
PCB no red of cycles | L2 | 480 | 96 | E10 T10 | No decrease | 60 | 15
PCB red 1 cycle | L2 | 480 | 96 | E10 T10 | Decrease 1 cycle | 60 | 15
PCB red 2 cycles | L2 | 480 | 96 | E10 T10 | Decrease 2 cycles | 60 | 15
Table 5
Value scores of the project's success for the three scenarios (criteria weights in parentheses).

Scenario / Criterion    ScoQual (15 %)   Cost (5 %)   Time (8 %)   IncNoType (22 %)   RNVC (45 %)   ImpEff (3 %)   RRWU (2 %)   Overall value score
PCB no red of cycles    100              40           0            115                −125          150            140          −6.65
PCB red 1 cycle         100              40           0            115                0             150            140          49.60
PCB red 2 cycles        100              40           0            115                100           150            140          94.60
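As a cross-check, the overall scores in Table 5 follow directly from the additive value model applied in the case, V = Σ_j w_j v_j; a minimal sketch in Python:

# Minimal sketch: the additive value model V = sum_j w_j * v_j applied to the
# three scenarios of Table 5. Weights come from the table header; partial
# value scores from its rows (criteria ordered ScoQual .. RRWU).
weights = [0.15, 0.05, 0.08, 0.22, 0.45, 0.03, 0.02]

scenarios = {
    "PCB no red of cycles": [100, 40, 0, 115, -125, 150, 140],
    "PCB red 1 cycle":      [100, 40, 0, 115,    0, 150, 140],
    "PCB red 2 cycles":     [100, 40, 0, 115,  100, 150, 140],
}

for name, scores in scenarios.items():
    overall = sum(w * v for w, v in zip(weights, scores))
    print(f"{name}: {overall:.2f}")   # -6.65, 49.60, 94.60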
Table 6
Plausible intervals for the criteria weights.

Criterion             ScoQual   Cost   Time   IncNoType   RNVC   ImpEff   RRWU
Index (j)             1         2      3      4           5      6        7
Current weight (wj)   15 %      5 %    8 %    22 %        45 %   3 %      2 %
Upper limit (w̄j)      18 %      7 %    10 %   25 %        45 %   4 %      2.5 %
Lower limit (w̲j)      12 %      5 %    8 %    19 %        40 %   3 %      2 %
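Under these constraints, the weight-related part of the robustness analysis can be reproduced with two small linear programs; the following is a minimal sketch (not the authors' software), holding the partial value scores of the "no reduction" scenario fixed at their Table 5 values. The published interval [−14.10, 9.575] is wider because the partial value scores were also allowed to vary:

# Minimal sketch: bound the overall value of the "no reduction" scenario when
# the weights vary within the Table 6 intervals and still sum to one.
from scipy.optimize import linprog

v  = [100, 40, 0, 115, -125, 150, 140]             # partial value scores
lo = [0.12, 0.05, 0.08, 0.19, 0.40, 0.03, 0.02]    # weight lower limits
hi = [0.18, 0.07, 0.10, 0.25, 0.45, 0.04, 0.025]   # weight upper limits

A_eq, b_eq = [[1.0] * 7], [1.0]                    # sum of weights = 1
bounds = list(zip(lo, hi))

worst = linprog(c=v, A_eq=A_eq, b_eq=b_eq, bounds=bounds)               # minimise
best = linprog(c=[-x for x in v], A_eq=A_eq, b_eq=b_eq, bounds=bounds)  # maximise

print(f"overall value within [{worst.fun:.2f}, {-best.fun:.2f}]")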
The proposed process, which integrates a problem structuring method with a multi-criteria decision analysis (MCDA) approach for evaluating the success of information technology (IT) projects, offers several significant theoretical contributions to the fields of project management, decision sciences, and IS. First, it advances the conceptual understanding of IT project success by addressing its inherently multidimensional and context-dependent nature. Traditional models often rely on narrow success criteria—such as time, cost, and scope—while this research introduces a more holistic and stakeholder-sensitive framework. By incorporating problem structuring methods, the process facilitates the elicitation and organization of stakeholder perspectives, which are often overlooked or underrepresented in conventional evaluation models. This contributes to theory by emphasizing the social and interpretive dimensions of project success, aligning with contemporary views that success is not an objective outcome but a negotiated construct [73].

Second, the integration of MCDA techniques provides a rigorous and transparent mechanism for prioritizing and aggregating evaluation criteria, thereby enhancing the methodological robustness of success assessment. This methodological synthesis bridges a gap in the literature by demonstrating how qualitative insights from problem structuring can be systematically translated into quantitative decision models. Theoretically, this supports the development of hybrid evaluation frameworks that are both contextually grounded and analytically sound. Third, the application of the proposed process in a real-world case adds empirical depth to the theoretical model, offering evidence of its practical relevance and adaptability. This empirical grounding strengthens the external validity of the framework and encourages further theoretical exploration across different organizational and project contexts.

The MACBETH approach has been successfully employed, with different nuances and across various processes, to evaluate projects or decision alternatives in diverse problem settings and for a wide range of organizations [74]. The process described in this paper, which combines problem structuring with the MACBETH approach and robustness analysis, may also be applied in other contexts, subject to the necessary adjustments.

Our proposed process can also be scaled to the program or portfolio level, although this should be done with caution. In the case presented here, we applied an additive value function model, which is compensatory—meaning that poor performance on one criterion can be offset by good performance on others. However, this assumption may not always hold. In a program or portfolio context, for instance, if a key project performs poorly, that alone may render the entire program or portfolio unsuccessful, regardless of the performance of the remaining projects. In such cases, a mixed model should be adopted, combining classification rules to address the non-compensatory criteria with an additive component for the compensatory ones, as sketched below.
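A minimal sketch of such a mixed model (the function name, the veto rule, and the set of key projects are illustrative assumptions, not prescriptions from the paper):

# Minimal sketch: a non-compensatory classification rule is applied first,
# and only portfolios that pass it are scored with the additive model.
def portfolio_value(values: dict, weights: dict, key_projects: set,
                    veto_level: float = 0.0):
    """Return the additive value, or None when a key project vetoes success."""
    # Classification rule (non-compensatory): a key project scoring below the
    # veto level renders the whole portfolio unsuccessful, with no offsetting.
    if any(values[p] < veto_level for p in key_projects):
        return None
    # Compensatory component: weighted additive aggregation over all projects.
    return sum(weights[p] * values[p] for p in weights)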
Moreover, the research highlights the absence of standardized approaches for evaluating IT project success, which has long been a limitation in both academic and professional domains. Standardization facilitates the dissemination of knowledge and enhances predictability, thereby minimizing uncertainty and reducing risk [75]. By proposing a replicable and adaptable process, the study lays the groundwork for the development of formalized evaluation standards. This has implications for theory-building, as it suggests a pathway toward unifying fragmented evaluation practices under a coherent, theoretically informed model. In doing so, it contributes to the ongoing discourse on standardization in project management and information systems evaluation, encouraging future research to refine, validate, and extend the proposed framework. Ultimately, this work not only enriches theoretical understanding but also provides a foundation for more consistent, transparent, and stakeholder-aligned evaluation practices in the IT project domain.

6. Conclusions

Evaluating the success of IT projects should be a mandatory project management activity. However, this is not observed in practice [11,72]. There are several contributions given by the process herein described, which can be easily adapted to other evaluation problems:

• It shows how a multi-criteria approach may be used to evaluate IT (software development) projects while avoiding committing critical mistakes.
• It offers a transparent process.
• It involves the decision-makers in all of the model development tasks.
• It identifies the fundamental objectives of decision-makers with the help of a problem structuring method, avoiding ending up solving the wrong problem [76].
• It allows establishing quantitative and substantively meaningful [23] trade-offs between criteria (i.e., mathematically valid and unambiguously understood).
• It allows the management of the project to focus on what matters for the project's success.
• It can be implemented to evaluate the success of other projects, in similar or different contexts.
• The use of descriptors of performance clarifies what is intended to be achieved in each criterion.
• It distinguishes performance from value, instead of directly attributing scores to the project, mixing these two components.
• And it allows creating value scales adjusted to the preferences of evaluators, upon different types of performance (e.g., qualitative or quantitative, continuous or discrete).

Additionally, it enables the identification of alternative scenarios to deal with unknown future performances and to test the robustness of the conclusions considering uncertainties on the model parameters.

In the target organization, given the shortcomings recognised in a previous "grid scoring model", the multi-criteria evaluation model of the real-world case described in this paper was built during an advanced stage of the project's development. This late development can be considered a threat to internal validity regarding consistency and a limitation, since the evaluation model should be built during the planning phase of a project and revisited during the project development to be improved, if needed, or adjusted to possible changes to the project aim. Another threat, to external validity, should also be disclosed: concerning scalability, further research is needed to test whether the proposed process can be scaled or adapted for different project sizes or types.

In future work, it would be interesting to create a process capable of dealing with all project phases, allowing the evaluation of its development and evolution at several milestones, from the project initiation until its termination. The process described in this paper may be extended to evaluate project success throughout the project lifecycle. This requires developing a model that includes both final and intermediate objectives (criteria) for measuring project success. The intermediate objectives should be used during project development and later deactivated by setting their weights to zero and rescaling the remaining criteria weights so that they sum to one, as in the sketch below.
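A minimal sketch of this deactivation step (criterion names are those of the case model; treating IncNoType as an intermediate objective here is purely illustrative):

# Minimal sketch: zero the weights of retired intermediate criteria and
# renormalise the remaining weights so that they again sum to one.
def deactivate(weights: dict, retired: set) -> dict:
    kept_total = sum(w for c, w in weights.items() if c not in retired)
    return {c: (0.0 if c in retired else w / kept_total)
            for c, w in weights.items()}

weights = {"ScoQual": 0.15, "Cost": 0.05, "Time": 0.08, "IncNoType": 0.22,
           "RNVC": 0.45, "ImpEff": 0.03, "RRWU": 0.02}
print(deactivate(weights, retired={"IncNoType"}))  # remaining weights / 0.78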
Monitoring the evolution of a project's success against a well-defined set of criteria will allow identifying problems sooner and taking proper measures in time. Furthermore, the integration of the proposed evaluation process in the success management process [77] will add value to the management efforts.

Finally, since artificial intelligence technology, especially with the rise of Large Language Models (LLMs), has shown great potential in revolutionizing the automation of various complex tasks [78], it is imperative to explore it in the context of success evaluation.

CRediT authorship contribution statement

João Carlos Lourenço: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Conceptualization. João Varajão: Writing – review & editing, Writing – original draft, Validation, Methodology, Investigation, Data curation, Conceptualization.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgement

This work has been supported by FCT – Fundação para a Ciência e Tecnologia within the R&D Unit Project Scope UID/00319/2025 – Centro ALGORITMI (ALGORITMI/UM). João C. Lourenço acknowledges the financial support of Portuguese funds through FCT – Fundação para a Ciência e a Tecnologia, I.P., under the project UID/97/2025 (CEGIST).

Data availability

The data is presented in the article.

References

[1] R. Colomo-Palacios, I. González-Carrasco, J.L. López-Cuadrado, A. Trigo, J.E. Varajao, I-Competere: using applied intelligence in search of competency gaps in software project managers, Inf. Syst. Front. 16 (4) (2014) 607–625, https://doi.org/10.1007/s10796-012-9369-6.
[2] M.A. Kafaji, Interchange roles of formal and informal project management on business operational success, Prod. Plan. Control (2022) 1–21, https://doi.org/10.1080/09537287.2022.2089265.
[3] L.A. Ika, J.K. Pinto, The "re-meaning" of project success: updating and recalibrating for a modern project management, Int. J. Proj. Manag. 40 (7) (2022) 835–848, https://doi.org/10.1016/j.ijproman.2022.08.001.
[4] B. Lobato, J. Varajão, C. Tam, A.A. Baptista, CrEISPS – a framework of criteria for evaluating success in information systems projects, Procedia Comput. Sci. 256 (2025) 1821–1835, https://doi.org/10.1016/j.procs.2025.02.323.
[5] N. Agarwal, U. Rathod, Defining success for software projects: an exploratory revelation, Int. J. Proj. Manag. 24 (4) (2006) 358–370, https://doi.org/10.1016/j.ijproman.2005.11.009.
[6] R. Atkinson, Project management: cost, time and quality, two best guesses and a phenomenon, it's time to accept other success criteria, Int. J. Proj. Manag. 17 (6) (1999) 337–342, https://doi.org/10.1016/S0263-7863(98)00069-6.
[7] H. Landrum, V.R. Prybutok, X. Zhang, The moderating effect of occupation on the perception of information services quality and success, Comput. Ind. Eng. 58 (1) (2010) 133–142, https://doi.org/10.1016/j.cie.2009.09.006.
[8] J.K. Pinto, D.P. Slevin, Project success: definitions and measurement techniques, Proj. Manag. J. 19 (1) (1988) 67–72.
[9] J. Varajão, L. Magalhães, L. Freitas, P. Rocha, Success management – from theory to practice, Int. J. Proj. Manag. 40 (5) (2022) 481–498, https://doi.org/10.1016/j.ijproman.2022.04.002.
[10] J. Varajão, J.C. Lourenço, J. Gomes, Models and methods for information systems project success evaluation – a review and directions for research, Heliyon 8 (12) (2022), https://doi.org/10.1016/j.heliyon.2022.e11977.
[11] J. Varajão, J.Á. Carvalho, Evaluating the success of IS/IT projects: how are companies doing it?, in: Proceedings of the 13th Pre-ICIS International Research Workshop on IT Project Management (IRWITPM 2018), San Francisco, USA, 2018.
[12] R.L. Keeney, Common mistakes in making value trade-offs, Oper. Res. 50 (6) (2002) 935–945, https://doi.org/10.1287/opre.50.6.935.357.
[13] J.E. Russo, P.J.H. Schoemaker, Decision Traps: The Ten Barriers to Brilliant Decision-Making and How to Overcome Them, Doubleday, 1989.
[14] S. Lipovetsky, A. Tishler, D. Dvir, A. Shenhar, The relative importance of project success dimensions, R&D Manag. 27 (2) (1997) 97–106, https://doi.org/10.1111/1467-9310.00047.
[15] J. Shapiro, Monitoring and Evaluation, CIVICUS – World Alliance for Citizen Participation, 2005. https://www.civicus.org/view/media/Monitoring%20and%20Evaluation.pdf.
[16] B. Kahan, M. Goodstadt, The IDM Manual: Basics, 2005. http://sites.utoronto.ca/chp/download/IDMmanual/IDM_basics_dist05.pdf.
[17] V. Arumugam, J. Antony, M. Kumar, Linking learning and knowledge creation to project success in Six Sigma projects: an empirical investigation, Int. J. Prod. Econ. 141 (1) (2013) 388–402, https://doi.org/10.1016/j.ijpe.2012.09.003.
[18] R. Linzalone, G. Schiuma, A review of program and project evaluation models, Meas. Bus. Excell. 19 (3) (2015) 90–99, https://doi.org/10.1108/MBE-04-2015-0024.
[19] P.L. Bannerman, A. Thorogood, Celebrating IT projects success: a multi-domain analysis, in: Proceedings of the 45th Hawaii International Conference on System Sciences, Maui, HI, 2012.
[20] C. Barclay, K. Osei-Bryson, Determining the contribution of IS projects: an approach to measure performance, in: Proceedings of the 42nd Hawaii International Conference on System Sciences, Waikoloa, HI, 2009.
[21] R.L. Keeney, Value-Focused Thinking: A Path to Creative Decisionmaking, Harvard University Press, 1992.
[22] R. Solingen, E. Berghout, The Goal/Question/Metric Method: A Practical Guide for Quality Improvement of Software Development, McGraw-Hill, 1999.
[23] S. French, Decision Theory: An Introduction to the Mathematics of Rationality, Ellis Horwood, 1986.
[24] R. Göb, C. McCollin, M. Ramalhoto, Ordinal methodology in the analysis of Likert scales, Qual. Quant. 41 (5) (2007) 601–626, https://doi.org/10.1007/s11135-007-9089-z.
[25] S.S. Stevens, On the theory of scales of measurement, Science 103 (2684) (1946) 677–680, https://doi.org/10.1126/science.103.2684.677.
[26] W. Edwards, J.R. Newman, Multiattribute evaluation, in: T. Connolly, H.R. Arkes, K.R. Hammond (Eds.), Judgment and Decision Making: An Interdisciplinary Reader, 2nd ed., Cambridge University Press, 2000, pp. 17–34.
[27] R. von Nitzsch, M. Weber, The effect of attribute ranges on weights in multiattribute utility measurements, Manag. Sci. 39 (8) (1993) 937–943, https://doi.org/10.1287/mnsc.39.8.937.
[28] A. Basar, A novel methodology for performance evaluation of IT projects in a fuzzy environment: a case study, Soft Comput. 24 (14) (2020) 10755–10770, https://doi.org/10.1007/s00500-019-04579-y.
[29] H.N. Ismail, Measuring success of water reservoir project by using delphi and priority evaluation method, in: Proceedings of the IOP Conference Series: Earth and Environmental Science 588, 2020, 042021, https://doi.org/10.1088/1755-1315/588/4/042021.
[30] J.H. Yu, H.R. Kwon, Critical success factors for urban regeneration projects in Korea, Int. J. Proj. Manag. 29 (7) (2011) 889–899, https://doi.org/10.1016/j.ijproman.2010.09.001.
[31] A. Nguvulu, S. Yamato, T. Honma, Project performance evaluation using deep belief networks, IEEJ Trans. Electron. Inf. Syst. 132 (2) (2012) 306–312, https://doi.org/10.1541/ieejeiss.132.306.
[32] C. Wohlin, A.A. Andrews, Assessing project success using subjective evaluation factors, Softw. Qual. J. 9 (1) (2001) 43–70, https://doi.org/10.1023/a:1016673203332.
[33] X. Yan, Utilizing the BSC method for IT performance evaluation of construction companies, in: Proceedings of the First International Conference on Information Science and Engineering, Nanjing, China, 2009.
[34] R.S. Kaplan, D.P. Norton, The balanced scorecard – measures that drive performance, Harv. Bus. Rev. 70 (1) (1992) 71–79.
[35] C.L. Yang, R.H. Huang, M.T. Ho, Multi-criteria evaluation model for a software development project, in: Proceedings of the IEEE International Conference on Industrial Engineering and Engineering Management, Hong Kong, China, 2009.
[36] T.L. Saaty, The Analytic Hierarchy Process: Planning, Priority Setting, Resource Allocation, McGraw-Hill, 1980.
[37] C.A. Bana e Costa, J.C. Vansnick, A critical analysis of the eigenvalue method used to derive priorities in AHP, Eur. J. Oper. Res. 187 (3) (2008) 1422–1428, https://doi.org/10.1016/j.ejor.2006.09.022.
[38] J.S. Dyer, Remarks on the analytic hierarchy process, Manag. Sci. 36 (3) (1990) 249–258, https://doi.org/10.1287/mnsc.36.3.249.
[39] P. Goodwin, G. Wright, Decision Analysis for Management Judgment, 5th ed., John Wiley & Sons, 2014.
[40] V. Belton, T.J. Stewart, Multiple Criteria Decision Analysis: An Integrated Approach, Kluwer Academic Publishers, 2002.
[41] R.L. Keeney, D. von Winterfeldt, Practical value models, in: W. Edwards, R.F. Miles Jr., D. von Winterfeldt (Eds.), Advances in Decision Analysis: From Foundations to Applications, Cambridge University Press, 2007, pp. 232–252.
[42] J.S. Dyer, J.E. Smith, Innovations in the science and practice of decision analysis: the role of management science, Manag. Sci. 67 (9) (2020) 5364–5378, https://doi.org/10.1287/mnsc.2020.3652.
[43] J.E. Smith, J.S. Dyer, On (measurable) multiattribute value functions: an expository argument, Decis. Anal. 18 (4) (2021) 247–256, https://doi.org/10.1287/deca.2021.0435.
[44] J.S. Dyer, R.K. Sarin, Measurable multiattribute value functions, Oper. Res. 27 (4) (1979) 810–822, https://doi.org/10.1287/opre.27.4.810.
[45] R.L. Keeney, Developing objectives and attributes, in: W. Edwards, R.F. Miles Jr., D. von Winterfeldt (Eds.), Advances in Decision Analysis: From Foundations to Applications, Cambridge University Press, 2007, pp. 104–128.
[46] B. Fasolo, C.A. Bana e Costa, Tailoring value elicitation to decision makers' numeracy and fluency: expressing value judgments in numbers or words, Omega 44 (2014) 83–90, https://doi.org/10.1016/j.omega.2013.09.006.
[47] C.A. Bana e Costa, E.C. Corrêa, J.M. De Corte, J.C. Vansnick, Facilitating bid evaluation in public call for tenders: a socio-technical approach, Omega 30 (3) (2002) 227–242, https://doi.org/10.1016/S0305-0483(02)00029-4.
[48] R.L. Keeney, H. Raiffa, Decisions with Multiple Objectives: Preferences and Value Tradeoffs, John Wiley & Sons, 1976.
[49] W. Edwards, F.H. Barron, SMARTS and SMARTER: improved simple methods for multiattribute utility measurement, Organ. Behav. Hum. Decis. Process. 60 (3) (1994) 306–325, https://doi.org/10.1006/obhd.1994.1087.
[50] C.A. Bana e Costa, J.C. Vansnick, MACBETH – an interactive path towards the construction of cardinal value functions, Int. Trans. Oper. Res. 1 (4) (1994) 489–500, https://doi.org/10.1016/0969-6016(94)90010-8.
[51] C.A. Bana e Costa, J.M. De Corte, J.C. Vansnick, MACBETH, Int. J. Inf. Technol. Decis. Mak. 11 (2) (2012) 359–387, https://doi.org/10.1142/S0219622012400068.
[52] C.A. Bana e Costa, J.M. De Corte, J.C. Vansnick, On the mathematical foundations of MACBETH, in: S. Greco, M. Ehrgott, J.R. Figueira (Eds.), Multiple Criteria Decision Analysis: State of the Art Surveys, Springer, 2016, pp. 421–463, https://doi.org/10.1007/978-1-4939-3094-4_11.
[53] W. Edwards, How to use multiattribute utility measurement for social decisionmaking, IEEE Trans. Syst. Man Cybern. 7 (5) (1977) 326–340, https://doi.org/10.1109/TSMC.1977.4309720.
[54] D. von Winterfeldt, W. Edwards, Decision Analysis and Behavioral Research, Cambridge University Press, 1986.
[55] C.W. Kirkwood, Strategic Decision Making: Multiobjective Decision Analysis with Spreadsheets, Duxbury Press, 1997.
[56] L.I. Assalaarachchi, M.P.P. Liyanage, C. Hewagamage, A framework of critical success factors of cloud-based project management software adoption, Int. J. Inf. Syst. Proj. Manag. 13 (2) (2025) e4, https://doi.org/10.12821/ijispm130204.
[57] N. Pinheiro, J. Varajão, I. Moura, Success factors of public sector information systems projects in developing countries, Sustain. Futures 10 (2025) 101095, https://doi.org/10.1016/j.sftr.2025.101095.
[58] J. Jayakody, W. Wijayanayake, Critical success factors for DevOps adoption in information systems development, Int. J. Inf. Syst. Proj. Manag. 11 (3) (2023) 60–82, https://doi.org/10.12821/ijispm110304.
[59] K. Schwaber, J. Sutherland, The Scrum Guide – The Definitive Guide to Scrum: The Rules of the Game, scrumguides.org, 2020. https://scrumguides.org/docs/scrumguide/v2020/2020-Scrum-Guide-US.pdf.
[60] M. Jovanovic, A.L. Mesquida, A. Mas, R. Colomo-Palacios, Agile transition and adoption frameworks, issues and factors: a systematic mapping, IEEE Access 8 (2020) 15711–15735, https://doi.org/10.1109/ACCESS.2020.2967839.
[61] V. Henriquez, J.A. Calvo-Manzano, A.M. Moreno, T. San Feliu, Agile governance practices by aligning CMMI V2.0 with portfolio SAFe 5.0, Comput. Stand. Interfaces 91 (2025) 103881, https://doi.org/10.1016/j.csi.2024.103881.
[62] V. Ferretti, G. Montibeller, Key challenges and meta-choices in designing and applying multi-criteria spatial decision support systems, Decis. Support Syst. 84 (2016) 41–52, https://doi.org/10.1016/j.dss.2016.01.005.
[63] L.D. Phillips, Decision conferencing, in: W. Edwards, R.F. Miles Jr., D. von Winterfeldt (Eds.), Advances in Decision Analysis: From Foundations to Applications, Cambridge University Press, 2007, pp. 375–399.
[64] T.Y. Chen, H.F. Chang, Critical success factors and architecture of innovation services models in data industry, Expert Syst. Appl. 213 (2023) 119014, https://doi.org/10.1016/j.eswa.2022.119014.
[65] C.M. Smith, D. Shaw, The characteristics of problem structuring methods: a literature review, Eur. J. Oper. Res. 274 (2) (2019) 403–416, https://doi.org/10.1016/j.ejor.2018.05.003.
[66] M. Marttunen, J. Lienert, V. Belton, Structuring problems for multi-criteria decision analysis in practice: a literature review of method combinations, Eur. J. Oper. Res. 263 (1) (2017) 1–17, https://doi.org/10.1016/j.ejor.2017.04.041.
[67] C.A. Bana e Costa, J.C. Lourenço, M.P. Chagas, J.C. Bana e Costa, Development of reusable bid evaluation models for the Portuguese Electric Transmission Company, Decis. Anal. 5 (1) (2008) 22–42, https://doi.org/10.1287/deca.1080.0104.
[68] D. Clegg, R. Barker, Case Method Fast-Track: A RAD Approach, Addison-Wesley Longman Publishing, 1994.
[69] M. Weber, Decision making with incomplete information, Eur. J. Oper. Res. 28 (1) (1987) 44–57, https://doi.org/10.1016/0377-2217(87)90168-8.
[70] C.A. Bana e Costa, P. Vincke, Measuring credibility of compensatory preference statements when trade-offs are interval determined, Theory Decis. 39 (2) (1995) 127–155, https://doi.org/10.1007/BF01078981.
[71] L.D. Phillips, A theory of requisite decision models, Acta Psychol. 56 (1–3) (1984) 29–48, https://doi.org/10.1016/0001-6918(84)90005-2.
[72] J. Pereira, J. Varajão, N. Takagi, Evaluation of information systems project success – insights from practitioners, Inf. Syst. Manag. (2021) 1–18, https://doi.org/10.1080/10580530.2021.1887982.
[73] N. Takagi, J. Varajão, ISO 21502 and success management: a required marriage in project management, SAGE Open (July–September 2025) 1–11, https://doi.org/10.1177/21582440251355046.
[74] F.A.F. Ferreira, S.P. Santos, Two decades on the MACBETH approach: a bibliometric analysis, Ann. Oper. Res. 296 (1) (2021) 901–925, https://doi.org/10.1007/s10479-018-3083-9.
[75] J. Varajão, L. Lopes, A. Tenera, Framework of standards, guides and methodologies for project, program, portfolio, and PMO management, Comput. Stand. Interfaces 92 (2025) 103888, https://doi.org/10.1016/j.csi.2024.103888.
[76] I.I. Mitroff, T.R. Featheringham, On systemic problem solving and the error of the third kind, Behav. Sci. 19 (6) (1974) 383–393, https://doi.org/10.1002/bs.3830190605.
[77] J. Varajão, Success management as a PM knowledge area – work-in-progress, Procedia Comput. Sci. 100 (2016) 1095–1102, https://doi.org/10.1016/j.procs.2016.09.256.
[78] Y. Kong, N. Zhang, Z. Duan, B. Yu, Collaboration with generative AI to improve requirements change, Comput. Stand. Interfaces 94 (2025) 104013, https://doi.org/10.1016/j.csi.2025.104013.