Terminology

This webpage collects definitions of scientific terms used in software evolution research.

A

Abstract syntax tree. Compilers often construct an abstract syntax tree (AST) for semantic analysis. Its nodes are programming language constructs and its edges express the hierarchical relation between these constructs. From \cite{Koschke98}: "The structure of an AST is a simplification of the underlying grammar of the programming language, e.g., by generalization or by suppressing chain rules. (...) This structure can be generalized so that it can be used to represent programs of different languages."
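As a minimal sketch of what such a tree looks like, the following uses Python's standard `ast` module to parse a one-line program and print its node types by nesting depth. The sample statement and the helper name `node_types` are illustrative assumptions, not part of any cited work.

```python
import ast

# Hypothetical one-line program used only for illustration.
source = "total = price * (1 + tax_rate)"

tree = ast.parse(source)

def node_types(node, depth=0):
    """Return (depth, node type name) pairs in pre-order."""
    pairs = [(depth, type(node).__name__)]
    for child in ast.iter_child_nodes(node):
        pairs.extend(node_types(child, depth + 1))
    return pairs

# Print the tree as an indented outline of node types.
for depth, name in node_types(tree):
    print("  " * depth + name)
```

The parentheses of the source text leave no node of their own; grouping is expressed only by the nesting of the two `BinOp` nodes, which illustrates how an AST simplifies the concrete grammar.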

Advice. Aspect definitions consist of pointcuts and advices. Advices are the code that crosscuts the dominant decomposition of a software system.

Agile software development. According to Scott W. Ambler, a respected authority in the agile methods community, agile software development "is an iterative and incremental (evolutionary) approach to software development which is performed in a highly collaborative manner with 'just enough' ceremony that produces high quality software which meets the changing needs of its stakeholders." Agile methods refer to a collection of "lightweight" software development methodologies that are basically aimed at minimising risk and achieving customer satisfaction through a short feedback loop.

Anti-regressive work. Term introduced by Lehman \cite{LehmanBelady1985} to describe the work done to decrease the complexity of a program without altering the functionality of the system as perceived by users. Anti-regressive work includes activities such as code rewriting, refactoring, reengineering, restructuring, redocumenting, and so on.

Architecture. The architecture of a software system is the set of principal design decisions about the system. It is the structural and behavioural framework on which all other aspects of the system depend. It is the organisational structure of a software system including components, connections, constraints, and rationale.

Architectural style. David Garlan states that an architectural style "defines constraints on the form and structure of a family of architectural instances."

Aspect. A modular unit designed to implement a (crosscutting) concern. In other words, an aspect provides a solution for abstracting code that would otherwise be spread throughout (i.e., cross-cut) the entire program. Aspects are composed of pointcuts and advices.

Aspect exploration. The activity of locating opportunities for introducing aspects in non-aspect-oriented software. A distinction can be made between manual exploration supported by special-purpose browsers and source-code navigation tools, and aspect mining techniques that try to automate this process of aspect discovery and propose one or more aspect candidates to the user.

Aspect extraction. The activity that turns potential aspects into actual aspects in some aspect-oriented language, after a set of potential aspects has been identified in the aspect exploration phase.

Aspect evolution. The process of progressively modifying the elements of an aspect-oriented software system in order to improve or maintain its quality over time, under changing contexts and requirements.

Aspect migration. The process of migrating a software system that is written in a non-aspect-oriented way into an aspect-oriented equivalent of that system.

Aspect mining. The activity of semi-automatically discovering those crosscutting concerns that potentially could be turned into aspects, from the source code and/or run-time behaviour of a software system.

Aspect-oriented software development. An approach to software development that addresses limitations inherent in other approaches, including object-oriented programming. The approach aims to address crosscutting concerns by providing means for systematic identification, separation, representation and composition. Crosscutting concerns are encapsulated in separate modules, known as aspects, so that localization can be promoted. This results in better support for modularization hence reducing development, maintenance and evolution costs.

Aspect weaving. The process of composing the core functionality of a software system with the aspects that are defined on top of it, thereby yielding a working system.

B

Bad smell. According to Kent Beck \cite{Fowler1999}, a bad smell is a structure in the code that suggests, and sometimes even screams for, opportunities for refactoring.

C

Case study. According to \cite{Fenton&Pfleeger1997}, a case study is a research technique where you identify key factors that may affect the outcome of an activity and then document the activity: its inputs, constraints, resources, and outputs. Case studies usually look at a typical project, rather than trying to capture information about all possible cases; these can be thought of as "research in the typical". Formal experiments, case studies and surveys are three key components of empirical investigation in software engineering.

However, the term case study is also often used in an engineering sense: testing a given technique or tool on a representative case against a predefined list of criteria and reporting on the lessons learned.

CASE tool. A software tool that helps software designers and developers specify, generate and maintain some or all of the software components of an application. Many popular CASE tools provide functions to allow developers to draw database schemas and to generate the corresponding code in a data description language (DDL). Other CASE tools support the analysis and design phases of software development, for example by allowing the software developer to draw different types of UML diagrams.

Change charter. This term is sometimes used when developing a new system (or evolving an existing system) to refer to what can potentially be changed. It may be used as a synonym of "scope".

Change log. A record containing some of the information related to one or several amendments (i.e., changes) made to the code or to another software artefact. The record generally includes the person responsible, the date and some explanation (e.g., the reasons for which a change was made).

Clone. A software clone is a special kind of software duplicate. It is a piece of software (e.g., a code fragment) that has been obtained by cloning (i.e., duplicating via the copy-and-paste mechanism) another piece of software and perhaps making some additional changes to it. This primitive kind of software reuse is generally considered more harmful than beneficial, because it makes debugging, maintenance and evolution considerably more difficult.

Clone detection. The activity of locating duplicates or fragments of code with a high degree of similarity and redundancy.
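A minimal sketch of one simple, line-based way to locate exact duplicates is given below. It is an illustrative assumption, not a description of any particular clone detection tool. It normalizes whitespace, slides a fixed-size window over the non-blank lines, and reports every window that occurs more than once. The function name `find_duplicates`, the window size and the sample code are all hypothetical.

```python
from collections import defaultdict

def find_duplicates(source, window=3):
    """Map each repeated run of `window` normalized lines to its start lines."""
    # Pair each line with its 1-based number and collapse runs of whitespace.
    lines = [(no, " ".join(text.split()))
             for no, text in enumerate(source.splitlines(), start=1)]
    lines = [(no, text) for no, text in lines if text]  # skip blank lines
    seen = defaultdict(list)
    for i in range(len(lines) - window + 1):
        chunk = tuple(text for _, text in lines[i:i + window])
        seen[chunk].append(lines[i][0])
    return {chunk: starts for chunk, starts in seen.items() if len(starts) > 1}

SAMPLE = """\
def area(w, h):
    total = w * h
    total = round(total, 2)
    return total

def surface(w, h):
    total   = w * h
    total = round(total, 2)
    return total
"""

for chunk, starts in find_duplicates(SAMPLE).items():
    print("duplicate starting at lines", starts)
```

This sketch only finds fragments that are identical after whitespace normalization. Detecting fragments with renamed identifiers or inserted statements requires richer representations such as token sequences or abstract syntax trees.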

Component. In \cite{Shaw&Garlan1996}, Mary Shaw and David Garlan define software components as "the loci of computation and state. Each component has an interface specification that defines its properties, which include the signatures and functionality of its resources together with global relations, performance properties, and so on. (...)"

Complexity. Structural complexity refers to the degree to which a program is difficult to understand by human developers in order to, for example, inspect the program, or modify it. There are other types of complexity (e.g., algorithmic complexity). Different measures of software complexity exist. One of the best known is McCabe's cyclomatic complexity.
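For a single routine with one entry and one exit, McCabe's cyclomatic complexity equals the number of decision points plus one. The sketch below approximates it for Python source with the standard `ast` module. The choice of which node types count as decision points, the function name and the sample code are simplifying assumptions made for illustration.

```python
import ast

# Node types counted as one decision point each (a simplifying assumption).
DECISION_NODES = (ast.If, ast.For, ast.While, ast.IfExp, ast.ExceptHandler)

def cyclomatic_complexity(source):
    """Approximate McCabe's metric as decision points + 1."""
    decisions = 0
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, DECISION_NODES):
            decisions += 1
        elif isinstance(node, ast.BoolOp):
            # 'a and b and c' adds two extra branches.
            decisions += len(node.values) - 1
    return decisions + 1

SAMPLE = '''
def classify(n):
    if n < 0:
        return "negative"
    for d in range(2, n):
        if n % d == 0 and d > 1:
            return "composite"
    return "other"
'''

print(cyclomatic_complexity(SAMPLE))  # two ifs, one for, one 'and': 4 + 1 = 5
```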

Connector. In \cite{Shaw&Garlan1996}, Mary Shaw and David Garlan state that connectors are "the loci of relations among components. They mediate interactions but are not things to be hooked up (rather, they do the hooking up). Each connector has a protocol specification that defines its properties, which include rules about the types of interfaces it is able to mediate for, assurances about properties of the interaction, rules about the order in which things happen, and commitments about the interaction (...)".

Consistency. Consistency is the absence of inconsistencies in or between software artefacts. If the software artefact under consideration is a program, we talk about program (in)consistency; if it is a model, we talk about model (in)consistency; if it is a (program or model) transformation, we talk about transformation (in)consistency.

Crosscutting concerns. Concerns that do not fit within the dominant decomposition of a given software system, and as such have an implementation that cuts across that decomposition. Aspect-oriented programming is intended to be a solution to modularise such crosscutting concerns.

D

Database conversion. In database migration, we can distinguish between two migration strategies.

Physical database conversion is a straightforward migration strategy according to which each source schema object (e.g., a record type or a data field) is converted to the closest construct of the target data management system (DMS) model (e.g., a table or a column). This strategy is sometimes called one-to-one migration. This approach is fast, simple and inexpensive but generally yields a database with poor performance and weak maintainability.
Conceptual database conversion is a migration strategy that transforms the source database into a clean and normalized target database that exploits the expressive power of the target DMS. This strategy comprises a reverse engineering phase, through which the conceptual schema of the database is recovered, followed by forward engineering towards the new DMS. This approach is slow, expensive and relies on skilled developers, but its output is a high quality database that will be easy to maintain and to evolve.

Database reverse engineering. A special kind of reverse engineering. It is the process through which the logical and conceptual schemas of a legacy database, or of a set of files, are recovered from various information sources such as DDL code, data dictionary contents, database contents, or the source code of application programs that use the database.

Database model and database schema. In the database realm, a model M is a formal system comprising a closed set of abstract object categories and a set of assembly rules that states which arrangements of objects are valid. Since M is supposed to describe the structure, the properties and the behaviour of a class S of external systems, the semantics of M is specified by a mapping of M onto S. Any arrangement m of objects which is valid according to M describes a specific system s of class S. m is called a schema while s is the application domain or the universe of discourse. Among the most popular conceptual models we can mention the Entity-Relationship models, Object-Role models, relational models and UML class models. Among DBMS models, the SQL, CODASYL, IMS, Object-relational and XML models are currently the most widely used. We can essentially distinguish three types of database schemas:

Conceptual schema. A structured technology-independent description of the information about an application domain such as a company or a library. By extension, it is also an abstract representation of the existing or planned database that is made up of the data of this domain.
Logical schema. The description of the data structures of a database according to the model of a specific technology, e.g., an RDBMS. The logical schema of a database is the implementation of its conceptual schema. Application programs know the database through its logical schema.
Physical schema. The technical description of a database where all the physical constructs (such as indexes) and parameters (such as page size or buffer management policy) are specified. The physical schema of a database is the implementation of its logical schema.

Decay. Decay is the antithesis of evolution. While the evolution process involves progressive changes, the changes are degenerative in the case of decay.

Dominant decomposition. The dominant decomposition is the principal decomposition of a program into separate modules. The tyranny of the dominant decomposition \cite{TarrEtAl1999.icse} refers to restrictions imposed by the dominant decomposition on a software engineer's ability to represent particular concerns in a modular way. Many kinds of concerns do not align with the chosen decomposition, so that the concerns end up scattered across many modules and tangled with one another.

Duplicate. A software duplicate is a code fragment that is redundant to another code fragment, often as a result of copy and paste. A negative consequence of duplication is that if one fragment is changed, each duplicate may need to be adjusted, too.
Note that the term software duplicate is preferred over software clone. In English, clone suggests that one fragment is derived or copied from the other one. However, this is just one special type of software redundancy. Code fragments could also be similar by accident.

E

E-type system. One of the three types of software described by Lehman in his SPE program classification \cite{LehmanBelady1985}. The distinctive properties of E-type systems are:

the problem that they address cannot be formally and completely specified
the program has an imperfect model of the operational domain embedded in it
the program reflects an unbounded number of assumptions about the real world
the installation of the program changes the operational domain
the process of developing and evolving E-type systems is driven by feedback

Error. An error is the part of the system state that is liable to lead to the subsequent failure \cite{selfAdaptiveSA2002}.

Evolution. According to Lehman and Ramil (chapter 1 of \cite{MadhavjiEtAl2006}), the term evolution reflects "a process of progressive, for example beneficial, change in the attributes of the evolving entity or that of one or more of its constituent elements. What is accepted as progressive must be determined in each context. It is also appropriate to apply the term evolution when long-term change trends are beneficial even though isolated or short sequences of changes may appear degenerative. For example, an entity or collection of entities may be said to be evolving if their value or fitness is increasing over time. Individually or collectively they are becoming more meaningful, more complete or more adapted to a changing environment."

Evolutionary process model. A software process model that explicitly takes into account the iterative and incremental nature of software development. A typical example is the so-called spiral software process model.

Externality. Term used mainly in economics to refer to the break-down of markets due to external influences. In open source software, network externalities have been used to refer to code importing, replication, tailoring or code sharing between projects which can lead to superlinear functional growth.

Extreme programming. Extreme programming (XP) is a specific instance of agile software development that aims to simplify and expedite the process of developing new software in a volatile environment of rapidly-changing requirements. XP is a lightweight process that offers a set of values, principles and practices for developing software that provides the highest value for the customer in the fastest way possible.

F

Fault. A fault is the adjudged or hypothesized cause of an error \cite{selfAdaptiveSA2002}.

Failure. A failure occurs when a system service deviates from the behaviour expected by the user \cite{selfAdaptiveSA2002}.

Feedback. In engineering, feedback refers to the case when at least some part of the output(s) of the system is fed back to the input, normally for control purposes. In systems thinking and related disciplines (e.g., system dynamics), feedback describes a property of many complex systems in which the outputs determine the inputs.

Forward engineering. Forward engineering is the traditional process of moving from high-level abstractions and logical, implementation-independent designs to the physical implementation of a system \cite{Demeyer&al2002}.

Fragile pointcut problem. This problem arises in aspect-oriented software development when pointcuts unintentionally capture or miss particular joinpoints as a consequence of their fragility with respect to seemingly safe modifications to the base program.

Free software. A popular mode of software distribution as a common good in which users can access, modify and re-distribute the code, under the terms of the license and provided that some parts (e.g., notices) are not modified.

G

Graph transformation. Graph transformation (also known as graph rewriting or graph grammars) is a theory and set of associated tools that allows graph-based structures to be modified by means of transformation rules, and the formal properties of these rules to be reasoned about. It is an extension of the theory of term rewriting. One of its many useful applications is to formalize model transformations in the context of model-driven software engineering.

Graph transformation rule. A graph transformation rule is composed of a Left-Hand Side (LHS) and a Right-Hand Side (RHS). The LHS of the rule specifies the pre-conditions that must be satisfied so that the rule can be applied. The RHS corresponds to the post-conditions of applying the rule. Executing a graph transformation rule consists of finding an occurrence (or match) of the LHS and transforming it into the RHS.
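The match-and-replace step can be sketched as follows, under strong simplifying assumptions: a graph is encoded as a set of (source, label, target) edge triples, names starting with "?" are rule variables, and a rule only deletes and adds edges. Injective matching, node creation or deletion, and dangling-edge conditions, which real graph transformation tools deal with, are deliberately left out. All names in this sketch are hypothetical.

```python
def match(pattern, graph, binding=None):
    """Yield variable bindings under which every pattern edge occurs in graph."""
    binding = binding or {}
    if not pattern:
        yield binding
        return
    first, rest = pattern[0], pattern[1:]
    for edge in sorted(graph):
        new = dict(binding)
        ok = True
        for p, v in zip(first, edge):
            if p.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if new.setdefault(p, v) != v:
                    ok = False
                    break
            elif p != v:
                ok = False
                break
        if ok:
            yield from match(rest, graph, new)

def instantiate(pattern, binding):
    """Replace variables in a pattern by their bound values."""
    return {tuple(binding.get(t, t) for t in edge) for edge in pattern}

def apply_rule(lhs, rhs, graph):
    """Rewrite the first occurrence of lhs into rhs; return None if no match."""
    for binding in match(lhs, graph):
        return (graph - instantiate(lhs, binding)) | instantiate(rhs, binding)
    return None

# Hypothetical rule: move a method from a subclass to its superclass.
LHS = [("?c", "extends", "?p"), ("?c", "defines", "?m")]
RHS = [("?c", "extends", "?p"), ("?p", "defines", "?m")]

graph = {("Sub", "extends", "Base"), ("Sub", "defines", "print")}
print(apply_rule(LHS, RHS, graph))
```

In this example the LHS acts as the pre-condition (a class that extends another and defines a method) and the RHS as the post-condition (the method is now defined by the superclass).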

H

I

Implicit construct. In a database, a data structure or an integrity constraint that holds, or should hold, among the data, but that has not been explicitly declared in the DDL code of the database. Implicit compound and multivalued fields as well as implicit foreign keys are some of the most challenging constructs to chase when recovering the logical schema of a database.

Inconsistency. Paraphrased from \cite{SpanoudakisZisman2001}, an inconsistency is a situation in which two or more overlapping elements of one or different software artefacts make assertions about aspects of the system they describe that are not jointly satisfiable.

Information system. The subsystem of an organization aimed at collecting, memorizing, processing and distributing the information that is necessary to support the business and management processes of this organization. In a narrower sense, an information system is a business software system comprising a database and the programs that use it.

J

Joinpoint. A joinpoint is a well-defined place in the structure or execution flow of a program where additional behaviour can be attached.

K

L

Legacy software. According to \cite{BrodieStonebraker1995}, a legacy system is any system that significantly resists modifications and change.

According to \cite{Demeyer&al2002}, legacy software is valuable software that you have inherited. It may have been developed using an outdated programming language or an obsolete development method. Most likely it has changed hands several times and shows signs of many modifications and adaptations.

M

Maintenance. According to the ISO Standard 12207 (1995), the software product undergoes modification to code and associated documentation due to a problem or the need for improvement. The objective of software maintenance is to modify the existing software while preserving its integrity.
According to the IEEE Standard 1219 (1999), software maintenance is the modification of a software product after delivery to correct faults, to improve performance or other attributes, or to adapt the product to a modified environment. In the ISO/IEC Standard 14764 (1999), maintenance is further subdivided into four categories:

Perfective maintenance is any modification of a software product after delivery to improve performance or maintainability.
Corrective maintenance is the reactive modification of a software product performed after delivery to correct discovered faults.
Adaptive maintenance is the modification of a software product performed after delivery to keep a computer program usable in a changed or changing environment.
Preventive maintenance refers to software modifications performed for the purpose of preventing problems before they occur. This type of maintenance, which does not alter the system functionality, is also referred to as anti-regressive work.

Metamodel. According to the Meta-Object Facility (MOF) standard, a metamodel is a model that defines the language for expressing a model.

Metric. According to the IEEE Standard 610-12 (1999), a metric is a quantitative measure of the degree to which a system, component or process possesses a given attribute.

Migration. Migration is a particular variant of re-engineering. In the context of software systems, migration refers to the process of moving a software system from one technological environment to another one that is, for some reason, considered to be better. Migrations can be very diverse in nature: changing the hardware infrastructure, changing the underlying operating system, moving data to another kind of database (database migration), changing the programming language in which the software has been written, and so on.

Model. A model is a simplified representation of a system on a higher level of abstraction. It is an abstract view on the actual system emphasizing those aspects that are of interest to someone. Depending on the system under consideration, we talk about software models (for software systems), database models (for database systems), and so on.

Model-driven engineering. A software engineering approach that promotes the use of models and transformations as primary artifacts throughout the software development process. Its goal is to tackle the problem of developing, maintaining and evolving complex software systems by raising the level of abstraction from source code to models. As such, model-driven engineering promises reuse at the domain level, increasing the overall software quality.

N

O

Open source software. Software whose source code is available to users and third parties for inspection and use. It is made available to the general public with either relaxed or non-existent intellectual property restrictions. It is generally used as a synonym of free software even though the two terms have different connotations. Open emphasises the accessibility of the source code, while free emphasises the freedom to modify and redistribute under the terms of the original license.

Outlier. An entity's metric value that is beyond a predefined threshold.

P

Pointcut. Aspect definitions consist of pointcuts and advices. Pointcuts define those points in the source code of a program where an advice will be applied (i.e., where crosscutting code will be "woven").

Precision. In data mining or information retrieval, precision is defined as the proportion of retrieved data or documents that are relevant, out of all the data or documents retrieved. Precision is a measure of how well the technique performs in not returning irrelevant items. Precision is 100% when every document returned to the user is relevant to the query. Being very precise usually comes at the risk of missing documents that are relevant, hence precision should be combined with recall.

Program representation. A program representation consists of properties of a program specified by means other than source code. Kontogiannis, in his article \cite{Kontogiannis93}, states that "Program representation is a key aspect for design recovery as it serves as the basis for any subsequent analysis chosen. Some of the most common program representation methods include (a) abstract syntax trees (...); (b) Prolog rules (...); (c) code and concept objects (...); (d) code action frames (...); (e) attributed data flow graphs (...); (f) control and data flow graphs (...). Most of these approaches represent and refer to the structural properties of a program."

Program understanding. Program understanding or program comprehension is "the task of building mental models of an underlying software system at various abstraction levels, ranging from models of the code itself to ones of the underlying application domain, for software maintenance, evolution, and re-engineering purposes" \cite{Muller1996}.

Q

R

Recall. In data mining or information retrieval, recall is defined as the proportion of relevant data or documents retrieved, out of all relevant data or documents known or available.

Recall is 100% when every relevant item is retrieved. In theory, it is easy to achieve good recall: simply return every item in the collection. Recall by itself is therefore not a good measure and should be combined with precision.
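The two measures, precision and recall, can be computed from the set of retrieved items and the set of relevant items, as in the minimal sketch below. The function name, the document identifiers and the convention of returning 0.0 when a denominator is empty are assumptions made for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a retrieval result."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # items that are both retrieved and relevant
    # Returning 0.0 for an empty denominator is a convention assumed here.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical collection of ten documents, four of which are relevant.
collection = [f"d{i}" for i in range(10)]
relevant = {"d1", "d2", "d3", "d4"}

print(precision_recall(["d1", "d2", "d7"], relevant))  # precise, but misses d3 and d4
print(precision_recall(collection, relevant))          # full recall, low precision
```

Returning the whole collection yields a recall of 1.0 but a precision of only 0.4, which is why the two measures are reported together.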

Redesign. Redesign, in the context of software engineering, is the transformation of a system's structure to comply with a given set of constraints. Architectural redesign is a transformation at model level with the goal of achieving conformance to a specific architectural style.

Redundancy. Software redundancy is the superfluous repetition of code or data. Note that there is also "healthy" redundancy. For example, many programming languages force us to specify the interface of a module; the declarations in the module body are then redundant to the interface items, and this is a desirable property.

Re-engineering. According to \cite{ChikofskyCross1990}, re-engineering is the examination and alteration of a subject system to reconstitute it in a new form and the subsequent implementation of the new form. Re-engineering generally includes some form of reverse engineering (to achieve a more abstract description) followed by some form of forward engineering or restructuring. This may include modifications with respect to new requirements not met by the original system.

Refactoring. Refactoring is the object-oriented equivalent of restructuring. According to \cite{Fowler1999}, refactoring is [the process of making] a change to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behaviour. If applied to programs, we talk of program refactoring. If applied to models, we talk of model refactoring. If applied to aspects, we talk of aspect refactoring.

Release. A release is a version of a software system that has been approved and distributed to users outside the development team.

Restructuring. According to \cite{ChikofskyCross1990}, restructuring is the transformation from one representation form to another at the same relative abstraction level, while preserving the system's external behaviour.

Reverse engineering. According to \cite{ChikofskyCross1990}, reverse engineering is the process of analyzing a subject system to identify the system's components and their interrelationships and create representations of the system in another form or at a higher level of abstraction. Reverse engineering generally involves extracting design artefacts and building or synthesizing abstractions that are less implementation-dependent.

S

Scattering and tangling. Occurs when the code needed to implement a given concern is spread out (scattered) over and clutters (is tangled with) the code needed to satisfy one or more other concerns. Scattering and tangling are typically the result of a program's inability to handle what is called a crosscutting concern.

Schema refinement. The process within database reverse engineering that attempts to recover all, or at least most, implicit constructs (data structures and integrity constraints) of a physical or logical schema.

Schema conceptualisation. The process within database reverse engineering that aims at deriving a plausible conceptual schema from the logical schema of a legacy database. Also called schema interpretation.

Schema transformation. A rewriting rule that replaces a set of constructs of a database schema with another set of constructs. Such a transformation comprises two parts: a schema rewriting rule (structural mapping) and a data conversion rule (instance mapping). The latter transforms the data according to the source schema into data complying with the target schema.

Service-oriented architecture. According to Thomas Erl \cite{Erl05}, SOA is "a model in which automation logic is decomposed into smaller, distinct units of logic. Collectively, these units comprise a larger piece of business automation logic. Individually, these units can be distributed. (...) (SOA) encourages individual units of logic to exist autonomously yet not isolated from each other. Units of logic are still required to conform to a set of principles that allow them to evolve independently, while still maintaining a sufficient amount of commonality and standardization. Within SOA, these units of logic are known as services."

Some of the key principles of service-orientation are: loose coupling, service contract, autonomy, abstraction, reusability, composability, statelessness and discoverability.

Software engineering. The term software engineering was defined for the first time during a conference of the NATO Science Committee \cite{Naur&al1969} as "the establishment and use of sound engineering principles in order to obtain economically software that is reliable and works efficiently on real machines." Alternatively, the IEEE standard 610-12 (1999) defines software engineering as "the application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software."

T

Testability. The ISO/IEC standard 9126 defines testability as "attributes of software that bear on the effort needed to validate the software product" \cite{iso9126}.

Testing. We can distinguish different kinds of software testing \cite{Binder2000}:

Regression testing. Tests which seek to reveal cases where software functionality that previously worked as desired, stops working or no longer works in the same way that was previously planned.
Developer testing. Preliminary testing performed by the software engineers who design and/or implement the software systems. Stands in contrast with independent testing, or testing performed by software engineers who are not directly involved with designing or implementing the software system.
Black box testing. The use of specified or expected responsibilities of a unit, subsystem, or system to design tests. Synonymous with specification-oriented, behavioral, functional, or responsibility-based test design.
Acceptance testing. Formal testing conducted to determine whether or not a system satisfies its acceptance criteria and to enable the customer to determine whether or not to accept the system.
White box testing. The use of source code analysis to develop test cases. Synonymous with structural, glass box, clear box, implementation-based test design.
Unit testing. Testing of individual software units, or groups of related units. A test unit may be a module, a few modules, or a complete computer program.

Traceability. The property of software design and development that makes it possible to link any abstract artefact to the technical artefacts that implement it, and conversely. In addition, this link explains how and why this implementation has been chosen.
In the database realm, traceability allows a programmer to know exactly which conceptual object a given column is an implementation of. Conversely, it shows how a conceptual object type has been implemented.

Transformation rule. A rewriting rule through which the instances of some pattern of an abstract or concrete specification are replaced with instances of another pattern. Depending on the type of artefact that needs to be transformed, different types of transformation can be considered: schema transformation (for database schemas), term rewriting (for tree-based structures), graph transformation (for graph-based structures), and so on.

Transformational software engineering. A view of software engineering through which the production and evolution of software can be modelled, and practically carried out, by a chain of transformations which preserves some essential properties of the source specifications. Program compilation and the transformation of tail recursion into an iterative pattern are popular examples. This approach is currently applied to software evolution, reverse engineering and migration. The transformational paradigm is one of the most powerful approaches to formally guarantee traceability.
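The tail recursion example can be made concrete with a small sketch. Both functions below compute the greatest common divisor using Euclid's algorithm, chosen here only as an illustration. The second is obtained from the first by turning the parameters of the tail call into loop variables, a transformation that preserves the computed result.

```python
def gcd_recursive(a, b):
    """Tail-recursive form: the recursive call is the last action performed."""
    if b == 0:
        return a
    return gcd_recursive(b, a % b)

def gcd_iterative(a, b):
    """Transformed form: the tail call becomes a parameter update in a loop."""
    while b != 0:
        a, b = b, a % b
    return a

print(gcd_recursive(48, 18), gcd_iterative(48, 18))
```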

Threshold. A fixed value (typically an upper bound or lower bound) that distinguishes normal values from abnormal metric values. Typically used when applying software metrics to detect anomalies.
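As a minimal sketch of how a threshold is applied to metric values, the following flags every entity whose value exceeds an upper bound. The metric (method length in lines), the entity names and the bound of 50 are hypothetical choices, not recommended values.

```python
def outliers(metric_values, upper_bound):
    """Return the entities whose metric value is beyond the upper bound."""
    return {name: value for name, value in metric_values.items()
            if value > upper_bound}

# Hypothetical method lengths, in lines of code.
method_length = {"parse": 120, "render": 35, "save": 48, "main": 75}

print(outliers(method_length, upper_bound=50))
```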

U

Uniqueness. Uniqueness is the property of model or program transformations to deliver a unique result upon termination.

V

Version. A version is a snapshot of a certain software system at a certain point in time. Whenever a change is made to the software system, a new version is created. The version history is the collection of all versions and their relationships.

Version repository. A kind of database, file system or other kind of repository in which the version history of a software system is stored. The repository may be used to store source code, executable code, documentation or any other type of software artefact of which different versions may exist over time (or even at the same time).

W

Web service. The World Wide Web Consortium (W3C), in \cite{W3CGlossary}, states that "A web service is a software system designed to support interoperable machine-to-machine interaction over a network. It has an interface described in a machine-processable format (specifically WSDL). Other systems interact with the web service in a manner prescribed by its description using SOAP-messages, typically conveyed using HTTP with an XML serialization in conjunction with other web-related standards."

Well-definedness. Well-definedness is the property of model or program transformations to terminate with a unique and correct result when given a consistent input.

Wrapper. A software component that encapsulates a system component (a procedure, a program, a file, an API) in order to transform its interface with its environment. For instance, a wrapper associated with a legacy program can give the latter an object-oriented interface.
In a database setting, a data wrapper is a software component that encapsulates a database or a set of files in order to change its model and the API through which the data can be manipulated. For example, a data wrapper built on top of a standard file can allow application programs to access the contents of the file as if it were a relational table or a collection of XML documents.
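A data wrapper of this kind might look like the sketch below, which lets a program read a fixed-width text file as if it were a table of rows. The class name, the record layout and the sample data are hypothetical, and an in-memory `io.StringIO` object stands in for the legacy file.

```python
import io

class FixedWidthFileWrapper:
    """Expose the fixed-width records of a file as rows of a table."""

    def __init__(self, file, layout):
        # layout: list of (column name, start, end) slices (hypothetical format).
        self._file = file
        self._layout = layout

    def rows(self):
        """Yield one dict per non-empty record, keyed by column name."""
        self._file.seek(0)
        for line in self._file:
            line = line.rstrip("\n")
            if line:
                yield {name: line[start:end].strip()
                       for name, start, end in self._layout}

    def select(self, **conditions):
        """Return the rows whose columns equal the given values."""
        return [row for row in self.rows()
                if all(row[col] == val for col, val in conditions.items())]

# Stand-in for a legacy file: id (4 chars), name (10 chars), city (10 chars).
LEGACY = io.StringIO(
    "0001Smith     Paris     \n"
    "0002Jones     London    \n"
    "0003Martin    Paris     \n"
)
LAYOUT = [("id", 0, 4), ("name", 4, 14), ("city", 14, 24)]

customers = FixedWidthFileWrapper(LEGACY, LAYOUT)
print(customers.select(city="Paris"))
```

The application program only sees `rows` and `select`. The byte positions of the legacy format stay hidden inside the wrapper, so the file could later be replaced by a real database without changing client code.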