Welcome to Extended Selfish Gene’s documentation!¶
Introduction¶
What is SGX?¶
The Selfish Gene optimization algorithm (SG) is a population-less evolutionary algorithm loosely inspired by the interpretation of the Darwinian theory given by the English biologist Richard Dawkins and popularized as the Selfish Gene theory. It enables a user to efficiently find the list parameters, either discrete symbols or real numbers, that maximizes a given target function.
The original SG was an almost straightforward implementation of a though experiment, only able to handle binary values; it was published in SAC98 as “The selfish gene algorithm: a new evolutionary optimization strategy” (F. Corno, M.S. Reorda, G. Squillero, 1998) and few month later, with some modifications, in ICEC98 as “A new evolutionary algorithm inspired by the selfish gene theory” (F. Corno, M.S. Reorda, G. Squillero, 1998). The base algorithm was later discovered surprisingly similar to the Equilibrium Genetic Algorithm, developed by Ari Juels, Shumeet Baluja, and Alistair Sinclair in 1993 and never published – see “Lost gems of EC: the equilibrium genetic algorithm and the role of crossover” (Fernando G. Lobo, 2007). Even more surprisingly, Georges Harik, Fernando Lobo, and David Golberg proposed a quite similar, yet completely unrelated, algorithm in the very same ICEC98: “The Compact Genetic Algorithm” (G.R. Harik, F.G. Lobo, D.E. Goldberg, 1998). Comprehensive background information on “Estimation of Distribution Algorithms (EDAs) (Martin Pelikan, Mark W. Hauschild, Fernando G. Lobo, 2015) can be found in an introduction by Lobo et al.
Since its appearance, the SG was demonstrated more robust than pure hill climbing, reasonably efficient, and quite easy to implement. It was immediately exploited by practitioners in many real-world applications, CAD problems; and by scholars for various test benches. Moreover, the SG framework enabled the inclusions of tricks that made it effective in quite a wider range of situations. The enhanced SG-Clans added to the basic SG a sort of recursive evolution, inspired by the concept of allopatric speciation, to escape local optima. Results were published in “Optimizing deceptive functions with the SG-Clans algorithm” (F. Corno, M.S. Reorda, G. Squillero, 1999). Non-binary encodings were eventually added in 2000s. Indeed, real-valued parameters was never included as they never worked properly, although a draft paper titled “A population-less evolutionary algorithm for real and integer optimization” mysteriously crawled its way up to semantic scholar.
Over the years, the algorithm was reimplemented by different researchers in different languages, and a few brand new approaches derived from it (see Google Scholar’s up-to-date references). In 2016, a comprehensive review was published on IOPScience “Selfish Gene Algorithm Vs Genetic Algorithm: A Review” (Ariff, Norharyati Md, Khalid, Noor Elaiza Abdul, Hashim, Rathiah, Noor, Noorhayati Mohamed, 2016).
Audience¶
The expected audience for SGX, a ‘Quick n’ Dirty’ numerical optimization, includes computer scientists, engineers and practitioners.
This evolutionary algorithm provides a Sub-Optimal result, which is better than a “hill-climbing” algorithm.
It is a real, industrial application where fitness function is computationally intensive.
Real-time application.
SGX also provides an easy-to-use and standard interface.
The code is parallelizable, which means that it can run in parallel multiple threads. There is no need to wait for everything to be completed. Some implementations might be Embarrassingly parallel (see https://en.wikipedia.org/wiki/Embarrassingly_parallel).
SGX is available as a PyPi package and it can be easily installed using pip.
The modular design allows scholars to extend SGX for custom application.
Installation¶
Importance of FOSS¶
Personal control, customizability and freedom:
Users of FOSS benefit from the Four Essential Freedoms to make unrestricted use of, and to study, copy, modify, and redistribute such software with or without modification. If they would like to change the functionality of software they can bring about changes to the code and, if they wish, distribute such modified versions of the software or often − depending on the software’s decision making model and its other users − even push or request such changes to be made via updates to the original software.
Privacy and security:
Manufacturers of proprietary, closed-source software are sometimes pressured to building in backdoors or other covert, undesired features into their software. Instead of having to trust software vendors, users of FOSS can inspect and verify the source code themselves and can put trust on a community of volunteers and users. As proprietary code is typically hidden from public view, only the vendors themselves and hackers may be aware of any vulnerabilities in them while FOSS involves as many people as possible for exposing bugs quickly.
Low costs or no costs:
FOSS is often free of charge although donations are often encouraged. This also allows users to better test and compare software.
Quality, collaboration and efficiency:
FOSS allows for better collaboration among various parties and individuals with the goal of developing the most efficient software for its users or use-cases while proprietary software is typically meant to generate profits. Furthermore, in many cases more organizations and individuals contribute to such projects than to proprietary software. It has been shown that technical superiority is typically the primary reason why companies choose open source software.
The default branch is always the more stable and the only one tested through Travis CI.
The experimental branches exp/*
contain code and comments that some programmers may find disturbing — Viewers discretion advised.
Before trying to contribute read this paper and this style guide.
It may be wise to send Giovanni an email [@] before digging into the project.
Source Code¶
SGX is available as a PyPi package from https://pypi.org/project/sgx/ and installing it is as simple as
pip install sgx
and then
>>> import sgx
Caveat: on some systems the package manager is pip3
.
Fitness Function¶
Fitness class also redefines the relational operator in order to handle different types of optimization (eg. maximization, minimization) and to provide limited support to more complex scenarios (eg. multi-objective optimization).
How do we handle a different scenario¶
When subclassing, one’s fitness should only redefine ‘is_fitter’, and optionally ‘is_distinguishable’ and ‘is_dominant’; ‘is_dominant’ must be changed if ‘is_fitter’ is randomized (the result is uncertain).
The idea of several, different scenarios is the following:
`a == b`
In this case, fitness a
cannot be distinguished from fitness b
.
`a != b`
In this case, fitness a
is distinguishable from fitness b
.
`a > b`
In this case, fitness a
is fitter than fitness b
.
(may not always be the case, see lexicographic)
`a >= b`
In this case, fitness a
is fitter or not distinguishable from fitness b
.
`a < b`
In this case, fitness b
is fitter than fitness a
, respectively.
(may not always be the case, see lexicographic)
`a <= b`
In this case, fitness b
is fitter or not distinguishable from fitness a
, respectively.
`a >> b`
In this case, fitness a
dominates fitness b
which is a certain case.
`a << b`
In this case, fitness a
is dominated by fitness b
, accordingly.
Multi-Objective Evolutionary Algorithm¶
The problem becomes more interesting in case that there exist more than one characteristic that should be compared in order to decide which individual is “better”. It is very important to find rules that describe characteristics with respect to a property of interest. The MOEA approach, a method of combining the traditional genetic algorithm (TGA) with the multi-objective method, can consider the relation between the parameters and the objective spaces in the same time then explore the optimum solution. Our multi-objective evolutionary algorithm uses a ‘helper’ function which can decide the best individual when there are two comparable characteristics.
For this reason, SGX implements a specific function called Lexicase which can be found at
sgx.fitness.multi_objective
. Lexicase is a selection method that considers training cases individually, rather than in aggregate, when performing individual selection.
This special Multi-Objective scenario can be illustrated with an airplaine ticket purchase. Let us consider the example of buying a flight ticket where the price of ticket and travel time are the decision-making criteria.

The points A,B,C,D,E and F represent the options for flying between two cities. We assume that difference in travel time is due to the waiting time for connecting flight at transit. Option A is the most expensive with ticket price of $4000, but with least travel time of 16 hours. The cheapest ticket is of $2000 with travel time of 32 hours if one takes flying Option E. Here the decision-making process of flight booking is not a single objective of either price or travel time. The traveler has few options to choose from with some trade-off between travel time and price. If one selects Option B instead of Option A, he will be saving on his ticket price by spending more time on transit. Again, if the traveler selects Option D instead of Option E, he has to sell out more money for buying the ticket, but he can save a few hours. If someone chooses Option F, he is definitely losing. He can go for Option B at same price with less travel time, or Option D of same travel duration at lower price. The points A,B,C,D and E are called Pareto Optimal Points, named after the famous Italian economist Vilfredo Pareto. They are also called Non-Dominated solutions. The example of flight options is for a two-objective optimization. In multi-objective optimization there can be more than two objectives.
Allele¶
An allele can be one of the following types:
a
sgx.allele.boolean
which is either “1” or “0”.a
sgx.allele.categorical
(for example “R”, “G”, “B”)
Base Allele class¶
-
class
sgx.allele.base.
Allele
¶ Abstract class for Allele.
An allele must be Hashable (ie. non modifiable)
-
abstract
describe
() → str¶ Pretty describes the current allele.
-
property
mode
¶ Returns the most frequent allele.
-
property
possible_values
¶ Possible values of the allele. None if not reasonably applicable (eg. a float)
-
abstract
sample
(sample_type: Optional[str] = 'sample') → Hashable¶ Sample.
- Parameters
sample_type – ‘sample’ (default): random value according to the current probability distribution ‘uniform’: random value according to a uniform probability distribution (ie. completely random) ‘mode’: most common value according to the current probability distribution
-
abstract
update
(winner: Hashable, loser: Hashable) → None¶ Updates the winner and the loser Genotype, so as to modify the new Learning Rate.
- Parameters
winner – The genotype of the better solution.
loser – The genotype of the worse solution.
-
abstract
Boolean Allele class¶
-
class
sgx.allele.boolean.
Boolean
(learning_rate: float = 0.001)¶ -
describe
() → str¶ Pretty describes the current boolean allele.
-
is_valid
(value: Hashable) → bool¶ Checks if the allele is boolean.
-
run_paranoia_checks
() → bool¶ Returns True, if all tests are passed.
-
sample
(sample_type: Optional[str] = 'sample') → Hashable¶ Sample.
- Parameters
sample_type – The type of sample (sample (default), uniform, mode).
-
static
sigmoid
(x: float, k: Optional[float] = 1) → float¶ Logistic function with given logistic growth (
k
). See https://en.wikipedia.org/wiki/Logistic_function- Parameters
x – A float number.
k – Logistic Growth.
- Returns
The result probability of the sigmoid function.
-
update
(winner: Hashable, loser: Hashable) → None¶ Updates the winner and the loser Genotype, so as to modify the new Learning Rate.
- Parameters
winner – The genotype of the better solution.
loser – The genotype of the worse solution.
-
Categorical Allele class¶
-
class
sgx.allele.categorical.
Categorical
(alternatives: Sequence[Hashable], weights: Optional[Union[Sequence[float], dict]] = None, learning_rate: Optional[float] = None)¶ -
describe
() → str¶ Pretty describes the current categorical allele.
-
is_valid
(value: Hashable) → bool¶ Checks if the allele is categorical.
-
property
possible_values
¶ Possible values of the allele. None if not reasonably applicable (eg. a float)
-
run_paranoia_checks
() → bool¶ Returns True, if all tests are passed.
-
sample
(sample_type: Optional[str] = 'sample') → Hashable¶ Sample.
- Parameters
sample_type – The type of sample (sample (default), uniform, mode).
-
update
(winner: Hashable, loser: Hashable) → None¶ Updates the winner and the loser Genotype, so as to modify the new Learning Rate.
- Parameters
winner – The genotype of the better solution.
loser – The genotype of the worse solution.
-
Fitness¶
Base fitness class¶
-
class
sgx.fitness.base.
Fitness
¶ Fitness of a phenotype, handle multiple formats (eg. scalar, tuple).
The class also redefines the relational operator in order to handle different types of optimization (eg. maximization, minimization) and to provide limited support to more complex scenarios (eg. multi-objective optimization)
Equalities (‘==’ and ‘!=’) are based on
is_distinguishable
.Single angular-bracket operators (‘>’, ‘<’, ‘>=’, and ‘<=’) are based on
is_fitter
and may be randomized (the result may not be reproducible).Double angular-bracket operators (‘>>’ and ‘<<’) are based on
is_dominant
and the result is stable. By defaultis_dominant
is defined asis_fitter
.When subclassing, one should only redefine
is_fitter
, and optionallyis_distinguishable
andis_dominant
;is_dominant
must be changed ifis_fitter
is randomized (the result is uncertain).Additional sanity checks should be added to
check_comparable
. Subclasses may redefine thedecorate
method to change the values appearance.-
__eq__
(other: sgx.fitness.base.Fitness) → bool¶ Returns True if one fitness is equal (==) to the other.
-
__ge__
(other: sgx.fitness.base.Fitness) → bool¶ Returns True if one fitness is greater or equal (>=) to the other.
-
__gt__
(other: sgx.fitness.base.Fitness) → bool¶ Returns True if one fitness is greater (>) than the other.
-
__le__
(other: sgx.fitness.base.Fitness) → bool¶ Returns True if one fitness is less or equal (<=) to the other.
-
__lshift__
(other: sgx.fitness.base.Fitness) → bool¶ Returns True if one fitness does not dominates (<<) the other.
-
__lt__
(other: sgx.fitness.base.Fitness) → bool¶ Returns True if one fitness is less (<) than the other.
-
__ne__
(other: sgx.fitness.base.Fitness) → bool¶ Returns True if one fitness is not equal (!=) to the other.
-
__rshift__
(other: sgx.fitness.base.Fitness) → bool¶ Returns True if one fitness dominates (>>) the other.
-
check_comparable
(other: sgx.fitness.base.Fitness)¶ Checks if the fitness is able to be compared.
-
decorate
() → str¶ Represent the individual fitness value with a nice string.
-
is_distinguishable
(other: sgx.fitness.base.Fitness) → bool¶ Check whether some differences from the other Fitness may be perceived.
-
is_dominant
(other: sgx.fitness.base.Fitness) → bool¶ Check whether dominates the other (result is certain).
-
is_fitter
(other: sgx.fitness.base.Fitness) → bool¶ Check whether fitter than the other (result may be accidental).
-
is_valid
(fitness: sgx.fitness.base.Fitness) → bool¶ Returns True if the fitness is able to be compared, otherwise Raises an Assertion Error.
-
run_paranoia_checks
() → bool¶ Returns True if checks are successful.
-
-
sgx.fitness.base.
reversed
(fitness_class: sgx.fitness.base.Fitness) → sgx.fitness.base.Fitness¶ Reverse fitness class turning a maximization problem into a minimization one.
Fitness Function class¶
Multi-Objective class¶
-
class
sgx.fitness.multi_objective.
Lexicase
(value: Sequence, fitness_type: Type[sgx.fitness.base.Fitness] = <class 'sgx.fitness.simple.Scalar'>, **kwargs)¶ Pseudo-MO through Lexicase selection (DOI:10.1109/TEVC.2014.2362729).
-
is_fitter
(other: sgx.fitness.multi_objective.Lexicase) → bool¶ Check whether fitter than the other.
-
-
class
sgx.fitness.multi_objective.
MultiObjective
(value: Sequence, fitness_type: Type[sgx.fitness.base.Fitness] = <class 'sgx.fitness.simple.Scalar'>, **kwargs)¶ Abstract class for handling Multi-Objective problems.
-
is_dominant
(other: sgx.fitness.multi_objective.Lexicase) → bool¶ Check whether dominates the other (result is certain).
-
abstract
is_fitter
(other: sgx.fitness.multi_objective.Lexicase) → bool¶ Check whether fitter than the other.
-
Simple class¶
-
class
sgx.fitness.simple.
Approximate
(argument, rel_tol: float = 1e-09, abs_tol: float = 0)¶ A single, floating-point value with approximate equality – Larger is better.
-
check_comparable
(other: sgx.fitness.simple.Approximate)¶ Checks if the fitness is able to be compared.
-
decorate
() → str¶ Represent the individual fitness value with a nice string.
-
is_distinguishable
(other: sgx.fitness.base.Fitness) → bool¶ Check whether some differences from the other Fitness may be perceived.
-
is_fitter
(other: sgx.fitness.base.Fitness) → bool¶ Check whether fitter than the other.
-
-
class
sgx.fitness.simple.
Integer
¶ A single numeric value – Larger is better.
-
class
sgx.fitness.simple.
Scalar
(x=0, /)¶ A single numeric value – Larger is better.
-
class
sgx.fitness.simple.
Vector
(value: Sequence, fitness_type: Type[sgx.fitness.base.Fitness] = <class 'sgx.fitness.simple.Scalar'>, **kwargs)¶ A generic vector of Fitness values.
fitness_type is the subtype, **kwargs are passed to fitness init
Examples
f1 = sgx.fitness.Vector([23, 10], fitness_type=Approximate, abs_tol=.1) f2 = sgx.fitness.Vector([23, 10], fitness_type=Approximate, abs_tol=.001)
f1 > sgx.fitness.Vector([23, 9.99], fitness_type=Approximate, abs_tol=.1) is False f2 > sgx.fitness.Vector([23, 9.99], fitness_type=Approximate, abs_tol=.001) is True
-
check_comparable
(other: sgx.fitness.simple.Vector)¶ Checks if the fitness is able to be compared.
-
static
compare_vectors
(v1: Sequence[sgx.fitness.base.Fitness], v2: Sequence[sgx.fitness.base.Fitness]) → int¶ Compare Fitness values in v1 and v2.
- Parameters
v1 – The first fitness vector.
v2 – The second fitness vector.
- Returns
-1 if v1 < v2; +1 if v1 > v2; 0 if v1 == v2
-
decorate
() → str¶ Represent the individual fitness value with a nice string.
-
is_distinguishable
(other: sgx.fitness.simple.Vector) → bool¶ Check whether some differences from the other Fitness may be perceived.
-
is_fitter
(other: sgx.fitness.base.Fitness) → bool¶ Check whether fitter than the other.
-
Utils¶
CPU_time¶
Jupyter Support¶
-
sgx.utils.jupyter_support.
is_notebook
() → bool¶ Check if running inside a notebooks
Logging¶
-
sgx.utils.logging.
log_cpu
(level: int = 20, msg: str = '', *args, **kwargs) → None¶ Like log(), but including cpu time.
Random class¶
Archive class¶
sgx.archive
Base class¶
-
class
sgx.base.
Genome
(*args)¶ A tuple of Alleles, each one specifying a set of alternative genes.
-
is_valid
(genotype: sgx.base.Genotype) → bool¶ Check an object against a specification.
The function may be used to check a value against a parameter definition, a node against a section definition).
- Returns
True if the object is valid, False otherwise.
-
run_paranoia_checks
() → bool¶ Check the internal consistency of a “paranoid” object.
The function should be overridden by the sub-classes to implement the required, specific checks. It always returns
True
, but throws an exception whenever an inconsistency is detected.Notez bien: Sanity checks may be computationally intensive, paranoia checks are not supposed to be used in production environments (i.e., when
-O
is used for compiling). Their typical usage is:assert foo.run_paranoia_checks()
- Returns
True (always)
- Raises
AssertionError if some internal data structure is incoherent –
-
-
class
sgx.base.
Genotype
(*args)¶ A tuple containing the organism’s actual genes (their values).
-
run_paranoia_checks
() → bool¶ Check the internal consistency of a “paranoid” object.
The function should be overridden by the sub-classes to implement the required, specific checks. It always returns
True
, but throws an exception whenever an inconsistency is detected.Notez bien: Sanity checks may be computationally intensive, paranoia checks are not supposed to be used in production environments (i.e., when
-O
is used for compiling). Their typical usage is:assert foo.run_paranoia_checks()
- Returns
True (always)
- Raises
AssertionError if some internal data structure is incoherent –
-
-
class
sgx.base.
Paranoid
¶ Abstract class: Paranoid classes do implement
run_paranoia_checks()
.-
run_paranoia_checks
() → bool¶ Check the internal consistency of a “paranoid” object.
The function should be overridden by the sub-classes to implement the required, specific checks. It always returns
True
, but throws an exception whenever an inconsistency is detected.Notez bien: Sanity checks may be computationally intensive, paranoia checks are not supposed to be used in production environments (i.e., when
-O
is used for compiling). Their typical usage is:assert foo.run_paranoia_checks()
- Returns
True (always)
- Raises
AssertionError if some internal data structure is incoherent –
-
-
class
sgx.base.
Pedantic
¶ Abstract class: Pedantic classes do implement
is_valid()
.-
abstract
is_valid
(obj: Any) → bool¶ Check an object against a specification.
The function may be used to check a value against a parameter definition, a node against a section definition).
- Returns
True if the object is valid, False otherwise.
-
abstract
Species class¶
-
class
sgx.species.
Species
(genome: Sequence[Any], fitness_function: sgx.fitness.function.FitnessFunction, mutation_rate: Optional[float] = None)¶ Missing
-
evaluate
(genotype: sgx.base.Genotype) → sgx.fitness.base.Fitness¶ Evaluates a genotype according to a specific Fitness Function.
- Parameters
genotype – The Genotype which is going to be evaluated.
- Returns
The fitness calculated from a Fitness Function.
-
sample
(sample_type: Optional[str] = 'sample') → sgx.base.Genotype¶ Sampling the genotypes with a specific method.
- Parameters
sample_type – The type of sampling is going to be used.
- Returns
The candidate’s Genotype after sampling.
-
update
(winner: sgx.base.Genotype, loser: sgx.base.Genotype)¶ Updates the winner and the loser Genotype, so as to modify the new Learning Rate.
- Parameters
winner – The genotype of the better solution.
loser – The genotype of the worse solution.
-
Modular Design Rationale¶
Object-oriented programming (OOP) is based on the concept of “objects”, which can contain data and code: data in the form of field, and code, in the form of procedures (often known as methods). A feature of objects is that an object’s own procedures can access and often modify the data fields of itself. In OOP, computer programs are designed by making them out of objects that interact with one another.
Python language is a multi-paradigm and supports object-oriented programming to a greater degree, typically in combination with imperative, procedural programming.
Extended Selfish Gene Optimizer |
||||||
---|---|---|---|---|---|---|
Genotype |
Fitness |
|||||
Genome |
Single-Objective |
Multi-Objective |
||||
Allele |
-Int -Float -Double |
-Char -Float
|
||||
Boolean |
Categorical |
|||||
-0 |
-1 |
-R |
-G |
-B |
Authors¶
Giovanni Squillero¶
Alberto Tonda¶
Stella Stakiadi¶
License¶
Copyright © 2021 Giovanni Squillero
The Extended Selfish Gene (SGX) is free and open-source software , and it is distributed under the permissive Apache License 2.0.
Acknowledgements¶
This Documentation was built with Sphinx using a theme provided by Read the Docs.
Appendix - Terms¶
`Allele`: An allele is one of two, or more, forms of a given gene variant. An allele is one of two, or more, versions of the same gene at the same place on a chromosome.
`Locus`: A locus (plural loci) is a specific, fixed position on a chromosome where a particular gene or genetic marker is located. Each chromosome carries many genes, with each gene occupying a different position or locus;
`Gene`: A gene is a basic unit of heredity and a sequence of nucleotides in DNA or RNA that encodes the synthesis of a gene product, either RNA or protein.
`Genome`: A genome is all genetic material of an organism. It consists of DNA (or RNA in RNA viruses). The genome includes both the genes (the coding regions) and the noncoding DNA, as well as mitochondrial DNA and chloroplast DNA.
`Genotype`: A genotype is an organism’s complete set of genetic material. Often though, genotype is used to refer to a single gene or set of genes, such as the genotype for eye color. The genes take part in determining the characteristics that are observable (phenotype) in an organism, such as hair color, height, etc.
`Phenotype`: Phenotype is the term used in genetics for the composite observable characteristics or traits of an organism. The term covers the organism’s morphology or physical form and structure, its developmental processes, its biochemical and physiological properties, its behavior, and the products of behavior.
`Individual`: An individual is that which exists as a distinct entity (see Population).
`Population`: A population is defined as a group of individuals of the same species living and interbreeding within a given area.