Welcome to Extended Selfish Gene’s documentation!

Introduction

What is SGX?

The Selfish Gene optimization algorithm (SG) is a population-less evolutionary algorithm loosely inspired by the interpretation of the Darwinian theory given by the English biologist Richard Dawkins and popularized as the Selfish Gene theory. It enables a user to efficiently find the list parameters, either discrete symbols or real numbers, that maximizes a given target function.

The original SG was an almost straightforward implementation of a though experiment, only able to handle binary values; it was published in SAC98 as “The selfish gene algorithm: a new evolutionary optimization strategy” (F. Corno, M.S. Reorda, G. Squillero, 1998) and few month later, with some modifications, in ICEC98 as “A new evolutionary algorithm inspired by the selfish gene theory” (F. Corno, M.S. Reorda, G. Squillero, 1998). The base algorithm was later discovered surprisingly similar to the Equilibrium Genetic Algorithm, developed by Ari Juels, Shumeet Baluja, and Alistair Sinclair in 1993 and never published – see “Lost gems of EC: the equilibrium genetic algorithm and the role of crossover” (Fernando G. Lobo, 2007). Even more surprisingly, Georges Harik, Fernando Lobo, and David Golberg proposed a quite similar, yet completely unrelated, algorithm in the very same ICEC98: “The Compact Genetic Algorithm” (G.R. Harik, F.G. Lobo, D.E. Goldberg, 1998). Comprehensive background information on “Estimation of Distribution Algorithms (EDAs) (Martin Pelikan, Mark W. Hauschild, Fernando G. Lobo, 2015) can be found in an introduction by Lobo et al.

Since its appearance, the SG was demonstrated more robust than pure hill climbing, reasonably efficient, and quite easy to implement. It was immediately exploited by practitioners in many real-world applications, CAD problems; and by scholars for various test benches. Moreover, the SG framework enabled the inclusions of tricks that made it effective in quite a wider range of situations. The enhanced SG-Clans added to the basic SG a sort of recursive evolution, inspired by the concept of allopatric speciation, to escape local optima. Results were published in “Optimizing deceptive functions with the SG-Clans algorithm” (F. Corno, M.S. Reorda, G. Squillero, 1999). Non-binary encodings were eventually added in 2000s. Indeed, real-valued parameters was never included as they never worked properly, although a draft paper titled “A population-less evolutionary algorithm for real and integer optimization” mysteriously crawled its way up to semantic scholar.

Over the years, the algorithm was reimplemented by different researchers in different languages, and a few brand new approaches derived from it (see Google Scholar’s up-to-date references). In 2016, a comprehensive review was published on IOPScience “Selfish Gene Algorithm Vs Genetic Algorithm: A Review” (Ariff, Norharyati Md, Khalid, Noor Elaiza Abdul, Hashim, Rathiah, Noor, Noorhayati Mohamed, 2016).

Audience

The expected audience for SGX, a ‘Quick n’ Dirty’ numerical optimization, includes computer scientists, engineers and practitioners.

  • This evolutionary algorithm provides a Sub-Optimal result, which is better than a “hill-climbing” algorithm.

  • It is a real, industrial application where fitness function is computationally intensive.

  • Real-time application.

  • SGX also provides an easy-to-use and standard interface.

  • The code is parallelizable, which means that it can run in parallel multiple threads. There is no need to wait for everything to be completed. Some implementations might be Embarrassingly parallel (see https://en.wikipedia.org/wiki/Embarrassingly_parallel).

  • SGX is available as a PyPi package and it can be easily installed using pip.

  • The modular design allows scholars to extend SGX for custom application.

Installation

Importance of FOSS

Personal control, customizability and freedom:

Users of FOSS benefit from the Four Essential Freedoms to make unrestricted use of, and to study, copy, modify, and redistribute such software with or without modification. If they would like to change the functionality of software they can bring about changes to the code and, if they wish, distribute such modified versions of the software or often − depending on the software’s decision making model and its other users − even push or request such changes to be made via updates to the original software.

Privacy and security:

Manufacturers of proprietary, closed-source software are sometimes pressured to building in backdoors or other covert, undesired features into their software. Instead of having to trust software vendors, users of FOSS can inspect and verify the source code themselves and can put trust on a community of volunteers and users. As proprietary code is typically hidden from public view, only the vendors themselves and hackers may be aware of any vulnerabilities in them while FOSS involves as many people as possible for exposing bugs quickly.

Low costs or no costs:

FOSS is often free of charge although donations are often encouraged. This also allows users to better test and compare software.

Quality, collaboration and efficiency:

FOSS allows for better collaboration among various parties and individuals with the goal of developing the most efficient software for its users or use-cases while proprietary software is typically meant to generate profits. Furthermore, in many cases more organizations and individuals contribute to such projects than to proprietary software. It has been shown that technical superiority is typically the primary reason why companies choose open source software.


The default branch is always the more stable and the only one tested through Travis CI. The experimental branches exp/* contain code and comments that some programmers may find disturbing — Viewers discretion advised. Before trying to contribute read this paper and this style guide. It may be wise to send Giovanni an email [@] before digging into the project.

Source Code

SGX is available as a PyPi package from https://pypi.org/project/sgx/ and installing it is as simple as

pip install sgx

and then

>>> import sgx

Caveat: on some systems the package manager is pip3.

Fitness Function

Fitness class also redefines the relational operator in order to handle different types of optimization (eg. maximization, minimization) and to provide limited support to more complex scenarios (eg. multi-objective optimization).

How do we handle a different scenario

When subclassing, one’s fitness should only redefine ‘is_fitter’, and optionally ‘is_distinguishable’ and ‘is_dominant’; ‘is_dominant’ must be changed if ‘is_fitter’ is randomized (the result is uncertain).

The idea of several, different scenarios is the following:

`a == b` In this case, fitness a cannot be distinguished from fitness b.

`a != b` In this case, fitness a is distinguishable from fitness b.

`a > b` In this case, fitness a is fitter than fitness b. (may not always be the case, see lexicographic)

`a >= b` In this case, fitness a is fitter or not distinguishable from fitness b.

`a < b` In this case, fitness b is fitter than fitness a, respectively. (may not always be the case, see lexicographic)

`a <= b` In this case, fitness b is fitter or not distinguishable from fitness a, respectively.

`a >> b` In this case, fitness a dominates fitness b which is a certain case.

`a << b` In this case, fitness a is dominated by fitness b, accordingly.

Multi-Objective Evolutionary Algorithm

The problem becomes more interesting in case that there exist more than one characteristic that should be compared in order to decide which individual is “better”. It is very important to find rules that describe characteristics with respect to a property of interest. The MOEA approach, a method of combining the traditional genetic algorithm (TGA) with the multi-objective method, can consider the relation between the parameters and the objective spaces in the same time then explore the optimum solution. Our multi-objective evolutionary algorithm uses a ‘helper’ function which can decide the best individual when there are two comparable characteristics.

  • For this reason, SGX implements a specific function called Lexicase which can be found at sgx.fitness.multi_objective . Lexicase is a selection method that considers training cases individually, rather than in aggregate, when performing individual selection.

This special Multi-Objective scenario can be illustrated with an airplaine ticket purchase. Let us consider the example of buying a flight ticket where the price of ticket and travel time are the decision-making criteria.

_images/airticket.png

The points A,B,C,D,E and F represent the options for flying between two cities. We assume that difference in travel time is due to the waiting time for connecting flight at transit. Option A is the most expensive with ticket price of $4000, but with least travel time of 16 hours. The cheapest ticket is of $2000 with travel time of 32 hours if one takes flying Option E. Here the decision-making process of flight booking is not a single objective of either price or travel time. The traveler has few options to choose from with some trade-off between travel time and price. If one selects Option B instead of Option A, he will be saving on his ticket price by spending more time on transit. Again, if the traveler selects Option D instead of Option E, he has to sell out more money for buying the ticket, but he can save a few hours. If someone chooses Option F, he is definitely losing. He can go for Option B at same price with less travel time, or Option D of same travel duration at lower price. The points A,B,C,D and E are called Pareto Optimal Points, named after the famous Italian economist Vilfredo Pareto. They are also called Non-Dominated solutions. The example of flight options is for a two-objective optimization. In multi-objective optimization there can be more than two objectives.

Allele

An allele can be one of the following types:

Base Allele class

sgx.allele.base

class sgx.allele.base.Allele

Abstract class for Allele.

An allele must be Hashable (ie. non modifiable)

abstract describe() → str

Pretty describes the current allele.

property mode

Returns the most frequent allele.

property possible_values

Possible values of the allele. None if not reasonably applicable (eg. a float)

abstract sample(sample_type: Optional[str] = 'sample') → Hashable

Sample.

Parameters

sample_type – ‘sample’ (default): random value according to the current probability distribution ‘uniform’: random value according to a uniform probability distribution (ie. completely random) ‘mode’: most common value according to the current probability distribution

abstract update(winner: Hashable, loser: Hashable) → None

Updates the winner and the loser Genotype, so as to modify the new Learning Rate.

Parameters
  • winner – The genotype of the better solution.

  • loser – The genotype of the worse solution.

Boolean Allele class

sgx.allele.boolean

class sgx.allele.boolean.Boolean(learning_rate: float = 0.001)
describe() → str

Pretty describes the current boolean allele.

is_valid(value: Hashable) → bool

Checks if the allele is boolean.

run_paranoia_checks() → bool

Returns True, if all tests are passed.

sample(sample_type: Optional[str] = 'sample') → Hashable

Sample.

Parameters

sample_type – The type of sample (sample (default), uniform, mode).

static sigmoid(x: float, k: Optional[float] = 1) → float

Logistic function with given logistic growth (k). See https://en.wikipedia.org/wiki/Logistic_function

Parameters
  • x – A float number.

  • k – Logistic Growth.

Returns

The result probability of the sigmoid function.

update(winner: Hashable, loser: Hashable) → None

Updates the winner and the loser Genotype, so as to modify the new Learning Rate.

Parameters
  • winner – The genotype of the better solution.

  • loser – The genotype of the worse solution.

Categorical Allele class

sgx.allele.categorical

class sgx.allele.categorical.Categorical(alternatives: Sequence[Hashable], weights: Optional[Union[Sequence[float], dict]] = None, learning_rate: Optional[float] = None)
describe() → str

Pretty describes the current categorical allele.

is_valid(value: Hashable) → bool

Checks if the allele is categorical.

property possible_values

Possible values of the allele. None if not reasonably applicable (eg. a float)

run_paranoia_checks() → bool

Returns True, if all tests are passed.

sample(sample_type: Optional[str] = 'sample') → Hashable

Sample.

Parameters

sample_type – The type of sample (sample (default), uniform, mode).

update(winner: Hashable, loser: Hashable) → None

Updates the winner and the loser Genotype, so as to modify the new Learning Rate.

Parameters
  • winner – The genotype of the better solution.

  • loser – The genotype of the worse solution.


Fitness

Base fitness class

sgx.fitness.base

class sgx.fitness.base.Fitness

Fitness of a phenotype, handle multiple formats (eg. scalar, tuple).

The class also redefines the relational operator in order to handle different types of optimization (eg. maximization, minimization) and to provide limited support to more complex scenarios (eg. multi-objective optimization)

Equalities (‘==’ and ‘!=’) are based on is_distinguishable.

Single angular-bracket operators (‘>’, ‘<’, ‘>=’, and ‘<=’) are based on is_fitter and may be randomized (the result may not be reproducible).

Double angular-bracket operators (‘>>’ and ‘<<’) are based on is_dominant and the result is stable. By default is_dominant is defined as is_fitter.

When subclassing, one should only redefine is_fitter, and optionally is_distinguishable and is_dominant; is_dominant must be changed if is_fitter is randomized (the result is uncertain).

Additional sanity checks should be added to check_comparable. Subclasses may redefine the decorate method to change the values appearance.

__eq__(other: sgx.fitness.base.Fitness) → bool

Returns True if one fitness is equal (==) to the other.

__ge__(other: sgx.fitness.base.Fitness) → bool

Returns True if one fitness is greater or equal (>=) to the other.

__gt__(other: sgx.fitness.base.Fitness) → bool

Returns True if one fitness is greater (>) than the other.

__le__(other: sgx.fitness.base.Fitness) → bool

Returns True if one fitness is less or equal (<=) to the other.

__lshift__(other: sgx.fitness.base.Fitness) → bool

Returns True if one fitness does not dominates (<<) the other.

__lt__(other: sgx.fitness.base.Fitness) → bool

Returns True if one fitness is less (<) than the other.

__ne__(other: sgx.fitness.base.Fitness) → bool

Returns True if one fitness is not equal (!=) to the other.

__rshift__(other: sgx.fitness.base.Fitness) → bool

Returns True if one fitness dominates (>>) the other.

check_comparable(other: sgx.fitness.base.Fitness)

Checks if the fitness is able to be compared.

decorate() → str

Represent the individual fitness value with a nice string.

is_distinguishable(other: sgx.fitness.base.Fitness) → bool

Check whether some differences from the other Fitness may be perceived.

is_dominant(other: sgx.fitness.base.Fitness) → bool

Check whether dominates the other (result is certain).

is_fitter(other: sgx.fitness.base.Fitness) → bool

Check whether fitter than the other (result may be accidental).

is_valid(fitness: sgx.fitness.base.Fitness) → bool

Returns True if the fitness is able to be compared, otherwise Raises an Assertion Error.

run_paranoia_checks() → bool

Returns True if checks are successful.

sgx.fitness.base.reversed(fitness_class: sgx.fitness.base.Fitness)sgx.fitness.base.Fitness

Reverse fitness class turning a maximization problem into a minimization one.

Fitness Function class

sgx.fitness.function


Multi-Objective class

sgx.fitness.multi_objective

class sgx.fitness.multi_objective.Lexicase(value: Sequence, fitness_type: Type[sgx.fitness.base.Fitness] = <class 'sgx.fitness.simple.Scalar'>, **kwargs)

Pseudo-MO through Lexicase selection (DOI:10.1109/TEVC.2014.2362729).

is_fitter(other: sgx.fitness.multi_objective.Lexicase) → bool

Check whether fitter than the other.

class sgx.fitness.multi_objective.MultiObjective(value: Sequence, fitness_type: Type[sgx.fitness.base.Fitness] = <class 'sgx.fitness.simple.Scalar'>, **kwargs)

Abstract class for handling Multi-Objective problems.

is_dominant(other: sgx.fitness.multi_objective.Lexicase) → bool

Check whether dominates the other (result is certain).

abstract is_fitter(other: sgx.fitness.multi_objective.Lexicase) → bool

Check whether fitter than the other.


Simple class

sgx.fitness.simple

class sgx.fitness.simple.Approximate(argument, rel_tol: float = 1e-09, abs_tol: float = 0)

A single, floating-point value with approximate equality – Larger is better.

check_comparable(other: sgx.fitness.simple.Approximate)

Checks if the fitness is able to be compared.

decorate() → str

Represent the individual fitness value with a nice string.

is_distinguishable(other: sgx.fitness.base.Fitness) → bool

Check whether some differences from the other Fitness may be perceived.

is_fitter(other: sgx.fitness.base.Fitness) → bool

Check whether fitter than the other.

class sgx.fitness.simple.Integer

A single numeric value – Larger is better.

class sgx.fitness.simple.Scalar(x=0, /)

A single numeric value – Larger is better.

class sgx.fitness.simple.Vector(value: Sequence, fitness_type: Type[sgx.fitness.base.Fitness] = <class 'sgx.fitness.simple.Scalar'>, **kwargs)

A generic vector of Fitness values.

fitness_type is the subtype, **kwargs are passed to fitness init

Examples

f1 = sgx.fitness.Vector([23, 10], fitness_type=Approximate, abs_tol=.1) f2 = sgx.fitness.Vector([23, 10], fitness_type=Approximate, abs_tol=.001)

f1 > sgx.fitness.Vector([23, 9.99], fitness_type=Approximate, abs_tol=.1) is False f2 > sgx.fitness.Vector([23, 9.99], fitness_type=Approximate, abs_tol=.001) is True

check_comparable(other: sgx.fitness.simple.Vector)

Checks if the fitness is able to be compared.

static compare_vectors(v1: Sequence[sgx.fitness.base.Fitness], v2: Sequence[sgx.fitness.base.Fitness]) → int

Compare Fitness values in v1 and v2.

Parameters
  • v1 – The first fitness vector.

  • v2 – The second fitness vector.

Returns

-1 if v1 < v2; +1 if v1 > v2; 0 if v1 == v2

decorate() → str

Represent the individual fitness value with a nice string.

is_distinguishable(other: sgx.fitness.simple.Vector) → bool

Check whether some differences from the other Fitness may be perceived.

is_fitter(other: sgx.fitness.base.Fitness) → bool

Check whether fitter than the other.


Utils

Jupyter Support

sgx.utils.jupyter_support

sgx.utils.jupyter_support.is_notebook() → bool

Check if running inside a notebooks

Credits: https://stackoverflow.com/questions/15411967/

Logging

sgx.utils.logging

sgx.utils.logging.log_cpu(level: int = 20, msg: str = '', *args, **kwargs) → None

Like log(), but including cpu time.

Random class

sgx.utils.random

Archive class

sgx.archive

Base class

sgx.base

class sgx.base.Genome(*args)

A tuple of Alleles, each one specifying a set of alternative genes.

is_valid(genotype: sgx.base.Genotype) → bool

Check an object against a specification.

The function may be used to check a value against a parameter definition, a node against a section definition).

Returns

True if the object is valid, False otherwise.

run_paranoia_checks() → bool

Check the internal consistency of a “paranoid” object.

The function should be overridden by the sub-classes to implement the required, specific checks. It always returns True, but throws an exception whenever an inconsistency is detected.

Notez bien: Sanity checks may be computationally intensive, paranoia checks are not supposed to be used in production environments (i.e., when -O is used for compiling). Their typical usage is: assert foo.run_paranoia_checks()

Returns

True (always)

Raises

AssertionError if some internal data structure is incoherent

class sgx.base.Genotype(*args)

A tuple containing the organism’s actual genes (their values).

run_paranoia_checks() → bool

Check the internal consistency of a “paranoid” object.

The function should be overridden by the sub-classes to implement the required, specific checks. It always returns True, but throws an exception whenever an inconsistency is detected.

Notez bien: Sanity checks may be computationally intensive, paranoia checks are not supposed to be used in production environments (i.e., when -O is used for compiling). Their typical usage is: assert foo.run_paranoia_checks()

Returns

True (always)

Raises

AssertionError if some internal data structure is incoherent

class sgx.base.Paranoid

Abstract class: Paranoid classes do implement run_paranoia_checks().

run_paranoia_checks() → bool

Check the internal consistency of a “paranoid” object.

The function should be overridden by the sub-classes to implement the required, specific checks. It always returns True, but throws an exception whenever an inconsistency is detected.

Notez bien: Sanity checks may be computationally intensive, paranoia checks are not supposed to be used in production environments (i.e., when -O is used for compiling). Their typical usage is: assert foo.run_paranoia_checks()

Returns

True (always)

Raises

AssertionError if some internal data structure is incoherent

class sgx.base.Pedantic

Abstract class: Pedantic classes do implement is_valid().

abstract is_valid(obj: Any) → bool

Check an object against a specification.

The function may be used to check a value against a parameter definition, a node against a section definition).

Returns

True if the object is valid, False otherwise.

Species class

sgx.species

class sgx.species.Species(genome: Sequence[Any], fitness_function: sgx.fitness.function.FitnessFunction, mutation_rate: Optional[float] = None)

Missing

evaluate(genotype: sgx.base.Genotype)sgx.fitness.base.Fitness

Evaluates a genotype according to a specific Fitness Function.

Parameters

genotype – The Genotype which is going to be evaluated.

Returns

The fitness calculated from a Fitness Function.

sample(sample_type: Optional[str] = 'sample')sgx.base.Genotype

Sampling the genotypes with a specific method.

Parameters

sample_type – The type of sampling is going to be used.

Returns

The candidate’s Genotype after sampling.

update(winner: sgx.base.Genotype, loser: sgx.base.Genotype)

Updates the winner and the loser Genotype, so as to modify the new Learning Rate.

Parameters
  • winner – The genotype of the better solution.

  • loser – The genotype of the worse solution.


Modular Design Rationale

Object-oriented programming (OOP) is based on the concept of “objects”, which can contain data and code: data in the form of field, and code, in the form of procedures (often known as methods). A feature of objects is that an object’s own procedures can access and often modify the data fields of itself. In OOP, computer programs are designed by making them out of objects that interact with one another.

Python language is a multi-paradigm and supports object-oriented programming to a greater degree, typically in combination with imperative, procedural programming.

Extended Selfish Gene Optimizer

Genotype

Fitness

Genome

Single-Objective

Multi-Objective

Allele

-Int -Float -Double

-Char -Float

etc.

Boolean

Categorical

-0

-1

-R

-G

-B

Authors

Giovanni Squillero

Politecnico di Torino
Department of Control and Computer Engineering
Corso Duca degli Abruzzi 24
10129 Torino — Italy

Alberto Tonda

Génie et Microbiologie des Procédés Alimentaires (GMPA)
French National Institute for Agricultural Research
AgroParisTech, Université Paris - Saclay
1 av. Brétignières
78850 Thiverval-Grignon — France

Stella Stakiadi

Politecnico di Torino
Department of Control and Computer Engineering
Corso Duca degli Abruzzi 24
10129 Torino — Italy

License

Copyright © 2021 Giovanni Squillero

The Extended Selfish Gene (SGX) is free and open-source software , and it is distributed under the permissive Apache License 2.0.

Acknowledgements

This Documentation was built with Sphinx using a theme provided by Read the Docs.

Appendix - Terms

`Allele`: An allele is one of two, or more, forms of a given gene variant. An allele is one of two, or more, versions of the same gene at the same place on a chromosome.

`Locus`: A locus (plural loci) is a specific, fixed position on a chromosome where a particular gene or genetic marker is located. Each chromosome carries many genes, with each gene occupying a different position or locus;

`Gene`: A gene is a basic unit of heredity and a sequence of nucleotides in DNA or RNA that encodes the synthesis of a gene product, either RNA or protein.

`Genome`: A genome is all genetic material of an organism. It consists of DNA (or RNA in RNA viruses). The genome includes both the genes (the coding regions) and the noncoding DNA, as well as mitochondrial DNA and chloroplast DNA.

`Genotype`: A genotype is an organism’s complete set of genetic material. Often though, genotype is used to refer to a single gene or set of genes, such as the genotype for eye color. The genes take part in determining the characteristics that are observable (phenotype) in an organism, such as hair color, height, etc.

`Phenotype`: Phenotype is the term used in genetics for the composite observable characteristics or traits of an organism. The term covers the organism’s morphology or physical form and structure, its developmental processes, its biochemical and physiological properties, its behavior, and the products of behavior.

`Individual`: An individual is that which exists as a distinct entity (see Population).

`Population`: A population is defined as a group of individuals of the same species living and interbreeding within a given area.