As pointed out in gap GE1 (Evolution of Threats) of the ReSIST D13 deliverable, threats evolve during the system lifetime “because attackers are actively involved in the development of new techniques to inject and, or, activate latent faults in existing systems”. This means that resilient systems also need to evolve so that attacks never cause system failures. The ideal goal of such evolution would be the complete removal of vulnerabilities (thus eliminating any chance of an attack causing a failure), but it is well known that this goal is very difficult, if not impossible, to achieve. Nevertheless, during system execution, one can reduce the number of vulnerabilities by applying security patches to operating systems or by introducing newer (better) versions of the application code.
The remaining vulnerabilities may be targeted by attacks and produce faults or intrusions. A resilient system needs to deal with such faults/intrusions, which may be masked through fault/intrusion tolerance protocols. These protocols are typically run on replicated systems and are able to tolerate the failure of at most f replicas. The problem is that, given a sufficient amount of time, a malicious adversary can find ways to compromise more than f replicas. Therefore, if one wants to build a resilient system that operates continuously, some sort of recovery mechanism needs to be added. The goal would be to detect and (reactively) recover compromised replicas at a pace faster than the time needed by an adversary to compromise more than f replicas. However, arbitrary faults are very difficult to detect. An alternative approach is to calculate the minimum time needed by an adversary to compromise more than f replicas and (proactively) trigger periodic replica recoveries at a faster pace. Note that the time needed by an adversary to compromise more than f replicas is highly dependent on how different/diverse the replicas are and, as pointed out in gap GD1 (Diversity for Security) of the ReSIST D13 deliverable, “the use of diversity in security raises several concerns and a large amount of work remains to be done in order to get a workable solution”.
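The proactive approach described above can be illustrated with a minimal sketch. It assumes that an estimate of the minimum time to compromise more than f replicas is already available, and it staggers recoveries so that every replica is recovered once per period, with the period strictly shorter than that estimate. The function name `recovery_schedule` and the parameters `k` (replicas recovered at the same time), `t_recover` (duration of one recovery) and `safety` (fraction of the estimate used as the period) are hypothetical and introduced only for illustration; they are not part of the FOREVER design.

```python
from math import ceil


def recovery_schedule(n, k, t_min, t_recover, safety=0.5):
    """Return (period, slots) for a staggered proactive recovery round.

    n         -- number of replicas
    k         -- replicas recovered at the same time (hypothetical parameter)
    t_min     -- estimated minimum time to compromise more than f replicas
    t_recover -- time one recovery takes (hypothetical parameter)
    safety    -- fraction of t_min used as the period (assumption)

    slots is a list of (start_offset, replica_ids) pairs within one period.
    """
    if n < 1 or k < 1:
        raise ValueError("n and k must be positive")
    if not 0 < safety < 1:
        raise ValueError("safety must be in (0, 1) so the period is shorter than t_min")

    period = safety * t_min          # recover everything faster than the adversary
    groups = ceil(n / k)             # number of recovery batches per period
    if groups * t_recover > period:
        raise ValueError("cannot recover all replicas within one period")

    slot = period / groups           # spacing between consecutive batches
    slots = []
    for g in range(groups):
        ids = list(range(g * k, min((g + 1) * k, n)))
        slots.append((g * slot, ids))
    return period, slots


if __name__ == "__main__":
    period, slots = recovery_schedule(n=4, k=1, t_min=3600, t_recover=120)
    print(period)
    for start, ids in slots:
        print(start, ids)
```

In this sketch a larger `k` shortens the round but takes more replicas offline at once, so the choice of `k` would have to be weighed against the availability requirements of the replicated system.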
The simplest recovery procedure is to boot a clean image containing the original operating system and application code, and to obtain the current state (if there is state) from the remaining replicas. Clearly, this technique removes the effects of any faults/intrusions that could have occurred before the recovery. However, the bugs/vulnerabilities that caused such faults/intrusions may remain, or the adversary may have acquired knowledge before the recovery (e.g., the password of some user, the version of the operating system) that is sufficient to deploy a more advanced attack after the recovery. Following this reasoning, the adversary may accumulate knowledge over time (days, weeks, months) until it is able to compromise more than f replicas between recoveries. This means that diversity in the space domain should be complemented with diversity in the time domain: recoveries should introduce diversity.
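As a rough illustration of a recovery that introduces diversity in the time domain, the sketch below picks a clean image different from the one the replica ran before the recovery, and accepts a state only if at least f + 1 of the remaining replicas report it (with at most f faulty replicas, at least one of those reports comes from a correct replica). The helper names `pick_image` and `agree_on_state`, the pool of images, and the use of state digests as plain strings are assumptions made for this example, not mechanisms described in this document.

```python
import random
from collections import Counter


def pick_image(images, previous, rng=random):
    """Pick a clean image different from the one used before the recovery."""
    candidates = [img for img in images if img != previous]
    if not candidates:
        raise ValueError("need at least two distinct images to diversify")
    return rng.choice(candidates)


def agree_on_state(reported_states, f):
    """Return a state digest reported by at least f + 1 replicas, else None."""
    if not reported_states:
        return None
    state, votes = Counter(reported_states).most_common(1)[0]
    return state if votes >= f + 1 else None


if __name__ == "__main__":
    # Hypothetical image pool and state digests, for illustration only.
    image = pick_image(["image-a", "image-b", "image-c"], previous="image-a")
    state = agree_on_state(["digest-1", "digest-1", "digest-x"], f=1)
    print(image, state)
```

A real recovery would also have to authenticate the replicas that report state and transfer the state itself, which this sketch leaves out.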
The goal of FOREVER is to address some of the challenges identified above by developing a Fault/intrusiOn REmoVal through Evolution & Recovery (FOREVER) service. This service can be used to enhance the resilience of replicated systems, namely those that can be affected by malicious attacks. As the name implies, the main objective of the service is to remove faults and intrusions by combining evolution and recovery techniques. To achieve this goal, the work will be divided into three main tasks: