Skip to content

GhostVariables

Jorge Navas edited this page Apr 10, 2022 · 10 revisions

Ghost variables are variables that do not appear in the program. They are created dynamically (i.e., during the analysis) by abstract domains to express complex properties which typically help to improve the expressiveness of the analysis. In Crab, memory and array domains create ghost variables. For instance, memory domains use ghost variables to represent the offset of a pointer or the size of an allocated object. Array domains can represent each array cell with a ghost variable that represents its contents. Ghost variables are also very useful when low-level properties are modeled involving precise bitwise reasoning (e.g., to represent the k least or most significant bits separately) although Crab does not provide such a domains at the moment.

However, the implementation of ghost variables in Crab has been tricky because there is a tradeoff between modularity and performance. Modularity is a very important property for an abstract domain. This is the idea that complex abstract domains can be composed of a stack and/or products of simpler abstract domains without making any assumptions apart from providing sound abstract operations. However, when one of the simpler domains operates on a ghost variable created by another domain, modularity can be broken. Whenever the domain that created a ghost variable renames it with another ghost variable (this can happen during binary operations such as join, meet, etc) the simpler domain that uses that ghost variable needs to be notified. One way of doing this is by heavily using the abstract rename operation which is implemented by all the Crab abstract domains. At each binary operation, the domain that creates ghost variables generates a new map from program variables to their ghost variables and communicate recursively to all its dependent domains by calling rename. Unfortunately, this operation can be very expensive. For instance, environment maps which are implemented by Patricia trees do not implement this operation efficiently and their sharing properties can be completely destroyed if too many rename operations take place.

How Crab manages ghost variables

In Crab, each abstract domain is responsible for its ghost variables. There are currently two different strategies to manage ghost variables:

  1. Crab variable factory provides a method get that returns always the same variable for a given fixed pair of variable and string. With this method, we can create ghost variables from an existing variable v and a string that defines the role of the ghost variable. For instance, get(v, ".offset"), get(v, ".size"), etc create ghost variables to represent the offset of a pointer, the size of an allocated object, etc. However, this solution only works under two assumptions:

a. The roles can be efficiently named. For instance, this might not be the case if we use ghost variables to represent symbolically array slices and these slices are determined by complex linear constraints which cannot be efficiently extracted.

b. This solution is sound only if the variable v is a program variable or if v is another ghost variable then it cannot be changed during the analysis.

Currently, Crab memory domains use this approach because they never act as subdomains or reduced product with the other domains. Note that this might change in the future and modularity in that case should be revisited.

  1. Each domain that uses ghost variables creates an internal map from program variables to its ghost variables and use abstract rename operation to communicate to other domains changes in the ghost variables during binary operations (join, meet, widening, etc). This is an expensive approach as described above and therefore, it should be used when #1 is not an option.

Other known solutions

  • Sharing Ghost Variables in a Collection of Abstract Domains by Marc Chevalier and Jerome Feret published in VMCAI 2020. In this work, authors address exactly the same problem described above and provide a clean and elegant way to deal with ghost variables by defining the Reduced Product with Ghost Variables. Authors reported 3x slowdown wrt to a baseline implementation that provided support for ghost variables in a hacky way.
Clone this wiki locally