Purpose: libErator automatically generates drivers (unit tests) starting from a library's source code.
The whole framework is composed of three main components:
- Static analyzer: takes the library source code and emits a list of constraints (`./condition_extractor`)
- Driver generator: uses the library constraints from the static analyzer and synthesizes the drivers + seeds (`./tool/main.py`)
- Fuzzing: we use a slightly customized libFuzzer to fuzz the newly generated drivers (`./custom-libfuzzer`)
The environment has been designed for VS Code. The Dockerfile in the root folder builds the container that will be used as the development environment in VS Code.
Install the VS Code Docker extension to set up your dev environment.
Tips to use your GitHub SSH credentials from inside the Docker container:
https://code.visualstudio.com/docs/remote/containers#_sharing-git-credentials-with-your-container
If you do not remember how to add your user to the Docker group, TL;DR:
# if the docker group does not exist
$ sudo groupadd docker
$ sudo usermod -aG docker $USER
# IMPORTANT: logout/login
# IMPORTANT 2: kill/exit tmux sessions (be careful with other tmux users who have open sessions)
Remember to disable simultaneous multithreading (SMT)
Important: do not turn SMT back on at runtime; reboot the machine instead
echo off | sudo tee /sys/devices/system/cpu/smt/control
NFS tricks
# install client
sudo apt install nfs-common
mkdir $LOCAL_FOLDER
sudo mount $SERVER_IP:$REMOTE_FOLDER $LOCAL_FOLDER
Preliminary installation
On the host, run the script:
./preinstall.sh
It will install the minimal Python packages and compile our custom libFuzzer version.
Ad-hoc guides for specific components of the project
These are the challenges we face and the solutions we propose. For each, we indicate whether the solution requires manual intervention or is completely automatic.
- Correct sequence of APIs:
  - control-flow -> a valid chain of functions
  - data-flow -> assign coherent arguments to each function

automatic: both are solved with NDA, which guarantees the creation of a sequence of APIs whose functions + arguments match the constraints of a context. The constraints are modelled as lists of manipulated fields.
- source/sink identification: depends on the type-system/philosophy used in the library.
automatic: we systematize a few strategies and associate a policy with each of them. Each policy identifies source/sink APIs. We then automatically select the most suitable policy for a given library.
- Data constraints -> the arguments should adhere to some data constraints:
  - args used as a string -> add a NUL terminator at the end
  - args used as an array -> prepare an array of objects
  - args used as a file -> prepare a temporary file to store information
  - args modified by the API -> we know it is an output
  - dependencies -> we know two arguments are used together (e.g., in an if-condition or in a loop). This info allows us to define a few policies:
    - arg + len: one arg is an array and another arg controls its length.
    - arg addr + arg addr: the API expects two arguments that belong to the same buffer.
  - coherent cleanup -> we infer which function correctly de-allocates an object (e.g., `fclose(pf)`). If that is not possible, we fall back to `free()` if the object is allocated on the heap.
  - args used as malloc length -> we cap the malloc allocation size to avoid OOM

automatic: for each API argument, we infer a set of constraints which, combined with some policies, allows us to automatically prepare the driver and increase its stability.
- Data initialization -> some libraries expect the user to initialize data before interacting with the APIs. These operations fall beyond the API code analysis.
  - chain of objects -> the library expects the user to create a chain of objects that point to each other, without using any API
  - callbacks -> for testing reasons, the user should prepare a set of callbacks to exercise specific library functionalities
  - var-arg functions -> it is unclear whether we should handle these cases (open question)

manual: we require an operator to define a small set of templates for initializing these cases, e.g., a set of minimal callbacks or simple patterns to initialize a struct. Our driver generator then leverages this info to build valid drivers. We also estimate the number of manual templates required and their types.
- Useful headers -> which headers are useful for a consumer of the library?

manual: we require an operator to indicate the public header files of a library. Many headers are meant to be private but are installed as publicly accessible, so it is not possible to infer which is which automatically.