Skip to content

Revision Call Graph format

Mehdi edited this page Jul 21, 2020 · 18 revisions

This page describes JSON formats for textual, permanent storage of information associated with a revision, such as the product, version, call graph, etc. Currently, three different languages are supported; C, Java, and Python.

Table of Contents

  1. C
  2. Java
  3. Python

C

The following format describes C products from the Debian ecosystem.

{
  "forge": "debian",
  "release": "bullseye",
  "product": "mutt",
  "version": "1.13.2-1",
  "source": "mutt",
  "architecture": "amd64",
  "generator": "cscout",
  "timestamp": "1576731656",
  "depset": [
    {
      "product": "libncursesw6",
      "forge": "debian",
      "architectures": "",
      "constraints": "[6,)",
      "dependency_type": "Depends",
      "is_virtual": false,
      "alternatives": []
    },
    {
      "product": "libc6",
      "forge": "debian",
      "architectures": "",
      "constraints": "[2.17,)",
      "dependency_type": "Depends",
      "is_virtual": true,
      "alternatives": []
    },
    {
      "product": "default-mta",
      "forge": "debian",
      "architectures": "",
      "constraints": "",
      "dependency_type": "Suggests",
      "is_virtual": true,
      "alternatives": [
        {
          "product": "mail-transport-agent",
          "forge": "debian",
          "architectures": "",
          "constraints": "",
          "dependency_type": "Suggests",
          "is_virtual": true,
          "alternatives": []
        }
      ]
    }
  ],
  "build_depset": [
    {
      "product": "zlib1g-dev",
      "forge": "debian",
      "architectures": "",
      "constraints": "",
      "dependency_type": "Build-Depends",
      "is_virtual": false,
      "alternatives": []
    }
  ],
  "undeclared_depset": [
    {
      "forge": "debian",
      "product": "libc6-dev",
      "architectures": "",
      "constraints": "",
      "dependency_type": "",
      "is_virtual": false,
      "alternatives": []
    }
  ],
  "graph": {
    "externalCalls": [
      [0, "///libc.so;C/printf()"],
      [0, "///libc.so;C/fread()"],
      [1, "///libncurse.so;C/endwin()"],
      [1, "//libc6-dev/;%2Fusr%2Finclude%2Fx86_64-linux-gnu%2Fsys/stat.h;S_ISREG()"]
    ],
    "internalCalls": [
      [1, 0]
    ]
  },
  "cha": {
    "binaries": {
      "mutt": {
        "/mutt;C/txt2c()": {
          "id": 0,
          "files": [
            "txt2c.c"
          ],
          "metadata": {
            "access": "public",
            "defined": true,
            "first": "109",
            "last": "116"
          }
        },
      }
    },
    "static_functions": {
      "/;./addrbook.c;RSORT()": {
        "id": 1,
        "files": [
          "addrbook.c"
        ],
        "metadata": {
          "access": "static",
          "defined": true,
          "first": "11",
          "last": "17"
        }
      }
    }
  }
}

Code Fragment 1: A simple example of JSON storage for C revisions.

Detailed description

Forge: The name of the forge associated with this revision.

Release: The name of the Debian release that includes this revision (Debian Specific).

Product: The name of the product associated with this revision.

Version: The version associated with this revision.

Source: The source of the related product (Debian Specific).

Architecture: The architecture associated with this revision.

Generator: The name of the call graph generator that created this graph.

Timestamp: A timestamp related to this revision, in UNIX time (seconds from the Unix epoch).

depset: An array of JSON objects representing the binary dependencies of a revision. The format of a dependency object is described after this list.

build_depset: An array of JSON objects representing the build dependencies of a revision. The format of a dependency object is described after this list.

undeclared_depset: Products related to functions found in the Graph and are not included in depset or build depset. Usually, these products are included in Debian essentials.

Graph: An object containing a list of internal and external calls. Each internal call is a list of integers that represents a call using the IDs of source and target method, both of which belong to the analyzed product. IDs are available in the cha. Each external call is a list of strings. The first string represents the ID of the source method, whereas the second string is a Fasten URI of an external callable.

CHA: an object with three keys. The first one, called "binaries", contains the executables and the libraries of the revision as keys, and a map containing FASTEN URIs of current revision as keys and type objects as values. A type object is composed of a key id (an integer), and a key files that is associated with a list of filenames. The second one, called "static_functions", contains a map with all the static functions of the revision in FASTEN URI format as keys and type objects as values. The third one, called "metadata" and contains extra information about functions. Specifically, it has four keys; (1) access (i.e., public or static), (2) first (i.e., first line number of function), (3) last (i.e., last line number of function), (4) defined (i.e. whether the call graph generator has detect the definition of the function).

Dependency objects: Dependency objects are JSON objects that composed the depsets of a revision. This format is designed for Debian dependencies. Each object has:

  • Key product
  • Key forge
  • Key architecture which consisting of a list of Debian architectures
  • Key constraints that contains a string with the constraint of the version for this dependency in the maven format
  • Key dependency_type that for binary dependencies can take one of the following values; Depends, Recommends, Suggests, Enhances, Pre-Depends. Whereas for build dependencies, one of; Build-Depends, Build-Depends-Indep, Build-Depends-Arch, Built-Using.
  • Key is_virtual, which takes a boolean value to declare if a dependency is virtual or not.
  • Key alternatives that has associated an array of other dependency objects if this dependency has alternatives. Note that the dependency that has a list of alternatives is the highest priority alternative.

Java

Version 1

{
  "product": "foo",
  "forge": "mvn",
  "generator": "bar",
  "depset": [
    [
      { "product": "a", "forge": "mvn", "constraints": ["[1.2..1.5]", "[2.3..]"] },
      { "product": "b", "forge": "mvn", "constraints": ["[2.0.1]"] }
    ]
  ],
  "version": "3.10.0.7",
  "cha": {
    "/name.space/A": {
      "methods": {
        "0": "/name.space/A.A()%2Fjava.lang%2FVoidType",
        "1": "/name.space/A.g(%2Fjava.lang%2FString)%2Fjava.lang%2FInteger"
      },
      "superInterfaces": [ "/java.lang/Serializable" ],
      "sourceFile": "filename.java",
      "superClasses": [ "/java.lang/Object" ]
    }
  },
  "graph": {
    "internalCalls": [ 
      [ 0, 1 ] 
    ],
    "externalCalls": [
      [ "1", "///their.package/TheirClass.method()Response", { "invokeinterface": "1" } ]
    ]
  },
  "timestamp": 123
}

Code Fragment 2: A simple example of version 1 JSON storage for a JAVA revision.

Detailed description

product The name of the product associated with this revision.

forge The name of the forge associated with this revision.

generator The name of the call graph generator that created this graph.

depset An array, containing nested arrays of JSON objects representing a dependency set. Each nested array represents a set of dependencies within the profile in pom.xml in case profiles are present the given project. Otherwise all dependencies are put in the first nested array. Each dependency object has a key product, a key forge, and a key constraints. The latter has associated an array of strings representing (possibly semiopen) intervals of versions using the syntax "[.. v]" (versions up to v, inclusive), "[v ..]" (versions starting from v, inclusive), "[v .. w]" (versions from v to w, both included) or "[v]" (version v).

version The version associated with this revision.

cha A map containing a FASTEN URI of a class as a key and associated with it Type. Type contains the following information:

  • "methods" A map of methods implemented in the given class with a numerical index serving as a key and a method in schemeless canonical form as a value;
  • "superInterfaces"A list of super interfaces of a given class;
  • "sourceFile" A source file name;
  • "superClasses" A list of classes that this type inherits from in the order of instantiation.

graph An object, containing a list of internal and external calls. Each internal call is a list of integers that represents a call using the IDs of source and target method, both of which belong to the application scope. The first element of the list is the ID of the source method, and the second one is the target's ID. IDs are available in the cha. Each external call is a map containing a pair of elements, representing a call, as a key, and key-value metadata about the call as a value. The tuple keeps the ID of the source method in the left element and the FASTEN URI of the target method in the right element. The metadata per call is stored as a map that keys and values are Strings.

timestamp A timestamp associated with this revision, in UNIX time (seconds from the Unix epoch).

Version 2

{
    "product": "SingleSourceToTarget",
    "nodes": 4,
    "forge": "",
    "generator": "",
    "version": "",
    "cha": {
        "externalTypes": {"/java.lang/Object": {
            "access": "",
            "methods": {"3": {
                "metadata": {},
                "uri": "/java.lang/Object.Object()VoidType"
            }},
            "final": false,
            "superInterfaces": [],
            "sourceFile": "",
            "superClasses": []
        }},
        "internalTypes": {"/name.space/SingleSourceToTarget": {
            "access": "public",
            "methods": {
                "0": {
                    "metadata": {
                        "access": "public",
                        "last": 3,
                        "first": 3,
                        "defined": true
                    },
                    "uri": "/name.space/SingleSourceToTarget.SingleSourceToTarget()%2Fjava.lang%2FVoidType"
                },
                "1": {
                    "metadata": {
                        "access": "public",
                        "last": 7,
                        "first": 6,
                        "defined": true
                    },
                    "uri": "/name.space/SingleSourceToTarget.sourceMethod()%2Fjava.lang%2FVoidType"
                },
                "2": {
                    "metadata": {
                        "access": "public",
                        "last": 10,
                        "first": 10,
                        "defined": true
                    },
                    "uri": "/name.space/SingleSourceToTarget.targetMethod()%2Fjava.lang%2FVoidType"
                }
            },
            "final": false,
            "superInterfaces": [],
            "sourceFile": "SingleSourceToTarget.java",
            "superClasses": ["/java.lang/Object"]
        }},
        "resolvedTypes": {}
    },
    "graph": {
        "internalCalls": [[
            "1",
            "2",
            {"0": {
                "receiver": "/name.space/SingleSourceToTarget",
                "line": 6,
                "type": "invokestatic"
            }}
        ]],
        "externalCalls": [[
            "0",
            "3",
            {"1": {
                "receiver": "/java.lang/Object",
                "line": 3,
                "type": "invokespecial"
            }}
        ]],
        "resolvedCalls": []
    },
    "timestamp": 0
}

Code Fragment 2: A simple example of version 2 JSON storage for a JAVA revision.

Here we describe the new features added to the RevisionCallGraph version 2.

cha These features are added to the cha of the version 2:

  • Types are divided into internalTypes, externalTypes and resolvedTypes. Internals and externals indicate the classes that are declared in the current revision or libraries of the current revision, but resolvedTypes are the types that are the result of the stitching algorithm. In the other world, resolvedTypes are the externalTypes that are resolved after the stitching is done;
  • External nodes are indexed the same as internal nodes. This results in a uniform representation of all nodes as opposed to using // for external nodes (previous approach). This feature also helps us to capture callbacks from the library revisions;
  • metadata per node is added:
    • first is the first line number of the method;
    • last is the last line number of the method;
    • access indicates the access identifier of the method e.g. public or private;
    • defined whether the method has a definition or not, i.e. if the method has a body this field is true if the method doesn't have a body (interface, abstract) this field is false.
  • metadata per type is extended:
    • access the access modifier of the type which can be public or packagePrivate;
    • final is a boolean that indicates if a class is final or not.

graph These features are added to the graph of the version 2

  • External calls use the keys defined in the cha instead of URIs, same as internal calls.
  • Call-site sensitive metadata per edge is added to the graph:
    • pc the PC in which the call-site exists. For each edge, we use the pc as a key to a value including line, type, and receiver;
    • line the line number in which the call has happened. Line numbers can not be used as a key for call-site since a line might result in multiple bytecode instructions. However pc indicates a single bytecode instruction and can be used as a key;
    • type type of JVM call used in the bytecode instruction e.g. invokespecial;
    • receiver the receiver type of the call, i.e the type that is used to call a method.

Python


    "product": "foo",
    "forge": "PyPI",
    "generator": "PyCG",
    "depset": [
        {"product": "a", "forge": "PyPI", "constraints": ["[1.2..1.5]","[2.3..]"},
        {"product": "b", "forge": "PyPI", "constraints": ["[2.0.1]"]}
    ],
    "version": "3.10.0.7",
    "modules": {
        "/module.name/": {
            "sourceFile": "module/name.py",
            "namespaces": {
                "0": {
                    "namespace": "/module.name/",
                    "metadata": {
                        "first": 1,
                        "last": 15
                    }
                },
                "1": {
                    "namespace": "/module.name/Cls",
                    "metadata": {
                        "first": 4,
                        "last": 8
                    }
                },
                "2": {
                    "namespace": "/module.name/Cls.func()",
                    "metadata": {
                        "first": 9,
                        "last": 13
                    }
                }
            }
        },
        "/other.module/": {
            "sourceFile": "other/module.py",
            "namespaces": {
                "3": {
                    "namespace": "/other.module/",
                    "metadata": {
                        "first": 1,
                        "last": 10
                    }
                },
                "4": {
                    "namespace": "/other.module/Parent",
                    "metadata": {
                        "first": 3,
                        "last": 10
                    }
                }
            }
        }
    },
    "cha": {
        "1": [4, "//external//external.package.SomeClass"],
        "4": ["//external//external.package.SomeClass"]
    },
    "graph": {
        "internalCalls": [
            [0, 1],
            [0, 2]
        ],
        "externalCalls": [
            [2, "//external//external.package.method"]
        ]
    },
    "timestamp": 123
}

Code Fragment 3: A simple example of JSON storage for Python revisions.

product: The name of the product associated with this revision.

forge: The name of the forge associated with this revision.

generator: The name of the call graph generator that created this graph.

depset: An array consisting of JSON objects. Each JSON object describes a specific dependency by its product, forge and a constraints list. The latter, follows the maven format for describing contraints.

version: The version associated with this revision.

modules: An object describing modules and the namespaces defined inside them. The internal URI formats of the module names are used as keys while the values are objects with the following items:

  • sourceFile: The source file location of the module relative to the package root.
  • namespaces: An object with keys being unique identifiers and values being objects describing a namespace and its metadata, namely the first and last line of the namespace. The module's namespace is added to this list and assigned a unique identifier, along with the classes and functions defined inside it. Note that function URIs are suffixed by "()".

cha: An object describing the Method Resolution Order of each class. The unique integer identifiers assigned to each class in the modules object are used as keys, and the values are an ordered list decribing the MRO for that class. The MRO list does not contain the class itself. For internal predecessors, the MRO items are replaced by their respective unique identifier.

graph: An object, containg lists of internal and external calls. Each call is denoted by a tuple (source, dest) and describes a call from source to dest. For internal calls, the unique integer identifiers assigned to the source and destination in the modules object are used to describe the call. For external calls, the unique identifier of the source is used and the destination is an external FASTEN URI.

timestamp: A timestamp related to this revision, in UNIX time (seconds from the Unix epoch).