2013-10-02

My thesis project is about the automatic evaluation of object-oriented design by means of automated code analysis; in other words, evaluating a program's design without human intervention.

I found the need for a tool that would allow me to extract the code model from a program and then query that model in order to obtain information and metrics about the program's design.

The characteristics I was looking for in a code analysis tool were:

  1. The code model extractor must process source code or intermediary code (Java bytecode or .NET MSIL)

    Rationale: If we are able to analyze bytecode or MSIL, we can use the same tool for any language of the given platform.
    Acceptance Criteria: A proof of concept must be implemented that is able to extract the code model in the form of objects of the given platform.

  2. It must be possible to invoke the tool from within code, as objects, without having to go through a GUI or CLI.

    Rationale: This is to easily integrate the tool as part of another project without hassle or strange behavior (e.g. opening a GUI).
    Acceptance Criteria: A proof of concept must be created that invokes the tool in a programmatic manner without using the CLI.

  3. It must be able to provide:

    • Number of lines of code in a method, class or namespace (package).
    • Method calls between classes
    • Location of method calls between classes
    • Method and member signatures of a class
    • Enumeration of classes, abstract classes and interfaces

    Rationale: These features are needed in order to perform the code analysis, because they provide the data we will measure and evaluate. With them it is possible to obtain the whole Chidamber & Kemerer metrics suite, which is fundamental for code analysis.
    Acceptance Criteria: A proof of concept that can obtain each and every one of the listed items programmatically.
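    To make requirement #3 concrete, here is a minimal sketch, in C# with entirely made-up toy data (the class and member names are hypothetical), of how two Chidamber & Kemerer metrics (WMC and RFC) fall out of exactly the items listed above:

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;

    // Toy model: all names are hypothetical; real data would come from the extractor.
    class ClassModel
    {
        public string Name;
        public List<string> Methods = new List<string>();       // from: method signatures of a class
        public List<string> CalledMethods = new List<string>(); // from: method calls between classes
    }

    class Program
    {
        // WMC (Weighted Methods per Class), with unit weights: the method count.
        static int Wmc(ClassModel c) { return c.Methods.Count; }

        // RFC (Response For a Class): own methods plus distinct methods invoked.
        static int Rfc(ClassModel c) { return c.Methods.Union(c.CalledMethods).Count(); }

        static void Main()
        {
            var order = new ClassModel { Name = "Order" };
            order.Methods.Add("AddItem");
            order.Methods.Add("Total");
            order.CalledMethods.Add("Product.Price");
            order.CalledMethods.Add("Tax.Rate");
            Console.WriteLine("WMC=" + Wmc(order) + " RFC=" + Rfc(order)); // prints WMC=2 RFC=4
        }
    }
    ```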

  4. The extracted code model must be readable in-memory, directly or indirectly.

    • It is allowed for the tool to generate a file first and then read that file to generate an in-memory model

    Rationale: The code model must be representable in-memory in order to use it for automatic evaluation.
    Acceptance Criteria: A proof of concept must be created where the code model is fully representable in memory.

Projects Considered

  • NDepend (Winner)
  • Microsoft CCI + AST (Runner-Up)
  • Microsoft Project Roslyn
  • iPlasma
  • Nitriq

Nomenclature

Levels of compliance
  1. Very low compliance (Very small part or nothing of the proof of concept can be implemented)
  2. Low compliance (Some parts of the proof of concept cannot be implemented)
  3. Medium compliance (Possible to implement the proof of concept with a lot of extra code)
  4. High compliance (Can implement the proof of concept with some extra code)
  5. Very high compliance (Easy to implement the proof of concept)

Overview

(Scores use the compliance levels defined above. Roslyn = Microsoft Project Roslyn; CCI + AST = Microsoft CCI + AST.)

Requirement                           Roslyn   CCI + AST   iPlasma   Nitriq   NDepend*
Process intermediary or source code      1         5          4         5        5
Invokable programmatically               4         4          1         1        5
Code information provided                4         4          5         5        5
In-memory code model                     5         5          5         3        5

Detailed analysis for NDepend

  • It can process any strongly typed .NET language: C++, C#, VB, J#, JScript.NET
  • NDepend can be called programmatically via its NDepend.API, using the NDependProvider class together with the IProjectManager and IProject interfaces.
  • The NDepend.API exposes an easy-to-use interface that lets the programmer write queries over the code structure in a fluent way; it also comes with many extension methods for filtering and navigating the code structure and for calculating metrics.
  • The code model, the analysis, and the results generated programmatically with the NDepend.API can be manipulated in-memory and stored to a file.
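As a sketch of what this looks like in practice — class and member names here follow my reading of the NDepend.API documentation of that era (NDependServicesProvider as the entry point, IProjectManager, IProject, ICodeBase), and the project path is made up; verify everything against the installed API version:

```csharp
using System;
using System.Linq;
using NDepend;
using NDepend.Path;
using NDepend.Project;
using NDepend.CodeModel;

class Program
{
    static void Main()
    {
        // Entry point into the API; no GUI or CLI involved (requirement 2).
        var provider = new NDependServicesProvider();
        IProjectManager projectManager = provider.ProjectManager;
        IProject project =
            projectManager.LoadProject(@"C:\Projects\MyApp.ndproj".ToAbsoluteFilePath());

        // Run the analysis and get the in-memory code model (requirement 4).
        ICodeBase codeBase = project.RunAnalysis().CodeBase;

        // Fluent query over the model (requirement 3): largest classes first.
        var bigTypes = codeBase.Application.Types
            .Where(t => t.NbLinesOfCode > 100)
            .OrderByDescending(t => t.NbLinesOfCode);

        foreach (var t in bigTypes)
            Console.WriteLine(t.FullName + " : " + t.NbLinesOfCode + " LOC");
    }
}
```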

Detailed analysis for Nitriq

  • It can process any strongly typed .NET language: C++, C#, VB, J#, JScript.NET
  • Nitriq is closed source and, as such, is not meant to be used without a GUI or CLI; nor does it expose an API for programmatic use. The only option would be reverse engineering and .NET reflection for any interaction with the engine, but this might not be legal according to its license terms.
  • Nitriq has an easy-to-use language for defining 'nitriq queries', but at best these can be run through the Nitriq.Console, not programmatically, making it hard to integrate with other tools.
  • Nitriq uses its own in-memory representation of the meta-model, known only to its developers.

Detailed analysis for iPlasma

  • It uses an intermediary meta-model representation of the design; extractors currently exist for Java, C++ and C#, and it is possible to write an extractor for any other language needed.
  • It is too complicated: a proof of concept could not be programmed because there is not sufficient documentation on the iPlasma project. It is also unknown whether this would be legal according to its license terms (which I did not find).
  • The visual tool can obtain every item specified in requirement #3, and more, because it comes with previously programmed 'plugins'.
  • The model is extracted directly from source code and is represented in a modeling language called MEMORIA.

Detailed analysis for Microsoft CCI + AST

  • It can process any static .NET language that is compiled to MSIL: C++, C#, VB, J#, JScript.NET.
  • It can be invoked programmatically without the need for the CLI, but it requires a lot of code to do even the simplest things, like getting the methods of a class.
  • It complies with requirement #3, and its API is written for code analysis. The problem is that the documentation for its API is sparse and outdated.
  • The model is generated entirely in-memory, without an intermediary format.
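A sketch of that plumbing, assuming the CCI metadata API names I worked with (PeReader.DefaultHost, IModule.GetAllTypes, IMethodDefinition) and a made-up assembly path — even "list the methods of every class" takes this much setup:

```csharp
using System;
using Microsoft.Cci;

class Program
{
    static void Main()
    {
        // Host that knows how to read PE files and resolve metadata.
        var host = new PeReader.DefaultHost();
        var module = host.LoadUnitFrom(@"C:\Projects\MyApp.dll") as IModule;
        if (module == null) return;

        // Walk every type in the module and list its methods.
        foreach (INamedTypeDefinition type in module.GetAllTypes())
            foreach (IMethodDefinition method in type.Methods)
                Console.WriteLine(type.Name.Value + "." + method.Name.Value);
    }
}
```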

Detailed analysis for Microsoft Project Roslyn

  • It can only process C# or VB, working directly with their respective compilers and language extensions.
  • It can be invoked programmatically without the CLI, but it requires a lot of code to do even the simplest things, like getting the methods of a class.
  • Code analysis can be done with this tool, but it requires detailed knowledge of Roslyn, because Roslyn was not created specifically for code analysis.
  • The model is generated entirely in-memory, without an intermediary format.
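For illustration, a sketch using the Roslyn CTP-era names (SyntaxTree.ParseText, MethodDeclarationSyntax); these APIs moved between CTP releases (and again when Roslyn later shipped as Microsoft.CodeAnalysis), so verify against the installed version:

```csharp
using System;
using System.Linq;
using Roslyn.Compilers.CSharp;

class Program
{
    static void Main()
    {
        var code = @"class Point { int X() { return 0; } int Y() { return 1; } }";
        SyntaxTree tree = SyntaxTree.ParseText(code);

        // Even "get the methods of a class" means walking the syntax tree.
        var methods = tree.GetRoot()
            .DescendantNodes()
            .OfType<MethodDeclarationSyntax>();

        foreach (var m in methods)
            Console.WriteLine(m.Identifier.ValueText);
    }
}
```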

Conclusion: NDepend wins

NDepend is clearly the best choice for a project with the requirements specified at the beginning of this article. Thanks to its API for fluently writing queries over NDepend's extracted code model, you can start implementing any analysis based on code structure or metrics right away.

If you don't have the $$$ to get an NDepend license, then Microsoft's CCI + AST open source project is a decent option, but get ready to write a lot of code to extract any code metrics or structure from a program (.dll). From what I determined, an extra layer of abstraction over CCI + AST is needed in order to do anything useful with it.
