Pharo Libclang FFI, part 4, AST walking with visitors & callbacks

Okay, so we’ve got most of the parts ready. In the last part we managed to load the AST. Now lets do something useful with it. Traversing the tree is done uses a visitor pattern that supplies cursors to a callback function that define locations in the tree.  To the original C code from part 3 we add: the callback function, which I’ve called acceptCursorCallback(); and the callout function clang_visitChildren(), which traverses the tree and invokes the callback function for each node it visits.

Doing it in C

Adding the red lines to simple.c.

#include <clang-c/Index.h>
#include <stdio.h>

void show_cursor_kind(CXCursor cursor)
{
    enum CXCursorKind cursorKind  = clang_getCursorKind(cursor);
    CXString kindName  = clang_getCursorKindSpelling(cursorKind);
    CXString entityName = clang_getCursorSpelling(cursor);
    printf("%03d  %s(%s)\n",
        cursorKind,
        clang_getCString(kindName),
        clang_getCString(entityName));
    clang_disposeString(kindName);
    clang_disposeString(entityName);
}

// accept cursor callback
enum CXChildVisitResult
acceptCursorCallback(CXCursor cursor,
                     CXCursor parent,
                     CXClientData clientData)
{
	show_cursor_kind(cursor);
	return CXChildVisit_Break;
}

int main( int argc, const char *const * argv )
{   // Process arguments
    if( argc < 2 )
    {   printf("Usage: %s inputfile {clang-options}\n", argv[0] );
        return -1;
    }
    const char * mainSourceFilename = argv[1];
    const char *const* options = argv + 2;
    int optionCount = argc - 2;

    // Create the Index and Transalation Units
    CXIndex index = clang_createIndex( 0, 0 )
    CXTranslationUnit TU =
            clang_createTranslationUnit(index, mainSourceFilename);
    if( !TU )
    {   printf("Failed to get Translation Unit\n");
        return -1;
    }

    // The root cursor of a TU is the translation unit itself
    CXCursor rootCursor  = clang_getTranslationUnitCursor( TU );
    show_cursor_kind(rootCursor);

    // invoke visitor
    int treeLevel = 1;
    clang_visitChildren(rootCursor, acceptCursorCallback, &treeLevel);

    clang_disposeTranslationUnit( TU )
    clang_disposeIndex( index );
    return 0;
}

We’ll look into the callback function acceptCursorCallback() first. Given a cursor, we just show the same cursor information as we did in part 3.    Our callback must answer a CXChildVisitResult to direct the visitor on its next traversal action, being either:

  1. CXChildVisit_Break – terminate visitor traversal
  2. CXChildVisit_Continue – next sibling node of current cursor. Does not process children of cursor.
  3. CXChildVisit_Recurse – dive depth first into traversing the children .

To see how these work in practice we’ll try each in turn.   CXChildVisit_Break is up first as you can see in the code above.

We pass our callback function to the libclang function clang_visitChildren(), which takes three arguments…

  • a cursor at the root of the tree, which we got from the translation unit in part 3;
  • our callback function (above) which is invoked for each visited node;
  • some optional custom client data (later we’ll use it to store indent level, but for now we leave it alone)

Note that treeLevel is not used until part 5.

Compile and run and we get…
$ clang-3.5 -I/usr/lib/llvm-3.5/include -lclang -o simple simple.c
$ ./simple foo.ast 
Processing foo.ast
300  TranslationUnit (/home/ben/Apps/moose_suite_6_0/libclang-play/foo.c)
008  FunctionDecl (addtwo)

Comparing the last line output to the ast-dump in part 1, it looks like we’re on track. The visitor only went one step – as we told it to.   Now we’ll tell the visitor to continue stepping along the cursor’s siblings – just an alternative callback return value…


enum CXChildVisitResult
acceptCursorCallback(CXCursor cursor,
                     CXCursor parent,
                     CXClientData clientData)
{
	show_cursor_kind(cursor);
	return CXChildVisit_Continue;
}

Compile and run and we get…
$ clang-3.5 -I/usr/lib/llvm-3.5/include -lclang -o simple simpl.c
$ ./simple foo.ast
Processing foo.ast
300  TranslationUnit (/home/ben/Apps/moose_suite_6_0/libclang-play/foo.c)
008  FunctionDecl (addtwo)
008  FunctionDecl (main)

In foo.c there was only one more sibling at the same level as the addtwo function.  Now lets try the third traversal option…


enum CXChildVisitResult
acceptCursorCallback(CXCursor cursor,
                     CXCursor parent,
                     CXClientData clientData)
{
	show_cursor_kind(cursor);
	return CXChildVisit_Recurse;
}

Compile and run and we get…
$ clang-3.5 -I/usr/lib/llvm-3.5/include -lclang -o simple simpl.c
$ ./simple foo.ast
Processing foo.ast
300  TranslationUnit (/home/ben/Apps/moose_suite_6_0/libclang-play/foo.c)
008  FunctionDecl (addtwo)
010  ParmDecl (number)
202  CompoundStmt ()
214  ReturnStmt ()
114  BinaryOperator ()
100  UnexposedExpr (number)
101  DeclRefExpr (number)
106  IntegerLiteral ()
008  FunctionDecl (main)
202  CompoundStmt ()
231  DeclStmt ()
009  VarDecl (result)
103  CallExpr (addtwo)
100  UnexposedExpr (addtwo)
101  DeclRefExpr (addtwo)
106  IntegerLiteral ()

We can see this matches the ast-dump from part 1 – but without the indenting (we’ll try that later).

And now in Pharo…

The required C declarations we need to implement in Pharo are:

  • The client data…


typedef void *CXClientData;

which we won’t use right away so we define as a place holder like this…

FFIExternalObject subclass: #CXClientData
    uses: TLibclangLibrary
    instanceVariableNames: ''
    classVariableNames: ''
    package: 'Libclang'
  • The callout function…

unsigned
clang_visitChildren(CXCursor parent,
                    CXCursorVisitor visitor,
                    CXClientData client_data);

can be defined as…

FFICallback subclass: CXCursorVisitor    instanceVariableNames: ''classVariableNames: ''package: 'Libclang' 

Libclang class >> clang_visitChildren__parentCursor: parent
                                       visitorCallback: visitor
                                       clientData: client_data
    ^ self ffiCall: #( uint clang_visitChildren(
	 			CXCursor parent,
				CXCursorVisitor visitor,
				CXClientData client_data))
  • The return value of our callback function…
enum CXChildVisitResult {
      CXChildVisit_Break,
      CXChildVisit_Continue,
      CXChildVisit_Recurse }; 

which can be defined like…

FFIExternalEnumeration subclass: #CXChildVisitResult
    instanceVariableNames: ''
    classVariableNames: ''
    package: 'Libclang'
FFIExternalEnumeration class >> enumDecl
    ^ #(    CXChildVisit_Break 1
	       CXChildVisit_Continue 2
	       CXChildVisit_Recurse 3   )
FFIExternalEnumeration class >> initialize
	"self initialize"
	self initializeEnumeration

Don’t forget to evaluate that comment to initialize the class.

  • The callback function to accept cursors from the visitor…

typedef enum CXChildVisitResult
(*CXCursorVisitor)(CXCursor cursor,
                   CXCursor parent,
                   CXClientData client_data);

which we’ll see how to define in a unit test based on simple.c.  Our tests need to access the CXChildVisitResult enumeration, so we add it to the pool dictionaries.  We want to run essentially the same code for the three types of return values, so we’ll factor out the common code like this…


TestCase subclass: #LibclangTest
  uses: TLibclangLibrary
  instanceVariableNames: ''
  classVariableNames: ''
  poolDictionaries: 'CXCursorKind CXChildVisitResult'
  package: 'LibclangPractice'
LibclangTest >> visitChildrenCBReturning: aCXChildVisitResult
  |rootCursor rootClientData acceptCallbackFn visitResult|
  rootCursor := self getRootCursor.
  rootClientData := CXClientData new.
  visitResult := OrderedCollection new.

  acceptCallbackFn :=
    FFICallback
      signature:  #( CXChildVisitResult (
                          CXCursor cursor,
                          CXCursor parent,
                          CXClientData client_data))
      block: [ :cursor :parent :clientData |
          visitResult add: cursor kind item.
          aCXChildVisitResult value "#value not required in Pharo 6" ].
Libclang clang_visitChildren__parentCursor: rootCursor
                           visitorCallback: acceptCallbackFn
                                clientData: rootClientData.
         ^ visitResult asArray

Then define the tests for the three different callback return values…

LibclangTest >> testVisitChildrenCallbackBreak
  | expectedResult visitResult |
  expectedResult := #(          #CXCursor_FunctionDecl ).
  visitResult := self visitChildrenCBReturning: CXChildVisit_Break.
  self assert: visitResult equals: expectedResult.
LibclangTest >> testVisitChildrenCallbackContinue
  | expectedResult visitResult |
  expectedResult := #(
         #CXCursor_FunctionDecl
         #CXCursor_FunctionDecl ).
  visitResult := self visitChildrenCBReturning: CXChildVisit_Continue.
  self assert: visitResult equals: expectedResult.
LibclangTest >> testVisitChildrenCallbackRecurse
  | expectedResult visitResult |
  expectedResult := #(
          #CXCursor_FunctionDecl
          #CXCursor_ParmDecl
          #CXCursor_CompoundStmt
          #CXCursor_ReturnStmt
          #CXCursor_BinaryOperator
          #CXCursor_UnexposedExpr
          #CXCursor_DeclRefExpr
          #CXCursor_IntegerLiteral
          #CXCursor_FunctionDecl
          #CXCursor_CompoundStmt
          #CXCursor_DeclStmt
          #CXCursor_VarDecl
          #CXCursor_CallExpr
          #CXCursor_UnexposedExpr
          #CXCursor_DeclRefExpr
          #CXCursor_IntegerLiteral ).
  visitResult := self visitChildrenCBReturning: CXChildVisit_Recurse.
  self assert: visitResult equals: expectedResult.

Parting words

The next part shows the use of client data to track the indent level during recursive calls to the visitor so the text output shows the depth of AST elements.

This entry was posted in FFI, Pharo. Bookmark the permalink.

Leave a Reply