Pharo PDF Rendering, part 2, UFFI interfacing PDFium

Following on from Part 1, where we built PDFium from source into a shared library, we will now replicate in Pharo the C example presented at the end of that post. Let’s first review the prototypes of the functions used, which we’ll need to declare in Pharo.

void          FPDF_InitLibrary()
void          FPDF_DestroyLibrary()
FPDF_DOCUMENT FPDF_LoadDocument(FPDF_STRING file_path, FPDF_BYTESTRING password)
unsigned long FPDF_GetLastError()
int           FPDF_GetPageCount(FPDF_DOCUMENT document)
int           FPDF_GetPageSizeByIndex(FPDF_DOCUMENT document, int page_index,
                                      double* width, double* height)
void          FPDF_CloseDocument(FPDF_DOCUMENT document)

So first we’ll define the basic scaffolding for our UFFI interface to libpdfium.so, the shared library created in Part 1.

FFILibrary subclass: #PDFium
	   instanceVariableNames: ''
	   classVariableNames: ''
	   package: 'PDFium'

PDFium >> unixModuleName
	^'/home/ben/Repos/PDFium/pdfium/out/shared/libpdfium.so'

PDFium class >> ffiLibraryName
	^PDFium

TestCase subclass: #PDFiumTest
	instanceVariableNames: 'missingPdf helloPdf'
	classVariableNames: ''
	package: 'PDFium'

PDFiumTest >> setUp
	missingPdf := 'm!ss!ng.pdf'.
	helloPdf := '/home/ben/Repos/PDFium/pdfium/testing/resources/hello_world.pdf'.
	self assert: helloPdf asFileReference exists.

Then on the class side we’ll add the FFI functions for library initialisation and teardown, with a simple test to check that the callouts neither fail nor crash.

PDFium class >> FPDF_InitLibrary
    ^self ffiCall: #( void FPDF_InitLibrary()  ) 

PDFium class >> FPDF_DestroyLibrary
    ^self ffiCall: #( void FPDF_DestroyLibrary()  )

PDFiumTest >> testLibraryConfiguration
    PDFium FPDF_InitLibrary.
    PDFium FPDF_DestroyLibrary.
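
Every test in this post pairs FPDF_InitLibrary with FPDF_DestroyLibrary. As a minimal sketch (the selector withLibraryDo: is my own hypothetical helper, not part of PDFium or UFFI), the pairing could be wrapped so that teardown still runs if the code in between signals an error:

```smalltalk
PDFium class >> withLibraryDo: aBlock
	"Hypothetical helper: initialise PDFium, evaluate aBlock, and always
	destroy the library afterwards, even if aBlock signals an error."
	self FPDF_InitLibrary.
	^aBlock ensure: [ self FPDF_DestroyLibrary ]

PDFiumTest >> testLibraryConfigurationViaHelper
	"Same check as #testLibraryConfiguration, going through the helper.
	#ensure: answers the value of the protected block."
	| result |
	result := PDFium withLibraryDo: [ #done ].
	self assert: result equals: #done.
```

The tests in this post keep the explicit calls so they stay close to the C example from Part 1.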

Aside: The first time I ran #testLibraryConfiguration I got an “Error: No module to load address from” because I’d left the return caret (^) out of #unixModuleName. Even after correcting it, I needed to restart Pharo to clear the error. The code above includes the fix, so it should work without error.

Next we’ll implement the document control methods. For this we’ll need to define a few types from fpdfview.h. We don’t need all of these right now, but it’s useful to be aware of them…

// PDF Types
typedef void* FPDF_DOCUMENT;

// String types
typedef unsigned short FPDF_WCHAR;
typedef unsigned char const* FPDF_LPCBYTE;

// FPDFSDK may use three types of strings: byte string, wide string (UTF-16LE
// encoded), and platform dependent string
typedef const char* FPDF_BYTESTRING;

// FPDFSDK always uses UTF-16LE encoded wide strings, each character uses 2
// bytes (except surrogation), with the low byte first.
typedef const unsigned short* FPDF_WIDESTRING;

// For Windows programmers: In most cases it's OK to treat FPDF_WIDESTRING as a
// Windows unicode string, however, special care needs to be taken if you
// expect to process Unicode larger than 0xffff.
//
// For Linux/Unix programmers: most compiler/library environments use 4 bytes
// for a Unicode character, and you have to convert between FPDF_WIDESTRING and
// system wide string by yourself.
typedef const char* FPDF_STRING;

FPDF_BYTESTRING and FPDF_STRING are both const char*, so they look like they map well to the Pharo class String, and subclassing it should allow us to use them directly without manual handling. FPDF_WIDESTRING is more complicated, but we don’t need it at the moment. The API tells us nothing about the internals of FPDF_DOCUMENT, so we need to treat it as an opaque object. So let’s try…

String subclass: #FPDF_BYTESTRING
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'PDFium'

String subclass: #FPDF_STRING
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'PDFium'

FFIOpaqueObject subclass: #FPDF_DOCUMENT
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'PDFium'

…and use those to define the document control methods. We’ll use an ad hoc naming convention for the Pharo selectors: the C function name, then a double underscore, then the keyword parameters. Pay special attention (I got it wrong, and thank you jbroman on #chromium (Freenode) for setting me straight) to the comment on FPDF_GetLastError() in “fpdfview.h”, which says…

If the previous SDK call succeeded,
the return value of FPDF_GetLastError() is not defined.

PDFium class >> FPDF_GetLastError
    ^self ffiCall: #( unsigned long FPDF_GetLastError() ) 

PDFium class >> FPDF_CloseDocument__document: document
    ^self ffiCall: #( void FPDF_CloseDocument( FPDF_DOCUMENT *document ) ) 

PDFium class >> FPDF_LoadDocument__file_path: file_path password: password
    ^self ffiCall: #(FPDF_DOCUMENT *FPDF_LoadDocument(FPDF_STRING file_path, FPDF_BYTESTRING password))

PDFiumTest >> testDocumentMissing
	| document error |
	PDFium FPDF_InitLibrary.
	document := PDFium FPDF_LoadDocument__file_path:  missingPdf  password: ''.
	error := PDFium FPDF_GetLastError.
	PDFium FPDF_CloseDocument__document: document.
	PDFium FPDF_DestroyLibrary.
	self assert: document isNull.
	self assert: error equals: 2.  "#define FPDF_ERR_FILE"

PDFiumTest >> testDocumentValid
	| document |
	PDFium FPDF_InitLibrary.
	document := PDFium FPDF_LoadDocument__file_path:  helloPdf  password: ''.
	PDFium FPDF_CloseDocument__document: document.
	PDFium FPDF_DestroyLibrary.
	self assert: document isNull not.
	"note, FPDF_GetLastError() is undefined when SDK calls succeed"

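Since FPDF_GetLastError() only means something after a failure, one option is to keep that rule in a single place. Here is a rough sketch, assuming a hypothetical convenience selector loadDocument:password: (my naming, not PDFium API) built only on the methods defined above:

```smalltalk
PDFium class >> loadDocument: aPath password: aPassword
	"Hypothetical wrapper: answer the loaded document, or signal an error
	carrying the PDFium error code. FPDF_GetLastError is consulted only
	when the load failed, since its value is undefined after a success."
	| document |
	document := self FPDF_LoadDocument__file_path: aPath password: aPassword.
	document isNull ifTrue: [
		^self error: 'FPDF_LoadDocument failed with code ', self FPDF_GetLastError printString ].
	^document

PDFiumTest >> testLoadDocumentHelperMissing
	"The missing file should surface as a Pharo Error rather than a null handle."
	PDFium FPDF_InitLibrary.
	[ self should: [ PDFium loadDocument: missingPdf password: '' ] raise: Error ]
		ensure: [ PDFium FPDF_DestroyLibrary ]
```

Callers of such a wrapper never see a null handle, so they have no reason to call FPDF_GetLastError themselves after a successful load.
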
Now if you’ve been paying attention ;), you’ll have noticed that the Pharo definitions of FPDF_LoadDocument() and FPDF_CloseDocument() differ slightly from their C declarations by including an extra indirection symbol (*). Without it you get an FFIDereferencedOpaqueObjectError. The FFIOpaqueObject class comment informs us…

“external objects have a natural arity of zero but they MUST be called with some arity, because they are actually external addresses (pointers). That means, you need to always declare external objects as this example:
self ffiCall: #( FFIExternalObject *c_function ( FFIExternalObject *handle ) )”

So tests are green! Groooveh-babeh! Now let’s grab some document info. Getting the number of pages is easy…

PDFium class >> FPDF_GetPageCount__document: document
    ^self ffiCall: #(int FPDF_GetPageCount(FPDF_DOCUMENT *document))

PDFiumTest >> testPageCountHelloPdf
	| document pageCount |
	PDFium FPDF_InitLibrary.
	document := PDFium FPDF_LoadDocument__file_path:  helloPdf  password: ''.
	pageCount := PDFium FPDF_GetPageCount__document: document.
	PDFium FPDF_CloseDocument__document: document.
	PDFium FPDF_DestroyLibrary.
	self assert: pageCount equals: 1.

That’s it for now. Later I’ll look at working with multiple pages.
