Pharo PDF Rendering, part 1, building PDFium

Background

For a while now I’ve been wanting to render PDFs inside Pharo.  A few external libraries existed but none had suitable licenses.  Recently I bumped into PDFium – the Foxit renderer open sourced by Google out of Chrome for use by Chromium. With its BSD license this seemed a good candidate, as well as being derived from a successful existing commercial product and part of a significant Google backed project.  So it leverages a lot of funded engineering and expectations of quality are high. It’s written in C++ but has a public C interface.

So here I am recording my exploration of building PDFium from source. Later in Part 2 I’ll interface to it using Pharo’s UFFI to render PDF pages to bitmaps displayed within Pharo.

Building PDFium

So we start by following the “Get the code” section in the canonical build instructions.

$ sudo apt install git

$ mkdir -p PDFium && cd PDFium

$ export MYDEV=`pwd` # just for the sake of being explicit in this post

$ git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

$ export PATH=$PATH:$MYDEV/depot_tools
Don’t forget to add this to each new terminal you open. You may want to add it to .bashrc.
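
If you do put it in .bashrc, it helps to guard against the directory being appended again each time the file is re-sourced. A minimal sketch, assuming the checkout lives in $HOME/PDFium (adjust to wherever you ran the mkdir above); `path_append` is a made-up helper name, not part of depot_tools:

```shell
# Assumed location of the PDFium working directory; override by exporting
# MYDEV before this runs.
MYDEV="${MYDEV:-$HOME/PDFium}"

# Append a directory to PATH only if it is not already listed.
path_append() {
    case ":$PATH:" in
        *":$1:"*) ;;               # already present, nothing to do
        *) PATH="$PATH:$1" ;;
    esac
}

path_append "$MYDEV/depot_tools"
export MYDEV PATH
```

Sourcing this twice leaves a single depot_tools entry in PATH.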

$ gclient config --unmanaged https://pdfium.googlesource.com/pdfium.git

$ gclient sync --verbose
Wait…..a…..long…..time….. (go get a coffee)

$ cd $MYDEV/pdfium

$ ./build/install-build-deps.sh

Phew! Is that everything? Now actually, we want to work from a stable base, which https://omahaproxy.appspot.com/ shows is Linux-stable-62.0.3202.89

Okay, so now we are ready to try our first build. The build system uses gn to generate ninja build files.  I’ve not used this build system before so it’s a bit of the blind leading the blind, but the way it works is… you pass the build directory to `gn args` which drops you into an editor to specify the args.gn parameters file, from which (combined with BUILD.gn) the ninja configuration files are produced.  Then from within the build directory you run `ninja pdfium` to perform the build.   To change build parameters you can run `gn args .` from the build directory.

$ gn args out/FirstBuild

pdf_is_standalone = true                     # Set for a non-embedded build.
is_debug = true                                         # Enable debugging features.

pdf_enable_xfa = true                           # XFA support enabled.
pdf_enable_v8 = true                             # Javascript support enabled.
pdf_use_skia = false                               # Avoid skia backend experiment.
pdf_use_skia_paths = false                 # Avoid other skia backend experiment.
is_component_build = false                # Disable component build (must be false)
clang_use_chrome_plugins = false  # Currently must be false.
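
If you would rather skip the interactive editor, the same settings can be written straight into args.gn and the ninja files produced with `gn gen`. A sketch under that assumption; `write_gn_args` is a hypothetical helper name and the argument values are simply the ones listed above:

```shell
# Write the args.gn for a build directory without opening an editor.
write_gn_args() {
    build_dir=$1
    mkdir -p "$build_dir" || return 1
    cat > "$build_dir/args.gn" <<'EOF'
pdf_is_standalone = true          # Set for a non-embedded build.
is_debug = true                   # Enable debugging features.
pdf_enable_xfa = true             # XFA support enabled.
pdf_enable_v8 = true              # Javascript support enabled.
pdf_use_skia = false              # Avoid skia backend experiment.
pdf_use_skia_paths = false        # Avoid other skia backend experiment.
is_component_build = false        # Disable component build (must be false)
clang_use_chrome_plugins = false  # Currently must be false.
EOF
}

# Usage, from $MYDEV/pdfium:
#   write_gn_args out/FirstBuild && gn gen out/FirstBuild
```

Keeping the arguments in a script like this also makes it easy to recreate a build directory from scratch later.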

$ cd out/FirstBuild

I use `nice ninja` here since otherwise the parallel compiles launched by ninja cause my laptop to crawl (and anyway who wants to play with a nasty ninja?). So here we go, using the canonical args.gn

$ nice ninja pdfium
[2162/2162] AR obj/libpdfium.a
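
If `nice` alone doesn’t keep the machine usable, ninja’s `-j` option caps the number of parallel jobs. A sketch that uses half the online cores; `build_jobs` is a hypothetical helper and "half" is an arbitrary choice, not something the PDFium docs recommend:

```shell
# Print a job count for ninja: half the online CPU cores, but at least 1.
build_jobs() {
    cores=$(getconf _NPROCESSORS_ONLN 2>/dev/null)
    cores=${cores:-1}              # fall back to 1 if getconf gave nothing
    jobs=$(( cores / 2 ))
    [ "$jobs" -ge 1 ] || jobs=1
    echo "$jobs"
}

# Usage, from the build directory:
#   nice ninja -j "$(build_jobs)" pdfium
```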

Yay! It built! Hmmm… a static library…? FFI needs a shared library.  We can adapt the build instructions for PdfiumViewer.

$ cd $MYDEV/pdfium

$ vi BUILD.gn

  • Change static_library("pdfium") to shared_library("pdfium")
  • In section config("pdfium_common_config") add this to the defines list:
    • "FPDFSDK_EXPORTS"

$ gn args out/shared

pdf_is_standalone = true                     # Set for a non-embedded build.
is_component_build = false                # Disable component build (must be false)
is_debug = false                                        # Disable debugging features.

pdf_enable_xfa = false                           # XFA support disabled.
pdf_enable_v8 = false                            # Javascript support disabled.
pdf_use_skia = false                               # Avoid skia backend experiment.
pdf_use_skia_paths = false                 # Avoid other skia backend experiment.

Generating files…
ERROR at //.gn:9:28: Build argument has no effect.
  v8_extra_library_files = []
                           ^
The variable "v8_extra_library_files" was set as a build argument but never appeared in a declare_args() block in any buildfile.

Darn jigitty!!  Already an error just creating the ninja build files. Everything I’ve learnt over the past 20 years tells me this means “BROKEN. WON’T BUILD. WON’T RUN”.  So ingrained is this convention, and being only days old using ninja and gn, I believed it.  But half a day scouring the web for how to fix this found an issue saying… “This is non-fatal by design. Ideally the messaging would be better and say ‘WARNING’. But this is the only nonfatal warning in the entire program so there isn’t code to vary this string.  Low priority. Depending on code complexity may not be worth fixing.”

Ha! Haaaaaarrrrrrrgggghghhhhhh!!!

Well! I could say more, but let’s call it a blessing that it seems okay to proceed.
$ cd out/shared

$ nice ninja pdfium
[753/753] SOLINK ./libpdfium.so
Nice! That’s what we’re looking for. So let’s try making use of it…

$ mkdir -p $MYDEV/AppTesting/First  &&  cd $MYDEV/AppTesting/First

$ vi first.c

#include <stdio.h>
#include <fpdfview.h>
int main() {
        FPDF_InitLibrary();
        FPDF_DestroyLibrary();
        printf("worked okay\n");
}

$ vi Makefile

PDFIUM_REPO= ../../pdfium
INC_DIR= -I ${PDFIUM_REPO}/public
LIB_DIR= -L ${PDFIUM_REPO}/out/shared
PDF_LIBS= -lpdfium
STD_LIBS= -lpthread -lm -lc -lstdc++
default:
	rm -f first
	gcc -o first first.c ${INC_DIR} ${LIB_DIR} ${PDF_LIBS} ${STD_LIBS}
	chmod +x first
	./first

$ make
first.c:(.text+0xa): undefined reference to `FPDF_InitLibrary'
first.c:(.text+0x14): undefined reference to `FPDF_DestroyLibrary'

Hmmm… Well it found the libpdfium.so library because it didn’t complain about that.  
So here’s a quick summary of a few days hunting this down (oh Smalltalk, shall I count the ways I love thee…).  First let’s examine the library…

$ cd $MYDEV/pdfium/out/shared

$ nm libpdfium.so | grep InitLibrary
Hmmm… nothing

$ ls -lh libpdfium.so
-rwxrwxr-x 1 ben ben 41K Nov  7 23:10 libpdfium.so

That does seem rather small. Is our symbol anywhere?…

$ find . -name "*.o" -exec nm -A {} \; | grep InitLibrary
./obj/pdfium/fpdfview.o:0000000000000000 T FPDF_InitLibrary
./obj/pdfium/fpdfview.o:0000000000000000 T FPDF_InitLibraryWithConfig

At least it shows in the object file.  Now from what I read here, the capital “T” indicates these are global symbols in the object file. Let’s manually build it into a shared library…

$  gcc -fPIC -shared -o testlib.so obj/pdfium/fpdfview.o

$ nm testlib.so | grep InitLib
testlib.so:00000000000036e0 t FPDF_InitLibrary
testlib.so:0000000000003720 t FPDF_InitLibraryWithConfig

The lower case “t” indicates the symbol changed to an internal/hidden symbol. But why the change?  Perhaps I’m not using the tools right? I found this bewildering – until I discovered `readelf`.

$ readelf -a obj/pdfium/fpdfview.o | grep InitLibrary
133: 000000000000000   56  FUNC  GLOBAL HIDDEN  31 FPDF_InitLibrary
134: 000000000000000 105  FUNC  GLOBAL HIDDEN  33 FPDF_InitLibraryWithConfi

Ahhh… this additional information helps. As `nm` showed earlier the symbol is global, but it’s tagged as hidden. After learning more about controlling exported symbols (thanks Nicolas Cellier),  visibility, and why visibility is good I try…

$ grep -R visibility=hidden *

which in build/config/gcc/BUILD.gn finds…

# This config causes functions not to be automatically exported from shared
# libraries. By default, all symbols are exported but this means there are
# lots of exports that slow everything down. In general we explicitly mark
# which functions we want to export from components.
#
# Some third_party code assumes all functions are exported so this is separated
# into its own config so such libraries can remove this config to make symbols
# public again.
#
# See http://gcc.gnu.org/wiki/Visibility
config("symbol_visibility_hidden") {
  cflags = [ "-fvisibility=hidden" ]

which apparently can be disabled with…

if (!is_win) {
configs -= [ "//build/config/gcc:symbol_visibility_hidden" ]
}

But rather than experiment like that with an unfamiliar build system, further hunting found the following in “public/fpdfview.h”

#if defined(_WIN32) && defined(FPDFSDK_EXPORTS)
// On Windows system, functions are exported in a DLL
#define FPDF_EXPORT __declspec(dllexport)
#define FPDF_CALLCONV __stdcall
#else
#define FPDF_EXPORT
#define FPDF_CALLCONV
#endif

which had something familiar about it.  Hmm…..  The PDFiumViewer build instructions had us define “FPDFSDK_EXPORTS”.  But here we see that this only works with Win32 (PDFiumViewer’s target platform).   Let’s rearrange this a little…

#if defined(FPDFSDK_EXPORTS)
#if defined(_WIN32)
#define FPDF_EXPORT __declspec(dllexport)
#define FPDF_CALLCONV __stdcall
#else
#define FPDF_EXPORT __attribute__((visibility("default")))
#define FPDF_CALLCONV
#endif //_WIN32

#else
#define FPDF_EXPORT
#define FPDF_CALLCONV
#endif //FPDFSDK_EXPORTS
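
The effect of such a macro can be reproduced in miniature, away from the PDFium tree. The sketch below assumes Linux with gcc and GNU nm on the PATH; `demo_visibility`, `DEMO_EXPORT` and the two C functions are made-up names for illustration only. It builds a tiny shared library with -fvisibility=hidden and lists which of the two functions land in the dynamic symbol table:

```shell
# Build a throwaway shared library with hidden-by-default visibility and
# print the demo_* symbols that are still exported from it.
demo_visibility() {
    command -v gcc >/dev/null 2>&1 && command -v nm >/dev/null 2>&1 || {
        echo "gcc and nm are required for this demo" >&2
        return 2
    }
    tmp=$(mktemp -d) || return 1
    cat > "$tmp/demo.c" <<'EOF'
#define DEMO_EXPORT __attribute__((visibility("default")))
DEMO_EXPORT int demo_exported(void) { return 1; }  /* explicitly marked */
int demo_hidden(void) { return 2; }                /* left to the default */
EOF
    gcc -fPIC -shared -fvisibility=hidden -o "$tmp/libdemo.so" "$tmp/demo.c" || {
        rm -rf "$tmp"
        return 1
    }
    # nm -D lists the dynamic symbol table, which is what a linker or FFI sees.
    nm -D --defined-only "$tmp/libdemo.so" | awk '{ print $NF }' | grep '^demo_'
    rm -rf "$tmp"
}

# Usage:
#   demo_visibility
```

Only the function carrying the attribute should be listed, which is the same behaviour the rearranged header is aiming for with FPDF_EXPORT.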

$ nice ninja pdfium
[753/753] SOLINK ./libpdfium.so
FAILED: libpdfium.so libpdfium.so.TOC
and a bunch of undefined references

Further hunting finds gradescope’s suggestion of “disabling building with clang to avoid dependency hell.” So…

$ gn args .

pdf_is_standalone = true                     # Set for a non-embedded build.
is_component_build = false                # Disable component build (must be false)
is_debug = false                                        # Disable debugging features.

pdf_enable_xfa = false                           # XFA support disabled.
pdf_enable_v8 = false                            # Javascript support disabled.
pdf_use_skia = false                               # Avoid skia backend experiment.
pdf_use_skia_paths = false                 # Avoid other skia backend experiment.
is_clang=false                                           # Avoid dependency hell.

$ nice ninja pdfium
[703/703] SOLINK ./libpdfium.so

$ readelf -a libpdfium.so | grep InitLibrary
355: 0000000000053ee0     7 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibrary
436: 0000000000053e60   121 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibraryWithConfi
9096: 0000000000053e60   121 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibraryWithConfi
9097: 0000000000053ee0     7 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibrary

$ nm libpdfium.so | grep InitLibrary
0000000000053ee0 T FPDF_InitLibrary
0000000000053e60 T FPDF_InitLibraryWithConfig
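
Reading the readelf columns by eye gets tedious, so here is a small filter that does it. It is only a sketch: it assumes the usual `readelf -s` column order (Num, Value, Size, Type, Bind, Vis, Ndx, Name), ignores linker version scripts, and `symbol_export_status` is a hypothetical name. A symbol is reported as exportable when its binding is GLOBAL or WEAK, its visibility is DEFAULT or PROTECTED, and it is defined:

```shell
# Read `readelf -s` output on stdin. For each symbol whose name matches the
# pattern in $1, print: name, binding, visibility, verdict.
symbol_export_status() {
    awk -v pat="$1" '
        $1 ~ /^[0-9]+:$/ && NF >= 8 && $8 ~ pat {
            bind = $5; vis = $6; ndx = $7
            ok = (bind == "GLOBAL" || bind == "WEAK") &&
                 (vis == "DEFAULT" || vis == "PROTECTED") &&
                 ndx != "UND"
            printf "%s %s %s %s\n", $8, bind, vis,
                   (ok ? "exportable" : "not-exportable")
        }'
}

# Usage:
#   readelf -s obj/pdfium/fpdfview.o | symbol_export_status InitLibrary
#   readelf -s libpdfium.so          | symbol_export_status InitLibrary
```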

Now that looks promising! Let’s try it out.

$ cd $MYDEV/AppTesting/First

$ LD_LIBRARY_PATH=$MYDEV/pdfium/out/shared   make
rm -f first
gcc -o first first.c -I ../../pdfium/public -L ../../pdfium/out/shared -lpdfium -lpthread -lm -lc -lstdc++
chmod +x first
./first
worked okay
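
The LD_LIBRARY_PATH prefix is needed because libpdfium.so is not in one of the standard library directories, so the dynamic loader has to be told where to find it when ./first starts. A small wrapper saves retyping it; `run_with_lib_dir` is a hypothetical helper, sketched here on the assumption that prepending to any existing LD_LIBRARY_PATH is what you want:

```shell
# Run a command with an extra directory prepended to LD_LIBRARY_PATH,
# without changing the variable for the rest of the shell session.
run_with_lib_dir() {
    lib_dir=$1
    shift
    LD_LIBRARY_PATH="$lib_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" "$@"
}

# Usage:
#   run_with_lib_dir "$MYDEV/pdfium/out/shared" make
#   run_with_lib_dir "$MYDEV/pdfium/out/shared" ./first
```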

Yay! So now we are ready to try the library from Pharo!  Stay tuned for Part 2.

cheers -ben


An evening with Pharo and the ESP32 microcontroller

Two popular choices for controlling maker projects are the Arduino and Raspberry Pi.
The Pi is a micro-“computer” that runs Linux to operate as a low powered desktop computer.  The Arduino is a much lower powered micro-“controller” with neither display nor wireless interfaces, but it comes with analog IO the Pi lacks. But now we’ve a new cool-kid on the block – the ESP32 in the form of the Sparkfun ESP32 Thing and the WeMos LOLIN32.

Fitting squarely between the Pi and Arduino, the ESP32 is a micro-controller like the Arduino nearing the speed of the Pi ZeroW.  It’s got even more analog IO where the Pi has none, and built-in WiFi and Bluetooth interfaces the Arduino lacks.  This makes the ESP32 a great candidate platform for many applications including machine control and equipment condition monitoring.  A built-in battery charger is a nice bonus.
I’ve tabled a spec comparison…


Pharo Libclang FFI, part 5, client data and recursive visitor/callbacks

Now we make use of the client data to track the indent level.  The recursive call to clang_visitChildren() seems a bit of an anti-pattern to use with a visitor – presumably a new visitor is created each call.   However that’s how it was done in a few tutorials I found and it does provide local storage for each nextLevel variable for the purpose of this demonstration.


Pharo Libclang FFI, part 4, AST walking with visitors & callbacks

Okay, so we’ve got most of the parts ready. In the last part we managed to load the AST. Now let’s do something useful with it. Traversing the tree is done using a visitor pattern that supplies a callback function with cursors that define locations in the tree.  To the original C code from part 3 we add: the callback function, which I’ve called acceptCursorCallback(); and the callout function clang_visitChildren(), which traverses the tree and invokes the callback function for each node it visits.


Pharo Libclang FFI, part 3, loading an AST

In the last part we learnt how to get the version string of the library.  That was good to prove it basically works, and also to develop our first C type “CXString”. Now we want Pharo to process some C code.  Baby steps with `libclang`: Walking an abstract syntax tree provided a good introductory tutorial to using libclang but was a bit C++ oriented, which is not so suitable for Pharo’s FFI.  A pure C interface is easier, so I adapted that tutorial with help from sabottenda’s libclang-sample ASTVisitor.


Pharo Libclang FFI, part 2, simple callout string return

This is my first exposure to using Pharo’s FFI, so before diving in to process some AST, let’s try something simpler to gain familiarity with the library.  Something real simple… no parameters and just returning a string. The function clang_getClangVersion() seems to fit the bill.  First let’s see how it works in pure-C.


Pharo Libclang FFI, part 1, preamble

Background

I wanted to better understand the opensmalltalk-vm that Pharo runs on.  I started to manually chart and compare the C code between platforms, which was insightful but tedious and error prone.  What I needed was to automatically process these files.  Clang is a C language front-end for the LLVM compiler, designed to be integrated into external projects.  Libclang provides an interface suitable for the Pharo FFI, but I’d never used FFI before.  From a distance FFI had seemed somewhat daunting and complex, but it turns out to be reasonably straightforward.  I’m documenting my experience in the form of this tutorial that I can refer back to, and that perhaps shines a newbie light on things that may encourage other FFI neophytes to give it a go.


Contributing to Pharo By Example

When I was learning Pharo Smalltalk, I found the Pharo By Example book a great help.  It was well written and available at a good price ;). However as Pharo advances at great speed and the original authors are busy documenting and implementing new and advanced features – this learning resource has become out of date with some of its examples.  So I thought I’d contribute back to help update it to the latest release.  Here is how…



Windows 7 Pharo DBXTalk – “my hack”

Having just got ConfigurationOfODBC working from Pharo Smalltalk, I had some trouble determining exactly how to get at the individual data items.  So I thought I’d check out DBXTalk for comparison.  DBXTalk is a much more comprehensive solution leveraging OpenDBX which includes its own ODBC interface along with several other backends.  However all the ODBC connection examples I saw were for database servers with connection strings that were not of the “DSN” form that I think is required for Microsoft Access – so I ended up returning to ConfigurationOfODBC and resolving the issue above.

Yet I was most of the way through getting DBXTalk working, so I record my experience here for posterity.  It is the “hack” version since to resolve library dependencies I simply copied everything next to the virtual machine executable.  I’ll look into resolving these more correctly later.  So I…


Pharo 1.3 ODBC working on Windows 7

Wow. Eight months since my last post.  It is now apparent the impact over that time of my 60-70 hour work week onsite at a mine expansion.

This post summarises the result of discussion on the pharo-project mailing list where I sought assistance getting ODBC working on Pharo 1.3.  Credit goes to Mariano Peck and Eliot Miranda for assistance troubleshooting, providing a slightly older configuration that worked, and then the latest VM build.  I am happy to report that ODBC appears to be working with the Pharo 1.3 image on MS Windows 7 using CogVM version 2522.

The purpose of this is that I have a UML design that an application has stored in a Microsoft Access database.  I want to use Pharo to implement that design directly into Smalltalk classes since I could not find a common export/import format.  ConfigurationOfODBC looked promising so I followed instructions at http://www.pharocasts.com/2010/12/access-database-through-odbc.html, except that rather than using SQLite I started with a blank Microsoft Access database.  This worked well on Pharo 1.2.1 but not out of the box on Pharo 1.3.

Here is the method to get ODBC working and tested with Pharo 1.3.
