Pharo PDF Rendering, part 1, building PDFium

Background

For a while now I’ve been wanting to render PDFs inside Pharo. A few external libraries existed, but none had suitable licenses. Recently I bumped into PDFium, the Foxit renderer open-sourced by Google out of Chrome for use by Chromium. With its BSD license this seemed a good candidate: it is derived from a successful commercial product and is part of a significant Google-backed project, so it leverages a lot of funded engineering and expectations of quality are high. It’s written in C++ but has a public C interface.

So here I am recording my exploration of building PDFium from source. Later in Part 2 I’ll interface to it using Pharo’s UFFI to render PDF pages to bitmaps displayed within Pharo.

Building PDFium

So we start by following the “Get the code” section in the canonical build instructions.

$ sudo apt install git

$ mkdir -p PDFium && cd PDFium

$ export MYDEV=`pwd` # just for the sake of being explicit in this post

$ git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

$ export PATH=$PATH:$MYDEV/depot_tools
Don’t forget to add this to each new terminal you open. You may want to add it to .bashrc.

$ gclient config --unmanaged https://pdfium.googlesource.com/pdfium.git

$ gclient sync --verbose
Wait…..a…..long…..time….. (go get a coffee)

$ cd $MYDEV/pdfium

$ ./build/install-build-deps.sh

Phew! Is that everything? Now actually, we want to work from a stable base, which https://omahaproxy.appspot.com/ shows is Linux-stable-62.0.3202.89

Okay, so now we are ready to try our first build. The build system uses gn to generate ninja build files. I’ve not used this build system before, so I’m a bit the blind leading the blind, but the way it works is… you pass the build directory to `gn args`, which drops you into an editor to specify the args.gn parameters file, from which (combined with BUILD.gn) the ninja configuration files are produced. Then from the build directory you run `ninja pdfium` to perform the build. To change build parameters later you can run `gn args .` from the build directory.

$ gn args out/FirstBuild

pdf_is_standalone = true          # Set for a non-embedded build.
is_debug = true                   # Enable debugging features.

pdf_enable_xfa = true             # XFA support enabled.
pdf_enable_v8 = true              # Javascript support enabled.
pdf_use_skia = false              # Avoid skia backend experiment.
pdf_use_skia_paths = false        # Avoid other skia backend experiment.
is_component_build = false        # Disable component build (must be false)
clang_use_chrome_plugins = false  # Currently must be false.

$ cd out/FirstBuild

I use `nice ninja` here since otherwise the parallel compiles launched by ninja cause my laptop to crawl (and anyway who wants to play with a nasty ninja?). So here we go, using the canonical args.gn

$ nice ninja pdfium
[2162/2162] AR obj/libpdfium.a

Yay! It built! Hmmm… a static library…? FFI needs a shared library.  We can adapt the build instructions for PdfiumViewer.

$ cd $MYDEV/pdfium

$ vi BUILD.gn

  • Change static_library("pdfium") to shared_library("pdfium")
  • In section config("pdfium_common_config") add this to the defines list:
    • "FPDFSDK_EXPORTS"

$ gn args out/shared

pdf_is_standalone = true          # Set for a non-embedded build.
is_component_build = false        # Disable component build (must be false)
is_debug = false                  # Disable debugging features.

pdf_enable_xfa = false            # XFA support disabled.
pdf_enable_v8 = false             # Javascript support disabled.
pdf_use_skia = false              # Avoid skia backend experiment.
pdf_use_skia_paths = false        # Avoid other skia backend experiment.

Generating files…
ERROR at //.gn:9:28: Build argument has no effect.
  v8_extra_library_files = []
                           ^
The variable "v8_extra_library_files" was set as a build argument but never appeared in a declare_args() block in any buildfile.

Darn jigitty!! Already an error just creating the ninja build files. Everything I’ve learned in the past 20 years tells me this means “BROKEN. WON’T BUILD. WON’T RUN”. So ingrained is this convention that, being only days old with ninja and gn, I believed it. But half a day scouring the web for a fix turned up an issue saying… “This is non-fatal by design. Ideally the messaging would be better and say ‘WARNING’. But this is the only nonfatal warning in the entire program so there isn’t code to vary this string.  Low priority. Depending on code complexity may not be worth fixing.”

Ha! Haaaaaarrrrrrrgggghghhhhhh!!!

Well! I could say more, but let’s call it a blessing that it seems okay to proceed.
$ cd out/shared

$ nice ninja pdfium
[753/753] SOLINK ./libpdfium.so
Nice! That’s what we’re looking for. So let’s try making use of it…

$ mkdir -p $MYDEV/AppTesting/First  &&  cd $MYDEV/AppTesting/First

$ vi first.c

#include <stdio.h>
#include <fpdfview.h>
int main() {
        FPDF_InitLibrary();
        FPDF_DestroyLibrary();
        printf("worked okay\n");
}

$ vi Makefile

PDFIUM_REPO= ../../pdfium
INC_DIR= -I ${PDFIUM_REPO}/public
LIB_DIR= -L ${PDFIUM_REPO}/out/shared
PDF_LIBS= -lpdfium
STD_LIBS= -lpthread -lm -lc -lstdc++
default:
	rm -f first
	gcc -o first first.c ${INC_DIR} ${LIB_DIR} ${PDF_LIBS} ${STD_LIBS}
	chmod +x first
	./first

$ make
first.c:(.text+0xa): undefined reference to `FPDF_InitLibrary'
first.c:(.text+0x14): undefined reference to `FPDF_DestroyLibrary'

Hmmm… Well it found the libpdfium.so library because it didn’t complain about that.  
So here’s a quick summary of a few days hunting this down (oh Smalltalk, shall I count the ways I love thee…). First let’s examine the library…

$ cd $MYDEV/pdfium/out/shared

$ nm libpdfium.so | grep InitLibrary
Hmmm… nothing

$ ls -lh libpdfium.so
-rwxrwxr-x 1 ben ben 41K Nov  7 23:10 libpdfium.so

That does seem rather small. Is our symbol anywhere?…

$ find . -name "*.o" -exec nm -A {} \; | grep InitLibrary
./obj/pdfium/fpdfview.o:0000000000000000 T FPDF_InitLibrary
./obj/pdfium/fpdfview.o:0000000000000000 T FPDF_InitLibraryWithConfig

At least it shows in the object file. From what I’ve read, the capital “T” indicates these are global symbols in the object file’s text (code) section. Let’s manually build it into a shared library…

$  gcc -fPIC -shared -o testlib.so obj/pdfium/fpdfview.o

$ nm testlib.so | grep InitLib
testlib.so:00000000000036e0 t FPDF_InitLibrary
testlib.so:0000000000003720 t FPDF_InitLibraryWithConfig

The lower case “t” indicates the symbol changed to an internal/hidden symbol. But why the change?  Perhaps I’m not using the tools right? I found this bewildering – until I discovered `readelf`.

$ readelf -a obj/pdfium/fpdfview.o | grep InitLibrary
133: 000000000000000   56  FUNC  GLOBAL HIDDEN  31 FPDF_InitLibrary
134: 000000000000000 105  FUNC  GLOBAL HIDDEN  33 FPDF_InitLibraryWithConfi

Ahhh… this additional information helps. As `nm` showed earlier, the symbol is global, but it’s tagged as hidden. After learning more about controlling exported symbols (thanks Nicolas Cellier), visibility, and why visibility is good, I try…

$ grep -R visibility=hidden *

which in build/config/gcc/BUILD.gn finds…

# This config causes functions not to be automatically exported from shared
# libraries. By default, all symbols are exported but this means there are
# lots of exports that slow everything down. In general we explicitly mark
# which functions we want to export from components.
#
# Some third_party code assumes all functions are exported so this is separated
# into its own config so such libraries can remove this config to make symbols
# public again.
#
# See http://gcc.gnu.org/wiki/Visibility
config("symbol_visibility_hidden") {
  cflags = [ "-fvisibility=hidden" ]

which apparently can be disabled with…

if (!is_win) {
configs -= [ "//build/config/gcc:symbol_visibility_hidden" ]
}

But rather than experiment like that with an unfamiliar build system, further hunting found the following in “public/fpdfview.h”…

#if defined(_WIN32) && defined(FPDFSDK_EXPORTS)
// On Windows system, functions are exported in a DLL
#define FPDF_EXPORT __declspec(dllexport)
#define FPDF_CALLCONV __stdcall
#else
#define FPDF_EXPORT
#define FPDF_CALLCONV
#endif

which had something familiar about it. Hmm….. The PDFiumViewer build instructions had us define “FPDFSDK_EXPORTS”. But here we see that this only works on Win32 (PDFiumViewer’s target platform); everywhere else FPDF_EXPORT expands to nothing, so the symbols stay hidden. Let’s rearrange this a little…

#if defined(FPDFSDK_EXPORTS)
#if defined(_WIN32)
#define FPDF_EXPORT __declspec(dllexport)
#define FPDF_CALLCONV __stdcall
#else
#define FPDF_EXPORT __attribute__((visibility("default")))
#define FPDF_CALLCONV
#endif //_WIN32

#else
#define FPDF_EXPORT
#define FPDF_CALLCONV
#endif //FPDFSDK_EXPORTS

$ nice ninja pdfium
[753/753] SOLINK ./libpdfium.so
FAILED: libpdfium.so libpdfium.so.TOC
and a bunch of undefined references

Further hunting finds gradescope’s suggestion of “disabling building with clang to avoid dependency hell.” So…

$ gn args .

pdf_is_standalone = true          # Set for a non-embedded build.
is_component_build = false        # Disable component build (must be false)
is_debug = false                  # Disable debugging features.

pdf_enable_xfa = false            # XFA support disabled.
pdf_enable_v8 = false             # Javascript support disabled.
pdf_use_skia = false              # Avoid skia backend experiment.
pdf_use_skia_paths = false        # Avoid other skia backend experiment.
is_clang = false                  # Avoid dependency hell.

$ nice ninja pdfium
[703/703] SOLINK ./libpdfium.so

$ readelf -a libpdfium.so | grep InitLibrary
355: 0000000000053ee0     7 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibrary
436: 0000000000053e60   121 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibraryWithConfi
9096: 0000000000053e60   121 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibraryWithConfi
9097: 0000000000053ee0     7 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibrary

$ nm libpdfium.so | grep InitLibrary
0000000000053ee0 T FPDF_InitLibrary
0000000000053e60 T FPDF_InitLibraryWithConfig

Now that looks promising! Let’s try it out.

$ cd $MYDEV/AppTesting/First

$ LD_LIBRARY_PATH=$MYDEV/pdfium/out/shared   make
rm -f first
gcc -o first first.c -I ../../pdfium/public -L ../../pdfium/out/shared -lpdfium -lpthread -lm -lc -lstdc++
chmod +x first
./first
worked okay

Yay! So now we are ready to try the library from Pharo!  Stay tuned for Part 2.

cheers -ben
