Analysing SignalR with Wireshark & Pharo

I’m trying to hook into the real-time order book feed from the Bittrex cryptocurrency exchange, which comes via the signalr protocol on top of a websocket connection.

SignalR is a Microsoft ASP.NET library designed to establish “persistent connections”. SignalR differentiates between three types of connections: SignalR connections, transport connections and physical connections. When a lower connection level drops the connection, SignalR can continue, since it can re-establish interrupted connections. From the framework user’s point of view, connections are persistent in that the ConnectionId won’t change if the connection needs to be re-established. So it feels like you have a persistent connection.

Using SignalR outside the Microsoft garden requires some protocol hacking, gleaned from watching other implementations in action. Pawel Kadluczka’s Informal Description of the SignalR Protocol was also a great help.

Note: As a general philosophy, I’m ignoring Signalr’s fallback to non-websocket protocols.

Connection Negotiate

As the first step in starting a signalr connection the client sends the server a negotiate request. Following on from my Pharo v. Cloudflare post I’ll presume you’ve done steps 1 to 6 from there. So start a new Wireshark capture, and once again do…

$ python order_book.py

[Screenshot: Wireshark capture of the SignalR handshake]

Ignoring the first few packets that CloudflareUn will handle, the signalr negotiate request is seen at packet #26 with a response at packet #29. This part can be replicated in Pharo as… (btw, anyone unfamiliar with Pharo syntax should read the first three pages here)


cloudflareun := CloudflareUn knockUrl: 'http://bittrex.com'.
client := cloudflareun client.
client url: 'http://bittrex.com/signalr/negotiate'.
client queryAt: 'connectionData' put: '[{"name": "coreHub"}]'.
client queryAt: 'clientProtocol' put: '1.5'.
(response := client get) inspect. 

==>
{   "Url":"/signalr",
    "ConnectionToken":"MT55CnH4LlWNxjX0Q5w<...snip...>",
    "ConnectionId":"fa8d0fc5-b8d0-4925-bc63-7aa8984b1f4d",
    "KeepAliveTimeout":20.0,
    "DisconnectTimeout":30.0,
    "ConnectionTimeout":110.0,
    "TryWebSockets":true,
    "ProtocolVersion":"1.5",
    "TransportConnectTimeout":5.0,
    "LongPollDelay":0.0
}

The meaning of which is:

  • ConnectionToken – For each request, the client and server pass a connection token which contains the connection id and username for authenticated users.
  • ConnectionId – Each client has its own unique id that is randomly generated when the client connects to the server hub. It persists for the duration of the signalr session and can be used to reacquire broken connections.
  • ConnectionTimeout – Represents the amount of time to leave a connection open before timing out. Default is 110 seconds. This setting applies only when keepalive functionality is disabled, which normally applies only to the long polling transport, and so is not relevant to the current use case.
  • DisconnectTimeout – Represents the amount of time to wait after a connection goes away before raising the disconnect event. Default is 30 seconds.
  • KeepAliveTimeout – the amount of time to wait without receiving a keep alive packet before considering the connection dead. The related server-side KeepAlive setting (the interval at which keep alive packets are sent; set to null to disable) defaults to one third of DisconnectTimeout, i.e. 10 seconds. When keep alive is on, ConnectionTimeout has no effect.
  • TransportConnectTimeout – amount of time a client should allow to connect before falling back to another transport or failing. This is not relevant since we’ll not be falling back to non-websocket protocols.
  • LongPollDelay – is not relevant to websockets

Those timeouts are described more thoroughly in Understanding and Handling Connection Lifetime Events in SignalR. Interestingly, it says “KeepAlive must not be more than 1/3 of the DisconnectTimeout value”, but we see that assertion does not hold for the Bittrex configuration (KeepAliveTimeout is 20, while 1/3 of DisconnectTimeout is 10). In any case, perhaps client–>server keep-alives should initially be sent at 5 second intervals [opinions anyone?].
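For readers following along in another language, the negotiate step is just an HTTP GET with two query parameters. Here is a stdlib-only Python sketch of building and issuing it (the Cloudflare cookies and User-Agent from the companion post would still need to be attached for Bittrex to actually answer):

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def negotiate_url(base_url):
    """Build the SignalR negotiate URL with the two required query parameters."""
    query = urlencode({
        'connectionData': json.dumps([{'name': 'coreHub'}]),
        'clientProtocol': '1.5',
    })
    return base_url + '/signalr/negotiate?' + query

def negotiate(base_url):
    """Perform the negotiate step and return the parsed JSON response.
    (Cloudflare cookies and User-Agent must be added to the request
    for this to succeed against Bittrex.)"""
    with urlopen(Request(negotiate_url(base_url))) as resp:
        return json.loads(resp.read().decode('utf-8'))

# Example: negotiate('http://bittrex.com')['ConnectionToken']
```

Note how urlencode reproduces the exact percent-encoding seen in the Wireshark trace.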

Connection Connect

The next step in the SignalR protocol is seen in packet #36, where the client sends a connect request carrying forward the parameters connectionData & clientProtocol and adding the new parameters connectionToken & transport. As well, it attempts to upgrade the transport from HTTP to WebSocket. The response in packet #38 shows the successful upgrade to the websocket protocol…

HTTP REQUEST (#36)
GET /signalr/connect?connectionToken=ulMFV4z5JG%2BxfSpMX3A4%2BS%2FFay55rJr1Y[...truncated...]
    &connectionData=%5B%7B%22name%22%3A+%22coreHub%22%7D%5D
    &transport=webSockets
    &clientProtocol=1.5
Cookie: __cfduid=df17ba99a887664411404c9f88347504f1517897211;
  cf_clearance=097eaac668c0eb0db0a3cceb265e6ef7ea0a384c-1517897216-10800
Connection: Upgrade
Upgrade: websocket
Sec-WebSocket-Key: YymChuidZ4RPOna5ZwD1EQ==
Sec-WebSocket-Version: 13

HTTP RESPONSE (#38)
HTTP/1.1 101 Switching Protocols

WEBSOCKET S-->C (#40)
{"C":"d-4F5DA126-B,0|F8TuM,0|F8TuN,1","S":1,"M":[]}
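For reference outside Pharo, the upgrade request in packet #36 is plain HTTP and can be assembled by hand. A stdlib-only Python sketch of the request line and headers (cookies and the actual socket handling are left out for brevity):

```python
import base64
import json
import os
from urllib.parse import urlencode

def connect_request(host, connection_token):
    """Build the raw HTTP upgrade request for the SignalR connect step,
    mirroring packet #36 (Cookie header omitted for brevity)."""
    query = urlencode({
        'connectionToken': connection_token,
        'connectionData': json.dumps([{'name': 'coreHub'}]),
        'transport': 'webSockets',
        'clientProtocol': '1.5',
    })
    # Sec-WebSocket-Key is 16 random bytes, base64-encoded (RFC 6455).
    key = base64.b64encode(os.urandom(16)).decode('ascii')
    return ('GET /signalr/connect?%s HTTP/1.1\r\n'
            'Host: %s\r\n'
            'Connection: Upgrade\r\n'
            'Upgrade: websocket\r\n'
            'Sec-WebSocket-Version: 13\r\n'
            'Sec-WebSocket-Key: %s\r\n'
            '\r\n' % (query, host, key))

# print(connect_request('socket.bittrex.com', '<ConnectionToken from negotiate>'))
```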

We’ll use NeoJSON to parse the ConnectionToken from the previous response…


Gofer it
   smalltalkhubUser: 'SvenVanCaekenberghe' project: 'Neo';
   configurationOf: 'NeoJSON';
   loadStable.

(params := NeoJSONReader fromString: response) inspect.
connectionToken := params at: 'ConnectionToken'.
==>MT55CnH4LlWNxjX0Q5w<...snip...>

Looking under the hood of Pharo’s websocket implementation we see it adds the required headers for the protocol upgrade…


(ZnWebSocket class >> #webSocketClientTo:) inspect

[Source] tab ==>
webSocketClientTo: url
	"Create and return a new ZnClient instance ready for the initial client side WebSocket setup request"
	| client |
	self assert: (#(ws wss) includes: url scheme).
	(client := ZnClient new)
		url: url;
		method: #GET;
		headerAt: 'Upgrade' put: 'websocket';
		headerAt: 'Connection' put: 'Upgrade';
		headerAt: 'Sec-WebSocket-Version' put: '13';
		headerAt: 'Sec-WebSocket-Key' put: ZnWebSocketUtils newClientKey.
	^ client

but this is not quite suitable for us since it creates a fresh ZnClient, while we need to use our cloudflareun preconfigured ZnClient. Since this is Pharo, we can easily tweak that #webSocketClientTo: into a new method…


ZnWebSocket class >> upgradeSocketClient: aZnClient
	"Make an existing ZnClient instance ready for the initial client side WebSocket setup request"
	aZnClient
		method: #GET;
		headerAt: 'Upgrade' put: 'websocket';
		headerAt: 'Connection' put: 'Upgrade';
		headerAt: 'Sec-WebSocket-Version' put: '13';
		headerAt: 'Sec-WebSocket-Key' put: ZnWebSocketUtils newClientKey.

and also tweak ZnWebSocket >> #to: urlObject into a new method that calls it…


ZnWebSocket class >> onHttpClient: client
	"Attempt to upgrade an existing http client client to a WebSocket.
	Do the initial upgrade handshake and return a functioning ZnWebSocket object.
	Signals a ZnWebSocketFailed error when unsuccessful."

	self upgradeSocketClient: client.
	client execute.
	(self isValidWebSocketResponse: client)
		ifTrue: [
			^ (self onStream: client connection)
				role: #client;
				yourself ]
		ifFalse: [
			client close.
			(ZnWebSocketFailed response: client response) signal ]

Now we are ready to build our connect request to match packet #36. Continuing in Playground…


client := cloudflareun client.
client url: 'http://bittrex.com/signalr/connect'.
client queryAt: 'connectionToken' put: connectionToken.
client queryAt: 'connectionData' put: '[{"name": "coreHub"}]'.
client queryAt: 'transport' put: 'webSockets'.
client queryAt: 'clientProtocol' put: '1.5'.
(websocket := ZnWebSocket onHttpClient: client) inspect.
[websocket runWith: [ :msg | Transcript crShow: msg printString ]] forkAt: 35.

Watching Wireshark, we see the request, and the response is the hoped-for “HTTP/1.1 101 Switching Protocols\r\n”, just like packet #38… And, just like packet #40, we received a WebSocket packet containing…

WEBSOCKET S-->C
    {"C":"d-3D5EB89-B,0|Ek,0|El,2","S":1,"M":[]} 

Deciphering this: a message id “C”; an “S” of 1 indicating the transport was initialized; and an “M” array empty of actual data.

Subsequently every 8 seconds or so we receive a new WebSocket packet with empty data…

WEBSOCKET S-->C
     {} 

until we close the connection in the Inspector that appeared…
[Screenshot: Inspector on the ZnWebSocket]
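Putting the frame types observed so far together, a receive-side classifier might look like the following Python sketch (the field meanings are my reading of the wire traffic, not an official spec):

```python
import json

def classify_frame(text):
    """Classify a SignalR websocket frame by the keys observed on the wire."""
    msg = json.loads(text)
    if not msg:
        return 'keepalive'          # the empty {} frames every ~8 seconds
    if msg.get('S') == 1:
        return 'init'               # transport successfully initialized
    if 'R' in msg:
        return 'result'             # return value for invocation msg['I']
    if msg.get('M'):
        return 'hub-callbacks'      # server->client hub method invocations
    return 'other'                  # e.g. frames carrying a group token "G"

assert classify_frame('{}') == 'keepalive'
assert classify_frame('{"C":"d-3D5EB89-B,0|Ek,0|El,2","S":1,"M":[]}') == 'init'
assert classify_frame('{"R":true,"I":"0"}') == 'result'
```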

Hub Start


To complete our Signalr handshake, over HTTP we need to send a start url similar to packet #42 and hope for a response like packet #48.

HTTP REQUEST (#42)
GET /signalr/start?connectionToken=ulMFV4z5JG%2BxfSpMX3A4%2BS%2FFay55rJr1Y[...truncated...]
    &connectionData=%5B%7B%22name%22%3A+%22coreHub%22%7D%5D
    &transport=webSockets
    &clientProtocol=1.5
Cookie: cf_clearance=097eaac668c0eb0db0a3cceb265e6ef7ea0a384c-1517897216-10800; __cfduid=df17ba99a887664411404c9f88347504f1517897211

HTTP RESPONSE (#48)
HTTP/1.1 200 OK   (application/json)
{ "Response": "started" }

Now I found I couldn’t reuse the previous client, because that interfered with the websocket operation: its underlying socket had been taken over by the websocket. But this is Pharo!! So it’s easy to modify the ZnClient system library! Adding the following method to it…


ZnClient >> adoptConnection
	|givingThisAway|
	givingThisAway := connection.
	connection := nil. "so #caseReuseConnection ==> false ==> new connection next request"
	^givingThisAway

then make use of it here (redefined from above)…


ZnWebSocket class >> onHttpClient: client
	"Attempt to upgrade an existing http client client to a WebSocket.
	Do the initial upgrade handshake and return a functioning ZnWebSocket object.
	Signals a ZnWebSocketFailed error when unsuccessful."

	self upgradeSocketClient: client.
	client execute.
	(self isValidWebSocketResponse: client)
		ifTrue: [
			^ (self onStream: client adoptConnection)
				role: #client;
				yourself ]
		ifFalse: [
			client close.
			(ZnWebSocketFailed response: client response) signal ]

Now we can reuse the http client…


client url: 'http://bittrex.com/signalr/start'.
client queryAt: 'connectionToken' put: connectionToken.
client queryAt: 'connectionData' put: '[{"name": "coreHub"}]'.
client queryAt: 'transport' put: 'webSockets'.
client queryAt: 'clientProtocol' put: '1.5'.
(response := client get) inspect.

==>{ "Response": "started" }

YAY! Lookin’ good.

Ongoing Signalr Hub Messaging

We now move on to examine the ongoing websocket communication between the Pharo client and the Bittrex server. Looking first at only the Client–>Server communication, using the Wireshark display filter “(http || websocket) && ip.src == 192.168.43.79”, the general philosophy seems to be to first subscribe to exchange deltas, so that no incremental updates are missed while querying the state of the exchange. The client calls the server coreHub >> SubscribeToExchangeDeltas method with “A” arguments. The client seems responsible for incrementing the invocation identifier “I” with each method call. This is the complete list of methods invoked on the Bittrex server.


WEBSOCKET C-->S (#52)
{"A": ["BTC-ETH"], "H": "coreHub", "M": "SubscribeToExchangeDeltas", "I": 0}

WEBSOCKET C-->S (#53)
{"A": ["BTC-NEO"], "H": "coreHub", "M": "SubscribeToExchangeDeltas", "I": 1}
{"A": ["BTC-ZEC"], "H": "coreHub", "M": "SubscribeToExchangeDeltas", "I": 2}
{"A": ["ETH-NEO"], "H": "coreHub", "M": "SubscribeToExchangeDeltas", "I": 3}
{"A": ["ETH-ZEC"], "H": "coreHub", "M": "SubscribeToExchangeDeltas", "I": 4}

WEBSOCKET C-->S (#124)
{"A": ["BTC-ETH"], "H": "coreHub", "M": "queryExchangeState", "I": 5}

WEBSOCKET C-->S (#227)
{"A": ["BTC-NEO"], "H": "coreHub", "M": "queryExchangeState", "I": 6}

WEBSOCKET C-->S (#305)
{"A": ["BTC-ZEC"], "H": "coreHub", "M": "queryExchangeState", "I": 7}

WEBSOCKET C-->S (#403)
{"A": ["ETH-NEO"], "H": "coreHub", "M": "queryExchangeState", "I": 8}

WEBSOCKET C-->S (#481)
{"A": ["ETH-ZEC"], "H": "coreHub", "M": "queryExchangeState", "I": 9}
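The framing above is easy to generate: hub “H”, method “M”, argument array “A”, and a client-maintained, monotonically increasing invocation id “I”. A minimal Python sketch of that framing (illustrative, not tied to any particular client library):

```python
import itertools
import json

class HubInvoker:
    """Build SignalR hub-invocation frames with an incrementing "I" id."""
    def __init__(self, hub):
        self.hub = hub
        self._ids = itertools.count()   # 0, 1, 2, ... as seen on the wire

    def frame(self, method, *args):
        return json.dumps({'H': self.hub, 'M': method, 'A': list(args),
                           'I': next(self._ids)})

core = HubInvoker('coreHub')
print(core.frame('SubscribeToExchangeDeltas', 'BTC-ETH'))
# → {"H": "coreHub", "M": "SubscribeToExchangeDeltas", "A": ["BTC-ETH"], "I": 0}
print(core.frame('SubscribeToExchangeDeltas', 'BTC-NEO'))   # ... "I": 1
```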

Now looking at just a few select Server–>Client packets, the “R”eturn values of the coreHub >> SubscribeToExchangeDeltas calls are minimal, indicating only that each callback was successfully set up. An important observation below is that the responses can arrive out of order compared to the invocation order above.

I’d generally presume that waiting for a return value would be synchronous, and we see all the invocations of SubscribeToExchangeDeltas were issued before any response arrived, so we can infer that the python client is issuing these from multiple threads. Thus we see a requirement for the asynchronous communication channel to be managed by a thread of its own, with a queue receiving synchronous invocations that wait to be signalled from the communication thread.


WEBSOCKET S-->C (#58)
{"R":true,"I":"0"}

WEBSOCKET S-->C (#68)
{"R":true,"I":"2"}

WEBSOCKET S-->C (#70)
{"R":true,"I":"1"}
{"R":true,"I":"3"}

WEBSOCKET S-->C (#72)
{"R":true,"I":"4"}
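The out-of-order replies above motivate the design suggested earlier: a dedicated reader thread owns the channel, while synchronous callers park on an event until their invocation id is answered. An illustrative Python sketch, with the websocket replaced by an in-memory queue:

```python
import json
import queue
import threading

class PendingInvocations:
    """Match asynchronous {"R":...,"I":...} replies to waiting callers."""
    def __init__(self):
        self._lock = threading.Lock()
        self._waiting = {}   # invocation id -> (Event, result slot)

    def register(self, inv_id):
        event, slot = threading.Event(), {}
        with self._lock:
            self._waiting[inv_id] = (event, slot)
        return event, slot

    def resolve(self, inv_id, result):
        with self._lock:
            event, slot = self._waiting.pop(inv_id)
        slot['R'] = result
        event.set()

pending = PendingInvocations()
incoming = queue.Queue()   # stands in for the websocket

def reader():
    """The communication thread: dispatch replies as they arrive, in any order."""
    for _ in range(2):
        msg = json.loads(incoming.get())
        pending.resolve(msg['I'], msg['R'])

threading.Thread(target=reader, daemon=True).start()
event0, slot0 = pending.register('0')
event1, slot1 = pending.register('1')
incoming.put('{"R":true,"I":"1"}')   # replies may arrive out of order...
incoming.put('{"R":true,"I":"0"}')
event0.wait(timeout=5)
event1.wait(timeout=5)
print(slot0['R'], slot1['R'])        # ...yet each caller still gets its own
```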

Below can be seen a few callbacks invoking the CoreHub >> updateExchangeState method on the client with arguments “A”. No data is provided to match the original invocation “I” from above, since the MarketName parameter fully specifies what client data to update. The callback specifies the hub, so implementing it in a single SignalrConnection shared by multiple hubs seems appropriate. Waiting synchronous invocations can register themselves as an asynchronous callback to signal the Semaphore they are waiting on.

(btw, I’d be interested if anyone can describe how the message-id “C” comes into play.)

The description here indicates that Type ’0′ is a simple add operation, Type ’1′ is a delete operation and Type ’2′ is a replace/update operation. The “Nounce” (sic) seems to be a monotonically increasing sync point, described here: “you must make sure that your order nounces match or are lower than the order book snapshot nounce for the sync to work.”


WEBSOCKET S-->C (#74)
{"C":"d-3908A267-B,0|yT,0|yU,6|BU,3265|BW,2A33|Bi,2ED8|yV,1|yW,0",
"M":[{"H":"CoreHub","M":"updateExchangeState",
    "A":[{"MarketName":"ETH-NEO","Nounce":9910,
      "Buys":[ {"Type":0,"Rate":0.13574727,"Quantity":24.13300000},
               {"Type":1,"Rate":0.13546489,"Quantity":0.0},
               {"Type":1,"Rate":0.13413976,"Quantity":0.0}],
       "Sells":[{"Type":1,"Rate":0.13797594,"Quantity":0.0}],
       "Fills":[]}  ]}  ]}

{"C":"d-3908A267-B,0|yT,0|yU,6|BU,3266|BW,2A33|Bi,2ED8|yV,1|yW,0",
"M":[{"H":"CoreHub","M":"updateExchangeState",
     "A":[{"MarketName":"BTC-ETH","Nounce":12943,
          "Buys":[{"Type":0,"Rate":0.09900001,"Quantity":8.78200000},
                  {"Type":1,"Rate":0.09896192,"Quantity":0.0},
                  {"Type":0,"Rate":0.09865459,"Quantity":5.49451704},
                  {"Type":0,"Rate":0.09863459,"Quantity":49.24190000},
                  {"Type":2,"Rate":0.09818553,"Quantity":200.52821030},
                  {"Type":1,"Rate":0.09270006,"Quantity":0.0},
                  {"Type":1,"Rate":0.09270000,"Quantity":0.0}],
          "Sells":[{"Type":2,"Rate":0.09990997,"Quantity":30.41524754},
                  {"Type":0,"Rate":0.09990998,"Quantity":1.19474349},
                  {"Type":1,"Rate":0.10033083,"Quantity":0.0},
                  {"Type":0,"Rate":0.10036968,"Quantity":9.99786740},
                  {"Type":1,"Rate":0.10347521,"Quantity":0.0},
                  {"Type":1,"Rate":0.10347608,"Quantity":0.0}],
          "Fills":[]}  ]} ]}
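Applying those updateExchangeState deltas to a local book follows directly from the Type semantics described above. A Python sketch, using the rate as the key (which matches how the deltas address individual orders):

```python
def apply_deltas(side, deltas):
    """Apply Bittrex-style order book deltas to one side of a local book.
    `side` maps rate -> quantity; Type 0 adds, Type 1 deletes, Type 2 updates."""
    for d in deltas:
        if d['Type'] == 1:
            side.pop(d['Rate'], None)   # delete; Quantity is 0.0 on the wire
        else:                           # Type 0 (add) and Type 2 (update)
            side[d['Rate']] = d['Quantity']

# Replaying the first "Buys" delta packet from the trace above:
buys = {0.13546489: 1.0, 0.13413976: 2.0}
apply_deltas(buys, [
    {'Type': 0, 'Rate': 0.13574727, 'Quantity': 24.133},
    {'Type': 1, 'Rate': 0.13546489, 'Quantity': 0.0},
    {'Type': 1, 'Rate': 0.13413976, 'Quantity': 0.0},
])
print(buys)  # → {0.13574727: 24.133}
```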

The packet below is tagged as the “R”eturn value to the queryExchangeState call with “I”nvocation=5. You can see from the large number of reassembled segments that it was a large stream of data specifying the full state of orders on the market.


WEBSOCKET S-->C (#295)
[42 Reassembled TCP Segments (57624 bytes): #229, #231, ... #259, #261]
{"R":{"MarketName":null,"Nounce":12947,
"Buys":[
    {"Quantity":0.90897209,"Rate":0.09957987},
    {"Quantity":0.67737287,"Rate":0.09926044},
    <snip>
    {"Quantity":19.88806857,"Rate":0.09900000},
    {"Quantity":0.50000000,"Rate":0.09270006}],
"Sells":[
    {"Quantity":0.01126028,"Rate":0.09957992},
    {"Quantity":2.77036447,"Rate":0.09957994},
    <snip>
    {"Quantity":1.16361678,"Rate":0.10347608},
    {"Quantity":0.03352742,"Rate":0.10347627}],
"Fills":[
    {"Id":209766666,"TimeStamp":"2018-02-09T03:53:43.26",
      "Quantity":0.01107151,"Price":0.09957987,"Total":0.00110249,
      "FillType":"PARTIAL_FILL","OrderType":"SELL"},
    {"Id":209766637,"TimeStamp":"2018-02-09T03:53:37.29",
      "Quantity":0.10760787,"Price":0.09957987,"Total":0.01071557,
      "FillType":"PARTIAL_FILL","OrderType":"SELL"},
    <snip>
    {"Id":209765351,"TimeStamp":"2018-02-09T03:49:34.82",
      "Quantity":0.75099182,"Price":0.09937563,"Total":0.07463028,
      "FillType":"FILL","OrderType":"BUY"},
    {"Id":209765349,"TimeStamp":"2018-02-09T03:49:34.76",
      "Quantity":0.06302414,"Price":0.09926046,"Total":0.00625580,
      "FillType":"PARTIAL_FILL","OrderType":"SELL"}
]},"I":"5"}
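That snapshot, combined with the nounce rule quoted earlier, suggests the sync procedure: buffer incoming deltas while queryExchangeState is in flight, then discard any buffered delta whose nounce does not exceed the snapshot’s. A sketch under that reading of the rule (my interpretation, not Bittrex documentation):

```python
def sync_order_book(snapshot, buffered_deltas):
    """Discard buffered deltas already reflected in the snapshot; the
    survivors are applied on top of it, then live deltas take over."""
    floor = snapshot['Nounce']
    return [d for d in buffered_deltas if d['Nounce'] > floor]

snapshot = {'Nounce': 12947, 'Buys': [], 'Sells': [], 'Fills': []}
deltas = [{'Nounce': n} for n in (12945, 12946, 12947, 12948, 12949)]
print([d['Nounce'] for d in sync_order_book(snapshot, deltas)])  # → [12948, 12949]
```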

There are some other Server–>Client packets, like the following, but at this time I’m not sure how the Group “G” token plays into things.


WEBSOCKET S-->C (#56)
{"C":"d-3908A267-B,0|yT,0|yU,2|BU,3265",
"G":"JVEbE2r44ZskGVzF+WDXAfADfPGqtlx<...snip...>",
"M":[] }

All the above seems sufficient for me to code up a SignalR Client MVP for Pharo.

Finer points

Reading around some more, I began to suspect that the handshake start might be more associated with hubs than connections, and wondered if the connectionData=[{"name": "coreHub"}] was really required for the negotiate part of the handshake. So to experiment with negotiating without it…

cloudflareun := CloudflareUn knockUrl: 'http://bittrex.com'.

client := cloudflareun client.
client url: 'http://bittrex.com/signalr/negotiate'.
client queryAt: 'clientProtocol' put: '1.5'.
(response := client get).
(params := NeoJSONReader fromString: response) inspect.

==>
{   "Url":"/signalr",
    "ConnectionToken":"rGeOv5pn7MVxYH9<...snip...>",
    "ConnectionId":"af1e5c12-0e6b-482a-a559-0fde8b62fc78",
    "KeepAliveTimeout":20.0,
    "DisconnectTimeout":30.0,
    "ConnectionTimeout":110.0,
    "TryWebSockets":true,
    "ProtocolVersion":"1.5",
    "TransportConnectTimeout":5.0,
    "LongPollDelay":0.0
}

That’s an identical response to when connectionData was included. But without clientProtocol it defaults to a lower protocol version…

client := cloudflareun client.
client url: 'http://bittrex.com/signalr/negotiate'.
(response := client get).
(params := NeoJSONReader fromString: response) inspect.

==>
{   "Url":"/signalr",
    "ConnectionToken":"4RcmbWI8yYu6sD6IDX<...snip...>",
    "ConnectionId":"54b3569d-a3a5-4910-a369-142bcd951f99",
    "KeepAliveTimeout":20.0,
    "DisconnectTimeout":30.0,
    "ConnectionTimeout":110.0,
    "TryWebSockets":true,
    "ProtocolVersion":"1.2",
    "TransportConnectTimeout":5.0,
    "LongPollDelay":0.0
}

And then I wondered whether clientProtocol was redundant in the subsequent handshake steps. So I slimmed down the remaining handshake (also dropping connectionData from the connect request)…

client := cloudflareun client.
client url: 'http://bittrex.com/signalr/connect'.
client queryAt: 'connectionToken' put: connectionToken.
client queryAt: 'transport' put: 'webSockets'.
(websocket := ZnWebSocket onHttpClient: client) inspect.
Transcript open; clear.
[websocket runWith: [ :msg | Transcript crShow: msg ]] forkAt: 35.

client := cloudflareun client.
client url: 'http://bittrex.com/signalr/start'.
client queryAt: 'connectionToken' put: connectionToken.
client queryAt: 'connectionData' put: '[{"name": "coreHub"}]'.
client queryAt: 'transport' put: 'webSockets'.
(response := client get) inspect.

==>{ "Response": "started" }

And btw, it works equally well if you use “COREHUB”: Microsoft’s usual case insensitivity.

So, does it work? Can we get data…

Leave that websocket running and let’s try replicating the first subscription invocation from above. We’ll need to JSON encode the request, slide it down the pipe and see what we get…


message := Dictionary new.
message at: 'I' put: 0.
message at: 'H' put: 'coreHub'.
message at: 'M' put: 'SubscribeToExchangeDeltas'.
message at: 'A' put: {'BTC-ETH'}.
json := String streamContents: [ :stream |
	(NeoJSONWriter on: stream) nextPut: message ].
websocket sendText: json.

==>(in Transcript)
{"C":"d-64571BDB-LJ,1","S":1,"M":[]}
{}
{}
{}
{"C":"d-64571BDB-LJ,2|JE,D6D","G":"oS5hHv23WBWXIR","M":[]}
{"R":true,"I":"0"}
{"C":"d-64571BDB-LJ,2|JE,D6E","M":[{"H":"CoreHub","M":"updateExchangeState",
   "A":[{"MarketName":"BTC-ETH","Nounce":4326,
     "Buys":[
        {"Type":2,"Rate":0.10036454,"Quantity":23.77749657},
        {"Type":0,"Rate":0.09912933,"Quantity":33.90360000},
        {"Type":1,"Rate":0.09905617,"Quantity":0.0}],
     "Sells":[
        {"Type":0,"Rate":0.10069995,"Quantity":44.06500000},
        {"Type":1,"Rate":0.10092096,"Quantity":0.0}],
     "Fills":[
        {"OrderType":"SELL","Rate":0.10036454,"Quantity":1.30127600,"TimeStamp":"2018-02-10T17:47:42.933"}]
}]}]}
{"C":"d-64571BDB-LJ,2|JE,D6F","M":[{"H":"CoreHub","M":"updateExchangeState",
   "A":[{"MarketName":"BTC-ETH","Nounce":4327,
     "Buys":[
        {"Type":1,"Rate":0.09912933,"Quantity":0.0},
        {"Type":0,"Rate":0.09910001,"Quantity":33.04100000}],
     "Sells":[],
     "Fills":[]
 }]}]}
{"C":"d-64571BDB-LJ,2|JE,D70","M":[{"H":"CoreHub","M":"updateExchangeState",
   "A":[{"MarketName":"BTC-ETH","Nounce":4328,
     "Buys":[
        {"Type":2,"Rate":0.10036454,"Quantity":23.25325894},
        {"Type":1,"Rate":0.09956801,"Quantity":0.0},
        {"Type":0,"Rate":0.09920001,"Quantity":33.96970000},
        {"Type":1,"Rate":0.09910001,"Quantity":0.0},
        {"Type":0,"Rate":0.09360000,"Quantity":0.26403692}],
    "Sells":[
        {"Type":0,"Rate":0.10069994,"Quantity":0.99299630},
        {"Type":0,"Rate":0.10083024,"Quantity":14.27868176},
        {"Type":1,"Rate":0.10092095,"Quantity":0.0},
        {"Type":1,"Rate":0.10305599,"Quantity":0.0}],
     "Fills":[
        {"OrderType":"SELL","Rate":0.10036454,"Quantity":0.52423763,"TimeStamp":"2018-02-10T17:47:44.167"}]
}]}]}

WOOHOO!! Ready to roll….

Conclusion

Okay! Now I’m off for a bit to build what I learned into a library. I’ll report back here to link to it shortly.


Pharo v. Cloudflare

In my pursuit to connect Pharo to the realtime order book feed of the Bittrex cryptocurrency exchange there are two main challenges:

  1. It uses Microsoft’s signalr protocol.
  2. The site is guarded by Cloudflare, which requires a Javascript puzzle to be solved.

Here I attack the latter. So let’s get started…

On the wire

  1. First we should review how Bittrex libraries for other languages do it. Follow the installation instructions for python-bittrex-websocket and then also clone the repo to get its examples.
    $ git clone git@github.com:slazarov/python-bittrex-websocket.git
    $ cd python-bittrex-websocket/bittrex_websocket/example
    $ python order_book.py
    
  2. By default this library runs over HTTPS which impedes our ability to peek at it, so we need to hack it to use HTTP instead. To discover which file to modify, change the bottom of order_book.py as follows…
     $ vi order_book.py
    
            import inspect
            if __name__ == "__main__":
                print(inspect.getmodule(BittrexSocket))
                main()
    

    Now when you run it, the first line displayed is the file to modify. Edit it to change all “https” to “http”….

    $ python order_book.py
    <module 'bittrex_websocket.websocket_client' from
        '/home/ben/.local/lib/python2.7/site-packages/bittrex_websocket/websocket_client.pyc' >
    $ vi /home/ben/.local/lib/python2.7/site-packages/bittrex_websocket/websocket_client.py
    
                   urls = ['http://socket-stage.bittrex.com/signalr',
                         'http://socket.bittrex.com/signalr']
    
    
  3. To get a clean view of what’s happening on the wire it helps to filter for the Bittrex IP address…
     $ ping socket.bittrex.com
    ==> PING socket.bittrex.com (104.17.156.108)
    

    I’ve observed the Bittrex IP addresses bounce around within the subnet 104.17.0.0/16, so we’ll use that for our filter.

  4. Install Wireshark and go to “Capture > Capture filters…” to pre-define a capture filter. Click the plus and add a “bittrex” filter matching that subnet.
  5. Activate that capture filter by clicking the circular icon (fourth from left) to select your network interface (here wlp2s0) and click on the “…using this filter” tag (yellow or green) to choose your pre-defined “bittrex” filter from the list. Then click the Start icon. Note, nothing appears until the next step.
  6. In the “Apply a display filter…” box, enter “http || websocket”, then at the shell do… 
    $ python order_book.py

    and you should see something like… [Screenshot: Wireshark capture]

The initial request is shown in packet #4, with its response, packet #14, setting a cookie __cfduid and supplying the puzzle to solve. The GET query string (below) decodes to connectionData=[{"name": "coreHub"}]&clientProtocol=1.5.

REQUEST (#4)
GET /signalr/negotiate?connectionData=%5B%7B%22name%22%3A+%22coreHub%22%7D%5D&clientProtocol=1.5 HTTP/1.1\r\n 

RESPONSE (#14)
HTTP/1.1 503 Service Temporarily Unavailable
Set-Cookie: __cfduid=df17ba99a887664411404c9f88347504f1517897211; expires=Wed, 06-Feb-19 06:06:51 GMT; path=/; domain=.bittrex.com; HttpOnly
Server: cloudflare
CF-RAY: 3e8bed0546114d2e-PER

<form id="challenge-form" action="/cdn-cgi/l/chk_jschl" method="get">
<input name="jschl_vc" type="hidden" value="2227c799ed495508dbc259c1fb59bc97" />
<input name="pass" type="hidden" value="1517897215.726-F3N87MTQOm" />
<input id="jschl-answer" name="jschl_answer" type="hidden" />
</form>

<script type="text/javascript">
//<![CDATA[
(function(){
  var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
  b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
  b(function(){
    var a = document.getElementById('cf-content');a.style.display = 'block';
    setTimeout(function(){
      var s,t,o,p,b,r,e,a,k,i,n,g,f, rxJOrIr={"jdOijG":+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]))};
      t = document.createElement('div');
      t.innerHTML="<a href='/'>x</a>";
      t = t.firstChild.href;r = t.match(/https?:\/\//)[0];
      t = t.substr(r.length); t = t.substr(0,t.length-1);
      a = document.getElementById('jschl-answer');
      f = document.getElementById('challenge-form');
     [truncated]        ;rxJOrIr.jdOijG*=+((+!![]+[])+(+[]));rxJOrIr.jdOijG-=+((+!![]+[])+(+!![]));rxJOrIr.jdOijG-=+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]));rxJOrIr.jdOijG*=+!![];rxJOrIr.jdOijG+=+((!+[]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]));
      f.action += location.hash;
      f.submit();
      }, 4000);
    }, false);
  })();
//]]>
</script>
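As a sanity check, the decoding of packet #4’s query string claimed above can be reproduced with Python’s urllib:

```python
from urllib.parse import parse_qsl, urlsplit

# The request line from packet #4, as captured by Wireshark.
request_target = ('/signalr/negotiate?connectionData=%5B%7B%22name%22%3A+%22coreHub%22%7D%5D'
                  '&clientProtocol=1.5')
decoded = dict(parse_qsl(urlsplit(request_target).query))
# decoded == {'connectionData': '[{"name": "coreHub"}]', 'clientProtocol': '1.5'}
```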

The first two fields of the HTML form are pre-seeded with values that (5 seconds later) are carried through to the next request in packet #22, with the third field “jschl_answer” calculated by the javascript. The response packet #24 sets a second cookie cf_clearance and redirects back to the original URL…

REQUEST (#22)
GET /cdn-cgi/l/chk_jschl?jschl_answer=381&jschl_vc=2227c799ed495508dbc259c1fb59bc97&pass=1517897215.726-F3N87MTQOm HTTP/1.1\r\n

RESPONSE (#24)
Set-Cookie: cf_clearance=097eaac668c0eb0db0a3cceb265e6ef7ea0a384c-1517897216-10800; path=/; expires=Tue, 06-Feb-18 10:06:56 GMT; domain=.bittrex.com; HttpOnly
Server: cloudflare-nginx
CF-RAY: 3e8bed2597f14d40-PER
Location: http://socket-stage.bittrex.com/signalr/negotiate?connectionData=%5B%7B%22name%22%3A+%22coreHub%22%7D%5D&clientProtocol=1.5

As redirected, packet #26 requests the original URI again, but now including the two cookies. The response in packet #29 is the signalr protocol indicating that it’s ready to try connecting with websockets…

REQUEST (#26)
GET /signalr/negotiate?connectionData=%5B%7B%22name%22%3A+%22coreHub%22%7D%5D&clientProtocol=1.5 HTTP/1.1
Cookie:
cf_clearance=097eaac668c0eb0db0a3cceb265e6ef7ea0a384c-1517897216-10800;
__cfduid=df17ba99a887664411404c9f88347504f1517897211

RESPONSE (#29)
HTTP/1.1 200 OK  (application/json)
{  "Url":"/signalr",
   "ConnectionToken":"ulMFV4z5JG+xfSpMX3A4+S/Fa[...snip...]",
   "ConnectionId":"9317f009-fb2f-4904-84b6-679c40c8b23a",
   "KeepAliveTimeout":20.0,
   "DisconnectTimeout":30.0,
   "ConnectionTimeout":110.0,
   "TryWebSockets":true,
   "ProtocolVersion":"1.5",
   "TransportConnectTimeout":5.0,
   "LongPollDelay":0.0
}

So if Pharo can somehow obtain those two cookies to enable it to receive a response like packet #29, then we’ll have successfully navigated through Cloudflare.

Implementation

Cloudflare provides DDOS protection against massive bot attacks, but obviously Bittrex doesn’t mind reasonably behaved programs connecting to it. After all, they provide an API for this. But it’s a hurdle we need to get over.

The dependencies of python-bittrex-websocket include cloudflare-scrape, which in turn depends on nodejs to evaluate the Javascript puzzle. But Pharo calling a python library calling a javascript library seems a bit fragile. Also, now that we understand what is happening on the wire, there is no need to muck around in Pharo parsing the web page to extract the javascript challenge to pass to nodejs. Instead we should just use a nodejs library that does the whole thing and returns the keys we need. For this, cloudscraper looks like a reasonable candidate. Let’s start by trialling it from the shell. As a virgin nodejs user, I needed to start with something super simple to check nodejs was installed…

nodejs -e "console.log(17+25)" 

==> 42

Cool, it’s ready to go. After some playing around I found the following provides a concise list of the headers we need…


$ npm install cloudscraper
$ nodejs -e \
   '  var cloudscraper = require("cloudscraper");
       cloudscraper.get("http://bittrex.com/",
         function(error, response, body) {console.log(body, response); });
   ' | grep '_header:' | sed 's:\\r\\n:\n:g'

==>
_header: 'GET / HTTP/1.1
User-Agent: Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36
Referer: http://bittrex.com/
cookie: __cfduid=d9d16a4714d2db938df32a7e50d1f24001517999469;
 cf_clearance=b68bda98b8f337de6b37f29ac2c2db831741b7d8-1517999475-10800
host: bittrex.com
Connection: close

Note the User-Agent is important; as noted here, “You must use the same user-agent string for obtaining tokens and for making requests with those tokens, otherwise Cloudflare will flag you as a bot.”

To invoke that from Pharo we’ll use OSProcess (since it seems to have better cross platform support than OSSubProcess). You can load it from the Pharo Catalog.

I haven’t used OSProcess before, so let’s try the simplest thing first…


(PipeableOSProcess command: 'echo hi there') output inspect

==> hi there

Yep! That works fine. Let’s try some simple nodejs…


(PipeableOSProcess command: 'nodejs -e "console.log(17+25)" ') output inspect

==> 42

Cool! Now let’s shoot for what we really need…


headers := (PipeableOSProcess waitForCommand:
    'nodejs -e ''var cloudscraper = require("cloudscraper");
        cloudscraper.get("http://bittrex.com", function(error, response, body)
        {console.log(body, response); }); '' | grep "_header:" '  ) output.
headers inspect.

==>
_header: 'GET / HTTP/1.1\r\n
User-Agent: Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36\r\n
Referer: http://bittrex.com/\r\n
cookie: __cfduid=dacadf9197092e503974603ad61e934401518009298; cf_clearance=563d4d1e36594750d69500c49a8b6936b3188b86-1518009304-10800\r\n
host: bittrex.com\r\n
Connection: close\r\n\r\n',

Woo hoo! Now continuing in Playground, to extract our magic pass…


re := '.*(__cfduid=)([^;]*).*' asRegex.
re matchesPrefix: headers.
cfduid := re subexpression: 3. 

re := '.*(cf_clearance=)([^\\]*).*' asRegex.
re matchesPrefix: headers.
cf_clearance := re subexpression: 3. 

re := '.*(User-Agent\: )([^\\]*).*' asRegex.
re matchesPrefix: headers.
userAgent := re subexpression: 3. 

{cfduid . cf_clearance . userAgent} inspect.
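For reference, the same extraction works in any regex engine. An illustrative Python equivalent, run against an abbreviated copy of the header dump (the values below are shortened stand-ins, not real tokens; the \r\n sequences appear literally in the inspected string, as the Pharo inspector shows them):

```python
import re

# Abbreviated stand-in for the cloudscraper header dump shown above.
headers = ("_header: 'GET / HTTP/1.1\\r\\n"
           "User-Agent: Ubuntu Chromium/34.0.1847.116 Safari/537.36\\r\\n"
           "cookie: __cfduid=dacadf9197; cf_clearance=563d4d1e-1518009304-10800\\r\\n"
           "host: bittrex.com\\r\\n\\r\\n',")

def extract(pattern, text):
    """Return the first capture group of pattern in text, or None."""
    match = re.search(pattern, text)
    return match.group(1) if match else None

cfduid = extract(r'__cfduid=([^;]*)', headers)          # stops at the ';'
cf_clearance = extract(r'cf_clearance=([^\\;]*)', headers)  # stops at '\' or ';'
user_agent = extract(r'User-Agent: ([^\\]*)', headers)  # stops at the '\'
print(cfduid, cf_clearance, user_agent)
```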

And let’s put that to use…


client := ZnClient new url: 'https://bittrex.com'.
cookieJar := client session cookieJar.
cookieJar add: ((ZnCookie name: '__cfduid' value: cfduid) domain: 'bittrex.com').
cookieJar add: ((ZnCookie name: 'cf_clearance' value: cf_clearance) domain: 'bittrex.com').
client headerAt: 'User-Agent' put: userAgent.
(response := client get) inspect.

==>
<head>
    <title>Bittrex.com - Bittrex, The Next Generation Digital Currency Exchange</title>

YES!!!! (*arms punch the sky*)
And now the kicker…


client := ZnClient new url: 'https://bittrex.com/signalr/negotiate'.
client queryAt: 'connectionData' put: '[{"name": "coreHub"}]'.
client queryAt: 'clientProtocol' put: '1.5'.
cookieJar := client session cookieJar.
cookieJar add: ((ZnCookie name: '__cfduid' value: cfduid) domain: 'bittrex.com').
cookieJar add: ((ZnCookie name: 'cf_clearance' value: cf_clearance) domain: 'bittrex.com').
client headerAt: 'User-Agent' put: userAgent.
(response := client get) inspect.

==>
{   "Url":"/signalr",
    "ConnectionToken":"UrYmSIBmuE0KzPGcD4[...snip...]",
    "ConnectionId":"781d997f-8d2a-498f-8045-771c11896db5",
    "KeepAliveTimeout":20.0,
    "DisconnectTimeout":30.0,
    "ConnectionTimeout":110.0,
    "TryWebSockets":true,
    "ProtocolVersion":"1.5",
    "TransportConnectTimeout":5.0,
    "LongPollDelay":0.0
}

That looks pretty good to me.
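
For readers not using Pharo, the negotiate request can be sketched in Python as well; the trick is simply carrying the two Cloudflare cookies plus the matching User-Agent. Token values below are placeholders, and the sketch deliberately stops short of sending the request.

```python
from urllib.parse import urlencode

# Placeholder tokens -- in practice these come from the cloudscraper step.
cfduid, cf_clearance = "abc123", "def456"
user_agent = "Ubuntu Chromium/34.0.1847.116 Safari/537.36"

params = urlencode({
    "connectionData": '[{"name": "coreHub"}]',
    "clientProtocol": "1.5",
})
url = "https://bittrex.com/signalr/negotiate?" + params
headers = {
    # Must be the same UA used to obtain the tokens, or Cloudflare balks.
    "User-Agent": user_agent,
    "Cookie": "__cfduid=%s; cf_clearance=%s" % (cfduid, cf_clearance),
}
# urllib.request.Request(url, headers=headers) would then perform the GET.
```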

Conclusion

Well, I feel that was a worthwhile journey to properly understand how Cloudflare works. In the end it’s almost too simple to warrant a separate package, but to pull it all together I’ve uploaded the minimal package CloudflareUn, which can be used like this…


client := (CloudflareUn knockUrl: 'http://bittrex.com') client.
client url: 'https://bittrex.com/signalr/negotiate?connectionData=[{"name": "coreHub"}]&clientProtocol=1.5'.
(response := client get) inspect. 

==>
{   "Url":"/signalr",
    "ConnectionToken":"R+VIw4INdqVBV3r43rVn2gKI+yhqe[...snip...]",
    "ConnectionId":"fa8d0fc5-b8d0-4925-bc63-7aa8984b1f4d",
    "KeepAliveTimeout":20.0,
    "DisconnectTimeout":30.0,
    "ConnectionTimeout":110.0,
    "TryWebSockets":true,
    "ProtocolVersion":"1.5",
    "TransportConnectTimeout":5.0,
    "LongPollDelay":0.0
}

Your feedback and enhancements will be appreciated.

Posted in Uncategorized | 1 Comment

Pharo PDF Rendering, part 2, UFFI interfacing PDFium

Following on from Part 1, where we built PDFium from source into a shared library, we will replicate in Pharo the C example presented at the end of Part 1. Let’s review the declaration prototypes of the functions used, which we’ll need to implement in Pharo.

void          FPDF_InitLibrary()
void          FPDF_DestroyLibrary()
FPDF_DOCUMENT FPDF_LoadDocument(FPDF_STRING file_path, FPDF_BYTESTRING password)
unsigned long FPDF_GetLastError()
int           FPDF_GetPageCount(FPDF_DOCUMENT document)
int           FPDF_GetPageSizeByIndex(FPDF_DOCUMENT document, int page_index,
                                      double* width, double* height)
void          FPDF_CloseDocument(FPDF_DOCUMENT document)
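
As a neutral reference alongside the UFFI version that follows, here is how the same prototypes might be bound with Python’s ctypes. The deferred-binding helper and the commented path are my own assumptions, not part of PDFium.

```python
import ctypes

FPDF_DOCUMENT = ctypes.c_void_p  # opaque handle, matching typedef void*

def bind_pdfium(lib):
    """Attach the fpdfview.h prototypes above to a loaded library object."""
    lib.FPDF_InitLibrary.restype = None
    lib.FPDF_DestroyLibrary.restype = None
    lib.FPDF_LoadDocument.restype = FPDF_DOCUMENT
    lib.FPDF_LoadDocument.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    lib.FPDF_GetLastError.restype = ctypes.c_ulong
    lib.FPDF_GetPageCount.restype = ctypes.c_int
    lib.FPDF_GetPageCount.argtypes = [FPDF_DOCUMENT]
    lib.FPDF_CloseDocument.restype = None
    lib.FPDF_CloseDocument.argtypes = [FPDF_DOCUMENT]
    return lib

# Usage (path is hypothetical):
# pdfium = bind_pdfium(ctypes.CDLL("out/shared/libpdfium.so"))
```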

So first we’ll define the basic scaffolding for our UFFI library interface to the shared library libpdfium.so we built in Part 1.

FFILibrary subclass: #PDFium
	   instanceVariableNames: ''
	   classVariableNames: ''
	   package: 'PDFium'

PDFium >> unixModuleName
	^'/home/ben/Repos/PDFium/pdfium/out/shared/libpdfium.so'

PDFium class >> ffiLibraryName
	^PDFium

TestCase subclass: #PDFiumTest
	instanceVariableNames: 'missingPdf helloPdf'
	classVariableNames: ''
	package: 'PDFium'

PDFiumTest >> setUp
        missingPdf := 'm!ss!ng.pdf'.
        helloPdf := '/home/ben/Repos/PDFium/pdfium/testing/resources/hello_world.pdf'.
	self assert: helloPdf asFileReference exists.

Then to the class-side we’ll add the ffi functions for library configuration, with a simple test to check whether the callout fails or crashes.

PDFium class >> FPDF_InitLibrary
    ^self ffiCall: #( void FPDF_InitLibrary()  ) 

PDFium class >> FPDF_DestroyLibrary
    ^self ffiCall: #( void FPDF_DestroyLibrary()  )

PDFiumTest >> testLibraryConfiguration
    PDFium FPDF_InitLibrary.
    PDFium FPDF_DestroyLibrary.

Aside: The first time I ran #testLibraryConfiguration I got an “Error: No module to load address from” because I’d left out the return symbol (^) from #unixModuleName. After correcting it I needed to restart Pharo to clear the error. The code above is fixed, so it should work without error.

Next we’ll implement the document control methods. For these we’ll need to define a few types from fpdfview.h. We don’t need all of them right now, but it’s useful to be aware of…

// PDF Types
typedef void* FPDF_DOCUMENT;

// String types
typedef unsigned short FPDF_WCHAR;
typedef unsigned char const* FPDF_LPCBYTE;

// FPDFSDK may use three types of strings: byte string, wide string (UTF-16LE
// encoded), and platform dependent string
typedef const char* FPDF_BYTESTRING;

// FPDFSDK always uses UTF-16LE encoded wide strings, each character uses 2
// bytes (except surrogation), with the low byte first.
typedef const unsigned short* FPDF_WIDESTRING;

// For Windows programmers: In most cases it's OK to treat FPDF_WIDESTRING as a
// Windows unicode string, however, special care needs to be taken if you
// expect to process Unicode larger than 0xffff.
//
// For Linux/Unix programmers: most compiler/library environments use 4 bytes
// for a Unicode character, and you have to convert between FPDF_WIDESTRING and
// system wide string by yourself.
typedef const char* FPDF_STRING;

So FPDF_BYTESTRING and FPDF_STRING look like they map well to Pharo’s String class, so subclassing it should let us use them directly without manual handling. FPDF_WIDESTRING is more complicated, but we don’t need it at the moment. The API gives us no information about the internals of FPDF_DOCUMENT, so we must treat it as an opaque object. So let’s try…

String subclass: #FPDF_BYTESTRING
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'PDFium'

String subclass: #FPDF_STRING
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'PDFium'

FFIOpaqueObject subclass: #FPDF_DOCUMENT
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'PDFium'

…and use those to define the document control methods. We’ll use the ad hoc naming convention of appending the keyword parameters to the C function name with a double-underscore separator. Now you should (because I got it wrong, and thank you jbroman on #chromium (Freenode) for setting me straight) pay special attention to the comment on FPDF_GetLastError() in “fpdfview.h”, which says…

If the previous SDK call succeeded,
the return value of FPDF_GetLastError() is not defined.

PDFium class >> FPDF_GetLastError
    ^self ffiCall: #( unsigned long FPDF_GetLastError() ) 

PDFium class >> FPDF_CloseDocument__document: document
    ^self ffiCall: #( void FPDF_CloseDocument( FPDF_DOCUMENT *document ) ) 

PDFium class >> FPDF_LoadDocument__file_path: file_path password: password
    ^self ffiCall: #(FPDF_DOCUMENT *FPDF_LoadDocument(FPDF_STRING file_path, FPDF_BYTESTRING password))

PDFiumTest >> testDocumentMissing
	| document error |
	PDFium FPDF_InitLibrary.
	document := PDFium FPDF_LoadDocument__file_path:  missingPdf  password: ''.
	error := PDFium FPDF_GetLastError.
	PDFium FPDF_CloseDocument__document: document.
	PDFium FPDF_DestroyLibrary.
	self assert: document isNull.
	self assert: error equals: 2.  "#define FPDF_ERR_FILE"

PDFiumTest >> testDocumentValid
	| document |
	PDFium FPDF_InitLibrary.
	document := PDFium FPDF_LoadDocument__file_path:  helloPdf  password: ''.
	PDFium FPDF_CloseDocument__document: document.
	PDFium FPDF_DestroyLibrary.
	self assert: document isNull not.
	"note, FPDF_GetLastError() is undefined when SDK calls succeed"

Now if you’ve been paying attention ;), you’ll have noticed the Pharo definitions of FPDF_LoadDocument() and FPDF_CloseDocument() differ slightly from their C definitions by the inclusion of an extra indirection symbol. Without it you get an FFIDereferencedOpaqueObjectError. The FFIOpaqueObject class comment informs us…

“external objects have a natural arity of zero but they MUST be called with some arity,  because they are actually external addresses (pointers).  That means, you need to always declare external objects as this example:
self ffiCall: #( FFIExternalObject *c_function ( FFIExternalObject *handle ) ) “

So the tests are green! Groooveh-babeh! Now let’s grab some document info. Getting the number of pages is easy…

PDFium class >> FPDF_GetPageCount__document: document
    ^self ffiCall: #(int FPDF_GetPageCount(FPDF_DOCUMENT *document))

PDFiumTest >> testPageCountHelloPdf
	| document pageCount|
	PDFium FPDF_InitLibrary.
	document := PDFium FPDF_LoadDocument__file_path:  helloPdf  password: ''.
	pageCount := PDFium FPDF_GetPageCount__document: document.
	PDFium FPDF_CloseDocument__document: document.
	PDFium FPDF_DestroyLibrary.
	self assert: pageCount equals: 1.

That’s it for now. Later I’ll look at working with multiple pages.

Posted in Uncategorized | Leave a comment

Pharo PDF Rendering, part 1, building PDFium

Background

For a while now I’ve been wanting to render PDFs inside Pharo. A few external libraries existed but none had suitable licenses. Recently I bumped into PDFium, the Foxit renderer open-sourced by Google out of Chrome for use by Chromium. With its BSD license it seemed a good candidate, as well as being derived from a successful commercial product and part of a significant Google-backed project. So it leverages a lot of funded engineering, and expectations of quality are high. It’s written in C++ but has a public C interface.

So here I am recording my exploration of building PDFium from source. Later in Part 2 I’ll interface to it using Pharo’s UFFI to render PDF pages to bitmaps displayed within Pharo.

Building PDFium

So we start by following the “Get the code” section in the canonical build instructions.

$ sudo apt install git

$ mkdir -p PDFium && cd PDFium

$ export MYDEV=`pwd` # just for the sake of being explicit in this post

$ git clone https://chromium.googlesource.com/chromium/tools/depot_tools.git

$ export PATH=$PATH:$MYDEV/depot_tools
Don’t forget to add this to each new terminal you open. You may want to add it to .bashrc.

$ gclient config --unmanaged https://pdfium.googlesource.com/pdfium.git

$ gclient sync --verbose
Wait…..a…..long…..time….. (go get a coffee)

$ cd $MYDEV/pdfium

$ ./build/install-build-deps.sh

Phew! Is that everything? Now actually, we want to work from a stable base, which https://omahaproxy.appspot.com/ shows is Linux-stable-62.0.3202.89

Okay, so now we are ready to try our first build. The build system uses gn to generate ninja build files. I’ve not used this build system before, so I’m a bit the blind leading the blind, but the way it works is: you pass the build directory to `gn args`, which drops you into an editor to specify the args.gn parameters file, from which (combined with BUILD.gn) the ninja configuration files are produced. Then from the build directory you run `ninja pdfium` to perform the build. To change build parameters later, run `gn args .` from the build directory.

$ gn args out/FirstBuild

pdf_is_standalone = true          # Set for a non-embedded build.
is_debug = true                   # Enable debugging features.

pdf_enable_xfa = true             # XFA support enabled.
pdf_enable_v8 = true              # Javascript support enabled.
pdf_use_skia = false              # Avoid skia backend experiment.
pdf_use_skia_paths = false        # Avoid other skia backend experiment.
is_component_build = false        # Disable component build (must be false).
clang_use_chrome_plugins = false  # Currently must be false.

$ cd out/FirstBuild

I use `nice ninja` here since otherwise the parallel compiles launched by ninja cause my laptop to crawl (and anyway who wants to play with a nasty ninja?). So here we go, using the canonical args.gn

$ nice ninja pdfium
[2162/2162] AR obj/libpdfium.a

Yay! It built! Hmmm… a static library…? FFI needs a shared library.  We can adapt the build instructions for PdfiumViewer.

$ cd $MYDEV/pdfium

$ vi BUILD.gn

  • Change static_library("pdfium") to shared_library("pdfium")
  • In section config("pdfium_common_config") add this to the defines list:
    • "FPDFSDK_EXPORTS"

$ gn args out/shared

pdf_is_standalone = true          # Set for a non-embedded build.
is_component_build = false        # Disable component build (must be false).
is_debug = false                  # Disable debugging features.

pdf_enable_xfa = false            # XFA support disabled.
pdf_enable_v8 = false             # Javascript support disabled.
pdf_use_skia = false              # Avoid skia backend experiment.
pdf_use_skia_paths = false        # Avoid other skia backend experiment.

Generating files…
ERROR at //.gn:9:28: Build argument has no effect.
  v8_extra_library_files = []
                           ^
The variable "v8_extra_library_files" was set as a build argument but never
appeared in a declare_args() block in any buildfile.

Darn jigitty!! Already an error just creating the ninja build files. Everything I’ve learnt in the past 20 years tells me this means “BROKEN. WON’T BUILD. WON’T RUN”. So ingrained is this convention, and being only days old with ninja and gn, I believed it. But half a day scouring the web for how to fix this turned up an issue saying… “This is non-fatal by design. Ideally the messaging would be better and say ‘WARNING’. But this is the only nonfatal warning in the entire program so there isn’t code to vary this string. Low priority. Depending on code complexity may not be worth fixing.”

Ha! Haaaaaarrrrrrrgggghghhhhhh!!!

Well! I could say more, but let’s call it a blessing that it seems okay to proceed.

$ cd out/shared

$ nice ninja pdfium
[753/753] SOLINK ./libpdfium.so
Nice! That’s what we’re looking for. So let’s try making use of it…

$ mkdir -p $MYDEV/AppTesting/First  &&  cd $MYDEV/AppTesting/First

$ vi first.c

#include <stdio.h>
#include <fpdfview.h>

int main() {
        FPDF_InitLibrary();
        FPDF_DestroyLibrary();
        printf("worked okay\n");
        return 0;
}
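
As an aside, the same init/destroy smoke test can be run from Python with ctypes once the library exports its symbols properly (see the visibility saga that follows); this is only a sketch, and the library path is an assumption about your build layout.

```python
import ctypes

def smoke_test(lib_path):
    """Load libpdfium.so and do the init/destroy round trip from first.c."""
    pdfium = ctypes.CDLL(lib_path)   # raises OSError if the library is missing
    pdfium.FPDF_InitLibrary()
    pdfium.FPDF_DestroyLibrary()
    print("worked okay")

# smoke_test("../../pdfium/out/shared/libpdfium.so")  # path as in the Makefile
```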

$ vi Makefile

PDFIUM_REPO= ../../pdfium
INC_DIR= -I ${PDFIUM_REPO}/public
LIB_DIR= -L ${PDFIUM_REPO}/out/shared
PDF_LIBS= -lpdfium
STD_LIBS= -lpthread -lm -lc -lstdc++
default:
    rm -f first
    gcc -o first first.c ${INC_DIR} ${LIB_DIR} ${PDF_LIBS} ${STD_LIBS}
    chmod +x first
    ./first

$ make
first.c:(.text+0xa): undefined reference to `FPDF_InitLibrary’
first.c:(.text+0x14): undefined reference to `FPDF_DestroyLibrary’

Hmmm… Well, it found the libpdfium.so library, since it didn’t complain about that.
So here’s a quick summary of a few days hunting this down (oh Smalltalk, shall I count the ways I love thee…). First let’s examine the library…

$ cd $MYDEV/pdfium/out/shared

$ nm libpdfium.so | grep InitLibrary
Hmmm… nothing

$ ls -lh libpdfium.so
-rwxrwxr-x 1 ben ben 41K Nov  7 23:10 libpdfium.so

That does seem rather small. Is our symbol anywhere?…

$ find . -name "*.o" -exec nm -A {} \; | grep InitLibrary
./obj/pdfium/fpdfview.o:0000000000000000 T FPDF_InitLibrary
./obj/pdfium/fpdfview.o:0000000000000000 T FPDF_InitLibraryWithConfig

At least it shows in the object file. Now, from what I read here, the capital “T” indicates these are global symbols in the object file. Let’s manually build it into a shared library…

$ gcc -fPIC -shared -o testlib.so obj/pdfium/fpdfview.o

$ nm testlib.so | grep InitLib
testlib.so:00000000000036e0 t FPDF_InitLibrary
testlib.so:0000000000003720 t FPDF_InitLibraryWithConfig

The lower-case “t” indicates the symbol changed to an internal/hidden symbol. But why the change? Perhaps I’m not using the tools right? I found this bewildering, until I discovered `readelf`.

$ readelf -a obj/pdfium/fpdfview.o | grep InitLibrary
133: 000000000000000   56  FUNC  GLOBAL HIDDEN  31 FPDF_InitLibrary
134: 000000000000000 105  FUNC  GLOBAL HIDDEN  33 FPDF_InitLibraryWithConfi

Ahhh… this additional information helps: the symbol is global (as `nm` showed earlier) but tagged as hidden. After learning more about controlling exported symbols (thanks Nicolas Cellier), visibility, and why visibility is good, I try…

$ grep -R visibility=hidden *

which in build/config/gcc/BUILD.gn finds…

# This config causes functions not to be automatically exported from shared
# libraries. By default, all symbols are exported but this means there are
# lots of exports that slow everything down. In general we explicitly mark
# which functions we want to export from components.
#
# Some third_party code assumes all functions are exported so this is separated
# into its own config so such libraries can remove this config to make symbols
# public again.
#
# See http://gcc.gnu.org/wiki/Visibility
config("symbol_visibility_hidden") {
  cflags = [ "-fvisibility=hidden" ]
}

which apparently can be disabled with…

if (!is_win) {
configs -= [ "//build/config/gcc:symbol_visibility_hidden" ]
}

But rather than experiment like that with an unfamiliar build system, further hunting found the following in "public/fpdfview.h"…

#if defined(_WIN32) && defined(FPDFSDK_EXPORTS)
// On Windows system, functions are exported in a DLL
#define FPDF_EXPORT __declspec(dllexport)
#define FPDF_CALLCONV __stdcall
#else
#define FPDF_EXPORT
#define FPDF_CALLCONV
#endif

which had something familiar about it. Hmm… The PDFiumViewer build instructions had us define “FPDFSDK_EXPORTS”, but here we see that this only works on Win32 (PDFiumViewer’s target platform). Let’s rearrange this a little…

#if defined(FPDFSDK_EXPORTS)
#if defined(_WIN32)
#define FPDF_EXPORT __declspec(dllexport)
#define FPDF_CALLCONV __stdcall
#else
#define FPDF_EXPORT __attribute__((visibility("default")))
#define FPDF_CALLCONV
#endif //_WIN32

#else
#define FPDF_EXPORT
#define FPDF_CALLCONV
#endif //FPDFSDK_EXPORTS

$ nice ninja pdfium
[753/753] SOLINK ./libpdfium.so
FAILED: libpdfium.so libpdfium.so.TOC
…and a bunch of undefined references.

Further hunting found gradescope’s suggestion to “disable building with clang to avoid dependency hell.” So…

$ gn args .

pdf_is_standalone = true          # Set for a non-embedded build.
is_component_build = false        # Disable component build (must be false).
is_debug = false                  # Disable debugging features.

pdf_enable_xfa = false            # XFA support disabled.
pdf_enable_v8 = false             # Javascript support disabled.
pdf_use_skia = false              # Avoid skia backend experiment.
pdf_use_skia_paths = false        # Avoid other skia backend experiment.
is_clang = false                  # Avoid dependency hell.

$ nice ninja pdfium
[703/703] SOLINK ./libpdfium.so

$ readelf -a libpdfium.so | grep InitLibrary
355: 0000000000053ee0     7 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibrary
436: 0000000000053e60   121 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibraryWithConfi
9096: 0000000000053e60   121 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibraryWithConfi
9097: 0000000000053ee0     7 FUNC    GLOBAL DEFAULT   12 FPDF_InitLibrary

$ nm libpdfium.so | grep InitLibrary
0000000000053ee0 T FPDF_InitLibrary
0000000000053e60 T FPDF_InitLibraryWithConfig

Now that looks promising! Let’s try it out.

$ cd $MYDEV/AppTesting/First

$ LD_LIBRARY_PATH=$MYDEV/pdfium/out/shared   make
rm -f first
gcc -o first first.c -I ../../pdfium/public -L ../../pdfium/out/shared -lpdfium -lpthread -lm -lc -lstdc++
chmod +x first
./first
worked okay

Yay! So now we are ready to try the library from Pharo!  Stay tuned for Part 2.
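
As a final cross-check that the symbols really are exported, a small ctypes probe does the dynamic-linker equivalent of the `nm | grep` above. Python is used here only as a convenient scripting language, and the commented pdfium call is an assumption about your local path.

```python
import ctypes

def has_symbol(lib_path, name):
    """True if `name` resolves in the shared library -- roughly what an
    uppercase 'T' in the nm output promises."""
    try:
        lib = ctypes.CDLL(lib_path)
        getattr(lib, name)          # raises AttributeError if dlsym fails
        return True
    except (OSError, AttributeError):
        return False

# has_symbol("../../pdfium/out/shared/libpdfium.so", "FPDF_InitLibrary")
```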

cheers -ben

Posted in Uncategorized | 1 Comment

An evening with Pharo and the ESP32 microcontroller

Two popular choices for controlling maker projects are the Arduino and Raspberry Pi.
The Pi is a micro-”computer” that runs Linux to operate as a low-powered desktop computer. The Arduino is a much lower-powered micro-”controller” without a display or wireless interfaces, but it comes with the analog IO the Pi lacks. But now we have a new cool kid on the block: the ESP32, in the form of the Sparkfun ESP32 Thing and the WeMos LOLIN32.

Fitting squarely between the Pi and the Arduino, the ESP32 is a micro-controller like the Arduino, nearing the speed of the Pi Zero W. It’s got even more analog IO, where the Pi has none, and built-in WiFi and Bluetooth interfaces the Arduino lacks. This makes the ESP32 a great candidate platform for many applications, including machine control and equipment condition monitoring. A built-in battery charger is a nice bonus.
I’ve tabulated a spec comparison… Continue reading

Posted in Pharo, Uncategorized | 1 Comment

Pharo Libclang FFI, part 5, client data and recursive visitor/callbacks

Now we make use of the client data to track the indent level. The recursive call to clang_visitChildren() seems a bit of an anti-pattern to use with a visitor (presumably a new visitor is created on each call). However, that’s how it was done in a few tutorials I found, and it does provide local storage for each nextLevel variable for the purposes of this demonstration. Continue reading
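
The shape of that recursion can be mimicked in plain Python (node names invented for the demo): the callback receives a cursor-like node plus client data, and recursing hands each level its own nextLevel value, much as the C tutorials do with clang_visitChildren().

```python
# Pure-Python mimic of the clang_visitChildren pattern; the tree and
# node names here are made up for the demonstration.

def visit_children(node, callback, client_data):
    # Invoke the callback for each direct child, passing the client data.
    for child in node["children"]:
        callback(child, node, client_data)

lines = []

def accept_cursor(node, parent, level):
    lines.append("  " * level + node["name"])
    visit_children(node, accept_cursor, level + 1)  # recurse with nextLevel

tree = {"name": "TranslationUnit", "children": [
    {"name": "FunctionDecl", "children": [
        {"name": "ParmDecl", "children": []}]}]}

visit_children(tree, accept_cursor, 0)
print("\n".join(lines))
```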

Posted in FFI, Pharo | Leave a comment

Pharo Libclang FFI, part 4, AST walking with visitors & callbacks

Okay, so we’ve got most of the parts ready. In the last part we managed to load the AST; now let’s do something useful with it. Traversing the tree uses a visitor pattern that supplies cursors, which define locations in the tree, to a callback function. To the original C code from part 3 we add the callback function, which I’ve called acceptCursorCallback(), and the callout function clang_visitChildren(), which traverses the tree and invokes the callback function for each node it visits. Continue reading

Posted in FFI, Pharo | Leave a comment

Pharo Libclang FFI, part 3, loading an AST

In the last part we learnt how to get the version string of the library. That was good to prove it basically works, and also to develop our first C type, “CXString”. Now we want Pharo to process some C code. “Baby steps with `libclang`: Walking an abstract syntax tree” provided a good introductory tutorial to using libclang, but was a bit C++ oriented, which is not so suitable for Pharo’s FFI. A pure C interface is easier, so I adapted that tutorial with help from sabottenda’s libclang-sample ASTVisitor. Continue reading

Posted in FFI, Pharo | Leave a comment

Pharo Libclang FFI, part 2, simple callout string return

This is my first exposure to using Pharo’s FFI, so before diving in to process an AST, let’s try something simpler to gain familiarity with the library. Something real simple: no parameters and just returning a string. The function clang_getClangVersion() seems to fit the bill. First let’s see how it works in pure C. Continue reading
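
For a taste of the same warm-up outside Pharo: clang_getClangVersion() actually returns a CXString struct rather than a bare char*, so as a stand-in here is a ctypes callout to libc’s strerror(), which really does return a plain C string (the library lookup assumes a Unix-ish system).

```python
import ctypes
import ctypes.util

# Simplest possible callout: one int parameter, C string return.
libc = ctypes.CDLL(ctypes.util.find_library("c") or "libc.so.6")
libc.strerror.restype = ctypes.c_char_p
libc.strerror.argtypes = [ctypes.c_int]
print(libc.strerror(2).decode())   # a short error-description string
```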

Posted in FFI, Pharo | Leave a comment

Pharo Libclang FFI, part 1, preamble

Table of contents

Background

I wanted to better understand the opensmalltalk-vm that Pharo runs on. I started manually charting and comparing the C code between platforms, which was insightful but tedious and error-prone. What I needed was to process these files automatically. Clang is a C language front-end for the LLVM compiler, designed to be integrated into external projects. Libclang provides an interface suitable for Pharo’s FFI, but I’d never used FFI before. From a distance FFI had seemed somewhat daunting and complex, but it turns out to be reasonably straightforward. I’m documenting my experience in the form of a tutorial I can refer back to, which perhaps also shines a newbie light on things and may encourage other FFI neophytes to give it a go. Continue reading

Posted in FFI, Pharo | Leave a comment