Pharo v. Cloudflare

In my pursuit to connect Pharo to the real-time order book feed of the Bittrex cryptocurrency exchange, there are two main challenges:

  1. It uses Microsoft’s SignalR protocol.
  2. The site is guarded by Cloudflare, which requires a JavaScript puzzle to be solved.

Here I attack the latter. So let’s get started…

On the wire

  1. First we should review how Bittrex libraries for other languages do it. Follow the installation instructions for python-bittrex-websocket and then also clone the repo to get its examples.
    $ git clone git@github.com:slazarov/python-bittrex-websocket.git
    $ cd python-bittrex-websocket/bittrex_websocket/example
    $ python order_book.py
    
  2. By default this library runs over HTTPS, which prevents us from peeking at the traffic, so we need to hack it to use HTTP instead. To discover which file to modify, change the bottom of order_book.py as follows…
     $ vi order_book.py
    
            import inspect
            if __name__ == "__main__":
                print(inspect.getmodule(BittrexSocket))
                main()
    

    Now when you run it, the first line displayed shows the module to modify. Edit its .py source (not the compiled .pyc) to change all “https” to “http”…

    $ python order_book.py
    <module 'bittrex_websocket.websocket_client' from
        '/home/ben/.local/lib/python2.7/site-packages/bittrex_websocket/websocket_client.pyc' >
    $ vi /home/ben/.local/lib/python2.7/site-packages/bittrex_websocket/websocket_client.py
    
                   urls = ['http://socket-stage.bittrex.com/signalr',
                         'http://socket.bittrex.com/signalr']
    
    
  3. To get a clean view of what’s happening on the wire, it helps to filter for the Bittrex IP address…
     $ ping socket.bittrex.com
    ==> PING socket.bittrex.com (104.17.156.108)
    

    I’ve observed the Bittrex IP addresses bounce around within the subnet 104.17.0.0/16, so we’ll use that for our filter.

  4. Install Wireshark and go to “Capture > Capture filters…” to pre-define a capture filter. Click the plus and add a filter named “bittrex” whose expression is “net” followed by that subnet.
  5. Activate that capture filter by clicking the circular icon (fourth from left) to select your network interface (here wlp2s0), then click on the “…using this filter” tag (yellow or green) to choose your pre-defined “bittrex” filter from the list. Then click the Start icon. Note that nothing appears until the next step.
  6. In the “Apply a display filter…” box, enter “http || websocket”, then at the shell do… 
    $ python order_book.py

    and you should see something like this…

    [Wireshark capture]
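The subnet reasoning in step 3 can be double-checked with Python’s standard ipaddress module. This is a minimal sketch: it assumes, as observed above, that a /16 around the pinged address is wide enough, and the “net …” string it builds is ordinary pcap capture-filter syntax of the kind Wireshark accepts.

```python
import ipaddress

# Address reported by "ping socket.bittrex.com" in step 3.
observed = ipaddress.ip_address("104.17.156.108")

# Assumption: Bittrex stays within the /16 containing that address.
# strict=False lets us pass a host address and get its enclosing network.
subnet = ipaddress.ip_network(f"{observed}/16", strict=False)

# pcap capture-filter syntax for "any packet to or from this network".
capture_filter = f"net {subnet}"

print(observed in subnet)   # True
print(capture_filter)       # net 104.17.0.0/16
```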

The initial request is packet #4, and its response, packet #14, sets a cookie __cfduid and supplies the puzzle to solve. Decoded, the GET query string (below) reads connectionData=[{"name": "coreHub"}]&clientProtocol=1.5.
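That decoding is ordinary form-style URL decoding: %XX escapes, plus “+” standing for a space. As a quick cross-check, here is Python’s standard urllib.parse doing it on the query string from packet #4 (Python is used only for illustration; nothing later depends on it):

```python
from urllib.parse import parse_qs, unquote_plus

# Query string copied from the request line of packet #4.
query = "connectionData=%5B%7B%22name%22%3A+%22coreHub%22%7D%5D&clientProtocol=1.5"

# unquote_plus decodes %XX escapes and turns '+' back into a space.
decoded = unquote_plus(query)
print(decoded)  # connectionData=[{"name": "coreHub"}]&clientProtocol=1.5

# parse_qs additionally splits on '&' and '=' into a dict of lists.
params = parse_qs(query)
print(params["connectionData"][0])  # [{"name": "coreHub"}]
print(params["clientProtocol"][0])  # 1.5
```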

REQUEST (#4)
GET /signalr/negotiate?connectionData=%5B%7B%22name%22%3A+%22coreHub%22%7D%5D&clientProtocol=1.5 HTTP/1.1

RESPONSE (#14)
HTTP/1.1 503 Service Temporarily Unavailable
Set-Cookie: __cfduid=df17ba99a887664411404c9f88347504f1517897211; expires=Wed, 06-Feb-19 06:06:51 GMT; path=/; domain=.bittrex.com; HttpOnly
Server: cloudflare
CF-RAY: 3e8bed0546114d2e-PER

<form id="challenge-form" action="/cdn-cgi/l/chk_jschl" method="get">
<input name="jschl_vc" type="hidden" value="2227c799ed495508dbc259c1fb59bc97" />
<input name="pass" type="hidden" value="1517897215.726-F3N87MTQOm" />
<input id="jschl-answer" name="jschl_answer" type="hidden" />
</form>

<script type="text/javascript">
//<![CDATA[
(function(){
  var a = function() {try{return !!window.addEventListener} catch(e) {return !1} },
  b = function(b, c) {a() ? document.addEventListener("DOMContentLoaded", b, c) : document.attachEvent("onreadystatechange", b)};
  b(function(){
    var a = document.getElementById('cf-content');a.style.display = 'block';
    setTimeout(function(){
      var s,t,o,p,b,r,e,a,k,i,n,g,f, rxJOrIr={"jdOijG":+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]))};
      t = document.createElement('div');
      t.innerHTML="<a href='/'>x</a>";
      t = t.firstChild.href;r = t.match(/https?:\/\//)[0];
      t = t.substr(r.length); t = t.substr(0,t.length-1);
      a = document.getElementById('jschl-answer');
      f = document.getElementById('challenge-form');
     [truncated]        ;rxJOrIr.jdOijG*=+((+!![]+[])+(+[]));rxJOrIr.jdOijG-=+((+!![]+[])+(+!![]));rxJOrIr.jdOijG-=+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]));rxJOrIr.jdOijG*=+!![];rxJOrIr.jdOijG+=+((!+[]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]));
      f.action += location.hash;
      f.submit();
      }, 4000);
    }, false);
  })();
//]]>
</script>
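The arithmetic hidden in that script is JSFuck-style obfuscation: “!+[]” and “!![]” are both true, “+[]” is 0, “+!![]” is 1, and adding “[]” turns a number into a string, so “(…+[])+(…)” concatenates two digits into one number. The Python sketch below is a hypothetical mini-evaluator for just those characters (the names EMPTY, evaluate, js_add and so on are made up here; the libraries discussed later run the script under nodejs instead). It decodes the initial value of rxJOrIr.jdOijG and the update fragments visible after the truncation; because part of the script is truncated, it does not attempt to reproduce the final answer.

```python
# Hypothetical mini-evaluator for the JSFuck-style fragments in the challenge.
# Assumption: the input uses only the characters [ ] ( ) + ! as in the script.

EMPTY = object()  # stands in for a JavaScript empty array []


def to_number(v):
    """Rough JavaScript ToNumber for the values that can occur here."""
    if v is EMPTY:
        return 0                          # +[] === 0
    if isinstance(v, bool):
        return int(v)                     # +true === 1
    if isinstance(v, str):
        if not v.strip():
            return 0                      # +"" === 0
        n = float(v)                      # non-numeric strings (NaN in JS) raise here
        return int(n) if n.is_integer() else n
    return v


def to_string(v):
    if v is EMPTY:
        return ""                         # String([]) === ""
    if isinstance(v, bool):
        return "true" if v else "false"
    return str(v)


def truthy(v):
    if v is EMPTY:
        return True                       # objects, including [], are truthy
    if isinstance(v, str):
        return v != ""
    return v != 0                         # numbers and booleans


def js_add(a, b):
    """Binary +: [] becomes "", and any string operand means concatenation."""
    a = "" if a is EMPTY else a
    b = "" if b is EMPTY else b
    if isinstance(a, str) or isinstance(b, str):
        return to_string(a) + to_string(b)
    return to_number(a) + to_number(b)


def evaluate(src):
    """Recursive-descent evaluation of one fragment."""
    pos = 0

    def expr():                           # unary ('+' unary)*
        nonlocal pos
        value = unary()
        while pos < len(src) and src[pos] == "+":
            pos += 1                      # a '+' after a complete operand is binary
            value = js_add(value, unary())
        return value

    def unary():                          # '+' unary | '!' unary | '(' expr ')' | '[]'
        nonlocal pos
        if src.startswith("+", pos):
            pos += 1
            return to_number(unary())
        if src.startswith("!", pos):
            pos += 1
            return not truthy(unary())
        if src.startswith("(", pos):
            pos += 1
            value = expr()
            if not src.startswith(")", pos):
                raise ValueError(f"expected ')' at {pos}")
            pos += 1
            return value
        if src.startswith("[]", pos):
            pos += 2
            return EMPTY
        raise ValueError(f"unexpected input at {pos}: {src[pos:pos + 5]!r}")

    result = expr()
    if pos != len(src):
        raise ValueError(f"trailing input at {pos}")
    return result


# Initial value of rxJOrIr.jdOijG, then the update fragments visible above.
fragments = [
    ("=", "+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]+!![]+!![]))"),
    ("*=", "+((+!![]+[])+(+[]))"),
    ("-=", "+((+!![]+[])+(+!![]))"),
    ("-=", "+((!+[]+!![]+!![]+[])+(!+[]+!![]+!![]))"),
    ("*=", "+!![]"),
    ("+=", "+((!+[]+!![]+[])+(!+[]+!![]+!![]+!![]+!![]+!![]))"),
]
for op, fragment in fragments:
    print(op, evaluate(fragment))   # = 38, *= 10, -= 11, -= 33, *= 1, += 26
```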

The first two fields of the HTML form are pre-seeded with values that are carried through (a few seconds later, after the script’s 4000 ms setTimeout) to the next request in packet #22, with the third field “jschl_answer” calculated by the JavaScript. The response packet #24 sets a second cookie, cf_clearance, and redirects back to the original URL…

REQUEST (#22)
GET /cdn-cgi/l/chk_jschl?jschl_answer=381&jschl_vc=2227c799ed495508dbc259c1fb59bc97&pass=1517897215.726-F3N87MTQOm HTTP/1.1

RESPONSE (#24)
Set-Cookie: cf_clearance=097eaac668c0eb0db0a3cceb265e6ef7ea0a384c-1517897216-10800; path=/; expires=Tue, 06-Feb-18 10:06:56 GMT; domain=.bittrex.com; HttpOnly
Server: cloudflare-nginx
CF-RAY: 3e8bed2597f14d40-PER
Location: http://socket-stage.bittrex.com/signalr/negotiate?connectionData=%5B%7B%22name%22%3A+%22coreHub%22%7D%5D&clientProtocol=1.5
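Going from the two Set-Cookie headers to the Cookie header of the next request is plain cookie handling: only the leading name=value item of each Set-Cookie is echoed back, while expires, path, domain and HttpOnly stay with the client. A minimal Python sketch of that step using the values captured above (cookie_pair is a hand-rolled helper for illustration; a real client would use a cookie jar that also honours domain, path and expiry):

```python
# Set-Cookie values copied from responses #14 and #24 above.
set_cookie_headers = [
    "__cfduid=df17ba99a887664411404c9f88347504f1517897211; expires=Wed, "
    "06-Feb-19 06:06:51 GMT; path=/; domain=.bittrex.com; HttpOnly",
    "cf_clearance=097eaac668c0eb0db0a3cceb265e6ef7ea0a384c-1517897216-10800; "
    "path=/; expires=Tue, 06-Feb-18 10:06:56 GMT; domain=.bittrex.com; HttpOnly",
]


def cookie_pair(set_cookie):
    """Return (name, value) from the leading name=value item of a Set-Cookie value."""
    first_item = set_cookie.split(";", 1)[0].strip()
    name, _, value = first_item.partition("=")
    return name, value


# Only the name=value pairs go back to the server in the Cookie header.
cookies = dict(cookie_pair(h) for h in set_cookie_headers)
cookie_header = "; ".join(f"{name}={value}" for name, value in cookies.items())
print("Cookie: " + cookie_header)
```

The pairs come out in a different order than in packet #26; servers are not supposed to rely on cookie order, so either works.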

As redirected, packet #26 requests the original URI again, but now including both cookies. The response, packet #29, is the SignalR protocol indicating that it’s ready to try connecting with WebSockets…

REQUEST (#26)
GET /signalr/negotiate?connectionData=%5B%7B%22name%22%3A+%22coreHub%22%7D%5D&clientProtocol=1.5 HTTP/1.1
Cookie: cf_clearance=097eaac668c0eb0db0a3cceb265e6ef7ea0a384c-1517897216-10800; __cfduid=df17ba99a887664411404c9f88347504f1517897211

RESPONSE (#29)
HTTP/1.1 200 OK  (application/json)
{  "Url":"/signalr",
   "ConnectionToken":"ulMFV4z5JG+xfSpMX3A4+S/Fa[...snip...]",
   "ConnectionId":"9317f009-fb2f-4904-84b6-679c40c8b23a",
   "KeepAliveTimeout":20.0,
   "DisconnectTimeout":30.0,
   "ConnectionTimeout":110.0,
   "TryWebSockets":true,
   "ProtocolVersion":"1.5",
   "TransportConnectTimeout":5.0,
   "LongPollDelay":0.0
}
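Telling the two outcomes apart is simple: the challenge arrives as a 503 HTML page (packet #14), while success is a 200 with a JSON body (packet #29). A hedged Python sketch of that check; negotiate_result is a hypothetical helper, and treating any non-200 status as “still challenged” is an assumption that fits this exchange, not a general rule:

```python
import json

# Body of response #29 (ConnectionToken shortened, as in the capture above).
BODY_29 = """{  "Url":"/signalr",
   "ConnectionToken":"ulMFV4z5JG+xfSpMX3A4+S/Fa[...snip...]",
   "ConnectionId":"9317f009-fb2f-4904-84b6-679c40c8b23a",
   "KeepAliveTimeout":20.0,
   "DisconnectTimeout":30.0,
   "ConnectionTimeout":110.0,
   "TryWebSockets":true,
   "ProtocolVersion":"1.5",
   "TransportConnectTimeout":5.0,
   "LongPollDelay":0.0
}"""


def negotiate_result(status, body):
    """Return the parsed SignalR negotiate reply, or None if Cloudflare intervened.

    Assumption: any non-200 status (such as the 503 challenge in packet #14)
    means we received the challenge page rather than the JSON reply.
    """
    if status != 200:
        return None
    return json.loads(body)


reply = negotiate_result(200, BODY_29)
print(reply["TryWebSockets"], reply["ProtocolVersion"])  # True 1.5
print(negotiate_result(503, "<html>...</html>"))         # None
```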

So if Pharo can somehow obtain those two cookies to enable it to receive a response like packet #29, then we’ll have successfully navigated through Cloudflare.

Implementation

Cloudflare provides DDoS protection against massive bot attacks, but obviously Bittrex doesn’t mind reasonably behaved programs connecting to it. After all, they provide an API for this. But it’s a hurdle we need to get over.

The dependencies of python-bittrex-websocket include Cloudflare-scrape, which in turn depends on nodejs to evaluate the JavaScript puzzle. But Pharo calling a Python library calling a JavaScript library seems a bit fragile. Also, now that we understand what is happening on the wire, there is no need to muck around in Pharo parsing the web page to extract the JavaScript challenge to pass to nodejs. Instead we should just use a nodejs library that does the whole thing and returns the keys we need, and cloudscraper looks like a reasonable candidate. Let’s start by trialling it from the shell. As a virgin nodejs user, I needed to start with something super simple to check that nodejs was installed…

nodejs -e "console.log(17+25)" 

==> 42

Cool, it’s ready to go. After some playing around, I found that the following produces a concise list of the headers we need…


$ npm install cloudscraper
$ nodejs -e \
   '  var cloudscraper = require("cloudscraper");
       cloudscraper.get("http://bittrex.com/",
         function(error, response, body) {console.log(body, response); });
   ' | grep '_header:' | sed 's:\\r\\n:\n:g'

==>
_header: 'GET / HTTP/1.1
User-Agent: Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36
Referer: http://bittrex.com/
cookie: __cfduid=d9d16a4714d2db938df32a7e50d1f24001517999469;
 cf_clearance=b68bda98b8f337de6b37f29ac2c2db831741b7d8-1517999475-10800
host: bittrex.com
Connection: close

Note that the User-Agent is important; as has been pointed out elsewhere, “You must use the same user-agent string for obtaining tokens and for making requests with those tokens, otherwise Cloudflare will flag you as a bot.”
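One way to honour that rule in code is to never store the cookies apart from the User-Agent that earned them. A small Python sketch of that design choice, using the values printed above (the Clearance class is hypothetical, not part of cloudscraper or Zinc):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clearance:
    """Hypothetical holder keeping the Cloudflare cookies and User-Agent together."""
    cfduid: str
    cf_clearance: str
    user_agent: str

    def headers(self):
        # Every request made with these tokens repeats the same User-Agent.
        return {
            "User-Agent": self.user_agent,
            "Cookie": f"__cfduid={self.cfduid}; cf_clearance={self.cf_clearance}",
        }


clearance = Clearance(
    cfduid="d9d16a4714d2db938df32a7e50d1f24001517999469",
    cf_clearance="b68bda98b8f337de6b37f29ac2c2db831741b7d8-1517999475-10800",
    user_agent="Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36",
)
print(clearance.headers())
```

Making the class frozen means the User-Agent cannot later be swapped out independently of the cookies it belongs to.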

To invoke that from Pharo we’ll use OSProcess (since it seems to have better cross-platform support than OSSubProcess). You can load it from the Pharo Catalog.

I haven’t used OSProcess before, so let’s try the simplest thing first…


(PipeableOSProcess command: 'echo hi there') output inspect

==> hi there

Yep! That works fine. Let’s try some simple nodejs…


(PipeableOSProcess command: 'nodejs -e "console.log(17+25)" ') output inspect

==> 42

Cool! Now let’s shoot for what we really need…


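"Run cloudscraper under nodejs and keep only the line containing the request headers.
 Single quotes inside a Smalltalk string literal are written doubled."
headers := (PipeableOSProcess waitForCommand:
    'nodejs -e ''var cloudscraper = require("cloudscraper");
        cloudscraper.get("http://bittrex.com", function(error, response, body)
        {console.log(body, response); }); '' | grep "_header:" '  ) output.
headers inspect.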

==>
_header: 'GET / HTTP/1.1\r\n
User-Agent: Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36\r\n
Referer: http://bittrex.com/\r\n
cookie: __cfduid=dacadf9197092e503974603ad61e934401518009298; cf_clearance=563d4d1e36594750d69500c49a8b6936b3188b86-1518009304-10800\r\n
host: bittrex.com\r\n
Connection: close\r\n\r\n',

Woo hoo! Now, continuing in the Playground, let’s extract our magic pass…


"subexpression: 1 is the whole match, so subexpression: 3 is the second group."
re := '.*(__cfduid=)([^;]*).*' asRegex.
re matchesPrefix: headers.
cfduid := re subexpression: 3.

"[^\\] stops at the first backslash, i.e. at the literal \r\n in the nodejs output."
re := '.*(cf_clearance=)([^\\]*).*' asRegex.
re matchesPrefix: headers.
cf_clearance := re subexpression: 3.

re := '.*(User-Agent\: )([^\\]*).*' asRegex.
re matchesPrefix: headers.
userAgent := re subexpression: 3.

{cfduid . cf_clearance . userAgent} inspect.
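For readers more at home outside Pharo, the same extraction in Python looks like this. It is a sketch over the header line captured above, where “\r\n” appears as literal backslash sequences because console.log escapes the string; re.search with one capture group per pattern plays the role of matchesPrefix: plus subexpression: 3, and grab is a made-up helper name.

```python
import re

# The "_header:" line as printed by nodejs above (literal backslash-r backslash-n).
headers = (
    r"_header: 'GET / HTTP/1.1\r\n"
    r"User-Agent: Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36\r\n"
    r"Referer: http://bittrex.com/\r\n"
    r"cookie: __cfduid=dacadf9197092e503974603ad61e934401518009298; "
    r"cf_clearance=563d4d1e36594750d69500c49a8b6936b3188b86-1518009304-10800\r\n"
    r"host: bittrex.com\r\n"
    r"Connection: close\r\n\r\n',"
)


def grab(pattern):
    """Return the first capture group of pattern, or None if it does not occur."""
    match = re.search(pattern, headers)
    return match.group(1) if match else None


cfduid = grab(r"__cfduid=([^;]*)")              # stop at the ';' separator
cf_clearance = grab(r"cf_clearance=([^\\]*)")   # stop at the literal backslash
user_agent = grab(r"User-Agent: ([^\\]*)")
print(cfduid, cf_clearance, user_agent, sep="\n")
```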

And let’s put that to use…


"Reuse the Cloudflare cookies together with the exact User-Agent that obtained them."
client := ZnClient new url: 'https://bittrex.com'.
cookieJar := client session cookieJar.
cookieJar add: ((ZnCookie name: '__cfduid' value: cfduid) domain: 'bittrex.com').
cookieJar add: ((ZnCookie name: 'cf_clearance' value: cf_clearance) domain: 'bittrex.com').
client headerAt: 'User-Agent' put: userAgent.
(response := client get) inspect.

==>
<head>
    <title>Bittrex.com - Bittrex, The Next Generation Digital Currency Exchange</title>

YES!!!! (*arms punch the sky*)
And now the kicker…


client := ZnClient new url: 'https://bittrex.com/signalr/negotiate'.
client queryAt: 'connectionData' put: '[{"name": "coreHub"}]'.
client queryAt: 'clientProtocol' put: '1.5'.
cookieJar := client session cookieJar.
cookieJar add: ((ZnCookie name: '__cfduid' value: cfduid) domain: 'bittrex.com').
cookieJar add: ((ZnCookie name: 'cf_clearance' value: cf_clearance) domain: 'bittrex.com').
client headerAt: 'User-Agent' put: userAgent.
(response := client get) inspect.

==>
{   "Url":"/signalr",
    "ConnectionToken":"UrYmSIBmuE0KzPGcD4[...snip...]",
    "ConnectionId":"781d997f-8d2a-498f-8045-771c11896db5",
    "KeepAliveTimeout":20.0,
    "DisconnectTimeout":30.0,
    "ConnectionTimeout":110.0,
    "TryWebSockets":true,
    "ProtocolVersion":"1.5",
    "TransportConnectTimeout":5.0,
    "LongPollDelay":0.0
}

That looks pretty good to me.
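For comparison, the same negotiate request can be prepared with Python’s standard library. Note that urlencode reproduces exactly the query string seen in packet #4. This sketch only builds the request object; calling urllib.request.urlopen on it would send it, but that needs fresh tokens, since the values below are copied from the captures above and are long expired. Also note that urllib stores header names capitalised as “User-agent”, which is harmless because HTTP header names are case-insensitive.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Same query parameters as the Zinc queryAt:put: calls above.
params = {"connectionData": '[{"name": "coreHub"}]', "clientProtocol": "1.5"}
url = "https://bittrex.com/signalr/negotiate?" + urlencode(params)

# Values extracted from the nodejs output earlier (expired by now).
user_agent = "Ubuntu Chromium/34.0.1847.116 Chrome/34.0.1847.116 Safari/537.36"
cookie = ("__cfduid=dacadf9197092e503974603ad61e934401518009298; "
          "cf_clearance=563d4d1e36594750d69500c49a8b6936b3188b86-1518009304-10800")

# Build, but do not send, the request.
request = Request(url, headers={"User-Agent": user_agent, "Cookie": cookie})
print(request.full_url)
print(request.header_items())
```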

Conclusion

Well, I feel that was a worthwhile journey to properly understand how Cloudflare works. In the end it’s almost too simple to warrant a separate package, but to pull it all together I’ve uploaded a minimal package, CloudflareUn, that can be used like this…


client := (CloudflareUn knockUrl: 'http://bittrex.com') client.
client url: 'https://bittrex.com/signalr/negotiate?connectionData=[{"name": "coreHub"}]&clientProtocol=1.5'.
(response := client get) inspect. 

==>
{   "Url":"/signalr",
    "ConnectionToken":"R+VIw4INdqVBV3r43rVn2gKI+yhqe[...snip...]",
    "ConnectionId":"fa8d0fc5-b8d0-4925-bc63-7aa8984b1f4d",
    "KeepAliveTimeout":20.0,
    "DisconnectTimeout":30.0,
    "ConnectionTimeout":110.0,
    "TryWebSockets":true,
    "ProtocolVersion":"1.5",
    "TransportConnectTimeout":5.0,
    "LongPollDelay":0.0
}
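The overall shape of such a helper can be sketched in Python too. This is not CloudflareUn’s code: knock_url, node_script and parse_clearance are hypothetical names, and the sketch assumes the “nodejs” executable and the cloudscraper npm package are installed, and that nodejs prints the “_header:” string on one line, as in the captures above. It runs the same JavaScript as the shell example and picks the tokens out with the same patterns as the Pharo regexes.

```python
import json
import re
import subprocess


def node_script(url):
    """JavaScript for nodejs -e, as in the shell example, with the URL embedded.

    json.dumps of a plain URL string is also a valid JavaScript string literal.
    """
    return ('var cloudscraper = require("cloudscraper"); '
            f"cloudscraper.get({json.dumps(url)}, "
            "function(error, response, body) { console.log(body, response); });")


def parse_clearance(output):
    """Pick the tokens out of the '_header:' line, like grep plus the Pharo regexes."""
    line = next((text for text in output.splitlines() if "_header:" in text), None)
    if line is None:
        raise ValueError("no _header: line in nodejs output")

    def grab(pattern):
        match = re.search(pattern, line)
        if match is None:
            raise ValueError(f"pattern not found: {pattern}")
        return match.group(1)

    return {
        "__cfduid": grab(r"__cfduid=([^;]*)"),
        "cf_clearance": grab(r"cf_clearance=([^\\]*)"),
        "User-Agent": grab(r"User-Agent: ([^\\]*)"),
    }


def knock_url(url, run=subprocess.run):
    """Hypothetical Python analogue of CloudflareUn's knockUrl:."""
    # An argv list needs no shell, so no nested quoting as in the Pharo version.
    result = run(["nodejs", "-e", node_script(url)],
                 capture_output=True, text=True, check=True)
    return parse_clearance(result.stdout)
```

With nodejs and network access, knock_url("http://bittrex.com") would return the two cookies plus the User-Agent. Passing the run function as a parameter is only there so the parsing can be exercised without nodejs.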

Your feedback and enhancements will be appreciated.


One Response to Pharo v. Cloudflare

  1. Pingback: Pharo v. Signalr | openInWorld
