Tutorial: install Varnish + VMODs from sources & build a Paywall


Varnish is well known as a highly effective caching proxy that can handle every part of an HTTP request or response (URLs, headers, cookies, …) using VCL (Varnish Configuration Language). Additional functionality can also be embedded in Varnish through VMODs. They extend VCL, which is kept simple for performance reasons and therefore quickly becomes limiting when you need more complex logic (loops, …). I recently worked on a Paywall, and using Varnish as its server-side part seemed like a good idea.

I’ve split this post into 2 main parts:

First, I will explain how to install Varnish 3.0.6 (for client compatibility reasons) from sources, along with 3 associated VMODs:

  • libvmod-curl (accesses HTTP and other resources from within VCL using libcurl),
  • libvmod-cookie (handles the content of the Cookie header without complex use of regular expressions),
  • and libvmod-redis (allows synchronous access to Redis from VCL).

I used Ubuntu Trusty (14.04) as the OS.

Second, I will explain how I used these tools to build a server-side Paywall, obviously without implementing all the real rules. Treat this post as an overall design and a proof of concept. The VCL configuration contains 2 simple rules based on an authorization service and a counter with a threshold. Throughout the post I provide the script I wrote to give you a full environment ready for testing. It configures Varnish with the rules explained below, installs an Apache server with mod PHP and a few PHP scripts (providing content and authorization logic), and installs a Redis server. After running it on Ubuntu Trusty (I used Vagrant with VirtualBox as the provider), you will be ready to play with the VCL however you like and to bring your own logic to the proxy cache.

Varnish Paywall

You may know the Varnish Paywall product from the .com website. After digging around a bit, I realized that using this module requires buying a « not so cheap » license, which also covers a lot of other modules we don’t need. I looked at how it works and concluded that it doesn’t do anything I can’t do myself with VCL and free modules (check each VMOD license in light of what you are building), as described in this post. So I decided to build a Varnish Paywall from scratch, and I must admit that it was not so difficult in the end.

I built Varnish 3.0.6 from source to get the headers required to build the VMODs. Remember that this is a proof of concept: in a production environment you will probably want to avoid having dev packages on your servers.

Let It Go! 🙂

Take a careful look at the comments in the code snippets. I’ve written a lot of tips in them for you:

  • I’ve noted all the problems (and the solutions :o)) I encountered, to help you deal with them in case you try the installation yourself,
  • In the VCL file snippet, I’ve explained the configuration I’ve set up, the features it provides to the Paywall and how you can improve them.

On a pristine Ubuntu Trusty VM, you can execute these snippets one after the other, or copy/paste them into a single file that you make executable. Make sure you are in the home directory of your current user, which must not be root but must be able to run sudo without a password (vagrant in my case).
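These prerequisites can be checked before anything is installed. The following is only a sketch: the function names are hypothetical and not part of the original script. It tests the three conditions mentioned above: non-root user, passwordless sudo, and the home directory as the working directory.

```shell
#!/bin/bash
# Preflight sketch (hypothetical helpers, not part of the original script).

# Succeeds when the given numeric UID is not root (UID 0).
is_non_root_uid() {
    [ "$1" -ne 0 ]
}

# Succeeds when sudo works without prompting for a password.
# "sudo -n" fails instead of prompting when a password would be required.
has_passwordless_sudo() {
    sudo -n true 2>/dev/null
}

# Succeeds when the current directory is the user's home directory.
in_home_dir() {
    [ "$(pwd -P)" = "$(cd "$HOME" && pwd -P)" ]
}

# Run all three checks and report the first one that fails.
preflight() {
    is_non_root_uid "$(id -u)" || { echo "Do not run as root" >&2; return 1; }
    has_passwordless_sudo      || { echo "Passwordless sudo is required" >&2; return 1; }
    in_home_dir                || { echo "Run from your home directory" >&2; return 1; }
}
```

Calling `preflight || exit 1` at the top of the install script would stop early instead of failing halfway through a build.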

I started the installation script with a standard init:

#!/bin/bash

# Author:  Frederic FAURE (frederic.faure@ysance.com)
# Company: YSANCE (http://www.ysance.com/)
# Date:    2015-02-25
# Version: 1.0

INSTALL_LOGFILE=~/install.log
echo " " >> $INSTALL_LOGFILE
echo "--------- `date` ----------" >> $INSTALL_LOGFILE
cd
sudo apt-get update

Then I installed Varnish 3.0.6 from sources with all the needed dependencies and tested it to make sure that it works (the result of the test is in the install.log file):

################################
#### Install Varnish
wget http://repo.varnish-cache.org/source/varnish-3.0.6.tar.gz
sudo apt-get install -y autotools-dev automake1.9 libtool autoconf libncurses-dev xsltproc groff-base libpcre3-dev pkg-config libjemalloc-dev libedit-dev
gunzip varnish-3.0.6.tar.gz
tar -xvf varnish-3.0.6.tar
cd varnish-3.0.6
sh autogen.sh
sh configure
make
make check
sudo make install
cd
if [ -f "/usr/local/sbin/varnishd" ]; then echo "varnishd installed in /usr/local/sbin" >> $INSTALL_LOGFILE; else echo "varnishd not installed" >> $INSTALL_LOGFILE; exit -1; fi
if [ -f "/usr/local/etc/varnish/default.vcl" ]; then echo "varnish default configuration installed in /usr/local/etc/varnish" >> $INSTALL_LOGFILE; else echo "varnish default configuration not installed" >> $INSTALL_LOGFILE; exit -1; fi
echo "Varnish version: `varnishd -V 2>&1 | grep varnishd`" >> $INSTALL_LOGFILE
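The same "does the file exist, log it, otherwise stop" check comes back after each build. It could be factored into a small helper. This is a sketch under assumptions: `check_installed` is a hypothetical name, and the log messages only mimic the ones used in this post.

```shell
#!/bin/bash
# Hypothetical helper: log whether a file exists; return non-zero when missing.
INSTALL_LOGFILE="${INSTALL_LOGFILE:-$HOME/install.log}"

# check_installed <path> <label>
check_installed() {
    local path="$1" label="$2"
    if [ -f "$path" ]; then
        echo "$label installed in $(dirname "$path")" >> "$INSTALL_LOGFILE"
    else
        echo "$label not installed" >> "$INSTALL_LOGFILE"
        return 1
    fi
}
```

Usage would look like `check_installed /usr/local/sbin/varnishd varnishd || exit 1`, which keeps the decision to abort in the calling script.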

Next I installed Apache and PHP5 as the backend, to serve the content and provide access to a (very simple) service that holds the authorization logic:

################################
#### Install Apache and Php5
sudo apt-get install -y php5
echo "Apache version: `apachectl -v 2>&1 | grep version | cut -d":" -f2`" >> $INSTALL_LOGFILE
echo "PHP version: `php -v 2>&1 | grep "(cli)" | cut -d" " -f2`" >> $INSTALL_LOGFILE

After that I installed Redis to store user status or session information on the server side. My goal was to extend what the VCL logic can do and to cache the authorization information obtained from the authorization service for faster execution (this caching is not implemented in the proof of concept):

################################
#### Install Redis
sudo apt-get install -y redis-server
echo "Redis version: `redis-server -v 2>&1 | cut -d" " -f3 | cut -d"=" -f2`" >> $INSTALL_LOGFILE

Following that, I installed the Varnish modules. I began with the cURL VMOD, which allows calls to HTTP services in order to externalize logic that VCL cannot hold, either because of VCL limitations (no loops, …) or because the data needed to make decisions is stored elsewhere:

################################
#### Install VMOD cURL
sudo apt-get install -y git
git clone https://github.com/varnish/libvmod-curl
# When "sh configure" => error: Package requirements (libcurl) were not met: No package 'libcurl' found
# Solution => install "libcurl4-openssl-dev" - development files and documentation for libcurl (OpenSSL flavour)
# When "make" => You need rst2man installed to make dist
# Solution => install "python-docutils" - text processing system for reStructuredText (implemented in Python 2)
sudo apt-get install -y curl libcurl4-openssl-dev python-docutils
cd libvmod-curl
git checkout 3.0
git pull origin 3.0
# When "sh autogen.sh" => required file `./ltmain.sh' not found, Can't open configure
# Reason => http://www.gnu.org/software/automake/manual/html_node/Error-required-file-ltmain_002esh-not-found.html
# Solution => run "sh autogen.sh" twice
sh autogen.sh
sh autogen.sh
sh configure VARNISHSRC=$HOME/varnish-3.0.6
make
# When "make check" => Message from VCC-compiler: Could not load module curl /src/.libs/libvmod_curl.so: cannot open shared object file: No such file or directory
# Reason => variable in code 'import curl from "${vmod_topbuild}/src/.libs/libvmod_curl.so";' is not well replaced (empty): 'import curl from "/src/.libs/libvmod_curl.so";'
# Solution => put the "make check" into comments
# make check
sudo make install
cd
if [ -f "/usr/local/lib/varnish/vmods/libvmod_curl.so" ]; then echo "libvmod_curl installed in /usr/local/lib/varnish/vmods" >> $INSTALL_LOGFILE; else echo "libvmod_curl not installed" >> $INSTALL_LOGFILE; exit -1; fi

Then I needed the Cookie VMOD to deal with cookies more efficiently. Varnish can handle cookies natively, but at the price of complex regexps, and I didn’t want to lose the main goal of the VCL logic in verbose side-code:

################################
#### Install VMOD Cookie
git clone https://github.com/lkarsten/libvmod-cookie
cd libvmod-cookie
# When "make" => No rule to make target `@VMODTOOL@', needed by `vcc_if.c'.
# Reason => libvmod-cookie split into 2 branches as of Varnish 4 => branch 4.0 is now the default
# Solution => Get the right Git branch and not try to build module v4 on a Varnish v3 => "git checkout 3.0"
git checkout 3.0
git pull origin 3.0
# Same as VMOD cURL
sh autogen.sh
sh autogen.sh
sh configure VARNISHSRC=$HOME/varnish-3.0.6
make
# Same as VMOD cURL
# make check
sudo make install
cd
if [ -f "/usr/local/lib/varnish/vmods/libvmod_cookie.so" ]; then echo "libvmod_cookie installed in /usr/local/lib/varnish/vmods" >> $INSTALL_LOGFILE; else echo "libvmod_cookie not installed" >> $INSTALL_LOGFILE; exit -1; fi

Finally, as far as modules go, I needed to reach Redis, both to store users’ browsing information (global or « by topic » view count, …) used to make decisions and to cache authorization data obtained from HTTP services for faster execution:

################################
#### Install VMOD Redis
git clone https://github.com/carlosabalde/libvmod-redis
# When "sh configure" => configure: error: libvmod-redis requires libhiredis.
# Solution => install "libhiredis0.10" - minimalistic C client library for Redis
#                     "libhiredis-dev" - minimalistic C client library for Redis (development files)
sudo apt-get install -y libhiredis0.10 libhiredis-dev
cd libvmod-redis
git checkout 3.0
git pull origin 3.0
# Same as VMOD cURL
sh autogen.sh
sh autogen.sh
sh configure VARNISHSRC=$HOME/varnish-3.0.6
make
# Same as VMOD cURL
# make check
sudo make install
cd
if [ -f "/usr/local/lib/varnish/vmods/libvmod_redis.so" ]; then echo "libvmod_redis installed in /usr/local/lib/varnish/vmods" >> $INSTALL_LOGFILE; else echo "libvmod_redis not installed" >> $INSTALL_LOGFILE; exit -1; fi
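The three VMOD builds follow the same sequence, so they could be wrapped in one function. The sketch below is an assumption-laden illustration, not the script I actually ran: `run`, `build_vmod` and the `DRY_RUN` variable are hypothetical names, and the dry-run mode only prints the commands so the sequence can be reviewed without touching the system.

```shell
#!/bin/bash
# Hypothetical wrapper around the build steps repeated for each VMOD.
# With DRY_RUN=1 the commands are printed instead of executed.

run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# build_vmod <git-url> <branch> <varnish-source-dir>
build_vmod() {
    local url="$1" branch="$2" varnish_src="$3"
    local dir
    dir="$(basename "$url" .git)"

    run git clone "$url" || return 1
    (
        # The subshell keeps the caller's working directory unchanged.
        run cd "$dir" || exit 1
        run git checkout "$branch" || exit 1
        run git pull origin "$branch" || exit 1
        # autogen.sh is run twice, as in the snippets above (ltmain.sh workaround);
        # the status of the first pass is deliberately ignored.
        run sh autogen.sh
        run sh autogen.sh || exit 1
        run sh configure "VARNISHSRC=$varnish_src" || exit 1
        run make || exit 1
        run sudo make install
    )
}
```

For example, setting `DRY_RUN=1` and calling `build_vmod https://github.com/carlosabalde/libvmod-redis 3.0 "$HOME/varnish-3.0.6"` prints the nine commands in order. The package prerequisites (libcurl, hiredis, python-docutils) still have to be installed separately for each module.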

Next, I created the 2 PHP files. The first one generates content (full or truncated) depending on the X-View-Authorized header in the request. It simulates the backend, probably a CMS managing your content. The second file simulates an authorization service. It merely tests whether the user in the POST request is « bob »! If yes, access is granted; if not… You are not bob: (hit_for_) « pass » on your way! ;o)

################################
#### Create PHP files

#### Content generator (full or truncated)
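# The delimiter is quoted ('EOF') so that the shell does not expand the PHP variables ($name, $value).
cat <<'EOF' > content.php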
<?php
foreach (getallheaders() as $name => $value) {
    if ($name == "X-View-Authorized") {
        if ($value == "1") {
            echo "<p>I am a long<br/>long<br/>long<br/>long<br/>long<br/>long<br/>long<br/>long<br/>text !</p>";
        } else {
            echo "<p>I am a long<br/>long<br/>...</p>";
        }
    }
}
?>
EOF
sudo cp content.php /var/www/html/content.php

#### Authorize to access full content or not
# Quoted delimiter again: $_POST must reach the PHP file untouched.
cat <<'EOF' > authorize.php
<?php
if ($_POST["user"] == "bob") {
    echo "1";
} else {
    echo "0";
}
?>
EOF
sudo cp authorize.php /var/www/html/authorize.php
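Since the PHP sources contain `$` variables, the way the heredoc delimiter is quoted matters when a shell script generates them. Here is a small self-contained sketch of the difference; the file names and the temporary directory are only for illustration.

```shell
#!/bin/bash
# Sketch: how heredoc quoting affects "$" in generated PHP.
workdir="$(mktemp -d)"
unset value   # make sure the shell variable is not defined

# Unquoted delimiter: the shell expands $value before PHP ever sees it.
cat <<EOF > "$workdir/unquoted.php"
<?php echo $value; ?>
EOF

# Quoted delimiter: the body is written verbatim.
cat <<'EOF' > "$workdir/quoted.php"
<?php echo $value; ?>
EOF

cat "$workdir/unquoted.php"   # prints: <?php echo ; ?>
cat "$workdir/quoted.php"     # prints: <?php echo $value; ?>
```

Only the quoted form produces valid PHP, so a generated file is worth checking with `cat` before copying it to the web root.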

Now the main dish! I configured the VCL file which deals with HTTP requests, backend responses and cache responses. The VCL contains the logic of the Paywall. Have a close look at the comments: a lot of details are given in them.

################################
#### Create VCL file
cat <<'EOF' > my_project.vcl
backend default {
    .host = "127.0.0.1";
    .port = "80";
}

import std;
import curl;
import cookie;
import redis;

sub vcl_init {
    # init(STRING tag, STRING location, INT timeout, INT ttl, INT retries, BOOL shared_contexts, INT max_contexts)
    redis.init("main", "127.0.0.1:6379", 500, 0, 0, false, 1);
}

sub vcl_recv {
    # Normalize "Accept-Encoding" to reduce "Vary".
    # Do this only once per request.
    if (req.restarts == 0) {
        if (req.http.Accept-Encoding) {
            # Make sure Internet Explorer 6 doesn't need to deal with compression (it's notoriously bad at it).
            if (req.http.User-Agent ~ "MSIE 6") {
                unset req.http.Accept-Encoding;
            # No point in compressing these.
            } elsif (req.url ~ "\.(jpg|png|gif|gz|tgz|bz2|tbz|mp3|ogg)$") {
                unset req.http.Accept-Encoding;
            # If "Accept-Encoding: gzip,deflate" or "Accept-Encoding: deflate,gzip": a preference for gzip,
            # therefore this case first.
            } elsif (req.http.Accept-Encoding ~ "gzip") {
                set req.http.Accept-Encoding = "gzip";
            # Add support for some ancient HTTP clients.
            } elsif (req.http.Accept-Encoding ~ "deflate") {
                set req.http.Accept-Encoding = "deflate";
            } else {
                unset req.http.Accept-Encoding;
            }
        }
    }

    # Deny access by default. This also overwrites any "X-View-Authorized" header crafted by the client.
    set req.http.X-View-Authorized = 0;
    # Get the cookies from the request.
    cookie.parse(req.http.cookie);

    # Ask an external PHP code for the authorization logic.
    #
    # Obviously, in a real setup you will have a token (instead of the username) generated by the external PHP
    # service (which handles the identification and authorization logic), either at login time for authenticated
    # users or the first time a client with no "token" cookie tries to view content (anonymous client).
    # So this token may or may not be linked to a real user, depending on whether the client is authenticated.
    # Once a client is identified with a token, the "authenticated or not" information should be stored in
    # another cookie to avoid asking about authorization on each call:
    #   - if authenticated, the rights could be stored in Redis for faster access:
    #       o try to get the rights from Redis,
    #       o if not stored there, ask the external PHP service, and then store them in Redis.
    #   - if not authenticated, there are no rights, so no need to spend time asking the external service or Redis.
    curl.post("http://127.0.0.1:80/authorize.php", "user=" + cookie.get("mp_user"));
    # Set the answer in the request "X-View-Authorized" header
    set req.http.X-View-Authorized = curl.body();
    curl.free();

    # If still not authorized, because the client has no right (when authenticated) or because
    # he is anonymous (not authenticated), he has 10 free views
    if (std.integer(req.http.X-View-Authorized, 0) == 0) {
        # Increments the number stored at key by one. If the key does not exist, it is set to 0 before performing the operation.
        redis.command("INCR");
        redis.server("main");
        redis.push(cookie.get("mp_user")+":view_count");
        redis.execute();
        set req.http.X-View-Count = redis.get_integer_reply();
        # In this case, we count each content access, even if the same content is accessed each time.
        # In a real case, we probably want to grant access to 10 different contents, and with an expiry (for
        # example 10 free contents per month). Easy with Redis! We can use Sets:
        #   - Store items with "SADD key member": SADD <user/token>:view_count <content_id>. The <content_id> can be sent
        #     in a request header.
        #   - First, test whether the content has already been viewed: SISMEMBER <user/token>:view_count <content_id>
        #     (returns 1 if the element is a member of the set, or 0 if it is not a member or if the key
        #     does not exist). If you get "1", the content has already been viewed: grant the right.
        #   - Otherwise, test the Set size: SCARD <user/token>:view_count (returns the set cardinality, i.e. the number
        #     of elements of the set stored at key, or 0 if the key does not exist). If less than 10: grant the right and
        #     record (SADD) the hit on the content. Moreover, if 0 (the key did not exist): set the expiry (one month from
        #     now) with EXPIRE <user/token>:view_count <seconds> (sets a timeout on the key: after the timeout has
        #     expired, the key is automatically deleted).
        if (std.integer(req.http.X-View-Count, 0) < 11) {
            set req.http.X-View-Authorized = 1;
        }
    }
    # Force caching by deleting all cookies
    unset req.http.cookie;
}

sub vcl_fetch {
    # Vary the content on 2 axes: encoding and authorization. Thanks to the normalization of the "Accept-Encoding" header,
    # we only have a few different "versions" of the same cached object: "full and gzipped", "full and deflated",
    # "truncated and gzipped", ...
    set beresp.http.Vary = "Accept-Encoding, X-View-Authorized";
    # Set the cache TTL to 1 minute. If you modify the content returned (in "content.php" file), it will take up to 1 minute
    # for a client's call to get the change.
    set beresp.ttl = 60 s;
    # If the "X-View-Count" header is set in the request and the right to access the content has been granted (based on this
    # information), we send it back to the client so that it can work out how many free contents remain.
    # Be careful: this information is cached with the page, so it is shared by all clients' requests for 1 minute...
    # See "vcl_deliver" for the trick.
    # Tip: use the "X-Varnish" response header to know whether you got the page from cache or not (and to check that the
    # behaviour is the one you expect). For a cache hit, "X-Varnish" will contain both the ID of the current request and
    # the ID of the request which populated the cache. Therefore:
    #   - X-Varnish:752368328 752368327 => got the page from cache,
    #   - X-Varnish:752368328           => got the page from backend.
    if (req.http.X-View-Count && std.integer(req.http.X-View-Authorized, 0) == 1) {
        set beresp.http.X-View-Count = req.http.X-View-Count;
    }
}

sub vcl_deliver {
    # Unset this dynamic header and set it to the right value if needed before sending the cached page to the client.
    unset resp.http.X-View-Count;
    if (req.http.X-View-Count && std.integer(req.http.X-View-Authorized, 0) == 1) {
        set resp.http.X-View-Count = req.http.X-View-Count;
    }
}
EOF
sudo cp my_project.vcl /usr/local/etc/varnish/my_project.vcl
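The Set-based variant described in the VCL comments (SISMEMBER, then SCARD, then SADD and EXPIRE) is not implemented in the proof of concept. To make the decision flow concrete, here it is sketched outside of VCL, as a shell function. Everything in it is an assumption for illustration: the `metered_access` name, the use of `redis-cli` as the client, and the 30-day window. Note also that the sequence is not atomic, so concurrent requests for the same user could slightly overshoot the threshold.

```shell
#!/bin/bash
# Sketch of the Set-based metering described in the VCL comments.
# REDIS is the command used to talk to Redis; "redis-cli" is an assumption
# and can be swapped (e.g. for a stub when testing without a server).
REDIS="${REDIS:-redis-cli}"
FREE_VIEWS=10
WINDOW_SECONDS=$((30 * 24 * 3600))   # roughly one month

# metered_access <user-or-token> <content-id>
# Prints 1 when the full content may be served, 0 otherwise.
metered_access() {
    local key="$1:view_count" content_id="$2" seen count

    seen="$($REDIS SISMEMBER "$key" "$content_id")"
    if [ "$seen" = "1" ]; then
        echo 1            # already viewed: grant again without counting
        return
    fi

    count="$($REDIS SCARD "$key")"
    if [ "$count" -lt "$FREE_VIEWS" ]; then
        $REDIS SADD "$key" "$content_id" > /dev/null
        # First view in this window: start the expiry timer.
        if [ "$count" -eq 0 ]; then
            $REDIS EXPIRE "$key" "$WINDOW_SECONDS" > /dev/null
        fi
        echo 1
    else
        echo 0
    fi
}
```

With a Redis server running locally, `metered_access alice article-42` prints 1 for the first ten distinct articles and 0 for the eleventh, while articles that were already read stay accessible.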

Final steps. I checked that the Apache and Redis services had started up properly, so that they could respond to Varnish:

################################
#### Check Apache and Redis Servers are started
#### Should be because they are managed by services
if [ -z "`ps aux | grep "/usr/sbin/apache2" | grep -v grep`" ]; then echo "Apache Server NOT started" >> $INSTALL_LOGFILE; exit -1; else echo "Apache Server started" >> $INSTALL_LOGFILE; fi
if [ -z "`ps aux | grep "/usr/bin/redis-server" | grep -v grep`" ]; then echo "Redis Server NOT started" >> $INSTALL_LOGFILE; exit -1; else echo "Redis Server started" >> $INSTALL_LOGFILE; fi

And last of all, I started the Varnish server itself. Here you are:

################################
#### Start Varnish Server
sudo varnishd -f /usr/local/etc/varnish/my_project.vcl -s malloc,128M -T 127.0.0.1:2000 -a 0.0.0.0:8080
if [ -z "`ps aux | grep "varnishd" | grep -v grep`" ]; then echo "Varnish Server NOT started" >> $INSTALL_LOGFILE; exit -1; else echo "Varnish Server started" >> $INSTALL_LOGFILE; fi

You’re now able to use the installed components to run some tests:

  • Have a look at the install.log file, at the same level as the install script: you should find « Varnish Server started » at the end.
  • Open a browser and set a cookie called mp_user (it is required) using a plugin like EditThisCookie (for Chrome). Set « bob » as the value if you want to always be authorized (or anything else if you don’t), set the IP of your VM as the domain, and set ‘/’ as the path.
  • Use the URL http://192.168.33.10:8080/content.php to access the content through Varnish (replace the IP with that of your VM):
    • if you are bob, you always have access to the full content,
    • if you are another user, you are granted 10 accesses to the full content. Note that the number of already-viewed contents is in the response headers, for display if needed (for example a bar with « You have read 5 of 10 free articles this month. »), even if the content is cached. Switch the value of the mp_user cookie from one user to another and see what you get: you always get the right number of already-accessed contents, even if the content comes from the cache (60s TTL). When the threshold is exceeded, you get the truncated content.
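The same tests can be run from a terminal instead of a browser. The helpers below are a sketch: the function names are hypothetical, and the URL is the example address used in this post and has to be adapted. They fetch the response headers with curl for a given mp_user value, then read the X-Varnish and X-View-Count headers explained in the VCL comments.

```shell
#!/bin/bash
# Sketch for command-line testing; the URL below is the example address
# used in this post and must be adapted to your VM.
PAYWALL_URL="${PAYWALL_URL:-http://192.168.33.10:8080/content.php}"

# Fetch only the response headers for a given mp_user cookie value.
fetch_headers() {
    curl -s -o /dev/null -D - --cookie "mp_user=$1" "$PAYWALL_URL"
}

# Read response headers on stdin and print "hit" when X-Varnish carries
# two request IDs, "miss" when it carries one.
cache_status() {
    awk '{ line = $0; sub(/\r$/, "", line) }
         tolower(line) ~ /^x-varnish:/ {
             sub(/^[^:]*:[ \t]*/, "", line)
             sub(/[ \t]+$/, "", line)
             n = split(line, ids, /[ \t]+/)
             print (n >= 2 ? "hit" : "miss")
             found = 1
         }
         END { if (!found) print "unknown" }'
}

# Read response headers on stdin and print the X-View-Count value, if any.
view_count() {
    awk '{ line = $0; sub(/\r$/, "", line) }
         tolower(line) ~ /^x-view-count:/ {
             sub(/^[^:]*:[ \t]*/, "", line)
             print line
         }'
}
```

For instance, `fetch_headers alice | view_count` shows the counter for alice, and `fetch_headers bob | cache_status` tells whether the page came from the cache. Keep in mind that every call made as a non-bob user counts as one view.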

As you can imagine, the real Paywall configuration I set up contains a few « trickier » rules. I can handle all of them with this installation (plus a little client-side JavaScript and libvmod-dns to double-check the bots), but presenting them all is not the purpose of this post, and in any case the rules depend on the specific needs of the client.

I hope this tutorial will help you take your first steps into the world of Varnish and Paywalls!

Frédéric FAURE @Twitter @Ysance

Thanks to Online syntax highlighter like TextMate for the syntax highlighting.
