The Legend of Korra - a damn good cartoon. posted on 26 December 2014

I just watched the four seasons of The Legend of Korra, a sequel to Avatar: The Last Airbender, and it is a damn good cartoon.

The writers did a tremendous job. It felt a bit special (strange?) to me at the beginning, not because Korra, the main character, is a heroine, but because she is a person looking for and forging her own path. The process is messy, and there is no clear distinction between good and evil, unlike in the cartoons and anime I grew up with. This ambiguity probably made the cartoon more interesting, as it was slightly harder to guess how things would turn out.

That being said, the story is well written and I quickly became addicted. As I grew attached to the characters, the cartoon dealt with more and more sensitive topics like social inequalities, war, post-traumatic stress disorder and eventually LGBT issues.

While I do not expect children to read too much between the lines, or one day quote The Legend of Korra in an argument about social inequalities, this cartoon is still a first exposure for them to some of today's sensitive topics. Hopefully it will help them open their minds.

It was definitely worth my time. Bryan Konietzko's Tumblr is also quite interesting to read.

Thinky and its Raison d'être! posted on 20 December 2014

Disclaimer: This post reflects my opinion and just mine. I also used to work at RethinkDB.

I am the author of thinky. For those not familiar with it, thinky is a Node.js ORM for RethinkDB. I never really took the time to write about the philosophy behind thinky. Since I have some time during these holidays, here we are.

A bit about RethinkDB

NoSQL databases come in multiple flavors, and describing all of them is out of this article's scope. However, talking about thinky's philosophy without describing RethinkDB would be pretty hard. RethinkDB wrote an interesting blog post about where it stands. While I agree with their post, I think they don’t place enough emphasis on how developer friendly RethinkDB is.

  • RethinkDB is schemaless. A schemaless database is like a dynamic programming language; they are both easier to learn and allow for faster bootstrapping. Additionally, changing the format of your data doesn’t require any migration.

  • ReQL (RethinkDB Query Language) is embedded in the host language, meaning no more SQL strings or MongoDB JSON objects to build. In JavaScript, you end up with a chainable query language:

    var promise = r.table("users").get("67dc69ae-e235-4f55-a71b-6b87fe4df894")
        .update({name: "Michel"}).run(connection);
    
  • RethinkDB has efficient and distributed server-side joins. Nested structures are a poor answer for many-to-many relations, a situation that appears as soon as you try to model a social network with users having friends.

  • RethinkDB can set up shards and replicas with just a few clicks on a gorgeous and friendly web interface that natively ships with the server.

  • RethinkDB provides an easy way to push changes to clients -- broadcasting all the changes on the table data is as simple as this:

    var sockets = []; // All your SockJS connections
    r.table("data").changes().run(connection).then(function(feed) {
      feed.each(function(change) {
        var message = JSON.stringify(change.new_val);
        for(var i=0; i<sockets.length; i++) {
          sockets[i].write(message);
        }
      })
    })
    

    While Meteor and Firebase are hooked on MongoDB operation logs and Asana built their famous Luna Framework on top of Kraken (their distributed pubsub server), it is hard and complicated to build such systems. RethinkDB provides this real-time feature at no additional cost, without locking you to a whole stack.

What does thinky do?

Thinky works in harmony with RethinkDB to provide a frictionless experience for the developer. This is done by automating and reducing the work required for common operations.

  • Thinky validates your data before saving it. Flexible schemas mean faster iterations but corrupted data is any engineer’s worst nightmare; thinky does all the work to make sure that you only save valid documents. It can also easily generate default values for you.

  • Thinky handles connections under the hood in an optimal way. There’s no need for middleware to open/close connections, and no need for listeners to handle network errors:

    Users.get("67dc69ae-e235-4f55-a71b-6b87fe4df894").run().then(function(user) {
      // do something with `user`
    }).error(...)
    
  • Define the relations once and automatically save/retrieve joined documents with the simple command, getJoin.

    var Post = thinky.createModel("Post", { id: String, title: String, content: String, idAuthor: String }); 
    var Author = thinky.createModel("Author", { id: String, name: String });
    Post.belongsTo(Author, "author", "idAuthor", "id");
    //                       |-> key where the joined document will be stored
    //                                  |-> left key
    //                                            |-> right key
    
    Post.get("67dc69ae-e235-4f55-a71b-6b87fe4df894").getJoin().run().then(function(post) {
      // post will have a field "author" that maps to its author.
    }).error(...);
    
  • Thinky encompasses all of ReQL’s powerful features. Anonymous functions that get serialized and sent to the server are still available, as are inner queries.

    // Return the ids of a user's grown-up friends.
    Users.get("67dc69ae-e235-4f55-a71b-6b87fe4df894")("friend_ids")
        .filter(function(friend_id) {
          // `friend_id` is the variable of the inner query, not a string literal
          return Users.get(friend_id)("age").gt(18);
        }).execute().then(function(friend_ids) {
          // ...
        }).error(...)
    
  • Thinky automatically creates tables for you. Spend time on things that matter, your code, not operations.
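
To make the validation and default-value points above more concrete, here is a minimal sketch in plain JavaScript of what checking a document against a schema and filling in defaults can look like. It is not thinky's implementation: the schema shape and the `applyDefaults` and `validate` helpers are hypothetical names chosen for this illustration.

```javascript
// Hypothetical schema: each field declares a type and an optional default.
var schema = {
  id: { type: "string" },
  name: { type: "string" },
  age: { type: "number", default: 18 }
};

// Fill in missing fields that declare a default value.
// A default can be a plain value or a function of the document.
function applyDefaults(doc, schema) {
  Object.keys(schema).forEach(function(field) {
    var def = schema[field].default;
    if (doc[field] === undefined && def !== undefined) {
      doc[field] = (typeof def === "function") ? def(doc) : def;
    }
  });
  return doc;
}

// Throw if a field that is present does not match its declared type.
function validate(doc, schema) {
  Object.keys(doc).forEach(function(field) {
    if (schema[field] === undefined) return; // extra fields are tolerated in this sketch
    if (typeof doc[field] !== schema[field].type) {
      throw new Error("Value for [" + field + "] must be a " + schema[field].type + ".");
    }
  });
  return doc;
}
```

Running both steps before every write is what keeps a flexible schema from turning into corrupted data: a document such as `{ id: 1 }` is rejected before it ever reaches the database.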

What does thinky not do?

Thinky is built as a genuinely useful library. It does not try to do what it cannot, or give the user the illusion of a feature when it is not safe.

  • Thinky does not provide unique secondary indexes. To the best of my knowledge, no database supports such a feature in a distributed scenario.

  • Thinky does not provide transactions as RethinkDB only provides atomicity per document.

Conclusion

Should you use RethinkDB?
There are a few use cases where you are better off with another database.

  • RethinkDB does not support transactions. If you need strong consistency (e.g. you are building a banking system), use a SQL database with ACID properties like PostgreSQL (or maybe FoundationDB, though it is not open source).
  • Flexible schemas come with a price. Documents are not stored as they are in a column oriented database. In some cases, you may be better off with something like HBase.
  • Operations on large clusters for RethinkDB do not properly scale; this seems to be fixed and should be released in the next version. Small clusters are pretty stable and the majority of you probably do not need bigger clusters.
  • If you only need a key-value store, in my opinion Cassandra is a pretty good one.

That being said, if you are building a common web application (e.g. Yelp, Feedly, etc.), you probably have a lot to gain from using RethinkDB.

Should you use thinky?
If you use RethinkDB and Node.js, yes. Thinky works in harmony with RethinkDB to make writing code easier and faster by doing common things like validation.

It is easy to learn if you know ReQL since the syntaxes are almost the same. Despite being simple, thinky is a powerful library.

Give it a shot, and if you have feedback/suggestions, open an issue on GitHub, ping me on Twitter via @neumino, or shoot me an email at [email protected].

Thinky 1.15.1 and a sneak peek at the future posted on 12 November 2014

Thinky 1.15.1 just got released! Sorry for the wait, I was quite busy these last weeks. Here is what comes with this new release:

  • More readable code

    • Running tests now takes about 30 seconds instead of 3 minutes. This is mostly because tests now re-use the same tables.
    • Some big and cumbersome methods, like save and hooks, have been refactored into smaller ones.
  • Better performance: a simple benchmark that creates/validates documents runs about 3 times faster.

    • The code was optimized for v8 try/catch, bind, etc.
    • Generating virtual fields does not require a full traversal of a document anymore; the complexity is now O(n) where n is the number of virtual fields. The same goes for fields with default values. This was first mentioned by @colprog via pull request 139.
    • Less copying for documents and options.
  • Some bug fixes:

    • #134 -- save should not remove some foreign keys
    • #137 -- enforce_extra: "remove" should keep the extra fields out of the database, not just hide them.
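
The virtual-field optimization listed above can be pictured as follows: the paths of the virtual fields are collected once, when the model is defined, so that creating a document only visits those paths instead of walking the whole document. This is an illustration of the idea, not thinky's actual code; the `_virtual` marker and the `collectVirtualPaths` and `generateVirtualFields` helpers are hypothetical.

```javascript
// Hypothetical schema format: a virtual field is marked with `_virtual`
// and carries the function that computes its value.
var schema = {
  firstName: String,
  lastName: String,
  fullName: { _virtual: function(doc) { return doc.firstName + " " + doc.lastName; } },
  address: {
    city: String,
    zip: String,
    label: { _virtual: function(doc) { return doc.address.city + " " + doc.address.zip; } }
  }
};

// Done once per model: walk the schema and remember where virtual fields live.
function collectVirtualPaths(schema, prefix, result) {
  prefix = prefix || [];
  result = result || [];
  Object.keys(schema).forEach(function(key) {
    var value = schema[key];
    if (value !== null && typeof value === "object") {
      if (typeof value._virtual === "function") {
        result.push({ path: prefix.concat(key), compute: value._virtual });
      } else {
        collectVirtualPaths(value, prefix.concat(key), result);
      }
    }
  });
  return result;
}

// Done once per document: only the recorded virtual paths are visited.
function generateVirtualFields(doc, virtualPaths) {
  virtualPaths.forEach(function(virtual) {
    var target = doc;
    for (var i = 0; i < virtual.path.length - 1; i++) {
      if (target[virtual.path[i]] === undefined) return; // parent is missing, skip
      target = target[virtual.path[i]];
    }
    target[virtual.path[virtual.path.length - 1]] = virtual.compute(doc);
  });
  return doc;
}
```

The cost per document then depends on the number of virtual fields (and the depth of their paths), not on how many regular fields the document holds.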

One important note is that now, we have the following behavior:

var values = {name: "Michel"};
var user = new User(values);
assert.strictEqual(user, values); // This used to be false before

// But do not worry, this still holds true
var userCopy = new User(values);
assert.notStrictEqual(user, userCopy);
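
For readers wondering how both assertions can hold at once, here is one way such semantics could be obtained in plain JavaScript. This is only a guess at a possible mechanism, written for illustration; it is not taken from thinky's source, and the `User` constructor below is a hypothetical stand-in for a thinky model.

```javascript
// Hypothetical stand-in for a model constructor.
function User(values) {
  // If `values` is already a User, work on a shallow copy so that a second
  // call does not hand back the very same document.
  var doc = (values instanceof User) ? Object.assign({}, values) : values;
  // Turn the object into a document in place instead of copying its fields.
  Object.setPrototypeOf(doc, User.prototype);
  // Returning an object from a constructor overrides the default `this`.
  return doc;
}

User.prototype.greet = function() {
  return "Hello " + this.name;
};
```

Reusing the object that was passed in is what avoids a copy on the first call, which fits the "less copying" item in the release notes.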

About the next steps, there are two interesting things coming:

  • Introduce a new way to declare schemas, something similar to hapi/joi, and unify the schema under the hood. This should eventually provide slightly better performance.
  • RethinkDB 1.16 is close, and it introduces point change feeds, meaning that thinky will be able to provide a document that automatically updates itself and emits an event whenever it changes.
    This will probably come with a light module that will automatically hook thinky to sockjs. This is going to be pretty cool, trust me :)

Feedback/suggestions? Ping me on Twitter via @neumino or shoot me an email at [email protected].

RethinkDB and CoreOS: Navigating Digital Ocean Together posted on 03 October 2014

A new generation of databases is now sailing on new seas such as Digital Ocean, transported in Docker containers, and steered by CoreOS.

CoreOS is a terrific tool to deploy applications on multiple servers. However, running a RethinkDB cluster on CoreOS is a bit more complicated than running multiple Nginx servers with a load balancer since a RethinkDB instance must be given at least one server to join. This article illustrates one way, hopefully the right way, to do it on Digital Ocean.

First, set up a fleet (or sub-fleet) of CoreOS machines; in this example, we will suppose that we are building a 6-server RethinkDB cluster. If you have never done it before, begin by getting a discovery token on https://discovery.etcd.io/new.

Then boot a few CoreOS instances with the following cloud-config file and your ssh keys. Make sure that you enable private networking on Digital Ocean as RethinkDB does not provide encryption/security for cluster traffic yet.

#cloud-config

coreos:
  etcd:
    discovery: https://discovery.etcd.io/<token>
    addr: $private_ipv4:4001
    peer-addr: $private_ipv4:7001
  fleet:
    public-ip: $private_ipv4   # used for fleetctl ssh command
    metadata: group=rethinkdb
  units:
    - name: etcd.service
      command: start
    - name: fleet.service
      command: start

Note that these machines will be tagged with the metadata group=rethinkdb. I personally appreciate being able to group my servers depending on what their responsibilities are (or will be).

To create a RethinkDB cluster, we need to start the instances with the argument --join host:port where another instance of RethinkDB will be running. We will first create a discovery service where CoreOS servers will register their IP in etcd using etcdctl; we will force RethinkDB to run on these servers and provide them with all the IP addresses of the servers running the discovery service.

First, let's create a file rethinkdb-discovery@.service on one of your servers with the following content:

[Unit]
Description=Announce RethinkDB@%i service

[Service]
EnvironmentFile=/etc/environment
ExecStart=/bin/sh -c "while true; do etcdctl set /announce/services/rethinkdb%i ${COREOS_PRIVATE_IPV4} --ttl 60; sleep 45; done"
ExecStop=/usr/bin/etcdctl rm /announce/services/rethinkdb%i

[X-Fleet]
X-Conflicts=rethinkdb-discovery@*.service
MachineMetadata=group=rethinkdb

Then load the service with:

fleetctl submit rethinkdb-discovery@.service

Finally, start it with:

fleetctl start rethinkdb-discovery@{1..6}.service

Now create the service for RethinkDB:

[Unit]
Description=RethinkDB@%i service
After=docker.service
BindsTo=rethinkdb-discovery@%i.service

[Service]
EnvironmentFile=/etc/environment
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill rethinkdb%i
ExecStartPre=-/usr/bin/docker rm rethinkdb%i
ExecStartPre=-/usr/bin/mkdir -p /home/core/docker-volumes/rethinkdb
ExecStartPre=/usr/bin/docker pull dockerfile/rethinkdb
ExecStart=/bin/sh -c '/usr/bin/docker run --name rethinkdb%i   \
    -p ${COREOS_PRIVATE_IPV4}:8080:8080                        \
    -p ${COREOS_PRIVATE_IPV4}:28015:28015                      \
    -p ${COREOS_PRIVATE_IPV4}:29015:29015                      \
    -v /home/core/docker-volumes/rethinkdb/:/data/             \
    dockerfile/rethinkdb rethinkdb --bind all                  \
    --canonical-address ${COREOS_PRIVATE_IPV4}                 \
    $(/usr/bin/etcdctl ls /announce/services |                 \
        xargs -I {} /usr/bin/etcdctl get {} |                  \
        sed s/^/"--join "/ | sed s/$/":29015"/ |               \
        tr "\n" " ")'

ExecStop=/usr/bin/docker stop rethinkdb%i

[X-Fleet]
MachineMetadata=group=rethinkdb
X-ConditionMachineOf=rethinkdb-discovery@%i.service

The service is first going to fetch data from etcd and then start a Docker container with RethinkDB with the --join argument.
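
If the etcdctl/xargs/sed/tr pipeline in ExecStart is hard to read, the same transformation can be sketched in JavaScript. The `buildJoinArgs` function is a hypothetical helper written for this explanation; it takes the IP addresses that the discovery service stored in etcd and turns them into the list of --join flags.

```javascript
// Build the --join flags from the IP addresses announced in etcd.
// `ips` stands in for the output of `etcdctl get` on each announced key.
function buildJoinArgs(ips, clusterPort) {
  clusterPort = clusterPort || 29015; // intracluster port used in the unit file
  return ips
    .map(function(ip) { return ip.trim(); })
    .filter(function(ip) { return ip.length > 0; })
    .map(function(ip) { return "--join " + ip + ":" + clusterPort; })
    .join(" ");
}

console.log(buildJoinArgs(["10.0.0.1", "10.0.0.2"]));
// prints: --join 10.0.0.1:29015 --join 10.0.0.2:29015
```

Every instance is given the address of every announced server, so it does not matter which one happens to start first.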

Note: Because you are running RethinkDB inside a container, you must provide the argument --canonical-address or other instances will try to connect to the wrong IP address.

Run:

fleetctl start rethinkdb@{1..6}.service

And it's done! You now have a cluster of six machines running RethinkDB.

When RethinkDB provides auto-failover, in the event of a server failure, if you happen to have an extra CoreOS server, CoreOS will restart another RethinkDB instance and RethinkDB will automatically re-elect a master/backfill to prepare another replica without requiring any work on your end. Heavy refactoring of the clustering is being done right now, so hopefully this feature should ship soon (~2 months?).

Two things to finish this article:

  1. Thanks to @atnnn for helping me with some bash issues and Jessie for proofreading my Frenglish.
  2. Questions? Suggestions? Ping me on Twitter: @neumino.

Edit: dividuum on Hacker News pointed out that the private networking on Digital Ocean was not restricting other droplets from connecting to the cluster. I will follow up with another post to run some iptables commands to make sure that the cluster is safe.

Docker container for Firefox OS posted on 14 July 2014

I recently got a Flame, the developer reference phone for Firefox OS.

I created a Docker container to build Firefox OS, mostly because I didn't feel like installing Java 6 on my system (since it is not supported anymore). This post is about how to build such a container.

If you are interested in the image, you will be able to find it on the Docker Hub once this issue is solved.

Start it with

sudo docker run -t -i --privileged --expose 5037 -v /dev/bus/usb:/dev/bus/usb -v /host/data:/container/data ubuntu-firefoxos /bin/bash

Steps to create the container:

Start an Ubuntu container.

sudo docker run -t -i ubuntu:14.04 /bin/bash

Update the system.

apt-get update
apt-get upgrade

Install the dependencies as described in the docs.

dpkg --add-architecture i386
apt-get update
apt-get install --no-install-recommends autoconf2.13 bison bzip2 ccache curl flex gawk gcc g++ g++-multilib gcc-4.6 g++-4.6 g++-4.6-multilib git lib32ncurses5-dev lib32z1-dev zlib1g:amd64 zlib1g-dev:amd64 zlib1g:i386 zlib1g-dev:i386 libgl1-mesa-dev libx11-dev make zip libxml2-utils
apt-get install python
apt-get install android-tools-adb
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.6 1 
update-alternatives --install /usr/bin/gcc gcc /usr/bin/gcc-4.8 2 
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.6 1 
update-alternatives --install /usr/bin/g++ g++ /usr/bin/g++-4.8 2 
update-alternatives --set gcc "/usr/bin/gcc-4.6" 
update-alternatives --set g++ "/usr/bin/g++-4.6" 

Note: All the instructions below are what I did to be able to build the OS for the Flame. Some steps may not be required for another phone (and some may be missing).

You need to set an identity for git.

git config --global user.email <your_email>
git config --global user.name <your_name>

Then you have to install a few more things to be able to build, starting with Java.

add-apt-repository ppa:webupd8team/java
apt-get update
apt-get install oracle-java6-installer

Building Firefox OS requires you to pull some blobs from your phone with adb.

apt-get install android-tools-adb
apt-get install libusb-1.0-0 libusb-1.0-0-dev
apt-get install usbutils # This may not be needed, I used it to debug a few things

Install a few more packages required by the build process.

apt-get install dosfstools libxrender1 libasound2 libatk1.0 libice6

Export a SHELL variable.

export SHELL=/bin/bash

Install unzip.

apt-get install unzip

Get your container id with:

sudo docker ps -a

Commit your changes.

sudo docker commit <container_id> ubuntu-firefoxos

Stop and remove the container.

sudo docker stop <container_id>
sudo docker rm <container_id>

Restart the container with a few more flags.

sudo docker run -t -i --privileged --expose 5037 -v /dev/bus/usb:/dev/bus/usb -v /host/data:/container/data ubuntu-firefoxos /bin/bash

The --privileged --expose 5037 -v /dev/bus/usb:/dev/bus/usb options are required for adb to be able to find your device.

Before building, make sure you enable the remote debugging mode on your phone.

Open the Settings app, then Device Information > More Information > Developer.
In the developer menu, check "Remote debugging".

Then you are good to go:

cd /container/data/B2G
./config.sh flame
./build.sh
./flash.sh

Rethinkdbdash for Node.js 0.10.26 posted on 29 March 2014

I just released two packages of rethinkdbdash:

  • rethinkdbdash for Node.js 0.10.26
  • rethinkdbdash-unstable for Node.js 0.11.10 (and 0.11.9)

I wrote rethinkdbdash two months ago to improve the syntax of ReQL in the Node.js driver by providing:

  • promises (and testing them with generators)
  • a native/automatic connection pool

While you cannot use generators with the stable version of Node.js, the connection pool is a good enough reason to make this driver available for the stable Node.js. You basically never have to deal with connections.

For those who want to know what the syntax looks like, here it is:

var r = require('rethinkdbdash')();

r.table("comments").get("eef5fa0c").run().then(function(result) {
    console.log(result);
}).error(function(error) {
    console.log(error);
})

Compared to the one with the official driver:

var r = require('rethinkdb');

r.connect({}, function(error, connection) {
    r.table("comments").get("eef5fa0c").run(connection, function(error, result) {
        if (error) {
            console.log(error)
        }
        else {
            console.log(result);
        }
        connection.close();
    })
})
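
To give an idea of what a connection pool does behind the scenes, here is a minimal sketch. It is not rethinkdbdash's implementation: `Pool`, its methods, and the `createConnection` factory are hypothetical names, and a real pool also has to deal with timeouts and broken connections.

```javascript
// `createConnection` is a hypothetical factory returning a promise of a connection.
function Pool(createConnection, max) {
  this.createConnection = createConnection;
  this.max = max;     // maximum number of open connections
  this.size = 0;      // connections opened so far
  this.idle = [];     // connections ready to be reused
  this.waiting = [];  // resolvers of callers waiting for a connection
}

// Resolve with an idle connection, open a new one, or wait for a release.
Pool.prototype.acquire = function() {
  var self = this;
  if (self.idle.length > 0) return Promise.resolve(self.idle.pop());
  if (self.size < self.max) {
    self.size++;
    return Promise.resolve()
      .then(function() { return self.createConnection(); })
      .catch(function(error) { self.size--; throw error; });
  }
  return new Promise(function(resolve) { self.waiting.push(resolve); });
};

// Hand the connection to the next waiting caller, or park it as idle.
Pool.prototype.release = function(connection) {
  if (this.waiting.length > 0) this.waiting.shift()(connection);
  else this.idle.push(connection);
};

// Run `query(connection)`; the caller never touches a connection directly.
Pool.prototype.run = function(query) {
  var self = this;
  return self.acquire().then(function(connection) {
    return Promise.resolve()
      .then(function() { return query(connection); })
      .then(function(result) {
        self.release(connection);
        return result;
      }, function(error) {
        self.release(connection);
        throw error;
      });
  });
};
```

With something like this in place, a call such as `pool.run(function(connection) { ... })` borrows a connection, runs the query and gives the connection back, which is the reason `run()` takes no connection argument in rethinkdbdash.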

Note: If you were using rethinkdbdash with Node 0.11.10, please switch to rethinkdbdash-unstable.

First experience on Digital Ocean - Updating Archlinux posted on 14 March 2014

I have been using a dedicated server at OVH for a few years now, but the quality of their service has declined, and the latest incidents prompted me to look for a new server.
Digital Ocean claims that they are user-friendly and since it is quite cheap, I just gave it a try.

Subscribing, setting up two-factor authentication, and starting a droplet was a blast. I picked Archlinux, and less than a minute later, my droplet was up and running.

The Arch image is quite old (June 2013) and updating the system is a little more tricky than just running pacman -Syu.
These instructions were written a few hours after the installation, so they may be slightly inaccurate.

First, update the whole system. Because Arch merged /bin, /sbin into /usr/bin and /lib into /usr/lib, you cannot just run pacman -Syu. Run instead:

pacman -Syu --ignore filesystem,bash
pacman -S bash
pacman -Su

Then remove netcfg and install netctl.

pacman -R netcfg
pacman -S netctl

Run ip addr to see your interface. In my case it was enp0s3.

Create a config file /etc/netctl/enp0s3 with

Interface=enp0s3
Connection=ethernet
IP=static
Address=('<droplet_ip>/24')
Gateway='<gateway>'
DNS=('8.8.4.4' '8.8.8.8')

Enable the interface

netctl enable enp0s3

Then update the kernel via the web interface.

The network interface is going to change to something like ens3. Move /etc/netctl/enp0s3 to /etc/netctl/ens3 and change the Interface field.

Update /lib/systemd/system/sshd.service to be sure that the ssh daemon doesn't fail on boot

[Unit]
Description=OpenSSH Daemon
Wants=sshdgenkeys.service
#After=sshdgenkeys.service
After=network.target

[Service]
ExecStart=/usr/bin/sshd -D
ExecReload=/bin/kill -HUP $MAINPID
KillMode=process
Restart=always

[Install]
WantedBy=multi-user.target

Reboot and your server should be up to date.

And that's it for updating Arch. It was not the easiest update, but nothing impossible. It would have been nice if Digital Ocean provided an up-to-date Arch image though.


Note: You can probably directly set the network interface to ens3.
In the worst case you can still access your machine with Digital Ocean's web shell and fix things there.