Tuesday 25 November 2014

Creating a graph of software technologies with Docker, Neo4j, NodeJS and D3.js Part 1

Feedback highly appreciated


Okay! So what would I like to do?


I would like to experiment with some technologies that have been on my mind lately, and have fun. I might later swap some of the pieces (maybe Node for a JVM platform, or D3.js for another client-side JS framework). The idea is to visualize the connections between software technologies using this stack.

The imagined problem and solution are the following. (It might turn out to be a good thing, but maybe it will just remain a prototype.)

Usually we are interested in particular technologies and their relations to others, so we start googling. We do this because we know what we want to achieve, but we don't know which tools are available, or we know some of the tools and want to see whether there are comparable alternatives. We usually select by checking whether the technology is free, whether it is mature enough for our purpose, whether there is a community or a known company behind it, and whether it is maintained frequently. Sometimes we also need to predict how long the technology will be around before a new one replaces it.

To visualize the technologies I would like to create a web application.

  • It will run in a browser, so I need some JavaScript libs for sure. I always wanted to try out the powerful D3.js.
  • On the server side I will go with Node.js as the middleware for now (I might swap to a JVM platform later). It will provide an API for the client application.
  • The data, as the whole model is basically a graph, will be stored in a graph database, for now, let's go with Neo4j.
  • I would like to put the server side pieces in lightweight containers. Docker will be perfect for this.
  • I will need to manage the containers and I don't want to do this manually, so I might use Fig or Flocker or Kubernetes or all :).

Let's start and see what happens :)


What types of connections should we consider?


Implements

Technology A (e.g. a library or framework) implements specification or protocol B.

Uses

Technology A uses technology B. This is a transitive relation: if A uses B and B uses C, then A also uses C.

Extends

Technology A extends technology B. This is a transitive relation.

Relates to

Technology A is related to technology B. This is a symmetric relation: if A relates to B, then B relates to A.

A is an alternative of B

Technology A was created for a purpose similar to that of technology B. This is a symmetric relation.

Later we can consider the inverse connections, like contains, specifies etc. We have to do this carefully: more specific relationship types can speed up our queries, but in some cases they can also slow them down. It really depends on the use cases and the size of the graph. For now this is enough. To learn more about this, the Graph Databases book is a great reference.

We can model these relationships with a property graph model.
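To make the connection types above a bit more concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j stores anything and it is not part of the stack in this post; the names `RelationType`, `TechGraph`, `connect` and `related` are made up for illustration, and the sample technologies at the bottom are just example data. It only shows what "transitive" and "symmetric" mean in practice for our five relations.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RelationType:
    """A connection type plus the two properties discussed above."""
    name: str
    transitive: bool = False
    symmetric: bool = False


# The five connection types from the text (identifiers are hypothetical).
IMPLEMENTS = RelationType("IMPLEMENTS")
USES = RelationType("USES", transitive=True)
EXTENDS = RelationType("EXTENDS", transitive=True)
RELATES_TO = RelationType("RELATES_TO", symmetric=True)
ALTERNATIVE_OF = RelationType("ALTERNATIVE_OF", symmetric=True)


class TechGraph:
    """Tiny directed graph keyed by (source, relation name)."""

    def __init__(self):
        # (source, relation name) -> set of directly connected targets
        self._edges = defaultdict(set)

    def connect(self, source, relation, target):
        """Store source -[relation]-> target; mirror it if symmetric."""
        self._edges[(source, relation.name)].add(target)
        if relation.symmetric:
            self._edges[(target, relation.name)].add(source)

    def related(self, source, relation):
        """Return direct targets, or every reachable one if transitive."""
        direct = set(self._edges.get((source, relation.name), ()))
        if not relation.transitive:
            return direct
        seen = set()
        stack = list(direct)
        while stack:
            node = stack.pop()
            if node in seen or node == source:
                continue
            seen.add(node)
            stack.extend(self._edges.get((node, relation.name), ()))
        return seen


if __name__ == "__main__":
    graph = TechGraph()
    # Example data only, chosen to exercise both properties.
    graph.connect("Express", USES, "Node.js")
    graph.connect("Node.js", USES, "V8")
    graph.connect("Neo4j", ALTERNATIVE_OF, "OrientDB")
    print(sorted(graph.related("Express", USES)))
    print(sorted(graph.related("OrientDB", ALTERNATIVE_OF)))
```

In a graph database the transitive case would normally be a variable-length path query rather than a hand-written traversal, and the symmetric case would likely be stored once and queried in both directions. The sketch stores the mirrored edge only to keep the example short.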


What do we need?

(Note: An installed Docker and some Linux distribution are required. Windows and Mac users have to use boot2docker and set up port forwarding for their boot2docker-controlled VM. Another option for Windows users is Spoon, but that is not a Docker-based platform.)
Let's build our stack from the bottom up.

We need a running Neo4j instance. Why not run it in a Docker container? We could move our instance whenever we want, or reuse the image for staging environments, integration tests, or for adding new nodes. Unfortunately there is no official Neo4j Docker image on Docker Hub, but there is a quite popular one created by tpires. It could be a good fit, but first let's check its Dockerfile to see how the image was built.

It is based on dockerfile/java, which is based on an Ubuntu image.

Looks good!

To check that it works (and to pull the dependencies), let's run what the author suggests:

What we are saying here is: "Hey Docker, please run the tpires/neo4j image in a container, and bind port 7474 of the host machine to port 7474 of this container."

Great. Let's check if it's running.

Yup!

Neo4j runs a webserver for us, so let's open our browser and type localhost:7474.


It works!
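If the browser cannot connect, it helps to check whether anything is answering on the port at all before blaming Neo4j. Below is a small Python sketch for that. The helper name `port_is_open` is my own, and the host names in the demo are assumptions: on a plain Linux host `localhost` should do, while with boot2docker the published port lives on the VM, so the VM's IP address would have to be used instead.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out or unresolvable host.
        return False


if __name__ == "__main__":
    # 7474 is the port we bound for the Neo4j container above.
    for host in ("localhost", "127.0.0.1"):
        print(host, port_is_open(host, 7474))
```

This only tells us that something accepts TCP connections on the port, not that it is Neo4j, so opening the URL in the browser is still the real test.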

It would be good to separate the data from the functionality by using two containers, so that we do not lose portability. Let's create a data-only container.
Create a Dockerfile.

Let's reuse the ubuntu image and add a volume.

Build the image.

Run the image.

Bind the volume to our Neo4j container and run it. (Don't forget to stop it.)

Nice.
OK, so now we have a running database. Of course it is not production-ready, but it is enough for this prototype.

Next we will add a middleware based on the Node platform, which will call Neo4j's REST API and add some business logic. We will do this in the next part.
(Note: If we wanted to, we could extend Neo4j's REST API by writing extensions in Java using JAX-RS annotations.)

Feedback highly appreciated 

9 comments:

  1. I tried this out but I don't seem to be able to access Neo4j browser on port 7474 on the host:

    $ docker ps
    CONTAINER ID   IMAGE                 COMMAND                CREATED         STATUS         PORTS                              NAMES
    034588bc6024   tpires/neo4j:latest   "/bin/bash -c /launc   2 minutes ago   Up 6 seconds   1337/tcp, 0.0.0.0:7474->7474/tcp   neo4j2

    $ lsof -i :7474

    Any ideas?

    Cheers
    Mark

    Replies
    1. Is localhost in your hosts file? Have you tried other common addresses like 127.0.0.1:7474?

  2. Seems like I needed to use the VM's IP address and then the port number. Some weird Mac OS X-ism - http://viget.com/extend/how-to-use-docker-on-os-x-the-missing-guide

    Replies
    1. Ah yes. Thanks for the url! I will edit that note about other OSs.
  3. Hi Ogi,
    thank you for this really detailed blog post. I just had the same idea as you to set up a playground to be able to test some different graph databases. I followed all the steps but was not able to get localhost to answer to me :(
    After I read your blog I rechecked and recreated my steps and it works now - though not with localhost but with the ip of the VM (http://192.168.59.103:7474/browser/) which is much better for me
    I will start to play with my neo4j for now and will come back to your blog to compare our next steps later :)

    Thanks

    Replies
    1. Hi Christina,

      Thank you for your feedback. I am glad this helped you.

      Let me know how it goes :)

      Thanks