This is an InfoQ video where Marty Weiner and Yash Nelapati talk about the decisions they took during their journey from the beginning up to now. I found it very interesting because they highlight some very basic concepts that have a real and relevant impact despite their apparent triviality.
In another video on InfoQ, Apache Mesos was mentioned, so let's take a quick look. As described on the site, Apache Mesos is a cluster manager that provides efficient resource isolation and sharing across distributed applications, or frameworks. It can run Hadoop, MPI, Hypertable, Spark, and other applications on a dynamically shared pool of nodes. It is a distributed computing platform, or we could think of it as a sort of distributed OS. It implements a master/slave architecture and has the following components:
- Master(s): one master is elected (via a ZooKeeper cluster) among the available masters. The master doesn't do much: it mainly manages resources (CPU, memory, …), launches tasks on slaves, and forwards status messages between tasks and frameworks
- Slave(s): a slave monitors individual tasks, reports their status to the master, and ensures that tasks don't exceed resource limits. It executes the tasks submitted by frameworks.
- Framework(s): this is, for instance, your application; it receives resource offers from the master and launches tasks.
Examples of frameworks are:
- Hadoop – batch processing
- Storm – stream processing
- Chronos – task scheduling
- Marathon or Aurora – long running services
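To make the resource-offer flow above more concrete, here is a toy sketch in plain Python (all class and method names are made up for illustration — this is not the real Mesos API): the master forwards each slave's free resources to a framework as an offer, and the framework answers with a task to launch, or declines.

```python
# Toy simulation of Mesos-style resource offers (hypothetical names,
# not the real Mesos API).

class Slave:
    """Holds the resources a node has available."""
    def __init__(self, name, cpus, mem):
        self.name, self.cpus, self.mem = name, cpus, mem

class Master:
    """Forwards offers to a framework and books the accepted resources."""
    def __init__(self, slaves):
        self.slaves = slaves

    def offer_resources(self, framework):
        launched = []
        for slave in self.slaves:
            # One offer per slave; the framework returns a task or None.
            task = framework.resource_offer(slave.name, slave.cpus, slave.mem)
            if task is not None:
                needed_cpus, needed_mem = task
                slave.cpus -= needed_cpus  # slave enforces resource limits
                slave.mem -= needed_mem
                launched.append(slave.name)
        return launched

class Framework:
    """Accepts any offer with at least 2 CPUs and 4 GB of memory."""
    def resource_offer(self, slave_name, cpus, mem):
        if cpus >= 2 and mem >= 4:
            return (2, 4)  # (cpus, mem) the launched task will consume
        return None

master = Master([Slave("s1", 4, 8), Slave("s2", 1, 2)])
print(master.offer_resources(Framework()))  # only s1 has enough resources
```

The key idea this models is that the master never decides *what* runs where — it only advertises resources, and each framework makes its own scheduling decisions.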
Pyres – a Resque clone
Resque is a great implementation of a job queue by the people at GitHub; unfortunately 😛 it's written in Ruby, so someone who works in Python ported the code, creating PyRes. You can put jobs (which can be any kind of class) on a queue and process them while watching the progress via your browser.
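The core idea can be sketched in a few lines of plain Python (an in-memory toy with made-up names, not PyRes' real API, which is backed by Redis): a job is a class that declares which queue it belongs to and exposes a `perform` method, and a worker pops jobs off the queue and runs them.

```python
# In-memory sketch of the Resque/PyRes job model (hypothetical names).
from collections import deque

class Queue:
    def __init__(self):
        self.jobs = deque()

    def enqueue(self, job_cls, *args):
        # Any class with a perform() method can be a job.
        self.jobs.append((job_cls, args))

    def work(self):
        """Process every queued job in order, returning the results."""
        results = []
        while self.jobs:
            job_cls, args = self.jobs.popleft()
            results.append(job_cls.perform(*args))
        return results

class Square:
    queue = "math"  # Resque-style jobs declare their queue name

    @staticmethod
    def perform(n):
        return n * n

q = Queue()
q.enqueue(Square, 3)
q.enqueue(Square, 4)
print(q.work())  # [9, 16]
```

In the real thing the queue lives in Redis, so many worker processes (possibly on different machines) can consume jobs concurrently.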
Pandas is a Python library for doing data analysis; it is fast and lets you do exploratory work really quickly. This is a cookbook that gives you some concrete examples for getting started with pandas.
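As a tiny taste of the kind of exploratory work pandas makes quick (a made-up example, assuming pandas is installed):

```python
# Quick exploratory example with pandas: group and aggregate in one line.
import pandas as pd

df = pd.DataFrame({
    "city":  ["Rome", "Milan", "Rome", "Turin"],
    "sales": [10, 20, 30, 40],
})

# Total sales per city -- one line of code, no explicit loops.
totals = df.groupby("city")["sales"].sum()
print(totals)
```

The cookbook linked above is full of small recipes like this one.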
Here you can find a list of available datasets for download; I hope they can be useful 😛
Docker is an open source project to pack, ship and run any application as a lightweight container. Some people in my office pointed me to this project and it seems quite interesting. Let’s start trying to better understand what it really is.
This is a short description from the site:
Docker containers are both hardware-agnostic and platform-agnostic. This means that they can run anywhere, from your laptop to the largest EC2 compute instance and everything in between – and they don’t require that you use a particular language, framework or packaging system. That makes them great building blocks for deploying and scaling web apps, databases and backend services without depending on a particular stack or provider.
Typically you distribute applications and sandbox their execution using virtual machines, for instance VMware, Oracle VirtualBox, or Amazon EC2 AMIs. With this solution a developer should be able to package their application and distribute/deploy it with little effort. In practice that does not happen, mainly for these reasons:
- Size: they may be very large and thus difficult to store and transfer
- Portability: a VM image built for one vendor's platform does not play well with competitors' solutions
By contrast, Docker relies on a different sandboxing method known as containerization. Unlike traditional virtualization, containerization takes place at the kernel level.
Docker builds on top of these low-level primitives to offer developers a portable format and runtime environment that addresses these problems.
Docker containers are small (and their transfer can be optimized with layers), they have basically zero memory and CPU overhead, they are completely portable, and they are built from the ground up with an application-centric design. In addition, because Docker operates at the OS level, it can still be run inside a VM!
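To see what "packaging an application" looks like in practice, here is a hypothetical minimal Dockerfile (the image and file names are made up for illustration) that builds a container image for a Python app:

```dockerfile
# Hypothetical minimal Dockerfile for a Python app.
# Each instruction below produces a cacheable layer -- layers are
# what keep image storage and transfer small.
FROM python:2.7
ADD . /app
WORKDIR /app
CMD ["python", "app.py"]
```

Building with `docker build` and running with `docker run` gives you the same sandboxed environment on a laptop, a server, or inside a VM on EC2.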