Week 43

Author: malm

Self-referential data visualisation V: Docker

Over the last four weeks I’ve developed a simple Python-based data visualisation pipeline for my blog stats.  This week, as promised at the end of last week’s installment, I will outline how to deploy this setup to a Digital Ocean virtual machine or ‘droplet’ as they refer to it in their documentation.   The starting point for this exercise will be the recipe for creating a droplet using Docker Machine outlined last month.  The droplet in this case will be called flaskapps. Once this recipe is followed, we should be able to see flaskapps appear as our active VM after employing the appropriate Docker environment variables:

$ docker-machine ls
NAME        ACTIVE   DRIVER         STATE     URL                        SWARM
flaskapps   *        digitalocean   Running   tcp://178.62.101.7:2376

Dockerising Flask

Now we can proceed with building a Docker image called flaskapp that will run our previously developed Flask app. To do that we start by creating a local subdirectory called web and a Dockerfile within it. The Dockerfile encapsulates all the steps required to build a Docker image that will manage our Flask app together with all its dependencies. The two key dependencies are: i) a new Python utility script called blog6.py, which is a very slightly modified version of the blog5.py covered last week, and ii) an adjusted Flask file app.py that imports blog6.py. If you want to follow the recipe, you will also need to create local .dbuser, .dbpasswd and .dbendpoint files corresponding to a Mongolab cloud database. Then run blog6.py locally in order to populate that database. The base image used to build our container is the standard Docker ubuntu image. When building Docker containers it’s usually a good idea to start with a vanilla distro image and then apt-get and pip install all other required container components on top. Here is the complete Dockerfile:

# Set the base image to Ubuntu
FROM ubuntu
# File Author / Maintainer
MAINTAINER Mal Minhas
# Replace default command interpreter from sh to bash
RUN ln -snf /bin/bash /bin/sh
# Add the application resources URL
RUN echo "deb http://archive.ubuntu.com/ubuntu/ \
    $(lsb_release -sc) main universe" >> /etc/apt/sources.list
# Update the sources list
RUN apt-get update
# Create and set up the application folder which is
# where requirements.txt will be and CMD will execute
RUN mkdir -p /usr/src/flaskapp
COPY . /usr/src/flaskapp
VOLUME ["/usr/src/flaskapp"]
WORKDIR /usr/src/flaskapp
# Install basic applications
RUN apt-get update && apt-get install -y \
    build-essential \
    python \
    python-dev \
    python-distribute \
    python-pip \
    python-pandas
# Do you really need virtualenv in a docker container?
RUN pip install virtualenv virtualenvwrapper
ENV WORKON_HOME /usr/src/flaskapp
RUN /bin/bash -c "source /usr/local/bin/virtualenvwrapper.sh \
    && mkvirtualenv flaskapp \
    && workon flaskapp"
# Now pip install all dependencies
RUN pip install -r requirements.txt
RUN python -m nltk.downloader stopwords
EXPOSE 5000
CMD ["python", "/usr/src/flaskapp/app.py"]

The requirements.txt file referenced here simply details all the Python dependencies to be pip installed into the image:

Flask >=0.10.1
Jinja2 >=2.7.3
MarkupSafe >=0.23
Werkzeug >=0.10.4
gunicorn >=19.3.0
requests >=2.8.1
pymongo >=3.0.3
beautifulsoup4 >=4.4.1
nltk >=3.0.3
bokeh >=0.10.0
python-nvd3 >=0.13.10
simplejson >=3.8.0
ggplot >=0.6.8
vincent >=0.4.4

The command used to build this flaskapp image from within the web subdirectory is as follows:

$ docker build -t flaskapp .

Building the flaskapp image will take a few minutes.  Once done, we can use this image to run a container called web in daemonised mode mapping port 5000 on the host to the container TCP port as follows:

$ docker run -d -p=5000:5000 --name web flaskapp

At this point you should be able to see the running web container in a docker ps listing thus:

$ docker ps
CONTAINER ID   IMAGE      COMMAND                  CREATED        STATUS        PORTS                    NAMES
7eab9f2b5d3f   flaskapp   "python /usr/src/flas"   24 hours ago   Up 24 hours   0.0.0.0:5000->5000/tcp   web
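
The docker ps listing only shows that the port mapping exists. A rough way to confirm that the port actually accepts connections from outside the droplet is a small TCP probe like the sketch below. The helper name port_is_open is made up for illustration, and the address in the comment is simply the one from the docker-machine listing earlier.

```python
import socket

def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out)
        # when nothing is listening or the host cannot be reached
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example call (hypothetical, uses the droplet address shown earlier):
#   port_is_open("178.62.101.7", 5000)
```

A successful probe only says that something is listening on the port; it does not check that Flask is returning sensible pages.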

If you followed the recipe to this point, you should have successfully deployed a Docker container running a Flask app with non-trivial Python dependencies to your target Digital Ocean droplet.  It’s worth using the newly introduced Digital Ocean floating IP functionality to assign a floating IP address to the droplet hosting the container and tie that floating IP to a DNS address.  The container setup I deployed is at http://labs.malm.co.uk:5000.  Both blog6.py and the corresponding Flask app.py are available for inspection together with the Dockerfile.  The Flask app follows the same no-frills format outlined last week and has now been further modified to integrate the ggplot png as base64-encoded content.  That allows it to be served as a single file that contains everything needed to generate all required HTML output.  Besides being somewhat agricultural by design, this solution retains all the limitations highlighted last week.  The container also exposes raw Flask port 5000. For a production environment it is better to expose the app via an nginx container linked to the Flask one.  Next week I may look at that and also at making the site a bit nicer through the addition of CSS and JavaScript.
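
For readers unfamiliar with the base64 trick mentioned above, here is a minimal sketch of the general idea. It is not the code from app.py, and the function names png_to_img_tag and render_page are invented for this example. The PNG bytes are encoded and placed in a data URI, so the HTML page needs no separate image file.

```python
import base64
import html

def png_to_img_tag(png_bytes, alt="plot"):
    """Return an <img> tag carrying the PNG inline as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return '<img alt="{}" src="data:image/png;base64,{}">'.format(
        html.escape(alt, quote=True), encoded)

def render_page(title, png_bytes):
    """Build a complete, self-contained HTML page around one embedded image."""
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>{0}</title></head>\n"
        "<body><h1>{0}</h1>\n{1}\n</body></html>"
    ).format(html.escape(title), png_to_img_tag(png_bytes, alt=title))
```

In a Flask view the string returned by render_page could be sent back directly as the response body. The trade-off is size: base64 inflates the image by roughly a third, and the browser cannot cache it separately from the page.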

Docker Images and Containers

The standard explanation of docker architecture doesn’t really provide deep insight into precisely how docker containers and images relate to each other.  It tends instead to focus on the distinction between docker client and host as follows:

[Image: DockerWhale, the standard Docker client and host architecture diagram]

This is something of a problem because, as with any technology, it’s vital to understand what is really going on under the hood.  In the case of containers, the technology really harks back to the early days of UNIX and chroot jails. Docker has essentially revived and modernised the concept as a compelling alternative to more heavyweight VMs. In terms of the specific detail of how Docker operates with containers and images, this recent publication is hugely helpful. It provides a detailed visual explanation of how Docker works under the hood using a combination of read-only and read-write layers integrated via AUFS unification file system. Docker containers are built from read-only unioned Docker image layers:
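
The layering idea can be illustrated with a toy model in Python. This is only an analogy built on collections.ChainMap, not how AUFS is implemented, and the file names and contents are invented. Each image layer is a read-only mapping, a container adds one writable layer on top, and lookups fall through from the newest layer to the oldest.

```python
from collections import ChainMap

# Read-only image layers, oldest first (contents are made up for illustration)
base_layer = {"/bin/sh": "dash", "/etc/issue": "Ubuntu"}
app_layer = {"/usr/src/flaskapp/app.py": "flask app v1", "/bin/sh": "bash"}

def start_container(*image_layers):
    """Return a unified view: one fresh writable layer over the image layers."""
    # ChainMap searches its maps left to right, so the newest layer goes first;
    # writes only ever touch the first map, which is the writable layer.
    return ChainMap({}, *reversed(image_layers))

web = start_container(base_layer, app_layer)
web["/tmp/cache"] = "scratch data"   # lands in this container's own layer
web["/etc/issue"] = "edited"         # shadows the image file, copy-on-write style

# A second container from the same image sees none of those changes
other = start_container(base_layer, app_layer)
```

Here web["/bin/sh"] resolves to the newer app_layer entry, while base_layer itself is never modified, which is why many containers can share one set of image layers.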

Manufacturers and Devices

  • What happens next after smartphones?  Benedict Evans suggests that there still doesn’t appear to be any universal product category on the horizon that can rival the size and scale of the global phone industry.  Hyped arenas like IoT and wearables seem more like smartphone adjuncts than pastures new.
  • HTC received something of a critical panning for releasing a flagship phone, the A9, that not only looks like an iPhone 6 but is also named after its chipset.

Google and Android

16GB internal with 2GB RAM, Qualcomm Snapdragon 410, 64-Bit 1.2 GHz CPU, Android L v5.0, Upgradeable to Android M v6.0

Apps and Services

  • Facebook is so dominant right now across the world that it’s hard to imagine how it might ever be usurped.  This HuffPo opinion piece makes a game attempt nevertheless, suggesting we look at Tsu.co, “a Facebook with a conscience, whose founders believe users should be compensated for the content they create.”  It may be insignificant today but the site has gained nearly four million users in just over six months.  Besides, it’s worth considering the playbook followed by Facebook itself:

It’s almost unimaginable at the moment to think that people would abandon Facebook with the idea of switching to a more user-friendly, and user-protective site. But it wasn’t too long ago folks thought Myspace was the end-all-be-all of the social media landscape. … When you compare Facebook to Tsu, it definitely appears as if Facebook is The Matrix, and Tsu is Neo.

  • The director of platform products at leading news site Quartz provides some interesting insights after 5 years spent developing mobile news apps.  They basically boil down to an overall recommendation not to try to build one, for the following reasons:
    • Native apps are difficult to build and maintain
    • Growing your customer base is slow and expensive
    • Most of your users won’t use your app
    • Anything a news app can do, a mobile website can do easier and better

An app made to work with this headset will be available on November 5; the bundle of Google cardboard will be shipped to newspaper subscribers the following weekend.

  • Lookup is an interesting Indian startup offering a chat app that aims to connect retailers and customers.   As with fellow Asian OTT messaging giants WeChat and KakaoTalk, Lookup integrates messaging with retail functionality.  It has just secured Series A funding and it will be interesting  to see how Lookup develops within the promising “social retail” (“wetail”?) space over the coming months.

China and Russia

Xiaomi’s marketing chief Tony Wei says that new software sent to early testers will generate thousands of reports back overnight, and this openness in turn allows them to try out, test, and fix the critical source of their commercial advantage, a version of the operating system that lets them tie their own services to the user’s device. These are not mere bug reports, of the sort most software now generates automatically. These are user reviews, questions not just about the technical aspects of MIUI but about which features the user likes, dislikes, or wants to see in the future.

  • Both Huawei and Xiaomi have been major beneficiaries of a dramatic reduction in user smartphone lifecycle in China over the last few years.  A recent Technode article provides valuable insights into this so-called “smartphone upcycling” phenomenon. 

“Researches show that Chinese smartphone users change their smartphones once every 29-months in 2011, but the period has been shorted to 18 months now. Over 20% of Chinese users will update for a newer phone within one year, while only 8.4% would do so within two years.”

Without an official Chinese name, Facebook gets called a lot of things. But one of the easiest ways to transliterate the sounds of the English word is Feisibuke. The only problem? That means “must die/death is inevitable.”

Russia’s ministry of communications and Roskomnadzor, the national internet regulator, ordered communications hubs run by the main Russian internet providers to block traffic to foreign communications channels by using a traffic control system called DPI. … The objective was to see whether the Runet – the informal name for the Russian internet – could continue to function in isolation from the global internet.

Security

a hacker within a few meters of a Fitbit device could exploit open Bluetooth ports to place an infected packet on to it, which would transfer to a computer upon syncing later.

if a car crashes after it’s been hacked, who’s liable – the driver, the manufacturer, the software developer, or the test house that assured it?  I should have been a lawyer…

  • One of the key problems large companies have to contend with in elevating the priority and visibility of data security concerns is how to communicate key concerns at Board level. The TalkTalk Oct 22nd cyber attack ought to underscore the existential threat companies face from catastrophic data breaches, but one suspects many will continue to keep their fingers crossed and hope it doesn’t happen to them.  This post offers some useful tips on how they ought to proceed, starting with a security risk baselining activity, ideally conducted by an external expert, and moving on to a regular status report built from appropriate threat analysis tools:

Conducting a risk assessment of company data is likely to go a long way towards bringing the C-suite up to speed with the most pressing security issues in your organisation, but a little external support is likely to lend extra credence to your arguments. … Enlisting a reputable third party to provide the board with a risk profile assessment could be a crucial factor in convincing the board of the need for greater investment in information security.

  • A couple of weeks ago the blog highlighted Ethereum, a next-generation software proposition now rebranded as “a decentralized platform that runs smart contracts: applications that run exactly as programmed without any possibility of downtime, censorship, fraud or third party interference”.  As with any new technology, there’s always a dark side to such power:

One example is a contract offering a cryptocurrency reward for hacking a particular website.  Ethereum’s programming language makes it possible for the contract to control the promised funds. It will release them only to someone who provides proof of having carried out the job, in the form of a cryptographically verifiable string added to the defaced site.

Cloud and Digital

“the generational shift long promised by cloud advocates is finally, irreversibly underway. … That shift is away from “legacy” data centers built on x86 servers, VMware-managed hypervisors, SQL databases from Oracle, and storage hardware provided by EMC. Replacing all that are web-scale (or at least wannabe web-scale) technologies based on containers, commodity hardware, NoSQL databases of various kinds, and flash storage. The new infrastructure is cheaper, easier to scale up to large volumes of data and computation, and more flexible and agile.”

Machine Learning

  • Machine Learning is hot right now, doubtless aided by articles like this one, which quotes Sundar Pichai positioning the technology at the heart of what Google does next:

“Machine learning is a core, transformative way by which we’re rethinking everything we’re doing. … Our investments in machine learning and artificial intelligence are a priority for us. … We’re thoughtfully applying it across all our products, be it search, ads, YouTube, or Play.   We’re in the early days, but you’ll see us in a systematic way think about how we can apply machine learning to all these areas.”

  • Which in turn links to this very handy and approachable introduction to machine learning and deep neural nets:

there are, without naming names, some serious examples of analytics and BI companies taking the same old software and slapping a “machine learning” label on it simply because it sounds more robust or complex than data analytics.

  • Machine learning per se is not a new advance – the theory has been broadly understood for decades as “really just the very advanced application of statistics to learning to identify patterns in data and then make predictions from those patterns”.  What has changed is the amount of compute power that can be applied, which is allowing machine learning to “reshape our world” in five key ways which will have increasingly profound ramifications:
    1. Machines can see.
    2. Machines can read.
    3. Machines can listen.
    4. Machines can talk.
    5. Machines can write.

“These skills are beginning to show that computers can now boldly go into realms that were once considered solidly the domain of humans. While the technology still isn’t perfect in many cases, the very concept of machine learning — that machines can continuously and tirelessly improve, they will get better.”

We’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to classify good selfies from bad ones.

Wordsmith takes your data, follows your writing style and turns it into readable content

in fact, while many people fear a so-called “robot apocalypse” aimed directly at extinguishing our civilization, I personally feel that the real danger to our ongoing existence lies in the potential for us to be collateral damage as advanced AGIs battle it out for supremacy; we may find ourselves in the line of fire. Indeed, building a safe AI will be a monumental — if not intractable — task.

Wearables and the Internet of Things

The Q3 figures, published by the Federation of the Swiss Watch Industry, reveal a 9.9% drop in watch exports for September, leading to an 8.5% slide for the quarter overall.

  • Samsung’s SAMI is representative of the current state of play in terms of IoT platform thinking.   Right now, the market is in something of a state of flux in terms of technology options.  This dynamic will likely coalesce around the four main platform players with serious cloud scale (Microsoft Azure, AWS IoT, IBM Bluemix and Google App Engine) with the niche offerings such as SAMI likely to fall by the wayside in comparison:

“Data-driven Development (D3) platform for receiving, storing and sending data to/from IoT devices. Any device can send data in various formats which is then normalized into a JSON format and stored in the cloud.”

Data Visualisation and Analytics

  • Over the last few weeks, the blog has centred around various different data visualisation options.   As a bonus follow-on, this post on datavisualisation.com compares Tableau, SPSS, R, Excel, Matlab with the JavaScript D3 and a couple of the Python approaches I used:

  • Interesting post on how the International Consortium of Investigative Journalists use a web-based graph visualisation tool called Linkurious along with graph database Neo4j and ElasticSearch to visualise the links between data.  This technology is vital to progressing the rapidly developing field of data journalism.

Software Engineering

Product Management

  • Great post by Steven Sinofsky on how to organise product management from mission through goals to granular projects and tasks in order to “get the right stuff done”.   He emphasises the need for product management to operate within inevitable resource constraints and the crucial importance of underpinning all ongoing activity with data.  In his case, he suggests Excel should be your go-to tool as it offers “the right level of complexity for projects from 10 to 5000 people in my experience”.  Which is quite something when you think about the fact that Sinofsky used to lead the whole Windows division at Microsoft:

“Throughout this whole system there is ongoing telemetry that is called upon to support the company with reliable data upon which to make decisions. … The most successful organizations are also fully instrumented organizations. Everything about code, customers, and overall engagement has telemetry.”

“much of what allows great technology companies to grow incredibly fast and become so valuable also makes them almost impossible to turn around when they falter in any sort of traditional sense.”

Startups and Work

  • Interesting NYT article highlighting that the growth areas for jobs over the last few decades have mostly been in areas which require a combination of social and maths/technical skills.  It emphasises the importance of cultivating good social skills at a young age, not just concentrating on passing exams:

The extent to which jobs required social skills grew 24 percent between 1980 and 2012, he found, while jobs requiring repetitive tasks, like garbage collecting, and analytical tasks that don’t necessarily involve teamwork, like engineering, declined.

  • Startups don’t conform to platonic ideals when it comes to software:

Futurology

  • The investment editor from the Telegraph takes a look in his crystal ball and discerns that the following four megatrends are suitable for investment over the next few decades: robots, life extension, IoT and the sharing economy.

Culture and Society

  • HBR on the end of expertise at least in terms of its market value. It’s being eroded apparently by a combination of increasingly sophisticated AI and mass amateurism.  To adapt a key meme from Nathan Barley, perhaps the robots and the idiots have won?

Given the widening distance (economically, socially and geographically) between the super-rich and the rest of us, the solidifying barriers to entry into the upper echelons of professional and business employment, and the growing acceptability of demonising members of the “precariat” with the very least resources, the 21st century is likely to be marked by increasingly disruptive challenges to the social fabric. The old class war may be over: the new politics of class is only just beginning.

A friend calls unexpected connections with lost loved ones “winks,” and finding Google Maps photos of my mother felt like a wink of monumental proportions.

“This job teaches you a lot,” he said. “You learn whatever material stuff you have you should use it and share it. Share yourself. People die with nobody to talk to. They die and relatives come out of the woodwork. ‘He was my uncle. He was my cousin. Give me what he had.’ Gimme, gimme. Yet when he was alive they never visited, never knew the person. From working in this office, my life changed.”

  • One suspects Steve Jobs would have concurred.  Arguably the most powerful lesson Jobs took from his years of exploring Zen Buddhism “was to accept death as an inevitable part of life, which served him well when he learned that his own death was imminent”. And perhaps the greatest expression of that was in his famous Stanford commencement speech. It remains an enduring and remarkable testament:

No one wants to die. Even people who want to go to heaven don’t want to die to get there. And yet death is the destination we all share. No one has ever escaped it. And that is as it should be, because Death is very likely the single best invention of Life. It is Life’s change agent. It clears out the old to make way for the new. Right now the new is you, but someday not too long from now, you will gradually become the old and be cleared away. Sorry to be so dramatic, but it is quite true.
