Sections
- Self-referential data visualisation V: Docker
- Manufacturers and Devices
- Google and Android
- Apps and Services
- China and Russia
- Security
- Cloud and Digital
- Machine Learning
- Wearables and the Internet of Things
- Data Visualisation and Analytics
- Software Engineering
- Product Management
- Startups and Work
- Futurology
- Culture and Society
Self-referential data visualisation V: Docker
Over the last four weeks I’ve developed a simple Python-based data visualisation pipeline for my blog stats. This week, as promised at the end of last week’s installment, I will outline how to deploy this setup to a Digital Ocean virtual machine or ‘droplet’ as they refer to it in their documentation. The starting point for this exercise will be the recipe for creating a droplet using Docker Machine outlined last month. The droplet in this case will be called flaskapps.
Once this recipe is followed, we should be able to see flaskapps appear as our active VM after setting the appropriate Docker environment variables:
$ docker-machine ls
NAME        ACTIVE   DRIVER         STATE     URL                        SWARM
flaskapps   *        digitalocean   Running   tcp://178.62.101.7:2376
Dockerising Flask
Now we can proceed with building a Docker image called flaskapp that will run our previously developed Flask app. To do that we start by creating a local subdirectory called web and a Dockerfile within it. The Dockerfile encapsulates all the steps required to build a Docker image that will manage our Flask app together with all its dependencies. The two key dependencies are: i) a new Python utility script called blog6.py, which is a very slightly modified version of the blog5.py covered last week, and ii) an adjusted Flask file app.py that imports blog6.py. If you want to follow the recipe, you will also need to create local .dbuser, .dbpasswd and .dbendpoint files corresponding to a Mongolab cloud database. Then run blog6.py locally in order to populate that database. The base image used to build our container is the standard Docker ubuntu image. When building Docker containers it’s usually a good idea to start with a vanilla distro image and then apt-get and pip install all other required container components on top. Here is the complete Dockerfile:
# Set the base image to Ubuntu
FROM ubuntu

# File Author / Maintainer
MAINTAINER Mal Minhas

# Replace default command interpreter from sh to bash
RUN ln -snf /bin/bash /bin/sh

# Add the application resources URL
RUN echo "deb http://archive.ubuntu.com/ubuntu/ \
$(lsb_release -sc) main universe" >> /etc/apt/sources.list

# Update the sources list
RUN apt-get update

# Create and set up the application folder which is
# where requirements.txt will be and CMD will execute
RUN mkdir -p /usr/src/flaskapp
COPY . /usr/src/flaskapp
VOLUME ["/usr/src/flaskapp"]
WORKDIR /usr/src/flaskapp

# Install basic applications
RUN apt-get update && apt-get install -y \
    build-essential python \
    python-dev \
    python-distribute \
    python-pip \
    python-pandas

# Do you really need virtualenv in a docker container?
RUN pip install virtualenv virtualenvwrapper
ENV WORKON_HOME /usr/src/flaskapp
RUN /bin/bash -c "source /usr/local/bin/virtualenvwrapper.sh \
    && mkvirtualenv flaskapp \
    && workon flaskapp"

# Now pip install all dependencies
RUN pip install -r requirements.txt
RUN python -m nltk.downloader stopwords

EXPOSE 5000
CMD ["python", "/usr/src/flaskapp/app.py"]
The requirements.txt file referenced here simply details all the Python dependencies to be pip installed into the image:
Flask >=0.10.1
Jinja2 >=2.7.3
MarkupSafe >=0.23
Werkzeug >=0.10.4
gunicorn >=19.3.0
requests >=2.8.1
pymongo >=3.0.3
beautifulsoup4 >=4.4.1
nltk >=3.0.3
bokeh >=0.10.0
python-nvd3 >=0.13.10
simplejson >=3.8.0
ggplot >=0.6.8
vincent >=0.4.4
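The post does not show how blog6.py consumes the .dbuser, .dbpasswd and .dbendpoint files mentioned earlier, so the following is only a minimal sketch of one plausible approach. The helper names read_secret and build_mongo_uri are hypothetical, and it assumes that .dbendpoint holds a string of the form host:port/database, which may not match the real file.

```python
from pathlib import Path
from urllib.parse import quote_plus


def read_secret(directory, name):
    """Return the stripped contents of a one-line secret file such as .dbuser."""
    return (Path(directory) / name).read_text(encoding="utf-8").strip()


def build_mongo_uri(directory="."):
    """Assemble a mongodb:// URI from the three dot-files.

    Assumption: .dbendpoint contains 'host:port/database'. The layout
    actually used by blog6.py is not shown in the post.
    """
    # Percent-escape the credentials so characters like '@' or ':' are safe.
    user = quote_plus(read_secret(directory, ".dbuser"))
    password = quote_plus(read_secret(directory, ".dbpasswd"))
    endpoint = read_secret(directory, ".dbendpoint")
    return f"mongodb://{user}:{password}@{endpoint}"
```

The resulting string could then be handed to a MongoDB client. Keeping credentials in separate dot-files like this also makes it easy to exclude them from version control.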
The command used to build this flaskapp image from within the web subdirectory is as follows:
$ docker build -t flaskapp .
Building the flaskapp image will take a few minutes. Once done, we can use this image to run a container called web in daemonised mode, mapping port 5000 on the host to TCP port 5000 in the container, as follows:
$ docker run -d -p=5000:5000 --name web flaskapp
At this point you should be able to see the running web container in a docker ps listing thus:
$ docker ps
CONTAINER ID   IMAGE      COMMAND                  CREATED        STATUS        PORTS                    NAMES
7eab9f2b5d3f   flaskapp   "python /usr/src/flas"   24 hours ago   Up 24 hours   0.0.0.0:5000->5000/tcp   web
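The PORTS column is worth reading carefully, since it records the host-to-container mapping requested with -p. As a small illustrative sketch (parse_port_entry is a hypothetical helper, not part of any Docker tooling, and it only handles the simple IPv4 form shown in the listing), the entry can be pulled apart like this:

```python
import re

# Matches a published-port entry such as "0.0.0.0:5000->5000/tcp".
PORT_ENTRY = re.compile(
    r"^(?P<host_ip>[\d.]+):(?P<host_port>\d+)"
    r"->(?P<container_port>\d+)/(?P<protocol>\w+)$"
)


def parse_port_entry(entry):
    """Split one PORTS entry into host address, host port, container port and protocol."""
    match = PORT_ENTRY.match(entry.strip())
    if match is None:
        raise ValueError(f"unrecognised port entry: {entry!r}")
    return {
        "host_ip": match.group("host_ip"),
        "host_port": int(match.group("host_port")),
        "container_port": int(match.group("container_port")),
        "protocol": match.group("protocol"),
    }


mapping = parse_port_entry("0.0.0.0:5000->5000/tcp")
```

Here the left-hand side (0.0.0.0:5000) is the address and port the droplet listens on, and the right-hand side (5000/tcp) is the port the Flask process uses inside the container.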
If you followed the recipe to this point, you should have successfully deployed a Docker container running a Flask app with non-trivial Python dependencies to your target Digital Ocean droplet. It’s worth using the newly introduced Digital Ocean floating IP functionality to assign a floating IP address to the droplet hosting the container and tie that floating IP to a DNS name. The container setup I deployed is at http://labs.malm.co.uk:5000. Both blog6.py and the corresponding Flask app.py are available for inspection together with the Dockerfile. The Flask app.py follows the same no-frills format outlined last week and has now been further modified to integrate the ggplot png as base64-encoded content. That allows it to be served as a single file that contains everything needed to generate all required HTML output. Besides being somewhat agricultural by design, this solution retains all the limitations highlighted last week. The container also exposes the raw Flask port 5000. For a production environment it is better to expose the app via an nginx container linked to the Flask one. Next week I may look at that and also at making the site a bit nicer through the addition of CSS and JavaScript.
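For readers unfamiliar with the base64 trick mentioned above, the general technique is to turn the image bytes into a data URI and place it directly in an img tag. The sketch below shows that idea using only the standard library. The function name png_to_img_tag is hypothetical and this is not necessarily how app.py does it.

```python
import base64
import html


def png_to_img_tag(png_bytes, alt="plot"):
    """Return an <img> tag that embeds the PNG inline as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    safe_alt = html.escape(alt, quote=True)
    return f'<img alt="{safe_alt}" src="data:image/png;base64,{encoded}">'
```

In practice the bytes would come from the saved plot, for example by reading the PNG file in binary mode, and the returned tag would be dropped into the page template. The trade-off is that base64 inflates the payload by roughly a third and the image cannot be cached separately from the page.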
Docker Images and Containers
The standard explanation of docker architecture doesn’t really provide deep insight into precisely how docker containers and images relate to each other. It tends instead to focus on the distinction between docker client and host as follows:
This is something of a problem because, as with any technology, it’s vital to understand what is really going on under the hood. In the case of containers, the technology really harks back to the early days of UNIX and chroot jails. Docker has essentially revived and modernised the concept as a compelling alternative to more heavyweight VMs. In terms of the specific detail of how Docker operates with containers and images, this recent publication is hugely helpful. It provides a detailed visual explanation of how Docker works under the hood using a combination of read-only and read-write layers integrated via the AUFS union file system. Docker containers are built from read-only unioned Docker image layers:
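As a rough illustration of that layering, the toy model below uses Python's ChainMap to mimic a union of read-only image layers with a single writable layer on top. It is a conceptual sketch only: the layer contents and the start_container helper are invented for the example, and real union file systems such as AUFS also handle deletions, permissions and much more.

```python
from collections import ChainMap

# Read-only image layers, oldest first (path -> contents).
base_layer = {"/etc/os-release": "ubuntu", "/bin/sh": "dash"}
app_layer = {"/bin/sh": "bash", "/usr/src/flaskapp/app.py": "flask app"}


def start_container(*image_layers):
    """Return a unioned view with a fresh writable layer above the image layers."""
    writable = {}
    # ChainMap searches its maps left to right, so newer layers go first.
    # All writes land in the first map, leaving the image layers untouched.
    return ChainMap(writable, *reversed(image_layers))


container = start_container(base_layer, app_layer)
container["/tmp/scratch"] = "written at runtime"
```

Looking up /bin/sh in the container finds the newer layer's version first, while the runtime write stays in the container's own top layer. A second container started from the same layers would not see /tmp/scratch, which is the property that lets many containers share one image.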
Manufacturers and Devices
- What happens next after smartphones? Benedict Evans suggests that there still doesn’t appear to be any universal product category on the horizon that can rival the size and scale of the global phone industry. Hyped arenas like IoT and wearables seem more like smartphone adjuncts than pastures new.
- HTC received something of a critical panning for releasing a flagship phone, the A9, that not only looks like an iPhone6 but is also named after its chipset.
HTC A9. Named after the processor of the phone it mirrors lol. pic.twitter.com/iHRxAmXZM2
— Jonathan Morrison (@tldtoday) October 20, 2015
- Western Digital are acquiring SanDisk for $19 billion in a mega storage industry merger.
Google and Android
- The Information muses on the future of Android suggesting that operator-independent OEMs like BLU Products offer a likely view of where the platform is heading. BLU (which stands for “Bold Like Us”) is a US brand assembling high spec 4G LTE devices on leading edge Qualcomm silicon retailing at very low prices. They’re able to do this by having very little in the way of organisational infrastructure and resemble UK smartphone startup WileyFox in distributing unlocked products via Amazon Prime. BLU’s top end smartphone model at present is the BLU Vivo Air which has a very impressive spec at a sub-$200 price point:
16GB internal with 2GB RAM, Qualcomm Snapdragon 410, 64-Bit 1.2 GHz CPU, Android L v5.0, Upgradeable to Android M v6.0
Apps and Services
- Facebook founder Mark Zuckerberg’s talk to students at Tsinghua university in Beijing is remarkable, and unprecedented for a US-born Silicon Valley CEO, in being conducted entirely in Mandarin for 20 minutes without notes. It will have a profound cultural impact within China.
- Facebook is so dominant right now across the world that it’s hard to imagine how it might ever be usurped. This HuffPo opinion piece makes a game attempt nevertheless, suggesting we look at Tsu.co, “a Facebook with a conscience, whose founders believe users should be compensated for the content they create.” It may be insignificant today but the site has gained nearly four million users in just over six months. Besides, it’s worth considering the playbook followed by Facebook itself:
It’s almost unimaginable at the moment to think that people would abandon Facebook with the idea of switching to a more user-friendly, and user-protective site. But it wasn’t too long ago folks thought Myspace was the end-all-be-all of the social media landscape. .. When you compare Facebook to Tsu, it definitely appears as if Facebook is The Matrix, and Tsu is Neo.
- The director of platform products at leading news site Quartz provides some interesting insights after 5 years spent developing mobile news apps. These basically boil down to an overall recommendation not to try to build one, for the following reasons:
- Native apps are difficult to build and maintain
- Growing your customer base is slow and expensive
- Most of your users won’t use your app
- Anything a news app can do, a mobile website can do easier and better
- Undeterred by that advice, it appears that Facebook are making a second attempt to break into the mobile news app space with a proposition called Notify that seems to broadly take aim at Twitter’s Moments. Their first was with standalone mobile newspaper app Paper.
- Meanwhile the NYT, which is fast morphing into an online property, has partnered with Google to offer its news readership a free Google Cardboard VR headset and partner app:
An app made to work with this headset will be available on November 5; the bundle of Google cardboard will be shipped to newspaper subscribers the following weekend.
- Lookup is an interesting Indian startup offering a chat app that aims to connect retailers and customers. As with fellow Asian OTT messaging giants WeChat and KakaoTalk, Lookup integrates messaging with retail functionality. It has just secured Series A funding and it will be interesting to see how Lookup develops within the promising “social retail” (“wetail”?) space over the coming months.
- It will face a daunting challenge competing with WeChat not only in terms of brand but also proposition completeness:
China and Russia
- Clay Shirky, arch-proponent of the theory of mass amateurisation whereby “amateurs undertake increasingly complex tasks resulting in accomplishments that would seem daunting within the traditional institutional model”, outlines how Xiaomi have spectacularly leveraged it to become China’s Apple overnight:
Xiaomi’s marketing chief Tony Wei says that new software sent to early testers will generate thousands of reports back overnight, and this openness in turn allows them to try out, test, and fix the critical source of their commercial advantage, a version of the operating system that lets them tie their own services to the user’s device. These are not mere bug reports, of the sort most software now generates automatically. These are user reviews, questions not just about the technical aspects of MIUI but about which features the user likes, dislikes, or wants to see in the future.
- Even so, Xiaomi are not having it all their own way. Canalys are reporting that “Huawei became China’s top smart phone vendor in Q3 2015, while Xiaomi fell to second place”.
- Both Huawei and Xiaomi have been major beneficiaries of a dramatic reduction in user smartphone lifecycle in China over the last few years. A recent Technode article provides valuable insights into this so-called “smartphone upcycling” phenomenon.
“Researches show that Chinese smartphone users change their smartphones once every 29-months in 2011, but the period has been shorted to 18 months now. Over 20% of Chinese users will update for a newer phone within one year, while only 8.4% would do so within two years.”
- Perspectives on how the Chinese see Western companies in terms of their names. Presumably Mark Zuckerberg is well aware of this now:
Without an official Chinese name, Facebook gets called a lot of things. But one of the easiest ways to transliterate the sounds of the English word is Feisibuke. The only problem? That means “must die/death is inevitable.”
- Essential report on the state of WeChat in China full of great data like this one outlining a distinct usage skew depending on city size:
- A Goldman Sachs analyst suggests that China’s economy is “totally distorted” under Xi Jinping with China’s ratio of investment to gross domestic product (GDP) particularly singled out.
- A few weeks ago the blog covered Russia’s attempts to assert “digital sovereignty” via the Red Web. This week another article outlines what appears to be a dry run by the Russian authorities in cutting the country off from the rest of the Internet:
Russia’s ministry of communications and Roskomnadzor, the national internet regulator, ordered communications hubs run by the main Russian internet providers to block traffic to foreign communications channels by using a traffic control system called DPI. … The objective was to see whether the Runet – the informal name for the Russian internet – could continue to function in isolation from the global internet.
Security
- Symantec’s CTO says failing to protect IoT devices represents the “biggest threat to tech”.
- Right on cue, a researcher demonstrated a theoretical walk-by hack of a Fitbit Flex using a combination of Bluetooth and Python:
a hacker within a few meters of a Fitbit device could exploit open Bluetooth ports to place an infected packet on to it, which would transfer to a computer upon syncing later.
- And the complexity of managing security vulnerability realisation will be an order of magnitude greater in tomorrow’s software-defined cars:
if a car crashes after it’s been hacked, who’s liable – the driver, the manufacturer, the software developer, or the test house that assured it? I should have been a lawyer…
- One of the key problems large companies have to contend with in elevating the priority and visibility of data security concerns is how to communicate key concerns at Board level. The TalkTalk Oct 22nd cyber attack ought to underscore the existential threat companies face from catastrophic data breaches but one suspects many will continue to keep their fingers crossed and hope it doesn’t happen to them. This post offers some useful tips on how they ought to proceed starting with a security risk baselining activity ideally conducted by an external expert and moving on to a regular status report built from appropriate threat analysis tools:
Conducting a risk assessment of company data is likely to go a long way towards bringing the C-suite up to speed with the most pressing security issues in your organisation, but a little external support is likely to lend extra credence to your arguments. … Enlisting a reputable third party to provide the board with a risk profile assessment could be a crucial factor in convincing the board of the need for greater investment in information security.
- A couple of weeks ago the blog highlighted Ethereum, a next-generation software proposition now rebranded as “a decentralized platform that runs smart contracts: applications that run exactly as programmed without any possibility of downtime, censorship, fraud or third party interference”. As with any new technology, there’s always a dark side to such power:
One example is a contract offering a cryptocurrency reward for hacking a particular website. Ethereum’s programming language makes it possible for the contract to control the promised funds. It will release them only to someone who provides proof of having carried out the job, in the form of a cryptographically verifiable string added to the defaced site.
Cloud and Digital
- devd is “a small, self-contained, command-line-only HTTP server for developers”. Very useful for even smaller scale web hacking use cases than Flask.
- What Netflix’s plan to shut down its last data center reveals about the future of enterprise technology – it’s all going to be cloud:
“the generational shift long promised by cloud advocates is finally, irreversibly underway. … That shift is away from “legacy” data centers built on x86 servers, VMware-managed hypervisors, SQL databases from Oracle, and storage hardware provided by EMC. Replacing all that are web-scale (or at least wannabe web-scale) technologies based on containers, commodity hardware, NoSQL databases of various kinds, and flash storage. The new infrastructure is cheaper, easier to scale up to large volumes of data and computation, and more flexible and agile.”
Machine Learning
- Machine Learning is hot right now, doubtless aided by articles like this one, which quotes Sundar Pichai positioning the technology at the heart of what Google does next:
“Machine learning is a core, transformative way by which we’re rethinking everything we’re doing. … Our investments in machine learning and artificial intelligence are a priority for us. … We’re thoughtfully applying it across all our products, be it search, ads, YouTube, or Play. We’re in the early days, but you’ll see us in a systematic way think about how we can apply machine learning to all these areas.”
- Which in turn links to this very handy and approachable introduction to machine learning and deep neural nets:
- Dataversity also consider whether we’re entering “the golden age of machine learning” or just seeing a lot of folks opportunistically rebranding existing data products and features:
there are, without naming names, some serious examples of analytics and BI companies taking the same old software and slapping a “machine learning” label on it simply because it sounds more robust or complex than data analytics.
- Machine learning per se is not a new advance – the theory has been broadly understood for decades as “really just the very advanced application of statistics to learning to identify patterns in data and then make predictions from those patterns”. What has changed is the amount of compute power that can be applied, which is allowing machine learning to “reshape our world” in five key ways which will have increasingly profound ramifications:
- Machines can see.
- Machines can read.
- Machines can listen.
- Machines can talk.
- Machines can write.
“These skills are beginning to show that computers can now boldly go into realms that were once considered solidly the domain of humans. While the technology still isn’t perfect in many cases, the very concept of machine learning — that machines can continuously and tirelessly improve, they will get better.”
- This post from Andrej Karpathy outlines an experiment of massive scale for testing computer vision using deep learning:
We’ll take a powerful, 140-million-parameter state-of-the-art Convolutional Neural Network, feed it 2 million selfies from the internet, and train it to classify good selfies from bad ones.
- And this TNW post outlines Wordsmith, an AI proposition that writes articles for you:
- Where it all leads to is the great question, of course. Ultimately there has to be a level of disquiet about any advanced system that is explicitly designed to improve itself. The nagging concern is well articulated in this post somewhat baldly entitled “Here’s how a recursively self-improving AI could transform itself into a superintelligent machine” which introduces a new (at least to me) reason to be worried:
in fact, while many people fear a so-called “robot apocalypse” aimed directly at extinguishing our civilization, I personally feel that the real danger to our ongoing existence lies in the potential for us to be collateral damage as advanced AGIs battle it out for supremacy; we may find ourselves in the line of fire. Indeed, building a safe AI will be a monumental — if not intractable — task.
Wearables and the Internet of Things
- Guardian profile of Nest founder Tony Fadell and his plans to extend beyond smart thermostats.
- Clear evidence that the Apple Watch is starting to bite into the Swiss watch market:
The Q3 figures, published by the Federation of the Swiss Watch Industry, reveal an 9.9% drop in watch exports for September, leading to an 8.5% slide for the quarter overall.
- Samsung’s SAMI is representative of the current state of play in terms of IoT platform thinking. Right now, the market is in something of a state of flux in terms of technology options. This dynamic will likely coalesce around the four main platform players with serious cloud scale (Microsoft Azure, AWS IoT, IBM Bluemix and Google App Engine) with the niche offerings such as SAMI likely to fall by the wayside in comparison:
“Data-driven Development (D3) platform for receiving, storing and sending data to/from IoT devices. Any device can send data in various formats which is then normalized into a JSON format and stored in the cloud.”
Data Visualisation and Analytics
- Over the last few weeks, the blog has centred around various different data visualisation options. As a bonus follow-on, this post on datavisualisation.com compares Tableau, SPSS, R, Excel, Matlab with the JavaScript D3 and a couple of the Python approaches I used:
- Interesting post on how the International Consortium of Investigative Journalists use a web-based graph visualisation tool called Linkurious along with graph database Neo4j and ElasticSearch to visualise the links between data. This technology is vital to progressing the rapidly developing field of data journalism.
- Good HBR post on essential principles for understanding analytics. The article suggests these include ensuring you spend time on problem framing, work effectively with quants and understand and explore your data at least at a high level.
Software Engineering
- The irrepressible Adrian Rosebrock of PyImageSearch uses cv2 to process 91 years of Time Magazine cover images and produce the following decade by decade comparative amalgam:
- Swift “is supplanting Objective C as the de facto language for iOS development a lot faster than anyone expected” based on the latest popular TIOBE index data. Mind you, the index also has assembly language above both of them, which seems rather counter-intuitive. The top 10 all seem about right though. Check out numbers 14 and 15 in the list:
Product Management
- Great post by Steven Sinofsky on how to organise product management from mission through goals to granular projects and tasks in order to “get the right stuff done”. He emphasises the need for product management to operate within inevitable resource constraints and the crucial importance of underpinning all ongoing activity with data. In his case, he suggests Excel should be your go-to tool as it offers “the right level of complexity for projects from 10 to 5000 people in my experience”. That is quite something when you consider that Sinofsky used to lead the whole Windows division at Microsoft:
“Throughout this whole system there is ongoing telemetry that is called upon to support the company with reliable data upon which to make decisions. … The most successful organizations are also fully instrumented organizations. Everything about code, customers, and overall engagement has telemetry.”
- Why it’s hard to fix failing tech companies once they hit their growth inflection point:
“much of what allows great technology companies to grow incredibly fast and become so valuable also makes them almost impossible to turn around when they falter in any sort of traditional sense.”
Startups and Work
- VC is increasingly being disrupted by technology-enabled crowdfunding. It still remains a largely undemocratic club though.
- Interesting NYT article highlighting that the growth areas for jobs over the last few decades have mostly been in areas which require a combination of social and maths/technical skills. It emphasises the importance of cultivating good social skills at a young age, not just concentrating on passing exams:
The extent to which jobs required social skills grew 24 percent between 1980 and 2012, he found, while jobs requiring repetitive tasks, like garbage collecting, and analytical tasks that don’t necessarily involve teamwork, like engineering, declined.
- Startups don’t conform to platonic ideals when it comes to software:
Software engineering at a startup. It's basically this. pic.twitter.com/GKkUj6oOkF
— Theodore Dziuba (@dozba) October 27, 2015
Futurology
- Unsurprisingly, Elon Musk’s view of the future involves autonomous cars and trips to Mars, along with future generations that will “expect things from machines that we would never have thought to ask for”.
- The investment editor from the Telegraph takes a look in his crystal ball and discerns that the following four megatrends are suitable for investment over the next few decades: robots, life extension, IoT and the sharing economy.
Culture and Society
- HBR on the end of expertise at least in terms of its market value. It’s being eroded apparently by a combination of increasingly sophisticated AI and mass amateurism. To adapt a key meme from Nathan Barley, perhaps the robots and the idiots have won?
- The Guardian published an opinion piece on inequality and a new class war that will be fought over a ‘class ceiling’ separating the affluent from everyone else in society:
Given the widening distance (economically, socially and geographically) between the super-rich and the rest of us, the solidifying barriers to entry into the upper echelons of professional and business employment, and the growing acceptability of demonising members of the “precariat” with the very least resources, the 21st century is likely to be marked by increasingly disruptive challenges to the social fabric. The old class war may be over: the new politics of class is only just beginning.
- NYT on how the dead live on digitally for those left behind to “simultaneously brighten and ruin their days”:
- N.R. Kleinfield’s mournful and elegiac study published in the NYT explores the death of George Bell who died alone in New York unnoticed earlier this year. The slow piecing together of a life that declined to the point of singularity is beautifully handled and brings to mind Paul Auster’s brilliant debut The Invention of Solitude. This insight comes from one of the investigators tasked with dealing with the aftermath:
“This job teaches you a lot,” he said. “You learn whatever material stuff you have you should use it and share it. Share yourself. People die with nobody to talk to. They die and relatives come out of the woodwork. ‘He was my uncle. He was my cousin. Give me what he had.’ Gimme, gimme. Yet when he was alive they never visited, never knew the person. From working in this office, my life changed.”
- One suspects Steve Jobs would have concurred. Arguably the most powerful lesson Jobs took from his years of exploring Zen Buddhism “was to accept death as an inevitable part of life, which served him well when he learned that his own death was imminent”. And perhaps the greatest expression of that was in his famous Stanford commencement speech. It remains an enduring and remarkable testament:
No one wants to die. Even people who want to go to heaven don’t want to die to get there. And yet death is the destination we all share. No one has ever escaped it. And that is as it should be, because Death is very likely the single best invention of Life. It is Life’s change agent. It clears out the old to make way for the new. Right now the new is you, but someday not too long from now, you will gradually become the old and be cleared away. Sorry to be so dramatic, but it is quite true.