Elements of Machine Learning I: Regression as function estimation
Sharp Sight Labs’ “one concept machine learning” post provides an excellent illustration of why, instead of “just diving in and building something”, it pays to spend time first understanding what is really going on under the hood. In the case of machine learning, the author suggests it comes down to one core concept:
The essential problem of machine learning is function estimation. When you’re doing machine learning (specifically, supervised learning), you’re essentially using computational techniques to reverse engineer the underlying function from the data points alone. You’re estimating an underlying function based only on training observations in a dataset.
The approach the author uses to illustrate his point involves examining a 2D data set with a progression of regression models from linear through polynomial. A regression model in this context means an equation that provides a continuous prediction of the output value y for any given input value x; in other words, it produces a curve y = f(x). The article provides some reference code in R. I thought it would be a useful exercise to re-implement it in Python using numpy, scikit-learn, pandas and matplotlib. The diagrams below show my results: the top graph is the raw data, the second shows a linear regression and the third shows polynomial regressions of various degrees (e.g. quadratic, cubic and higher). The models are generated using scikit-learn (sklearn) and plotted using matplotlib. Here’s a snippet showing how the polynomial regression curves are created using a Ridge regressor (least squares with L2 regularisation):
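import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

# X (assumed shape: n_samples x 1) and y are the data loaded earlier
plt.scatter(X, y, color='black')
for deg in [2, 3, 4, 5]:
    # Expand x into polynomial terms, then fit a ridge-regularised linear model
    model = make_pipeline(PolynomialFeatures(degree=deg), Ridge())
    model.fit(X, y)
    y_ = model.predict(X)
    plt.plot(X, y_, linewidth=2)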
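The snippet above relies on X and y already being loaded. As a minimal, self-contained sketch of the same idea, the version below assumes synthetic data: noisy samples of a sine curve stand in for the article's dataset, and the names `fit_polynomial` and `training_mse` are illustrative choices of mine rather than anything from the original post. It fits the same pipeline at several degrees and prints the training error for each, as a rough measure of how closely each model tracks the underlying function.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumption: synthetic data, y = sin(x) plus Gaussian noise,
# standing in for the dataset used in the original article.
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-3.0, 3.0, size=100))
X = x.reshape(-1, 1)  # scikit-learn expects a 2D feature array
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)


def fit_polynomial(X, y, degree, alpha=1.0):
    """Fit a polynomial of the given degree with ridge regularisation."""
    model = make_pipeline(PolynomialFeatures(degree=degree), Ridge(alpha=alpha))
    model.fit(X, y)
    return model


# Training error for a progression of models, from linear upwards
training_mse = {}
for degree in [1, 2, 3, 5]:
    model = fit_polynomial(X, y, degree)
    training_mse[degree] = mean_squared_error(y, model.predict(X))

for degree, mse in training_mse.items():
    print(f"degree {degree}: training MSE = {mse:.4f}")
```

Training error on its own tends to favour the higher-degree fits, so judging which model is genuinely closest to the “true function” would need held-out data as well.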
The key point here is that a number of progressively more sophisticated models can be used to generate graphs to see how close we can get to the “true function” that best represents the data distribution. The code used to generate the corresponding graphs below is freely available for inspection and modification on Bitbucket:
Artificial Intelligence and Machine Learning
- In a move of great significance for Google’s longer term ambitions, DeepMind are moving from Torch to TensorFlow as their core Machine Learning platform:
The move suggests that some of Google’s brightest AI minds are convinced of the promise of Google’s own open source software; TensorFlow is now good enough for DeepMind.
- The Nvidia Tesla P100 is a 15 billion transistor GPU aimed at improving the performance of the sort of deep learning applications TensorFlow is designed to support:
- Useful post explaining the concept of autoencoders (neural networks designed for data compression) and how they relate to word embeddings (vector representations of words in a text that ‘compress’ their relationships and meanings). The latter are a core element of many text processing machine learning models:
- Why are there so many hot AI startups in the UK right now? Mainly it seems because it’s the home of four of the best research universities in the field, Cambridge, Imperial, Oxford and UCL.
- Advances in the power of the technology are bound to impact the world of work. Truck drivers have been highlighted in the past as a likely early casualty of machine automation and that point is reiterated this week by a TechCrunch contributor involved in making it happen:
There are currently more than 1.6 million Americans working as truck drivers, making it the most common job in 29 states. … The loss of jobs representing 1 percent of the U.S. workforce will be a devastating blow to the economy. And the adverse consequences won’t end there. Gas stations, highway diners, rest stops, motels and other businesses catering to drivers will struggle to survive without them.
- Douglas Rushkoff suggests “replacing labor with algorithms” is dangerous and ultimately bad for your business though it’s hard to see companies resisting the tide on this:
If you’re using algorithms and big data to figure out your next product line rather than designers, what’s your competitive advantage? The other company is using that same data and probably hiring the same big data analytics company to figure out the future trend. So now you’ve been turned into a commodity.
- Law professor John Danaher raises the additional concern that, shorn of the meaning we get from our work, humanity may face a profound existential crisis and shift en masse to hedonism in “one giant orgy of drugs and sadness.”
“For better or worse, a lot of our self-worth is connected to our work, So what would happen if traditional forms of labor were no longer available?”. In his recently published paper, “Will Life Be Worth Living in a World Without Work? Technological Unemployment and the Meaning of Life,” Danaher imagines a future where robots can fill pretty much any job on the planet.
- The US military at least won’t have to worry about technological unemployment any time soon – the Pentagon has indicated that “AI will not be used to fight wars alone”.
the Pentagon isn’t looking to create anything that would be able to rewrite its own code. Development will focus more on technology that aids a human in making a faster, better decision.
Apps and Services
- VentureBeat on why chatbot archives could represent a major security risk given time. It’s fair to assume that any conversation you have with a chatbot could in theory eventually be exploited:
we will tell chatbots our secrets. We will share information with them that we would never share with our friends. We will use them as repositories for important data that we know we need to remember. .. So data needs to be stored. And stored data can be hacked. It can be snooped on. It can be surveilled. It can be used for nefarious purposes. And it will. It’s only a matter of when.
- TheInformation present evidence that “shopping apps have the best chance of keeping users engaged”. More so than news or banking apps.
WhatsApp is planning later this year to offer banks, airlines and other businesses the ability to send one-way messages to customers, people familiar with the plans said. It will be WhatsApp’s first official move to lure businesses onto the messaging app.
Mobile and the Internet of Things
- SirinLabs is a joint Western-Chinese consortium that have announced a $15k privacy-focussed cipher phone apparently going by the name Solarin. They have secured $72 million in seed funding according to some accounts. One of the main players behind the venture is Israeli entrepreneur Moshe Hogeg who was coincidentally one of the founders of the notorious “Yo” app. Hopefully he’s not planning to pull a similar sort of stunt with this device.
- Fairphone, the ethical smartphone startup, have released their Android-based OS as OSS. Fairphone 2 “Open Operating System” is free of Google software which also means you have to find alternative service clients.
- The Indian telecommunications ministry has indicated that from next year, all mobile phones sold in the country must include a ‘panic button’:
“The emergency feature would be activated by pressing a designated key on a smartphone or holding down the numbers ‘5’ or ‘9’ on a basic device.”
- Nokia acquired health wearable outfit Withings for €170 million after recently announcing that it wasn’t going back to devices.
- Vivint “wants to be the Apple of smart homes” and has just secured $100 million of funding to fuel its ambitions.
- Will.i.am has a new smartwatch called the Dial available for pre-order. It is designed to be entirely standalone and comes complete with a 3 SIM and its own digital assistant. The UI looks strange and unpromising – it is, after all, the successor to the widely panned Puls released a year ago.
The Dial is being touted as a “voice-first” device with its own virtual assistant, called AneedA. Like Siri, Cortana and Alexa, the idea is that you can ask for information and execute tasks while your hands are full. How AneedA compares to those alternatives is, for now, a mystery.
- The world’s most dangerous consumer product:
- The WISP is a battery free computer that harvests energy from RF:
- Why digitalisation “changes Entrepreneurship and everything” from Erkko Autio, Chair in Technology Venturing and Entrepreneurship at Imperial College.
- McKinsey on rethinking the rules of reorganisation suggests, among other things, it is imperative to “shake the core of the organisation“. It spends too much time on mature, low-growth activity.
- Fastest growing tech skills in 2016 include Cassandra, AWS, Jira, Salesforce, Azure and top of the list, Spark.
- The first rule of pricing is that you don’t talk about pricing, you feel it. This and other great insights into the GoodBetterBest model and how to use it to fix SaaS subscription pricing in this excellent Medium post.
- How many people does it take to ship software? Many more than are really needed with all too many of them non-practitioners warming their hands by the fire:
One place I worked as an architect had a project I estimated at 2 person-weeks of coding. Six months and dozens of meetings later to write a 120 page requirement documents resulted in the same estimate. We could have built the app 10 times over. But what would all the people at those meetings have done?
- The limitations of using IBM’s Swift in the browser were exposed for me this week when I tried to use Foundation to implement some code to do a simple HTTP GET and found the environment saying no:
- Do experienced programmers use Google frequently? Hell yes.
- Cyber warfare meets politics in the story of David Vincenzetti and his infamous Hacking Team:
the Hacking Team is among the world’s few dozen private contractors feeding a clandestine, multibillion-dollar industry that arms the world’s law enforcement and intelligence agencies with spyware. Comprised of around 40 engineers and salespeople who peddle its goods to more than 40 nations, the Hacking Team epitomizes what Reporters Without Borders, the international anti-censorship group, dubs the “era of digital mercenaries.”
- Two bytes to $951 million. The fascinating inside story of one of the biggest bank heists in history, executed by compromising a SWIFT server in Bangladesh:
- Federated SSO to AWS using Google Apps – an admirably detailed worked example recipe courtesy of Amazon.
The End of the Affair?
- As Apple reported their first fall in sales for 13 years, the Guardian published a surprisingly tart “guide to everything that’s annoying about Apple” taking in repairs, demos, port differences and even their business model:
Dave Eggers’ dystopian novel details a utopian-sounding tech corporation whose ambitions extend to every aspect of people’s lives, anticipating, fulfilling and creating their every desire, to the extent that people never need to step outside the closed loop of control. Then find they can’t even if they want to. Apple has done its best to dispel such comparisons by building a massive new headquarters – in the shape of a circle.
- Quartz focussed on just one, iTunes: “13 years old and still awful”. Signs of a cooling towards Apple are also evident in Mainland China, where Q2 saw them nursing a 26% decline in sales revenue with the prospect of further falls in Q3.
- A brace of very good Benedict Evans posts outline: a) why the best product in a particular category is often the last one before it is overtaken by something even better, and b) why we are at the end of one major wave of mobile and entering another, with the mobile environment resembling the cut-throat commodity PC clone market of the 1980s:
That world in due course led to companies like Dell – people who embraced the volume, low-margin commodity model and found an angle of their own. We’re starting to see equivalent model-creation now.
- The palpable sense that the entire consumer computing industry is at a critical transition point was emphasised by Google CEO Sundar Pichai in a prophetic letter to Alphabet shareholders. Pichai’s comments also underlined the priority for his own company:
“Over time, the computer itself — whatever its form factor — will be an intelligent assistant helping you through your day. We will move from mobile first to an AI first world.”
- There is also a sense of startup energy moving into different arenas and causes beyond mobile as is the case of Stemcentrx, a Silicon Valley startup trying to eliminate cancer.
- A more inchoate sense of bewilderment and disillusion with the direction of computing technology is apparent in a devastating short post called Drifting by Tariq Krim. In it he calls time on the lack of ownership, the obsession with algorithmic choice and the inability to slow anything down in the tech world, suggesting these are collectively leading us to calamity as a species. He’d like to see an alternative emerge with our collective help – an “organic” technology movement if you like:
The uncomfortable truth is that I fell out of love with the technology world and that I am not excited by the future anymore. At least the future that is being built today. … In the world of technology, we are taught to build things fast. Sometimes too fast. And we spend so little time studying the consequences of what we build. … We need to give people access to other choices, other life narratives, other tools, and other ideologies. A sort of “organic sustainable slow technology” that fights this commoditization of everything online and offline. I feel it’s time to build this and for that I want to stop drifting and get back to building products that make me love the future again.
- The mental as well as physical benefits of regular running seem likely to receive greater praise in the coming years and would seem to represent the ideal antidote to sitting around focussing on computers and tech all day:
“About three decades of research in neuroscience have identified a robust link between aerobic exercise and subsequent cognitive clarity, and to many in this field the most exciting recent finding in this area is that of neurogenesis.”
- Neil deGrasse Tyson says it’s ‘very likely’ the universe is a simulation:
- London CrossRail is a “$21billion test of virtual modelling”.
- Brace yourself America, it’s really happening. HuffPo consider the potential of confrontation between a President Trump and the US military. This doesn’t need any killer military AI to make it any scarier:
“If you take the man at his word,” said Michael Breen, the president of the Truman National Security Project and a decorated former Army officer, “we have a presidential candidate who seems to have committed himself to triggering what would probably be the greatest crisis in civil-military relations since the American Civil War.”
- With his nomination momentum seemingly restored after a ropey couple of weeks, Trump comes across as almost presidential (at least by his standards) in his attempt to communicate what an “America First” foreign policy would look like.
Society and Culture
- As Lenny Henry reminisced about the time he “sang with Prince and Kate Bush”, Quartz asked why people grieve for celebrities they’ve never met. The answer lies in a confusing mix: part memory of a lost younger self, part keeping up appearances and part sharp reminder of mortality:
You can die in an elevator alone no matter how rich you are and no matter how talented you are
- 27 years on from Hillsborough and the momentous revelation of the truth of the terrible events that day, another reminder of the abject failure of media objectivity. Billy Bragg performing Never Buy the Sun in 2011:
Under the hipsters’ watch, dance music has become tedious and diluted. A monstrous cabal of overpaid circuit DJs titillating a precious and unimaginative bunch of wimpy pseudo-hedonists at a carefully designed ‘safe space’. In broad daylight. If that’s your idea of raving, you can keep it. I’m out.
- Presumably the author would apply the same withering tone to Further Future, a “Burning Man for the 1%”, featuring Alphabet Chairman Eric Schmidt in his party hat. Form an orderly queue now please:
“This is top-league networking and business folks are all here in the guise of having fun. It’s designed around the music, but it’s about the business. A ton of business will get done here. Entrepreneurs will get funded, investors will find their trajectories, service companies will meet and mix it up.”
- Fraser Clark, the Godfather of Rave and founder of Megatripolis would have had a field day there. The description of his funeral sounded like more of a laugh than anything on offer at Further Future:
Mr Clark’s coffin was decorated with coloured ribbons for the “journey that has no beginning and no end”, which saw more than 150 colourfully dressed mourners cross Finchley Road for a ritual burial and “g-RAVE-side ceremony” in Hampstead Cemetery. … Prayers were made aloud for Mr Fraser’s spirit, with one woman repeatedly asking for sexual energy.
- Leicester City’s triumph in the Premier League deserves its own section next time around. Following this weekend’s dramatic finale, one wonders how many other teams will resort to Buddhism in a desperate bid to emulate the Fearless Foxes:
after the 94-year-old’s death in 2012, “the monk’s body was washed, treated by two mummification experts, and sealed inside a large pottery jar in a sitting position,”
- Meanwhile, elsewhere in China, the monks have taken a more overt tech route and built a robot called Xian’er (or “Turn Around”) “to spread the teachings of Buddhism”. He sounds pretty Zen:
Q: “How old are you?”
Xian’er: “Robots don’t have an age.”
Q: “What is Buddhism?”
Xian’er: “Everything is Buddhism.”