Redefining power distribution using big data

When I first hear of a new open source project that might help me solve a problem, the first thing I do is ask around to see if any of my friends have tested it. Sometimes, however, the early descriptions sound so promising that I just jump right in and try it myself — and in a few cases, I've transitioned immediately (this was certainly the case for Spark).

I recently had a conversation with Erich Nachbar, founder and CTO of Virtual Power Systems, and one of the earliest adopters of Spark. In the early days of Spark, Nachbar was CTO of Quantifind, a startup often cited by the creators of Spark as one of the first “production deployments.” On the latest episode of the O’Reilly Data Show Podcast, we talk about the ease with which Nachbar integrates new open source components into existing infrastructure, his contributions to Mesos, and his new “software-defined power distribution” startup.

Ecosystem of open source big data technologies

When evaluating a new software component, nothing beats testing it against workloads that mimic your own. Nachbar has had the luxury of working in organizations where introducing new components isn’t subject to multiple levels of decision-making. But, as he notes, everything starts with testing things for yourself:

“I have sort of my mini test suite…If it’s a data store, I would just essentially hook it up to something that’s readily available, some feed like a Twitter fire hose, and then just let it be bombarded with data, and by now, it’s my simple benchmark to know what is acceptable and what isn’t for the machine…I think if more people, instead of reading papers and paying people to tell them how good or bad things are, would actually set aside a day and try it, I think they would learn a lot more about the system than just reading about it and theorizing about the system.”

Read more
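The kind of test Nachbar describes is easy to approximate: point the candidate system at a high-volume feed and see how much it can absorb. The sketch below is a minimal Scala example of that idea using Spark Streaming's Twitter connector; the 10-second batch interval, the count-per-batch metric, and the object name are illustrative assumptions of mine, not Nachbar's actual test suite.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// A crude "fire hose" benchmark: consume the Twitter sample stream and
// report how many records each micro-batch manages to process.
object FirehoseBenchmark {
  def main(args: Array[String]): Unit = {
    // Twitter OAuth credentials are read by twitter4j from system properties
    // (twitter4j.oauth.consumerKey, etc.); they are omitted here.
    val conf = new SparkConf().setAppName("firehose-benchmark")
    val ssc  = new StreamingContext(conf, Seconds(10)) // 10-second batches (arbitrary)

    val tweets = TwitterUtils.createStream(ssc, None)  // None => use system-property auth

    // DStream.count() yields one count per batch; a sustained drop suggests
    // the pipeline or store under test is falling behind the feed.
    tweets.count().foreachRDD { (rdd, time) =>
      println(s"[$time] processed ${rdd.first()} tweets in this batch")
    }

    ssc.start()
    ssc.awaitTermination()
  }
}
```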

How to create a Swarm cluster with Docker

Editor’s note: This is an Early Release excerpt from Chapter 7 of Docker Cookbook by Sébastien Goasguen. The recipes in this book will help developers go from zero knowledge to distributed applications packaged and deployed within a couple of chapters. One of the key value propositions of Docker is app portability. The following will show you how to use Docker Machine to create a Swarm cluster across cloud providers.

Problem

You understand how to create a Swarm cluster manually (see Recipe 7.3), but you would like to create one with nodes in multiple public cloud providers while keeping the user experience of the local Docker CLI.

Solution

Use Docker Machine to start Docker hosts in several cloud providers and bootstrap them automatically to create a Swarm cluster.
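As a rough sketch of what this looks like in practice, the commands below use Docker Machine's Swarm flags with token-based discovery; the choice of DigitalOcean as the provider, the machine names, and the assumption that provider credentials are already exported (e.g., DIGITALOCEAN_ACCESS_TOKEN) are mine, not the book's exact recipe.

```
# Generate a cluster token on any existing Docker host
$ TOKEN=$(docker run swarm create)

# Create the Swarm master in one cloud provider
$ docker-machine create -d digitalocean \
    --swarm --swarm-master \
    --swarm-discovery token://$TOKEN \
    swarm-master

# Create a worker node (this one could use a different driver/provider)
$ docker-machine create -d digitalocean \
    --swarm \
    --swarm-discovery token://$TOKEN \
    swarm-node-1

# Point the local Docker CLI at the whole cluster
$ eval "$(docker-machine env --swarm swarm-master)"
$ docker info    # shows both nodes as members of the Swarm
```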

Read more

Turning Ph.D.s into industrial data scientists and data engineers

Editor’s note: The ASI will offer a two-day intensive course, Practical Machine Learning, at Strata + Hadoop World in London in May.

Back when I was considering leaving academia, the popular exit route was financial engineering. Many science and engineering Ph.D.s ended up in big Wall Street banks; I chose to be the lead quant at a small hedge fund. It was a natural path for many of us: financial engineering was topically close to my academic interests, and working with traders meant access to resources and interesting problems.

Today, there are many more options for people with science and engineering doctorates. A few organizations take science and engineering Ph.D.s, and over the course of 8-12 weeks, prepare them to join the ranks of industrial data scientists and data engineers.

I recently sat down with Angie Ma, co-founder and president of ASI, a London startup that runs a carefully structured “finishing school” for science and engineering doctorates. We talked about how Angie and her co-founders (all ex-physicists) arrived at the concept of the ASI, the structure of their training programs, and the data and startup scene in the UK. [Full disclosure: I’m an advisor to the ASI.]

Read more

3 simple reasons why you need to learn Scala

Editor’s note: If you’re a Java developer these days, one fully entrenched in the Java SE or Java EE development environment, you’ve grown accustomed to waiting for new features and updates. Change happens at the speed of dial-up, which is a blessing for legacy code, servers, and software infrastructure that thrive on maintaining profitable grace through clunky predictability. You may have even dabbled with a JVM language, such as Scala or Clojure, thinking you could do more with less code — and you can — but then realized that the barrier to entry looks steep next to the demands of your day-to-day responsibilities. Why learn something new, you’ve thought, when there’s no strong incentive to change?

With Scala Days nearly upon us, the Fort Mason Center in San Francisco will be awash with developers excited to share ideas and explore the latest use cases for this “best of both worlds” language. Scala has come a long way from its humble origins at the École Polytechnique Fédérale de Lausanne, but with the fusion of functional and object-oriented programming continuing to pick up steam across leading-edge enterprises and startups, there’s no better time than right now to stop dabbling with code snippets and begin mastering the basics. Here are three simple reasons why learning Scala will help you grow as a Java developer, as excerpted from Jason Swartz’s new book Learning Scala.

1. Your code will be better

You will be able to start using functional programming techniques to stabilize your applications and reduce issues that arise from unintended side effects. By switching from mutable data structures to immutable data structures and from regular methods to pure functions that have no effect on their environment, your code will be safer, more stable, and much easier to comprehend.
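A minimal sketch of what that shift looks like in Scala (the shopping-cart example and its names are mine, not from Swartz's book):

```scala
// Mutable, side-effecting style: the method quietly rewrites shared state,
// so the result of totalCents depends on everything that has touched the object.
class MutableCart {
  var items: List[Int] = Nil                              // prices in cents
  def addItem(cents: Int): Unit = { items = cents :: items }
  def totalCents: Int = items.sum
}

// Immutable, pure style: every operation returns a new Cart, so each
// step can be reasoned about and tested in isolation.
case class Cart(items: List[Int] = Nil) {
  def addItem(cents: Int): Cart = copy(items = cents :: items)
  def totalCents: Int = items.sum
}

object CartExample extends App {
  val cart = Cart().addItem(999).addItem(2450)
  println(cart.totalCents)                                // 3449
}
```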
Read more

What should replace the project paradigm?

The problems caused by using the project paradigm to deliver software systems are severe. The effect of projects on downstream teams such as release and operations was, for my money, most succinctly articulated in Evan Bottcher’s article “PROJECTS ARE EVIL AND MUST BE DESTROYED”. The end result — complex, heterogeneous production environments that are hard to change or even keep running — is due to the forces Charles Betz identifies in Architecture and Patterns for IT Service Management, Resource Planning, and Governance: Making Shoes for the Cobbler’s Children:

Because it is the best-understood area of IT activity, the project phase is often optimized at the expense of the other process areas, and therefore at the expense of the entire value chain. The challenge of IT project management is that broader value-chain objectives are often deemed “not in scope” for a particular project, and projects are not held accountable for their contributions to overall system entropy.

Furthermore, bundling work up into projects combines low-value features with high-value features in order to deliver the maximum viable product that is the inevitable result of the large-batch death spiral. This occurs when product owners try to stuff as many features as possible into the next release so they don’t have to wait for the following one to get them delivered. As a result, the median cycle time for delivering features is often poorly correlated with their priority — a highly undesirable outcome.

Why do we stick with it? Because our budgeting processes are designed to operate on projects, and project managers and the PMO know how to deliver them.

Since these are clearly poor reasons, what should we do instead?

Read more

From innovation to mass market: Lessons for the IoT

Editor’s note: This is an excerpt by Claire Rowland from our upcoming book Designing Connected Products. This excerpt is included in our curated collection of chapters from the O’Reilly Design library. Download a free copy of the Designing for the Internet of Things ebook here.

In 1962, the sociologist Everett Rogers introduced the idea of the technology lifecycle adoption curve, based on studies in agriculture. Rogers proposed that technologies are adopted in successive phases by different audience groups, following a bell curve. The theory has gained wide traction in the technology industry, and successive thinkers, such as the organizational consultant Geoffrey Moore in his book Crossing the Chasm, have built upon it.

In Rogers’ model, the early market for a product is composed of innovators (or technology enthusiasts) and early adopters. These people are inherently interested in the technology and willing to invest a lot of effort in getting the product to work for them. Innovators, especially, might be willing to accept a product with flaws as long as it represents a significant or interesting new idea.

The next two groups — the early and late majority — represent the mainstream market. Early majority users might take a chance on a new product if they have seen it used successfully by others whom they know personally. Late majority users are skeptical and will adopt a product only after seeing that the majority of other people are already doing so. Both groups are primarily interested in what the product can do for them, unwilling to invest significant time or effort in getting it to work, and intolerant of flaws. Different individuals can be in different groups for different types of product. A consumer could be an early adopter of video game consoles, but a late majority customer for microwave ovens.

Read more

Democratizing biotech research

The convergence of software and hardware and the growing ubiquity of the Internet of Things are affecting industry across the board, and biotech labs are no exception. For this Radar Podcast episode, I chatted with DJ Kleinbaum, co-founder of Emerald Therapeutics, about lab automation, the launch of Emerald Cloud Laboratory, and the problem of reproducibility.

Subscribe to the O’Reilly Radar Podcast

TuneIn, iTunes, SoundCloud, RSS

Read more
