The #1 Barrier to Analytics Is Organizational Lack of Understanding

In 2010, a joint MIT and IBM study found that of the top 10 barriers keeping companies from using analytics, the top 3 were organizational. They were not related to access to data or lack of funding.

[Figure: MIT IBM analytic barriers]

In the years since, it has become less, rather than more, likely that ‘access to data’ has broken into the top three.

What does this mean? Consider the drivers behind the #1 barrier to the use of analytics: ‘Knowing how to use analytics to improve the business’. This wording assumes the end users (the managers and analysts) know how to improve the business. So it could be that the managers and analysts understood the analytics completely and knew how to improve the business, but not how to use the analytics to improve the business. Or it could be that the end users did not understand the analytics. Let’s take each of these in turn.

Some analytic users are empowered to improve the business; others are not. Those who are not are assigned to analytics by management on the assumption that they can generate insights or analyses to be consumed by the people who are empowered to improve the business. In that case, if one treats the managers who can improve the business as end users ‘once removed’ from the analytics, then, for the purpose of this discussion, one can simply ignore the human layers of intermediation between the analytics and those who can improve the business.

As analytic systems are built and delivered, there is a ‘breaking in’ phase in which those who use analytic output drop, or stop investing in, analytics whose output has no relevance to improving the business. There is a survival-of-the-fittest pressure on the use of capital, different in each business, that will lead at least some, if not all, of the investment in analytics (from hardware to end users’ time) to go away. Typically, the analysis that goes away last, and thus ‘survives’ as the fittest, is management reporting, because it is a ‘requirement of accountability and measurement’. The next to survive are analytics that improve the performance being measured.

This means that, over time, evolutionary pressure leaves the analytics other than management reporting (and some would say management reporting should not even be lumped in with analytics) as those that can improve the business. Analytics that cannot be translated into business improvement die.

So if understanding how to use the analytics to improve the business is a barrier to their use, then the end user either a) is new to the business or b) does not understand the analytics. It is implausible that there is so much new hiring or internal reorganization that the newness of analytics consumers explains why 38% of firms cite this as their #1 barrier. This means that a large share, if not a majority, of the barrier lies in understanding the analytics themselves.

There are many signals that analytic output is getting more sophisticated while users are getting less sophisticated. American math and science education is producing worse outcomes over time. One can debate whether this is due to the quality of teaching, lack of interest on the part of students, or other factors. There is a widening gap between the number of science, technology, engineering and math (STEM) graduates that business requires and the number coming out of school. This gap was the topic of a U.S. Presidential paper commissioned by President Obama’s administration.

Meanwhile, the phenomenon of ‘Big Data’ has emerged. More data means more variables to analyze, and more math required to distill the results into meaningful output. The most successful recent analytic technology firms (Tableau comes to mind) have built their value proposition around making analytics easier to understand. A new genre of ‘data visualization’ and ‘infographics’, absent only a few years ago, has penetrated business intelligence and Big Data conferences.

Another way to think about this change is in terms of IBM Watson. There are too few Jeopardy champions, Jeopardy runners-up, and even Jeopardy applicants for businesses to hire to understand and translate the analytics. The new model is to buy a Watson computer, make Apple’s Siri smarter, or do something different, while investing in tools like Tableau and in visualization to help today’s analysts and managers understand the analytics. Ultimately, trying to turn business people who don’t want to hurt their heads with math into math-savvy analysts is not going to work. We are spoiled by Google and instant results. We no longer hire translators; we use ‘instant translation’. The analyst’s role of translating results will evolve into ‘instant translation’ using compute power.

The data will not stop coming. It will not stop getting wider, with more and more variables to compare to the business result. The capabilities of technology will not get worse. Compute cycles will not get more expensive or less of a commodity. Innovation will not stop. And more and more business students emerging from school will find ways to add value without learning statistics. As a result, analysis tomorrow will be bound more by organizational factors than by data or financial ones.

Predicting the Future of the Business Intelligence User Experience, or A Different Heuristic for BI

Defining the scope of the trend prediction
Little has been written about macro trends in the user experience (UX) of business intelligence (BI) reporting over the last twenty-five years. If we take a decades-long view, an interesting picture emerges. If this view is combined with trends in social media, gamification, and leading web apps, one may be able to predict what BI UX will look like twenty-five years from now.

Please note this view is limited in two ways:

-   Structured data only. Unstructured data and search involve different paradigms for how a BI user engages with the analytics.

-   A focus on user engagement. The view aggregates all types of users and most types of structured analytics.

More scope: Shorter-term UI vs. Longer-term UX

Let’s define user interface (UI) as how a BI end user leverages features and functions on a mobile or PC device to view, modify and create structured analytics. Let’s define user experience (UX) as an aggregate concept covering how BI end users think, talk, share, sense, respond, and use the UI, the application as a whole, and other related applications.

User interface (UI) changes are important. They may matter more than total UX to the end user’s day-to-day experience, especially the feature/function parts of the UI, which give users capabilities to do things. However, the quarter-to-quarter and year-to-year feature/function battle between BI vendors* tends to co-opt UI progression. Topics like drill down, drill across, maps, and inclusion of statistical functions in a primarily SQL environment are of low significance in a long-horizon look at UX trends. (In fact, it’s surprising how long many feature challenges have persisted in BI tools. One could argue that in 1995 and in 2012 alike there is still no low-impact, easy way to see maps, text, images and other ‘variety-centric’ types of Big Data in traditional and even new BI tools. Put another way, the UX of reporting and analyzing with these so-called ‘supported features’ is roughly the same now as it was ten years ago. In two words: not optimal.) This macro view of BI UX trends for the most part ignores feature/function.

Even more scope: Why ignoring Big Data in BI UX trending is right

To predict how BI UX will change over the next twenty-five years, one must ignore feature/function and the energy around Big Data. Not because Big Data is hype or a passing trend (it’s not), but because extending BI tools (at least the BI tools that account for most BI usage) to Big Data will quickly devolve, or already has devolved, into the existing struggles to accommodate data of this size and speed. For this trend, we’re looking at structured data only, so the Big Data axis of ‘variety’ can be ignored. End users tried to use on-premise, IT-approved BI tools to accommodate Big Data for years before Big Data reached the tipping point. So accommodating it today as a high-priority feature because of market hype is really the same discussion as accommodating it years ago as a nice-to-have, just with more gusto. Today’s tools are more capable of meeting users’ needs where the data queried runs to hundreds of terabytes. Thus:

-  We ignore the axis of variety, as we’re interested in structured data. In our observation, 80% or more of analytics in today’s large and medium businesses use structured data.

-  We ignore the axis of volume. Most BI UIs can stay the same and handle larger data behind the scenes.

-   Velocity has been the most interesting driver of UI changes (aside from the creativity in self-service BI), since most structured BI has required data loads no more often than once a day. With data ‘in motion’ or ‘on the wire’, previous UIs don’t work. However, the macro BI UX trends described here seem to accommodate structured, in-motion analytics nicely.

If we can step back to a truly high-level view, we can assume we’ll absorb evolution, and perhaps revolution, in velocity, variety, volume and other axes that we can’t imagine today.

The 200,000 foot view: From Report to User, Individual to Crowd

Early BI was focused on the concept of ‘the report’. We’ve already seen an evolution along the collaboration axis of line-of-business/enterprise analytics. In the 1990s, the bulk of analytic work was between a single end user and a report. It was only in the late 1990s that tools began to let team members see one another’s reports. BI vendors were experimenting with ‘groups’ and subscriptions, and were not close to enabling multiple analysts to work on the same report. The center of the universe was the ‘report’, built by a single user. In those days, the way to share a report with others was to print it out and hand it to them. Really!

I recall our analytics business in those days had a huge amount of intellectual property in data collection, cleansing, preparation, visualization and more. And we happened to sell software that included pre-canned reports. It became frustrating that the question users asked most often was ‘Show me your reports’, ‘What reports do you have?’: report, report, report. We could have eschewed work on infrastructure, support, reliability and so much more, built the largest number of canned reports, and sold more software. It was as if we’d built and offered the best kitchen, utensils, food inventory and food delivery, and the prospect wanted to see the menu of left-overs in the fridge ready for reheating. As a teaser for where we are going: they never asked who else was using the kitchen or who might be making the left-overs, and certainly not about pizza delivery.
From 2000 to 2010 the industry saw a very positive change to this approach. The central ‘object’ and focus was still the report. However, the migration followed this path:

  • Report viewable by the creator
  • Report viewable by the company (this step actually allowed for IT to be used more often than before as ‘report creators’)
  • Report viewable to team members

Report as ‘golden calf’

Then came a disjunctive transition to the report object as a central viewable site or target, almost as if the report itself were a location, a phenomenon. This went along with the advent of Tableau Public (and there may not be many analogs to Tableau Public) and led to our current phenomenon of infographics. Reports as a water cooler. Reports as central, statement-making attractions that are easily locatable, sharable, viewable and comment-able, and that may push the ‘crowd’ toward a singular or common idea or action. Our common technology advances in Web 2.0, social media, and general web UX have led to transparency, communication, and the ability to achieve something close to ‘worship’ of the single best report. This is ‘report as golden calf’**. While there is a huge focus on infographics, the irony is that they are of little use for the bulk of analytics work. They cannot be pasted into a PowerPoint slide (too big). They have evolved to require a graphic designer (too expensive). They require research (too time-consuming). They are based on external research, web data or stats (not BI analytics). So perhaps infographics are out of scope. However, they could also be viewed as the unstable pinnacle, and final stage, of our obsession with ‘the report’. The report will never go away. It is amazingly valuable; in some sense, at least with today’s technology, it is perhaps the only way to derive most insight. However, just as nothing looks the same as it did twenty-five years ago, some day we will look back and say “That was analytics in the way-back past. Archaic! How did anyone derive value from that?”

Symbiotics of Report as Golden Calf

At the risk of driving quants, data scientists, and other Cartesian, logical readers crazy, there is a host of right-brain adjectives and phenomena that signify and represent the evolution to report as golden calf. These have been consistently gaining traction and attention! Perhaps best described as milestones, flags, or tailwinds, they are symbols of the pedestal we’ve built for bringing data into single, viewable insight-transmission devices. These include (ideally presented in a tag cloud!):

  • crowdsourcing
  • color graphics, infographics
  • multiple reports, widgets in one report
  • visualization
  • sharing
  • data marketplaces
  • merging publicly available data with private data
  • analytics on publicly available data
  • cloud security being more conducive to group analytics on public data than private data
  • Netflix Prize
  • reports as Flash objects
  • reports as videos showing time series progression and data layering onto the chart: the report is a ‘movie’, and movies in our culture are watched by many, a social event as much as information transfer

What happens next?

The next phase places the ‘user’ at the center of the process. No longer will the question be: what reports do you have? The question will be: what users do you have? Who is changing, adding, commenting, analyzing and sharing? Who is speaking? What voices add to the report? Who is improving the report?

An interesting place to look outside of BI is the ‘gamification’ of applications that are not games. In gamified applications, the end user is often ‘placed’ at the center of play. Further, in our culture in 2012, the center of our digital engagement is ‘me’: the profile on Facebook, the profile on LinkedIn, always the profile, plus a network of engagement with friends, brands, colleagues, and content. This next wave will focus on the power of engagement, drawing many more users into BI, just as Facebook’s simplicity and ‘Me’ model have drawn many more citizens into digital communication than we’d ever thought possible. The stream of data and analytics will be network-centric. Imagine a Facebook where, instead of friends, we have business analysts, data scientists or insightful people, and the things they say in the stream are ‘reports’. The result would be more communication about the data before the report, the context, the insight, the drill-downs, the next logical report: all the things that actually make a structured BI report valuable.

The Facebook-ification of BI

On Facebook, it is super easy to start and super easy to engage, even for the ‘least capable’ computer or mobile users. As the components needed to support a central ‘Me’ rather than a central ‘Report’ are solidified, it will be super easy to start and engage with BI too. Start and engage will look like this:

  • create a profile
  • be incentivized through rewards, fun and other draws to participate in others’ analytics and insights via comments, derivative analytics, derivative visualizations (this step requires no data of one’s own)
  • select from public data for rudimentary analysis
  • upload one’s own data
  • do analysis
  • share reports
  • present insights
  • comment on others’ reports
  • comment on data
  • clean other people’s data
  • make reports on their data
  • and so on…

Unlike previous BI, each step of this engagement process is its own reward. Some users will stop at an early step and find enough value to stay engaged. Just as a Facebook user with limited computer skills may simply be on Facebook, never posting but just reading family updates, a simple user may just read analytic updates. However, with gamification components such as leaderboards (adjusted for relevance, friends, and motivation), points, avatars, mastery, badges, and more, the ‘Me’ BI tools will slowly start to drive users to
a) understand and trust the ‘mayors’ in their own organizations and spheres, as they do now to some degree, but at a much wider scale and effect
b) move up the path to BI mastery, increasing the quality and frequency of insights and action
c) find more relevant peers to connect with related to similar datasets, action potentials, vertical industries and more
d) see increased business performance from the increased action
e) reach cleaner data, as there is more reward for cleaning both one’s own and others’ data
f) enjoy analytics

The move to focus on the host of ‘Me’s all around the world driving and doing analytics is a key step. Without this step, there is relatively little shared knowledge. When we suddenly have a vast, Twitter-like stream of communication about insights, insights will be so prevalent that the hard part will be filtering and cherry-picking among them, rather than finding them like needles in a haystack.

Post-‘Me’ BI UX

There is certainly a chance that the ‘Insight’ phase could take the central role instead of ‘Me’. In some ways, if insight took the central role, it would add more value for the universe of users. But it is too soon for that to happen. Without a solid underpinning of the ‘who’, there is no platform for insight to be understood, validated, and put into context. Unfortunately, our society of BI users will likely have to progress and learn through at least a decade of ‘Me’ before insight and action can come to the fore.

When the ‘Insight’ phase arrives, insights will be single sentences of truth. An insight should be able to be summarized and stated in a single sentence; if it cannot be, it is not (yet) an insight. It will be transmitted to the user via a Siri-like voice, a single sentence of text, or some other method we’ve yet to see. A data set should show just as many interesting insights as a stroll through a beautiful park, a walk through a museum, or a business presentation. In every example there are perhaps fifteen to forty specific points, exhibits, sights or objects, and the typical person will walk away having experienced joy, education or wonder at all of them in the moment, but will remember only one over the long term.

Post-‘Insight’ BI UX

No one can say just what analytics will look like in 2030 or 2040. There will be reports. There will be analysts. And there will be insights. All that said, it is very likely that the gravitational pull of the data will absorb insights and actions into its very source. Data will not exist apart from action. The very systems producing the data will be able to absorb, analyze and act on it. The question asked at that time of every part of our lives may be ‘How involved do humans really have to be in this?’

How will it appear 30 years out

To the BI tool user fifty years out, the progression should look something like this

  • Data (1975-1995)
  • Reports (object or data focus) (1990-2015)
  • Users or ‘Me’ (profile or community focus) (2015-2025)
  • Insight  (2025-2030)
  • Action/Embedded…the end of analytics (2030-   )

As Moore’s law and other non-linear laws of compute progress, their first or second derivatives should apply to the BI progression, making each phase shorter than the one before. Further, during each phase, all of the objects of the previous phases continue to grow and evolve. One can imagine ‘Big Data’ as a transformative evolution within the earlier phase’s object, ‘Data’; likewise, during the ‘Me’ phase of socializing BI, there should be a transformative evolution of what a report is. Maybe it will be like a magical garment: when shared, it will adjust its size, shape and color to the business question, the data set, and the capabilities of the user who receives it.

* BI tools include applications for large and SMB enterprises from MicroStrategy, SAP BusinessObjects, IBM Cognos, Tableau, Birst, Domo, GoodData, QlikView and similar firms.

Marketplaces a Non-Starter for Product Data Science Startups?

Crowdsourcing, ‘app stores’, innovation competitions, and the rise of the open source statistical language ‘R’ are all small but significant parts of today’s landscape in data science. In the last five years, a number of new phenomena have sprung up on the data science scene in a way that cannot be ignored. Many observers focus on their novelty, their permanence, and the actual impact of their precision and success. Kaggle is a ‘poster child’ for many of these movements, for good reason. Kaggle is a marketplace where data scientists vie to win paid contests for the best solution to an analytic problem. But a question remains that is ignored by many in the press, the data science community, institutions, data vendors, and other parties: would a marketplace be a viable way for a startup to monetize ‘platform intellectual property’ (IP)?

To set the stage, imagine you have solved a uniquely hard problem in data science. Immediately the question arises: what share of all data science problems does your solution apply to? On one end of the spectrum is a low share: you have designed a specific test or algorithm that applies to one set of data for one type of user answering one question. This is hardly a platform. On the other end of the spectrum, imagine you have designed a statistical method that covers many data types, industries and questions. Along this spectrum there are a number of firms that have implemented known statistical methods, usually just one method, or perhaps a handful.

With the advent of cloud, these firms can implement their IP ‘on top of’ a ‘big data’ stack at relatively low cost. If their IP can apply to multiple data sets and industries, it is a plausible foundation for an independent software vendor (ISV) or ‘app developer’. This ISV or app developer could bring to market, more cheaply, a solution leveraging the platform company’s data science innovation. And what if that platform provider decided to build a marketplace: aggregate ISVs, provide data hosting and some cloud services, and handle billing and other operational functions? Is this a valid ‘route to market’ for a platform provider?

Will app developers want to build on your platform? This is a key criterion for the data science innovator wondering whether a marketplace is a viable way to monetize their IP. The proposed platform offers an undeniable advantage to the right kind of app developer. Using it means the developer can focus on the data set, on applying the data science, on optimizing the solution, and on sales, marketing and delivery. However, a number of drawbacks may constrain the platform provider’s attempt to attract a critical mass of app developers.

  1. The app developer will be constrained by any limitations in the data science methods or offerings of the platform. If the platform provider offers a single statistical test or method that can answer only 2%, 5% or even 10% of the total business questions in the prospect base, then the economic upside is limited. This may be a non-issue. For example, if an app developer is an expert in Monte Carlo simulation, and Monte Carlo is the state of the art in, say, measuring portfolio risk, then using a marketplace provided by a Monte Carlo platform makes sense. But if the app developer wants to offer clients multiple approaches to measuring risk, only one of which is Monte Carlo, then the Monte Carlo platform provider is likely not the best option.
  2. The app developer is constrained to price similarly to other app developers. An app developer does not want to charge a customer $20,000 a month if all the other apps on the data science marketplace charge $50 to $250 per month. The app will be perceived as expensive or an outlier, and the price may even be taken for a mistake. Yet as customers’ data sizes or processing needs rise drastically, pricing an app near the median marketplace price becomes impossible.
  3. The app developer will be in a proprietary model, committed to the vendor and facing costs to leave it (vendor lock-in). Lock-in is often the right decision if all other factors align. Even though ‘lock-in’ has become a dirty word because of the leverage it gives the vendor, it abounds because of the large cost savings it provides when implemented as designed. That said, in a domain that changes as rapidly as data science and big data, and with the prevalence of open source approaches, it is hard to imagine app developers always being willing to bear the cost of lock-in.
  4. Some types of buyers will not buy on a marketplace. In fact, many corporations are not familiar or comfortable with using a marketplace for their needs. They expect to transact with a single software provider under a one-off contract in which the corporation can dictate standards of confidentiality, warranty, non-infringement, and other business terms. A marketplace puts a layer of intermediation between app developers and customers that hides information about who the developers are and puts distance between the corporation and the provider. While there are ways to ease some of these issues, they are so numerous and powerful that they can never be completely resolved. Corporations would rather pay more to stay within corporate standards and have visibility into, and accountability from, the app developer. A marketplace is great for an app developer who does not want to purchase commercial insurance, does not want to incorporate, and does not want to do many of the things a corporate buyer wants from them. And while multiple customer ‘ratings’ can act as a surrogate for a warranty in some cases, corporate customers will not want to share reviews of what they are doing, as doing so provides competitive intelligence to competitors.
  5. Some buyers will not want to use the cloud. Even in a one-to-one relationship between the app vendor and the corporate client, the needs of an on-premise client are higher. More ‘touches’ are expected and, unfortunately, needed. A marketplace-only approach takes these buyers out of the app developer’s addressable market.
  6. The app developer relies on the platform provider’s growth to bring eyeballs and prospects, and on the platform provider surviving. Evaluating the marketplace itself immediately raises questions like: How much is the marketplace investing in advertising? How many other app developers or ISVs will commit to the marketplace, each bringing eyeballs from their own marketing? Will the marketplace survive and thrive?

These factors will temper any utopian view of how easily a large marketplace of app developers can be built on a data science platform. This in turn will affect the platform provider’s calculations and predictions of achievable volume and sales, and the costs it can pass on to app developers.

Another challenge is that marketplaces today often act as intermediaries that bring customers and providers together, but do not actually deliver the service itself. The Netflix Prize contest did deliver a solution; however, Netflix never implemented it. Once an approach is decided on, the requirements, costs, and other investments it demands may differ widely from one end customer to the next. This is fine if the marketplace is a ‘matchmaker’ but not if the marketplace is also the delivery mechanism.

Kaggle, for example, is a great way to find the best approach; it is not a great way to deliver it. One article describes or implies that many members of the Kaggle ‘marketplace’ are not using Hadoop or other scalable methods to design their approaches, but ‘in the living room’ approaches instead:

“The added irony is that Kaggle’s data scientists don’t even use Hadoop. Hadoop is an open source platform that runs across clusters of thousands of servers, but for the most part, Kaggle’s scientists solve their problems using a single machine. Momchil Georgiev uses his home desktop, with help from the SQL Server database and R, the open source data analytics language. Jeremy Howard (Kaggle Chief Scientist) works much the same way.”

Altogether, a marketplace may be one way for a data science IP provider to go to market along with other approaches.  It is unlikely that a marketplace-only approach for the designer of a scalable data science method can succeed financially.

Data Scientist in a Box or Go Hungry?

A number of new firms have taken on the ‘elephant in the room’ (no, not Hadoop): the US is going to be up to 1 million data scientists short of needed supply over the next decade. They include those featured in a recent article called ‘Want to ditch your data scientist?’ (of course, this title assumes you have, or could afford, a data scientist to begin with…).

A friend and I were calling this concept ‘data scientist in a box’ a few years back. I suspect we weren’t the first. Anyone in the business of providing large and medium-sized businesses with Big Data solutions could already see that the missing link was the decreasing availability of people with math skills. We saw that the existing inventory of statisticians working in SAS statistical software might be square pegs in round holes when it came to unleashing open-source tools on data at a scale large enough to break old-school, client-server stats apps.

Some are negative about data scientist in a box; however, this ignores the reality that there aren’t enough data scientists to go around. I would call this the Marie Antoinette syndrome. If data scientists were food, people who say you shouldn’t try a data-scientist-in-a-box approach are saying ‘Let them eat cake’ to those who lack the funds, glamour, or size of company to attract a data scientist. If you can afford a data scientist, great. If you can’t, you will need and want these data-scientist-in-a-box solutions. These firms give business people the choice between doing some Big Data science and doing nothing, which is what happens today.

One organization, DataKind, recognizing that ‘data science skills are so in demand’, provides free data science volunteers to work pro bono on projects focused on social good. This is very much like attorneys working pro bono cases for clients who cannot afford fee-for-service. In pro bono legal work, there is no hope the clients will ever get a law degree or ‘teach themselves to fish’; there is no chance they will ever be able to afford the level of service that others do. It is clear that unless they receive highly subsidized or free legal services, they will receive no advocacy. We need to realize that organizations that cannot afford data scientists will end up with no data science. There are no other avenues (unless their data is aligned with the social good and they qualify for the help DataKind offers). DataKind cannot scale to even a tiny fraction of the socially beneficial Big Data projects, much less commercial ones.

Data science is critical.  It is firmly on the path our community is following as we add insights to all the processes surrounding our lives.  It seems likely that when there is a real need for Big Data science and not enough individual practitioners, someone will figure out how to package up the science to as large a degree as possible.  When the choice is packaging it or ‘going without’, clearly something, once mature, is better than nothing.  Most in the US would agree we have a shortage of nurses in healthcare.  This is no longer controversial.  No one calls it ‘a shortage of good nurses’; the shortage is big enough that we have a shortage of any nurses.  Likewise, we already had a shortage of data scientists before Big Data, before R, before predictive analytics came along.

The data-scientist-in-a-box firms are already tackling, in the substrate of a software user interface, the problem of how to explain scientific approaches to driving insights out of data.  Firms like ClearStory, BigML, and DataHero are new and are doing this today.  The ‘99 percenter’ user community does not understand the science.  But they believe in the science: they have seen it work at Google, LinkedIn, Facebook, and Netflix, and they use it daily.  With a hungry, bought-in user community, it is a no-brainer for these firms to tackle this problem.  In the past, the firms that tackled it were the business intelligence vendors.  However, they had a natural stopping point at the limits of SQL, a declarative language.  In the world of NoSQL or Hadoop approaches to Big Data, we have broken away from those limits and work in universally taught, general-purpose programming languages.  That means it is actually easier to embed data science in Big Data applications, and there is no natural ‘fence’ corralling the analysis to the ‘easy science’ business people are more likely to understand.

So the floodgates of potential math to drive insights are open, at a time when scale is increasing fast, or has already increased, and at a time when math skills are dropping in the user community.  Will we as a business community keep applications ‘dumb’, pitched to the lowest common denominator of end users?  It won’t happen.  That is not our tradition.  That is not our history.  That is not our way.  Our way is to hide the complexity and give the end user the answer.  (One might argue a better way is to ‘teach a man to fish’, training business users in the math needed to understand deviations, outliers, regressions, and even more complex methods.  But in a society spoiled by the speed of getting answers from Google, and moving ever faster, let’s be real: our business analysts and managers are not going to sit through the process of learning it.)

The data-scientist-in-a-box firms are paving the way to making data science accessible to the masses.  So if you can afford a personal chef, good for you.  If you can afford a data scientist, good for you.  For everyone else, get ready to someday use a Google-style interface: do a search, and get a result based on science you don’t understand.  I bet users will trust the answer as directionally correct and better than no answer at all.

Cloud-washing, the False Cloud, and the Hodge-podge Migration

Salesforce.com CEO Marc Benioff, admittedly not an unbiased party, called false clouds ‘not democratic, not economical, not efficient’.  It is not clear that enterprises have been driven by how democratic, or even how efficient, an IT solution is.  But economics first and, hopefully, speed second are the priorities in serving line-of-business application needs.  So when a type of cloud is proffered that offers little or none of the amortized experience, scale, infrastructure, elasticity, and investment of a public or private hosted cloud, it is not surprising that cloud vendors rise up.  No one would expect a cloud vendor to let something clearly not cloud use the term.  Hence ‘False Cloud’.

There is actually a term for taking traditional on-premise IT infrastructure, adding some virtualization, and calling it a cloud: ‘cloud-washing’.  The Cloud Credential Council says “Be skeptical of cloud claims” when it defines the term:

“Cloud washing… is the purposeful and sometimes deceptive attempt by a vendor to rebrand an old product or service by associating the buzzword ‘cloud’ with it.”

The most obvious example of cloud washing is when a vendor with a definitively non-cloud, on-premise, do-it-yourself offering calls it a cloud.  It is akin to a weatherman calling a desert tumbleweed a cloud: “…because folks, it gets off the ground, moves overhead with the wind, and has to be, at its roots, moister than the air around it.”  A tumbleweed is not a cloud.  And the practice of in-house IT professionals buying hardware, racking it, buying software, loading it, and provisioning this infrastructure is about as far from cloud as you can get, as GoGrid neatly summed up in a video.

According to the National Institute of Standards and Technology, a cloud is

“a shared pool of computing resources that can be configured, provisioned and released quickly and easily, without the help of a service provider. The cloud is ‘elastic’, which means services, such as processing power or storage, can be scaled up or down very easily depending on user need.”

It will be the normal state of affairs within an enterprise to migrate some existing business applications to the cloud quickly and others slowly.  As such, any snapshot of the application portfolio may look like a hodge-podge of models with no clear defining approach.  That snapshot masks what is likely a structured march away from a DIY approach.  Large percentages of new business applications will be SaaS and cloud; the smallest percentages migrated to the cloud will be the oldest, most operational, least analytic, most transactional, most ‘mainframe’-type applications.  Taken together, it might resemble a bird’s-eye view of an advancing guerrilla army, replete with stragglers and varied uniforms, but moving together toward a common endpoint.

It will take many years for the migration to occur.  That is the sensible reality, and the only one that fits the business cycle of large and medium-sized enterprises.  But when existing vendors of any architecture more modern than mainframes try to rebrand themselves to the new approach in order to protect their investments in current inventory, R&D, people, brand, and customers, it is a sign that a turning point has been reached, one that will brook no reversal.

Execs & employees believe social media improves company culture

Watch for two emerging streams of social media data, both revolving around the workplace!

A recent Deloitte survey called “Core Beliefs and Culture” found that both executives and employees believe the use of social media in the workplace improves company culture.  Considering that many executives may not fully understand the capabilities of social media, the fact that 41% of them said social media improves workplace culture is significant.  What did almost half of the 303 executives surveyed have in mind?

The Deloitte study focused on internal communication among and to employees.

A writer on human resources (HR) topics describes the ease and benefits:

“One of the more effective and increasingly acceptable forms of communication with employees is the use of social media. Social media is something that most employees already use on a daily basis. Communicating with employees in a manner and a format they are accustomed to will make it simpler to get the point across as quickly and efficiently as possible. Twitter, for example, can provide a means for explaining small-scale changes, whereas other social media platforms like blogs might be needed to communicate more complex or lengthy policy changes.”

An equally important area may be the social media communication that employees have with the public at large or with their ‘friend or follower’ networks.

One news publication cited a survey by Right Management showing that people use social networks to assess prospective employers.  Thus it is key for firms not only to factor social networks into brand reputation and customer activities, but also to pay attention to the posts, tweets, and other communication of their employees.  Thirty-one percent of respondents said they use social media to see whether prospective employers value their employees.  Thirty percent use social media to learn about career development opportunities.

Companies whose employees are ‘trashing’ the company may be turning off high-quality prospective recruits, and if this is happening, it’s happening without the companies seeing it.  Twitter hashtags like #lovemyjob and #hatemyjob are popular; they are used hundreds of times a week.  Tags like #lovemyboss and #hatemyboss are somewhat less frequent.  The upshot is that prospective recruits can search the Twitter universe and other social media channels to find data points about employee sentiment.

What’s a company to do?  If the need is strategic, ongoing, and cross-company, and the social media data showing employee sentiment needs to be integrated with other internal company data such as surveys, companies should use Gnip.  Gnip will license you the entire Twitter ‘firehose’ or a smaller sample.  That bulk data can be used for data mining, searching, integrating with other data, and more.  Companies with more point-in-time inquiries, or that want to learn in bits and pieces, can use DataSift.  DataSift provides an easy user interface for performing searches in a pay-by-the-drink model.
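Once a company has the posts in hand, whether licensed in bulk or pulled from a search tool, the hashtag tally described above is simple to compute.  The sketch below is a minimal illustration under stated assumptions: the posts are already available as plain strings (it does not call Gnip or DataSift), the sample posts are hypothetical, and treating the #love tags as positive and the #hate tags as negative is an assumption made for illustration.

```python
import re
from collections import Counter

# Hashtags named in the text; the positive/negative split is an illustrative assumption.
POSITIVE = {"#lovemyjob", "#lovemyboss"}
NEGATIVE = {"#hatemyjob", "#hatemyboss"}
HASHTAG = re.compile(r"#\w+")


def tally_sentiment(posts):
    """Count tracked hashtags across posts, case-insensitively.

    Returns (per-tag counts, total positive mentions, total negative mentions).
    """
    counts = Counter()
    for post in posts:
        for tag in HASHTAG.findall(post.lower()):
            if tag in POSITIVE or tag in NEGATIVE:
                counts[tag] += 1
    # Counter returns 0 for tags that never appeared.
    positive = sum(counts[t] for t in POSITIVE)
    negative = sum(counts[t] for t in NEGATIVE)
    return counts, positive, negative


# Hypothetical sample posts, not real data.
sample = [
    "Great team offsite today #LoveMyJob",
    "Third reorg this year #hatemyjob",
    "Boss covered my shift #lovemyboss #lovemyjob",
]
counts, pos, neg = tally_sentiment(sample)
print(counts, pos, neg)
```

A real program would also need to handle retweets, spam, and sarcasm; a raw count like this is only a first data point about employee sentiment.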

Some want the internal culture building: the virtual water cooler, generating conversations, and communicating with employees.  Others will want employee sentiment monitoring.  Either way (or both), the HR division will need to get on board the social train sooner rather than later.

I Like ClearStory Data

Do I like the new startup ClearStory Data?  Yes.  I like the concept so much that I tried to start a company to do what ClearStory does… in 2008.  While putting together funding, I chose instead to join the hot Big Data company Netezza, which was soon acquired by IBM in a $1.5 billion transaction.  While I was a little surprised to see ClearStory, I am glad someone is coming along and validating the idea.  My original idea was for a cloud-based, SaaS, Big Data company where a user could upload their business data and let an automated data scientist scour it for insights and deliver those to a time-starved, non-mathematical business person.

The idea, and ClearStory’s version of it (which focuses a bit more on publicly available data), is compelling for these reasons:

  1. Recognizes the lack of data scientists.  The President’s Advisory Council on the topic believes that in the next two decades we will be up to one million data scientists short of the supply we need.  One million.  So if your company believes it can invest in a tool that will or may require hiring or contracting a data scientist to translate a business user’s questions into code or a user interface, think again.
  2. Targets line-of-business users, not IT.  Enterprises are starved for IT talent, and IT is, and needs to be, focused on security, PCs, keeping the business running, and keeping basic canned reports flowing.  A firm whose IT staff has time to act as data scientists, report builders, or insight seekers is a rare bird indeed.  The norm is that business people cannot leverage non-existent IT to build them an insights tool.  As such, the only workable model (think salesforce.com) is to build a tool that is usable by the business and requires no IT.
  3. SaaS pricing model possible.  Perpetual license models of enterprise software may make sense when the technology works, is adopted and liked by users, and meets its promises.  This is not always the case.  Today, business people like, and can find, solutions they can opt out of renewing if the tool is not right.  They don’t have to commit forever.  Oh yeah, and if they have to make a large up-front capital purchase in the perpetual model, they have to, for organizational reasons, include IT in the decision, which means the process will take many months and IT may bring in its own preferred tool.  That tool might be easier for IT to support and fit its ideas of what software should do, but it often neglects the business people’s preferences for usability.
  4. Built in the cloud.  Again, nothing on-premise, so no IT.  Anything on-premise, if it becomes successful, attracts the (unwanted?) attention of IT.  If an app is in the cloud and secure, IT often has organizational permission to ignore it.  Some believe modern large cloud applications may be as secure as, or more secure than, on-premise applications.
  5. Leverages cheap storage & processing.  Storage is only going to get cheaper, and so is processing.  The ClearStory approach, and the approach of any ‘self-driven’ insights tool, requires this to be true.  One can only look at every possible combination of data with no a priori hypotheses if processing is cheap.  In fact, the phenomenon of Big Data in general could never have happened without this trend.
  6. Solves the ‘time-starved user’ problem.  Today’s business people have no more math training than ten or twenty years ago; in many cases, less.  They certainly have less time than decades ago.  The notion that a manager is going to sit at a desk for even an hour or two to ‘interrogate’ data using a slicing-and-dicing tool is quaint.  Google has trained us to expect answers in seconds, in packaged, easy-to-act-on formats.  So no, even self-service BI requires that either we hire a mathematician with lots of time or we ourselves hunt for insights for hours.  An approach where the application does the searching and wraps up the needles from the haystack with a bow is the only plausible end state.
  7. Achieves what everyone is going for: insights.  The business person wants to improve their business.  The way this happens is change: making a business change, or deciding not to.  Deciding on no change, or on which change to make for improvement, requires an insight.
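Point 5’s claim that hypothesis-free search depends on cheap processing can be made concrete with a little counting.  The sketch below is illustrative only: it assumes a ‘hypothesis’ is either a pair of fields to compare or any non-empty combination of fields to examine together, and the function name hypothesis_counts is hypothetical, not anything ClearStory has described.

```python
from math import comb


def hypothesis_counts(n_fields: int) -> tuple[int, int]:
    """Count candidate 'hypotheses' for a table with n_fields columns.

    Illustrative assumption: a hypothesis is either a pair of fields to
    compare, or any non-empty subset of fields to examine jointly.
    """
    pairs = comb(n_fields, 2)      # n(n-1)/2 pairwise comparisons
    subsets = 2 ** n_fields - 1    # every non-empty combination of fields
    return pairs, subsets


for n in (10, 20, 40):
    pairs, subsets = hypothesis_counts(n)
    print(f"{n} fields: {pairs} pairs, {subsets:,} field combinations")
```

For 10 fields there are 45 pairs and 1,023 combinations; at 40 fields the pairs grow only to 780, but the combinations pass a trillion (1,099,511,627,775).  Pairwise checks stay manageable, while anything approaching exhaustive search is only thinkable when compute is cheap.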

Taken together, these trends (fewer data scientists; time-starved, non-mathematical business analysts and managers; decreasing storage and processing costs; IT that adds little value to this specific type of application; and available SaaS/cloud models) lead to a compelling idea: a cloud-based Big Data company where a user can upload their business data and let an automated data scientist scour it for insights.  ClearStory Data seems to be taking this concept and putting a big emphasis on cloud-ready public data stores.  Make no mistake, though: more of the value is in the data sitting in Excel files on business users’ PCs, and I predict ClearStory will end up pulling those haystacks up to the cloud very soon.

To learn more about ClearStory Data, check out articles in The Register or ComputerWorld, or the company’s website.