Applied Data Science Panel: BigData Tools: The Myths and the Reality

The overview article on the Applied Data Science Track in this blog series can be found here.

KDD-2016 – Applied Data Science – Invited Panel

I am excited to be the moderator of the KDD-2016 Applied Data Science Panel: Big Data Tools & Solutions: The Myths and the Reality. The panel is intended to address a fundamental issue: what role do commercial data mining tools play in enabling Data Science?

In a world where most new Data Science uses open source R as its primary tool library, and where analysis efforts in large enterprises rely heavily on established classical tools like SAS and MathWorks/MATLAB, a question poses itself: Have there been recent changes or successes enabled by new tools? Are tools essential? Do we have the right tools for the modern #BigData world?

I plan to explore these issues with our panelists. However, I would like to hear from YOU on what questions to ask! I will share the list of questions I have in mind below, but if you have a burning question you’d like to pose to these tools providers, please send it to me – you may message me on LinkedIn, comment on this blog, send me a tweet @usamaf, or direct message me if you prefer. I can attribute the question to you if I use it, or if you prefer, I can maintain your anonymity.

The panelists come from a mix of companies representing classic established enterprise tools (SAS and MathWorks) and “new generation” tools addressing machine learning (Salford Systems) and BigData (RapidMiner).  The panelists are:

My Proposed Tough Questions to the Panel

I came up with an original list of questions, and also asked the panelists for suggestions. I list these questions here, but would love to hear from YOU on proposed questions, or topics you would like to see this panel address. Please place your questions in the comments below, or send me a direct message or Tweet at @usamaf with a suggestion.

  1. BigData has been undergoing a lot of hype for the last 7 years or more, with lots of startup companies funded and many enterprises talking about how it will change the world. With the exception of the big Internet players (Google, Amazon, Netflix, FB, etc…), has BigData really been a major factor driving change or disruption in traditional enterprises?  What are your views/experiences on this?
  2. When it comes to BigData tools, we see a lot of infrastructure players, mainly storage and BigData O/S (e.g. Hadoop) players. Have the DATA ANALYSIS companies really thought about BigData or are we just trying to push the “oldies” into the new world of BigData?  I have not seen any major development of analysis tools specifically for BigData.
  3. How much of the DATA ANALYSIS and Machine Learning in your worlds is still happening pretty much on structured data only?  It seems like the BigData is reduced to structured first and then analyzed. Is DATA ANALYSIS directly over BigData a myth or a reality?
  4. Do you have examples where BigData analysis has meant more than just VOLUME: how about Variety or Velocity?  Do we address these directly?  Are we really working on truly higher dimensionality data these days?  Give examples.
  5. What do you or your companies think about DEEP LEARNING?  Is it hype?  Is it reality?  Is it a game changer?
  6. What does a company need to do effective Algorithmic Data Analysis over BigData?  Have we gotten to production-class data like we did in the traditional structured data world?
  7. Whatever happened to model management and maintenance? What happens to models over time? Who is best at managing them?
  8. The DATA LAKE:  a really useful sustainable source of information and learning, or a TOXIC SWAMP waiting to happen after the first few data loads?  What is missing in Data Lake implementations?  Why are they so messy?
  9.  [via Richard Rovner] Data analytics is especially valuable when it drives real-time decision making, in which “real-time” matches the velocity of the BigData.  Can people do this today?  Are you seeing real examples of deployed analytics in real-time systems?
  10. [via Richard Rovner]  There is far more machine-generated data than human-generated data and the IOT will only accelerate the trend.  How are you seeing this being addressed in BigData applications today?
  11. [via Ingo Mierswa] What is holding us back?  Why is not every decision yet made automatically or augmented by machine learning?  Is the complexity of the field or the tools an issue here?
  12. [via Ingo Mierswa] How do we deal with ethical concerns, and how can tools help to overcome them?
  13. [via Ingo Mierswa] How did open source change the tool landscape for big data analytics?  Is it actually important at all, or is it just another, more modern delivery/business model?
  14. [via Ingo Mierswa] How can tools support the complete analytics lifecycle from prototyping to getting into production use?  Is this even realistic?


Please Send in your questions!

Have a better suggestion for a question?  Please send it in: place your questions in the comments below, or send me a direct message or Tweet at @usamaf with a suggestion.  See you all on Tuesday, August 16, 2016 at 2:45PM in the Yosemite room at the Hilton San Francisco…

The introductory article in this blog series can be found here.

Read the next Blog in the series by clicking HERE
Read about the Invited Panel: BigData Needs Big Dreamers: Lessons from BigData Investors
