Re: Conversation with Eric Siegel on Predictive Analytics World

Peter Skomoroch — Thu, 29 Jan 2009 20:51:31 -0000

Mike, good to hear from you - I'll check out that meetup. I was looking for some more details around Hadoop as well. I get the feeling that Hadoop and MapReduce haven't yet penetrated too deeply into the traditional SAS/S-PLUS/Matlab predictive modeling world. I've heard some anecdotal evidence from folks at Amazon and other places about large companies outside the web/academic sectors using Hadoop and EC2 for processing big data. Also, outside of the web world, I think many companies are reluctant to publicize their secret sauce. Plenty of industries have been using MPI on big data for a while. At the same time, I'm often contacted by people who think they have a large data problem, but really don't ("large" sometimes turns out to be slightly too big to fit into Excel or Matlab).

What I was trying to get at with the Norvig and Netflix references is that sometimes downsampling and running on your desktop machine isn't enough - you will actually get different (better) answers with larger datasets. My background is in physics, so my first urge is often to simplify a problem down to the point where I can do some quick calculations on my laptop and come up with an answer. In the world of big messy data, that approach doesn't always work.

Your comment about "companies for whom data is an integral part of their business" also got me thinking. I wouldn't exclude traditional firms like automakers and retailers. There is a sea of data out there now waiting to be tapped by anyone who needs to make better decisions faster, even if they don't produce all that data themselves.

Re: Conversation with Eric Siegel on Predictive Analytics World

Mike Driscoll — Thu, 29 Jan 2009 20:20:19 -0000

Pete - Great post and thanks for the shout-out regarding our R Users' Meetup at PAW!

I was surprised that Eric doesn't yet see Hadoop and other Big Data tools playing a big role in the business analytics space.

I think this depends on the data being analyzed: customer and sales data is high value, but (relatively) small scale for all but the largest firms.

In contrast, firms for whom data is an integral part of their business -- Netflix, financial firms, life sciences firms -- have much larger data sets, and need bigger tools.

Data Wrangling Blog - Latest Comments in Conversation with Eric Siegel on Predictive Analytics World
