<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Data Wrangling Blog - Latest Comments in Conversation with Eric Siegel on Predictive Analytics World </title><link>http://datawranglingblog.disqus.com/</link><description></description><atom:link href="https://datawranglingblog.disqus.com/conversation_with_eric_siegel_on_predictive_analytics_world/latest.rss" rel="self"></atom:link><language>en</language><lastBuildDate>Thu, 29 Jan 2009 20:51:31 -0000</lastBuildDate><item><title>Re: Conversation with Eric Siegel on Predictive Analytics World </title><link>http://www.datawrangling.com/conversation-with-eric-siegel-on-predictive-analytics-world#comment-11078491</link><description>&lt;p&gt;Mike, good to hear from you - I'll check out that meetup.  I was looking for some more details around Hadoop as well.  I get the feeling that Hadoop and MapReduce haven't yet penetrated too deeply into the traditional SAS/SPLUS/Matlab predictive modeling world.  I've heard some anecdotal evidence from folks at Amazon and other places about large companies outside the web/academic sectors using Hadoop and EC2 for processing big data.  Also, outside of the web world, I think many companies are reluctant to publicize their secret sauce.  Plenty of industries have been using MPI on big data for a while.   At the same time, I'm often contacted by people who think they have a large data problem, but really don't ("large" sometimes turns out to be slightly too big to fit into Excel or Matlab).&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;What I was trying to get at with the Norvig and Netflix references is that sometimes it turns out that downsampling and running on your desktop machine isn't enough - you will actually get different (better) answers with larger datasets.  
My background is in physics, so my first urge is often to simplify a problem down to the point where I can do some quick calculations on my laptop and come up with an answer.  In the world of big messy data, that approach doesn't always work.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;Your comment about "companies for whom data is an integral part of their business" also got me thinking.  I wouldn't exclude traditional firms like automakers and retailers.  There is a sea of data out there now waiting to be tapped by anyone who needs to make better decisions faster, even if they don't produce all that data themselves.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Thu, 29 Jan 2009 20:51:31 -0000</pubDate></item><item><title>Re: Conversation with Eric Siegel on Predictive Analytics World </title><link>http://www.datawrangling.com/conversation-with-eric-siegel-on-predictive-analytics-world#comment-11078490</link><description>&lt;p&gt;Pete - Great post and thanks for the shout-out regarding our R Users' Meetup at PAW!&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;I was surprised that Eric doesn't yet see Hadoop and other Big Data tools playing a big role in the business analytics space.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;I think this is dependent on the data being analyzed:  customer and sales data is high value, but (relatively) small scale for all but the largest of firms.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;In contrast, firms for whom data is an integral part of their business -- Netflix, financial firms, life sciences firms -- have much larger data sets, and need bigger tools.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mike Driscoll</dc:creator><pubDate>Thu, 29 Jan 2009 20:20:19 -0000</pubDate></item></channel></rss>