Data Wrangling Blog: Slides & Thoughts from Hadoop World NYC

  • bearrito · 2 months ago
    I'm finding the article extremely informative.

    The only modification I have had to make so far has been the following:

    Under the Hive portion

    Change: LOAD DATA INPATH 'wikidump/' OVERWRITE INTO TABLE redirect_table;

    To: LOAD DATA INPATH 'hdfs://<localhost>:8020//user/root/wikidump' OVERWRITE INTO TABLE redirect_table;

    The difference is that the path is given as a full HDFS URI with the port number included; otherwise semantic analysis fails.
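A minimal sketch of the change described in the comment above, for anyone scripting the load step. It only builds the statement text and does not talk to Hive or HDFS. The helper names `qualified_hdfs_uri` and `load_statement` are hypothetical, and the host `localhost` and port `8020` are assumptions standing in for whatever your NameNode actually uses. The sketch puts a single slash before `user`, which is also an assumption.

```python
def qualified_hdfs_uri(host, port, path):
    """Build an hdfs:// URI that names the host and port explicitly.

    Hypothetical helper: host and port are placeholders for your NameNode.
    """
    # Make sure the path is absolute so it joins cleanly after host:port.
    if not path.startswith("/"):
        path = "/" + path
    return f"hdfs://{host}:{port}{path}"


def load_statement(uri, table):
    """Format the Hive LOAD DATA statement quoted in the comment above."""
    return f"LOAD DATA INPATH '{uri}' OVERWRITE INTO TABLE {table};"


if __name__ == "__main__":
    # Assumed values; substitute your own host, port, and HDFS path.
    uri = qualified_hdfs_uri("localhost", 8020, "/user/root/wikidump")
    print(load_statement(uri, "redirect_table"))
```

The printed statement can then be pasted into the Hive CLI in place of the relative-path version.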
  • Chris · 1 month ago
    Looking forward to the video of your talk. Love the blog and have been enjoying looking through the data sets you have listed. How come trending topics stops in August? Is it too expensive to keep it live?
  • pskomoroch · 1 week ago
    Chris: Sorry for the delay in getting back to you; I like your blog as well. I froze the updates in August while I was out of the country and haven't turned them back on yet. I moved out to the West Coast when I got back and have been busy with some new projects, but I'm planning on turning the updates back on after Christmas, possibly with some additional tracking of government trends.

    Cheers,

    -Pete