<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Data Wrangling Blog - Latest Comments in Wikipedia Page Traffic Statistics Dataset</title><link>http://datawranglingblog.disqus.com/</link><description></description><atom:link href="https://datawranglingblog.disqus.com/wikipedia_page_traffic_statistics_dataset/latest.rss" rel="self"></atom:link><language>en</language><lastBuildDate>Tue, 27 Apr 2010 05:26:16 -0000</lastBuildDate><item><title>Re: Wikipedia Page Traffic Statistics Dataset</title><link>http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset#comment-46894856</link><description>&lt;p&gt;Where can I get the number of views for each article? I want to get the views for many articles. Is there any such data set available?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Vinod</dc:creator><pubDate>Tue, 27 Apr 2010 05:26:16 -0000</pubDate></item><item><title>Re: Wikipedia Page Traffic Statistics Dataset</title><link>http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset#comment-38711580</link><description>&lt;p&gt;Good job!&lt;/p&gt;&lt;p&gt;But how did you get the stats from Wikipedia?&lt;/p&gt;&lt;p&gt;greetz, tobe&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">tobe</dc:creator><pubDate>Tue, 09 Mar 2010 13:55:25 -0000</pubDate></item><item><title>Re: Wikipedia Page Traffic Statistics Dataset</title><link>http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset#comment-27101607</link><description>&lt;p&gt;The easiest way would be to access the data from a Linux instance like Ubuntu...
With some legwork, you should be able to use Samba somehow to access the volume from Windows - I try to stay away from Windows these days, too many headaches: &lt;a href="http://polishlinux.org/linux/ext3-reiserfs-xfs-in-windows-thanks-to-colinux/" rel="nofollow noopener" target="_blank" title="http://polishlinux.org/linux/ext3-reiserfs-xfs-in-windows-thanks-to-colinux/"&gt;http://polishlinux.org/linu...&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Pete Skomoroch</dc:creator><pubDate>Wed, 23 Dec 2009 14:36:03 -0000</pubDate></item><item><title>Re: Wikipedia Page Traffic Statistics Dataset</title><link>http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset#comment-27068542</link><description>&lt;p&gt;Hi, I am trying to access the wiki page traffic data but cannot. I wanted to access the data through the Amazon Management Console from Windows 7 but could not. I first created an instance and then attached the snapshot to that instance. When I logged into that instance via Remote Desktop, I did not see the datasets anywhere.&lt;/p&gt;&lt;p&gt;I am very new to this realm and do not know much about how AWS works.
I have coded heavily in MATLAB and WinBUGS for statistics, but this seems like a totally different ball game.&lt;/p&gt;&lt;p&gt;Your advice on how to access the data via the Amazon Management Console would really help me, as I want to use the page traffic data for an academic project.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">avi</dc:creator><pubDate>Wed, 23 Dec 2009 03:41:15 -0000</pubDate></item><item><title>Re: Wikipedia Page Traffic Statistics Dataset</title><link>http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset#comment-12861856</link><description>&lt;p&gt;Malar,&lt;/p&gt;&lt;p&gt;Lots of medical time series data is available; check my &lt;a href="http://del.icio.us" rel="nofollow noopener" target="_blank" title="del.icio.us"&gt;del.icio.us&lt;/a&gt; links:&lt;/p&gt;&lt;p&gt;&lt;a href="http://delicious.com/pskomoroch/dataset" rel="nofollow noopener" target="_blank" title="http://delicious.com/pskomoroch/dataset"&gt;http://delicious.com/pskomo...&lt;/a&gt;&lt;br&gt;&lt;a href="http://delicious.com/pskomoroch/dataset+timeseries" rel="nofollow noopener" target="_blank" title="http://delicious.com/pskomoroch/dataset+timeseries"&gt;http://delicious.com/pskomo...&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Cardiac, EKG, neuron spike trains: there are a bunch of datasets out there, depending on what you are trying to do.
Some good neural time series data is here: &lt;a href="http://www.crcns.org/data-sets" rel="nofollow noopener" target="_blank" title="http://www.crcns.org/data-sets"&gt;http://www.crcns.org/data-sets&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Also check out:&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.neural-forecasting-competition.com/" rel="nofollow noopener" target="_blank" title="http://www.neural-forecasting-competition.com/"&gt;http://www.neural-forecasti...&lt;/a&gt;&lt;/p&gt;&lt;p&gt;-Pete&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Pete Skomoroch</dc:creator><pubDate>Sat, 18 Jul 2009 00:54:46 -0000</pubDate></item><item><title>Re: Wikipedia Page Traffic Statistics Dataset</title><link>http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset#comment-12861524</link><description>&lt;p&gt;Hi,&lt;br&gt;I am now working on temporal data mining. I need temporal (time series) data sets related to the medical field or the share market. Can you provide some data sets for me?&lt;br&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">malar</dc:creator><pubDate>Sat, 18 Jul 2009 00:28:53 -0000</pubDate></item><item><title>Re: Wikipedia Page Traffic Statistics Dataset</title><link>http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset#comment-11078501</link><description>&lt;p&gt;Rene,&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;I should have mentioned in the blog post that the Wikidump directory of the dataset has the raw Wikipedia content already parsed into Hadoop-readable, tab-delimited files:&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;692M page.txt&lt;br&gt;115M redirect.txt&lt;br&gt;987M revision.txt&lt;br&gt; 17G text.txt&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;I'll edit the post to reflect this...&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;-Pete&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Sun, 14 Jun 2009 
13:10:04 -0000</pubDate></item><item><title>Re: Wikipedia Page Traffic Statistics Dataset</title><link>http://www.datawrangling.com/wikipedia-page-traffic-statistics-dataset#comment-11078500</link><description>&lt;p&gt;Hi,&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;I'm currently trying out Hadoop for fun and learning. I'm parsing the full .xml.bz2 datasets (&lt;a href="http://simple.wikipedia.org" rel="nofollow noopener" target="_blank" title="simple.wikipedia.org"&gt;simple.wikipedia.org&lt;/a&gt; / &lt;a href="http://en.wikipedia.org" rel="nofollow noopener" target="_blank" title="en.wikipedia.org"&gt;en.wikipedia.org&lt;/a&gt;).&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;You might be interested in the reader &lt;a href="http://github.com/rtreffer/hadoop-wikimedia-fun/tree/9770c428128fbfc897fd6cde97442f3d18b27791/src/de/measite/wiki/input" rel="nofollow noopener" target="_blank" title="http://github.com/rtreffer/hadoop-wikimedia-fun/tree/9770c428128fbfc897fd6cde97442f3d18b27791/src/de/measite/wiki/input"&gt;http://github.com/rtreffer/...&lt;/a&gt; - it seems to work (12% on &lt;a href="http://en.wikipedia.org" rel="nofollow noopener" target="_blank" title="en.wikipedia.org"&gt;en.wikipedia.org&lt;/a&gt; atm) but requires bzip2 split support (trunk + Hadoop-4012-version9.patch).&lt;br&gt;It should be easy to alter (drop any text). Edit counts would be quite easy with this...&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;It would be interesting to see how edit counts compare to page views....&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;Please note that I'm quite new to Hadoop :)&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;Regards,&lt;br&gt;  Rene&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">treffer</dc:creator><pubDate>Sun, 14 Jun 2009 11:04:14 -0000</pubDate></item></channel></rss>