<?xml version="1.0" encoding="utf-8"?>
<rss xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title>Data Wrangling Blog - Latest Comments in Some Datasets Available on the Web</title><link>http://datawranglingblog.disqus.com/</link><description></description><atom:link href="https://datawranglingblog.disqus.com/some_datasets_available_on_the_web/latest.rss" rel="self"></atom:link><language>en</language><lastBuildDate>Mon, 18 Nov 2013 10:36:52 -0000</lastBuildDate><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-1128157470</link><description>&lt;p&gt;one more link: &lt;a href="http://endb-consolidated.aihit.com/datasets.htm" rel="nofollow noopener" target="_blank" title="http://endb-consolidated.aihit.com/datasets.htm"&gt;http://endb-consolidated.ai...&lt;/a&gt; random 10,000 worldwide companies sampled from HitCompanies (all data in this DB extracted and updated automatically from WWW using AI and Machine Learning): company name and aliases, company description, industry tags, industry codes, registration numbers, addresses, phone numbers, VAT numbers, website, number of about/contact/management/product pages, incorporation date, team size, number of clients and partners, number of emails, number of key changes (client/partner changes, contact changes, people changes), and many more.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yuri Burger</dc:creator><pubDate>Mon, 18 Nov 2013 10:36:52 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-281137723</link><description>&lt;p&gt;Where can i find example datasets for inner workings of an insurance company or bank ? 
&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Modarres Zadeh</dc:creator><pubDate>Tue, 09 Aug 2011 07:38:02 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-281108468</link><description>&lt;p&gt;can any one help me to get a dataset which can describe the effort estimation of web project including the attributes&lt;br&gt;TypeProj - Categorical - Type of project (new or enhancement).&lt;br&gt;nLang - Ratio - Number of different development languages used.&lt;br&gt;DocProc - Categorical - If project followed defined and documented process.&lt;br&gt;ProImpr - Categorical - If project team involved in a process improvement programme.&lt;br&gt;Metrics - Categorical - If project team part of a software metrics programme.&lt;br&gt;DevTeam - Ratio - Size of a project’s development team.&lt;br&gt;TeamExp - Ratio - Average team experience with the development language(s) employed.&lt;br&gt;TotEffort - Ratio - Actual total effort in person hours used to develop an application.&lt;br&gt;EstEffort - Ratio - Estimated total effort in person hours to develop an application.&lt;br&gt;Accuracy - Categorical - Procedure used to record effort data.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Sanjay Kushwaha</dc:creator><pubDate>Tue, 09 Aug 2011 06:12:39 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-32777719</link><description>&lt;p&gt;hi &lt;br&gt;can anyone tell me how i can obtain data sets for movie ratings to import into WEKA&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Vinoth</dc:creator><pubDate>Fri, 05 Feb 2010 15:50:47 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-27722017</link><description>&lt;p&gt;i need Boolean dataset for association mining for my MTech 
project. Please provide me the address for the same. It will be a great help&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">manoj</dc:creator><pubDate>Fri, 01 Jan 2010 01:03:12 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-25914641</link><description>&lt;p&gt;I need a fingerprint image dataset for free. Where can I find it?&lt;br&gt;thank u&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">endah</dc:creator><pubDate>Tue, 15 Dec 2009 21:10:07 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-17261198</link><description>&lt;p&gt;I need RFID Supply Chain data for my research. Where can I find it??&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Mandy </dc:creator><pubDate>Wed, 23 Sep 2009 21:47:29 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-12681520</link><description>&lt;p&gt;this is rajeswari. &lt;br&gt;i am doing a project in association rule mining in data mining for time related data. 
so i need a dataset with time related data, i.e., temporal data. So please send me time related data. It's really helpful for my data&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">rajeswari</dc:creator><pubDate>Wed, 15 Jul 2009 06:39:22 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-12612208</link><description>&lt;p&gt;Dear all,&lt;/p&gt;&lt;p&gt;where can i find the dataset for manufacturing?? such as 'defect' or 'not defect' prediction..please ..help me..i am in urgent condition...thx b4 &lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">halim</dc:creator><pubDate>Mon, 13 Jul 2009 20:43:46 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-12432470</link><description>&lt;p&gt;the site u mentioned contains the required data for my project.i thank u for this great help&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">uma</dc:creator><pubDate>Fri, 10 Jul 2009 04:22:01 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-12303076</link><description>&lt;p&gt;The FDA has some data like that:&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082193.htm" rel="nofollow noopener" target="_blank" title="http://www.fda.gov/Drugs/GuidanceComplianceRegulatoryInformation/Surveillance/AdverseDrugEffects/ucm082193.htm"&gt;http://www.fda.gov/Drugs/Gu...&lt;/a&gt;&lt;/p&gt;&lt;p&gt;Some analogous time 
series data might be worth looking at as well:&lt;/p&gt;&lt;p&gt;&lt;a href="http://delicious.com/pskomoroch/timeseries+dataset" rel="nofollow noopener" target="_blank" title="http://delicious.com/pskomoroch/timeseries+dataset"&gt;http://delicious.com/pskomo...&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Pete Skomoroch</dc:creator><pubDate>Wed, 08 Jul 2009 03:33:11 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-12301290</link><description>&lt;p&gt;where can i get dataset for mining unexpected temporal association rules(eg:application in adverse drug reaction)&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">uma</dc:creator><pubDate>Wed, 08 Jul 2009 01:48:29 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078441</link><description>&lt;p&gt;A personal favorite: ITRDB: &lt;a href="ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/" rel="nofollow noopener" target="_blank" title="ftp://ftp.ncdc.noaa.gov/pub/data/paleo/treering/"&gt;ftp://ftp.ncdc.noaa.gov/pub...&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Ken</dc:creator><pubDate>Fri, 13 Feb 2009 18:11:49 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078440</link><description>&lt;p&gt;Tim,&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;I like the site, and the capability to download in various formats.  
A REST or SOAP API would be nice, or at least an index page for each format with direct paths to the individual downloads.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;-Pete&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Mon, 10 Nov 2008 20:49:53 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078439</link><description>&lt;p&gt;what do you think of &lt;a href="http://data.un.org" rel="nofollow noopener" target="_blank" title="http://data.un.org"&gt;http://data.un.org&lt;/a&gt; ?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Tim</dc:creator><pubDate>Mon, 10 Nov 2008 20:33:01 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078434</link><description>&lt;p&gt;I don't know whether you've seen &lt;a href="http://www.ckan.net/" rel="nofollow noopener" target="_blank" title="http://www.ckan.net/"&gt;CKAN (Comprehensive Knowledge Archive Network)&lt;/a&gt;. This is a project started by the Open Knowledge Foundation (of which I'm a part) and was launched about a year ago and seeks to perform exactly the type of registry task you've started upon here (though limited to &lt;a href="http://opendefinition.org/" rel="nofollow noopener" target="_blank" title="http://opendefinition.org/"&gt;open material&lt;/a&gt; only). As the blurb on the front-page says:&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;CKAN is the Comprehensive Knowledge Archive Network, a registry of open knowledge packages and projects (and a few closed ones). 
CKAN is the place to search for open knowledge resources as well as register your own – be that a set of Shakespeare's works, a global population density database, the voting records of MPs, or 30 years of US patents.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;Those familiar with freshmeat or CPAN can think of CKAN as providing an analogous service for open knowledge.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Rufus Pollock</dc:creator><pubDate>Thu, 10 Apr 2008 05:29:06 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078433</link><description>&lt;p&gt;What about including the wiki &lt;a href="http://www.numberzoom.com/" rel="nofollow noopener" target="_blank" title="http://www.numberzoom.com/"&gt;http://www.numberzoom.com/&lt;/a&gt; which is a user-contributed phone numbers database.  It's mostly reverse Caller ID for looking up what telemarketers or collection agencies have called, but there is no reason why other numbers wouldn't be on the site.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Lili</dc:creator><pubDate>Wed, 09 Apr 2008 19:32:14 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078432</link><description>&lt;p&gt;Brent sorry I missed that, this data will be useful for some identity matching projects I'm testing.  
I just found the programmable web description of your api as well: &lt;a href="http://www.programmableweb.com/api/cogmap" rel="nofollow noopener" target="_blank" title="http://www.programmableweb.com/api/cogmap"&gt;http://www.programmableweb.com/api/cogmap&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Wed, 09 Apr 2008 15:37:26 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078431</link><description>&lt;p&gt;It's in there!  &lt;a href="http://www.cogmap.com/blog/2008/03/04/cogmap-apis/" rel="nofollow noopener" target="_blank" title="http://www.cogmap.com/blog/2008/03/04/cogmap-apis/"&gt;http://www.cogmap.com/blog/2008/03/04/cogmap-apis/&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;-- brent&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Brent</dc:creator><pubDate>Wed, 09 Apr 2008 15:29:03 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078430</link><description>&lt;p&gt;Brent,&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;Just added Cogmap to my dataset bookmarks... any chance on releasing a raw dataset or REST api to fetch raw orgchart data?&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;-Pete&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Wed, 09 Apr 2008 14:24:48 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078429</link><description>&lt;p&gt;The omission of cogmap makes me sad!  
Cogmap provides organization chart data for thousands of companies and exposes it all through a variety of web services.&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Brent</dc:creator><pubDate>Wed, 09 Apr 2008 13:50:48 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078428</link><description>&lt;p&gt;Looks like Google is going to start providing access to loads of open sourced data sets (&lt;a href="http://blog.wired.com/wiredscience/2008/01/google-to-provi.html" rel="nofollow noopener" target="_blank" title="http://blog.wired.com/wiredscience/2008/01/google-to-provi.html"&gt;http://blog.wired.com/wired...&lt;/a&gt;).&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Guest</dc:creator><pubDate>Sat, 19 Jan 2008 11:50:21 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078427</link><description>&lt;p&gt;skj,&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;The WebBase Project link includes some chat data. It would be pretty easy to crawl for that data, provided terms of use for the chat sites are followed.  
Here is a recent list of hosts Stanford WebBase crawled, which includes chat sites (this link might not be permanent):&lt;br&gt;&lt;br&gt;&lt;a href="http://dbpubs.stanford.edu:8090/~testbed/doc2/WebBase/crawl_lists/crawled_hosts.0403" rel="nofollow noopener" target="_blank" title="http://dbpubs.stanford.edu:8090/~testbed/doc2/WebBase/crawl_lists/crawled_hosts.0403"&gt;http://dbpubs.stanford.edu:8090/~testbed/doc2/WebBase/crawl_lists/crawled_hosts.0403&lt;/a&gt;&lt;br&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Sat, 19 Jan 2008 08:50:14 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078426</link><description>&lt;p&gt;civilian,&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;The LDC site was up yesterday.  It may have been hammered by reddit/&lt;a href="http://del.icio.us" rel="nofollow noopener" target="_blank" title="del.icio.us"&gt;del.icio.us&lt;/a&gt; users?  I think some of the datasets they have are extremely large (for example the google N-grams), so there is a handling fee for non-commercial researchers.  As far as commercial use fees, many data providers restrict use entirely.  Open access to more data would be great ... except where privacy issues are involved.  
Sometimes there are also competitive reasons for restrictive licenses.&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;See more on the issues here:&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://en.wikipedia.org/wiki/Open_data" rel="nofollow noopener" target="_blank" title="http://en.wikipedia.org/wiki/Open_data"&gt;http://en.wikipedia.org/wiki/Open_data&lt;/a&gt;&lt;br&gt;&lt;a href="http://en.wikipedia.org/wiki/Data_privacy" rel="nofollow noopener" target="_blank" title="http://en.wikipedia.org/wiki/Data_privacy"&gt;http://en.wikipedia.org/wiki/Data_privacy&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;related discussion:&lt;/p&gt;&lt;p&gt;&lt;br&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://news.ycombinator.com/item?id=100197" rel="nofollow noopener" target="_blank" title="http://news.ycombinator.com/item?id=100197"&gt;http://news.ycombinator.com/item?id=100197&lt;/a&gt;&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Peter Skomoroch</dc:creator><pubDate>Sat, 19 Jan 2008 08:35:17 -0000</pubDate></item><item><title>Re: Some Datasets Available on the Web</title><link>http://www.datawrangling.com/some-datasets-available-on-the-web#comment-11078425</link><description>&lt;p&gt;Are there any datasets of chat logs?  Chat conversations (from IRC or otherwise)?&lt;/p&gt;</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">skj</dc:creator><pubDate>Sat, 19 Jan 2008 05:44:20 -0000</pubDate></item></channel></rss>