And that got me wondering: just what other interesting data sets are out there?
  • What should an outline for a paper look like? Ideas for a data set paper


    on-line. Challenges: This dataset is a great challenge. Challenges: This is a difficult data set. The winner of the KDD Cup 99 competition used C5 decision trees in

    combination with boosting and bagging. Also, we are interested in detecting 'bad' behavior as soon as possible. What you really should do is think about this last item yourself, and then work on a great paper using your data. You can browse World Bank data sets directly, without registering. Given the extremely high dimension of the input (5000 voxels times 8 images) to the classifier, it is sensible to explore methods for reducing this to a small number of dimensions. When the genes are treated as attributes, the dimensionality of the feature space is very high compared to the number of cases. This challenge asks you to predict student performance on mathematical problems from logs of student interaction with Intelligent Tutoring Systems. UCI Machine Learning Repository: The UCI Machine Learning Repository is one of the oldest sources of data sets on the web. Classify sky objects as stars or galaxies (use the SDSS classification as the label). Objections: This is an artificial data set, not a genuine data-mining problem. The data mining task is to predict whether a gene belongs to one of the 5 functional classes, based on its expression levels. Available locally at Challenges: Semi-structured time series data with a textual component (with all the problems of ambiguity, etc.). Each pixel represents an area on the earth's surface of 80*80 metres. US Weather History: historical weather data for the. Have a lot of nuance, and many possible angles to take. Are trying to predict more or less cluster together. There are 38 different attack types, belonging to 4 main categories. The data describes the activity of some (hidden) biological system in yeast cells.
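
For the very high-dimensional inputs mentioned above (voxels or genes as attributes), one simple way to cut the number of dimensions is to keep only the columns that vary the most. The sketch below is only an illustration under stated assumptions: the toy data, the function names `top_variance_features` and `project`, and the choice of variance as the ranking criterion are all hypothetical and do not come from any of the data sets described here. Other techniques, such as PCA, may well suit a given data set better.

```python
import random
from statistics import pvariance


def top_variance_features(rows, k):
    """Return the indices of the k columns with the largest variance, in column order."""
    n_cols = len(rows[0])
    variances = [pvariance([row[j] for row in rows]) for j in range(n_cols)]
    ranked = sorted(range(n_cols), key=lambda j: variances[j], reverse=True)
    return sorted(ranked[:k])


def project(rows, keep):
    """Keep only the selected columns of every row."""
    return [[row[j] for j in keep] for row in rows]


# Hypothetical toy data: 20 cases, 50 attributes.
# Only the first 3 attributes carry much variation (std 10 vs. std 0.1).
random.seed(0)
rows = [
    [random.gauss(0, 10 if j < 3 else 0.1) for j in range(50)]
    for _ in range(20)
]

keep = top_variance_features(rows, k=3)
reduced = project(rows, keep)
print(keep, len(reduced), len(reduced[0]))
```

Variance ranking ignores the class labels entirely, so it is best seen as a cheap first filter before a classifier is trained on the reduced data.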

    The dataset provides a variety of details about the genes of one particular type of organism. 2 MB training data and 119 MB test data. References. This is proposed by one of the CDT partners: 191779 records, large number of features, however. Also, feature extraction will be necessary, and not available in data set form. A good description of the difficulties with the data can also be found here (2 MB). Task: compare at least two different classifiers to identify the kind of leukemia of the sample. An increasing amount is generated in real time. A credit card company might be interested in identifying customers that are likely to go bankrupt. Train and compare at least two classifiers. 95412 training cases and 96367 test cases, 481 attributes, 236.
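
Several of the tasks above ask you to train and compare at least two classifiers. A minimal sketch of what such a comparison could look like follows. It is an assumption-laden illustration: the synthetic two-class data, the 70/30 hold-out split, and the two classifiers chosen (a majority-class baseline and a 1-nearest-neighbour rule) are hypothetical stand-ins, not the methods or data of any challenge listed here.

```python
import random
from collections import Counter


def majority_classifier(train):
    """Baseline: always predict the most frequent training label."""
    label = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: label


def nearest_neighbour_classifier(train):
    """1-NN: predict the label of the closest training point (squared Euclidean distance)."""
    def predict(x):
        _, y = min(
            train,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
        )
        return y
    return predict


def accuracy(predict, test):
    """Fraction of test cases whose predicted label matches the true label."""
    return sum(predict(x) == y for x, y in test) / len(test)


def make_point(label):
    """Hypothetical data generator: class A is centred at (0, 0), class B at (5, 5)."""
    centre = 0.0 if label == "A" else 5.0
    return ([random.gauss(centre, 1.0), random.gauss(centre, 1.0)], label)


random.seed(1)
data = [make_point("A") for _ in range(60)] + [make_point("B") for _ in range(40)]
random.shuffle(data)
train, test = data[:70], data[70:]  # simple hold-out split

results = {
    "majority": accuracy(majority_classifier(train), test),
    "1-NN": accuracy(nearest_neighbour_classifier(train), test),
}
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Including a trivial baseline makes it easier to tell whether a more elaborate classifier is learning anything at all. For data sets with very few cases, a single hold-out split like this one is probably too noisy, and cross-validation would be the more usual choice.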

    Business idea: figure out what sort of information gets leaked in the emails. Well, Stanford has put online arXiv's High Energy Physics paper.

    And some of them change more than others. Also, contestants must pay attention to temporal relationships as well as conceptual relationships among items. The descriptions use astronomical language, so it is quite difficult to understand what the data are all about and how the previous research has been carried out. This is the fifth post in a series of posts on how to build a Data Science Portfolio. Try at least 2 different classifiers. Which means that splitting into a training and a test set is not really a good option, although it has been done for the very similar leukemia dataset. Attributes on Wikipedia change over time. Data are given about the previous and the current mailing campaign. Challenges: these are astronomical data. The 2000 Text Retrieval Conference.

    That version of the data set is also available locally at Task: Predict whether a flight will arrive on time based on carrier, route, date, etc. Train at least two classifiers to predict the probability of Correct First Attempt for a given task. Also, the remaining 8 attributes consist of discrete variables, most of them related to the proteins coded by the gene, e.g.