The 2015 U-M Undergraduate Data Mining Competition

The U-M Department of Statistics is sponsoring a data mining competition open to all U-M undergraduates. Students may participate in the competition either as individuals or as part of a team.

Each participant or team will analyze a data set (described further below) and prepare a report. The reports will be judged by a panel of experts. Prizes will be awarded as follows:

When a team is awarded a prize, the prize amount will be divided equally among the team members.

Participants are encouraged to think creatively when exploring the data set. The goal is to identify an interesting, surprising, or insightful finding based on the data. This finding should then be carefully described, interpreted, and justified using quantitative data analysis methods.

The data set

Link to the data set

All contestants will analyze a data set containing information about over 100,000 "notable individuals" who lived at any time from antiquity to the modern era. The data set contains the following fields:

Variable     Description
PrsID        Person-specific identifier
PrsLabel     Name of the individual
BYear        Year of birth
BLocLabel    Birth location
BLocID       Identifier of birth location
BLocLat      Latitude of birth location
BLocLong     Longitude of birth location
DYear        Year of death
DLocLabel    Location of death
DLocID       Identifier of death location
DLocLat      Latitude of death location
DLocLong     Longitude of death location
Gender       The individual's gender
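
As a starting point for exploration, the fields above can be combined into derived quantities. The following is a minimal sketch, not an official contest tool, that computes each person's lifespan as DYear minus BYear. It assumes the data set is delivered as a CSV file whose header uses the variable names listed above, that years are stored as numbers, and that missing years are left blank. The sample rows and the helper names parse_year and lifespans are invented for illustration and are not taken from the contest data.

```python
import csv
import io

# Synthetic rows for illustration only; these are NOT from the contest data.
SAMPLE_CSV = """PrsID,PrsLabel,BYear,DYear,Gender
1,Person A,1800,1870,Female
2,Person B,-50,10,Male
3,Person C,1950,,Male
"""


def parse_year(value):
    """Return the year as an int, or None if the field is blank or not numeric."""
    try:
        return int(float(value))
    except (TypeError, ValueError, OverflowError):
        return None


def lifespans(rows):
    """Map PrsID to DYear - BYear for rows where both years are present."""
    result = {}
    for row in rows:
        born = parse_year(row.get("BYear"))
        died = parse_year(row.get("DYear"))
        if born is not None and died is not None:
            result[row["PrsID"]] = died - born
    return result


# With the real file, replace io.StringIO(SAMPLE_CSV) with an opened file object.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
spans = lifespans(rows)
print(spans)
```

Rows with a missing birth or death year are skipped rather than guessed at, so any summary built on these values describes only the individuals for whom both years are recorded.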

The data set also includes the following indicator variables reflecting the activities for which the person is notable:

Variable                    Description
PerformingArts              Performing arts activities
Creative                    Creative activities
Gov/Law/Mil/Act/Rel         Activities relating to government, military, etc.
Academic/Edu/Health         Academic or educational activities
Sports                      Sport-related activities
Business/Industry/Travel    Business or industry-related activities
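
A first look at the indicator variables might simply count how many individuals fall into each activity category. The sketch below assumes the indicators are coded as 1 (the person is notable for that activity) and 0 (otherwise), and that the data set is a CSV file whose header uses the variable names listed above. Neither assumption is confirmed by this description. The sample rows and the function name indicator_counts are invented for illustration.

```python
import csv
import io
from collections import Counter

# Indicator column names as listed in the table above.
INDICATORS = [
    "PerformingArts",
    "Creative",
    "Gov/Law/Mil/Act/Rel",
    "Academic/Edu/Health",
    "Sports",
    "Business/Industry/Travel",
]

# Synthetic rows for illustration only; these are NOT from the contest data.
SAMPLE_CSV = """PrsID,PerformingArts,Creative,Gov/Law/Mil/Act/Rel,Academic/Edu/Health,Sports,Business/Industry/Travel
1,0,1,0,1,0,0
2,0,0,1,0,0,0
3,1,1,0,0,0,0
"""


def indicator_counts(rows):
    """Count the rows in which each indicator is set (assumed coding: "1")."""
    counts = Counter()
    for row in rows:
        for name in INDICATORS:
            # Treat a missing or blank field as "not set".
            if (row.get(name) or "").strip() == "1":
                counts[name] += 1
    return counts


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
counts = indicator_counts(rows)
for name in INDICATORS:
    print(name, counts[name])
```

Because one person can be notable for several activities, the category counts may sum to more than the number of individuals.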

Contest rules