Datasets
Below you can find all the datasets. You can order them differently and filter them by tag easily.
00026
The 2002 French Presidential Election Dataset was collected by Jean-Francois Laslier and Karine Van der Straeten. It consists of 2,597 approval ballots collected in parallel to the actual election in 6 different districts in France.
The approval votes were collected at a set of polling stations in France during the first round of voting in the 2002 French National Election. Voters in these districts were informed prior to the election that they would have the ability to cast an approval ballot along with their normal ballot for the election. Overall, over 75% of those who turned up to vote participated in the experiment. Each of the files represents one district voting on the same election. There are between 367 and 476 voters (2,597 in all) and 16 candidates. Additional details about the method used to collect the data and the results of the analysis can be found in the required citation for the use of this dataset.
Consists of 6 data files: GylesNonains Orsay1 Orsay5 Orsay6 Orsay7 Orsay12
00037
This dataset contains the bids of reviewers over papers from the 2015, 2016 and 2021 Autonomous Agents and Multiagent Systems Conference.
For the years 2015 and 2016, inclusion in these data sets was explicitly opt-in. The 2015 data contains 9,817 bids of 201 reviewers over 613 papers; this represents about 40% of the actual 22,360 bids of 281 reviewers over 670 papers. The 2016 data contains 161 out of 393 reviewers with bids over 442 out of 550 papers. For the year 2021, 526 submissions, 71 SPC members, and 596 regular PC members passed the checks (not opting out, etc.).
The bidding language for these conferences is yes/maybe/no/conflict. In order to make these more useful for PrefLib users, we have converted them to categorical data of the form {yes} > {maybe} > {no response} > {no} > {conflict}. Note that not all years have the same categories. We are deeply grateful to the IFAAMAS board and Rafael Bordini, Edith Elkind, John Thangarajah, and David Shield for approving, coordinating, and providing this dataset. The 2021 data has been generously provided by Ulle Endriss.
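The conversion of bids into ordered categories can be sketched as follows. This is only an illustration under stated assumptions, not the code used to build the dataset: the function name `bids_to_categories`, the paper ids, and the example reviewer are all hypothetical.

```python
# Category order from the text: {yes} > {maybe} > {no response} > {no} > {conflict}.
CATEGORY_ORDER = ["yes", "maybe", "no response", "no", "conflict"]


def bids_to_categories(bids, all_papers):
    """Group papers into ordered categories for one reviewer.

    `bids` maps a paper id to "yes", "maybe", "no" or "conflict";
    papers without a bid fall into the "no response" category.
    """
    categories = {name: [] for name in CATEGORY_ORDER}
    for paper in sorted(all_papers):
        categories[bids.get(paper, "no response")].append(paper)
    # Return the categories from most to least preferred.
    return [(name, categories[name]) for name in CATEGORY_ORDER]


# Hypothetical reviewer who bid on four of five papers.
example = bids_to_categories(
    {1: "yes", 2: "no", 4: "conflict", 5: "yes"}, {1, 2, 3, 4, 5}
)
print(example)
```

Years that lack some of the categories would simply leave the corresponding groups empty in this sketch.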
Consists of 3 data files: AAMAS 2015 AAMAS 2016 AAMAS 2021 with roles
00009
This dataset contains the results of surveying students at AGH University of Science and Technology about their course preferences. Each student provided a rank ordering over all the courses with no missing elements. There are 9 courses to choose from in 2003 and 7 in 2004.
The data on this page has been donated by Piotr Faliszewski.
Consists of 2 data files: 2003 Registration 2004 Registration
00062
This dataset contains the results of a simple experiment regarding voting over landscape images with varying displaying order. There are 19 agents, each voting in two rounds. Eight images (alternatives) are denoted by A through H. In the first round, the images were displayed in the sequence A, B, C, D, E, F, G, H, while in the second round, the sequence was D, C, B, A, H, G, F, E.
To allow voters to be identified from the first to the second round, in addition to our standard file formats, we provide a CSV file that contains the preferences submitted by each voter in both rounds.
These data were donated by Honorata Sosnowska from the SGH Warsaw School of Economics. The work concerned with the data was supported by the SGH Warsaw School of Economics grant KAE/S21 and the National Center for Science grant UMO-2018/31/B/HS4/01005 Opus 16.
Consists of 3 data files: First alternative ordering: A, B, C, D, E, F, G, H Second alternative ordering: D, C, B, A, H, G, F, E Ranking of a respondent across the data files
00028
This dataset contains the results of the elections of the American Psychological Association between 1998 and 2009. The voters are allowed to rank any number of the 5 candidates without ties. Each of these elections has 5 candidates and between 13,318 and 20,239 voters.
These data were donated by Michal Regenwetter and Anna Popova from the University of Illinois at Urbana-Champaign. The work that analyzed this data was supported by National Science Foundation grants SES # 08-20009, ICES # 1216016 (PI: M. Regenwetter), the University Library at the University of Illinois at Urbana-Champaign (PI: A. Popova), and the Basic Research Program at HSE (S. Popov). We thank the American Psychological Association for permitting access to its election ballot data. More information and tools for analyzing this data can be found at the website of the Decision Making Laboratory at the University of Illinois.
Consists of 12 data files: APA_1998 APA_1999 APA_2000 APA_2001 APA_2002 APA_2003 APA_2004 and 5 more.
00016
The 2009 Aspen Data contains the results from the mayoral and city council elections held in Aspen, CO in 2009. The data contains two different elections with about 2,500 votes each over 5 and 11 candidates.
Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.
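Ignoring blank entries can be pictured with the short sketch below. It assumes, purely for illustration, that a raw ballot is a list of choices in which a blank is stored as `None`; the helper name `drop_blanks` and the candidate names are hypothetical and do not reflect the raw file layout.

```python
def drop_blanks(ballot):
    """Remove blank entries from a ranked ballot, keeping the relative order."""
    return [choice for choice in ballot if choice is not None]


# Hypothetical ballot with the second and fourth positions left blank.
raw_ballot = ["Candidate A", None, "Candidate B", None]
cleaned = drop_blanks(raw_ballot)
print(cleaned)
```

Only the order over the remaining candidates survives, which matches what the processed files report.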
The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
Consists of 4 data files: Aspen City Council 2009 Aspen City Council 2009 Aspen Mayor 2009 Aspen Mayor 2009
00017
The 2010 Berkeley Data contains the results from a city council election (District 7) in Berkeley, CA. The set contains about 4,000 votes over 4 candidates.
Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.
The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
Consists of 2 data files: 2010 Berkeley City Council - District 7 2010 Berkeley City Council - District 7
00041
This dataset contains an election generated from board game charts.
The underlying board games data (kaggle.com/mseinstein/bgg_top2000) contains a weekly ranking of the 2000 most popular board games on boardgamegeek.com between October 2018 and December 2021. We created a single election where each game is a candidate and each vote corresponds to the ranking of the games in one week.
The patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.
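One simple way to obtain a complete election from weekly rankings is sketched below. The actual post-processing by Boehmer and Schaar deletes both candidates and voters and may follow a different strategy; this sketch assumes the simplest variant, which keeps only the candidates ranked in every vote. The function name `make_complete` and the game names are hypothetical.

```python
def make_complete(votes):
    """Keep only the candidates that appear in every vote."""
    common = set(votes[0]).intersection(*votes[1:])
    return [[c for c in vote if c in common] for vote in votes]


# Hypothetical weekly charts: each list is one vote, best-ranked game first.
weekly_rankings = [
    ["Game A", "Game B", "Game C"],
    ["Game B", "Game A", "Game D"],
    ["Game A", "Game C", "Game B"],
]
complete = make_complete(weekly_rankings)
print(complete)
```

Deleting a few sparse votes first would typically preserve more candidates, which is presumably why voters are removed as well in the published version.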
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 1 data file: Alltime
00042
This dataset contains elections generated from boxing world rankings. The underlying boxing data (kaggle.com/martj42/ufc-rankings) contains the Ultimate Fighting Championship rankings of the top 16 fighters in twelve different weight classes in different weeks between February 2013 and August 2021. For each year and weight class, we created an election where each fighter is a candidate and each vote corresponds to the ranking of the fighters in one week.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the relevant weight class followed by the year.
Other sports world rankings have been used to create the table tennis world rankings and tennis world rankings datasets.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 99 data files: Bantamweight 2013 Bantamweight 2014 Bantamweight 2015 Bantamweight 2016 Bantamweight 2017 Bantamweight 2018 Bantamweight 2019 and 92 more.
00035
This dataset was collected by Green and Rao (1972), and is a standard example in the literature on multidimensional unfolding (which is about embedding preferences in Euclidean space). They obtained strict preference rankings over a collection of 15 sweet breakfast items (such as toast, muffins, donuts) from "a group of 42 respondents, 21 Wharton MBA students and their wives. The questionnaire was self-administered separately by husband and wife. All subjects independently filled out the same questionnaire and received compensation for their efforts."
Green and Rao asked for these preferences in 6 different situations: "Overall preferences", "When I'm having a breakfast consisting of juice, bacon and eggs, and beverage", "When I'm having a breakfast consisting of juice, cold cereal, and beverage", "When I'm having a breakfast consisting of juice, pancakes, sausage, and beverage", "Breakfast, with beverage only", "At snack time, with beverage only". The rankings for each of these situations are provided in separate data patches. A .csv file presents the rankings of the same respondent across the different data files; rankings in odd positions (1st, 3rd, ...) come from the MBA students, and the ranking on the following line comes from that student's wife.
This dataset was digitized by Dominik Peters from the listings provided in the appendix of the 1972 book.
Consists of 7 data files: Ranking of a respondent across the data files Overall preference Juice, bacon and eggs, beverage Juice, cold cereal, beverage Juice, pancakes, sausage, beverage Breakfast, with beverage only At snack time, with beverage only
00005
The 2009 Burlington, Vermont Mayoral Election Data is posted online at www.rangevoting.org. It contains a number of interesting features when evaluated with the IRV method. Namely, the majority candidate in the first round does not emerge as the winner of the election.
The 2006 Burlington, Vermont Mayoral data presented here was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
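The effect mentioned above, where the first-round leader loses under IRV, can be reproduced with a minimal sketch. The ballots below are hypothetical toy data, not the Burlington ballots, and the function `irv_winner` with its alphabetical tie-breaking is an assumption made for illustration.

```python
from collections import Counter


def irv_winner(ballots):
    """Return the IRV winner and the first-round tallies.

    Each ballot lists candidates from most to least preferred.
    Ties for elimination are broken alphabetically (an arbitrary choice).
    """
    remaining = {c for ballot in ballots for c in ballot}
    first_round = None
    while True:
        # Count each ballot for its highest-ranked remaining candidate.
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            for c in ballot:
                if c in remaining:
                    tally[c] += 1
                    break
        if first_round is None:
            first_round = dict(tally)
        leader, votes = max(tally.items(), key=lambda kv: (kv[1], kv[0]))
        if votes * 2 > sum(tally.values()) or len(remaining) == 1:
            return leader, first_round
        # Eliminate the candidate with the fewest votes.
        loser = min(tally.items(), key=lambda kv: (kv[1], kv[0]))[0]
        remaining.remove(loser)


# Toy election: A leads the first round, but C's voters prefer B to A.
ballots = [["A", "B", "C"]] * 4 + [["B", "C", "A"]] * 3 + [["C", "B", "A"]] * 2
winner, first_round = irv_winner(ballots)
print(winner, first_round)
```

In this toy example A has the most first-round votes (4 of 9), yet B wins after C is eliminated.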
Consists of 4 data files: 2006 Burlington Mayoral Election 2006 Burlington Mayoral Election 2009 Burlington Mayoral Election 2009 Burlington Mayoral Election
00059
The dataset consists of two pre-camp surveys, conducted for youth summer camps in 2022 and 2023 in Poland. Several weeks before each camp, campers were asked to fill out a survey, which included (among others) two questions related to CCM-genre music pieces. Responses to each question form an approval election. So, for each year, we obtained two elections.
In the first question, survey participants (that is, voters) were presented with approximately 80 song titles (candidates). The participants were asked to select at least 15 of their favorite ones to sing during camp activities. The lower bound on the number of selections was not enforced. In the second question, which involved far fewer songs (around ten), the participants were asked to select songs they would like to learn. This time, no number of choices was suggested, so some participants selected none.
The questions remained the same in both years; however, the presented songs were different. Specifically, in 2022, the first and second questions involved 78 and 8 songs, respectively. A year later, the respective numbers were 82 and 10. The survey had 39 participants in 2022 and 56 in 2023.
The data on this page has been donated by Andrzej Kaczmarczyk.
Consists of 4 data files: Camp CCM Songs 2022 Favorite Camp CCM Songs 2022 New Camp CCM Songs 2023 Favorite Camp CCM Songs 2023 New
00034
This dataset contains noisy input from two surveys, one about cost of living and one about population, of 392 individuals over 36 alternatives for cost of living and 48 alternatives for population. Each individual provided a ranking of six given cities in terms of cost of living and a ranking of six countries in terms of population.
The data were collected among participants of the 3rd PatrasIQ research and technology exhibition, in Patras, Greece in April 2016. We received input from 392 volunteers; each of them was given a random bundle of six cities (from a pool of 36) and a random bundle of six countries (from a pool of 48), and was asked to give a strict ranking of the given cities and countries in terms of his/her estimation about their cost of living indices and population (in decreasing order), respectively.
In the cost of living treatment, each city appears in at least 57 and at most 70 bundles/votes. The alternative ids define a ground truth, i.e., a strict ranking of all 36 cities according to cost of living index data retrieved from numbeo.com in April 2016. In the population treatment, each country appears in at least 47 and at most 52 bundles/votes. The alternative ids define a ground truth, i.e., a strict ranking of all 48 countries according to population data retrieved from wikipedia.org in April 2016.
The data on this page has been donated by Iannis Caragiannis.
Consists of 2 data files: Cost of Living Population
00015
This dataset contains the results of comparing websearches across Bing, Google, Yahoo, and Ask. This data is provided by Robert Bredereck at TU Berlin. Robert provides tools to compute Kemeny rankings on this data at his website at TU Berlin.
These data files differ from the other set of web data in that these files are forced to be complete. This means that the results are restricted to only those candidates (sites) that appear in all three datasets. The data files marked big contain around 200 (max 242) candidates each while the data files marked small contain between 10 and 50 candidates. The search queries are shown in the names of the individual data files below. For the WebImpact files, the number of search results for a particular term was used to create a complete ranking over the search terms. These files measure the web impact of various world cities and countries. We have extended this data into tournament graphs and weighted majority graphs.
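As a rough illustration of how complete rankings extend to a weighted majority graph, the sketch below computes pairwise margins. It assumes the weight of an edge is the number of rankings placing a above b minus the number placing b above a; the exact convention used in the PrefLib files may differ, and the function name and site names are hypothetical.

```python
from itertools import combinations


def weighted_majority_graph(rankings):
    """Return {(a, b): margin} for every pair where a beats b by a positive margin."""
    candidates = sorted(rankings[0])
    margins = {}
    for a, b in combinations(candidates, 2):
        a_over_b = sum(r.index(a) < r.index(b) for r in rankings)
        margin = 2 * a_over_b - len(rankings)  # votes for a minus votes for b
        if margin > 0:
            margins[(a, b)] = margin
        elif margin < 0:
            margins[(b, a)] = -margin
    return margins


# Hypothetical complete rankings of three sites by three search engines.
rankings = [["x", "y", "z"], ["y", "x", "z"], ["x", "z", "y"]]
graph = weighted_majority_graph(rankings)
print(graph)
```

Dropping the weights and keeping only the edge directions gives the corresponding tournament graph when the number of rankings is odd.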
Consists of 79 data files: webimpact capitals4 complete webimpact capitals5 complete webimpact nations4 complete webimpact nations5 complete webimpact richest4 complete webimpact richest5 complete websearch big Death+Valley complete and 72 more.
00055
This dataset contains elections generated from weekly power rankings. Specifically, the underlying power ranking data (kaggle.com/masseyratings/rankings) contains weekly power rankings of college basketball teams (between 2001 and 2021), college baseball teams (between 2010 and 2021), and college American football teams (between 1997 and 2021) from different media outlets and ranking systems.
For each of the three sports (basketball, baseball, American football), for each season, we created an election where each vote corresponds to the power ranking of the teams in one of the weeks of the season according to one of the ranking systems.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the relevant sports followed by the relevant year.
The season power rankings and weekly power rankings datasets were generated from the same data.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 53 data files: Baseball 2010 Baseball 2011 Baseball 2012 Baseball 2013 Baseball 2014 Baseball 2015 Baseball 2016 and 46 more.
00039
This dataset contains the bidding data from 3 Computer Science Conferences. It contains the bids of all reviewers (aside from a small number of opt-outs) over a subset of papers at the conference.
The bidding language for these conferences is yes/maybe/conflict. In order to make these more useful for PrefLib users, we have converted them to incomplete partial orders of the form {yes} > {maybe} > {no response}. The papers for which a reviewer had a conflict have been removed from their preference list. All reviewers had different preference orderings, hence each file contains as many entries as reviewers.
Consists of 3 data files: AI Conference 1 AI Conference 2 AI Conference 3
00051
This dataset contains elections generated from indicator-based rankings of countries. For each year between 2005 and 2016, the underlying country ranking data (based on the popular world happiness report; kaggle.com/alcidesoxa/world-happiness-report-2005-2018) contains different quantitative indicators for the happiness of citizens from over 100 countries. For each year, we created an election where the countries are the candidates and each vote ranks them according to one indicator.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.
Other indicator-based rankings have been used to create the university rankings and city rankings datasets.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 12 data files: 2005 2006 2007 2008 2009 2010 2011 and 5 more.
00043
This dataset contains elections generated from road bicycle racing competitions. It consists of two parts.
Tour de France. For each edition of the Tour de France between 1903 and 2021, the underlying Tour de France data (procyclingstats.com) contains the completion times of all riders for each stage. For each edition, we created an election in which the riders are the candidates and each vote corresponds to a stage and ranks the riders by their completion time.
Giro d'Italia. For each edition of the Giro d'Italia between 1910 and 2020, the underlying Giro d'Italia data (procyclingstats.com) contains the completion times of all riders for each stage of the edition. For each edition, we created an election in which the riders are the candidates and each vote corresponds to a stage and ranks the riders by their completion time.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts either with gdi (Giro d'Italia) or tdf (Tour de France), followed by the respective year.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 196 data files: Gdi-1910 Gdi-1911 Gdi-1912 Gdi-1913 Gdi-1914 Gdi-1919 Gdi-1920 and 189 more.
00002
The Debian Project Leader Elections are held yearly with most of the ballots available online.
We have captured several years of data below, including the vote for the Debian logo. In some years there have been only a few candidates, and we have omitted these years. The included data sets have between 4 and 9 candidates depending on the instance and about 400 individual votes per instance.
Consists of 8 data files: Debian 2002 Leader Debian 2003 Leader Debian 2005 Leader Debian 2006 Leader Debian 2007 Leader Debian 2010 Leader Debian 2012 Leader and 1 more.
00032
This dataset contains the results of surveying students and professors in the Faculty of Informatics, Instituto Superior Politecnico Jose Antonio Echeverria (Cujae, Havana, Cuba) about their preferences on courses and the most important aspects affecting their performance as students and professionals. Answers include ties and missing elements. These surveys, conducted in 2015, include criteria about different numbers of aspects (6 to 32 candidates) and 13 courses.
This dataset was donated by Alejandro Rosete Suarez and Milton Garcia Borroto and may be augmented with new surveys in the future.
Consists of 10 data files: Professors: Most Important Aspects of Failure Students: Most Important Aspects of Failure Combined: Most Important Aspects of Failure Professors: Most Important Courses Professors: Most Important Courses Professors: Least Important Courses Professors: Most Important Qualities for Success and 3 more.
00007
This dataset contains the results of 86 separate elections held by non-profit organizations, trade unions, and professional organizations. They were originally donated by Nicolaus Tideman who secured NSF funding to have the ballots tabulated. The ballots are from elections held under various voting rules requiring incomplete strict orders. The tabulated results were initially collected by the Electoral Reform Society in the UK in order to support the adoption of STV and other range voting methods.
The files contain vote records with a maximum of 29 candidates and as few as 3; the number of voters ranges from 9 to 3419. The toc files have all unranked candidates tied, at the end of the order. Additionally, some of these are complete sets of ballots from the given elections and some are random samples from the set of all ballots.
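The treatment of unranked candidates in the toc files can be sketched as follows. The list-of-sets representation and the function name `to_toc` are assumptions chosen for illustration and are not the PrefLib file syntax.

```python
def to_toc(ranked, all_candidates):
    """Append all unranked candidates as a single tied group at the end."""
    order = [{c} for c in ranked]
    unranked = set(all_candidates) - set(ranked)
    if unranked:
        order.append(unranked)
    return order


# Hypothetical ballot ranking candidates 3 and 1 out of four candidates.
toc_order = to_toc([3, 1], {1, 2, 3, 4})
print(toc_order)
```

Here candidates 2 and 4 end up tied in last place, below every ranked candidate.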
Consists of 87 data files: ERS Set 1 ERS Set 2 ERS Set 3 ERS Set 4 ERS Set 5 ERS Set 6 ERS Set 7 and 80 more.
00053
This dataset contains elections generated from the Formula 1 World Championship. The underlying Formula 1 data (kaggle.com/rohanrao/formula-1-world-championship-1950-2020) contains the finishing times of all drivers in all laps of races taking place between 1950 and 2020. For each race, we created an election where each vote corresponds to a lap in the race and ranks the drivers by the time they spent in this lap.
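Turning lap times into votes might look like the sketch below. The lap times, driver names, and the alphabetical tie-breaking in `rank_by_time` are hypothetical and only illustrate the idea of one vote per lap.

```python
def rank_by_time(times):
    """Rank competitors by ascending time; ties are broken alphabetically."""
    return sorted(times, key=lambda name: (times[name], name))


# Hypothetical lap times in seconds for a two-lap race.
laps = [
    {"Driver A": 91.2, "Driver B": 89.7, "Driver C": 90.4},
    {"Driver A": 90.1, "Driver B": 90.3},  # Driver C did not complete this lap
]
election = [rank_by_time(lap) for lap in laps]
print(election)
```

The second vote ranks only two drivers, which shows why a raw election built this way can be incomplete.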
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the year in which the race took place followed by the name of the race.
The Formula 1 seasons dataset contains elections generated from the same data.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 454 data files: 1996 Argentine Grand Prix 1996 Australian Grand Prix 1996 Belgian Grand Prix 1996 Brazilian Grand Prix 1996 British Grand Prix 1996 Canadian Grand Prix 1996 European Grand Prix and 447 more.
00052
This dataset contains elections generated from the Formula 1 World Championship. The underlying Formula 1 data (kaggle.com/rohanrao/formula-1-world-championship-1950-2020) contains the finishing times of all drivers in all laps of races taking place between 1950 and 2020. For each year, we created an election where each vote corresponds to a race in this year and ranks the drivers by their total finishing time in this race.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.
The Formula 1 races dataset contains elections generated from the same data.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 71 data files: 1950 1951 1952 1953 1954 1955 1956 and 64 more.
00008
This data set contains the results of the 2007 Glasgow City Council elections, separated by ward. There are 21 wards, each with different candidates and voters. These files report the results of all the ward-level elections, which were originally held under STV. In this data set there is a maximum of 13 candidates and a minimum of 8 candidates. The maximum number of voters is 12,744 and the minimum is 5,199.
The data presented here was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
Consists of 21 data files: 2007 Scotland Anderston Ward 2007 Scotland Baillieston Ward 2007 Scotland Calton Ward 2007 Scotland Canal Ward 2007 Scotland Craigton Ward 2007 Scotland Drumchapel Ward 2007 Scotland EastCentre Ward and 14 more.
00046
This dataset contains elections generated from indicator-based rankings of universities. For each year between 2012 and 2015, the university ranking data (kaggle.com/mylesoneill/world-university-rankings) contains rankings of universities according to different criteria provided by three systems. For each year, we created an election where the universities are the candidates and each vote ranks them according to one criterion used by one of the three systems.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.
Other indicator-based rankings have been used to create the country rankings and city rankings datasets.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
00001
The Dublin North, West, and Meath data sets contain a complete record of votes for two separate elections held in Dublin, Ireland in 2002. The votes were posted online but have since been removed.
The data sets are not complete; they contain many partial votes over the candidate set. The North data set contains 43,942 votes over 12 candidates, the West data set contains 29,988 over 9 candidates, and the Meath set contains 64,081 votes over 14 candidates.
The Meath data presented here was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
Consists of 3 data files: 2002 Dublin North 2002 Dublin West 2002 Meath
00036
This dataset contains 310 instances of synthetic kidney donor pools. The data was generated using a state-of-the-art donor pool generation method (described in Saidman et al., Increasing the opportunity of live kidney donation by matching for two- and three-way exchanges. Transplantation 81(5), 2006) and was donated by John Dickerson. John has recently posted his generation code as well as his exchange-solving code online; it is available here.
The dataset consists of 10 randomly generated instances of kidney exchanges with 16, 32, 64, 128, 256, 512, 1024, 2048 patients and, as a percentage of the pool, altruists at 0%, 5%, 10%, and 15% for a total of 310 data files. The main components use the wmd data format. Each edge has a source and multiple destinations to represent the patients that can receive a kidney from the source. All edges have weight 1 unless they connect from a patient to an altruist (who does not need a kidney), which have weight 0.
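The edge weighting rule can be illustrated with a small sketch. It does not parse the wmd format; the dictionary layout, the function name `edge_weights`, and the node ids are assumptions made only to show how the weights follow from the description.

```python
def edge_weights(compatible, altruists):
    """Expand {source: [destinations]} into weighted directed edges.

    Every edge has weight 1, except an edge from a patient to an
    altruist, which has weight 0.
    """
    return {
        (src, dst): 0 if (dst in altruists and src not in altruists) else 1
        for src, dsts in compatible.items()
        for dst in dsts
    }


# Hypothetical pool: nodes 1 and 2 are patient-donor pairs, node 3 is an altruist.
compatible = {1: [2, 3], 2: [1, 3], 3: [1]}
weights = edge_weights(compatible, altruists={3})
print(weights)
```

In this toy pool the two edges pointing at the altruist receive weight 0, while the altruist's own outgoing edge keeps weight 1.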
There is a dat file associated with each kidney exchange data file. This file contains some extra fields that may be of interest to researchers. Specifically, the file contains the following fields: Pair, the index number of the pair in the corresponding wmd file; Patient, the blood type of the person needing the kidney; Donor, the blood type of the person donating the kidney; Wife-P?, 1 if the person needing the kidney is the wife of the donor; %Pra, the panel reactive antibody level of the patient, discretized into three levels; Out-Deg, the number of nodes in the wmd file that can receive a kidney from this donor; Altruist, 1 if the corresponding pair is an altruist.
Consists of 310 data files: Kidney Matching - 16 with 0 Kidney Matching - 16 with 0 Kidney Matching - 16 with 0 Kidney Matching - 16 with 0 Kidney Matching - 16 with 0 Kidney Matching - 16 with 0 Kidney Matching - 16 with 0 and 303 more.
00061
Certain blockchain protocols conduct approval-based committee elections on a day-to-day basis. Specifically, these elections occur in blockchains using the Nominated Proof-of-Stake (NPoS) protocol. In this system, a subset of stakeholders, called validators, are elected to run the consensus protocol, which is crucial for the integrity of the blockchain. The problem of selecting the validators can be modeled as a committee election.
This dataset presents the voting data of the Kusama network, a blockchain system that implements the Nominated Proof-of-Stake (NPoS) protocol. The dataset contains 96 elections from the Polkadot blockchain. These elections contain roughly 2000 candidates and 10 000 voters each.
Note that in practice voters are assigned weights (which are of highly different scales). These weights cannot be represented in the PrefLib format. Thus, every ".cat" file containing the approval ballots has a corresponding ".dat" file that describes the weights.
This dataset has been converted into the PrefLib format based on the sources provided by Niclas Böhmer (available here).
Consists of 1520 data files: Session 17057 - Weights Session 17063 - Weights Session 17069 - Weights Session 17075 - Weights Session 17081 - Weights Session 17087 - Weights Session 17093 - Weights and 1513 more.
00003
The Mariner Trajectory Selection Data Set contains the votes cast by the various science teams responsible for selecting the trajectory for the 1977 interplanetary satellite. There were a total of 10 science teams voting over 32 possible paths. All these votes are complete, but indifference was allowed between some of the objects.
Consists of 1 data file: NASA Mariner Path Selection
00024
The Mechanical Turk Dots datasets come from Andrew Mao and were collected using Mechanical Turk. These data sets each contain elections with 794-800 voters over 4 candidates.
Each of the candidates corresponds to random dots presented to a user on Mechanical Turk, who is asked to rank the items from those containing the least dots (first) to those containing the most dots (last). Thus, for all of these data sets there is a ground truth ranking which corresponds to the candidate names in sorted order. In the Dots task, each task contains elements with 200, 200+i, 200+2i, and 200+3i dots, where i is 3, 5, 7, or 9. This allows for more noise to be introduced to various iterations of the task. For each i, 40 sets of puzzles were placed on Mechanical Turk and were ranked by 20 users. As per the data owner's request, these 160 individual trials have been aggregated into a single file for each i. The individual trial runs are available upon request.
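The arithmetic behind the Dots task can be checked with a short sketch. The helper name `dot_counts` is hypothetical; the numbers come from the description above.

```python
def dot_counts(i, base=200, n=4):
    """Dot counts of the four candidates for step size i: 200, 200+i, 200+2i, 200+3i."""
    return [base + k * i for k in range(n)]


for i in (3, 5, 7, 9):
    # The file name pattern matches the data files listed for this dataset.
    print(f"all_200x{i}", dot_counts(i))

# 40 sets ranked by 20 users each give at most 800 votes per value of i,
# and 4 values of i times 40 sets give the 160 individual trials.
max_votes = 40 * 20
total_trials = 4 * 40
```

The upper bound of 800 votes is consistent with the 794-800 voters reported per file, and the smaller the step i, the harder the candidates are to tell apart.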
Consists of 4 data files: all_200x3 all_200x5 all_200x7 all_200x9
00025
The Mechanical Turk Puzzle datasets come from Andrew Mao and were collected using Mechanical Turk. These data sets each contain elections with 793-797 voters over 4 candidates.
Each of the candidates corresponds to an instance of the sliding puzzle game presented to a user on Mechanical Turk, who is asked to rank the items from those in a position closest to solution (first) to those requiring the most moves to complete (last). Thus, for all of these data sets there is a ground truth ranking which corresponds to the candidate names in sorted order. In the Puzzle task, each task contains elements requiring d, d+3, d+6, and d+9 moves to complete, where d is 5, 7, 9, or 11. This allows for more noise to be introduced to various iterations of the task. For each d, 40 sets of puzzles were placed on Mechanical Turk and were ranked by 20 users. As per the data owner's request, these 160 individual trials have been aggregated into a single file for each d. The individual trial runs are available upon request.
Consists of 4 data files: all_11_14_17_20 all_5_8_11_14 all_7_10_13_16 all_9_12_15_18
00018
The 2009 Minneapolis Data contains the results from the election for the Parks and Rec Commissioner and Tax Assessor in Minneapolis, MN. The set contains about 30,000 votes over 7-400 candidates. The full data sets contain ballots along with write-in candidates (Mickey Mouse and Yoda are well represented). The No Write In files contain the same votes, removing any write-ins and modifying the votes accordingly.
Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.
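The blank-handling step described above can be sketched as follows. This is only an illustration: the raw ballot representation (a list with empty strings or None for blank ranks) and the helper name `drop_blanks` are assumptions, not the actual conversion code used for PrefLib.

```python
def drop_blanks(ballot):
    """Return the ranked candidates in their original order, skipping blank positions."""
    return [c for c in ballot if c is not None and str(c).strip() != ""]


# Hypothetical raw ballot: second and fourth ranks were left blank.
ballot = ["Alice", "", "Bob", None, "Carol"]
order = drop_blanks(ballot)
```

Only the relative order of the named candidates survives; the positions of the blanks are not recorded.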
The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
Consists of 4 data files: 2009 Minneapolis Park and Recreation Commissioner At-Large Election - Full 2009 Minneapolis Park and Recreation Commissioner At-Large Election - No Write In 2009 Minneapolis Board of Estimate and Taxation Election - Full 2009 Minneapolis Board of Estimate and Taxation Election - No Write In
00050
This dataset contains an election generated from indicator-based rankings of cities. The underlying city ranking data (kaggle.com/blitzr/movehub-city-rankings) contains twelve quantitative indicators for the life quality in 216 different cities determined by movehub.com. We created a single election where each city is a candidate and each vote corresponds to the ranking of the cities with respect to one of the indicators.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.
Other indicator-based rankings have been used to create the country rankings and university rankings datasets.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 1 data file: All
00049
This dataset contains elections generated from multi-lap sports competitions.
The underlying mylaps data contains the completion time of athletes in each lap of a multi-lap competition (specifically, speed skating and cycling competitions) crawled from results.sporthive.com. For each race, we created an election in which the athletes are the candidates and each vote corresponds to one lap and ranks the athletes by their completion time.
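As a rough sketch of this construction, assuming each lap is available as a mapping from athlete to completion time (the input layout and the name `laps_to_votes` are hypothetical), one vote per lap can be produced like this:

```python
def laps_to_votes(lap_times):
    """lap_times: list of dicts {athlete: completion_time}, one dict per lap.

    Returns one ranking per lap, fastest athlete first. Athletes with equal
    times keep their input order, which is an arbitrary choice made here.
    """
    return [sorted(lap, key=lambda athlete: lap[athlete]) for lap in lap_times]


laps = [
    {"A": 31.2, "B": 30.9, "C": 32.0},
    {"A": 30.5, "B": 30.8},  # athlete C has no time for this lap
]
votes = laps_to_votes(laps)
```

An athlete without a time in some lap is simply missing from that vote, which is one way a raw election can end up incomplete.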
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 635 data files: 100 101 102 103 104 105 106 and 628 more.
00004
The Netflix Prize was a competition devised by Netflix to improve the accuracy of its recommendation system. To facilitate this, Netflix released real ratings about movies from the users of the system. Any set of movies can be transformed into an election via a process outlined by Mattei, Forshee, and Goldsmith (reference below).
The data sets posted below correspond to 100 random 3-candidate and 100 random 4-candidate elections drawn from Data Set 1 in the paper "An Empirical Study of Voting Rules and Manipulation with Large Datasets." The elements numbered 1 - 100 are all 3 candidate elections and the elements 101 - 200 are all 4 candidate elections.
Consists of 200 data files: Netflix Prize Data Netflix Prize Data Netflix Prize Data Netflix Prize Data Netflix Prize Data Netflix Prize Data Netflix Prize Data and 193 more.
00058
The New South Wales (NSW) Legislative Assembly is the lower of two houses of the Parliament of New South Wales, an Australian state. The Assembly comprises 93 seats, each representing one of 93 Districts.
In these elections, voters submitted Optional Preferential Votes; these ballots required at least one candidate to be specified. The outcome of each election was determined by the Instant-Runoff Voting (IRV) social choice function.
The data sets posted below correspond to each of the NSW districts in the 2015, 2019 and 2023 NSW Legislative Assembly elections. The elements numbered 1-93 correspond to the 2015 election, those numbered 94-186 correspond to the 2019 election, and those numbered 187-279 correspond to the 2023 election.
These datafiles comprise all formal votes cast in each contest, with informal votes omitted.
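For readers unfamiliar with IRV on partial ballots, here is a minimal sketch. It assumes that a ballot with no remaining candidate is set aside (exhausted) and that the majority is taken over the ballots still in the count. Ties for elimination are broken alphabetically, which is an arbitrary choice for illustration and not the official NSW procedure.

```python
def irv_winner(ballots):
    """ballots: list of lists, each a (possibly partial) ranking, most preferred first."""
    remaining = {c for b in ballots for c in b}
    while remaining:
        tallies = {c: 0 for c in remaining}
        for b in ballots:
            # Count the ballot for its highest-ranked candidate still standing.
            for c in b:
                if c in remaining:
                    tallies[c] += 1
                    break
        total = sum(tallies.values())  # exhausted ballots are not counted
        leader = max(tallies, key=tallies.get)
        if tallies[leader] * 2 > total or len(remaining) == 1:
            return leader
        # Eliminate the candidate with the fewest votes (alphabetical tie-break).
        loser = min(sorted(remaining), key=tallies.get)
        remaining.remove(loser)
    return None


example = [["A", "B"], ["A"], ["B", "C"], ["C", "B"], ["C"]]
winner = irv_winner(example)
```

In the example no candidate has a majority in the first round, B is eliminated, and B's single ballot transfers to C.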
Consists of 279 data files: Albury 2015 Auburn 2015 Ballina 2015 Balmain 2015 Bankstown 2015 Barwon 2015 Bathurst 2015 and 272 more.
00019
The 2010 Oakland Data contains the results from the city council and mayoral elections held in Oakland, CA in 2010. The set contains 7 distinct elections with between 4 and 11 candidates and between 900 and 145,000 voters.
Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.
The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
Consists of 14 data files: 2010 Oakland City Council - District 4 2010 Oakland City Council - District 4 2010 Oakland City Council - District 4 2010 Oakland Mayor 2012 Oakland City Council - At Large 2012 Oakland City Council - At Large M2012 Oakland City Council - District 1 and 7 more.
00057
This dataset gathers parliamentary elections. The Austrian elections were provided by Martin Lackner.
Consists of 9 data files: Austrian legislative election 1994 Austrian legislative election 1995 Austrian legislative election 1999 Austrian legislative election 2002 Austrian legislative election 2006 Austrian legislative election 2008 Austrian legislative election 2013 and 2 more.
00020
The 2008 Pierce Data contains the results from several elections, including county executive, held in Pierce, WA in 2008. The set contains 4 distinct elections with between 4 and 7 candidates and between 40,000 and 300,000 voters.
Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.
The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
Consists of 8 data files: 2009 Pierce County Auditor 2009 Pierce County Auditor 2009 Pierce County Auditor 2008 Pierce County Council - District 2 2009 Pierce County Auditor 2008 Pierce County Executive 2009 Pierce County Auditor and 1 more.
00060
Certain blockchain protocols conduct approval-based committee elections on a day-to-day basis. Specifically, these elections occur in blockchains using the Nominated Proof-of-Stake (NPoS) protocol. In this system, a subset of stakeholders, called validators, are elected to run the consensus protocol, which is crucial for the integrity of the blockchain. The problem of selecting the validators can be modeled as a committee election.
This dataset presents the voting data of the Polkadot network, a blockchain system that implements the Nominated Proof-of-Stake (NPoS) protocol. The dataset contains 96 elections from the Polkadot blockchain. These elections contain between 18 202 and 48 025 voters and between 920 and 1080 candidates.
Note that in practice voters are assigned weights (on widely differing scales). These weights cannot be represented in the PrefLib format. Each ".cat" file containing the approval ballots is therefore accompanied by a ".dat" file that describes the weights.
This dataset has been converted into the PrefLib format based on the sources provided by Niclas Böhmer (available here).
Consists of 496 data files: Session 2429 - Weights Session 2435 - Weights Session 2441 - Weights Session 2447 - Weights Session 2453 - Weights Session 2459 - Weights Session 2465 - Weights and 489 more.
00038
This dataset contains bids of students over a set of projects for student/project allocations at the School of Computing Science, University of Glasgow. Each project is supervised by an individual, and each supervisor has a maximum supervision capacity. There are 8 years' worth of data in this set, with between 31 and 51 students and between 56 and 155 projects. This data was kindly donated by David Manlove, who collected it.
In addition to the strict and incomplete preference profiles of the students we have extended the profiles with all unranked items tied at the end. We have also posted .dat files containing the supervisor identifiers and capacities. The format for the .dat files is Supervisor ID, Capacity, Projects; where Projects is a space separated list of the projects supervised by the Supervisor. Each project has a capacity of 1 while each supervisor has a variable capacity. In academic sessions 2007-08 and 2008-09 there were no supervisor capacities in force, thus the projects and supervisors are in 1-1 correspondence.
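A line of the .dat files described above could be parsed roughly as follows. The sketch assumes the three fields are separated by commas and that there is no header line; the function name `parse_supervisor_line` and the sample line are hypothetical.

```python
def parse_supervisor_line(line):
    """Split 'Supervisor ID, Capacity, Projects' into (id, capacity, [projects])."""
    sup_id, capacity, projects = (field.strip() for field in line.split(",", 2))
    # Projects is a space-separated list; an empty field yields an empty list.
    return sup_id, int(capacity), projects.split()


# Hypothetical line: supervisor 12 can take 3 students and offers projects 4, 7 and 19.
record = parse_supervisor_line("12, 3, 4 7 19")
```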
Consists of 16 data files: Capacities_Projects_07_08.txt Capacities_Projects_07_08.txt Capacities_Projects_08_09.txt Capacities_Projects_08_09.txt Projects_09_10.txt Projects_09_10.txt Capacities_Projects_10_11.txt and 9 more.
00027
This analog dataset to the 2002 French Presidential Election Dataset was collected by Jean-Francois Laslier, Karine Van der Straeten and Michel Balinski. It consists of 398 approval ballots collected over potential candidates for the 2002 French Presidential election cast by students at Institut d'Etudes Politiques de Paris.
This dataset is interesting as its companion dataset Proto French Election Ratings has both the subjective evaluations of the candidates, along with the approvals. This dataset only preserves the approval ballots cast by the students. As the candidate set is the potential presidential candidates (and thus, not the exact set used in ED-00026), this is presented as a separate dataset.
Consists of 1 data file: Etudes Poltiques
00029
This analog dataset to the 2002 French Presidential Election Dataset was collected by Jean-Francois Laslier, Karine Van der Straeten and Michel Balinski. It consists of 398 approval ballots and subjective ratings on a 20 point scale collected over potential candidates for the 2002 French Presidential election cast by students at Institut d’Etudes Politiques de Paris.
This dataset preserves both the approval ballots and the subjective ratings of the candidates by each of the voters. The approvals are coded as either 1.0 for approved or 0.0 for not approved. The subjective ratings are on a 20-point scale, where a score of -1.0 means that no input was provided (as opposed to a rating of 0.0, the lowest possible).
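Because -1.0 marks a missing rating while 0.0 is a genuine (lowest) rating, any aggregate should filter out the former and keep the latter. A small sketch, with the helper name `mean_rating` and the sample values being ours:

```python
MISSING = -1.0  # code used when a voter gave no rating


def mean_rating(ratings):
    """Average the ratings a candidate received, ignoring the 'no input' code."""
    given = [r for r in ratings if r != MISSING]
    return sum(given) / len(given) if given else None


# Hypothetical ratings for one candidate: one voter gave no input, one gave 0.0.
average = mean_rating([12.0, -1.0, 0.0, 18.0])
```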
Consists of 1 data file: Etudes Poltiques
00021
The San Francisco data contains the results from several elections, including board of supervisors, district attorney, and mayoral elections, held in San Francisco, CA between 2008 and 2012. The set contains 14 distinct elections with between 4 and 25 candidates and between 18,000 and 195,000 voters.
Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.
The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
Consists of 28 data files: 2008 San Francisco Board of Supervisors - District 1 2008 San Francisco Board of Supervisors - District 1 2008 San Francisco Board of Supervisors - District 1 2008 San Francisco Board of Supervisors - District 11 2008 San Francisco Board of Supervisors - District 1 2008 San Francisco - Board of Supervisors District 3 2008 San Francisco Board of Supervisors - District 1 and 21 more.
00022
The San Leandro data contains the results from several elections, including mayor and city council elections, held in San Leandro, CA between 2010 and 2012. The set contains 3 distinct elections with between 4 and 7 candidates and about 25,000 voters each.
Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.
The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
Consists of 3 data files: 2010 San Leandro Mayor 2012 San Leandro City Council - District 2 2012 San Leandro City Council - District 4
00033
Approval Ballots from the San Sebastian Poster Competition held during The Summer School on Computational Social Choice organized by COST Action IC1205 at the Miramar Palace in San Sebastian in July 2016. This set has two elections of approval ballots with 17 alternatives and about 60 voters each. The data on this page was donated by Ulle Endriss.
Two elections were held, using approval voting. In the first election the alternatives were posters A1-A17; in the second election the alternatives were posters B1-B17. There were 67 eligible voters (56 summer school participants, including the 34 poster presenters, as well as 7 lecturers and 4 organizers). Of these, 65 voters participated in the first election and 60 voters participated in the second election (1 voter did not vote in either election). The elections were conducted using the Whale3 system of Sylvain Bouveret. Most of the posters are available at the summer school website.
The original data file (00033-00000001.dat) includes one column per poster. Each of the two sets of posters is ordered by the number of approvals received. Each row corresponds to a voter. The voters are ordered by the number of approvals they have given across both elections, except that the 7 voters who only participated in one of the two elections are listed last. The other files are converted into standard PrefLib format where all approved alternatives are considered a tied equivalence class.
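The conversion from one voter's row of approvals to an ordering with tied classes might look like the sketch below. It assumes cells are coded 1 for approved and 0 otherwise, and that the non-approved posters form a second tied class; both points are assumptions, as is the function name `approvals_to_order`.

```python
def approvals_to_order(row, names):
    """row: 0/1 flags, one per poster; names: poster labels in the same column order.

    Returns a list of tied classes: approved posters first, then the rest.
    Empty classes are dropped.
    """
    approved = [n for n, flag in zip(names, row) if flag == 1]
    rest = [n for n, flag in zip(names, row) if flag != 1]
    return [cls for cls in (approved, rest) if cls]


posters = ["A1", "A2", "A3", "A4"]
order = approvals_to_order([1, 0, 1, 0], posters)
```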
Consists of 3 data files: San_Sebastian_All_Approvals San_Sebastian_Group_1 San_Sebastian_Group_2
00056
This dataset contains elections generated from weekly power rankings. Specifically, the underlying power ranking data (kaggle.com/masseyratings/rankings) contains weekly power rankings of college basketball teams (between 2001 and 2021), college baseball teams (between 2010 and 2021), and college American football teams (between 1997 and 2021) from different media outlets and ranking systems.
For each of the three sports (basketball, baseball, American football), for each season and each ranking system, we created an election where each vote corresponds to the power ranking of the teams in one week of the season according to the ranking system.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the sport, followed by the relevant year and ranking system.
The combined power rankings and weekly power rankings datasets were generated from the same data.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 4015 data files: Baseball 2010 Baseball America Baseball 2010 Boyd Isr Baseball 2010 Collegiate Baseball Baseball 2010 Composite Baseball 2010 Massey Bcs Baseball 2010 Massey Baseball 2010 Moore and 4008 more.
00006
This dataset contains figure skating rankings from various competitions during the 1998 season including the World Juniors, World Championships, and the Olympics. These data sets generally have 10-25 candidates (skaters) and 8-10 judges (voters).
The candidates (skaters) are ordered such that the first candidate skated first, and on down the list. We have maintained this order as presented in the original versions of this dataset.
Consists of 48 data files: Euros Men Short Program Euros Men Free Skate Euros Pairs Short Program Euros Pairs Free Skate Euros Dance Compulsory Dance#1 Euros Dance Compulsory Dance#2 Euros Dance Original Dance and 41 more.
00010
This dataset contains the Cross Country Skiing and Ski Jumping results from the 2006-2009 World Championships. This data is provided by Robert Bredereck at TU Berlin. Robert provides tools to compute Kemeny rankings on this data at his website at TU Berlin.
The results from each competition in the season provide a rank ordering over the candidates (competitors). We have created a toc datafile where all unranked candidates are tied at the end of the rankings.
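Extending an incomplete ranking so that the missing candidates are tied at the end can be sketched as follows. The function name `extend_with_tied_tail` and the list-of-classes output are illustrative choices, not the PrefLib file format itself.

```python
def extend_with_tied_tail(ranking, candidates):
    """Turn a strict partial ranking into an order over all candidates.

    Each ranked candidate keeps its own position; every candidate missing
    from the ranking is placed in one tied class at the end.
    """
    ranked = set(ranking)
    tail = [c for c in candidates if c not in ranked]
    order = [[c] for c in ranking]
    if tail:
        order.append(tail)
    return order


extended = extend_with_tied_tail(["b", "d"], ["a", "b", "c", "d"])
```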
Note that this dataset used to contain the Formula 1 data. A larger set of the F1 data is now available in the 00052 and 00053 datasets.
Consists of 2 data files: CrossCountry Skiing Ski Jumping
00013
This dataset contains the Facebook Social Graph and full ratings of 16 restaurants and 23 pubs by 93 users.
You can find anonymous versions of the social network and the item ratings. The dataset includes three files: links.csv (the social network), pubs.csv, and rest.csv (the ratings).
Each line in the rating files (pubs.csv and rest.csv) represents a participant with the structure: userid,X1,...,Xn. The userid in these files corresponds with the ids in the links.csv file.
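Reading such a rating file could be sketched as below, assuming there is no header line and that the ratings X1,...,Xn are numeric. Both are assumptions to check against the actual files; `read_ratings` and the sample rows are hypothetical.

```python
import csv
import io


def read_ratings(handle):
    """Map each userid to its list of ratings, one 'userid,X1,...,Xn' row per line."""
    ratings = {}
    for row in csv.reader(handle):
        if not row:
            continue  # skip empty lines
        ratings[row[0]] = [float(x) for x in row[1:]]
    return ratings


# In-memory stand-in for a file such as pubs.csv.
sample = io.StringIO("u1,4,5,3\nu2,2,1,5\n")
ratings = read_ratings(sample)
```

The userid keys can then be matched against the ids that appear in links.csv.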
The data on this page has been donated by Lihi Dery.
00048
This dataset contains elections generated from charts on Spotify. For each day between the 1st of January 2017 and the 9th of January 2018, the Spotify data (kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking) contains a daily ranking of the 200 most listened-to songs in 53 different countries. In our elections, candidates model songs. For each month and each country, we created an election where each vote corresponds to the ranking of the songs on one day of the month in the country.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of a patch starts with an abbreviation of the relevant country followed by the relevant year and the month. Note that the names of candidates are the IDs of the respective songs on Spotify (e.g., candidate 10nqz67NQWWa7XPq7ycihi corresponds to "Welcome to New York" by Taylor Swift, open.spotify.com/track/10nqz67NQWWa7XPq7ycihi).
The Spotify daily charts dataset contains elections generated from the same data.
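The grouping of daily charts into one election per country and month can be sketched as follows. The input layout (tuples of country code, ISO date string, and ranked song IDs) and the name `monthly_elections` are assumptions made for illustration.

```python
from collections import defaultdict


def monthly_elections(daily_charts):
    """daily_charts: iterable of (country, 'YYYY-MM-DD', ranked_song_ids).

    Returns {(country, 'YYYY-MM'): [vote, ...]}, one vote per day.
    """
    elections = defaultdict(list)
    for country, day, ranking in daily_charts:
        month = day[:7]  # keep only the 'YYYY-MM' part of the date
        elections[(country, month)].append(list(ranking))
    return dict(elections)


charts = [
    ("ar", "2017-01-01", ["s1", "s2"]),
    ("ar", "2017-01-02", ["s2", "s1"]),
    ("ar", "2017-02-01", ["s1", "s3"]),
]
elections = monthly_elections(charts)
```

Grouping by the full date instead of the (country, month) pair would give one election per day with one vote per country.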
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 645 data files: Ar 2017-01 Ar 2017-02 Ar 2017-03 Ar 2017-04 Ar 2017-05 Ar 2017-06 Ar 2017-07 and 638 more.
00047
This dataset contains elections generated from charts on Spotify. For each day between the 1st of January 2017 and the 9th of January 2018, the Spotify data (kaggle.com/edumucelli/spotifys-worldwide-daily-song-ranking) contains a daily ranking of the 200 most listened-to songs in 53 different countries. In our elections, candidates model songs. For each day between the 1st of January 2017 and the 9th of January 2018, we created an election where each vote corresponds to the ranking of the songs on this day in one of the 53 countries.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. Note that the names of candidates are the IDs of the respective songs on Spotify (e.g., candidate 10nqz67NQWWa7XPq7ycihi corresponds to "Welcome to New York" by Taylor Swift).
The Spotify country charts dataset contains elections generated from the same data.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 362 data files: 2017-01-01 2017-01-02 2017-01-03 2017-01-04 2017-01-05 2017-01-06 2017-01-07 and 355 more.
00014
This dataset contains the results of a series of surveys conducted by Toshihiro Kamishima asking 5000 individuals for their preferences about various kinds of sushi. There are three different datasets that were elicited in different ways:
This dataset contains 14 files in total including soc, soi, toi, and toc files.
Note that the dataset was originally converted incorrectly; this was fixed in January 2016, so please re-download.
Due to licence issues we require that you go through Toshihiro Kamishima's website to obtain the datafiles and observe the following licence terms:
Consists of 3 data files: Sushi 10 Rank Sushi 100 Rank Sushi 100 Score
00012
This dataset contains complete rank orderings of T-Shirt designs voted on by members of the Optimization Research Group at NICTA. There are 11 designs (candidates) and 30 votes about these designs. Voters were required to submit complete strict orders.
This data has been kindly donated by Carleton Coffrin.
Consists of 1 data file: T-Shirt Design
00044
This dataset contains elections generated from table tennis world rankings. The underlying table tennis data (kaggle.com/romanzdk/ittf-table-tennis-player-rankings-and-information) contains the monthly ITTF ranking of the top 500-1500 male and female table tennis players between 2001 and 2020. For each year, we created an election where each player is a candidate and each vote corresponds to the ranking of the players in one month.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the relevant gender followed by the year.
Other sports world rankings have been used to create the boxing world rankings and tennis world rankings datasets.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 38 data files: Men 2001 Men 2002 Men 2003 Men 2004 Men 2005 Men 2006 Men 2007 and 31 more.
00023
The Takoma Park Data contains the results from the 2007 Takoma Park, MD special election for city council. The set contains one election with 4 candidates and about 400 voters.
Note that these elections were conducted under a ranked voting system which allowed blank entries. In processing this data for PrefLib we have ignored blanks and only report the order over the candidates.
The data on this page was donated by Jeffrey O'Neill who runs the site OpenSTV.org.
Consists of 1 data file: 2007 Takoma Park City Council Special Election - Ward 5
00045
This dataset contains elections generated from tennis world rankings. The underlying tennis data (kaggle.com/mimoopoo/atp-tennis-rankings-1990-to-2019) contains weekly rankings of the top 100 male tennis players published by the ATP between January 1990 and September 2019. For each year, we created an election where each player is a candidate and each vote corresponds to the ranking of the players in one week.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete.
Other sports world rankings have been used to create the table tennis world rankings and boxing world rankings datasets.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 29 data files: 1990 1991 1992 1993 1994 1995 1996 and 22 more.
00040
This dataset contains 675,069 reviews of 1,851 hotels across the world scraped from Trip Advisor. The data was scraped and donated by Hongning Wang.
One file contains the numerical aspect ratings provided by the users, along with other information about the hotel. The other files contain the text of the users' reviews (split into 3 files). These reviews have been slightly modified: all excess spaces and tabs have been removed and all commas have been changed to semicolons.
The files are encoded in the dat format but are actually CSV files. The first line of each file explains the fields within the file. Some of the usernames are encoded in Unicode so please be careful when parsing the files!
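A cautious way to read these files is to treat them as CSV with a header line and to open them with an explicit text encoding. The sketch below assumes UTF-8, which should be verified against the actual files. The field names in the sample are made up for illustration, and the real ones come from the first line of each file.

```python
import csv
import io


def read_rows(handle):
    """Read a comma-separated file whose first line names the fields."""
    return list(csv.DictReader(handle))


# In-memory stand-in with hypothetical field names and a non-ASCII username.
sample = io.StringIO("username,hotel,overall\nJosé,Hotel A,4\n")
rows = read_rows(sample)

# For a real file, something like the following (the path is a placeholder):
# with open("ratings.dat", encoding="utf-8", newline="") as handle:
#     rows = read_rows(handle)
```

Since commas inside the review texts were replaced by semicolons, a plain comma delimiter should not split a review across fields.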
Consists of 4 data files: Ratings Review Texts Review Texts Review Texts
00030
The 2010 UK Labor Party Leadership Vote is posted at www.rangevoting.org. This set contains the votes cast by all 266 MPs over the 5 leadership candidates. The votes are incomplete strict orders which we have posted along with extensions placing all unranked candidates tied at the end and pairwise graphs.
Consists of 1 data file: 2010 UK Labor Leadership
00031
This dataset contains votes for 15 different races for various public offices held in Vermont in 2014. This data was collected and donated by Jeremy A. Hansen. There are 3 to 6 candidates and 532 to 1960 voters in these data files. Not all races were competitive so not every race is reported for every district.
Consists of 15 data files: East Montpelier Vermont Assistant Judge East Montpelier Vermont Senate Fayston Vermont Assistant Judge Fayston Vermont House Fayston Vermont Senate Manchester Vermont Assistant Judge Manchester Vermont House and 8 more.
00011
This dataset contains the results of comparing websearches across Bing, Google, Yahoo, and Ask. This data is provided by Robert Bredereck at TU Berlin. Robert provides tools to compute Kemeny rankings on this data at his website at TU Berlin.
The data files marked big contain around 2000 candidates each while the data files marked small contain between 100 and 200 results. The search queries are shown in the names of the individual data files below. For the WebImpact files the number of search results for a particular term was used to create a complete ranking over the search terms. These files measure the web impact of various world cities and countries. The results are not complete and not every candidate (website) is ranked by all the voters (search engines). We have extended this data into tournament graphs, weighted majority graphs, and created a toc dataset where all unranked candidates are tied at the end of the rankings.
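As an illustration of how a weighted majority graph can be derived from incomplete rankings, the sketch below counts pairwise wins and keeps an edge from a to b weighted by the net margin when a beats b more often than the reverse. It only compares two candidates within a ranking that contains both. This treatment of unranked candidates and the choice of net margins as weights are assumptions and may differ from the exact PrefLib convention.

```python
from collections import Counter
from itertools import combinations


def pairwise_counts(rankings):
    """Count, for each ordered pair (a, b), the rankings placing a before b."""
    wins = Counter()
    for ranking in rankings:
        # combinations preserves order, so a always precedes b in the ranking.
        for a, b in combinations(ranking, 2):
            wins[(a, b)] += 1
    return wins


def weighted_majority_edges(rankings):
    """Return {(a, b): margin} for every pair where a beats b on balance."""
    wins = pairwise_counts(rankings)
    edges = {}
    for (a, b), count in wins.items():
        margin = count - wins.get((b, a), 0)
        if margin > 0:
            edges[(a, b)] = margin
    return edges


edges = weighted_majority_edges([["x", "y", "z"], ["y", "x"], ["x", "z"]])
```

In the example x and y beat each other once, so no edge is drawn between them.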
Consists of 77 data files: webimpact_capitals5 webimpact_nations5 webimpact_richest5 websearch_big_Death+Valley websearch_big_Gulf+war websearch_big_HIV websearch_big_Lipari and 70 more.
00054
This dataset contains elections generated from weekly power rankings. Specifically, the underlying power ranking data (kaggle.com/masseyratings/rankings) contains weekly power rankings of college basketball teams (between 2001 and 2021), college baseball teams (between 2010 and 2021), and college American football teams (between 1997 and 2021) from different media outlets and ranking systems.
For each of the three sports (basketball, baseball, American football), for each week in one of the seasons, we created an election where each vote corresponds to the power ranking of the teams in this week according to one of the ranking systems.
Each patch contains the raw election and a post-processed version where some candidates and voters are deleted to make the election complete. The name of each patch starts with the sport, followed by the day on which the power rankings were published (there is one day from each covered week).
The combined power rankings and season power rankings datasets were generated from the same data.
This dataset is part of a larger study by Boehmer and Schaar (see this page for a more detailed description of the dataset, the post-processing, and pointers to similar datasets). If you have any questions, please contact: niclas.boehmer@tu-berlin.de.
Consists of 956 data files: Baseball 2010-05-10 Baseball 2010-05-17 Baseball 2010-05-24 Baseball 2010-05-31 Baseball 2010-06-29 Baseball 2011-02-28 Baseball 2011-03-07 and 949 more.