PrefLib is about sharing data that represent preferences. We have tried to unify the formatting of the data as much as possible. Below, we detail the different aspects of the hosted data.
Our database gathers several datasets. A dataset is a collection of related data files (different years of the same election, for instance). The data files represent preferences, expressed in different formats; see the different types of data we host.
The datasets are classified by tags, which a given dataset may or may not carry. We describe them below.
Each data file has a given data type which represents the type of preferences that are expressed in the data file. These types are also used as file extensions. We review them all in the following.
Files with the SOC extension contain preferences represented by strict and complete linear orders (transitive and asymmetric relations) over the alternatives. They are complete in the sense that every linear order contains the whole set of alternatives. They are strict in the sense that no two alternatives can be tied.
Files with the SOI extension contain preferences represented by strict and possibly incomplete linear orders (transitive and asymmetric relations) over the alternatives. They are possibly incomplete in the sense that some preferences might not contain the whole set of alternatives. They are strict in the sense that no two alternatives can be tied.
Files with the TOC extension contain preferences represented by transitive and complete relations over the alternatives. They are complete in the sense that every preference contains the whole set of alternatives. They need not be strict: several alternatives can be tied.
Files with the TOI extension contain preferences represented by transitive and possibly incomplete relations over the alternatives. They are possibly incomplete in the sense that some preferences might not contain the whole set of alternatives. They also need not be strict: several alternatives can be tied.
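The four ordinal types differ only along two axes: whether ties are allowed, and whether every alternative must appear. The sketch below illustrates this distinction in Python. It is not part of any PrefLib tool: the function name `classify_orders` and the representation of an order as a list of tuples (one tuple per group of tied alternatives) are assumptions made for the example. In practice, the type of a file is given by its extension.

```python
def classify_orders(orders, num_alternatives):
    """Return "soc", "soi", "toc" or "toi" for a collection of orders.

    Hypothetical helper: each order is a list of tuples, and each tuple
    holds alternatives that are tied with one another.
    """
    # Strict: no group of tied alternatives has more than one member.
    strict = all(len(group) == 1 for order in orders for group in order)
    # Complete: every order mentions all the alternatives.
    complete = all(
        sum(len(group) for group in order) == num_alternatives
        for order in orders
    )
    if strict:
        return "soc" if complete else "soi"
    return "toc" if complete else "toi"


# Four alternatives, numbered 1 to 4.
print(classify_orders([[(3,), (1,), (2,), (4,)]], 4))  # soc
print(classify_orders([[(3,), (1,)]], 4))              # soi
print(classify_orders([[(3,), (1, 2, 4)]], 4))         # toc
print(classify_orders([[(3,), (1, 2)]], 4))            # toi
```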
Files with a CAT extension describe categorical preferences. In this domain, voters are asked to place all the alternatives into predetermined categories, for instance the categories “Yes”, “Maybe”, and “No”. There exists an underlying ranking over the categories that determines the voters' preferences. Note that these preferences are closely related to the ordinal ones described above, except that they allow some categories to be empty.
Files with a WMD extension describe weighted matching data. These are weighted directed graphs, i.e., collections of edges between the alternatives, each associated with a weight.
Files with a CSV or a DAT extension are used when miscellaneous data are needed. They are generally paired with another file, providing more information than is expressible in the basic data formats.
All data files share a common file format, with a few adaptations for each specific type. Data files contain two parts: first, a list of metadata on lines starting with a “#”; second, the preferences themselves.
Below, you will find an example of the first 8 lines of the header of a file in the Irish dataset.
    # FILE NAME: 00001-00000001.soi
    # TITLE: 2002 Dublin North
    # DESCRIPTION:
    # DATA TYPE: soi
    # MODIFICATION TYPE: original
    # RELATES TO:
    # RELATED FILES: 00001-00000001.toc
    # PUBLICATION DATE: 2013-08-17
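As a rough illustration of this two-part layout, the following Python sketch separates the metadata lines from the preference lines. The function name `split_preflib_text` is hypothetical and not part of any PrefLib tool, and the sketch assumes that every metadata line has the form “# KEY: value”.

```python
def split_preflib_text(text):
    """Split the content of a data file into (metadata, preference_lines).

    Assumption: each metadata line looks like "# KEY: value".
    """
    metadata = {}
    preference_lines = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("#"):
            # Split on the first colon only, since values may contain colons.
            key, _, value = line[1:].partition(":")
            metadata[key.strip()] = value.strip()
        else:
            preference_lines.append(line)
    return metadata, preference_lines


sample = """\
# FILE NAME: 00001-00000001.soi
# TITLE: 2002 Dublin North
# DESCRIPTION:
# DATA TYPE: soi
# MODIFICATION TYPE: original
# RELATES TO:
# RELATED FILES: 00001-00000001.toc
# PUBLICATION DATE: 2013-08-17
"""
metadata, preference_lines = split_preflib_text(sample)
print(metadata["DATA TYPE"])      # soi
print(metadata["RELATED FILES"])  # 00001-00000001.toc
```

Empty fields such as DESCRIPTION are kept with an empty string as value, so that a missing key can be told apart from an empty one.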
Let us describe each of the metadata that are common to all files.
The preference part of the file is specific to each data type; we describe it in the following.
The data types SOC, SOI, TOC, and TOI all represent ordinal preferences.
On top of the metadata described above, additional ones are specific to ordinal preferences.
The orders are described in the following way. Each line first indicates the number of voters who submitted the given preference list and then, after a colon, the preference list itself. Inside a preference list, strict ordering is indicated by commas, and indifference classes are grouped within curly braces. We provide some examples below for a better understanding.
To conclude, here is an example of the first 27 lines of a data file of complete orders with ties (TOC), taken from the Debian election dataset.
    # FILE NAME: 00002-00000001.toc
    # TITLE: Debian 2002 Leader
    # DESCRIPTION: Obtained from the soi by adding the unranked alternatives at the bottom
    # DATA TYPE: toc
    # MODIFICATION TYPE: imbued
    # RELATES TO: 00002-00000001.soi
    # RELATED FILES:
    # PUBLICATION DATE: 2013-08-17
    # MODIFICATION DATE: 2022-09-16
    # NUMBER ALTERNATIVES: 4
    # NUMBER VOTERS: 475
    # NUMBER UNIQUE ORDERS: 31
    # ALTERNATIVE NAME 1: Branden Robinson
    # ALTERNATIVE NAME 2: Raphael Hertzog
    # ALTERNATIVE NAME 3: Bdale Garbee
    # ALTERNATIVE NAME 4: None Of The Above
    100: 3,1,2,4
    79: 1,3,2,4
    54: 3,2,1,4
    43: 2,3,1,4
    34: 3,2,4,1
    30: 1,2,3,4
    29: 2,1,3,4
    16: 1,3,4,2
    14: 2,3,4,1
    12: 3,1,4,2
    9: 3,{1,2,4}
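To make the line format concrete, here is a minimal Python sketch that reads one preference line of an ordinal file. The function name `parse_order_line` and the output representation (a list of tuples, one per indifference class) are assumptions made for this example, not an official PrefLib API. The sketch assumes well-formed input with balanced, non-nested braces.

```python
def parse_order_line(line):
    """Parse "count: order" into (count, list of indifference classes).

    Each indifference class is a tuple of alternative numbers; a tuple with
    a single element means the alternative is not tied with any other.
    """
    count_part, _, order_part = line.partition(":")
    classes = []
    current = None  # the tie class being read, or None outside braces
    for token in order_part.split(","):
        token = token.strip()
        if not token:
            continue
        opens = token.startswith("{")
        closes = token.endswith("}")
        number = int(token.strip("{}"))
        if opens:
            current = []
        if current is not None:
            current.append(number)
        else:
            classes.append((number,))
        if closes:
            classes.append(tuple(current))
            current = None
    return int(count_part), classes


print(parse_order_line("100: 3,1,2,4"))  # (100, [(3,), (1,), (2,), (4,)])
print(parse_order_line("9: 3,{1,2,4}"))  # (9, [(3,), (1, 2, 4)])
```

In the second call, 9 voters ranked alternative 3 first and left alternatives 1, 2, and 4 tied below it.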
Categorical preferences are very similar to ordinal ones. Some metadata are also specific to them.
The preferences are described in the following way. Each line first indicates the number of voters who submitted the given preference list and then, after a colon, the preference list itself. Inside a preference list, each category is enclosed in curly braces, except for categories containing a single alternative; the empty category is written “{}”. We provide some examples below for a better understanding.
Let's conclude with an example from the French approval dataset.
    # FILE NAME: 00026-00000001.cat
    # TITLE: GylesNonains
    # DESCRIPTION:
    # DATA TYPE: cat
    # MODIFICATION TYPE: original
    # RELATES TO:
    # RELATED FILES: 00026-00000001.toc
    # PUBLICATION DATE: 2017-04-13
    # MODIFICATION DATE: 2022-09-16
    # NUMBER ALTERNATIVES: 16
    # NUMBER VOTERS: 365
    # NUMBER UNIQUE PREFERENCES: 216
    # NUMBER CATEGORIES: 2
    # CATEGORY NAME 1: Yes
    # CATEGORY NAME 2: No
    # ALTERNATIVE NAME 1: Megret
    # ALTERNATIVE NAME 2: Lepage
    # ALTERNATIVE NAME 3: Gluckstein
    ...
    13: 6,{1,2,3,4,5,7,8,9,10,11,12,13,14,15,16}
    13: {},{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}
    10: {9,10},{1,2,3,4,5,6,7,8,11,12,13,14,15,16}
    10: {1,6},{2,3,4,5,7,8,9,10,11,12,13,14,15,16}
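Categorical lines differ from ordinal ones mainly in that a category may be empty. A possible way to read them in Python is sketched below, using a regular expression that matches either a braced group or a bare alternative number. The function name `parse_categorical_line` is hypothetical, and the sketch assumes that the categories appear on each line in the same order as the CATEGORY NAME entries of the header.

```python
import re

# Matches either a braced group such as "{9,10}" or "{}", or a bare number.
CATEGORY_PATTERN = re.compile(r"\{([^{}]*)\}|(\d+)")


def parse_categorical_line(line):
    """Parse "count: categories" into (count, list of tuples).

    Each tuple lists the alternatives placed in one category; an empty
    tuple stands for an empty category.
    """
    count_part, _, pref_part = line.partition(":")
    categories = []
    for braced, single in CATEGORY_PATTERN.findall(pref_part):
        if single:
            # A category with a single alternative is written without braces.
            categories.append((int(single),))
        else:
            members = [m for m in braced.split(",") if m.strip()]
            categories.append(tuple(int(m) for m in members))
    return int(count_part), categories


count, categories = parse_categorical_line(
    "13: {},{1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16}"
)
print(count)               # 13
print(categories[0])       # ()
print(len(categories[1]))  # 16
```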
Let us start, as usual, with the metadata that are specific to the matching data.
The matching graph itself is described as a list of lines of the form Source, Destination, Weight. Below is an example from the kidney matching dataset.
    # FILE NAME: 00036-00000001.wmd
    # TITLE: Kidney Matching - 16 with 0
    # DESCRIPTION:
    # DATA TYPE: wmd
    # MODIFICATION TYPE: synthetic
    # RELATES TO:
    # RELATED FILES: 00036-00000001.dat
    # PUBLICATION DATE: 2017-04-13
    # MODIFICATION DATE: 2022-09-16
    # NUMBER ALTERNATIVES: 16
    # NUMBER VOTERS: 365
    # NUMBER EDGES: 59
    # ALTERNATIVE NAME 1: Pair 1
    # ALTERNATIVE NAME 2: Pair 2
    # ALTERNATIVE NAME 3: Pair 3
    ...
    1,5,1.0
    1,6,1.0
    2,1,1.0
    2,3,1.0
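The edge list can be loaded into an adjacency structure with a few lines of Python. The sketch below is one possible approach, not an official PrefLib API: the function name `parse_wmd_edges` is hypothetical, and the nested-dictionary representation (source, then destination, then weight) is a choice made for the example.

```python
from collections import defaultdict


def parse_wmd_edges(lines):
    """Build {source: {destination: weight}} from "Source,Destination,Weight" lines.

    Metadata lines (starting with "#") and blank lines are ignored.
    """
    graph = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, destination, weight = line.split(",")
        graph[int(source)][int(destination)] = float(weight)
    return dict(graph)


edges = ["1,5,1.0", "1,6,1.0", "2,1,1.0", "2,3,1.0"]
graph = parse_wmd_edges(edges)
print(graph[1])  # {5: 1.0, 6: 1.0}
print(graph[2])  # {1: 1.0, 3: 1.0}
```

Because the graph is directed, the edge from 2 to 1 does not imply an edge from 1 to 2.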
When miscellaneous data are needed, we use the file extension DAT, which has no specified format.
We have annotated most of our data files to allow a finer-grained analysis of the data we host. This enables, for instance, a more useful search tool. For each data file, its metadata are presented on its corresponding page.
In the following, we present all the metadata we use. Note that they may not always be available, as some of them require sophisticated computations and/or do not apply to all types of data.
Each data file is labeled as either Original, Induced, Imbued or Synthetic.
We encourage you to understand some of the impacts that making these assumptions can have; see, e.g., A Behavioral Perspective on Social Choice. Anna Popova, Michel Regenwetter, and Nicholas Mattei. Annals of Mathematics and Artificial Intelligence 68(1-3), 2013.