Python Package for Statistical Rank Models (Ba/Ma)
Topic for a bachelor's or master's thesis
In a growing number of machine learning applications involving human activities, such as opinion polls, sports competitions, or recommender systems, the available data is partly or entirely qualitative rather than quantitative: it comes as total, partial, or even incomplete rankings of the entities involved. Data of this type is typically referred to as preference data and has given rise to the growing subfield of preference learning. Unlike in classical machine learning, where the data is real-valued and methods can exploit the convenient properties of Euclidean space, preference data is mathematically harder to handle, because the space of rankings (permutations) lacks a vector space structure. Nevertheless, preference-based learning has evolved rapidly over the past decades and has produced numerous problem frameworks, ranging from variants of batch learning to online and reinforcement learning. One key modeling approach that has indisputably led to significant progress in preference-based learning is the use of a statistical rank model, which provides a sound and convenient probabilistic model of the underlying preference data and, in turn, gives rise to learning algorithms that are both well-founded and powerful in practice. Valuable insights into the empirical performance of preference-based learning algorithms can be gained from experiments with synthetic preference data generated by statistical rank models. For this purpose, researchers benefit from software packages in their preferred programming language that provide convenient and versatile functions for handling preference data.
Although software packages for statistical rank models exist for some programming languages that are prevalent in machine learning research, such as R, there is (somewhat surprisingly) no such package available for Python, despite its increasing use in machine learning research.
The goal of the thesis is to develop a Python package providing efficient and well-documented implementations of statistical rank models as well as suitable functions for importing and handling preference data.
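To make the intended functionality more concrete, the following is a minimal sketch of how synthetic rankings could be generated from one well-known statistical rank model, the Plackett-Luce model, in which the next-best item is repeatedly chosen with probability proportional to its weight. It is an illustration only, not a specification of the package: the function names `sample_plackett_luce` and `plackett_luce_probability`, the weight vector, and the sample size are all assumptions made for this example, and only the Python standard library is used.

```python
import random
from collections import Counter


def sample_plackett_luce(weights, rng):
    """Draw one total ranking (best item first) from a Plackett-Luce model.

    weights[i] > 0 is the (hypothetical) skill parameter of item i.
    """
    remaining = list(range(len(weights)))
    ranking = []
    while remaining:
        # Choose the next-best item with probability proportional to its weight.
        chosen = rng.choices(remaining, weights=[weights[i] for i in remaining])[0]
        ranking.append(chosen)
        remaining.remove(chosen)
    return ranking


def plackett_luce_probability(ranking, weights):
    """Probability of a given total ranking under the Plackett-Luce model."""
    prob = 1.0
    remaining_mass = sum(weights[i] for i in ranking)
    for item in ranking:
        prob *= weights[item] / remaining_mass
        remaining_mass -= weights[item]
    return prob


# Illustrative parameters (assumptions, not taken from any data set).
rng = random.Random(0)
weights = [5.0, 3.0, 1.0, 1.0]

# Generate synthetic preference data: 10,000 total rankings of 4 items.
sample = [tuple(sample_plackett_luce(weights, rng)) for _ in range(10_000)]

# Empirical frequency with which each item is ranked first.
top_counts = Counter(r[0] for r in sample)
```

Under this model, item 0 should be ranked first in roughly half of the sampled rankings, since its weight is half of the total weight. A full package would additionally need to cover other rank models, partial and incomplete rankings, and parameter estimation.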
Knowledge of basic probability theory, programming skills
-  M. Alvo, P. Yu. Statistical methods for ranking data. Springer, 2014.
-  J. Fürnkranz, E. Hüllermeier. Preference learning. Springer, 2010.
-  R Core Team. R: A Language and Environment for Statistical Computing. https://www.R-project.org/
-  G. Van Rossum and F.L. Drake. Python 3 Reference Manual, 2009.