-
Gábor Bartók, Dávid Pál, Csaba Szepesvári, István Szita
Online Learning Lecture Notes
Summary:
In spring 2011, Csaba taught an online learning course with occasional help
from Gábor and István and me. As we were teaching, we were continuously
writing lecture notes. The focus of the course was no-regret non-stochastic
learning (learning with expert advice, switching experts, follow the
regularized leader algorithm, online linear regression), and brief explanation of
online-to-batch conversion and multi-armed bandits. Our goal was to develop
the theory from first principles.
Bib info: lecture notes
Download:
[PDF]
-
Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári
Online-to-Confidence-Set Conversions and Application to Sparse Stochastic Bandits
Summary:
We show how to convert any online algorithm for linear prediction under quadratic loss
(e.g. online least squares, online gradient, p-norm algorithm) can be converted
into a confidence set for the weight vector of the linear model under
any sub-Gaussian noise. The lower the regret of the algorithm, the smaller the confidence
set. As an application, we use these confidence sets to construct online algorithms
for the linear stochastic bandit problem. Using an online linear prediction algorithm for
sparse models, we are able to construct an algorithm
for the sparse linear stochastic bandit problem i.e. problem where the underlying
linear model is sparse.
Bib info: AISTATS 2012
Download:
[PDF],
[Slides],
[Overview Talk]
-
Yasin Abbasi-Yadkori, Dávid Pál, Csaba Szepesvári
Improved Algorithms for Linear Stochastic Bandits
Summary:
We provide an upper bound for the tail probability of the norm of a
vector-valued martingale. This bound immediately gives rise to a confidence set
(ellipsoid) for online regularized linear least-squares estimator. We apply
these confidence sets to the stochastic multi-armed bandit problem and the
stochastic linear bandit problem. We show that the regret of the modified UCB
algorithm is constant for any fixed confidence parameter! Furthermore, we
improve the previously known bounds for the stochastic linear bandit problem by
a poly-logarithmic factor.
Bib info: NIPS 2011
Download:
[PDF],
[arXiv]
-
Gábor Bartók, Dávid Pál, Csaba Szepesvári
Minimax Regret of Finite Partial-Monitoring Games in Stochastic Environments
Summary:
We continue with our ALT 2010 paper by classification of minimax regret of all
partial games with finite number of actions and outcomes. However, we restrict
ourselves to stochastic environments, i.e. the environment choose outcomes
i.i.d. from a fixed but unknown probability distribution. We show that regret
of any game falls into one of the four categories: zero,
T1/2, T2/3 or T. We provide
algorithms that achieve that regret within logarithmic factor. We believe that
this result can be lifted to adversarial setting also; however that's left as a
future work.
Bib info: Proceedings COLT 2011
Download:
[PDF]
-
Gábor Bartók, Dávid Pál, Csaba Szepesvári
Toward a Classification of Partial Monitoring Games
Summary:
Partial monitoring games are online learning problems where a decision maker repeatedly makes
a decision, receives a feedback and suffers a loss. Multi-armed bandits and dynamic pricing are special cases.
We give characterization of min-max regret of all finite partial monitoring games with two outcomes.
We show that regret of any such game falls into one of the four categories:
zero, T1/2, T2/3 or T. Our characterization is nice and simple
combinatorial/geometrical condition on the loss matrix.
Bib info: Proceedings of ALT 2010, Gábor received for this paper the best student paper award.
Download:
[PDF],
[arXiv]
-
Dávid Pál, Barnabás Póczos, Csaba Szepesvári
Estimation of Rényi Entropy and Mutual Information Based on Generalized Nearest-Neighbor Graphs
Summary:
We consider simple and computationally efficient
nonparametric estimators of Rényi entropy and mutual information based on
an i.i.d. sample drawn from an unknown, absolutely continuous distribution over
Rd. The estimators are calculated as the sum
of p-th powers of the Euclidean lengths of the edges of the `generalized
nearest-neighbor' graph of the sample and the empirical copula of the sample
respectively. Under mild conditions we prove the almost sure
consistency of the estimators. In addition, we derive high probability error
bounds assuming that the density underlying the sample is Lipschitz continuous.
Bib info: Proceedings of NIPS 2010
Download:
conference version: [PDF],
extended version with proofs: [PDF], [arXiv],
poster: [PDF]
-
Shai Ben-David, Dávid Pál, Shai Shalev-Shwartz
Agnostic Online Learning
Summary: We generalize the Littlestone's online learning model to the
agnostic setting where no hypothesis makes zero error. Littlestone defined a
combinatorial parameter of the hypothesis class, which we call Littlestone's
dimension and which determines the worst-case number of prediction mistakes
made by an online learning algorithm in the realizable setting. Point of our paper is that
Littlestone's dimension characterizes learnability in the agnostic case as
well. Namely, we give upper and lower bounds on the regret in terms of the Littlestone's
dimension.
This is a similar story to what happened to the Valiant's PAC model. The key
quantity there is Vapnik-Chervonekis dimension. Valiant's PAC model is the
``realizable case''. Our work can be paralleled to what Haussler and others
did in 1992 when they generalized the PAC model to the agnostic setting and
showed that Vapnik-Chervonekis dimension remains the key quantity
characterizing learnability there as well.
Bib info: COLT 2009
Download:
[PDF],
[slides],
[slides sources]
-
PhD thesis.
Contributions to Unsupervised and Semi-Supervised Learning
This thesis is essentially a compilation of the two older papers on clustering stability and
an part of the paper on semi-supervised learning. The formal gaps, especially in the first paper on clustering stability,
were filled in and presentation was improved.
Download:
[PDF],
[sources]
-
Tyler Lu, Dávid Pál, Martin Pál
Showing Relevant Ads via Lipschitz Context Multi-Armed Bandits
Summary:
We study context multi-armed bandit problems where the context comes
from a metric space and the payoff satisfies a Lipschitz condition with respect
to the metric. Abstractly, a context multi-armed bandit problem models a
situation in which, in a sequence of independent trials, an online algorithm
chooses an action based on a given context (side information) from a set
of possible actions so as to maximize the total payoff of the chosen actions.
The payoff depends on both the action chosen and the context. In contrast,
context-free multi-armed bandit problems, studied previously, model
situations where no side information is available and the payoff depends only
on the action chosen.
For concreteness we focus in this paper on an application to Internet search
advertisement where the task is to display ads to a user of a Internet search
engine based on his search query so as to maximize the click-trough rate of the
ads displayed. We cast this problem as a context multi-armed bandit problem
where queries and ads form metric spaces and the payoff function is Lipschitz
with respect to both the metrics. We present an algorithm, give upper bound on its regret
and show an (almost) matching lower bound on the regret of any algorithm.
Bib info: Proceedings of AISTATS 2010
Download:
[PDF],
[slides],
[slides sources]
-
Shai Ben-David, Tyler Lu, Dávid Pál, Miroslava Sotáková
Learning Low-Density Separators
Summary:
We define a novel, basic, unsupervised learning problem -
learning the lowest density homogeneous hyperplane separator of an
unknown probability distribution. This task is relevant to several problems
in machine learning, such as semi-supervised learning and clustering
stability. We investigate the question of existence of a universally
consistent algorithm for this problem. We propose two natural learning
paradigms and prove that, on input unlabeled random samples generated
by any member of a rich family of distributions, they are guaranteed to
converge to the optimal separator for that distribution. We complement
this result by showing that no learning algorithm for our task can achieve
uniform learning rates (that are independent of the data generating distribution).
Bib info: Proceedings of AISTATS 2009
Download:
[PDF]
-
Shai Ben-David, Tyler Lu, Dávid Pál
Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning
Summary:
We mathematically study the potential benefits of the use of unlabeled data to
classification prediction to the learner. We propose a simple model of
semi-supervised learning (SSL) in which the unlabeled data distribution is
perfectly known to the learner. We compare this model with the standard PAC
model (and its agnostic vesrion) for supervised learning, in which the unlabeled distribution is uknown
to the learner. Can it be that there exists a supervised algorithm which
for any unlabeled distribution needs in the worst case (over possible targets)
at most by a constant factor more labeled samples than any SSL learner? It
seems that the ERM algorithm is such a candidate. We prove, for some special
cases, that this indeed the case.
Bib info: Proceedings of COLT 2008
Note: See also Tyler's Master's thesis.
Download:
[PDF] [sources]
-
Gagan Aggarwal, S. Muthukrishnan, Dávid Pál, Martin Pál
General Auction Mechanism for Search Advertising
Summary:
We propose a new auction mechanims for search advertising that generalizes both Generelized Second Price
auction (which is currently, in 2008, used by Google, Yahoo! and MSN) and the famous Vickrey-Clarke-Groves mechanism
adapted to the search auctions. Our mechanism allows each bidder to specify a subset of slots in which (s)he is interested,
and the value and the maximum price of each slot. For the auctioneer (the search engine) it allows to specify for every slot-bidder
pair a reserve (minimum) price. Our auction computes the bidder-optimal stable matching. The running time is O(nk^3), where
n is the number of bidders and k is the number of slots. We also prove that our auction mechanism
is truth-revealing, that is, dominant strategy of each bidder is not to lie.
Bib info: Proceedings of WWW 2009
Download:
[PDF]
-
Shai Ben-David, Dávid Pál, Hans Ulrich Simon
Stability of K-means Clusterings
Summary:
This is follow-up of the previous paper. We consider the clustering algorithm A which minimizes the k-means cost precisely.
(Such algorithm is not realistic, since the optimization problem is known to be NP-hard.
Nevertheless, the classical Lloyd's hueristic a.k.a. "the k-mean algorithm" tries to do this.)
We analyze how stable is the clustering outputted by A with respect to a random draw of the sample points.
Given a probability distribution, draw two i.i.d. samples of the same size. If with high
probability, the clustering of the first sample does not differ from the
clustering of the second sample too much, we say that A is stable on
the probability distribution. For discrete probability distributions we show that
instability happens if and only if the probability distribution has two or more clusterings
with optimal k-means cost.
Bib info: Proceedings of COLT 2007
Download:
[PDF] (extended version), [PDF] (short conference version)
-
Shai Ben-David, Dávid Pál, Ulrike von Luxburg
A Sober look at Clustering Stability
Summary:
We investigate how stable is the clustering outputted by a clustering algorithm
with respect to a random draw of the sample points. Given a probability
distribution, draw two i.i.d. samples of the same size. If with high
probability, the clustering of the first sample does not differ from the
clustering of the second sample too much, we say that algorithm is stable on
the probability distribution. We show that for that for algorithms that optimize some cost function stability depends
on the uniqueness of the optimal solution of the optimization problem for the "infinite" sample represented by the
probability distribution.
Bib info: Proceedings of COLT 2006, Best Student Paper Award
Download:
[PDF],
[Slides in PDF],
[Slides source files]
-
Dávid Pál, Martin Škoviera
Colouring Cubic Graphs by Small Steiner Triple Systems
Summary: Steiner triple system is a collection of 3-element subsets (called triples) of
a finite set of points, such that every two points appear together in exactly one triple.
Given Steiner triple system S, one can ask whether it is possible to color the edges of a cubic graph with points of S,
in such way that colors of the three edges at a vertex form a triple of S.
We construct a small Steiner triple system (of order 21) which colors all simple cubic graphs.
Note: This is the main result of my "Magister" Thesis at Comenius University under my advisor Martin Škoviera, 2004.
Bib info: Graphs and Combinatorics, 2007
Download:
[PDF],
[Postscript],
[Source files]
|
co-authors:
Yasin Abbasi-Yadkori
Gagan Aggarwal
Gábor Bartók
Shai Ben-David
Tyler Lu
Ulrike von Luxburg
S. Muthukrisnan
Martin Pál
Barnabás Póczos
Hans Ulrich Simon
Miroslava Sotáková
Martin Škoviera
Shai Shalev-Shwartz
Csaba Szepesvári
|