


Conference "Multimodality and Related Topics"



Thursday 27 and Friday 28 March 2008

Université Paris X-Nanterre


Organizer: Karine Tribouley




Thursday 27 March 2008, 13:00-17:00.
  1. 13:00 Günther Sawitzki, Ruprecht-Karls-Universität Heidelberg:
    Diagnostics for multi-modality. Abstract.
  2. 14:00 Ghislaine Gayraud, Université de Rouen:
    Bayesian Level set estimation. Abstract.
  3. 15:00 Gérard Biau, Université Paris 6:
    Set and set properties estimation. Abstract.
  4. 16:00 Jussi Klemela, University of Oulu:
    Level set trees and the analysis of the shape of multivariate objects. Abstract.

Friday 28 March 2008, 9:00-12:00.
  1. 9:00 Wolfgang Polonik, University of California at Davis:
    Excess mass and related statistical methods. Abstract.
  2. 10:00 Cristina Butucea, Université de Lille 1:
    Functional approach for the excess mass estimation in the density model. Abstract.
  3. 11:00 Mathilde Mougeot, Université Paris X-Nanterre:
    Applications using the excess mass estimator.

Friday 28 March 2008, 14:00-18:00.
  1. 14:00 Philippe Berthet, Université Rennes 1:
    An empirical and Brownian processes look at mode estimation using the shorth and excess mass approaches. Abstract.
  2. 15:00 Elisabeth Gassiat, Université de Paris-Sud:
    The likelihood ratio test for general mixture models with possibly unknown structural parameter. Abstract.
  3. 16:00 Sophie Dabo-Niang, Université de Lille 3:
    Mode estimation for functional random variable: application for curves classification. Abstract.
  4. 17:00 Rui Castro, University of Wisconsin, Madison:
    Active Learning and the Importance of Feedback in Sampling. Abstract.



**************************************************************************

Abstracts of the talks


Philippe Berthet : In this talk we will start from the general question of excess mass estimation of level sets of a density. Assuming that these sets belong to a Donsker class, this is a typical empirical argmin problem. We will explain our current direction of investigation in M-estimation, based on what we call the Brownian paradigm. Our basic tool is a local strong Gaussian approximation of the empirical process, which reduces the problem to the study of a uniquely determined random process, namely drifted increments of a Brownian bridge. This gives access to known rates of convergence of the maximum excess mass sets. Moreover, exploiting Gaussianity opens the door to sharper results such as stability of almost maximizers, lower bounds for rates of convergence and excess risk deviations, polynomial rates of decay for probabilities of key events, and also to intrinsic hypotheses based only on the basic characteristics of the Gaussian process - drift, variance and correlations. Next we will connect excess mass estimation to shorth estimation - or minimal volume sets - and show some simulations. This will enable us to focus in greater detail on the real line situation, where the previously discussed general tools are available and optimal, such as the strong approximation and the crucial large deviation estimates that we obtain for the argmin of drifted Brownian increments. We also need our sharpest results on localized increments of the empirical quantile process. This leads to a powerful exact limit theorem describing the stochastic oscillation behaviour of shorth intervals with mass $q_n$ tending to 0. We then derive several exact weak and strong limit theorems for estimators of the mode in regular (cube root asymptotics), irregular (faster rates) or misspecified (slower rates) situations, by using a bias and variance comparison argument to localize the mode in the $q_n$ shorth.
This provides deep insight into mode estimation, together with a systematic way of choosing $q_n$, either depending on which parameters are known in semi-parametric models for the density or in a data-driven way. Similar results hold for the excess mass estimators, which are shown to oscillate at very similar rates but take longer to compute in practice. We will finally come back to the general setting for our last comments, in view of our precise results for intervals.
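
As a rough illustration of the shorth on the real line, here is a minimal Python sketch, assuming the usual definition of the shorth as the shortest interval containing at least a fraction q of the sample, with the interval midpoint used as a mode estimate. The function names `shorth` and `shorth_mode` are hypothetical and are not taken from the talk.

```python
import math


def shorth(sample, q):
    """Return (left, right), the shortest interval covering at least a
    fraction q of the sample points (assumed definition of the shorth)."""
    if not 0 < q <= 1:
        raise ValueError("q must lie in (0, 1]")
    xs = sorted(sample)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    # Number of order statistics the interval has to cover.
    k = max(2, math.ceil(q * n))
    # Slide a window of k consecutive order statistics and keep the narrowest.
    best = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best], xs[best + k - 1]


def shorth_mode(sample, q):
    """Midpoint of the shorth, used here as a simple mode estimate."""
    left, right = shorth(sample, q)
    return 0.5 * (left + right)
```

For example, `shorth([9.0, 0.0, 3.1, 1.0, 3.3, 3.0, 7.0, 3.2], 0.5)` selects the interval from 3.0 to 3.3, where the points concentrate. The choice of q is left to the caller; the sketch does not implement any of the selection rules discussed in the abstract.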

*********************************************************************************

Gérard Biau : We study the two problems of reconstructing a set $S$ and of estimating its number of connected components, from random points of $S$ drawn from some probability measure. First, we focus on what is certainly the simplest set estimator, defined as the union of balls centered at the random points. Second, we propose a related graph-based estimator of the number of connected components of $S$. Using tools from Riemannian geometry, and under mild analytic conditions on the underlying density of the data, we derive the exact rate of convergence of the set estimator and prove the consistency of the estimator of the number of connected components. Statistical applications include density support estimation and estimation of the number of clusters in data partitioning.
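
The union-of-balls idea can be sketched in a few lines of Python. The sketch assumes closed balls of a common radius r, so that two balls meet exactly when their centers are at distance at most 2r, and it counts the connected components of the resulting graph with a union-find structure. The function name `count_components` and the fixed radius are illustrative assumptions; the estimator studied in the talk may differ in its details.

```python
import math


def count_components(points, r):
    """Count the connected components of the union of closed balls of
    radius r centered at the given points (illustrative sketch)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Find the representative of i, with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Two closed balls of radius r intersect iff centers are within 2r.
            if math.dist(points[i], points[j]) <= 2 * r:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj
    return len({find(i) for i in range(n)})
```

With the points `[(0, 0), (0.5, 0), (1, 0), (5, 5), (5.4, 5)]` and `r = 0.3`, the sketch reports two components. The pairwise loop is quadratic in the number of points, which is acceptable for a small illustration only.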

***************************************************************************

Cristina Butucea : We consider a multivariate density model where we estimate the excess mass of the unknown probability density $f$ at a given level $\nu>0$ from $n$ i.i.d. observed random variables. This problem has several applications such as multimodality testing, density contour clustering, anomaly detection, classification and so on. For the first time in the literature we estimate the excess mass as an integrated functional of the unknown density $f$. We suggest an estimator and evaluate its rate of convergence, when $f$ belongs to general Besov smoothness classes, for several risk measures.
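
For reference, the excess mass of a density at level $\nu$ is usually written as below. This is the standard definition, given here as background rather than as a formula from the talk; the dimension $d$ and the notation $E(\nu)$ and $\mathrm{Leb}$ are assumptions of this note.

```latex
E(\nu) \;=\; \int_{\mathbb{R}^d} \bigl(f(x) - \nu\bigr)_+ \, dx
       \;=\; \sup_{C} \Bigl\{ \int_C f(x)\, dx \;-\; \nu \,\mathrm{Leb}(C) \Bigr\},
```

where the supremum runs over measurable sets $C$ and is attained at the level set $\{x : f(x) \geq \nu\}$. The first expression is the integrated functional of $f$ mentioned in the abstract.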

****************************************************************************

Rui Castro : In this talk I will present a discussion of active learning and the role of feedback in sampling. In many practical scenarios it is possible to use information gleaned from previous observations to focus the sampling process, in the spirit of the "twenty-questions" game. These techniques are generically known as active learning or adaptive sampling. Although appealing, such methodologies are difficult to analyze, since we can no longer rely on the statistical independence of the observations. This is especially the case in the presence of measurement uncertainty or noise. The main focus of this work is to provide a deep understanding of the potential benefits and limitations of active learning or adaptive sampling in the presence of measurement uncertainty. I present results characterizing the fundamental limits of active sampling in the function regression context, focusing on various nonparametric function classes, and also present algorithms capable of exploiting the adaptivity of active sampling, provably improving upon non-adaptive techniques and achieving nearly optimal performance.
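
The "twenty-questions" intuition can be illustrated on a toy problem. The Python sketch below assumes a noise-free step function on [0, 1] whose jump location is to be found from n label queries, and compares a fixed equispaced design with bisection. This is a deliberately simplified setting: the talk concerns the noisy case, which this sketch does not handle, and the function names are hypothetical.

```python
def passive_estimate(label, n):
    """Query n equispaced points chosen in advance and return the midpoint
    of the interval that brackets the jump (non-adaptive sampling)."""
    lo, hi = 0.0, 1.0
    for i in range(n):
        x = (i + 1) / (n + 1)
        if label(x):
            hi = min(hi, x)   # jump is at or before x
        else:
            lo = max(lo, x)   # jump is after x
    return 0.5 * (lo + hi)


def active_estimate(label, n):
    """Choose each query from the previous answers (bisection) and return
    the midpoint of the final bracketing interval (adaptive sampling)."""
    lo, hi = 0.0, 1.0
    for _ in range(n):
        mid = 0.5 * (lo + hi)
        if label(mid):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

With n noise-free queries, the fixed design localizes the jump to within 1/(2(n+1)), while bisection localizes it to within 2^-(n+1), which is the kind of gain feedback can provide in favourable cases.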


***************************************************************************

Sophie Dabo-Niang : Recent advances in nonparametric functional data analysis make it possible to define the notion of mode for a sample of curves; a kernel-type estimator is proposed for estimating this modal curve. In addition, other notions of centrality, namely the mean and median curves, extend easily to the functional case. We present a nonparametric unsupervised classification method, based mainly on comparing the modal curve with another centrality curve in order to measure a heterogeneity index. The main point is to show the good practical behavior of this hierarchical classification on a sample of altimetric curves recorded by the satellite Topex/Poseidon over the Amazonian basin. In addition, theoretical advances on the kernel mode estimator are provided.
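
A kernel-type modal curve can be sketched as follows, under several assumptions that are not stated in the abstract: curves are observed on a common grid, the distance between curves is a discretized L2 distance, the kernel is Gaussian with bandwidth h, and the search for the mode is restricted to the sample curves themselves. The names `l2_distance` and `modal_curve_index` are hypothetical.

```python
import math


def l2_distance(u, v):
    """Discretized L2 distance between two curves sampled on the same grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / len(u))


def modal_curve_index(curves, h):
    """Index of the sample curve with the largest kernel score, i.e. the
    curve lying in the most densely populated region of the sample."""
    def score(i):
        # Gaussian kernel applied to the distance from curve i to each curve.
        return sum(
            math.exp(-0.5 * (l2_distance(curves[i], c) / h) ** 2)
            for c in curves
        )
    return max(range(len(curves)), key=score)
```

On the four constant curves at heights 0, 0.1, 0.2 and 5 with `h = 0.5`, the sketch picks the curve at height 0.1, which sits in the middle of the dense group and is unaffected by the outlying curve.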

****************************************************************************

Elisabeth Gassiat : This talk deals with the likelihood ratio test (LRT) for testing hypotheses on the mixing measure in mixture models with a possibly unknown structural parameter. The main result gives the asymptotic distribution of the LRT statistic under conditions that are proved to be almost necessary. A detailed solution is given for two testing problems: the test of a single distribution against any mixture, with application to Gaussian, Poisson and binomial distributions; and the test of the number of populations in a finite mixture with a possibly unknown structural parameter.

**************************************************************************


Ghislaine Gayraud : From i.i.d. observations drawn from an unknown distribution, we study the nonparametric problem of Bayesian estimation of level sets. The unknown distribution is assumed to have a probability density with respect to the Lebesgue measure on $\mathbb{R}^d$, $d \geq 2$. Given both a prior $\Pi$ on a class of densities and the pseudo-distance defined as the Lebesgue measure of the symmetric difference between two sets, the nice feature of the Bayesian approach is that it leads to an explicit expression of the Bayes estimate and does not require a priori knowledge of a smoothness class to which the true density belongs. Under fairly general conditions on the prior $\Pi$, we provide an upper bound on the rate of convergence of the Bayesian level set estimate. This result is valid for a large class of nonparametric prior distributions. We then apply our general result to a particular choice of prior in order to compare our results to existing rates of convergence in the frequentist nonparametric literature: it turns out that the Bayesian level set estimate is competitive.


**************************************************************************

Jussi Klemela : A level set tree of a function is a tree structure of the separated components of the level sets of the function. The tree structure of local minima or maxima of a function can be described by a level set tree. Level set trees can be used to describe not only the shape of functions but also the shape of multidimensional sets; we can define a distance function or a height function on a set and construct a level set tree of this function. Level set trees can also be used to describe the shape of point clouds, by applying appropriate smoothing. With the help of a level set tree one can define shape isomorphic transforms. A shape isomorphic transform maps a multidimensional object to a low-dimensional object which has the same shape characteristics as the original multivariate object. This leads to a recursive analysis of the shape of a function: we start by analyzing the structure of local extremes of the function with level set trees and then continue to analyze the shape of the connected components of the level sets. A natural approach to mode testing consists of testing at each level whether the level set contains separated components. Level set trees provide a conceptual and computational framework for implementing such a testing procedure.
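
The level-by-level check for separated components can be illustrated in one dimension. The Python sketch below assumes a function evaluated on an ordered grid and counts the maximal runs of grid points where the function is at or above a given level. It is a simplified stand-in for a level set tree, which would also record how components at different levels nest; the function names are hypothetical.

```python
def count_level_set_components(values, level):
    """Count the connected components of {f >= level} for a function
    given by its values on an ordered one-dimensional grid."""
    count = 0
    inside = False
    for v in values:
        above = v >= level
        if above and not inside:
            count += 1      # a new run of grid points above the level starts
        inside = above
    return count


def has_separated_components(values, levels):
    """True if, at some level, the level set splits into two or more pieces."""
    return any(count_level_set_components(values, lam) >= 2 for lam in levels)
```

For the grid values `[0.0, 1.0, 3.0, 1.0, 0.5, 2.0, 4.0, 1.0, 0.0]`, the level set at 1.5 has two components while the level set at 0.4 has one, so the sketch flags the function as having separated components. It is purely descriptive and carries no statistical test.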

****************************************************************************

Wolfgang Polonik : The excess mass approach (Hartigan, 1987, Müller and Sawitzki, 1991) touches on various other statistical methodologies. Knowledge of these connections provides additional insight into these methodologies and different points of view on them. In this talk we will outline these connections, some of which are known and some of which are new. In doing so we will touch on various topics such as majorization, level set estimation, classification, split-point estimation, the Hough transform and vertical density representation.

************************************************************

Günther Sawitzki : Excess mass and its companion, the silhouette, have been developed as a means to test for multimodality. They are part of a more general program: to judge the quality of a model, find the proportion of data which can be covered by this model in excess of a simpler competitor ("null hypothesis"). In a first part, we discuss some of the ideas on which excess mass estimators are based.

Exploratory analysis needs different tools, going beyond excess mass and silhouette. This will be discussed in a second part of the talk.


Registration

Registration is free but mandatory.

Please send an email to Karine Tribouley.

Venue

Equipe MODAL'X
Université Paris X
Building B, room 015

200 avenue de la République
92000 NANTERRE

Getting there

Accommodation
