# Marco Zaffalon

(Drawn in pencil; see the author.)

Scientific Director
Istituto "Dalle Molle" di Studi sull'Intelligenza Artificiale (IDSIA)
Polo universitario Lugano
Via la Santa 1
CH-6962 Lugano, Switzerland
Tel.: +41 (0)58 666 666 5
Fax: +41 (0)58 666 666 1
E-mail: .

DBLP: 08/4633.
ORCID: 0000-0001-8908-1502.
Google Scholar: ECEcK4cAAAAJ.
Scopus: 6602149061.
Semantic Scholar: 1825163.

Last update: January 25, 2023.

Science. There are three stages in scientific discovery: first, people deny that it is true, then they deny that it is important; finally they credit the wrong person. (Alexander von Humboldt)

Art. This fellow has a problem ... he considers boredom an Art. (Charles Bukowski)

Ethics. An ethicist is someone who sees something wrong with whatever you have in mind. (Marvin Minsky)

Love. A love relation's stages: don't ever change; you should change; you have changed. (Anonymous)

Realism. Have you ever noticed that anybody driving slower than you is an idiot, and anyone going faster than you is a maniac? (George Carlin)

Religion. Thank God for making me an atheist. (Ricky Gervais)

Hanlon's razor. Don't ascribe to malice what can be plainly explained with incompetence. (Robert J. Hanlon)

My point of view on the state of Artificial Intelligence in Ticino (in Italian), as of July 8, 2020. A debate on similar issues with some nice guests (also in Italian), October 15, 2021.

## Biographical note

• M.Sc. degree cum laude in Computer Science in 1991 and Ph.D. degree in Applied Mathematics in 1997, both from Università degli Studi di Milano, Italy.
• At IDSIA since 1997. Founder and head of the Imprecise Probability Group.
• Co-chair of the conferences ISIPTA '03, ISIPTA '07, and chair of the 1st SIPTA School on Imprecise Probabilities.
• Formerly: founding Member, Secretary, and President of SIPTA.
• Senior Area Editor for imprecise probabilities at IJAR since 2018 (Area Editor during 2005–2017).
• SUPSI Professor since 2009.
• Lecturer: Causal Inference, 2018–2020 [book: Primer; slides: all (revised Jan 2020) or chapter by chapter (older version) here];
Uncertain Reasoning and Data Mining, 2008–2019; Probability and Statistics, 2005–2009.
• Program Committee Member for several meetings, including: AAAI, CLeaR, ECAI, ECML PKDD, IPMU, ISIPTA, IJCAI, NeurIPS, PGM, SMPS, UAI, WHY.
Senior Program Committee Member for IJCAI 2021–2023, UAI 2021–2023, AAAI 2022.
• Scientific Director of IDSIA since 2019.
• Co-founder and Chief Scientist of Artificialy since 2020.
• Top 2% scientist according to the science-wide author databases of standardised citation indicators (2020–2022).

## Papers

Writing. I am writing a longer letter than usual because there is not enough time to write a short one. (Blaise Pascal)

## Edited books

• De Cooman, G., Vejnarová, J., Zaffalon, M. (Eds) (2007), ISIPTA '07, Action M Agency, Prague, Czech Republic.
• Bernard, J.-M., Seidenfeld, T., Zaffalon, M. (Eds) (2003), ISIPTA '03, Proceedings in Informatics 18, Carleton Scientific, Canada.

## Edited journal issues

Correlated product of experts for sparse Gaussian process regression

Authors: Manuel Schürch, Dario Azzimonti, Alessio Benavoli and Marco Zaffalon.
Year: 2023.

Abstract: Gaussian processes (GPs) are an important tool in machine learning and statistics. However, off-the-shelf GP inference procedures are limited to datasets with several thousand data points because of their cubic computational complexity. For this reason, many sparse GP techniques have been developed over the past years. In this paper, we focus on GP regression tasks and propose a new approach based on aggregating predictions from several local and correlated experts. The degree of correlation between the experts can range from independent to fully correlated. The individual predictions of the experts are aggregated taking their correlation into account, resulting in consistent uncertainty estimates. Our method recovers the independent Product of Experts, sparse GP and full GP in the limiting cases. The presented framework can deal with a general kernel function and multiple variables, and has a time and space complexity which is linear in the number of experts and data samples, which makes our approach highly scalable. We demonstrate superior performance, in a time vs. accuracy sense, of our proposed method against state-of-the-art GP approximations for synthetic as well as several real-world datasets with deterministic and stochastic optimization.

Published in Machine Learning, doi:10.1007/s10994-022-06297-3.
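
As a reference point for the correlated construction above, the independent product-of-experts baseline that the method recovers as a limiting case fuses Gaussian expert predictions by precision weighting. A minimal sketch (illustrative function names, not the paper's code):

```python
import numpy as np

def poe_aggregate(means, variances):
    """Fuse independent Gaussian experts N(m_i, v_i) by precision weighting.

    The combined Gaussian has precision equal to the sum of the expert
    precisions, and a precision-weighted mean.
    """
    means = np.asarray(means, dtype=float)
    precisions = 1.0 / np.asarray(variances, dtype=float)
    var = 1.0 / precisions.sum()
    mean = var * (precisions * means).sum()
    return mean, var

# Two equally confident experts: the fused mean is their average, and the
# fused variance is tighter than either expert's.
m, v = poe_aggregate([1.0, 3.0], [1.0, 1.0])  # → (2.0, 0.5)
```

The correlated variant of the paper replaces this diagonal precision sum with an aggregation that accounts for the covariance between experts.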

Nonlinear desirability theory

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2023.

Abstract: Desirability can be understood as an extension of Anscombe and Aumann's Bayesian decision theory to sets of expected utilities. At the core of desirability lies an assumption of linearity of the scale in which rewards are measured. Though traditionally used to derive the expected utility model, this assumption clashes with a general representation of rational decision making, as Allais famously pointed out in 1953 with his paradox. We note that the utility scale plays the role of a closure operator when we regard desirability as a logical theory. This observation enables us to extend desirability to the nonlinear case by letting the utility scale be represented via a general closure operator. The new theory directly expresses rewards in actual nonlinear currency (money), much in Savage's spirit, while arguably weakening the founding assumptions to a minimum. We characterise the main properties of the new theory both from the perspective of sets of gambles and of their lower and upper prices (previsions). We show how Allais' paradox finds a solution in the new theory, and discuss the role of sets of probabilities in the theory.

Published in International Journal of Approximate Reasoning 154, 176–199.

Bounding counterfactuals under selection bias

Authors: Marco Zaffalon, Alessandro Antonucci, Rafael Cabañas, David Huber and Dario Azzimonti.
Year: 2022.

Abstract: Causal analysis may be affected by selection bias, which is defined as the systematic exclusion of data from a certain subpopulation. Previous work in this area focused on the derivation of identifiability conditions. We propose instead a first algorithm to address both identifiable and unidentifiable queries. We prove that, in spite of the missingness induced by the selection bias, the likelihood of the available data is unimodal. This enables us to use the causal expectation-maximisation scheme to obtain the values of causal queries in the identifiable case, and to compute bounds otherwise. Experiments demonstrate the approach to be practically viable. Theoretical convergence characterisations are provided.

Published in Salmerón, A., Rumí, R. (Eds), PGM 2022. PMLR 186, JMLR.org, 289–300.

Quantum indistinguishability through exchangeability

Authors: Alessio Benavoli, Alessandro Facchini and Marco Zaffalon.
Year: 2022.

Abstract: Two particles are identical if all their intrinsic properties, such as spin and charge, are the same, meaning that no quantum experiment can distinguish them. In addition to the well-known principles of quantum mechanics, understanding systems of identical particles requires a new postulate, the so-called symmetrization postulate. In this work, we show that the postulate corresponds to exchangeability assessments for sets of observables (gambles) in a quantum experiment, when quantum mechanics is seen as a normative and algorithmic theory guiding an agent to assess her subjective beliefs represented as (coherent) sets of gambles. Finally, we show how sets of exchangeable observables (gambles) may be updated after a measurement and discuss the issue of defining entanglement for indistinguishable particle systems.

Published in International Journal of Approximate Reasoning 151, 389–412.

Nonlinear desirability as a linear classification problem

Authors: Arianna Casanova, Alessio Benavoli and Marco Zaffalon.
Year: 2023.

Abstract: This paper presents an interpretation of standard desirability and other instances of nonlinear desirability (convex coherence and positive additive coherence) as classification problems. In particular, we analyze different sets of rationality axioms and, for each one of them, we show that proving that a subject respects these axioms on the basis of a finite set of *acceptable* and a finite set of rejectable gambles can be reformulated as a binary classification problem where the family of classifiers used changes with the axioms considered. Moreover, by borrowing ideas from machine learning, we show the possibility of defining a feature mapping, which allows us to reformulate the above nonlinear classification problems as linear ones in higher-dimensional spaces. This allows us to interpret gambles directly as payoff vectors of monetary lotteries, as well as to provide a practical tool to check the rationality of an agent.

Published in International Journal of Approximate Reasoning 152, 1–32.

Information algebras in the theory of imprecise probabilities, an extension

Authors: Arianna Casanova, Jürg Kohlas and Marco Zaffalon.
Year: 2022.

Abstract: In recent works, we have shown how to construct an information algebra of coherent sets of gambles, considering firstly a particular model to represent questions, called the multivariate model, and then generalizing it. Here we further extend the construction made to the highest level of generality, setting up an associated information algebra of coherent lower previsions, analyzing the connection of both the information algebras constructed with an instance of set algebras and, finally, establishing and inspecting a version of the marginal problem in this framework.

Published in International Journal of Approximate Reasoning 150, 311–336.

Information algebras in the theory of imprecise probabilities

Authors: Arianna Casanova, Jürg Kohlas and Marco Zaffalon.
Year: 2022.

Abstract: In this paper we create a bridge between desirability and information algebras: we show how coherent sets of gambles, as well as coherent lower previsions, induce such structures. This allows us to enforce the view of such imprecise-probability objects as algebraic and logical structures; moreover, it enforces the interpretation of probability as information, and gives tools to manipulate them as such.

Published in International Journal of Approximate Reasoning 142, 383–416.

Causal expectation-maximisation

Authors: Marco Zaffalon, Alessandro Antonucci and Rafael Cabañas.
Year: 2021.

Abstract: Structural causal models are the basic modelling unit in Pearl's causal theory; in principle they allow us to solve counterfactuals, which are at the top rung of the ladder of causation. But they often contain latent variables that limit their application to special settings. This appears to be a consequence of the fact, proven in this paper, that causal inference is NP-hard even in models characterised by polytree-shaped graphs. To deal with this hardness, we introduce the causal EM algorithm. Its primary aim is to reconstruct the uncertainty about the latent variables from data about categorical manifest variables. Counterfactual inference is then addressed via standard algorithms for Bayesian networks. The result is a general method to approximately compute counterfactuals, be they identifiable or not (in which case we deliver bounds). We show empirically, as well as by deriving credible intervals, that the approximation we provide becomes accurate in a fair number of EM runs. These results lead us finally to argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations.

Accepted at WHY-21 @ NeurIPS 2021.

The weirdness theorem and the origin of quantum paradoxes

Authors: Alessio Benavoli, Alessandro Facchini and Marco Zaffalon.
Year: 2021.

Abstract: We argue that there is a simple, unique, reason for all quantum paradoxes, and that such a reason is not uniquely related to quantum theory. It is rather a mathematical question that arises at the intersection of logic, probability, and computation. We give our 'weirdness theorem' that characterises the conditions under which the weirdness will show up. It shows that whenever logic has bounds due to the algorithmic nature of its tasks, then weirdness arises in the special form of negative probabilities or non-classical evaluation functionals. Weirdness is not logical inconsistency, however. It is only the expression of the clash between an unbounded and a bounded view of computation in logic. We discuss the implication of these results for quantum mechanics, arguing in particular that its interpretation should ultimately be computational rather than exclusively physical. We develop in addition a probabilistic theory in the real numbers that exhibits the phenomenon of entanglement, thus concretely showing that the latter is not specific to quantum mechanics.

Published in Foundations of Physics 51(5), 95.

Time series forecasting with Gaussian processes needs priors

Authors: Giorgio Corani, Alessio Benavoli and Marco Zaffalon.
Year: 2021.

Abstract: Automatic forecasting is the task of receiving a time series and returning a forecast for the next time steps without any human intervention. Gaussian Processes (GPs) are a powerful tool for modeling time series, but so far there are no competitive approaches for automatic forecasting based on GPs. We propose practical solutions to two problems: automatic selection of the optimal kernel and reliable estimation of the hyperparameters. We propose a fixed composition of kernels, which contains the components needed to model most time series: a linear trend, periodic patterns, and a flexible kernel for modeling the non-linear trend. Not all components are necessary to model each time series; during training the unnecessary components are automatically made irrelevant via automatic relevance determination (ARD). We moreover assign priors to the hyperparameters, in order to keep the inference within a plausible range; we design such priors through an empirical Bayes approach. We present results on many time series of different types; our GP model is more accurate than state-of-the-art time series models. Thanks to the priors, a single restart is enough to estimate the hyperparameters; hence the model is also fast to train.

Published in ECML PKDD 2021. Lecture Notes in Computer Science 4, Springer, 103–117.

Algebras of sets and coherent sets of gambles

Authors: Arianna Casanova, Jürg Kohlas and Marco Zaffalon.
Year: 2021.

Abstract: In a recent work we have shown how to construct an information algebra of coherent sets of gambles defined on general possibility spaces. Here we analyze the connection of such an algebra with the set algebra of sets of its atoms and the set algebra of subsets of the possibility space on which gambles are defined. Set algebras are particularly important information algebras since they are their prototypical structures. Furthermore, they are the algebraic counterparts of classical propositional logic. As a consequence, this paper also details how propositional logic is naturally embedded into the theory of imprecise probabilities.

Published in Vejnarová, J., Wilson, N. (Eds), ECSQARU 2021. Lecture Notes in Artificial Intelligence 12897, Springer, 603–615.

A revised and updated version of this conference paper is Information algebras in the theory of imprecise probabilities, an extension.

Quantum indistinguishability through exchangeable desirable gambles

Authors: Alessio Benavoli, Alessandro Facchini and Marco Zaffalon.
Year: 2021.

Abstract: Two particles are identical if all their intrinsic properties, such as spin and charge, are the same, meaning that no quantum experiment can distinguish them. In addition to the well-known principles of quantum mechanics, understanding systems of identical particles requires a new postulate, the so-called symmetrization postulate. In this work, we show that the postulate corresponds to exchangeability assessments for sets of observables (gambles) in a quantum experiment, when quantum mechanics is seen as a normative and algorithmic theory guiding an agent to assess her subjective beliefs represented as (coherent) sets of gambles. Finally, we show how sets of exchangeable observables (gambles) may be updated after a measurement and discuss the issue of defining entanglement for indistinguishable particle systems.

Published in de Bock, J., Cano, A., Miranda, E., Moral, S. (Eds), ISIPTA '21. PMLR 147, JMLR.org, 22–31.

A revised and updated version of this conference paper is Quantum indistinguishability through exchangeability.

Nonlinear desirability as a linear classification problem

Authors: Arianna Casanova, Alessio Benavoli and Marco Zaffalon.
Year: 2021.

Abstract: The present paper proposes a generalization of the linearity axioms of coherence through a geometrical approach, which leads to an alternative interpretation of desirability as a classification problem. In particular, we analyze different sets of rationality axioms and, for each one of them, we show that proving that a subject, who provides finite accept and reject statements, respects these axioms corresponds to solving a binary classification task using, each time, a different (usually nonlinear) family of classifiers. Moreover, by borrowing ideas from machine learning, we show the possibility of defining a feature mapping that allows us to reformulate the above nonlinear classification problems as linear ones in a higher-dimensional space. This allows us to interpret gambles directly as payoff vectors of monetary lotteries, as well as to reduce the task of proving the rationality of a subject to a linear classification task.

Published in de Bock, J., Cano, A., Miranda, E., Moral, S. (Eds), ISIPTA '21. PMLR 147, JMLR.org, 61–71.

A revised and updated version of this conference paper is Nonlinear desirability as a linear classification problem.

Information algebras of coherent sets of gambles in general possibility spaces

Authors: Jürg Kohlas, Arianna Casanova and Marco Zaffalon.
Year: 2021.

Published in de Bock, J., Cano, A., Miranda, E., Moral, S. (Eds), ISIPTA '21. PMLR 147, JMLR.org, 191–200.

A revised and updated version of this conference paper is Information algebras in the theory of imprecise probabilities, an extension.

The sure thing

Authors: Marco Zaffalon and Enrique Miranda.
Year: 2021.

Abstract: If we prefer action a to b both under an event and under its complement, then we should just prefer a to b. This is Savage's sure-thing principle. In spite of its intuitive- and simple-looking nature, for which it gets almost immediate acceptance, the sure thing is not a logical principle. So where does it get its support from? In fact, the sure thing may actually fail. This is related to a variety of deep and foundational concepts in causality, decision theory, and probability, as well as to Simpson's paradox and Blyth's game. In this paper we try to systematically clarify such a network of relations. Then we propose a general desirability theory for non-linear utility scales. We use that to show that the sure thing is primitive to many of the previous concepts: in non-causal settings, the sure thing follows from considerations of temporal coherence and coincides with conglomerability; it can be understood as a rationality axiom to enable well-behaved conditioning in logic. In causal settings, it can be derived using only coherence and a causal independence condition.

Published in de Bock, J., Cano, A., Miranda, E., Moral, S. (Eds), ISIPTA '21. PMLR 147, JMLR.org, 342–351.

Joint desirability foundations of social choice and opinion pooling

Authors: Arianna Casanova, Enrique Miranda and Marco Zaffalon.
Year: 2021.

Abstract: We develop joint foundations for the fields of social choice and opinion pooling using coherent sets of desirable gambles, a general uncertainty model that allows to encompass both complete and incomplete preferences. This leads on the one hand to a new perspective of traditional results of social choice (in particular Arrow's theorem as well as sufficient conditions for the existence of an oligarchy and democracy) and on the other hand to using the same framework to analyse opinion pooling. In particular, we argue that weak Pareto (unanimity) should be given the status of a rationality requirement and use this to discuss the aggregation of experts' opinions based on probability and (state-independent) utility, showing some inherent limitation of this framework, with implications for statistics. The connection between our results and earlier work in the literature is also discussed.

Published in Annals of Mathematics and Artificial Intelligence 89(10–11), 965–1011.

Impact on place of death in cancer patients: a causal exploration in southern Switzerland

Authors: Heidi Kern, Giorgio Corani, David Huber, Nicola Vermes, Marco Zaffalon, Marco Varini, Claudia Wenzel and André Fringer.
Year: 2020.

Abstract: Most terminally ill cancer patients prefer to die at home, but a majority die in institutional settings. Research questions about this discrepancy have not been fully answered. This study applies artificial intelligence and machine learning techniques to explore the complex network of factors and the cause-effect relationships affecting the place of death, with the ultimate aim of developing policies favouring home-based end-of-life care.

Published in BMC Palliative Care 19, 160.

CREDICI: a Java library for causal inference by credal networks

Authors: Rafael Cabañas, Alessandro Antonucci, David Huber and Marco Zaffalon.
Year: 2020.

Abstract: We present CREDICI, a Java open-source tool for causal inference based on credal networks. Credal networks are an extension of Bayesian networks where local probability mass functions are only constrained to belong to given, so-called credal, sets. CREDICI is based on the recent work of Zaffalon et al. (2020), where an equivalence between Pearl's structural causal models and credal networks has been derived. This makes it possible to reduce a counterfactual query in a causal model to a standard query in a credal network, even in the case of unidentifiable causal effects. The necessary transformations and data structures are implemented in CREDICI, while inferences are eventually computed by CREMA (Huber et al., 2020), a twin library for general credal network inference. Here we discuss the main implementation challenges and possible outlooks.

Published in Jaeger, M., Nielsen, T. D. (Eds), PGM 2020. PMLR 138, JMLR.org, 597–600.

CREMA: a Java library for credal network inference

Authors: David Huber, Rafael Cabañas, Alessandro Antonucci and Marco Zaffalon.
Year: 2020.

Abstract: We present CREMA (credal models algorithms), a Java library for inference in credal networks. These models are analogous to Bayesian networks, but their local parameters are only constrained to vary in, so-called credal, sets. Inference in credal networks is intended as the computation of the bounds of a query with respect to those local variations. For credal networks the task is harder than in Bayesian networks, being NP^PP-hard in general models. Yet, scalable approximate algorithms have been shown to provide good accuracies on large or dense models, while exact techniques can be designed to process small or sparse models. CREMA embeds these algorithms and also offers an API to build and query credal networks together with a specification format. This makes CREMA, whose features are discussed and described by a simple example, the most advanced tool for credal network modelling and inference developed so far.

Published in Jaeger, M., Nielsen, T. D. (Eds), PGM 2020. PMLR 138, JMLR.org, 613–616.

Structural causal models are (solvable by) credal networks

Authors: Marco Zaffalon, Alessandro Antonucci and Rafael Cabañas.
Year: 2020.

Abstract: A structural causal model is made of endogenous (manifest) and exogenous (latent) variables. We show that endogenous observations induce linear constraints on the probabilities of the exogenous variables. This allows us to map a causal model exactly into a credal network. Causal inferences, such as interventions and counterfactuals, can consequently be obtained by standard algorithms for the updating of credal nets. These natively return sharp values in the identifiable case, while intervals corresponding to the exact bounds are produced for unidentifiable queries. A characterization of the causal models that allows the map above to be compactly derived is given, along with a discussion about the scalability for general models. This contribution should be regarded as a systematic approach to represent structural causal models by credal networks and hence to systematically compute causal inferences. A number of demonstrative examples are presented to clarify our methodology. Extensive experiments show that approximate algorithms for credal networks can immediately be used to do causal inference in real-size problems.

Published in Jaeger, M., Nielsen, T. D. (Eds), PGM 2020. PMLR 138, JMLR.org, 581–592.
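
The core idea in the abstract — endogenous observations imposing linear constraints on the exogenous distribution — can be illustrated on the smallest interesting case: a binary treatment X and outcome Y with an unobserved confounder. Enumerating the canonical exogenous states and optimising the interventional query under the observational constraints is then a linear program. This is a toy sketch with made-up numbers, not the paper's credal-network algorithms:

```python
import itertools
import numpy as np
from scipy.optimize import linprog

# Observational distribution P(X, Y) (made-up numbers).
p_obs = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

# Canonical exogenous states (x, y_if_x0, y_if_x1): the value X takes,
# and the response of Y to each treatment level.
states = list(itertools.product((0, 1), (0, 1), (0, 1)))

# Each observed cell (x, y) constrains the exogenous mass linearly:
# P(X=x, Y=y) = total mass of states with that x whose response under x is y.
A_eq = np.array([[1.0 if a == x and (r0, r1)[x] == y else 0.0
                  for (a, r0, r1) in states]
                 for (x, y) in p_obs])
b_eq = np.array(list(p_obs.values()))

# Query: P(Y = 1 | do(X = 1)) = mass of states whose response under x=1 is 1.
c = np.array([float(r1) for (_, _, r1) in states])

lo = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * len(states)).fun
hi = -linprog(-c, A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * len(states)).fun
# For this query the LP recovers the classic Manski interval
# [P(x=1, y=1), P(x=1, y=1) + P(x=0)].
```

With the numbers above the interval is [0.4, 0.9]: the query is unidentifiable, and only bounds can be reported, exactly as in the unidentifiable case discussed in the paper.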

Recursive estimation for sparse Gaussian process regression

Authors: Manuel Schürch, Dario Azzimonti, Alessio Benavoli and Marco Zaffalon.
Year: 2020.

Abstract: Gaussian Processes (GPs) are powerful kernelized methods for non-parametric regression used in many applications. However, their use is limited to a few thousand training samples due to their cubic time complexity. In order to scale GPs to larger datasets, several sparse approximations based on so-called inducing points have been proposed in the literature. In this work we investigate the connection between a general class of sparse inducing point GP regression methods and Bayesian recursive estimation, which enables Kalman-filter-like updating for online learning. The majority of previous work has focused on the batch setting, in particular for learning the model parameters and the position of the inducing points; here instead we focus on training with mini-batches. By exploiting the Kalman filter formulation, we propose a novel approach that estimates such parameters by recursively propagating the analytical gradients of the posterior over mini-batches of the data. Compared to state-of-the-art methods, our method keeps analytic updates for the mean and covariance of the posterior, thus reducing drastically the size of the optimization problem. We show that our method achieves faster convergence and superior performance compared to state-of-the-art sequential Gaussian process regression on synthetic GP as well as real-world data with up to a million data samples.

Published in Automatica 120, 109127.

Probabilistic reconciliation of hierarchical forecast via Bayes' rule

Authors: Giorgio Corani, Dario Azzimonti, João Augusto and Marco Zaffalon.
Year: 2020.

Abstract: We present a novel approach for reconciling hierarchical forecasts, based on Bayes' rule. We define a prior distribution for the bottom time series of the hierarchy, based on the bottom base forecasts. Then we update their distribution via Bayes' rule, based on the base forecasts for the upper time series. Under the Gaussian assumption, we derive the updating in closed form. We derive two algorithms, which differ in the assumed independencies. We discuss their relation with the MinT reconciliation algorithm and with the Kalman filter, and we compare them experimentally.

Published in Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (Eds), ECML PKDD 2020. Lecture Notes in Computer Science 12459. Springer, 211–226.
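
Under the Gaussian assumption, the updating step in the abstract has the standard conditional-Gaussian (Kalman-style) closed form. A minimal sketch for a toy hierarchy with two bottom series and one total; the prior and noise values are illustrative, not the paper's:

```python
import numpy as np

# Hierarchy: total = b1 + b2. The prior on the bottom series comes from the
# bottom base forecasts; the upper base forecast acts as a noisy observation.
A = np.array([[1.0, 1.0]])        # aggregation matrix mapping bottom -> upper
mu = np.array([10.0, 20.0])       # bottom base forecasts (prior mean)
Sigma = np.diag([4.0, 9.0])       # prior covariance of the bottom forecasts
y_upper = np.array([33.0])        # base forecast of the total
R = np.array([[1.0]])             # variance of the upper base forecast

# Bayes/Kalman update: condition the bottom prior on the upper observation.
S = A @ Sigma @ A.T + R
K = Sigma @ A.T @ np.linalg.inv(S)
mu_post = mu + K @ (y_upper - A @ mu)
Sigma_post = Sigma - K @ A @ Sigma
```

The reconciled bottom forecasts move towards the upper base forecast in proportion to their prior variances, and their posterior variances shrink; summing `mu_post` then yields a coherent forecast for the total.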

Compatibility, desirability, and the running intersection property

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2020.

Abstract: Compatibility is the problem of checking whether some given probabilistic assessments have a common joint probabilistic model. When the assessments are unconditional, the problem is well established in the literature and finds a solution through the running intersection property (RIP). This is not the case of conditional assessments. In this paper, we study the compatibility problem in a very general setting: any possibility space, unrestricted domains, imprecise (and possibly degenerate) probabilities. We extend the unconditional case to our setting, thus generalising most of previous results in the literature. The conditional case turns out to be fundamentally different from the unconditional one. For such a case, we prove that the problem can still be solved in general by RIP but in a more involved way: by constructing a junction tree and propagating information over it. Still, RIP does not allow us to optimally take advantage of sparsity: in fact, conditional compatibility can be simplified further by joining junction trees with coherence graphs.

Published in Artificial Intelligence 283, 103274.

Social pooling of beliefs and values with desirability

Authors: Arianna Casanova, Enrique Miranda and Marco Zaffalon.
Year: 2020.

Abstract: The problem of aggregating beliefs and values of rational subjects is treated with the formalism of sets of desirable gambles. This leads on the one hand to a new perspective of traditional results of social choice (in particular Arrow's theorem as well as sufficient conditions for the existence of an oligarchy and democracy) and on the other hand to using the same framework to create connections with opinion pooling. In particular, we show that weak Pareto can be derived as a coherence requirement and discuss the aggregation of state-independent beliefs.

Published in FLAIRS-33, AAAI, 569–574.

A revised and updated version of this conference paper is Joint desirability foundations of social choice and opinion pooling.

Sampling subgraphs with guaranteed treewidth for accurate and efficient graphical inference

Authors: Jaemin Yoo, U Kang, Mauro Scanagatta, Giorgio Corani and Marco Zaffalon.
Year: 2020.

Abstract: How can we run graphical inference on large graphs efficiently and accurately? Many real-world networks are modeled as graphical models, and graphical inference is fundamental to understand the properties of those networks. In this work, we propose a novel approach for fast and accurate inference, which first samples a small subgraph and then runs inference over the subgraph instead of the given graph. This is done by the bounded tree-width (BTW) sampling, our novel algorithm that generates a subgraph with guaranteed bounded tree-width while retaining as many edges as possible. We first analyze the properties of BTW theoretically. Then, we evaluate our approach on node classification and compare it with the baseline, which is to run loopy belief propagation (LBP) on the original graph. Our approach can be coupled with various inference algorithms: it shows higher accuracy up to 13.7% with the junction tree algorithm, and allows inference up to 23.8 times faster with LBP. We further compare BTW with previous graph sampling algorithms and show that it gives the best accuracy.

Published in Caverlee, J., Hu, X., Lalmas, M., Wang, W. (Eds), WSDM '20. ACM, 708–716.

Bernstein's socks, polynomial-time provable coherence and entanglement

Authors: Alessio Benavoli, Alessandro Facchini and Marco Zaffalon.
Year: 2019.

Abstract: We recently introduced a bounded rationality approach for the theory of desirable gambles. It is based on the sole requirement that nonnegativity of a gamble be defined so that it is provable in polynomial time. In this paper we continue to investigate properties of this class of models. In particular, we verify that the space of Bernstein polynomials in which nonnegativity is specified by the Krivine-Vasilescu certificate is yet another instance of this theory. As a consequence, we show how it is possible to construct in it a thought experiment uncovering entanglement with classical (hence non-quantum) coins.

Published in de Bock, J., de Campos, C., de Cooman, G., Quaeghebeur, E., Wheeler, G. (Eds), ISIPTA '19. PMLR 103, JMLR.org, 23–31.

Hierarchical estimation of parameters in Bayesian networks

Authors: Laura Azzimonti, Giorgio Corani and Marco Zaffalon.
Year: 2019.

Abstract: A novel approach for parameter estimation in Bayesian networks is presented. The main idea is to introduce a hyper-prior in the Multinomial-Dirichlet model, traditionally used for conditional distribution estimation in Bayesian networks. The resulting hierarchical model jointly estimates different conditional distributions belonging to the same conditional probability table, thus borrowing statistical strength from each other. An analytical study of the dependence structure a priori induced by the hierarchical model is performed and an ad hoc variational algorithm for fast and accurate inference is derived. The proposed hierarchical model yields a major performance improvement in classification with Bayesian networks compared to traditional models. The proposed variational algorithm reduces by two orders of magnitude the computational time, with the same accuracy in parameter estimation, compared to traditional MCMC methods. Moreover, motivated by a real case study, the hierarchical model is applied to the estimation of Bayesian networks parameters by borrowing strength from related domains.

Published in Computational Statistics and Data Analysis 137, 67–91.
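
As a rough illustration of what "borrowing statistical strength" buys over independent per-row estimation, here is a minimal NumPy sketch. All names and the equivalent sample size are illustrative, and the pooled-prior estimator below is only an empirical-Bayes caricature: the paper's actual hierarchical Multinomial-Dirichlet model is fitted with a variational algorithm.

```python
import numpy as np

def cpt_dirichlet(counts, ess=1.0):
    """Independent multinomial-Dirichlet estimate: each row (one parent
    configuration) is smoothed towards the uniform distribution."""
    counts = np.asarray(counts, dtype=float)
    k = counts.shape[1]
    return (counts + ess / k) / (counts.sum(axis=1, keepdims=True) + ess)

def cpt_pooled_prior(counts, ess=1.0):
    """Crude 'borrowing of strength': smooth each row towards the pooled
    distribution of the whole table instead of the uniform, so that sparse
    rows lean on the better-estimated ones."""
    counts = np.asarray(counts, dtype=float)
    pooled = counts.sum(axis=0) / counts.sum()
    return (counts + ess * pooled) / (counts.sum(axis=1, keepdims=True) + ess)

# toy CPT: 3 parent configurations, binary child; the sparse last row
# benefits most from the pooled prior
counts = [[8, 2], [7, 3], [0, 1]]
print(cpt_dirichlet(counts))
print(cpt_pooled_prior(counts))
```

For the sparse third row, the pooled-prior estimate is pulled towards the table-wide distribution rather than towards uniform, which is the qualitative effect the hierarchical model formalises.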

Sum-of-squares for bounded rationality

Authors: Alessio Benavoli, Alessandro Facchini, Dario Piga and Marco Zaffalon.
Year: 2019.

Abstract: In the gambling foundation of probability theory, rationality requires that a subject should always (never) find desirable all nonnegative (negative) gambles, because no matter the result of the experiment the subject never (always) decreases her money. Evaluating the nonnegativity of a gamble in infinite spaces is a difficult task. In fact, even if we restrict the gambles to be polynomials in ℝn, the problem of determining nonnegativity is NP-hard. The aim of this paper is to develop a computable theory of desirable gambles. Instead of requiring the subject to desire all nonnegative gambles, we only require her to desire gambles for which she can efficiently determine the nonnegativity (in particular sum-of-squares polynomials). We refer to this new criterion as bounded rationality.

Published in International Journal of Approximate Reasoning 105, 130–152.
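
The computational idea (replace plain nonnegativity by nonnegativity that is certifiable as a sum of squares) can be illustrated in the quadratic case, where being SOS is equivalent to positive semidefiniteness of the Gram matrix. A minimal NumPy sketch, not the paper's general SOS-programming machinery:

```python
import numpy as np

def is_sos_quadratic(G, tol=1e-9):
    """A quadratic form z^T G z is a sum of squares iff its symmetrised
    Gram matrix is positive semidefinite: an efficiently checkable
    certificate of nonnegativity."""
    G = np.asarray(G, dtype=float)
    eigs = np.linalg.eigvalsh((G + G.T) / 2)
    return bool(eigs.min() >= -tol)

# x^2 + 2xy + y^2 = (x + y)^2 : SOS, hence certifiably nonnegative
print(is_sos_quadratic([[1, 1], [1, 1]]))        # → True
# x^2 - 3xy + y^2 is indefinite (take x = y = 1), hence not SOS
print(is_sos_quadratic([[1, -1.5], [-1.5, 1]]))  # → False
```

For higher-degree polynomials the same test applies to the Gram matrix of a monomial basis, which is what SOS solvers search over.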

Desirability foundations of robust rational decision making

Authors: Marco Zaffalon and Enrique Miranda.
Year: 2021 (published online in 2018).

Abstract: Recent work has formally linked the traditional axiomatisation of incomplete preferences à la Anscombe-Aumann with the theory of desirability developed in the context of imprecise probability, by showing in particular that they are the very same theory. The equivalence has been established under the constraint that the set of possible prizes is finite. In this paper, we relax such a constraint, thus de facto creating one of the most general theories of rationality and decision making available today. We provide the theory with a sound interpretation and with basic notions, and results, for the separation of beliefs and values, and for the case of complete preferences. Moreover, we discuss the role of conglomerability for the presented theory, arguing that it should be a rationality requirement under very broad conditions.

Published in Synthese 198(27), S6529–S6570.

Compatibility, coherence and the RIP

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2019.

Abstract: We generalise the classical result on the compatibility of marginal, possibly non-disjoint, assessments in terms of the running intersection property to the imprecise case, where our beliefs are modelled in terms of sets of desirable gambles. We consider the case where we have unconditional and conditional assessments, and show that the problem can be simplified via a tree decomposition.

Published in Destercke, S., Denoeux, T., Gil, M. A., Grzegorzewski, P., Hryniewicz, O. (Eds), SMPS 2018, Uncertainty Modelling in Data Science, Advances in Intelligent and Soft Computing 832, Springer, 166–174.

Entropy-based pruning for learning Bayesian networks using BIC

Authors: Cassio P. de Campos, Mauro Scanagatta, Giorgio Corani and Marco Zaffalon.
Year: 2018.

Abstract: For decomposable score-based structure learning of Bayesian networks, existing approaches first compute a collection of candidate parent sets for each variable and then optimize over this collection by choosing one parent set for each variable without creating directed cycles while maximizing the total score. We target the task of constructing the collection of candidate parent sets when the score of choice is the Bayesian Information Criterion (BIC). We provide new non-trivial results that can be used to prune the search space of candidate parent sets of each node. We analyze how these new results relate to previous ideas in the literature both theoretically and empirically. We show in experiments with UCI data sets that gains can be significant. Since the new pruning rules are easy to implement and have low computational costs, they can be promptly integrated into all state-of-the-art methods for structure learning of Bayesian networks.

Published in Artificial Intelligence 260, 42–50.
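
For context, the BIC score that these pruning rules operate on decomposes node by node: the maximised log-likelihood of the child given a candidate parent set, minus (log N)/2 times the number of free parameters. A self-contained toy scorer (data encoding and names are illustrative, not the paper's implementation):

```python
import math
from collections import Counter

def bic_parent_set(data, child, parents, arities):
    """BIC score of one candidate parent set for one node:
    max log-likelihood minus (log N / 2) * number of free parameters."""
    n = len(data)
    joint = Counter(tuple(row[p] for p in parents) + (row[child],) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    # maximum-likelihood log-likelihood from the empirical counts
    ll = sum(c * math.log(c / marg[key[:-1]]) for key, c in joint.items())
    q = 1
    for p in parents:
        q *= arities[p]   # number of parent configurations
    penalty = 0.5 * math.log(n) * q * (arities[child] - 1)
    return ll - penalty

# toy binary data set over variables 0 and 1
data = [(0, 0), (0, 0), (1, 1), (1, 1), (0, 1), (1, 0)]
arities = {0: 2, 1: 2}
print(bic_parent_set(data, child=1, parents=(0,), arities=arities))
print(bic_parent_set(data, child=1, parents=(), arities=arities))
```

On this tiny sample the empty parent set scores higher: the likelihood gain from adding the parent does not pay for the extra parameters, which is exactly the trade-off the pruning bounds exploit.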

Approximate structure learning for large Bayesian networks

Authors: Mauro Scanagatta, Giorgio Corani, Cassio P. de Campos and Marco Zaffalon.
Year: 2018.

Abstract: Learning Bayesian networks with bounded treewidth is important for reducing the complexity of the inferences. We present a novel anytime algorithm (k-MAX) for this task, which scales up to thousands of variables. Through extensive experiments we show that it consistently yields higher-scoring structures than its competitors on complete data sets. We then consider the problem of structure learning from incomplete data sets. This can be addressed by structural EM, which however is computationally very demanding. We thus adopt the novel k-MAX algorithm in the maximization step of structural EM, obtaining an efficient computation of the expected sufficient statistics. We test the resulting structural EM method on the task of imputing missing data, comparing it against the state-of-the-art approach based on random forests. Our approach achieves the same imputation accuracy as the competitors, but in about one tenth of the time. Furthermore, we show that it has worst-case complexity linear in the input size, and that it is easily parallelizable.

Published in Machine Learning 107(8–10), 1209–1227.

Efficient learning of bounded-treewidth Bayesian networks from complete and incomplete data sets

Authors: Mauro Scanagatta, Giorgio Corani, Marco Zaffalon, Jaemin Yoo and U Kang.
Year: 2018.

Abstract: Learning Bayesian networks with bounded treewidth is important for reducing the complexity of the inferences. We present a novel anytime algorithm (k-MAX) for this task, which scales up to thousands of variables. Through extensive experiments we show that it consistently yields higher-scoring structures than its competitors on complete data sets. We then consider the problem of structure learning from incomplete data sets. This can be addressed by structural EM, which however is computationally very demanding. We thus adopt the novel k-MAX algorithm in the maximization step of structural EM, obtaining an efficient computation of the expected sufficient statistics. We test the resulting structural EM method on the task of imputing missing data, comparing it against the state-of-the-art approach based on random forests. Our approach achieves the same imputation accuracy as the competitors, but in about one tenth of the time. Furthermore, we show that it has worst-case complexity linear in the input size, and that it is easily parallelizable.

Published in International Journal of Approximate Reasoning 95, 152–166.

Axiomatising incomplete preferences through sets of desirable gambles

Authors: Marco Zaffalon and Enrique Miranda.
Year: 2017.

Abstract: We establish the equivalence of two very general theories: the first is the decision-theoretic formalisation of incomplete preferences based on the mixture independence axiom; the second is the theory of coherent sets of desirable gambles (bounded variables) developed in the context of imprecise probability and extended here to vector-valued gambles. Such an equivalence allows us to analyse the theory of incomplete preferences from the point of view of desirability. Among other things, this leads us to uncover an unexpected and clarifying relation: that the notion of state independence—the traditional assumption that we can have separate models for beliefs (probabilities) and values (utilities)—coincides with that of strong independence in imprecise probability; this connection leads us also to propose much weaker, and arguably more realistic, notions of state independence. Then we simplify the treatment of complete beliefs and values by putting them on a more equal footing. We study the role of the Archimedean condition—which allows us to actually talk of expected utility—, identify some weaknesses and propose alternatives that solve these. More generally speaking, we show that desirability is a valuable alternative foundation to preferences for decision theory that streamlines and unifies a number of concepts while preserving great generality. In addition, the mentioned equivalence shows for the first time how to extend the theory of desirability to imprecise non-linear utility, thus enabling us to formulate one of the most powerful self-consistent theories of reasoning and decision-making available today.

Published in Journal of Artificial Intelligence Research 60, 1057–1126.

Hierarchical multinomial-Dirichlet model for the estimation of conditional probability tables

Authors: Laura Azzimonti, Giorgio Corani and Marco Zaffalon.
Year: 2017.

Abstract: We present a novel approach for estimating conditional probability tables, based on a joint, rather than independent, estimate of the conditional distributions belonging to the same table. We derive exact analytical expressions for the estimators and we analyse their properties both analytically and via simulation. We then apply this method to the estimation of parameters in a Bayesian network. Given the structure of the network, the proposed approach better estimates the joint distribution and significantly improves the classification performance with respect to traditional approaches.

Published in Raghavan, V., Aluru, S., Karypis, G., Miele, L., Wu, X. (Eds), ICDM 2017, IEEE, 739–744.

A revised and updated version of this conference paper is Hierarchical estimation of parameters in Bayesian networks.

Improved local search in Bayesian networks structure learning

Authors: Mauro Scanagatta, Giorgio Corani and Marco Zaffalon.
Year: 2017.

Abstract: We present a novel approach for score-based structure learning of Bayesian networks, which couples an existing ordering-based algorithm for structure optimization with a novel operator for exploring the neighborhood of a given order in the space of orderings. Our approach achieves state-of-the-art performance on data sets containing thousands of variables.

Published in Hyttinen, A., Malone, B., Suzuki, J. (Eds), MBN 2017. PMLR 73, JMLR.org, 45–56.

Time for a change: a tutorial for comparing multiple classifiers through Bayesian analysis

Authors: Alessio Benavoli, Giorgio Corani, Janez Demšar and Marco Zaffalon.
Year: 2017.

Abstract: The machine learning community adopted the use of null hypothesis significance testing (NHST) in order to ensure the statistical validity of results. Many scientific fields however realized the shortcomings of frequentist reasoning and in the most radical cases even banned its use in publications. We should do the same: just as we have embraced the Bayesian paradigm in the development of new machine learning methods, so we should also use it in the analysis of our own results. We argue for abandonment of NHST by exposing its fallacies and, more importantly, offer better—more sound and useful—alternatives for it.

Published in Journal of Machine Learning Research 18(37), 1–36.
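
One of the alternatives advocated in the tutorial, the Bayesian correlated t-test for cross-validation results, admits a compact closed form: the posterior of the mean accuracy difference is a Student t with n-1 degrees of freedom whose scale is inflated by the correlation rho between folds (rho = test fraction for k-fold CV). A sketch using SciPy; the ROPE width and the data below are illustrative:

```python
import numpy as np
from scipy.stats import t

def correlated_bayesian_ttest(diffs, rho, rope=0.01):
    """Posterior over the mean accuracy difference of two classifiers from
    cross-validation: Student t with n-1 dof, located at the sample mean,
    with variance inflated by the fold correlation rho. Returns the posterior
    probabilities of (left of -rope, inside the rope, right of +rope)."""
    diffs = np.asarray(diffs, dtype=float)
    n = len(diffs)
    mean, var = diffs.mean(), diffs.var(ddof=1)
    scale = np.sqrt((1.0 / n + rho / (1.0 - rho)) * var)
    post = t(df=n - 1, loc=mean, scale=scale)
    left = post.cdf(-rope)
    rope_p = post.cdf(rope) - left
    return left, rope_p, 1.0 - post.cdf(rope)

# 10-fold CV accuracy differences; rho = 1/10 for 10-fold cross-validation
left, rope_p, right = correlated_bayesian_ttest(
    [0.02, 0.01, 0.03, 0.00, 0.02, 0.01, 0.02, 0.03, 0.01, 0.02], rho=0.1)
print(left, rope_p, right)
```

Unlike a p-value, the three probabilities directly answer the question of interest: how likely it is that the second classifier is practically worse, equivalent, or better.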

SOS for bounded rationality

Authors: Alessio Benavoli, Alessandro Facchini, Dario Piga and Marco Zaffalon.
Year: 2017.

Abstract: In the gambling foundation of probability theory, rationality requires that a subject should always (never) find desirable all nonnegative (negative) gambles, because no matter the result of the experiment the subject never (always) decreases her money. Evaluating the nonnegativity of a gamble in infinite spaces is a difficult task. In fact, even if we restrict the gambles to be polynomials in ℝn, the problem of determining nonnegativity is NP-hard. The aim of this paper is to develop a computable theory of desirable gambles. Instead of requiring the subject to accept all nonnegative gambles, we only require her to accept gambles for which she can efficiently determine the nonnegativity (in particular SOS polynomials). We call this new criterion bounded rationality.

Published in Antonucci, A., Corani, G., Couso, I., Destercke, S. (Eds), ISIPTA '17. PMLR 62, JMLR.org, 25–36.

A polarity theory for sets of desirable gambles

Authors: Alessio Benavoli, Alessandro Facchini, Marco Zaffalon and José Vicente-Pérez.
Year: 2017.

Abstract: Coherent sets of almost desirable gambles and credal sets are known to be equivalent models. That is, there exists a bijection between the two collections of sets preserving the usual operations, e.g. conditioning. Such a correspondence is based on the polarity theory for closed convex cones. Learning from this simple observation, in this paper we introduce a new (lexicographic) polarity theory for general convex cones and then we apply it in order to establish an analogous correspondence between coherent sets of desirable gambles and convex sets of lexicographic probabilities.

Published in Antonucci, A., Corani, G., Couso, I., Destercke, S. (Eds), ISIPTA '17. PMLR 62, JMLR.org, 37–48.

Statistical comparison of classifiers through Bayesian hierarchical modelling

Authors: Giorgio Corani, Alessio Benavoli, Janez Demšar, Francesca Mangili and Marco Zaffalon.
Year: 2017.

Abstract: Usually one compares the accuracy of two competing classifiers using null hypothesis significance tests. Yet such tests suffer from important shortcomings, which can be overcome by switching to Bayesian hypothesis testing. We propose a Bayesian hierarchical model that jointly analyzes the cross-validation results obtained by two classifiers on multiple data sets. The model estimates more accurately the difference between classifiers on the individual data sets than the traditional approach of averaging, independently on each data set, the cross-validation results. It does so by jointly analyzing the results obtained on all data sets, and applying shrinkage to the estimates. The model eventually returns the posterior probability of the accuracies of the two classifiers being practically equivalent or significantly different.

Published in Machine Learning 106(11), 1817–1837.

A Gleason-type theorem for any dimension based on a gambling formulation of quantum mechanics

Authors: Alessio Benavoli, Alessandro Facchini and Marco Zaffalon.
Year: 2017.

Abstract: Based on a gambling formulation of quantum mechanics, we derive a Gleason-type theorem that holds for any dimension n of a quantum system, and in particular for n = 2. The theorem states that the only logically consistent probability assignments are exactly the ones that are definable as the trace of the product of a projector and a density matrix operator. In addition, we detail the reason why dispersion-free probabilities are actually not valid, or rational, probabilities for quantum mechanics, and hence should be excluded from consideration.

Published in Foundations of Physics 47(7), 991–1002.
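
The theorem's characterisation, that the only consistent probability assignments are of the form trace of a projector times a density matrix, is easy to check numerically for a qubit (n = 2). A minimal sketch with illustrative matrices:

```python
import numpy as np

def born_probability(projector, rho):
    """Probability of a measurement outcome as Tr(projector @ rho),
    the form singled out by the Gleason-type theorem."""
    return float(np.real(np.trace(projector @ rho)))

# qubit density matrix (Hermitian, trace 1, positive semidefinite)
rho = np.array([[0.75, 0.25], [0.25, 0.25]], dtype=complex)
# the two projectors of a Z-basis measurement
P0 = np.array([[1, 0], [0, 0]], dtype=complex)
P1 = np.array([[0, 0], [0, 1]], dtype=complex)

p0 = born_probability(P0, rho)
p1 = born_probability(P1, rho)
print(p0, p1)   # → 0.75 0.25 (a valid probability assignment: sums to 1)
```

Dispersion-free assignments (all outcome probabilities 0 or 1 for every measurement) cannot be written in this form for any density matrix, which is the sense in which the paper excludes them.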

Full conglomerability

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2017.

Abstract: We do a thorough mathematical study of the notion of full conglomerability, that is, conglomerability with respect to all the partitions of an infinite possibility space, in the sense considered by Peter Walley (1991). We consider both the cases of precise and imprecise probability (sets of probabilities). We establish relations between conglomerability and countable additivity, continuity, super-additivity and marginal extension. Moreover, we discuss the special case where a model is conglomerable with respect to a subset of all the partitions, and try to sort out the different notions of conglomerability present in the literature. We conclude that countable additivity, which is routinely used to impose full conglomerability in the precise case, appears to be the most well-behaved way to do so in the imprecise case as well by taking envelopes of countably additive probabilities. Moreover, we characterise these envelopes by means of a number of necessary and sufficient conditions.

Published in Journal of Statistical Theory and Practice 11(4), 634–669.

Quantum rational preferences and desirability

Authors: Alessio Benavoli, Alessandro Facchini and Marco Zaffalon.
Year: 2016.

Abstract: We develop a theory of quantum rational decision making in the tradition of Anscombe and Aumann's axiomatisation of preferences on horse lotteries. It is essentially the Bayesian decision theory generalised to the space of Hermitian matrices. Among other things, this leads us to give a representation theorem showing that quantum complete rational preferences are obtained by means of expected utility considerations.

Published in Proceedings of the 1st International Workshop on Imperfect Decision Makers: Admitting Real-World Rationality @ NIPS 2016, PMLR 58, JMLR.org, 87–96.

Learning treewidth-bounded Bayesian networks with thousands of variables

Authors: Mauro Scanagatta, Giorgio Corani, Cassio P. de Campos and Marco Zaffalon.
Year: 2016.

Abstract: We present a method for learning treewidth-bounded Bayesian networks from data sets containing thousands of variables. Bounding the treewidth of a Bayesian network greatly reduces the complexity of inferences. Yet, being a global property of the graph, it considerably increases the difficulty of the learning process. Our novel algorithm accomplishes this task, scaling both to large domains and to large treewidths. Our novel approach consistently outperforms the state of the art on experiments with up to thousands of variables.

Published in Lee, D. D., Sugiyama, M., Luxburg, U. V., Guyon, I., Garnett, R. (Eds), NIPS 2016, 1462–1470.

Quantum mechanics: the Bayesian theory generalised to the space of Hermitian matrices

Authors: Alessio Benavoli, Alessandro Facchini and Marco Zaffalon.
Year: 2016.

Abstract: We consider the problem of gambling on a quantum experiment and enforce rational behaviour by a few rules. These rules yield, in the classical case, the Bayesian theory of probability via duality theorems. In our quantum setting, they yield the Bayesian theory generalised to the space of Hermitian matrices. This very theory is quantum mechanics: in fact, we derive all its four postulates from the generalised Bayesian theory. This implies that quantum mechanics is self-consistent. It also leads us to reinterpret the main operations in quantum mechanics as probability rules: Bayes' rule (measurement), marginalisation (partial tracing), independence (tensor product). To say it with a slogan, we obtain that quantum mechanics is the Bayesian theory in the complex numbers.

Published in Physical Review A 94, 042106.

Full conglomerability, continuity and marginal extension

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2017.

Abstract: We investigate fully conglomerable coherent lower previsions in the sense of Walley, and some particular cases of interest: envelopes of fully conglomerable linear previsions, envelopes of countably additive linear previsions and fully disintegrable linear previsions. We study the connections with continuity and countable super-additivity, and show that full conglomerability can be characterised in terms of a supremum of marginal extension models.

Published in Ferraro, M. B., Giordani, P., Vantaggi, B., Gagolewski, M., Gil, M. A., Grzegorzewski, P., Hryniewicz, O. (Eds), SMPS 2016, Soft Methods for Data Science, Advances in Intelligent and Soft Computing 456, Springer, 355–362.

A revised and updated version of this conference paper is Full conglomerability.

Conformity and independence with coherent lower previsions

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2016.

Abstract: We define the conformity of marginal and conditional models with a joint model within Walley's theory of coherent lower previsions. Loosely speaking, conformity means that the joint can reproduce the marginal and conditional models we started from. By studying conformity with and without additional assumptions of epistemic irrelevance and independence, we establish connections with a number of prominent models in Walley's theory: the marginal extension, the irrelevant natural extension, the independent natural extension and the strong product.

Published in International Journal of Approximate Reasoning 78, 125–137.

Learning Bayesian networks with thousands of variables

Authors: Mauro Scanagatta, Cassio P. de Campos, Giorgio Corani and Marco Zaffalon.
Year: 2015.

Abstract: We present a method for learning Bayesian networks from data sets containing thousands of variables without the need for structure constraints. Our approach is made of two parts. The first is a novel algorithm that effectively explores the space of possible parent sets of a node. It guides the exploration towards the most promising parent sets on the basis of an approximated score function that is computed in constant time. The second part is an improvement of an existing ordering-based algorithm for structure optimization. The new algorithm provably achieves a higher score compared to its original formulation. Our novel approach consistently outperforms the state of the art on very large data sets.

Published in Cortes, C., Lawrence, N. D., Lee, D. D., Sugiyama, M., Garnett, R. (Eds), NIPS 2015, Curran Associates, 1855–1863.

Bayesian hypothesis testing in machine learning

Authors: Giorgio Corani, Alessio Benavoli, Francesca Mangili and Marco Zaffalon.
Year: 2015.

Abstract: Most hypothesis testing in machine learning is done using the frequentist null-hypothesis significance test, which has severe drawbacks. We review recent Bayesian tests which overcome the drawbacks of the frequentist ones.

Published in Bifet, A., May, M., Zadrozny, B., Gavaldà, R., Pedreschi, D., Bonchi, F., Cardoso, J. S., Spiliopoulou, M. (Eds), ECML PKDD 2015. Lecture Notes in Computer Science 9286, Springer, 199–202.

Conformity and independence with coherent lower previsions

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2015.

Abstract: We study the conformity of marginal unconditional and conditional models with a joint model under assumptions of epistemic irrelevance and independence, within Walley's theory of coherent lower previsions. By doing so, we make a link with a number of prominent models within this theory: the marginal extension, the irrelevant natural extension, the independent natural extension and the strong product.

Published in Augustin, T., Doria, S., Miranda, E., Quaeghebeur, E. (Eds), ISIPTA '15. SIPTA, 197–206.

A revised and updated version of this conference paper is Conformity and independence with coherent lower previsions.

Reliable survival analysis based on the Dirichlet process

Authors: Francesca Mangili, Alessio Benavoli, Cassio Polpo de Campos and Marco Zaffalon.
Year: 2015.

Abstract: We present a robust Dirichlet process for estimating survival functions from samples with right-censored data. It adopts a prior near-ignorance approach to avoid almost any assumption about the distribution of the population lifetimes, as well as the need of eliciting an infinite dimensional parameter (in case of lack of prior information), as it happens with the usual Dirichlet process prior. We show how such a model can be used to derive robust inferences from right-censored lifetime data. Robustness is due to the identification of the decisions that are prior-dependent, and can be interpreted as an analysis of sensitivity with respect to the hypothetical inclusion of fictitious new samples in the data. In particular, we derive a nonparametric estimator of the survival probability and a hypothesis test about the probability that the lifetime of an individual from one population is shorter than the lifetime of an individual from another. We evaluate these ideas on simulated data and on the Australian AIDS survival dataset. The methods are publicly available through an easy-to-use R package.

Published in Biometrical Journal 57, 1002–1019.

A Bayesian nonparametric procedure for comparing algorithms

Authors: Alessio Benavoli, Francesca Mangili, Giorgio Corani and Marco Zaffalon.
Year: 2015.

Abstract: A fundamental task in machine learning is to compare the performance of multiple algorithms. This is usually performed by the frequentist Friedman test followed by multiple comparisons. This implies dealing with the well-known shortcomings of null hypothesis significance tests. We propose a Bayesian approach to overcome these problems. We provide three main contributions. First, we propose a nonparametric Bayesian version of the Friedman test using a Dirichlet process (DP) based prior. We show that, from a Bayesian perspective, the Friedman test is an inference for a multivariate mean based on an ellipsoid inclusion test. Second, we derive a joint procedure for the multiple comparisons which accounts for their dependencies and which is based on the posterior probability computed through the DP. The proposed approach allows verifying the null hypothesis, not only rejecting it. Third, as a practical application, we show the results of our algorithm for racing, i.e. identifying the best algorithm among a large set of candidates sequentially assessed. Our approach consistently outperforms its frequentist counterpart.

Published in Bach, F. R., Blei, D. M. (Eds), ICML 2015. PMLR 37, JMLR.org, 1264–1272.

Learning extended tree augmented naive structures

Authors: Cassio Polpo de Campos, Giorgio Corani, Mauro Scanagatta, Marco Cuccu and Marco Zaffalon.
Year: 2016.

Abstract: This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds' algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). We enhance our procedure with a new score function that only takes into account arcs that are relevant to predict the class, as well as an optimization over the equivalent sample size during learning. These ideas may be useful for structure learning of Bayesian networks in general. A range of experiments show that we obtain models with better prediction accuracy than Naive Bayes and TAN, and comparable to the accuracy of the state-of-the-art classifier averaged one-dependence estimator (AODE). We release our implementation of ETAN so that it can be easily installed and run within Weka.

Published in International Journal of Approximate Reasoning 68, 153–163.

Bayesian network data imputation with application to survival tree analysis

Authors: Paola Maria Vittoria Rancoita, Marco Zaffalon, Emanuele Zucca, Francesco Bertoni and Cassio Polpo de Campos.
Year: 2016.

Abstract: Retrospective clinical datasets are often characterized by a relatively small sample size and many missing data. In this case, a common way for handling the missingness consists in discarding from the analysis patients with missing covariates, further reducing the sample size. Alternatively, if the mechanism that generated the missingness allows, incomplete data can be imputed on the basis of the observed data, avoiding the reduction of the sample size and allowing methods to deal with complete data later on. Moreover, methodologies for data imputation might depend on the particular purpose and might achieve better results by considering specific characteristics of the domain. The problem of missing data treatment is studied in the context of survival tree analysis for the estimation of a prognostic patient stratification. Survival tree methods usually address this problem by using surrogate splits, that is, splitting rules that use other variables yielding similar results to the original ones. Instead, our methodology consists in modeling the dependencies among the clinical variables with a Bayesian network, which is then used to perform data imputation, thus allowing the survival tree to be applied on the completed dataset. The Bayesian network is directly learned from the incomplete data using a structural expectation-maximization (EM) procedure in which the maximization step is performed with an exact anytime method, so that the only source of approximation is due to the EM formulation itself. On both simulated and real data, our proposed methodology usually outperformed several existing methods for data imputation and the imputation so obtained improved the stratification estimated by the survival tree (especially with respect to using surrogate splits).

Published in Computational Statistics and Data Analysis 93, 373–387.

Independent products in infinite spaces

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2015.

Abstract: Probabilistic independence, intended as the mutual irrelevance of given variables, can be solidly founded on a notion of self-consistency of an uncertainty model, in particular when probabilities go imprecise. There is nothing in this approach that prevents it from being adopted in very general setups, and yet it has mostly been detailed for variables taking finitely many values. In this mathematical study, we complement previous research by exploring the extent to which such an approach can be generalised. We focus in particular on the independent products of two variables. We characterise the main notions, including factorisation and productivity, in the general case where both spaces can be infinite and show that, however, there are situations—even in the case of precise probability—where no independent product exists. This is not the case as soon as at least one space is finite. We study this case at the frontier of good behaviour in depth, detailing the relations among the most important notions; we show for instance that being an independent product is equivalent to a certain productivity condition. Then we step back to the general case: we give conditions for the existence of independent products and study ways to get around the inherent limitations.

Published in Journal of Mathematical Analysis and Applications 425, 460–488.

Approximate credal network updating by linear programming with applications to decision making

Authors: Alessandro Antonucci, Cassio Polpo de Campos, David Huber and Marco Zaffalon.
Year: 2015.

Abstract: Credal nets are probabilistic graphical models which extend Bayesian nets to cope with sets of distributions. An algorithm for approximate credal network updating is presented. The problem in its general formulation is a multilinear optimization task, which can be linearized by an appropriate rule for fixing all the local models apart from those of a single variable. This simple idea can be iterated and quickly leads to accurate inferences. A transformation is also derived to reduce decision making in credal networks based on the maximality criterion to updating. The decision task is proved to have the same complexity as standard inference, being NP^PP-complete for general credal nets and NP-complete for polytrees. Similar results are derived for the E-admissibility criterion. Numerical experiments confirm the good performance of the method.

Published in International Journal of Approximate Reasoning 58, 25–38.

Imprecise Dirichlet process with application to the hypothesis test on the probability that X ≤ Y

Authors: Alessio Benavoli, Francesca Mangili, Fabrizio Ruggeri and Marco Zaffalon.
Year: 2015.

Abstract: The Dirichlet process (DP) is one of the most popular Bayesian nonparametric models. An open problem with the DP is how to choose its infinite-dimensional parameter (base measure) in case of lack of prior information. In this work we present the Imprecise DP (IDP)—a prior near-ignorance DP-based model that does not require any choice of this probability measure. It consists of a class of DPs obtained by letting the normalized base measure of the DP vary in the set of all probability measures. We discuss the tight connections of this approach with Bayesian robustness and in particular prior near-ignorance modeling via sets of probabilities. We use this model to perform a Bayesian hypothesis test on the probability P(X ≤ Y). We study the theoretical properties of the IDP test (e.g., asymptotic consistency), and compare it with the frequentist Mann-Whitney-Wilcoxon rank test that is commonly employed as a test on P(X ≤ Y). In particular we will show that our method is more robust, in the sense that it is able to isolate instances in which the aforementioned test is virtually guessing at random.
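
The test described above lends itself to a simple Monte Carlo illustration. The sketch below is not the paper's implementation: it summarises the set of DP posteriors in a Bayesian-bootstrap style, placing the prior strength s on a single most adverse or most favourable pseudo-observation; the function name and this simplification are assumptions.

```python
import numpy as np

def idp_test_sketch(x, y, s=1.0, n_samples=5000, seed=0):
    """Monte Carlo sketch of an imprecise-DP test on P(X <= Y).

    Simplification (not the paper's exact construction): the set of DP
    posteriors is summarised by Dirichlet weights over the data plus one
    pseudo-observation of mass s placed at the most adverse (+inf for X,
    -inf for Y) or most favourable (-inf for X, +inf for Y) location.
    Returns lower/upper posterior probabilities of {P(X <= Y) > 1/2}.
    """
    rng = np.random.default_rng(seed)
    x, y = np.asarray(x, float), np.asarray(y, float)
    n, m = len(x), len(y)
    ind = (x[:, None] <= y[None, :]).astype(float)       # pairwise indicators
    wx = rng.dirichlet([1.0] * n + [s], size=n_samples)  # last column: pseudo-atom
    wy = rng.dirichlet([1.0] * m + [s], size=n_samples)
    core = np.einsum('ij,jk,ik->i', wx[:, :n], ind, wy[:, :m])
    p_low = core                             # adverse pseudo-atoms never satisfy X <= Y
    p_up = core + wx[:, n] + (1.0 - wx[:, n]) * wy[:, m]  # favourable ones always do
    return float(np.mean(p_low > 0.5)), float(np.mean(p_up > 0.5))

lo, up = idp_test_sketch([1.1, 0.8, 1.4, 0.9, 1.2], [2.3, 2.9, 2.1, 3.0, 2.6])
```

With the two well-separated samples above both bounds support the hypothesis; when the lower probability is small and the upper one large, the test stays undecided, which is the kind of robustness the abstract refers to.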

Published in Journal of Statistical Theory and Practice 9(3), 658–684.

Prior near ignorance for inferences in the k-parameter exponential family

Authors: Alessio Benavoli and Marco Zaffalon.
Year: 2015.

Abstract: This paper proposes a model of prior ignorance about a multivariate variable based on a set of distributions M. In particular, we discuss four minimal properties that a model of prior ignorance should satisfy: invariance, near-ignorance, learning and convergence. Near-ignorance and invariance ensure that our prior model behaves as a vacuous model with respect to some statistical inferences (e.g., mean, credible intervals, etc.) and some transformation of the parameter space. Learning and convergence ensure that our prior model can learn from data and, in particular, that the influence of M on the posterior inferences vanishes with increasing numbers of observations. We show that these four properties can all be satisfied by a set of conjugate priors in the multivariate exponential families if the set M includes finitely additive probabilities obtained as limits of truncated exponential functions. The obtained set M is a model of prior ignorance with respect to the functions (queries) that are commonly used for statistical inferences and, because of conjugacy, it is tractable and easy to elicit. Applications of the model to some practical statistical problems show the effectiveness of the approach.

Published in Statistics 49(5), 1104–1140.

On the problem of computing the conglomerable natural extension

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2015.

Abstract: Embedding conglomerability as a rationality requirement in probability was among the aims of Walley's behavioural theory of coherent lower previsions. However, recent work has shown that this attempt has only been partly successful. If we focus in particular on the extension of given assessments to a rational and conglomerable model (in the least-committal way), we have that the procedure used in Walley's theory, the natural extension, provides only an approximation to the model that is actually sought for: the so-called conglomerable natural extension. In this paper we consider probabilistic assessments in the form of a coherent lower prevision P, which is another name for a lower expectation functional, and make an in-depth mathematical study of the problem of computing the conglomerable natural extension for this case: that is, where it is defined as the smallest coherent lower prevision F ≥ P that is conglomerable, in case it exists. Past work has shown that F can be approximated by an increasing sequence (E_n)_{n∈ℕ} of coherent lower previsions. We solve an open problem by showing that this sequence can consist of infinitely many distinct elements. Moreover, we give sufficient conditions, of quite broad applicability, to make sure that the point-wise limit of the sequence is F in case P is the lower envelope of finitely many linear previsions. In addition, we study the question of the existence of F and its relationship with the notion of marginal extension.

Published in International Journal of Approximate Reasoning 56(A), 1–27.

Extended tree augmented naive classifier

Authors: Cassio P. de Campos, Marco Cuccu, Giorgio Corani and Marco Zaffalon.
Year: 2014.

Abstract: This work proposes an extended version of the well-known tree-augmented naive Bayes (TAN) classifier where the structure learning step is performed without requiring features to be connected to the class. Based on a modification of Edmonds' algorithm, our structure learning procedure explores a superset of the structures that are considered by TAN, yet achieves global optimality of the learning score function in a very efficient way (quadratic in the number of features, the same complexity as learning TANs). A range of experiments shows that we obtain models with better accuracy than TAN and comparable to that of the state-of-the-art averaged one-dependence estimator (AODE) classifier.

Published in van der Gaag, L., Feelders, A. (Eds), PGM 2014. Lecture Notes in Computer Science 8754, Springer, 176–189.

A revised and updated version of this conference paper is Learning extended tree augmented naive structures.

Min-BDeu and max-BDeu scores for learning Bayesian networks

Authors: Mauro Scanagatta, Cassio P. de Campos and Marco Zaffalon.
Year: 2014.

Abstract: This work presents two new score functions based on the Bayesian Dirichlet equivalent uniform (BDeu) score for learning Bayesian network structures. They consider the sensitivity of BDeu to varying parameters of the Dirichlet prior. The scores take on the most adversarial and the most beneficial priors among those within a contamination set around the symmetric one. We build these scores in such a way that they are decomposable and can be computed efficiently. Because of that, they can be integrated into any state-of-the-art structure learning method that explores the space of directed acyclic graphs and allows decomposable scores. Empirical results suggest that our scores outperform the standard BDeu score in terms of the likelihood of unseen data and in terms of edge discovery with respect to the true network, at least when the training sample size is small. We discuss the relation between these new scores and the accuracy of inferred models. Moreover, our new criteria can be used to identify the amount of data after which learning is saturated, that is, additional data are of little help to improve the resulting model.
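
As a minimal illustration of what the BDeu score computes for one node given its parents, here is a sketch; the sweep over equivalent sample sizes at the end is only a crude, hypothetical stand-in for the paper's contamination set around the symmetric prior, and all names are made up.

```python
from math import lgamma

import numpy as np

def bdeu_family_score(counts, ess):
    """Log BDeu contribution of one node given its parents.

    counts: (parent configurations x child states) frequency table N_jk;
    ess: equivalent sample size of the symmetric Dirichlet prior.
    """
    counts = np.asarray(counts, float)
    q, r = counts.shape
    a_j, a_jk = ess / q, ess / (q * r)   # BDeu hyper-parameters
    score = 0.0
    for j in range(q):
        score += lgamma(a_j) - lgamma(a_j + counts[j].sum())
        for k in range(r):
            score += lgamma(a_jk + counts[j, k]) - lgamma(a_jk)
    return score

counts = [[10, 2], [3, 9]]
# The score is sensitive to the prior: sweeping the equivalent sample size
# and keeping the extremes mimics, very roughly, the min/max idea.
scores = [bdeu_family_score(counts, ess) for ess in (0.1, 1.0, 10.0, 100.0)]
min_score, max_score = min(scores), max(scores)
```

The gap between `min_score` and `max_score` shows the prior sensitivity that motivates the two new criteria; it shrinks as the counts grow, which matches the saturation diagnostic mentioned in the abstract.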

Published in van der Gaag, L., Feelders, A. (Eds), PGM 2014. Lecture Notes in Computer Science 8754, Springer, 426–441.

A Bayesian Wilcoxon signed-rank test based on the Dirichlet process

Authors: Alessio Benavoli, Francesca Mangili, Giorgio Corani, Marco Zaffalon and Fabrizio Ruggeri.
Year: 2014.

Abstract: Bayesian methods are ubiquitous in machine learning. Nevertheless, the analysis of empirical results is typically performed by frequentist tests. This implies dealing with null hypothesis significance tests and p-values, even though the shortcomings of such methods are well known. We propose a nonparametric Bayesian version of the Wilcoxon signed-rank test using a Dirichlet process (DP)-based prior. We address in two different ways the problem of how to choose the infinite-dimensional parameter that characterizes the DP. The proposed test has all the traditional strengths of the Bayesian approach; for instance, unlike the frequentist tests, it allows verifying the null hypothesis, not only rejecting it, and taking decisions that minimize the expected loss. Moreover, one of the solutions proposed to model the infinite-dimensional parameter of the DP allows isolating instances in which the traditional frequentist test is guessing at random. We show results dealing with the comparison of two classifiers using real and simulated data.
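
A Monte Carlo sketch of such a test in the prior-free limit, where the DP posterior reduces to Dirichlet weights over the observed paired differences; this is a simplification for illustration, not the paper's full model, and the function name is an assumption.

```python
import numpy as np

def bayes_signed_rank(x, y, n_samples=5000, seed=0):
    """Posterior probability that theta = P((Z + Z') / 2 > 0) exceeds 1/2,
    for paired differences Z = X - Y, in the limit where the DP posterior
    reduces to Dirichlet weights over the observations (a simplification
    of the full model)."""
    rng = np.random.default_rng(seed)
    z = np.asarray(x, float) - np.asarray(y, float)
    ind = ((z[:, None] + z[None, :]) > 0).astype(float)  # Walsh-average signs
    w = rng.dirichlet(np.ones(len(z)), size=n_samples)
    theta = np.einsum('ij,jk,ik->i', w, ind, w)          # per-sample theta
    return float(np.mean(theta > 0.5))

# e.g. accuracies of two classifiers over six cross-validation folds
p = bayes_signed_rank([5.1, 4.8, 5.6, 5.3, 4.9, 5.4],
                      [4.2, 4.5, 4.1, 4.4, 4.0, 4.3])
```

Unlike a p-value, `p` is a posterior probability of the hypothesis itself, so values near 1 support it, values near 0 support its negation, and intermediate values quantify genuine indecision.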

Published in ICML 2014. PMLR 32, JMLR.org, 1026–1034.

Comments on "Imprecise probability models for learning multinomial distributions from data. Applications to learning credal networks" by Andrés R. Masegosa and Serafín Moral

Authors: Marco Zaffalon and Giorgio Corani.
Year: 2014.

Abstract: We briefly overview the problem of learning probabilities from data using imprecise probability models that express very weak prior beliefs. Then we comment on the new contributions to this question given in the paper by Masegosa and Moral and provide some insights about the performance of their models in data mining experiments of classification.

Published in International Journal of Approximate Reasoning 55(7), 1579–1600.

Probabilistic graphical models

Authors: Alessandro Antonucci, Cassio P. de Campos and Marco Zaffalon.
Year: 2014.

Abstract: This report presents probabilistic graphical models that are based on imprecise probabilities using a simplified language. In particular, the discussion is focused on credal networks and discrete domains. It describes the building blocks of credal networks, algorithms to perform inference, and discusses complexity results and related work. The goal is to have an easy-to-follow introduction to the topic.

Published in Augustin, T., Coolen, F., de Cooman, G., Troffaes, M. (Eds), Introduction to Imprecise Probabilities, Wiley, chapter 9.

Classification

Authors: Giorgio Corani, Joaquín Abellán, Andrés Masegosa, Serafín Moral and Marco Zaffalon.
Year: 2014.

Abstract: This report presents an introduction to credal classification. The discussion focuses on the naive credal classifier and its extensions as well as on credal classifiers based on classification trees. In addition we discuss the metrics suitable for scoring credal classifiers and present some experiments. The goal is to have an easy-to-follow introduction to the topic.

Published in Augustin, T., Coolen, F., de Cooman, G., Troffaes, M. (Eds), Introduction to Imprecise Probabilities, Wiley, chapter 10.

On the complexity of solving polytree-shaped limited memory influence diagrams with binary variables

Authors: Denis D. Mauá, Cassio P. de Campos and Marco Zaffalon.
Year: 2013.

Abstract: Influence diagrams are intuitive and concise representations of structured decision problems. When the problem is non-Markovian, an optimal strategy can be exponentially large in the size of the diagram. We can avoid the inherent intractability by constraining the size of admissible strategies, giving rise to limited memory influence diagrams. A valuable question is then how small strategies need to be to enable efficient optimal planning. Arguably, the smallest strategies one can conceive simply prescribe an action for each time step, without considering past decisions or observations. Previous work has shown that finding such optimal strategies even for polytree-shaped diagrams with ternary variables and a single value node is NP-hard, but the case of binary variables was left open. In this paper we address such a case, by first noting that optimal strategies can be obtained in polynomial time for polytree-shaped diagrams with binary variables and a single value node. We then show that the same problem is NP-hard if the diagram has multiple value nodes. These two results close the fixed-parameter complexity analysis of optimal strategy selection in influence diagrams parametrized by the shape of the diagram, the number of value nodes and the maximum variable cardinality.

Published in Artificial Intelligence 205, 30–38.

Discovering subgroups of patients from DNA copy number data using NMF on compacted matrices

Authors: Cassio Polpo de Campos, Paola Maria Vittoria Rancoita, Ivo Kwee, Emanuele Zucca, Marco Zaffalon and Francesco Bertoni.
Year: 2013.

Abstract: In the study of complex genetic diseases, the identification of subgroups of patients sharing similar genetic characteristics represents a challenging task, for example, to improve treatment decision. One type of genetic lesion, frequently investigated in such disorders, is the change of the DNA copy number (CN) at specific genomic traits. Non-negative Matrix Factorization (NMF) is a standard technique to reduce the dimensionality of a data set and to cluster data samples, while keeping its most relevant information in meaningful components. Thus, it can be used to discover subgroups of patients from CN profiles. It is, however, computationally impractical for very high dimensional data, such as CN microarray data. Deciding the most suitable number of subgroups is also a challenging problem. The aim of this work is to derive a procedure to compact high dimensional data, in order to improve NMF applicability without compromising the quality of the clustering. This is particularly important for analyzing high-resolution microarray data. Many commonly used quality measures, as well as our own measures, are employed to decide the number of subgroups and to assess the quality of the results. Our measures are based on the idea of identifying robust subgroups, inspired by biological/clinical relevance instead of simply aiming at well-separated clusters. We evaluate our procedure using four real independent data sets. In these data sets, our method was able to find accurate subgroups with individual molecular and clinical features and outperformed the standard NMF in terms of accuracy in the factorization fitness function. Hence, it can be useful for the discovery of subgroups of patients with similar CN profiles in the study of heterogeneous diseases.
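
The clustering step can be illustrated with a plain multiplicative-update NMF (the classic Lee-Seung rules) on a toy matrix; this is a generic sketch under invented data, not the compaction procedure or the quality measures of the paper.

```python
import numpy as np

def nmf(V, k, n_iter=500, seed=0, eps=1e-9):
    """Multiplicative-update NMF (Lee-Seung): V ~ W @ H with V, W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W, H = rng.random((n, k)) + eps, rng.random((k, m)) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update component profiles
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update patient loadings
    return W, H

# Toy "copy-number" matrix: rows are patients, columns genomic regions,
# with two obvious subgroups (hypothetical data, not from the paper).
V = np.array([[5, 5, 0, 0],
              [4, 6, 0, 1],
              [0, 0, 5, 5],
              [1, 0, 6, 4]], float)
W, H = nmf(V, k=2)
subgroups = W.argmax(axis=1)   # assign each patient to its dominant component
```

Each row of `W` says how strongly a patient loads on each component, so taking the arg-max per row yields the subgroup assignment; choosing `k` is exactly the model-selection problem the abstract's quality measures address.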

Published in PLoS ONE 8(11), e79720.

CREDO: a military decision-support system based on credal networks

Authors: Alessandro Antonucci, David Huber, Marco Zaffalon, Philippe Luginbühl, Ian Chapman and Richard Ladouceur.
Year: 2013.

Abstract: A software tool especially designed for military domains to create and query decision-support systems is presented. Credal networks, which are Bayesian networks whose parameters have the freedom to vary in convex sets, are used to model the relations among the system variables. A novel elicitation procedure of these sets, which allows the military experts to report their knowledge by purely qualitative judgements, is proposed. Two high-level fusion procedures to cope with multiple experts in this framework are also derived. All these features are supported by the software and demonstrated in an application to space security tested during the last NATO multinational experiment.

Published in FUSION 2013.

Computing the conglomerable natural extension

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2013.

Abstract: Given a coherent lower prevision P, we consider the problem of computing the smallest coherent lower prevision F ≥ P that is conglomerable, in case it exists. F is called the conglomerable natural extension. Past work has shown that F can be approximated by an increasing sequence (E_n)_{n∈ℕ} of coherent lower previsions. We close an open problem by showing that this sequence can be made of infinitely many distinct elements. Moreover, we give sufficient conditions, of quite broad applicability, to make sure that the point-wise limit of the sequence is F in case P is the lower envelope of finitely many linear previsions. In addition, we study the question of the existence of F and its relationship with the notion of marginal extension.

Published in Cozman, F., Denoeux, T., Destercke, S., Seidenfeld, T. (Eds), ISIPTA '13. SIPTA, 255–264.

A revised and updated version of this conference paper is On the problem of computing the conglomerable natural extension.

Conglomerable coherence

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2013.

Abstract: We contrast Williams' and Walley's theories of coherent lower previsions in the light of conglomerability. These are two of the most credited approaches to a behavioural theory of imprecise probability. Conglomerability is the notion that distinguishes them the most: Williams' theory does not consider it, while Walley aims at embedding it in his theory. This question is important, as conglomerability has been a major point of disagreement at the foundations of probability ever since it was first defined by de Finetti in 1930. We show that Walley's notion of joint coherence (which is the single axiom of his theory) for conditional lower previsions does not take all the implications of conglomerability into account. Considering also some previous results in the literature, we deduce that Williams' theory should be the one to use when conglomerability is not required; for the opposite case, we define the new theory of conglomerably coherent lower previsions, which is arguably the one to use, and of which Walley's theory can be understood as an approximation. We show that this approximation is exact in two important cases: when all conditioning events have positive lower probability, and when conditioning partitions are nested.

Published in International Journal of Approximate Reasoning 54(9), 1322–1350.

Approximating credal network inferences by linear programming

Authors: Alessandro Antonucci, Cassio Polpo de Campos, David Huber and Marco Zaffalon.
Year: 2013.

Abstract: An algorithm for approximate credal network updating is presented. The problem in its general formulation is a multilinear optimization task, which can be linearized by an appropriate rule for fixing all the local models apart from those of a single variable. This simple idea can be iterated and quickly leads to very accurate inferences. The approach can also be specialized to classification with credal networks based on the maximality criterion. A complexity analysis for both the problem and the algorithm is reported together with numerical experiments, which confirm the good performance of the method. While the inner approximation produced by the algorithm gives rise to a classifier which might return a subset of the optimal class set, preliminary empirical results suggest that the accuracy of the optimal class set is seldom affected by the approximate probabilities.

Published in van der Gaag, L. C. (Ed.), ECSQARU 2013. Lecture Notes in Computer Science 7958, Springer, 13–25. Best paper award.

A revised and updated version of this conference paper is Approximate credal network updating by linear programming with applications to decision making.

Probability and time

Authors: Marco Zaffalon and Enrique Miranda.
Year: 2013.

Abstract: Probabilistic reasoning is often attributed a temporal meaning, in which conditioning is regarded as a normative rule to compute future beliefs out of current beliefs and observations. However, the well-established 'updating interpretation' of conditioning is not concerned with beliefs that evolve in time, and in particular with future beliefs. On the other hand, a temporal justification of conditioning was proposed already by De Moivre and Bayes, by requiring that current and future beliefs be consistent. We reconsider the latter approach while dealing with a generalised version of the problem, using a behavioural theory of imprecise probability in the form of coherent lower previsions as well as of coherent sets of desirable gambles, and letting the possibility space be finite or infinite. We obtain that using conditioning is normative, in the imprecise case, only if one establishes future behavioural commitments at the same time as current beliefs. In this case it is also normative that present beliefs be conglomerable, which is a result that touches on a long-term controversy at the foundations of probability. In the remaining case, where one commits to some future behaviour after establishing present beliefs, we characterise the several possibilities to define consistent future assessments; this shows in particular that temporal consistency does not preclude changes of mind. And yet, our analysis does not support that rationality requires consistency in general, even though pursuing consistency makes sense and is useful, at least as a way to guide and evaluate the assessment process.
These considerations narrow down in the special case of precise probability, because this formalism cannot distinguish the two different situations illustrated above: it turns out that the only consistent rule is conditioning and moreover that it is not rational to be willing to stick to precise probability while using a rule different from conditioning to compute future beliefs; rationality requires in addition the disintegrability of the present-time probability.

Published in Artificial Intelligence 198, 1–51.

Density-ratio robustness in dynamic state estimation

Authors: Alessio Benavoli and Marco Zaffalon.
Year: 2013.

Abstract: The filtering problem is addressed by taking into account imprecision in the knowledge about the probabilistic relationships involved. Imprecision is modelled in this paper by a particular closed convex set of probabilities known by the name of density ratio class or constant odds-ratio (COR) model. The contributions of this paper are the following. First, we shall define an optimality criterion based on the squared-loss function for the estimates derived from a general closed convex set of distributions. Second, after reviewing the properties of the density ratio class in the context of parametric estimation, we shall extend these properties to state estimation accounting for system dynamics. Furthermore, for the case in which the nominal density of the COR model is a multivariate Gaussian, we shall derive closed-form solutions for the set of optimal estimates and for the credible region. Third, we discuss how to perform Monte Carlo integrations to compute lower and upper expectations from a COR set of densities. Then we shall derive a procedure that, employing Monte Carlo sampling techniques, allows us to propagate in time both the lower and upper state expectation functionals and, thus, to derive an efficient solution of the filtering problem. Finally, we empirically compare the proposed estimator with the Kalman filter. This shows that our solution is more robust to the presence of modelling errors in the system and, hence, appears to be a more realistic approach than the Kalman filter in such a case.

Published in Mechanical Systems and Signal Processing 37, 54–75.

The complexity of approximately solving influence diagrams

Authors: Denis D. Mauá, Cassio P. de Campos and Marco Zaffalon.
Year: 2012.

Abstract: Influence diagrams allow for intuitive and yet precise description of complex situations involving decision making under uncertainty. Unfortunately, most of the problems described by influence diagrams are hard to solve. In this paper we discuss the complexity of approximately solving influence diagrams. We do not assume no-forgetting or regularity, which makes the class of problems we address very broad. Remarkably, we show that when both the treewidth and the cardinality of the variables are bounded the problem admits a fully polynomial-time approximation scheme.

Published in de Freitas, N., Murphy, K. P. (Eds), UAI 2012, 604–613.

Updating credal networks is approximable in polynomial time

Authors: Denis D. Mauá, Cassio P. de Campos and Marco Zaffalon.
Year: 2012.

Abstract: Credal networks relax the precise probability requirement of Bayesian networks, enabling a richer representation of uncertainty in the form of closed convex sets of probability measures. The increase in expressiveness comes at the expense of higher computational costs. In this paper, we present a new variable elimination algorithm for exactly computing posterior inferences in extensively specified credal networks, which is empirically shown to outperform a state-of-the-art algorithm. The algorithm is then turned into a provably good approximation scheme, that is, a procedure that for any input is guaranteed to return a solution not worse than the optimum by a given factor. Remarkably, we show that when the networks have bounded treewidth and bounded number of states per variable the approximation algorithm runs in time polynomial in the input size and in the inverse of the error factor, thus being the first known fully polynomial-time approximation scheme for inference in credal networks.

Published in International Journal of Approximate Reasoning 53(8), 1183–1199.

Conglomerable natural extension

Authors: Enrique Miranda, Marco Zaffalon and Gert de Cooman.
Year: 2012.

Abstract: At the foundations of probability theory lies a question that has been open since de Finetti framed it in 1930: whether or not an uncertainty model should be required to be conglomerable. Conglomerability is related to accepting infinitely many conditional bets. Walley is one of the authors who have argued in favor of conglomerability, while de Finetti rejected the idea. In this paper we study the extension of the conglomerability condition to two types of uncertainty models that are more general than the ones envisaged by de Finetti: sets of desirable gambles and coherent lower previsions. We focus in particular on the weakest (i.e., the least-committal) of those extensions, which we call the conglomerable natural extension. The weakest extension that does not take conglomerability into account is simply called the natural extension. We show that taking the natural extension of assessments after imposing conglomerability—the procedure adopted in Walley's theory—does not yield, in general, the conglomerable natural extension (but it does so in the case of the marginal extension). Iterating this process of imposing conglomerability and taking the natural extension produces a sequence of models that approach the conglomerable natural extension, although it is not known, at this point, whether this sequence converges to it. We give sufficient conditions for this to happen in some special cases, and study the differences between working with coherent sets of desirable gambles and coherent lower previsions. Our results indicate that it is necessary to rethink the foundations of Walley's theory of coherent lower previsions for infinite partitions of conditioning events.

Published in International Journal of Approximate Reasoning 53(8), 1200–1227.

Evaluating credal classifiers by utility-discounted predictive accuracy

Authors: Marco Zaffalon, Giorgio Corani and Denis D. Mauá.
Year: 2012.

Abstract: Predictions made by imprecise-probability models are often indeterminate (that is, set-valued). Measuring the quality of an indeterminate prediction by a single number is important to fairly compare different models, but a principled approach to this problem is currently missing. In this paper we derive, from a set of assumptions, a metric to evaluate the predictions of credal classifiers. These are supervised learning models that issue set-valued predictions. The metric turns out to be made of an objective component, and another that is related to the decision-maker's degree of risk aversion to the variability of predictions. We discuss when the measure can be rendered independent of such a degree, and provide insights as to how the comparison of classifiers based on the new measure changes with the number of predictions to be made. Finally, we make extensive empirical tests of credal, as well as precise, classifiers by using the new metric. This shows the practical usefulness of the metric, while yielding a first insightful and extensive comparison of credal classifiers.
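
A sketch of the metric on a toy prediction. The discounted accuracy of a set-valued prediction is 1/|set| when the set contains the true class; the quadratic utilities below, commonly referred to as u65 and u80 in connection with this work, then reward a correct indeterminate answer more than random guessing but less than a determinate correct one. Treat the exact coefficients as an assumption of this sketch.

```python
def discounted_accuracy(pred_set, true_class):
    """1/|set| if the set-valued prediction contains the true class, else 0."""
    return 1.0 / len(pred_set) if true_class in pred_set else 0.0

def u65(x):
    """Quadratic utility giving 0.65 to a correct binary indeterminate output."""
    return -0.6 * x ** 2 + 1.6 * x

def u80(x):
    """Quadratic utility giving 0.80 to a correct binary indeterminate output."""
    return -1.2 * x ** 2 + 2.2 * x

da = discounted_accuracy({'a', 'b'}, 'a')   # 0.5: correct but indeterminate
```

The choice between u65 and u80 encodes the decision-maker's risk aversion to the variability of predictions, which is the subjective component the abstract mentions.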

Published in International Journal of Approximate Reasoning 53(8), 1282–1301.

Conglomerable coherent lower previsions

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2013.

Abstract: Walley's theory of coherent lower previsions builds upon the earlier theory by Williams with the explicit aim of making it deal with conglomerability. We show that such a construction has been only partly successful because Walley's founding axiom of joint coherence does not entirely capture the implications of conglomerability. As a way to fully achieve Walley's original aim, we propose then the new theory of conglomerable coherent lower previsions. We show that Walley's theory coincides with ours when all conditioning events have positive lower probability, or when conditioning partitions are nested.

Published in Kruse, R., Berthold, M. R., Moewes, C., Gil, M. A., Grzegorzewski, P., Hryniewicz, O. (Eds), SMPS 2012. Synergies of Soft Computing and Statistics for Intelligent Data Analysis, Advances in Intelligent and Soft Computing 190, Springer Berlin Heidelberg, 419–427.

A revised and updated version of this conference paper is Conglomerable coherence.

Solving limited memory influence diagrams

Authors: Denis D. Mauá, Cassio P. de Campos and Marco Zaffalon.
Year: 2012.

Abstract: We present a new algorithm for exactly solving decision making problems represented as influence diagrams. We do not require the usual assumptions of no forgetting and regularity; this allows us to solve problems with simultaneous decisions and limited information. The algorithm is empirically shown to outperform a state-of-the-art algorithm on randomly generated problems of up to 150 variables and 10^64 solutions. We show that these problems are NP-hard even if the underlying graph structure of the problem has low treewidth and the variables take on a bounded number of states, and that they admit no provably good approximation if variables can take on an arbitrary number of states.

Published in Journal of Artificial Intelligence Research 44, 97–140.

A model of prior ignorance for inferences in the one-parameter exponential family

Authors: Alessio Benavoli and Marco Zaffalon.
Year: 2012.

Abstract: This paper proposes a model of prior ignorance about a scalar variable based on a set of distributions M. In particular, a set of minimal properties that a set M of distributions should satisfy to be a model of prior ignorance without producing vacuous inferences is defined. In the case where the likelihood model corresponds to a one-parameter exponential family of distributions, it is shown that the above minimal properties are equivalent to a special choice of the domains for the parameters of the conjugate exponential prior. This makes it possible to define the largest (that is, the least-committal) set of conjugate priors M that satisfies the above properties. The obtained set M is a model of prior ignorance with respect to the functions (queries) that are commonly used for statistical inferences; it is easy to elicit and, because of conjugacy, tractable; it encompasses frequentist inferences and the so-called objective Bayesian inferences with improper priors. An application of the model to a problem of inference with count data is presented.
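
Specialised to Bernoulli data, a set of conjugate priors of this kind reduces to the well-known imprecise beta model, which makes the near-ignorance and learning properties easy to see; the function below is an illustrative sketch, not code from the paper.

```python
def imprecise_beta_posterior_mean(successes, n, s=1.0):
    """Lower/upper posterior means for a Bernoulli chance under the set of
    Beta(s*t, s*(1-t)) priors with t ranging over (0, 1): the conjugate
    construction specialised to Bernoulli data (the imprecise beta model).
    """
    lower = successes / (s + n)
    upper = (successes + s) / (s + n)
    return lower, upper

vacuous = imprecise_beta_posterior_mean(0, 0)    # (0.0, 1.0): prior near-ignorance
interval = imprecise_beta_posterior_mean(7, 10)  # shrinks around 7/10 as data grow
```

Before any data the interval of posterior means is the whole unit interval (near-ignorance), while with growing n it contracts around the empirical frequency, illustrating the learning and convergence properties the abstract requires.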

Published in Journal of Statistical Planning and Inference 142(7), 1960–1979.
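To get a concrete feel for such near-ignorance sets of conjugate priors, here is a minimal illustrative sketch (mine, not the paper's code) of the Bernoulli special case, where the model reduces to Walley-style imprecise-Beta posterior bounds with fixed prior strength s and prior means sweeping the whole unit interval:

```python
def posterior_mean_bounds(successes: int, trials: int, s: float = 2.0):
    """Lower and upper posterior means of a Bernoulli chance under a
    near-ignorance set of Beta priors of prior strength s."""
    lower = successes / (trials + s)
    upper = (successes + s) / (trials + s)
    return lower, upper

# Before seeing data the posterior-mean interval is vacuous, (0, 1);
# it shrinks towards the empirical frequency as observations accrue.
assert posterior_mean_bounds(0, 0) == (0.0, 1.0)
lo, up = posterior_mean_bounds(70, 100)   # interval around 0.7
```

The interval width 2/(trials + s) makes explicit how the imprecision of the near-ignorance model vanishes with the sample size, which is the non-vacuous learning behaviour the paper is after.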

Bayesian networks with imprecise probabilities: theory and application to classification

Authors: Giorgio Corani, Alessandro Antonucci and Marco Zaffalon.
Year: 2012.

Abstract: Bayesian networks are powerful probabilistic graphical models for modelling uncertainty. Among others, classification represents an important application: some of the most used classifiers are based on Bayesian networks. Bayesian networks are precise models: exact numeric values should be provided for quantification. This requirement is sometimes too narrow. Sets instead of single distributions can provide a more realistic description in these cases. Bayesian networks can be generalized to cope with sets of distributions. This leads to a novel class of imprecise probabilistic graphical models, called credal networks. In particular, classifiers based on Bayesian networks are generalized to so-called credal classifiers. Unlike Bayesian classifiers, which always detect a single class as the one maximizing the posterior class probability, a credal classifier may be unable to discriminate a single class. In other words, if the available information is not sufficient, credal classifiers allow for indecision between two or more classes, thus providing a less informative but more robust conclusion than Bayesian classifiers.

Published in Holmes, D. E., Jain, L. C. (Eds), Data Mining: Foundations and Intelligent Paradigms (Volume 1: Clustering, Association and Classification). Springer-Verlag, Berlin, 49–93.

A discussion on learning and prior ignorance for sets of priors in the one-parameter exponential family

Authors: Alessio Benavoli and Marco Zaffalon.
Year: 2011.

Abstract: For a conjugate likelihood-prior model in the one-parameter exponential family of distributions, we show that, by letting the parameters of the conjugate exponential prior vary in suitable sets, it is possible to define a set of conjugate priors M that guarantees prior near-ignorance without producing vacuous inferences. This result is obtained following both a behavioural and a sensitivity analysis interpretation of prior near-ignorance. We also discuss the problem of the incompatibility of learning and prior near-ignorance for sets of priors in the one-parameter exponential family of distributions in the case of imperfect observations. In particular, we prove that learning and prior near-ignorance are compatible under an imperfect observation mechanism if and only if the support of the priors in M is the whole real axis.

Published in Coolen, F., de Cooman, G., Fetz, T., Oberguggenberger, M. (Eds), ISIPTA '11. SIPTA, 69–78.

A revised and updated version of this conference paper is A model of prior ignorance for inferences in the one-parameter exponential family.

A fully polynomial time approximation scheme for updating credal networks of bounded treewidth and number of variable states

Authors: Denis D. Mauá, Cassio P. de Campos and Marco Zaffalon.
Year: 2011.

Abstract: Credal networks lift the precise probability assumption of Bayesian networks, enabling a richer representation of uncertainty in the form of closed convex sets of probability measures. The increase in expressiveness comes at the expense of higher computational costs. In this paper we present a new algorithm which is an extension of the well-known variable elimination algorithm for computing posterior inferences in extensively specified credal networks. The algorithm efficiency is empirically shown to outperform a state-of-the-art algorithm. We then provide the first fully polynomial time approximation scheme for inference in credal networks with bounded treewidth and number of states per variable.

Published in Coolen, F., de Cooman, G., Fetz, T., Oberguggenberger, M. (Eds), ISIPTA '11. SIPTA, 277–286.

A revised and updated version of this conference paper is Updating credal networks is approximable in polynomial time.

Conglomerable natural extension

Authors: Enrique Miranda, Marco Zaffalon and Gert de Cooman.
Year: 2011.

Abstract: We study the weakest conglomerable model that is implied by desirability or probability assessments: the conglomerable natural extension. We show that taking the natural extension of the assessments while imposing conglomerability—the procedure adopted in Walley's theory—does not yield, in general, the conglomerable natural extension (but it does so in the case of the marginal extension). Iterating this process produces a sequence of models that approach the conglomerable natural extension, although it is not known, at this point, whether it is attained in the limit. We give sufficient conditions for this to happen in some special cases, and study the differences between working with coherent sets of desirable gambles and coherent lower previsions. Our results indicate that it might be necessary to re-think the foundations of Walley's theory of coherent conditional lower previsions for infinite partitions of conditioning events.

Published in Coolen, F., de Cooman, G., Fetz, T., Oberguggenberger, M. (Eds), ISIPTA '11. SIPTA, 287–296.

A revised and updated version of this conference paper is Conglomerable natural extension.

Utility-based accuracy measures to empirically evaluate credal classifiers

Authors: Marco Zaffalon, Giorgio Corani and Denis D. Mauá.
Year: 2011.

Abstract: Predictions made by imprecise-probability models are often indeterminate (that is, set-valued). Measuring the quality of an indeterminate prediction by a single number is important to fairly compare different models, but a principled approach to this problem is currently missing. In this paper we derive a measure to evaluate the predictions of credal classifiers from a set of assumptions. The measure turns out to be made of an objective component, and another that is related to the decision-maker's degree of risk-aversion. We discuss when the measure can be rendered independent of such a degree, and provide insights as to how the comparison of classifiers based on the new measure changes with the number of predictions to be made. Finally, we empirically study the behavior of the proposed measure.

Published in Coolen, F., de Cooman, G., Fetz, T., Oberguggenberger, M. (Eds), ISIPTA '11. SIPTA, 401–410.

A revised and updated version of this conference paper is Evaluating credal classifiers by utility-discounted predictive accuracy.

Independent natural extension

Authors: Gert de Cooman, Enrique Miranda and Marco Zaffalon.
Year: 2011.

Abstract: There is no unique extension of the standard notion of probabilistic independence to the case where probabilities are indeterminate or imprecisely specified. Epistemic independence is an extension that formalises the intuitive idea of mutual irrelevance between different sources of information. This gives epistemic independence very wide scope as well as appeal: this interpretation of independence is often taken as natural also in precise-probabilistic contexts. Nevertheless, epistemic independence has received little attention so far. This paper develops the foundations of this notion for variables assuming values in finite spaces. We define (epistemically) independent products of marginals (or possibly conditionals) and show that there always is a unique least-committal such independent product, which we call the independent natural extension. We supply an explicit formula for it, and study some of its properties, such as associativity, marginalisation and external additivity, which are basic tools to work with the independent natural extension. Additionally, we consider a number of ways in which the standard factorisation formula for independence can be generalised to an imprecise-probabilistic context. We show, under some mild conditions, that when the focus is on least-committal models, using the independent natural extension is equivalent to imposing a so-called strong factorisation property. This is an important outcome for applications as it gives a simple tool to make sure that inferences are consistent with epistemic independence judgements. We discuss the potential of our results for applications in Artificial Intelligence by recalling recent work by some of us, where the independent natural extension was applied to graphical models. It has allowed, for the first time, the development of an exact linear-time algorithm for the imprecise probability updating of credal trees.

Published in Artificial Intelligence 175, 1911–1950.

Robust filtering through coherent lower previsions

Authors: Alessio Benavoli, Marco Zaffalon and Enrique Miranda.
Year: 2011.

Abstract: The classical filtering problem is re-examined to take into account imprecision in the knowledge about the probabilistic relationships involved. Imprecision is modeled in this paper by closed convex sets of probabilities. We derive a solution of the state estimation problem under such a framework that is very general: it can deal with any closed convex set of probability distributions used to characterize uncertainty in the prior, likelihood, and state transition models. This is made possible by formulating the theory directly in terms of coherent lower previsions, that is, of the lower envelopes of the expectations obtained from the set of distributions. The general solution is specialized to two particular classes of coherent lower previsions. The first consists of a family of Gaussian distributions whose means are only known to belong to an interval. The second is the so-called linear-vacuous mixture model, which is a family made of convex combinations of a known nominal distribution (e.g., a Gaussian) with arbitrary distributions. For the latter case, we empirically compare the proposed estimator with the Kalman filter. This shows that our solution is more robust to the presence of modelling errors in the system and that, hence, appears to be a more realistic approach than the Kalman filter in such a case.

Published in IEEE Transactions on Automatic Control 56(7), 1567–1581.
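The linear-vacuous mixture class mentioned in the abstract admits closed-form lower and upper expectations, which a toy finite-state sketch can illustrate (illustrative code under the stated contamination model, not the filtering algorithm of the paper): the set {(1 − ε)P₀ + εQ : Q arbitrary} gives lower expectation (1 − ε)E₀[f] + ε·min f for a gamble f, and the upper expectation replaces min with max.

```python
def lv_lower_expectation(f, p0, eps):
    """Lower expectation of gamble f (list of values) under the
    linear-vacuous mixture of a nominal mass function p0."""
    e0 = sum(fi * pi for fi, pi in zip(f, p0))
    return (1 - eps) * e0 + eps * min(f)

def lv_upper_expectation(f, p0, eps):
    """Conjugate upper expectation: contamination pushed towards max(f)."""
    e0 = sum(fi * pi for fi, pi in zip(f, p0))
    return (1 - eps) * e0 + eps * max(f)

f = [0.0, 1.0, 4.0]     # a gamble on three states
p0 = [0.5, 0.3, 0.2]    # nominal (e.g. Gaussian-derived) mass function
eps = 0.1               # contamination level
e0 = sum(fi * pi for fi, pi in zip(f, p0))
assert lv_lower_expectation(f, p0, eps) <= e0 <= lv_upper_expectation(f, p0, eps)
```

The gap between the two bounds grows with ε, which is the mechanism by which the filter of the paper remains cautious when the nominal model may be wrong.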

Notes on desirability and conditional lower previsions

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2010.

Abstract: We detail the relationship between sets of desirable gambles and conditional lower previsions. The former is one of the most general models of uncertainty. The latter corresponds to Walley's celebrated theory of imprecise probability. We consider two avenues: when a collection of conditional lower previsions is derived from a set of desirable gambles, and the converse. In either case, we relate the properties of the derived model with those of the originating one. Our results constitute basic tools to move from one formalism to the other, and thus to take advantage of work done on the two fronts.

Published in Annals of Mathematics and Artificial Intelligence 60(3–4), 251–309.

Epistemic irrelevance in credal nets: the case of imprecise Markov trees

Authors: Gert de Cooman, Filip Hermans, Alessandro Antonucci and Marco Zaffalon.
Year: 2010.

Abstract: We focus on credal nets, which are graphical models that generalise Bayesian nets to imprecise probability. We replace the notion of strong independence commonly used in credal nets with the weaker notion of epistemic irrelevance, which is arguably more suited for a behavioural theory of probability. Focusing on directed trees, we show how to combine the given local uncertainty models in the nodes of the graph into a global model, and we use this to construct and justify an exact message-passing algorithm that computes updated beliefs for a variable in the tree. The algorithm, which is linear in the number of nodes, is formulated entirely in terms of coherent lower previsions, and is shown to satisfy a number of rationality requirements. We supply examples of the algorithm's operation, and report an application to on-line character recognition that illustrates the advantages of our approach for prediction. We comment on the perspectives, opened by the availability, for the first time, of a truly efficient algorithm based on epistemic irrelevance.

Published in International Journal of Approximate Reasoning 51(9), 1029–1052.

Inference and risk measurement with the pari-mutuel model

Authors: Renato Pelessoni, Paolo Vicig and Marco Zaffalon.
Year: 2010.

Abstract: We explore generalizations of the pari-mutuel model (PMM), a formalization of an intuitive way of assessing an upper probability from a precise one. We discuss a naive extension of the PMM considered in insurance, compare the PMM with a related model, the Total Variation Model, and generalize the natural extension of the PMM introduced by P. Walley and other related formulae. The results are subsequently given a risk measurement interpretation: in particular it is shown that a known risk measure, Tail Value at Risk (TVaR), is derived from the PMM, and a coherent risk measure more general than TVaR from its imprecise version. We analyze further the conditions for coherence of a related risk measure, Conditional Tail Expectation. Conditioning with the PMM is investigated too, computing its natural extension, characterising its dilation and studying the weaker concept of imprecision increase.

Published in International Journal of Approximate Reasoning 51(9), 1145–1158.
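The basic PMM itself is simple enough to state in a few lines; a minimal sketch (illustrative only, following the standard definition with loading factor δ > 0):

```python
def pmm_upper(p0: float, delta: float) -> float:
    """PMM upper probability of an event with precise probability p0:
    min{(1 + delta) * p0, 1}."""
    return min((1 + delta) * p0, 1.0)

def pmm_lower(p0: float, delta: float) -> float:
    """Conjugate lower probability: max{(1 + delta) * p0 - delta, 0}."""
    return max((1 + delta) * p0 - delta, 0.0)

# The bounds are conjugate: lower(A) = 1 - upper(complement of A).
p0, delta = 0.3, 0.1
assert abs(pmm_lower(p0, delta) - (1 - pmm_upper(1 - p0, delta))) < 1e-12
```

The truncation at 0 and 1 is what distinguishes the PMM from a plain ε-contamination model and drives the dilation phenomena studied in the paper.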

Factorisation properties of the strong product

Authors: Gert de Cooman, Enrique Miranda and Marco Zaffalon.
Year: 2010.

Abstract: We investigate a number of factorisation conditions in the framework of sets of probability measures, or coherent lower previsions, with finite referential spaces. We show that the so-called strong product constitutes one way to combine a number of marginal coherent lower previsions into an independent joint lower prevision, and we prove that under some conditions it is the only independent product that satisfies the factorisation conditions.

Published in Borgelt, C., González Rodríguez, G., Trutschnig, W., Asunción Lubiano, M., Gil, M. A., Grzegorzewski, P., Hryniewicz, O. (Eds), SMPS 2010, Combining Soft Computing and Statistical Methods in Data Analysis, Advances in Intelligent and Soft Computing 77, 139–147.

A revised and updated version of this conference paper is Independent natural extension.

Independent natural extension

Authors: Gert de Cooman, Enrique Miranda and Marco Zaffalon.
Year: 2010.

Abstract: We introduce a general definition for the independence of a number of finite-valued variables, based on coherent lower previsions. Our definition has an epistemic flavour: it arises from personal judgements that a number of variables are irrelevant to one another. We show that a number of already existing notions, such as strong independence, satisfy our definition. Moreover, there always is a least-committal independent model, for which we provide an explicit formula: the independent natural extension. Our central result is that the independent natural extension satisfies so-called marginalisation, associativity and strong factorisation properties. These allow us to relate our research to more traditional ways of defining independence based on factorisation.

Published in Hüllermeier, E., Kruse, R., Hoffmann, F. (Eds), IPMU 2010, Computational Intelligence for Knowledge-Based Systems Design. Lecture Notes in Computer Science 6178, Springer, 737–746.

A revised and updated version of this conference paper is Independent natural extension.

Conditional models: coherence and inference through sequences of joint mass functions

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2010.

Abstract: We call a conditional model any set of statements made of conditional probabilities or expectations. We take conditional models as primitive compared to unconditional probability, in the sense that conditional statements do not need to be derived from an unconditional probability. We focus on two problems: (coherence) giving conditions to guarantee that a conditional model is self-consistent; (inference) delivering methods to derive new probabilistic statements from a self-consistent conditional model. We address these problems in the case where the probabilistic statements can be specified imprecisely through sets of probabilities, while restricting the attention to finite spaces of possibilities. Using Walley's theory of coherent lower previsions, we fully characterise the question of coherence, and specialise it for the case of precisely specified probabilities, which is the most common case addressed in the literature. This shows that coherent conditional models are equivalent to sequences of (possibly sets of) unconditional mass functions. In turn, this implies that the inferences from a conditional model are the limits of the conditional inferences obtained by applying Bayes' rule, when possible, to the elements of the sequence. In doing so, we unveil the tight connection between conditional models and zero-probability events.

Published in Journal of Statistical Planning and Inference 140(7), 1805–1833.

Generalized loopy 2U: a new algorithm for approximate inference in credal networks

Authors: Alessandro Antonucci, Yi Sun, Cassio Polpo de Campos and Marco Zaffalon.
Year: 2010.

Abstract: Credal networks generalize Bayesian networks by relaxing the requirement of precision of probabilities. Credal networks are considerably more expressive than Bayesian networks, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal networks. The algorithm is based on an important representation result we prove for general credal networks: that any credal network can be equivalently reformulated as a credal network with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal network is then updated by L2U, a loopy approximate algorithm for binary credal networks. Overall, we generalize L2U to non-binary credal networks, obtaining a scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences with respect to other state-of-the-art algorithms is evaluated by extensive numerical tests.

Published in International Journal of Approximate Reasoning 51(5), 474–484.

Building knowledge-based systems by credal networks: a tutorial

Authors: Alberto Piatti, Alessandro Antonucci and Marco Zaffalon.
Year: 2010.

Abstract: Knowledge-based systems are computer programs achieving expert-level competence in solving problems for specific task areas. This chapter is a tutorial on the implementation of this kind of systems in the framework of credal networks. Credal networks are a generalization of Bayesian networks where credal sets, i.e., closed convex sets of probability measures, are used instead of precise probabilities. This allows for a more flexible model of the knowledge, which can represent ambiguity, contrast and contradiction in a natural and realistic way. The discussion guides the reader through the different steps involved in the specification of a system, from the evocation and elicitation of the knowledge to the interaction with the system by adequate inference algorithms. Our approach is characterized by a sharp distinction between the domain knowledge and the process linking this knowledge to the perceived evidence, which we call the observational process. This distinction leads to a very flexible representation of both domain knowledge and knowledge about the way the information is collected, together with a technique to aggregate information coming from different sources. The overall procedure is illustrated throughout the chapter by a simple knowledge-based system for the prediction of the result of a football match.

Published in Baswell, A. R. (Ed), Advances in Mathematics Research 11, Nova Science Publishers, New York.

Lazy naive credal classifier

Authors: Giorgio Corani and Marco Zaffalon.
Year: 2009.

Abstract: We propose a local (or lazy) version of the naive credal classifier. The latter is an extension of naive Bayes to imprecise probability developed to issue reliable classifications despite small amounts of data, which may then be carrying highly uncertain information about a domain. Reliability is maintained because credal classifiers can issue set-valued classifications on instances that are particularly difficult to classify. We show by extensive experiments that the local classifier outperforms the original one, both in terms of accuracy of classification and because it leads to stronger conclusions (i.e., set-valued classifications made of fewer classes). By comparing the local credal classifier with a local version of naive Bayes, we also show that the former reliably deals with instances which are difficult to classify, unlike the local naive Bayes, which leads to fragile classifications.

Published in U'09: Proceedings of the First ACM SIGKDD International Workshop on Knowledge Discovery from Uncertain Data. ACM, 30–37.

Multiple model tracking by imprecise Markov trees

Authors: Alessandro Antonucci, Alessio Benavoli, Marco Zaffalon, Gert de Cooman and Filip Hermans.
Year: 2009.

Abstract: We present a new procedure for tracking manoeuvring objects by hidden Markov chains. It leads to more reliable modelling of the transitions between hidden states compared to similar approaches proposed within the Bayesian framework: we adopt convex sets of probability mass functions rather than single 'precise probability' specifications, in order to provide a more realistic and cautious model of the manoeuvre dynamics. In general, the downside of such increased freedom in the modelling phase is a higher inferential complexity. However, the simple topology of hidden Markov chains allows for efficient tracking of the object through a recently developed belief propagation algorithm. Furthermore, the imprecise specification of the transitions can produce so-called indecision, meaning that more than one model may be suggested by our method as a possible explanation of the target kinematics. In summary, our approach leads to a multiple-model estimator whose performance, investigated through extensive numerical tests, turns out to be more accurate and robust than that of Bayesian ones.

Published in FUSION 2009. IEEE, 1767–1774.

Reliable hidden Markov model filtering through coherent lower previsions

Authors: Alessio Benavoli, Marco Zaffalon and Enrique Miranda.
Year: 2009.

Abstract: We extend Hidden Markov Models for continuous variables taking into account imprecision in our knowledge about the probabilistic relationships involved. To achieve that, we consider sets of probabilities, also called coherent lower previsions. In addition to the general formulation, we study in detail a particular case of interest: linear-vacuous mixtures. We also show, in a practical case, that our extension outperforms the Kalman filter when modelling errors are present in the system.

Published in FUSION 2009. IEEE, 1743–1750.

A revised and updated version of this conference paper is Robust filtering through coherent lower previsions.

Epistemic irrelevance in credal networks: the case of imprecise Markov trees

Authors: Gert de Cooman, Filip Hermans, Alessandro Antonucci and Marco Zaffalon.
Year: 2009.

Abstract: We replace strong independence in credal networks with the weaker notion of epistemic irrelevance. Focusing on directed trees, we show how to combine local credal sets into a global model, and we use this to construct and justify an exact message-passing algorithm that computes updated beliefs for a variable in the tree. The algorithm, which is essentially linear in the number of nodes, is formulated entirely in terms of coherent lower previsions. We supply examples of the algorithm's operation, and report an application to on-line character recognition that illustrates the advantages of our model for prediction.

Published in Augustin, T., Coolen, F., Moral, S., Troffaes, M. C. M. (Eds), ISIPTA '09. SIPTA, 149–158.

A revised and updated version of this conference paper is Epistemic irrelevance in credal nets: the case of imprecise Markov trees.

Natural extension as a limit of regular extensions

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2009.

Abstract: This paper is devoted to the extension of conditional assessments that satisfy some consistency criteria, such as weak or strong coherence, to further domains. In particular, we characterise the natural extension of a number of conditional lower previsions on finite spaces, by showing that it can be calculated as the limit of a sequence of conditional lower previsions defined by regular extension. Our results are valid for conditional lower previsions with non-linear domains, and allow us to give an equivalent formulation of the notion of coherence in terms of credal sets.

Published in Augustin, T., Coolen, F., Moral, S., Troffaes, M. C. M. (Eds), ISIPTA '09. SIPTA, 327–336.

A revised and updated version of this conference paper is Conditional models: coherence and inference through sequences of joint mass functions.

The pari-mutuel model

Authors: Renato Pelessoni, Paolo Vicig and Marco Zaffalon.
Year: 2009.

Abstract: We explore generalizations of the pari-mutuel model (PMM), a formalization of an intuitive way of assessing an upper probability from a precise one. We discuss a naive extension of the PMM considered in insurance and generalize the natural extension of the PMM introduced by P. Walley and other related formulae. The results are subsequently given a risk measurement interpretation: in particular it is shown that a known risk measure, Tail Value at Risk (TVaR), is derived from the PMM, and a coherent risk measure more general than TVaR from its imprecise version. We analyze further the conditions for coherence of a related risk measure, Conditional Tail Expectation. Explicit formulae for conditioning the PMM and conditions for dilation or imprecision increase are also supplied and discussed.

Published in Augustin, T., Coolen, F., Moral, S., Troffaes, M. C. M. (Eds), ISIPTA '09. SIPTA, 347–356.

A revised and updated version of this conference paper is Inference and risk measurement with the pari-mutuel model.

Conservative inference rule for uncertain reasoning under incompleteness

Authors: Marco Zaffalon and Enrique Miranda.
Year: 2009.

Abstract: In this paper we formulate the problem of inference under incomplete information in very general terms. This includes modelling the process responsible for the incompleteness, which we call the incompleteness process. We allow the process' behaviour to be partly unknown. Then we use Walley's theory of coherent lower previsions, a generalisation of the Bayesian theory to imprecision, to derive the rule to update beliefs under incompleteness that logically follows from our assumptions, and that we call conservative inference rule. This rule has some remarkable properties: it is an abstract rule to update beliefs that can be applied in any situation or domain; it gives us the opportunity to be neither too optimistic nor too pessimistic about the incompleteness process, which is a necessary condition to draw reliable yet strong enough conclusions; and it is a coherent rule, in the sense that it cannot lead to inconsistencies. We give examples to show how the new rule can be applied in expert systems, in parametric statistical inference, and in pattern classification, and discuss more generally the view of incompleteness processes defended here as well as some of its consequences.

Published in Journal of Artificial Intelligence Research 34, 757–821.

Coherence graphs

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2009.

Abstract: We study the consistency of a number of probability distributions, which are allowed to be imprecise. To make the treatment as general as possible, we represent those probabilistic assessments as a collection of conditional lower previsions. The problem then becomes proving Walley's (strong) coherence of the assessments. In order to maintain generality in the analysis, we assume to be given nearly no information about the numbers that make up the lower previsions in the collection. Under this condition, we investigate the extent to which the above global task can be decomposed into simpler and more local ones. This is done by introducing a graphical representation of the conditional lower previsions that we call the coherence graph: we show that the coherence graph allows one to isolate some subsets of the collection whose coherence is sufficient for the coherence of all the assessments; and we provide a polynomial-time algorithm that finds the subsets efficiently. We show some of the implications of our results by focusing on three models and problems: Bayesian and credal networks, of which we prove coherence; the compatibility problem, for which we provide an optimal graphical decomposition; probabilistic satisfiability, of which we show that some intractable instances can instead be solved efficiently by exploiting coherence graphs.

Published in Artificial Intelligence 173, 104–144.

JNCC2, the implementation of naive credal classifier 2

Authors: Giorgio Corani and Marco Zaffalon.
Year: 2008.

Abstract: JNCC2 implements the naive credal classifier 2, which is an extension of naive Bayes to imprecise probabilities. JNCC2 makes it possible to efficiently infer the classifier from data, to evaluate its performance, and to predict the classes of new instances. Moreover, the software is designed to cross-compare the performance of the classifier with that of naive Bayes, as a way to verify the robustness of naive credal classifier 2 to the challenges raised by small sample sizes and missing data. JNCC2 is released under the GNU GPL license.

Published in Journal of Machine Learning Research 9, 2695–2698.

The software that implements the naive credal classifier 2 is freely available at the JNCC2 page.
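To show the flavour of the set-valued classifications that JNCC2 produces, here is a deliberately simplified toy (my own sketch, not JNCC2's code or its exact decision rule): each class gets a lower and an upper score from imprecise counts (imprecise Dirichlet model with strength s), and a class is discarded only when some other class's lower score exceeds its upper score, i.e. interval dominance; the naive credal classifier uses the stronger credal-dominance test over feature likelihoods as well.

```python
def credal_predict(counts: dict, n: int, s: float = 1.0) -> set:
    """Set of undominated classes given class counts out of n instances,
    with imprecision controlled by the prior strength s."""
    lower = {c: k / (n + s) for c, k in counts.items()}
    upper = {c: (k + s) / (n + s) for c, k in counts.items()}
    return {c for c in counts
            if not any(lower[d] > upper[c] for d in counts if d != c)}

# With little data the intervals overlap and the prediction stays set-valued;
# with more data a single class emerges.
assert credal_predict({"yes": 2, "no": 1}, n=3) == {"yes", "no"}
assert credal_predict({"yes": 90, "no": 10}, n=100) == {"yes"}
```

Returning {"yes", "no"} on the small sample is exactly the indecision that makes credal classifiers robust where naive Bayes would commit to a fragile single class.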

Credal networks for military identification problems

Authors: Alessandro Antonucci, Ralph Brühlmann, Alberto Piatti and Marco Zaffalon.
Year: 2009.

Abstract: Credal networks are imprecise probabilistic graphical models generalizing Bayesian networks to convex sets of probability mass functions. This makes credal networks particularly suited to model expert knowledge under very general conditions, including states of qualitative and incomplete knowledge. In this paper, we present a credal network for risk evaluation in case of intrusion of civil aircraft into a restricted flight area. The different factors relevant for this evaluation, together with an independence structure over them, are initially identified. These factors are observed by sensors, whose reliabilities can be affected by variable external factors, and even by the behaviour of the intruder. A model of these observation processes, and the necessary fusion scheme for the information returned by the sensors measuring the same factor, are both completely embedded into the structure of the credal network. A pool of experts, facilitated in their task by specific techniques to convert qualitative judgements into imprecise probabilistic assessments, has made possible the quantification of the network. We show the capabilities of the proposed model by means of some preliminary tests on simulated scenarios. Overall, we can regard this application as a useful tool to support military experts in their decisions, but also as a quite general imprecise-probability paradigm for information fusion.

Published in International Journal of Approximate Reasoning 50, 666–679.

Limits of learning about a categorical latent variable under prior near-ignorance

Authors: Alberto Piatti, Marco Zaffalon, Fabio Trojani and Marcus Hutter.
Year: 2009.

Abstract: In this paper, we consider the coherent theory of (epistemic) uncertainty of Walley, in which beliefs are represented through sets of probability distributions, and we focus on the problem of modeling prior ignorance about a categorical random variable. In this setting, it is a known result that a state of prior ignorance is not compatible with learning. To overcome this problem, another state of beliefs, called near-ignorance, has been proposed. Near-ignorance resembles ignorance very closely, by satisfying some principles that can arguably be regarded as necessary in a state of ignorance, and allows learning to take place. What this paper does is provide new and substantial evidence that near-ignorance, too, cannot really be regarded as a way out of the problem of starting statistical inference in conditions of very weak beliefs. The key to this result is focusing on a setting characterized by a variable of interest that is latent. We argue that such a setting is by far the most common case in practice, and we provide, for the case of categorical latent variables (and general manifest variables), a condition that, if satisfied, prevents learning from taking place under prior near-ignorance. This condition is shown to be easily satisfied even in the most common statistical problems. We regard these results as a strong form of evidence against the possibility of adopting a condition of prior near-ignorance in real statistical problems.

Published in International Journal of Approximate Reasoning 50, 597–611.

Decision-theoretic specification of credal networks: a unified language for uncertain modeling with sets of Bayesian networks

Authors: Alessandro Antonucci and Marco Zaffalon.
Year: 2008.

Abstract: Credal networks are models that extend Bayesian nets to deal with imprecision in probability, and can actually be regarded as sets of Bayesian nets. Credal nets appear to be powerful means to represent and deal with many important and challenging problems in uncertain reasoning. We give examples to show that some of these problems can only be modeled by credal nets called non-separately specified. These, however, are still missing a graphical representation language and updating algorithms. The situation is quite the opposite with separately specified credal nets, which have been the subject of much study and algorithmic development. This paper gives two major contributions. First, it delivers a new graphical language to formulate any type of credal network, both separately and non-separately specified. Second, it shows that any non-separately specified net represented with the new language can be easily transformed into an equivalent separately specified net, defined over a larger domain. This result opens up a number of new outlooks and concrete outcomes: first of all, it immediately enables the existing algorithms for separately specified credal nets to be applied to non-separately specified ones. We explore this possibility for the 2U algorithm: an algorithm for exact updating of singly connected credal nets, which is extended by our results to a class of non-separately specified models. We also consider the problem of inference on Bayesian networks, when the reason that prevents some of the variables from being observed is unknown. The problem is first reformulated in the new graphical language, and then mapped into an equivalent problem on a separately specified net. This provides a first algorithmic approach to this kind of inference, which is also proved to be NP-hard by similar transformations based on our formalism.

Published in International Journal of Approximate Reasoning 49(2), 345–361.

Generalized loopy 2U: a new algorithm for approximate inference in credal networks

Authors: Alessandro Antonucci, Marco Zaffalon, Yi Sun and Cassio Polpo de Campos.
Year: 2008.

Abstract: Credal nets generalize Bayesian nets by relaxing the requirement of precision of probabilities. Credal nets are considerably more expressive than Bayesian nets, but this makes belief updating NP-hard even on polytrees. We develop a new efficient algorithm for approximate belief updating in credal nets. The algorithm is based on an important representation result we prove for general credal nets: that any credal net can be equivalently reformulated as a credal net with binary variables; moreover, the transformation, which is considerably more complex than in the Bayesian case, can be implemented in polynomial time. The equivalent binary credal net is updated by L2U, a loopy approximate algorithm for binary credal nets. Thus, we generalize L2U to non-binary credal nets, obtaining an accurate and scalable algorithm for the general case, which is approximate only because of its loopy nature. The accuracy of the inferences is evaluated by empirical tests.

Published in Jaeger, M., Nielsen, T. D. (Eds), PGM 2008. Hirtshals (Denmark), 17–24.

A revised and updated version of this conference paper is Generalized loopy 2U: a new algorithm for approximate inference in credal networks.

Credal model averaging: an extension of Bayesian model averaging to imprecise probabilities

Authors: Giorgio Corani and Marco Zaffalon.
Year: 2008.

Abstract: We deal with the arbitrariness in the choice of the prior over the models in Bayesian model averaging (BMA), by modelling prior knowledge by a set of priors (i.e., a prior credal set). We consider Dash and Cooper's BMA applied to naive Bayesian networks, replacing the single prior over the naive models by a credal set; this models a condition close to prior ignorance about the models, which leads to credal model averaging (CMA). CMA returns an indeterminate classification, i.e., multiple classes, on the instances for which the learning set is not informative enough to smooth the effect of the choice of the prior. We give an algorithm to compute exact credal model averaging for naive networks. Extensive experiments show that indeterminate classifications preserve the reliability of CMA on the instances which are classified in a prior-dependent way by BMA.

Published in Daelemans, W., Goethals, B., Morik, K. (Eds), ECML PKDD 2008. Lecture Notes in Computer Science 5211, Springer, 257–271.

Spatially distributed identification of debris flow source areas by credal networks

Authors: Andrea Salvetti, Alessandro Antonucci and Marco Zaffalon.
Year: 2008.

Abstract: Debris flows represent a very destructive natural hazard, affecting buildings and transport infrastructures and, very often, causing human losses in mountain regions. That makes the identification of potential source areas of debris flows inside a watershed particularly important. In this paper we present a general identification procedure based on the credal network (that is, an imprecise probabilistic graphical model generalizing Bayesian networks) originally introduced by [Antonucci et al., 2004]. That model is significantly improved by a more refined description of the meteorological and hydrological processes contributing to debris flow initiation. As a counterpart of such improvement, the model incurs a slight increase in computational time for identifications. That does not prevent its extensive, spatially distributed application to whole basins, thanks to a preliminary deterministic analysis that rejects local areas where the triggering of a debris flow cannot take place. The overall procedure is tested on a debris-flow-prone watershed in Southern Switzerland. The model detects the areas in the basin most prone to debris flow initiation and also shows that different rainfall return periods produce different patterns of hazard in the basin. This makes it possible to determine the return period of the critical rainfall that triggers a debris flow, as a result of channel-bed failure, at a specific point along the drainage network.

Published in Sànchez-Marrè, M., Béjar, J., Comas, J., Rizzoli, A. E., Guariso, G. (Eds), iEMSs 2008, iEMSs, 380–387.

Learning reliable classifiers from small or incomplete data sets: the naive credal classifier 2

Authors: Giorgio Corani and Marco Zaffalon.
Year: 2008.

Abstract: In this paper, the naive credal classifier, which is a set-valued counterpart of naive Bayes, is extended to a general and flexible treatment of incomplete data, yielding a new classifier called naive credal classifier 2 (NCC2). The new classifier delivers classifications that are reliable even in the presence of small sample sizes and missing values. Extensive empirical evaluations show that, by issuing set-valued classifications, NCC2 is able to isolate and properly deal with instances that are hard to classify (on which naive Bayes accuracy drops considerably), and to perform as well as naive Bayes on the other instances. The experiments point to a general problem: they show that with missing values, empirical evaluations may not reliably estimate the accuracy of a traditional classifier, such as naive Bayes. This phenomenon adds even more value to the robust approach to classification implemented by NCC2.

Published in Journal of Machine Learning Research 9, 581–621.

The software that implements the naive credal classifier 2 is described in another paper published in Journal of Machine Learning Research: JNCC2, the implementation of naive credal classifier 2, and it is freely available at the JNCC2 page.
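
The indeterminacy of the naive credal classifier comes from replacing the single Dirichlet prior of naive Bayes with the set of priors of Walley's imprecise Dirichlet model (IDM). As a minimal sketch of the underlying idea (an illustrative helper, not the JNCC2 implementation), the IDM turns a count into a probability interval whose width shrinks as data accumulate:

```python
def idm_interval(count, total, s=1.0):
    """Lower/upper probability of an outcome under the imprecise
    Dirichlet model with hyperparameter s: the interval is
    [count/(total+s), (count+s)/(total+s)]."""
    lower = count / (total + s)
    upper = (count + s) / (total + s)
    return lower, upper

# With few data the interval is wide, so a classification whose outcome
# depends on where the true chance lies inside it stays indeterminate ...
print(idm_interval(3, 4))      # (0.6, 0.8)
# ... and it narrows as evidence accumulates: roughly (0.748, 0.751).
print(idm_interval(300, 400))
```

The hyperparameter s plays the role of the IDM's prior strength (values such as 1 or 2 are common choices in the imprecise-probability literature).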

Naive credal classifier 2: an extension of naive Bayes for delivering robust classifications

Authors: Giorgio Corani and Marco Zaffalon.
Year: 2008.

Abstract: Naive credal classifier 2 (NCC2) extends naive Bayes in order to deliver more robust classifications. NCC2 is based on a set of prior densities rather than on a single prior; as a consequence, when faced with instances whose classification is prior-dependent (and therefore might not be reliable), it returns a set of classes (we call this an indeterminate classification) instead of a single class. Moreover, NCC2 introduces a very general and flexible treatment of missing data, which, under certain circumstances, can also lead to indeterminate classifications. In this case, indeterminacy can be regarded as a way to preserve reliability despite the information hidden by missing values. We call hard-to-classify the instances classified indeterminately by NCC2. Extensive empirical evaluations show that naive Bayes' accuracy drops considerably on the hard-to-classify instances identified by NCC2, and that on the other hand, NCC2 has high set-accuracy (the proportion of times that the actual class is contained in the set of returned classes) when it is indeterminate.

Published in Stahlbock, R., Crone, S. F., Lessmann, S. (Eds), DMIN'08, CSREA Press, 84–90.

A revised and updated version of this conference paper is Learning reliable classifiers from small or incomplete data sets: the naive credal classifier 2.

Credal networks for operational risk measurement and management

Authors: Alessandro Antonucci, Alberto Piatti and Marco Zaffalon.
Year: 2007.

Abstract: According to widely accepted guidelines for self-regulation, the capital requirements of a bank should relate to the level of risk with respect to three different categories. Among them, operational risk is the most difficult to assess, as it requires merging expert judgments and quantitative information about the functional structure of the bank. A number of approaches to the evaluation of operational risk based on Bayesian networks have recently been considered. In this paper, we propose credal networks, which are a generalization of Bayesian networks to imprecise probabilities, as a more appropriate framework for the measurement and management of operational risk. The reason is the higher flexibility provided by credal networks compared to Bayesian networks in the quantification of the probabilities underlying the model: this makes it possible to represent the human expertise required for these evaluations in a credible and robust way. We use a real-world application to demonstrate these features and to show how to measure operational risk by means of algorithms for inference over credal nets. This is shown to be possible even when the observation of some factor is vague.

Published in Apolloni, B., Howlett, R. J., Jain, L. C. (Eds), KES 2007, Lecture Notes in Computer Science 4693, Springer, 604–611.

Credal networks for military identification problems

Authors: Alessandro Antonucci, Ralph Brühlmann, Alberto Piatti and Marco Zaffalon.
Year: 2007.

Abstract: Credal networks are imprecise probabilistic graphical models generalizing Bayesian networks to convex sets of probability mass functions. This makes credal networks particularly suited to capture and model expert knowledge under very general conditions, including states of qualitative and incomplete knowledge. In this paper, we present a credal network for risk evaluation in case of intrusion of civil aircraft into a no-fly zone. The different factors relevant for this evaluation, together with an independence structure over them, are initially identified. These factors are observed by sensors, whose reliabilities can be affected by variable external factors, and even by the behavior of the intruder. A model of these observation mechanisms, and the necessary fusion scheme for the information returned by the sensors measuring the same factor, are both completely embedded into the structure of the credal network. A pool of experts, facilitated in their task by specific techniques to convert qualitative judgments into imprecise probabilistic assessments, has made the quantification of the network possible. We show the capabilities of the proposed network by means of some preliminary tests on simulated scenarios. Overall, we can regard this application as a useful tool to support military experts in their decisions, but also as a quite general imprecise-probability paradigm for information fusion.

Published in de Cooman, G., Vejnarová, J., Zaffalon, M. (Eds), ISIPTA '07, Action M Agency, Prague (Czech Republic), 1–10.

A revised and updated version of this conference paper is Credal networks for military identification problems.

Coherence graphs

Authors: Enrique Miranda and Marco Zaffalon.
Year: 2007.

Abstract: We consider the task of proving Walley's (joint or strong) coherence of a number of probabilistic assessments, when these assessments are represented as a collection of conditional lower previsions. In order to maintain generality in the analysis, we assume to be given nearly no information about the numbers that make up the lower previsions in the collection. Under this condition, we investigate the extent to which the above global task can be decomposed into simpler and more local ones. This is done by introducing a graphical representation of the conditional lower previsions, that we call the coherence graph: we show that the coherence graph allows one to isolate some subsets of the collection whose coherence is sufficient for the coherence of all the assessments. The situation is shown to be completely analogous in the case of Walley's notion of weak coherence, for which we prove in addition that the subsets found are optimal, in the sense that they embody the maximal degree to which the task of checking weak coherence can be decomposed. In doing all of this, we obtain a number of related results: we give a new characterisation of weak coherence; we characterise, by means of a special kind of coherence graph, when the local notion of separate coherence is sufficient for coherence; and we provide an envelope theorem for collections of lower previsions whose graph is of the latter type.

Published in de Cooman, G., Vejnarová, J., Zaffalon, M. (Eds), ISIPTA '07, Action M Agency, Prague (Czech Republic), 297–306.

A revised and updated version of this conference paper is Coherence graphs.

Learning about a categorical latent variable under prior near-ignorance

Authors: Alberto Piatti, Marco Zaffalon, Fabio Trojani and Marcus Hutter.
Year: 2007.

Abstract: It is well known that complete prior ignorance is not compatible with learning, at least in a coherent theory of (epistemic) uncertainty. What is less widely known is that there is a state similar to full ignorance, that Walley calls near-ignorance, that permits learning to take place. In this paper we provide new and substantial evidence that also near-ignorance cannot really be regarded as a way out of the problem of starting statistical inference in conditions of very weak beliefs. The key to this result is focusing on a setting characterized by a variable of interest that is latent. We argue that such a setting is by far the most common case in practice, and we show, for the case of categorical latent variables (and general manifest variables), that there is a sufficient condition that, if satisfied, prevents learning from taking place under prior near-ignorance. This condition is shown to be easily satisfied in the most common statistical problems.

Published in de Cooman, G., Vejnarová, J., Zaffalon, M. (Eds), ISIPTA '07, Action M Agency, Prague (Czech Republic), 357–364.

A revised and updated version of this conference paper is Limits of learning about a categorical latent variable under prior near-ignorance.

Notes on "Notes on conditional previsions"

Authors: Paolo Vicig, Marco Zaffalon and Fabio G. Cozman.
Year: 2007.

Abstract: These notes comment on Williams' fundamental essay Notes on Conditional Previsions, written as a research report in 1975 and published in the present issue. Basic aspects of that work are discussed, including historical background and relevance to the foundations of probability; examples are supplied to help understanding.

Published in International Journal of Approximate Reasoning 44(3), 358–365.

Fast algorithms for robust classification with Bayesian nets

Authors: Alessandro Antonucci and Marco Zaffalon.
Year: 2007.

Abstract: We focus on a well-known classification task with expert systems based on Bayesian networks: predicting the state of a target variable given an incomplete observation of the other variables in the network, i.e., an observation of a subset of all the possible variables. To provide conclusions robust to near-ignorance about the process that prevents some of the variables from being observed, a new rule, called conservative updating, has recently been derived. With this paper we address the problem of efficiently computing the conservative updating rule for robust classification with Bayesian networks. We show first that the general problem is NP-hard, thus establishing a fundamental limit to the possibility of doing robust classification efficiently. Then we define a wide subclass of Bayesian networks that does admit efficient computation. We show this by developing a new classification algorithm for such a class, which extends substantially the limits of efficient computation with respect to the previously existing algorithm. The algorithm is formulated as a variable elimination procedure, whose computation time is linear in the input size.

Published in International Journal of Approximate Reasoning 44(3), 200–223.

Locally specified credal networks

Authors: Alessandro Antonucci and Marco Zaffalon.
Year: 2006.

Abstract: Credal networks are models that extend Bayesian nets to deal with imprecision in probability, and can actually be regarded as sets of Bayesian nets. Evidence suggests that credal nets are a powerful means to represent and deal with many important and challenging problems in uncertain reasoning. We give examples to show that some of these problems can only be modelled by credal nets called non-separately specified. These, however, are still missing a graphical representation language and solution algorithms. The situation is quite the opposite with separately specified credal nets, which have been the subject of much study and algorithmic development. This paper gives two major contributions. First, it delivers a new graphical language to formulate any type of credal network, both separately and non-separately specified. Second, it shows that any non-separately specified net represented with the new language can be easily transformed into an equivalent separately specified net, defined over a larger domain. This result opens up a number of new perspectives and concrete outcomes: first of all, it immediately enables the existing algorithms for separately specified credal nets to be applied to non-separately specified ones.

Published in Studený, M., Vomlel, J. (Eds), PGM 2006, Action M Agency, Prague (Czech Republic), 25–34.

A revised and updated version of this conference paper is Decision-theoretic specification of credal networks: a unified language for uncertain modeling with sets of Bayesian networks.

Classification of dementia types from cognitive profiles data

Authors: Giorgio Corani, Chris Edgar, Isabelle Marshall, Keith Wesnes and Marco Zaffalon.
Year: 2006.

Abstract: The Cognitive Drug Research (CDR) system is specifically validated for dementia assessment; it consists of a series of computerized tests, which assess the cognitive faculties of the patient to derive a cognitive profile. We use six different classification algorithms to classify clinically diagnosed diseases from their cognitive profiles. Good accuracy was obtained in separating patients affected by Parkinson's disease from demented patients, and in discriminating between Alzheimer's disease and vascular dementia. However, in discriminating between Parkinson's disease with dementia (PDD) and dementia with Lewy bodies (DLB), the accuracy was only slightly better than chance; the existence of a significant difference in the cognitive profiles of DLB and PDD is indeed questioned in the medical literature.

Published in ECML 2006. Lecture Notes in Computer Science 4213, Springer, Berlin, 470–477.

Binarization algorithms for approximate updating in credal nets

Authors: Alessandro Antonucci, Marco Zaffalon, Jaime S. Ide and Fabio G. Cozman.
Year: 2006.

Abstract: Credal networks generalize Bayesian networks by relaxing the precision of their numerical parameters. This considerably expands expressivity, but makes belief updating a hard task even on polytrees. Nevertheless, if all the variables are binary, polytree-shaped credal networks can be efficiently updated by the 2U algorithm. In this paper we present a binarization algorithm that makes it possible to approximate an updating problem in a credal net by a corresponding problem in a credal net over binary variables. The procedure leads to outer bounds for the original problem. The binarized nets are in general multiply connected, but can be updated by the loopy variant of 2U. The quality of the overall approximation is investigated through numerical experiments, with promising results.

Published in Penserini, L., Peppas, P., Perini, A. (Eds), STAIRS'06. IOS Press, Amsterdam (Netherlands), 120–131.

Equivalence between Bayesian and credal nets on an updating problem

Authors: Alessandro Antonucci and Marco Zaffalon.
Year: 2006.

Abstract: We establish an intimate connection between Bayesian and credal nets. Bayesian nets are precise graphical models, credal nets extend Bayesian nets to imprecise probability. We focus on traditional belief updating with credal nets, and on the kind of belief updating that arises with Bayesian nets when the reason for the missingness of some of the unobserved variables in the net is unknown. We show that the two updating problems are formally the same.

Published in Lawry, J., Miranda, E., Bugarin, A., Li, S., Gil, M. A., Grzegorzewski, P., Hryniewicz, O. (Eds), SMPS 2006, Soft Methods for Integrated Uncertainty Modelling, Springer, 223–230.

Robust inference of trees

Authors: Marco Zaffalon and Marcus Hutter.
Year: 2005.

Abstract: This paper is concerned with the reliable inference of optimal tree-approximations to the dependency structure of an unknown distribution generating the data. The traditional approach to the problem measures the dependency strength between random variables by the index called mutual information. In this paper reliability is achieved by Walley's imprecise Dirichlet model, which generalizes Bayesian learning with Dirichlet priors. Adopting the imprecise Dirichlet model results in posterior interval expectation for mutual information, and in a set of plausible trees consistent with the data. Reliable inference about the actual tree is achieved by focusing on the substructure common to all the plausible trees. We develop an exact algorithm that infers the substructure in time $O(m^4)$, $m$ being the number of random variables. The new algorithm is applied to a set of data sampled from a known distribution. The method is shown to reliably infer edges of the actual tree even when the data are very scarce, unlike the traditional approach. Finally, we provide lower and upper credibility limits for mutual information under the imprecise Dirichlet model. These enable the previous developments to be extended to a full inferential method for trees.

Published in Annals of Mathematics and Artificial Intelligence 45(1–2), 215–239.
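
The "traditional approach" the abstract refers to is Chow–Liu-style tree learning: weight each pair of variables by their empirical mutual information and take a maximum-weight spanning tree. A minimal self-contained sketch of that precise-probability baseline (function names are illustrative; the paper's contribution is the robust, imprecise-Dirichlet variant, which is not shown here):

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two categorical samples."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    # sum over observed joint cells: p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum(c / n * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def chow_liu_edges(columns):
    """Maximum-weight spanning tree over the variables, with mutual
    information as edge weight (Kruskal's algorithm with union-find)."""
    m = len(columns)
    edges = sorted(((mutual_information(columns[i], columns[j]), i, j)
                    for i in range(m) for j in range(i + 1, m)), reverse=True)
    parent = list(range(m))
    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    tree = []
    for _, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:            # keep the edge only if it joins two components
            parent[ri] = rj
            tree.append((i, j))
    return tree

# Columns 0 and 1 are perfectly dependent, column 2 only weakly related,
# so the learned tree must contain the edge (0, 1).
cols = [[0, 0, 1, 1, 0, 1], [0, 0, 1, 1, 0, 1], [0, 1, 0, 1, 1, 0]]
print(chow_liu_edges(cols))
```

The paper's robust variant replaces the point estimate of mutual information with an interval under the imprecise Dirichlet model and keeps only the edges common to all plausible trees.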

Fast algorithms for robust classification with Bayesian nets

Authors: Alessandro Antonucci and Marco Zaffalon.
Year: 2005.

Abstract: We focus on a well-known classification task with expert systems based on Bayesian networks: predicting the state of a target variable given an incomplete observation of the other variables in the network, i.e., an observation of a subset of all the possible variables. To provide conclusions robust to near-ignorance about the process that prevents some of the variables from being observed, a new rule, called conservative updating, has recently been derived. With this paper we address the problem of efficiently computing the conservative updating rule for robust classification with Bayesian networks. We show first that the general problem is NP-hard, thus establishing a fundamental limit to the possibility of doing robust classification efficiently. Then we define a wide subclass of Bayesian networks that does admit efficient computation. We show this by developing a new classification algorithm for such a class, which extends substantially the limits of efficient computation with respect to the previously existing algorithm.

Published in Cozman, F. G., Nau, R., Seidenfeld, T. (Eds), ISIPTA '05. SIPTA, 11–20.

A revised and updated version of this conference paper is Fast algorithms for robust classification with Bayesian nets.

Limits of learning from imperfect observations under prior ignorance: the case of the imprecise Dirichlet model

Authors: Alberto Piatti, Marco Zaffalon and Fabio Trojani.
Year: 2005.

Abstract: Consider a relaxed multinomial setup, in which there may be mistakes in observing the outcomes of the process—this is often the case in real applications. What can we say about the next outcome if we start learning about the process in conditions of prior ignorance? To answer this question we extend the imprecise Dirichlet model to the case of imperfect observations and we focus on posterior predictive probabilities for the next outcome. The results are very surprising: the posterior predictive probabilities are vacuous, irrespective of the number of observations we make, and however small the probability of making mistakes is. In other words, the imprecise Dirichlet model cannot help us to learn from data when the observational mechanism is imperfect. This result seems to raise a serious question about the use of the imprecise Dirichlet model for practical applications, and, more generally, about the possibility of learning from imperfect observations under prior ignorance.

Published in Cozman, F. G., Nau, R., Seidenfeld, T. (Eds), ISIPTA '05. SIPTA, 276–286.

A revised and updated version of this conference paper is Limits of learning about a categorical latent variable under prior near-ignorance.

Conservative rules for predictive inference with incomplete data

Author: Marco Zaffalon.
Year: 2005.

Abstract: This paper addresses the following question: how should we update our beliefs after observing some incomplete data, in order to make credible predictions about new, and possibly incomplete, data? There may be several answers to this question according to the model of the process that creates the incompleteness. This paper develops a rigorous modelling framework that makes clear the conditions that justify the different answers; and, on this basis, it derives a new conditioning rule for predictive inference to be used in a wide range of states of knowledge about the incompleteness process, including near-ignorance, which, surprisingly, does not seem to have received attention so far. Such a case is instead particularly important, as modelling incompleteness processes can be highly impractical, and because there are limitations to statistical inference with incomplete data: it is generally not possible to learn how incompleteness processes work from the available data; and it may not be possible, as the paper shows, to measure empirically the quality of the predictions. Yet, these depend heavily on the assumptions made.

Published in Cozman, F. G., Nau, R., Seidenfeld, T. (Eds), ISIPTA '05. SIPTA, 406–415.

A revised and updated version of this conference paper is Conservative inference rule for uncertain reasoning under incompleteness.

Updating beliefs with incomplete observations

Authors: Gert de Cooman and Marco Zaffalon.
Year: 2004.

Abstract: Currently, there is renewed interest in the problem, raised by Shafer in 1985, of updating probabilities when observations are incomplete (or set-valued). This is a fundamental problem in general, and of particular interest for Bayesian networks. Recently, Grünwald and Halpern have shown that commonly used updating strategies fail in this case, except under very special assumptions. In this paper we propose a new method for updating probabilities with incomplete observations. Our approach is deliberately conservative: we make no assumptions about the so-called incompleteness mechanism that associates complete with incomplete observations. We model our ignorance about this mechanism by a vacuous lower prevision, a tool from the theory of imprecise probabilities, and we use only coherence arguments to turn prior into posterior (updated) probabilities. In general, this new approach to updating produces lower and upper posterior probabilities and previsions (expectations), as well as partially determinate decisions. This is a logical consequence of the existing ignorance about the incompleteness mechanism. As an example, we use the new updating method to properly address the apparent paradox in the 'Monty Hall' puzzle. More importantly, we apply it to the problem of classification of new evidence in probabilistic expert systems, where it leads to a new, so-called conservative updating rule. In the special case of Bayesian networks constructed using expert knowledge, we provide an exact algorithm to compare classes based on our updating rule, which has linear-time complexity for a class of networks wider than polytrees. This result is then extended to the more general framework of credal networks, where computations are often much harder than with Bayesian nets. Using an example, we show that our rule appears to provide a solid basis for reliable updating with incomplete observations, when no strong assumptions about the incompleteness mechanism are justified.

Published in Artificial Intelligence 159(1–2), 75–125.

This paper has been reviewed by M. F. McNally in Computing Reviews 46(12), 2005.

Assessing debris flow hazard by credal nets

Authors: Alessandro Antonucci, Andrea Salvetti and Marco Zaffalon.
Year: 2004.

Published in López-Díaz, M., Gil, M. A., Grzegorzewski, P., Hryniewicz, O., Lawry, J. (Eds), SMPS 2004, Soft Methodology and Random Information Systems, Springer, 125–132.

Hazard assessment of debris flows by credal networks

Authors: Alessandro Antonucci, Andrea Salvetti and Marco Zaffalon.
Year: 2004.

Abstract: Debris flows are destructive natural hazards that affect human life, buildings, and infrastructures. Despite their importance, debris flows are only partially understood, and human expertise still plays a key role for hazard identification. This paper proposes filling the modelling gap by using credal networks, an imprecise-probability model. The model uses a directed graph to capture the causal relationships between the triggering factors of debris flows. Quantitative influences are represented by probability intervals, determined from historical data, expert knowledge, and theoretical models. Most importantly, the model joins the empirical and the quantitative modelling levels, in the direction of more credible inferences. The model is evaluated on real case studies related to dangerous areas of the Ticino Canton, southern Switzerland. The case studies highlight the good capabilities of the model: for all the areas the model produces significant probabilities of hazard.

Published in Pahl-Wostl, C., Schmidt, S., Rizzoli, A. E., Jakeman, A. J. (Eds), iEMSs 2004, iEMSs, 98–103.

Credal networks for hazard assessment of debris flows

Authors: Alessandro Antonucci, Andrea Salvetti and Marco Zaffalon.
Year: 2007.

Published in Kropp, J., Scheffran, J. (Eds), Advanced Methods for Decision Making and Risk Management in Sustainability Science, Nova Science Publishers, New York, 237–256.

Distribution of mutual information from complete and incomplete data

Authors: Marcus Hutter and Marco Zaffalon.
Year: 2005.

Abstract: Mutual information is widely used, in a descriptive way, to measure the stochastic dependence of categorical random variables. In order to address questions such as the reliability of the descriptive value, one must consider sample-to-population inferential approaches. This paper deals with the posterior distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean, and analytical approximations for the variance, skewness and kurtosis are derived. These approximations have a guaranteed accuracy level of the order O(n^{-3}), where n is the sample size. Leading order approximations for the mean and the variance are derived in the case of incomplete samples. The derived analytical expressions allow the distribution of mutual information to be approximated reliably and quickly. In fact, the derived expressions can be computed with the same order of complexity needed for descriptive mutual information. This makes the distribution of mutual information become a concrete alternative to descriptive mutual information in many applications which would benefit from moving to the inductive side. We discuss some of these prospective applications, and for one of them, namely feature selection, we show how inductive mutual information leads to significant improvements in the inferred models.

Published in Computational Statistics and Data Analysis 48(3), 633–657.
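The two quantities contrasted in this abstract, descriptive mutual information and its posterior mean, can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's derivation: the correction term (r−1)(s−1)/(2n) is only the leading-order form of the posterior mean; the paper derives exact and higher-order expressions.

```python
import math

def empirical_mi(counts):
    """Descriptive (empirical) mutual information, in nats, from a
    contingency table of counts n_ij (list of rows)."""
    n = sum(sum(row) for row in counts)
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(col) for col in zip(*counts)]
    mi = 0.0
    for i, row in enumerate(counts):
        for j, nij in enumerate(row):
            if nij > 0:
                mi += (nij / n) * math.log(nij * n / (row_tot[i] * col_tot[j]))
    return mi

def posterior_mean_mi(counts):
    """Leading-order approximation of the posterior mean of mutual
    information: empirical MI plus the O(1/n) term (r-1)(s-1)/(2n),
    with r, s the numbers of categories. Higher-order terms and the
    variance are given in the paper."""
    n = sum(sum(row) for row in counts)
    r, s = len(counts), len(counts[0])
    return empirical_mi(counts) + (r - 1) * (s - 1) / (2 * n)
```

Note that the posterior mean is strictly positive even for a perfectly balanced table, e.g. `posterior_mean_mi([[5, 5], [5, 5]])` exceeds the empirical value of 0: finite samples cannot rule out some dependence, which is exactly the inductive viewpoint the abstract advocates.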

Credible classification for environmental problems

Author: Marco Zaffalon.
Year: 2005.

Abstract: Classifiers that aim at doing reliable predictions should rely on carefully elicited prior knowledge. Often such knowledge is not available, so they should be able to start learning from data in a condition of prior ignorance. This paper shows empirically, on an agricultural data set, that established methods of classification do not always adhere to this principle. Common ways to represent prior ignorance are shown to have an overwhelming weight compared to the information in the data, producing overconfident predictions. This point is crucial for problems, such as environmental ones, where prior knowledge is often scarce and even the data may not be known precisely. Credal classification, and in particular the naive credal classifier, are proposed as more faithful ways to represent states of ignorance. We show that with credal classification, conditions of ignorance may limit the power of the inferences, not the reliability of the predictions.

Published in Environmental Modelling & Software 20(8), 1003–1012.

Bayesian treatment of incomplete discrete data applied to mutual information and feature selection

Authors: Marcus Hutter and Marco Zaffalon.
Year: 2003.

Abstract: Given the joint chances of a pair of random variables one can compute quantities of interest, like the mutual information. The Bayesian treatment of unknown chances involves computing, from a second order prior distribution and the data likelihood, a posterior distribution of the chances. A common treatment of incomplete data is to assume ignorability and determine the chances by the expectation maximization (EM) algorithm. The two different methods above are well established but typically separated. This paper joins the two approaches in the case of Dirichlet priors, and derives efficient approximations for the mean, mode and the (co)variance of the chances and the mutual information. Furthermore, we prove the unimodality of the posterior distribution, whence the important property of convergence of EM to the global maximum in the chosen framework. These results are applied to the problem of selecting features for incremental learning and naive Bayes classification. A fast filter based on the distribution of mutual information is shown to outperform the traditional filter based on empirical mutual information on a number of incomplete real data sets.

Published in Günter, A., Kruse, R., Neumann, B. (Eds), KI-2003. Lecture Notes in Computer Science 2821, Springer, Heidelberg, 396–406.

A revised and updated version of this conference paper is Distribution of mutual information from complete and incomplete data.

Updating with incomplete observations

Authors: Gert de Cooman and Marco Zaffalon.
Year: 2003.

Abstract: Currently, there is renewed interest in the problem, raised by Shafer in 1985, of updating probabilities when observations are incomplete (or set-valued). This is a fundamental problem, and of particular interest for Bayesian networks. Recently, Grünwald and Halpern have shown that commonly used updating strategies fail here, except under very special assumptions. We propose a new rule for updating probabilities with incomplete observations. Our approach is deliberately conservative: we make no or weak assumptions about the so-called incompleteness mechanism that produces incomplete observations. We model our ignorance about this mechanism by a vacuous lower prevision, a tool from the theory of imprecise probabilities, and we derive a new updating rule using coherence arguments. In general, our rule produces lower posterior probabilities, as well as partially determinate decisions. This is a logical consequence of the ignorance about the incompleteness mechanism. We show how the new rule can properly address the apparent paradox in the `Monty Hall' puzzle. In addition, we apply it to the classification of new evidence in Bayesian networks constructed using expert knowledge. We provide an exact algorithm for this task with linear-time complexity, also for multiply connected nets.

Published in Kjærulff, U., Meek, C. (Eds), UAI 2003, Morgan Kaufmann, 142–150.

PowerPoint slides of the presentation are also available.

A revised and updated version of this conference paper is Updating beliefs with incomplete observations.

Tree-based credal networks for classification

Authors: Marco Zaffalon and Enrico Fagiuoli.
Year: 2003.

Abstract: Bayesian networks are models for uncertain reasoning successfully applied to the data-mining task of classification. Credal networks extend Bayesian nets to sets of distributions, or credal sets. This paper extends a state-of-the-art Bayesian net for classification, called the tree-augmented naive Bayes classifier, to credal sets originated from probability intervals. We formalize the new model, develop an exact linear-time classification algorithm, and evaluate the credal net-based classifier on a number of real data sets. The empirical analysis shows that the new classifier is good and reliable, and raises a problem of excessive caution that is discussed in the paper. Overall, given the favorable trade-off between expressiveness and efficient computation, the newly proposed classifier appears to be a good candidate for the wide-scale application of credal networks to real and complex tasks.

Published in Reliable Computing 9(6), 487–509.

Reliable diagnoses of dementia by the naive credal classifier inferred from incomplete cognitive data

Authors: Marco Zaffalon, Keith Wesnes and Orlando Petrini.
Year: 2003.

Abstract: Dementia is a serious personal, medical and social problem. Recent research indicates early and accurate diagnoses as the key to effectively cope with it. No definitive cure is available but in some cases when the impairment is still mild the disease can be contained. This paper describes a diagnostic tool that jointly uses the naive credal classifier and the most widely used computerized system of cognitive tests in dementia research, the Cognitive Drug Research system. The naive credal classifier extends the discrete naive Bayes classifier to imprecise probabilities. The naive credal classifier models both prior ignorance and ignorance about the likelihood by sets of probability distributions. This is a new way to deal with small or incomplete data sets that departs significantly from most established classification methods. In the empirical study presented here the naive credal classifier provides reliability and unmatched predictive performance. It delivers up to 95% correct predictions while being very robust with respect to the partial ignorance due to the largely incomplete data. The diagnostic tool also proves to be very effective in discriminating between Alzheimer's disease and Dementia with Lewy Bodies.

Published in Artificial Intelligence in Medicine 29(1–2), 61–79.

Robust feature selection by mutual information distributions

Authors: Marco Zaffalon and Marcus Hutter.
Year: 2002.

Abstract: Mutual information is widely used in artificial intelligence, in a descriptive way, to measure the stochastic dependence of discrete random variables. In order to address questions such as the reliability of the empirical value, one must consider sample-to-population inferential approaches. This paper deals with the distribution of mutual information, as obtained in a Bayesian framework by a second-order Dirichlet prior distribution. The exact analytical expression for the mean and an analytical approximation of the variance are reported. Asymptotic approximations of the distribution are proposed. The results are applied to the problem of selecting features for incremental learning and classification of the naive Bayes classifier. A fast, newly defined method is shown to outperform the traditional approach based on empirical mutual information on a number of real data sets. Finally, a theoretical development is reported that allows one to efficiently extend the above methods to incomplete samples in an easy and effective way.

Published in Darwiche, A., Friedman, N. (Eds), UAI 2002, Morgan Kaufmann, 577–584.

A revised and updated version of this conference paper is Distribution of mutual information from complete and incomplete data.

Exact credal treatment of missing data

Author: Marco Zaffalon.
Year: 2002.

Abstract: This paper proposes an exact, no-assumptions approach to dealing with incomplete sets of multivariate categorical data. An incomplete data set is regarded as a finite collection of complete data sets, and a joint distribution is obtained from each of them, at a descriptive level. The tools to simultaneously treat all the possible joint distributions compatible with an incomplete set of data are given. In particular, a linear description of the set of distributions is formulated, and it is shown that the computation of bounds on the expectation of real-valued functions under such distributions is both possible and efficient, by means of linear programming. Specific algorithms are also developed whose complexity grows linearly in the number of observations. An analysis is then carried out to estimate population probabilities from incomplete multinomial samples. The descriptive tool extends in a straightforward way to the inferential problem by exploiting Walley's imprecise Dirichlet model.

Published in Journal of Statistical Planning and Inference 105(1), 105–122.
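The core idea of this abstract, an incomplete data set regarded as a finite collection of complete data sets, can be sketched by brute-force enumeration. This is a toy illustration under assumed names (`expectation_bounds`, `None` marking a missing value), not the paper's method: the paper obtains the same bounds efficiently via linear programming, with algorithms linear in the number of observations.

```python
from itertools import product

def expectation_bounds(data, f, categories):
    """Lower/upper bounds on the descriptive expectation of f over all
    complete data sets compatible with an incomplete sample. None marks
    a missing entry. Brute force: enumerate every way of filling in the
    missing values, then take the extreme expectations."""
    missing = [(i, j) for i, rec in enumerate(data)
               for j, v in enumerate(rec) if v is None]
    lo, hi = float('inf'), float('-inf')
    for fill in product(categories, repeat=len(missing)):
        completed = [list(rec) for rec in data]
        for (i, j), v in zip(missing, fill):
            completed[i][j] = v
        e = sum(f(tuple(rec)) for rec in completed) / len(completed)
        lo, hi = min(lo, e), max(hi, e)
    return lo, hi
```

For example, with records `[(0, 1), (1, None), (None, None)]` and `f` the indicator of the record `(1, 1)`, the bounds are 0 and 2/3: no completion forces a `(1, 1)` record, while the most favourable completion produces two of them out of three. The width of the interval is precisely the no-assumptions price paid for the missing values.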

The naive credal classifier

Author: Marco Zaffalon.
Year: 2002.

Abstract: Convex sets of probability distributions are also called credal sets. They generalize probability theory by relaxing the requirement that probability values be precise. Classification, i.e. assigning class labels to instances described by a set of attributes, is an important domain of application of Bayesian methods, where the naive Bayes classifier has a surprisingly good performance. This paper proposes a new method of classification which involves extending the naive Bayes classifier to credal sets. Exact and effective solution procedures for naive credal classification are derived, and the related dominance criteria are discussed. Credal classification appears as a new method, based on more realistic assumptions and in the direction of more reliable inferences.

Published in Journal of Statistical Planning and Inference 105(1), 5–21.
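The flavour of credal classification can be sketched with interval dominance on class probabilities drawn from Walley's imprecise Dirichlet model. This is a deliberate simplification, a sketch under assumed names (`idm_intervals`, `undominated_classes`), not the exact dominance criteria derived in the paper, which also account for the attribute likelihoods.

```python
def idm_intervals(counts, s=1.0):
    """Lower/upper class probabilities from the imprecise Dirichlet
    model with hyperparameter s: [n_c/(n+s), (n_c+s)/(n+s)]."""
    n = sum(counts.values())
    return {c: (k / (n + s), (k + s) / (n + s)) for c, k in counts.items()}

def undominated_classes(counts, s=1.0):
    """Classes not interval-dominated by any other class. A credal
    classifier may return several classes when the data are scarce,
    suspending judgement instead of forcing a single guess."""
    iv = idm_intervals(counts, s)
    return {c for c, (lo, hi) in iv.items()
            if not any(iv[d][0] > hi for d in iv if d != c)}
```

With counts `{'a': 40, 'b': 5}` only `'a'` survives, since its lower probability exceeds the upper probability of `'b'`; with counts `{'a': 1, 'b': 1}` both classes are returned, the set-valued answer being the honest output under near-ignorance.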

Credal classification for dementia screening

Authors: Marco Zaffalon, Keith Wesnes and Orlando Petrini.
Year: 2001.

Abstract: Dementia is a very serious personal, medical and social problem. Early and accurate diagnoses seem to be the key to effectively cope with it. This paper presents a diagnostic tool that couples the most widely used computerized system of cognitive tests in dementia research, the Cognitive Drug Research system, with the naive credal classifier. Although the classifier is trained on an incomplete database, it provides unmatched predictive performance and reliability. The tool also proves to be very effective in discriminating between Alzheimer's disease and dementia with Lewy bodies, which is a problem on the frontier of research on dementia.

Published in Quaglini, S., Barahona P., Andreassen, S. (Eds), AIME '01. Lecture Notes in Computer Science 2101, Springer, 67–76.

A revised and updated version of this conference paper is Reliable diagnoses of dementia by the naive credal classifier inferred from incomplete cognitive data.

Robust discovery of tree-dependency structures

Author: Marco Zaffalon.
Year: 2001.

Abstract: The problem of inferring dependency structures from random samples is a fundamental topic in artificial intelligence and statistics. This paper reviews an early result from Chow and Liu on the approximation of unknown multinomial distributions by tree-dependency distributions, in the light of imprecise probabilities. Imprecision, arising here from Walley's imprecise Dirichlet model, generally makes many tree structures plausible given the data. This paper focuses on the inference of the substructure common to all the possible trees. Such a common pattern is a set of reliable dependencies. The problem of identifying the common pattern is abstracted and solved here in the general context of graph algorithms. On this basis, an algorithm is developed that infers reliable dependencies among k binary random variables in time O(k^3); the inferred dependencies converge to a tree as the sample grows. The algorithm works by computing bounds on the solutions of global optimization problems. There are a number of reasons why trees are a very important special case of dependence graphs. This work appears as a significant step in the direction of discovering dependency structures under the realistic assumption of imprecise knowledge.

Published in de Cooman, G., Fine, T., Seidenfeld, T. (Eds), ISIPTA '01, Shaker Publishing, The Netherlands, 394–403.

A revised and updated version of this conference paper is Robust inference of trees.

Statistical inference of the naive credal classifier

Author: Marco Zaffalon.
Year: 2001.

Abstract: High on the wish list of classifier characteristics are a reliable approach to small data sets and a clear, robust treatment of incomplete samples. This paper copes with these difficult problems by adopting the paradigm of credal classification. By exploiting Walley's imprecise Dirichlet model, it defines how to infer the naive credal classifier from a possibly incomplete multinomial sample. The derived procedure is exact and linear in the number of attributes. The obtained classifier is robust to small data sets and to all the possible missingness mechanisms. The results of some experimental analyses that compare the naive credal classifier with naive Bayesian models support the presented approach.

Published in de Cooman, G., Fine, T., Seidenfeld, T. (Eds), ISIPTA '01, Shaker Publishing, The Netherlands, 384–393.

An optimization methodology for intermodal terminal management

Authors: Luca Maria Gambardella, Andrea E. Rizzoli, Monaldo Mastrolilli and Marco Zaffalon.
Year: 2001.

Published in Journal of Intelligent Manufacturing 12, 521–534.

Tree-augmented naive credal classifiers

Authors: Enrico Fagiuoli and Marco Zaffalon.
Year: 2000.

Abstract: Credal classification is a generalization of common classification which allows a coherent treatment of imprecision to be realized. This paper defines a credal classifier by extending the discrete tree-augmented naive Bayes classifiers to sets of probability distributions. A solution algorithm is developed whose computational complexity is linear in the number of features. A method to induce the credal classifier from data is proposed.

A revised and updated version of this conference paper is Tree-based credal networks for classification.

A credal approach to naive classification

Author: Marco Zaffalon.
Year: 1999.

Abstract: Convex sets of probability distributions are also called credal sets. They generalize probability theory by relaxing the requirement that probability values be precise. Classification, i.e., assigning class labels to instances described by a set of attributes, is a typical domain of application of Bayesian methods, where the naive Bayesian classifier is considered among the best tools. This paper explores the classification model obtained when the naive Bayesian classifier is extended to credal sets. Exact and effective solution procedures for classification are derived, and the related dominance criteria are discussed. A methodology to induce the classifier from data is proposed.

Published in de Cooman, G., Cozman, F. G., Moral, S., Walley, P. (Eds), ISIPTA '99, The Imprecise Probabilities Project, Universiteit Gent, Belgium, 405–414.

A revised and updated version of this conference paper is The naive credal classifier.

2U: an exact interval propagation algorithm for polytrees with binary variables

Authors: Enrico Fagiuoli and Marco Zaffalon.
Year: 1998.

Abstract: This paper addresses the problem of computing posterior probabilities in a discrete Bayesian network where the conditional distributions of the model belong to convex sets. The computation on a general Bayesian network with convex sets of conditional distributions is formalized as a global optimization problem. It is shown that such a problem can be reduced to a combinatorial problem, suitable to exact algorithmic solutions. An exact propagation algorithm for the updating of a polytree with binary variables is derived. The overall complexity is linear in the size of the network when the maximum number of parents is fixed.

Published in Artificial Intelligence 106(1), 77–107.
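The reduction to a combinatorial problem mentioned in the abstract can be illustrated on the smallest possible network: a chain A → C of binary variables with interval-valued conditionals. This is a toy illustration, not the 2U algorithm itself; P(C=1) is multilinear in the free probabilities, so its extrema over the intervals are attained at interval endpoints, and enumerating those endpoints is exact. 2U organizes exactly this kind of vertex reasoning into a linear-time propagation on polytrees.

```python
from itertools import product

def bounds_on_c(pa_interval, pc_given_a):
    """Bounds on P(C=1) for the binary chain A -> C with interval
    conditionals. pa_interval is (lower, upper) for P(A=1);
    pc_given_a[a] is (lower, upper) for P(C=1 | A=a).
    P(C=1) = P(A=1) P(C=1|A=1) + P(A=0) P(C=1|A=0) is linear in each
    free probability, so checking the 2*2*2 endpoint combinations
    yields the exact lower and upper posterior probabilities."""
    lo, hi = 1.0, 0.0
    for pa, p1, p0 in product(pa_interval, pc_given_a[1], pc_given_a[0]):
        p = pa * p1 + (1 - pa) * p0
        lo, hi = min(lo, p), max(hi, p)
    return lo, hi
```

For instance, with P(A=1) in [0.3, 0.5], P(C=1|A=1) in [0.8, 0.9] and P(C=1|A=0) in [0.1, 0.2], the posterior interval for P(C=1) is [0.31, 0.55]. The point of the paper is that this combinatorial search, exponential if done naively over a whole network, can be propagated locally in linear time on binary polytrees.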

A note about redundancy in influence diagrams

Authors: Enrico Fagiuoli and Marco Zaffalon.
Year: 1998.

Abstract: Influence diagrams are formal tools for modelling decision processes and for computing optimal strategies under risk. Like Bayesian networks, influence diagrams exploit the sparsity of the dependency relationships among the random variables in order to reduce computational complexity. In this note, we initially observe that an influence diagram can have some arcs that are not necessary for a complete description of the model. We show that while it may not be easy to detect such arcs, it is important, since a redundant graphical structure can exponentially increase the computational time of a solution procedure. Then we define a graphical criterion that is shown to allow the identification and removal of the redundant parts of an influence diagram. This technical result is significant because it precisely defines what is relevant to know at the time of a decision. Furthermore, it allows a redundant influence diagram to be transformed into another influence diagram that can be used to compute the optimal policy in an equivalent but more efficient way.

Published in International Journal of Approximate Reasoning 19(3–4), 231–246.

Simulation and planning of an intermodal container terminal

Authors: Luca Maria Gambardella, Andrea E. Rizzoli and Marco Zaffalon.
Year: 1998.

Abstract: A decision support system for the management of an intermodal container terminal is presented. Among the problems to be solved, there are the spatial allocation of containers on the terminal yard, the allocation of resources and the scheduling of operations in order to maximise a performance function based on some economic indicators. These problems are solved using techniques from optimisation, like job-shop scheduling, genetic algorithms or mixed-integer linear programming. At the terminal, the same problems are usually solved by the terminal manager, only using his/her experience. The manager can trust computer generated solutions only by validating them by means of a simulation model of the terminal. Thus, the simulation tool also becomes a means to introduce new approaches into traditional settings. In the present paper we focus on the resource allocation problem. We describe our modules for the optimisation of the allocation process and for the simulation of the terminal. The former is based on integer linear programming; the latter is a discrete event simulation tool, based on the process-oriented paradigm. The simulator provides a test bed for checking the validity and the robustness of the policy computed by the optimisation module. The case study of the Contship La Spezia Container Terminal, located in the Mediterranean Sea in Italy, is examined.

Published in Simulation 71(2), 107–116.