## University of Texas at El Paso Computer Science Department Abstracts of 2012 Reports

Technical Report UTEP-CS-12-48, December 2012
A Hybrid Algorithm to Extract Fuzzy Measures for Software Quality Assessment
Xiaojing Wang, Martine Ceberio, Shamsnaz Virani, Angel Garcia, and Jeremy Cummins

Published in Journal of Uncertain Systems, 2013, Vol. 7, No. 3, pp. 219-237.

Being able to assess software quality is essential as software is ubiquitous in every aspect of our day-to-day lives. In this paper, we build upon existing research and metrics for defining software quality and propose a way to automatically assess software quality based on these metrics. In particular, we show that the problem of assessing the quality of software can be viewed as a multi-criteria decision-making (MCDM) problem. In MCDM, decisions are based on several criteria that are usually conflicting and non-homogeneously satisfied. We use non-additive (fuzzy) measures combined with the Choquet integral to solve MCDM problems because they allow us to model and aggregate the levels of satisfaction of the several criteria of the problem at hand while taking their relationships into account. In practice, however, it is difficult to identify such fuzzy measures. An automated process is then necessary and can be used when sample data is available. We propose to automatically assess software by modeling experts' decision process and extracting the fuzzy measure corresponding to their reasoning: we use samples of the target experts' decisions as seed data from which to automatically extract the fuzzy measure corresponding to the experts' decision process. In particular, we propose an algorithm for efficiently extracting fuzzy measures that combines the Bees algorithm with an interval constraint solver. Our experimental results show that we are able to improve some of the results obtained through previous approaches to automatic software quality assessment that used traditional machine learning techniques.
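The Choquet-integral aggregation at the heart of this approach can be sketched in a few lines. The fuzzy measure and the two criteria below are hypothetical illustrations, not values extracted from expert data as in the paper:

```python
# Minimal sketch of aggregating criterion scores with the Choquet integral.
# The fuzzy measure `mu` is a made-up example for illustration only.

def choquet(scores, mu):
    """Choquet integral of criterion scores w.r.t. a fuzzy measure mu.

    scores: dict criterion -> satisfaction level in [0, 1]
    mu: dict frozenset of criteria -> measure value in [0, 1],
        with mu(empty set) = 0 and mu(all criteria) = 1.
    """
    # Sort criteria by increasing score.
    items = sorted(scores.items(), key=lambda kv: kv[1])
    total, prev = 0.0, 0.0
    remaining = set(scores)  # criteria whose score is >= the current level
    for crit, s in items:
        total += (s - prev) * mu[frozenset(remaining)]
        prev = s
        remaining.remove(crit)
    return total

# Hypothetical 2-criterion example: "rel" (reliability), "main" (maintainability).
mu = {
    frozenset(): 0.0,
    frozenset({"rel"}): 0.4,
    frozenset({"main"}): 0.3,
    frozenset({"rel", "main"}): 1.0,  # non-additive: 1.0 > 0.4 + 0.3
}
scores = {"rel": 0.9, "main": 0.6}
# Level 0.6 is satisfied by both criteria, the extra 0.3 by "rel" only:
# 0.6 * 1.0 + (0.9 - 0.6) * 0.4 = 0.72
print(choquet(scores, mu))
```

The non-additivity of `mu` is exactly what lets the aggregation reward (or penalize) criteria being satisfied jointly rather than separately.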

Technical Report UTEP-CS-12-47, December 2012
Metrological Self-Assurance of Data Processing Software

Published in Proceedings of the XX IMEKO World Congress: Metrology for Green Growth, Busan, Korea, September 9-14, 2012.

Metrological self-assurance of data processing software is discussed, and a way to achieve this property is presented.

Technical Report UTEP-CS-12-46, December 2012
F-transform in view of trend extraction
Irina Perfilieva, Vladik Kreinovich, and Vilem Novak

Published in: M. Inuiguchi, Y. Kusunoki, and M. Seki (eds.), Proceedings of the 15th Czech-Japan Seminar on Data Analysis and Decision Making under Uncertainty CJS'2012, Osaka, Japan, September 24-27, 2012.

In the analysis of time series, it is important to decompose the original values into trend, cycle, seasonal component, and noise. In this paper, we provide a theoretical justification of the fact that the F-transform can be used for this purpose. We formulate "natural" requirements on the trend extraction procedure and then show that the inverse F-transform fulfils all of them.
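For readers unfamiliar with the F-transform, its use for trend extraction can be sketched as follows; the uniform triangular partition and the number of nodes are illustrative choices, not prescriptions from the paper:

```python
# Sketch of trend extraction via the direct and inverse F-transform,
# using a uniform triangular fuzzy partition of the time axis.

import math

def triangular_partition(n_nodes, n_points):
    """Basic functions A_k on a uniform partition of indices 0..n_points-1."""
    h = (n_points - 1) / (n_nodes - 1)  # distance between neighboring nodes
    nodes = [k * h for k in range(n_nodes)]
    return [[max(0.0, 1.0 - abs(i - c) / h) for i in range(n_points)]
            for c in nodes]

def f_transform_trend(y, n_nodes=6):
    """Direct F-transform components, then inverse F-transform as the trend."""
    n = len(y)
    A = triangular_partition(n_nodes, n)
    # Direct F-transform: weighted average of y under each basic function.
    F = [sum(a * v for a, v in zip(Ak, y)) / sum(Ak) for Ak in A]
    # Inverse F-transform: recombine the components into a smooth trend.
    return [sum(F[k] * A[k][i] for k in range(n_nodes)) for i in range(n)]

# Noisy linear signal: the recovered trend stays close to the line 0.1 * i.
ys = [0.1 * i + 0.3 * math.sin(2.9 * i) for i in range(60)]
trend = f_transform_trend(ys, n_nodes=6)
```

The high-frequency "noise" term is averaged out inside each component, while the inverse transform reproduces the slowly varying part, which is the behavior the paper justifies axiomatically.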

Technical Report UTEP-CS-12-45, December 2012
Revised version UTEP-CS-12-45a, February 2013
Security Games with Interval Uncertainty
Christopher Kiekintveld, Md. Towhidul Islam, and Vladik Kreinovich

Published in: T. Ito, C. Jonker, M. Gini, and O. Shehory (eds.), Proceedings of The Twelfth International Conference on Autonomous Agents and Multiagent Systems AAMAS'2013, Saint Paul, Minnesota, May 6-10, 2013, pp. 231-238.

Security games provide a framework for allocating limited security resources in adversarial domains, and are currently used in applications including security at the LAX airport, scheduling for the Federal Air Marshals, and patrolling strategies for the U.S. Coast Guard. One of the major challenges in security games is finding solutions that are robust to uncertainty about the game model. Bayesian game models have been developed to model uncertainty, but algorithms for these games do not scale well enough for many applications, and the problem is NP-hard.

We take an alternative approach based on using intervals to model uncertainty in security games. We present a fast polynomial time algorithm for security games with interval uncertainty. This provides the first viable approach for computing robust solutions to very large security games. In addition, we introduce a methodology for approximating the solutions to infinite Bayesian games with distributional uncertainty using intervals to approximate the distributions. We show empirically that using intervals is an effective approach for approximating solutions to these Bayesian games; our algorithm is both faster and results in higher quality solutions than the best previous methods.

Original file in pdf
revised version in pdf

Technical Report UTEP-CS-12-44, December 2012
Updated version UTEP-CS-12-44a, January 2013
Imprecise Probabilities in Engineering Analyses
Michael Beer, Scott Ferson, and Vladik Kreinovich

Published in Mechanical Systems and Signal Processing, 2013, Vol. 37, pp. 4-29.

Probabilistic uncertainty and imprecision in structural parameters and in environmental conditions and loads are challenging phenomena in engineering analyses. They require appropriate mathematical modeling and quantification to obtain realistic results when predicting the behavior and reliability of engineering structures and systems. But the modeling and quantification are complicated by the characteristics of the available information, which involves, for example, sparse data, poor measurements, and subjective information. This raises the question whether the available information is sufficient for probabilistic modeling or rather suggests a set-theoretical approach. The framework of imprecise probabilities provides a mathematical basis to deal with these problems, which involve both probabilistic and non-probabilistic information. A common feature of the various concepts of imprecise probabilities is the consideration of an entire set of probabilistic models in one analysis. The theoretical differences between the concepts mainly concern the mathematical description of the set of probabilistic models and the connection to the probabilistic models involved. This paper provides an overview of developments which involve imprecise probabilities for the solution of engineering problems. Evidence theory, probability bounds analysis with p-boxes, and fuzzy probabilities are discussed with emphasis on their key features and on their relationships to one another.

Original file in pdf
updated version in pdf

Technical Report UTEP-CS-12-43, November 2012
Ubiquity of Data and Model Fusion: from Geophysics and Environmental Sciences to Estimating Individual Risk During an Epidemic
Omar Ochoa, Aline Jaimes, Christian Servin, Craig Tweedie, Aaron Velasco, Martine Ceberio, and Vladik Kreinovich

In many practical situations, we need to combine the results of measuring a local value of a certain quantity with results of measuring average values of this same quantity. For example, in geosciences, we need to combine seismic models (which describe density at different locations and depths) with gravity models, which describe density averaged over certain regions. Similarly, in estimating the risk of an epidemic to an individual, we need to combine the probabilities that describe the risk to people of the corresponding age group, to people of the corresponding geographical region, etc. In this paper, we provide general techniques for solving such model fusion problems.

To properly perform data and model fusion, we need to know the accuracy of different data points. Sometimes, this accuracy is not given. For such situations, we describe how this accuracy can be estimated based on the available data.
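As a simple illustration of accuracy-weighted combination, here is the standard inverse-variance (least-squares) fusion of two estimates of the same quantity; this is a generic sketch with made-up numbers, not the specific fusion technique developed in the report:

```python
# Generic inverse-variance fusion: more accurate estimates get more weight.

def fuse(estimates):
    """estimates: list of (value, standard deviation) pairs
    for the same quantity; returns the fused value and its accuracy."""
    weights = [1.0 / (s * s) for _, s in estimates]
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / sum(weights)
    sigma = (1.0 / sum(weights)) ** 0.5
    return value, sigma

# Hypothetical numbers: a local seismic density estimate vs. a more accurate
# gravity-based regional average (g/cm^3).
local = (2.70, 0.10)
regional = (2.60, 0.05)
v, s = fuse([local, regional])  # fused value is pulled toward the accurate one
```

Note that the formula needs the standard deviations as inputs, which is exactly why the report's second part, on estimating accuracies from the data itself, matters.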

Technical Report UTEP-CS-12-42, November 2012
Updated version UTEP-CS-12-42a, January 2013
Data Anonymization that Leads to the Most Accurate Estimates of Statistical Characteristics

Published in Proceedings of the IEEE Symposium on Computational Intelligence for Engineering Solutions CIES'2013, Singapore, April 16-19, 2013, pp. 163-170.

To preserve privacy, we divide the data space into boxes, and instead of original data points, only store the corresponding boxes. In accordance with the current practice, the desired level of privacy is established by having at least k different records in each box, for a given value k (the larger the value k, the higher the privacy level).

When we process the data, the use of boxes instead of the original exact values leads to uncertainty. In this paper, we find the (asymptotically) optimal subdivision of data into boxes, a subdivision that provides, for a given statistical characteristic like variance, covariance, or correlation, the smallest uncertainty within the given level of privacy.

In areas where the empirical data density is small, boxes containing k points are large in size, which results in large uncertainty. To avoid this, we propose, when computing the corresponding characteristic, to only use data from boxes with a sufficiently large density. This deletion of data points increases the statistical uncertainty, but decreases the uncertainty caused by introducing the privacy-related boxes. We explain how to compute an optimal threshold for which the overall uncertainty is the smallest.

Original file in pdf
Updated version in pdf
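The box idea can be illustrated in one dimension: replace each value by the interval covering the group of at least k records it falls into. The equal-count grouping below is an illustrative choice, not the (asymptotically) optimal subdivision derived in the paper:

```python
# 1-D sketch of k-anonymization by boxes: only the box bounds are stored,
# and each box covers at least k records. Assumes len(values) >= k.

def anonymize(values, k):
    """Replace sorted values by the [min, max] box of consecutive groups."""
    xs = sorted(values)
    boxes = []
    i = 0
    while i < len(xs):
        # The last box absorbs the remainder so no box ends up with < k points.
        j = len(xs) if len(xs) - i < 2 * k else i + k
        boxes.append((xs[i], xs[j - 1]))
        i = j
    return boxes

ages = [23, 25, 31, 34, 35, 41, 42, 58]
print(anonymize(ages, k=3))  # → [(23, 31), (34, 58)]
```

In sparse regions the boxes get wide (here the second box spans 34..58), which is the uncertainty-vs-privacy trade-off the paper's density threshold is designed to control.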

Technical Report UTEP-CS-12-41, November 2012
Updated version UTEP-CS-12-41b, January 2013
From p-Boxes to p-Ellipsoids: Towards an Optimal Representation of Imprecise Probabilities
Konstantin K. Semenov and Vladik Kreinovich

Published in Proceedings of the IEEE Symposium on Computational Intelligence for Engineering Solutions CIES'2013, Singapore, April 16-19, 2013, pp. 149-156.

One of the most widely used ways to represent a probability distribution is by describing its cumulative distribution function (cdf) F(x). In practice, we rarely know the exact values of F(x): for each x, we only know F(x) with uncertainty. In such situations, it is reasonable to describe, for each x, the interval [F(x)] of possible values of F(x). This representation of imprecise probabilities is known as a p-box; it is effectively used in many applications.

Similar interval bounds are possible for the probability density function, for moments, etc. The problem is that when we transform from one such representation to another, we lose information. It is therefore desirable to come up with a universal representation of imprecise probabilities in which we do not lose information when we move from one representation to another. We show that under reasonable objective functions, the optimal representation is an ellipsoid. In particular, ellipsoids lead to faster computations, to narrower bounds, etc.

Original file in pdf
Updated version in pdf

Technical Report UTEP-CS-12-40, November 2012
Should Voting be Mandatory? Democratic Decision Making from the Economic Viewpoint
Olga Kosheleva, Vladik Kreinovich, and Baokun Li

Published in International Journal of Innovative Management, Information & Production (IJIMIP), 2012, Vol. 3, No. 4, pp. 80-84.

Many decisions are made by voting. At first glance, the more people participate in the voting process, the more democratic -- and hence, better -- the decision. In this spirit, to encourage everyone's participation, several countries make voting mandatory. But does mandatory voting really make decisions better for the society? In this paper, we show that from the viewpoint of decision making theory, it is better to allow undecided voters not to participate in the voting process. We also show that the voting process would be even better -- for the society as a whole -- if we allow partial votes. This provides a solid justification for a semi-heuristic "fuzzy voting" scheme advocated by Bart Kosko.

Technical Report UTEP-CS-12-39, November 2012
Nigel G. Ward and Steven D. Werner

Searching in audio archives is potentially very useful, and good evaluations can guide development to realize that promise. However, most current evaluation programs are technology-centric, rather than user-oriented and task-centric. This paper examines current and potential audio search needs and scenarios, and presents a sample set of thirty-two diverse audio search tasks to support more realistic evaluations.

Technical Report UTEP-CS-12-38, October 2012
Interval Uncertainty as the Basis for a General Description of Uncertainty: A Position Paper

Published in International Journal of Intelligent Technologies and Applied Statistics (IJITAS), 2017, Vol. 10, No. 2, pp. 1-5.

Uncertainty is ubiquitous. Depending on what information we have, we get different types of uncertainty. For each type of uncertainty, techniques have been developed for efficient representation and processing of this uncertainty. However, the plethora of different uncertainty techniques is often confusing for practitioners. The situation is especially difficult in frequent situations when we need to gauge the uncertainty of the result of complex multi-stage data processing, and different data inputs are known with different types of uncertainty. To avoid this problem, it is necessary to develop and implement a general approach to representing and processing different types of uncertainty. In this paper, we argue that the most appropriate foundation for this general approach is interval uncertainty.

Technical Report UTEP-CS-12-37, October 2012
How to Define Relative Approximation Error of an Interval Estimate: A Proposal

Published in Applied Mathematical Sciences, 2013, Vol. 7, No. 5, pp. 211-216.

The traditional definition of a relative approximation error of an estimate X as the ratio |X - x|/|x| does not work when the actual value x is 0. To avoid this problem, we propose a new definition |X - x|/|X|. We show how this definition can be naturally extended to the case when instead of a numerical estimate X, we have an interval estimate [x], i.e., an interval that is guaranteed to contain the actual (unknown) value x.
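The two definitions can be put side by side in a few lines; the interval extension described at the end of the abstract is in the paper itself and is not reproduced here:

```python
# The traditional relative error divides by the actual value x;
# the proposed definition divides by the estimate X instead.

def rel_error_traditional(X, x):
    return abs(X - x) / abs(x)   # undefined when the actual value x == 0

def rel_error_proposed(X, x):
    return abs(X - x) / abs(X)   # well-defined as long as the estimate X != 0

# x = 0 breaks the traditional definition but not the proposed one:
print(rel_error_proposed(0.01, 0.0))  # → 1.0
```

For estimates close to the actual value the two definitions nearly coincide, so the change only matters in the problematic near-zero case.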

Technical Report UTEP-CS-12-36, October 2012
Revised version UTEP-CS-12-36a, November 2012
Final version UTEP-CS-12-36b, November 2012
Zadeh's Vision of Going from Fuzzy to Computing With Words: from the Idea's Origin to Current Successes to Remaining Challenges

Published in Mathware and Soft Computing Magazine, 2012, Vol. 19, No. 2, pp. 41-42.

Original file in pdf
Revised version in pdf
Final version in pdf

Technical Report UTEP-CS-12-35, September 2012
If Energy Is Not Preserved, Then Planck's Constant Is No Longer a Constant: A Theorem

Published in Mathematical Structures and Modeling, 2012, Vol. 26, pp. 57-63.

For any physical theory, to experimentally check its validity, we need to formulate an alternative theory and check whether the experimental results are consistent with the original theory or with an alternative theory. In particular, to check whether energy is preserved, it is necessary to formulate an alternative theory in which energy is not preserved. Formulating such a theory is not an easy task in quantum physics, where the usual Schroedinger equation implicitly assumes the existence of an energy (Hamiltonian) operator whose value is preserved. In this paper, we show that the only way to get a consistent quantum theory with energy non-conservation is to use Heisenberg representation in which operators representing physical quantities change in time. We prove that in this representation, energy is preserved if and only if Planck's constant remains a constant. Thus, an appropriate quantum analogue of a theory with non-preserved energy is a theory in which Planck's constant can change -- i.e., is no longer a constant, but a new field.

Technical Report UTEP-CS-12-34, September 2012
Towards Unique Physically Meaningful Definitions of Random and Typical Objects
Luc Longpre and Olga Kosheleva

Published in Mathematical Structures and Modeling, 2012, Vol. 26, pp. 49-56.

To distinguish between random and non-random sequences, Kolmogorov and Martin-Lof proposed a new definition of randomness, according to which an object (e.g., a sequence of 0s and 1s) is random if it satisfies all probability laws, i.e., in more precise terms, if it does not belong to any definable set of probability measure 0. This definition reflects the usual physicists' idea that events with probability 0 cannot happen. Physicists -- especially in statistical physics -- often claim a stronger statement: that events with a very small probability cannot happen either. A modification of the Kolmogorov-Martin-Lof (KML) definition has been proposed to capture this physicists' claim. The problem is that, in contrast to the original KML definition, the resulting definition of randomness is not uniquely determined by the probability measure: for the same probability measure, we can have several different definitions of randomness. In this paper, we show that while it is not possible to define, e.g., a unique set R of random objects, we can define a unique sequence Rn of such sets (unique in some reasonable sense).

Technical Report UTEP-CS-12-33, September 2012
Updated version UTEP-CS-12-33a, December 2012
Decision Making under Interval Uncertainty (and Beyond)

Published in: Peijun Guo and Witold Pedrycz (eds.), Human-Centric Decision-Making Models for Social Sciences, Springer Verlag, 2014, pp. 163-193.

To make a decision, we must find out the user's preferences, and help the user select an alternative which is the best -- according to these preferences. Traditional utility-based decision theory is based on the simplifying assumption that for each two alternatives, a user can always meaningfully decide which of them is preferable. In reality, when the alternatives are close, the user is often unable to select one of these alternatives. In this chapter, we show how we can extend the utility-based decision theory to such realistic (interval) cases.

Original file in pdf
updated version in pdf

Technical Report UTEP-CS-12-32, August 2012
How to Define Mean, Variance, etc., for Heavy-Tailed Distributions: A Fractal-Motivated Approach

Published in International Journal of Innovative Management, Information & Production (IJIMIP), 2012, Vol. 3, No. 3, pp. 1-9.

In many practical situations, we encounter heavy-tailed distributions for which the variance -- and even sometimes the mean -- are infinite. We propose a fractal-motivated approach that enables us to gauge the mean and variance of such distributions.

Technical Report UTEP-CS-12-31, August 2012
Possible and Necessary Orders, Equivalences, etc.: From Modal Logic to Modal Mathematics
Francisco Zapata and Olga Kosheleva

Published in Journal of Uncertain Systems, 2013, Vol. 7, No. 3, pp. 208-218.

In practice, we are often interested in order relations (e.g., when we describe preferences) or equivalence relations (e.g., when we describe clustering). Often, we do not have complete information about the corresponding relation; as a result, we have several relations consistent with our knowledge. In such situations, it is desirable to know which elements a and b are possibly connected by the relation and which are necessarily connected by it. In this paper, we provide a full description of all such possible and necessary orders and equivalence relations. For example, possible orders are exactly reflexive relations, while necessary orders are exactly order relations.

Technical Report UTEP-CS-12-30, August 2012
Kansei Engineering: Towards Optimal Set of Designs
Van Nam Huynh, L. Octavio Lerma, and Vladik Kreinovich

Published in International Journal of Innovative Management, Information & Production (IJIMIP), 2012, Vol. 3, No. 3, pp. 49-53.

In many engineering situations, we need to take into account subjective user preferences; taking such preferences into account is known as *Kansei engineering*. In this paper, we formulate the problem of selecting an optimal set of designs in Kansei engineering as a mathematical optimization problem, and we provide an explicit solution to this optimization problem.

Technical Report UTEP-CS-12-29, August 2012
In Applications, A Rigorous Proof Is Not Enough: It Is Also Important to Have an Intuitive Understanding

Published in Applied Mathematical Sciences, 2012, Vol. 6, No. 125, pp. 6215-6219.

From a purely mathematical viewpoint, once a statement is rigorously proven, it should be accepted as true. Surprisingly, in applications, users are often reluctant to accept a rigorously proven statement until the proof is supplemented by its intuitive explanation. In this paper, we show that this seemingly unreasonable reluctance makes perfect sense: the proven statement is about the mathematical model which is an approximation to the actual system; an intuitive explanation provides some confidence that the statement holds not only for the model, but also for systems approximately equal to this model -- in particular, for the actual system of interest.

Technical Report UTEP-CS-12-28, August 2012
In Quantum Physics, Free Will Leads to Nonconservation of Energy

Published in Journal of Uncertain Systems, 2013, Vol. 7, No. 3, pp. 176-178.

Modern physical theories are deterministic in the sense that once we know the current state of the world, we can, in principle, predict all the future states. This was true for classical (pre-quantum) theories, and it is true for modern quantum physics. On the other hand, we all know that we can make decisions that change the state of the world -- even if, for most of us, only a little bit. This intuitive idea of free will permeates all our life and all our activities -- and it seems to contradict the determinism of modern physics. It is therefore desirable to incorporate the idea of free will into physical theories. In this paper, we show that in quantum physics, free will leads to nonconservation of energy. This nonconservation is a microscopic, purely quantum effect, but it needs to be taken into account in future free-will quantum theories.

Technical Report UTEP-CS-12-27, August 2012
From Unbiased Numerical Estimates to Unbiased Interval Estimates
Baokun Li, Gang Xiang, Vladik Kreinovich, and Panagis Moschopoulos

Published in Critical Review: A Publication of Society for Mathematics of Uncertainty, 2015, Vol. 9, pp. 1-12.

One of the main objectives of statistics is to estimate the parameters of a probability distribution based on a sample taken from this distribution. Of course, since the sample is finite, the estimate X is, in general, different from the actual value x of the corresponding parameter. What we can require is that the corresponding estimate is unbiased, i.e., that the mean value of the difference X - x is equal to 0: E[X] = x. In some problems, unbiased estimates are not possible. We show that in some such problems, it is possible to have interval unbiased estimates, i.e., interval-valued estimates [L,R] for which x is in [E[L], E[R]]. In some such cases, it is possible to have asymptotically sharp estimates, for which the interval [E[L], E[R]] is the narrowest possible.
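The notion can be illustrated with a toy construction (our own hypothetical example, not the one from the paper): for the variance sigma^2 of a normal sample, E[S/n] = (n-1)/n * sigma^2 and E[S/(n-2)] = (n-1)/(n-2) * sigma^2, where S is the sum of squared deviations, so the interval estimate [S/n, S/(n-2)] satisfies sigma^2 in [E[L], E[R]] no matter what sigma is:

```python
# Monte Carlo check of the interval-unbiasedness property of the
# hypothetical interval estimate [S/n, S/(n-2)] for the variance.

import random

def interval_estimate(sample):
    n = len(sample)
    m = sum(sample) / n
    S = sum((v - m) ** 2 for v in sample)
    return S / n, S / (n - 2)

random.seed(0)
# Samples of size 5 from a normal distribution with sigma^2 = 1:
runs = [interval_estimate([random.gauss(0, 1) for _ in range(5)])
        for _ in range(20000)]
EL = sum(L for L, _ in runs) / len(runs)
ER = sum(R for _, R in runs) / len(runs)
# For n = 5: expect EL near 4/5 = 0.8 and ER near 4/3, so EL < 1 < ER.
```

(For the variance an unbiased point estimate also exists; the paper's point concerns parameters, like sigma itself in general, where it does not.)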

Technical Report UTEP-CS-12-26, August 2012
First Revised version UTEP-CS-12-26b, December 2012
Second Revised version UTEP-CS-12-26c, January 2013
Bayesian Approach for Inconsistent Information
M. Stein, M. Beer, and V. Kreinovich

Published in Information Sciences, 2013, Vol. 245, No. 1, pp. 96-111.

In engineering situations, we usually have a large amount of prior knowledge that needs to be taken into account when processing data. Traditionally, the Bayesian approach is used to process data in the presence of prior knowledge. Sometimes, when we apply the traditional Bayesian techniques to engineering data, we get inconsistencies between the data and prior knowledge. These inconsistencies are usually caused by the fact that in the traditional approach, we assume that we know the *exact* sample values, that the prior distribution is *exactly* known, etc. In reality, the data is imprecise due to measurement errors, the prior knowledge is only approximately known, etc. So, a natural way to deal with the seemingly inconsistent information is to take this imprecision into account in the Bayesian approach -- e.g., by using fuzzy techniques. In this paper, we describe several possible scenarios for fuzzifying the Bayesian approach. Particular attention is paid to the interaction between estimated imprecise parameters.

In this paper, to implement the corresponding fuzzy versions of the Bayesian formulas, we use straightforward computations of the related expressions -- which makes our computations rather time-consuming. Computations in the traditional (non-fuzzy) Bayesian approach are much faster because they use algorithmically efficient reformulations of the Bayesian formulas. We expect that similar reformulations of the fuzzy Bayesian formulas will also drastically decrease the computation time and thus enhance the practical use of the proposed methods.

Original file in pdf
first revised version in pdf
second revised version in pdf

Technical Report UTEP-CS-12-25, July 2012
Updated version UTEP-CS-12-25a, September 2012
Why Clayton and Gumbel Copulas: A Symmetry-Based Explanation
Vladik Kreinovich, Hung T. Nguyen, and Songsak Sriboonchitta

Published in: Van-Nam Huynh, Vladik Kreinovich, Songsak Sriboonchitta, and Komsan Suriya (eds.), Uncertainty Analysis in Econometrics, with Applications, Springer Verlag, Berlin, Heidelberg, 2013, pp. 79-90.

In econometrics, many distributions are non-Gaussian. To describe dependence between non-Gaussian variables, it is usually not sufficient to provide their correlation: it is desirable to also know the corresponding copula. There are many different families of copulas; which family shall we use? In many econometric applications, two families of copulas have been most efficient: the Clayton and the Gumbel copulas. In this paper, we provide a theoretical explanation for this empirical efficiency, by showing that these copulas naturally follow from reasonable symmetry assumptions. This symmetry justification also allows us to provide recommendations about which families of copulas we should use when we need a more accurate description of dependence.

Original file in pdf
Updated version in pdf
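The two families discussed in the paper have simple closed forms; the parameter values in the sanity checks below are arbitrary illustrations:

```python
# The standard bivariate Clayton and Gumbel copulas.

import math

def clayton(u, v, theta):
    """Clayton copula, theta > 0 (lower-tail dependence)."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)

def gumbel(u, v, theta):
    """Gumbel copula, theta >= 1 (theta = 1 gives independence)."""
    return math.exp(-(((-math.log(u)) ** theta +
                       (-math.log(v)) ** theta) ** (1.0 / theta)))

# Sanity checks: any copula satisfies C(u, 1) = u, and Gumbel with
# theta = 1 reduces to the independence copula C(u, v) = u * v.
print(clayton(0.3, 1.0, theta=2.0))
print(gumbel(0.4, 0.9, theta=1.0))
```

The asymmetric tail behavior of these two families (lower-tail for Clayton, upper-tail for Gumbel) is part of what makes them empirically useful in econometrics.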

Technical Report UTEP-CS-12-24, July 2012
Updated version UTEP-CS-12-24a, October 2012
Final version UTEP-CS-12-24c, April 2013
Towards a Better Understanding of Space-Time Causality: Kolmogorov Complexity and Causality as a Matter of Degree

Published in Proceedings of the Joint World Congress of the International Fuzzy Systems Association and Annual Conference of the North American Fuzzy Information Processing Society IFSA/NAFIPS'2013, Edmonton, Canada, June 24-28, 2013, pp. 1349-1353.

Space-time causality is one of the fundamental notions of modern physics; however, it is difficult to define in observational physical terms. Intuitively, the fact that a space-time event e=(t,x) can causally influence an event e'=(t',x') means that what we do in the vicinity of e changes what we observe at e'. If we had two copies of the Universe, we could perform some action at e in one copy but not in another copy; if we then observe the difference at e', this would be an indication of causality. However, we only observe one Universe, in which we either perform the action or we do not. At first glance, it may seem that in this case, there is no meaningful way to provide an operational definition of causality. In this paper, we show that such a definition is possible if we use the notions of algorithmic randomness and Kolmogorov complexity. The resulting definition leads to a somewhat unexpected consequence: that space-time causality is a matter of degree.

Original file in pdf
Updated version in pdf
Final version in pdf

Technical Report UTEP-CS-12-23, July 2012
Decision Making under Interval and Fuzzy Uncertainty: Towards an Operational Approach
Rafik Aliev, Oleg H. Huseynov, and Vladik Kreinovich

Published in Proceedings of the Tenth International Conference on Application of Fuzzy Systems and Soft Computing ICAFS'2012, Lisbon, Portugal, August 29-30, 2012.

Traditional decision theory is based on a simplifying assumption that for each two alternatives, a user can always meaningfully decide which of them is preferable. In reality, often, when the alternatives are close, the user is either completely unable to select one of these alternatives, or selects one of the alternatives only "to some extent". How can we extend the traditional decision theory to such realistic interval and fuzzy cases? In their previous papers, the first two authors proposed a natural generalization of the usual decision theory axioms to interval and fuzzy cases, and described the decisions coming from this generalization. In this paper, we make the resulting decisions more intuitive by providing a commonsense operational explanation.

Technical Report UTEP-CS-12-22, July 2012
Membership Functions or alpha-Cuts? Algorithmic (Constructivist) Analysis Justifies an Interval Approach

Short version published in Proceedings of the International Conference "Mathematical Logics and Applications", Yerevan, Armenia, November 1-3, 2012, pp. 70-71; full paper published in Applied Mathematical Sciences, 2013, Vol. 7, No. 5, pp. 217-228.

In his pioneering papers, Igor Zaslavsky started an algorithmic (constructivist) analysis of fuzzy logic. In this paper, we extend this analysis to fuzzy mathematics and fuzzy data processing. Specifically, we show that the two mathematically equivalent representations of a fuzzy number -- by a membership function and by alpha-cuts -- are not algorithmically equivalent, and only the alpha-cut representation enables us to efficiently process fuzzy data.
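The computational convenience of alpha-cuts can be seen on the simplest operation, addition of fuzzy numbers: at each level alpha, it reduces to ordinary interval arithmetic on the cuts. The triangular shapes and levels below are illustrative choices:

```python
# Fuzzy addition via alpha-cuts: levelwise interval addition.

def alpha_cuts_triangular(a, b, c, levels):
    """Alpha-cuts [a + alpha*(b - a), c - alpha*(c - b)] of the
    triangular fuzzy number (a, b, c)."""
    return {al: (a + al * (b - a), c - al * (c - b)) for al in levels}

def add_fuzzy(x_cuts, y_cuts):
    """Interval addition at each alpha level."""
    return {al: (x_cuts[al][0] + y_cuts[al][0],
                 x_cuts[al][1] + y_cuts[al][1]) for al in x_cuts}

levels = [0.0, 0.5, 1.0]
x = alpha_cuts_triangular(1, 2, 3, levels)  # "about 2"
y = alpha_cuts_triangular(4, 5, 7, levels)  # "about 5"
z = add_fuzzy(x, y)  # z[1.0] == (7, 7), z[0.0] == (5, 10)
```

Doing the same computation directly on membership functions would require solving an optimization problem at every output point, which is one intuition behind the algorithmic asymmetry the paper establishes.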

Technical Report UTEP-CS-12-21, June 2012
An Evaluation Approach for Interactions between Abstract Workflows and Provenance Traces
Leonardo Salayandia, Ann Q. Gates, and Paulo Pinheiro

In the context of science, abstract workflows bridge the gap between scientists and technologists towards using computer systems to carry out scientific processes. Provenance traces provide evidence required to validate scientific products and support their secondary use. Assuming abstract workflows and provenance traces are based on formal semantics, a knowledge-based system that consistently merges both technologies allows scientists to document their processes of data collection and transformation; it also allows for secondary users of data to assess scientific processes and resulting data products. This paper presents an evaluation approach for interactions between abstract workflows and provenance traces. The claim is that both technologies should complement each other and align consistently to a scientist's perspective to effectively support science. The evaluation approach uses criteria that are derived from tasks performed by scientists using both technologies.

Technical Report UTEP-CS-12-20, May 2012
Updated version UTEP-CS-12-20a, May 2012
How to Define Average Class Size (and Deviations from the Average Class Size) in a Way Which Is Most Adequate for Teaching Effectiveness

Published in Proceedings of the Workshop on Informatics and Information Technologies in Education: Theory, Applications, Didactics, Novosibirsk, Russia, September 26-29, 2012, Vol. 1, pp. 113-120.

When students select a university, one of the important parameters is the average class size. This average is usually estimated as the arithmetic average of all the class sizes. However, it has been recently shown that to more adequately describe students' perception of a class size, it makes more sense to average not over classes, but over all students -- which leads to a different characteristic of the average class size. In this paper, we analyze which characteristic is most adequate from the viewpoint of efficient learning. Somewhat surprisingly, it turns out that the arithmetic average is the most adequate way to describe the average student's gain due to a smaller class size. However, if we want to describe the effect of deviations from the average class size on teaching effectiveness, then, instead of the standard deviation of the class size, a more complex characteristic is most appropriate.

Original file in pdf
Updated version in pdf

Technical Report UTEP-CS-12-19, May 2012
Semi-Heuristic Target-Based Fuzzy Decision Procedures: Towards a New Interval Justification
Christian Servin, Van-Nam Huynh, and Yoshiteru Nakamori

Published in the Proceedings of the 2012 Annual Conference of the North American Fuzzy Information Processing Society NAFIPS'2012, Berkeley, California, August 6-8, 2012.

To more adequately describe human decision making, V.-N. Huynh, Y. Nakamori, and others proposed a special semi-heuristic target-based fuzzy decision procedure. A usual justification for this procedure is based on the selection of the simplest possible membership functions and "and"- and "or"-operations; if we use more complex membership functions and "and"- and "or"-operations, we get different results. Interestingly, in practical applications, the procedure based on the simplest choices most adequately describes human preferences. It is therefore desirable to come up with a justification that explains this empirical fact. Such a justification is proposed in this paper.

Technical Report UTEP-CS-12-18, May 2012
Updated version UTEP-CS-12-18a, July 2012
How to Divide Students into Groups so as to Optimize Learning: Towards a Solution to a Pedagogy-Related Optimization Problem

Published in Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics IEEE SMC'2012, Seoul, Korea, October 14-17, 2012, pp. 1948-1953.

To enhance learning, it is desirable to also let students learn from each other, e.g., by working in groups. It is known that such groupwork can improve learning, but the effect strongly depends on how we divide students into groups. In this paper, based on a first approximation model of student interaction, we describe how to divide students into groups so as to optimize the resulting learning. We hope that, by taking into account other aspects of student interaction, it will be possible to transform our solution into truly optimal practical recommendations.

Original file in pdf
Updated version in pdf

Technical Report UTEP-CS-12-17, May 2012
Estimating Correlation under Interval and Fuzzy Uncertainty: Case of Hierarchical Estimation
Ali Jalal-Kamali

Published in Proceedings of the Annual Conference of the North American Fuzzy Information Processing Society NAFIPS'2012, Berkeley, California, August 6-8, 2012.

In many situations, we are interested in finding the correlation ρ between different quantities x and y based on the values xi and yi of these quantities measured in different situations i. The correlation is easy to compute when we know the exact sample values xi and yi. In practice, the sample values come from measurements or from expert estimates; in both cases, the values are not exact. Sometimes, we know the probabilities of different values of measurement errors, but in many cases, we only know the upper bounds Δxi and Δyi on the corresponding measurement errors. In such situations, after we get the measurement results Xi and Yi, the only information that we have about the actual (unknown) values xi and yi is that they belong to the corresponding intervals [Xi − Δxi, Xi + Δxi] and [Yi − Δyi, Yi + Δyi]. For expert estimates, we get different intervals corresponding to different degrees of certainty -- i.e., fuzzy sets. Different values of xi and yi lead, in general, to different values of the correlation ρ. It is therefore desirable to find the range of possible values of the correlation ρ when xi and yi take values from the corresponding intervals. In general, the problem of computing this range is NP-hard. In this paper, we provide a feasible (= polynomial-time) algorithm for computing at least one of the endpoints of this interval: for computing the upper endpoint when this endpoint is positive and for computing the lower endpoint when this endpoint is negative.
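The underlying range-estimation problem can be illustrated by a brute-force sketch (this is not the paper's polynomial-time algorithm): sample each interval at a few points and take the smallest and largest resulting correlations, giving an inner approximation of the range at exponential cost, so it is only practical for tiny examples.

```python
from itertools import product

def correlation(xs, ys):
    """Sample (Pearson) correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def correlation_range(x_boxes, y_boxes, samples=3):
    """Inner approximation of [rho_min, rho_max] when each x_i and
    y_i is only known to lie in an interval (lo, hi): brute-force
    sampling of every combination -- exponential, tiny inputs only."""
    def grid(lo, hi):
        if lo == hi:
            return [lo]
        return [lo + k * (hi - lo) / (samples - 1) for k in range(samples)]
    lo_best, hi_best = float("inf"), float("-inf")
    for xs in product(*(grid(*b) for b in x_boxes)):
        for ys in product(*(grid(*b) for b in y_boxes)):
            try:
                r = correlation(xs, ys)
            except ZeroDivisionError:   # a sample with zero variance
                continue
            lo_best, hi_best = min(lo_best, r), max(hi_best, r)
    return lo_best, hi_best
```

With degenerate (zero-width) intervals the "range" collapses to the exact sample correlation, which gives a quick sanity check.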

Technical Report UTEP-CS-12-16, April 2012
Simplicity Is Worse Than Theft: A Constraint-Based Explanation of a Seemingly Counter-Intuitive Russian Saying
Martine Ceberio, Olga Kosheleva, and Vladik Kreinovich

Preliminary version published in Proceedings of the Fifth International Workshop on Constraint Programming and Decision Making CoProD'12, Novosibirsk, Russia, September 23, 2012; final version published in: Martine Ceberio and Vladik Kreinovich (eds.), Constraint Programming and Decision Making, Springer Verlag, Berlin, Heidelberg, 2014, pp. 9-14.

In many practical situations, simplified models, models that enable us to gauge the quality of different decisions reasonably well, lead to far-from-optimal situations when used in searching for an optimal decision. There is even an appropriate Russian saying: simplicity is worse than theft. In this paper, we provide a mathematical explanation of this phenomenon.

Technical Report UTEP-CS-12-15, April 2012
Updated version UTEP-CS-12-15a, August 2013
Algorithmics of Checking Whether a Mapping Is Injective, Surjective, and/or Bijective
E. Cabral Balreira, Olga Kosheleva, and Vladik Kreinovich

Preliminary version published in Proceedings of the Fifth International Workshop on Constraint Programming and Decision Making CoProD'12, Novosibirsk, Russia, September 23, 2012; final version published in: Martine Ceberio and Vladik Kreinovich (eds.), Constraint Programming and Decision Making, Springer Verlag, Berlin, Heidelberg, 2014, pp. 1-8.

In many situations, we would like to check whether an algorithmically given mapping f:A --> B is injective, surjective, and/or bijective. These properties have a practical meaning: injectivity means that the effects of the action f can be, in principle, reversed, while surjectivity means that every state b from the set B can appear as a result of the corresponding action. In this paper, we discuss when algorithms are possible for checking these properties.
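For finite, explicitly given sets, these checks are straightforward; the paper's question is when they remain algorithmically possible for general mappings. A minimal finite-case sketch (illustrative, not from the paper):

```python
def is_injective(f, domain):
    """Distinct inputs must yield distinct outputs."""
    images = [f(a) for a in domain]
    return len(set(images)) == len(images)

def is_surjective(f, domain, codomain):
    """Every element of the codomain must be hit by some input."""
    return {f(a) for a in domain} == set(codomain)

def is_bijective(f, domain, codomain):
    return is_injective(f, domain) and is_surjective(f, domain, codomain)

# Example: doubling mod 5 permutes {0,...,4}, hence is bijective
print(is_bijective(lambda a: (2 * a) % 5, range(5), range(5)))  # True
```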

Original file in pdf
Updated version in pdf

Technical Report UTEP-CS-12-14, April 2012
Kinematic Spaces and de Vries Algebras: Towards Possible Physical Meaning of de Vries Algebras
Olga Kosheleva and Francisco Zapata

Published in Mathematical Structures and Modeling, 2012, Vol. 25, pp. 91-99.

Traditionally, in physics, space-times are described by (pseudo-)Riemann spaces, i.e., by smooth manifolds with a tensor metric field. However, in several physically interesting situations, smoothness is violated: near the Big Bang, at black holes, and on the microlevel, when we take into account quantum effects. In all these situations, what remains is causality -- an ordering relation. To describe such situations, in the 1960s, geometers H. Busemann and R. Pimenov and physicists E. Kronheimer and R. Penrose developed a theory of kinematic spaces. Originally, kinematic spaces were formulated as topological ordered spaces, but it turned out that kinematic spaces allow an equivalent purely algebraic description as sets with two related orders: causality and "kinematic" causality (possibility to influence by particles with non-zero mass, particles that travel with speed smaller than the speed of light). In this paper, we analyze the relation between kinematic spaces and de Vries algebras -- another mathematical object with two similarly related orders.

Submitted file in pdf
Published version in pdf

Technical Report UTEP-CS-12-13, April 2012
Extending Java for Android Programming
Yoonsik Cheon

Android is one of the most popular platforms for developing mobile applications. However, its framework relies on programming conventions and styles to implement framework-specific concepts like activities and intents, causing problems with reliability, readability, understandability, and maintainability. We propose to extend Java to support Android framework concepts explicitly as built-in language features. Our extension, called Android Java, will allow Android programmers to express these concepts in a more reliable, natural, and succinct way.

Technical Report UTEP-CS-12-12, April 2012
Image and Model Fusion: Unexpected Counterintuitive Behavior of Traditional Statistical Techniques and Resulting Need for Expert Knowledge
Omar Ochoa, Aaron Velasco, and Vladik Kreinovich

In many real-life situations, we have different types of data. For example, in geosciences, we have seismic data, gravity data, magnetic data, etc. Ideally, we should jointly process all this data, but often, such a joint processing is not yet practically possible. In such situations, it is desirable to "fuse" models (images) corresponding to different types of data: e.g., to fuse an image corresponding to seismic data and an image corresponding to gravity data. At first glance, if we assume that all the approximation errors are independent and normally distributed, then we get a reasonably standard statistical problem which can be solved by the traditional statistical techniques such as the Maximum Likelihood method. Surprisingly, it turns out that for this seemingly simple and natural problem, the traditional Maximum Likelihood approach leads to non-physical results. To make the fusion results physically meaningful, it is therefore necessary to take into account expert knowledge.

Technical Report UTEP-CS-12-11, April 2012
Modal Intervals as a New Logical Interpretation of the Usual Lattice Order Between Interval Truth Values
Francisco Zapata

Published in the Proceedings of the Annual Conference of the North American Fuzzy Information Processing Society NAFIPS'2012, Berkeley, California, August 6-8, 2012.

In the traditional fuzzy logic, we use numbers from the interval [0,1] to describe possible expert's degrees of belief in different statements. Comparing the resulting numbers is straightforward: if our degree of belief in a statement A is larger than our degree of belief in a statement B, this means that we have more confidence in the statement A than in the statement B. It is known that to get a more adequate description of the expert's degree of belief, it is better to use not only numbers a from the interval [0,1], but also subintervals [a1,a2] of this interval. There are several different ways to compare intervals. For example, we can say that [a1,a2] <= [b1,b2] if every number from the interval [a1,a2] is smaller than or equal to every number from the interval [b1,b2]. However, in interval-valued fuzzy logic, a more frequently used ordering relation between interval truth values is the relation [a1,a2] <= [b1,b2] if and only if a1 <= b1 & a2 <= b2. This relation makes mathematical sense -- it makes the set of all such interval truth values a lattice -- but, in contrast to the above relation, it does not have a clear logical interpretation. Since our objective is to describe logic, it is desirable to have a reasonable logical interpretation of this lattice relation. In this paper, we use the notion of modal intervals to provide such a logical interpretation.
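The two orderings compared in the abstract can be stated in a few lines (an illustration, representing an interval truth value as a pair (a1, a2)):

```python
def strong_leq(a, b):
    """[a1,a2] <= [b1,b2] in the 'every point' sense: every value in
    the first interval is <= every value in the second one."""
    return a[1] <= b[0]

def lattice_leq(a, b):
    """Componentwise (lattice) order commonly used in interval-valued
    fuzzy logic: a1 <= b1 and a2 <= b2."""
    return a[0] <= b[0] and a[1] <= b[1]

# Overlapping intervals are comparable in the lattice order
# but not in the strong order:
print(lattice_leq((0.2, 0.6), (0.3, 0.7)))  # True
print(strong_leq((0.2, 0.6), (0.3, 0.7)))   # False
```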

Technical Report UTEP-CS-12-10, April 2012
Updated version UTEP-CS-12-10a, July 2012

Published in Soft Computing, 2013, Vol. 17, No. 8, pp. 1319-1327.

In many practical situations, we do not have enough observations to uniquely determine the corresponding probability distribution; we only have enough observations to estimate two parameters of this distribution. In such cases, the traditional statistical approach is to estimate the mean and the standard deviation. Alternatively, we can estimate the two bounds that form the range of the corresponding variable and thus, generate an interval. Which of these two approaches should we select? A natural idea is to select the most informative approach, i.e., an approach in which we need the smallest amount of additional information (in Shannon's sense) to obtain the full information about the situation. In this paper, we follow this idea and come up with the following conclusion: in practical situations in which a 95% confidence level is sufficient, interval bounds are more informative; however, in situations in which we need higher confidence, the moments approach is more informative.

Original file in pdf
Updated version in pdf

Technical Report UTEP-CS-12-09, April 2012
Updated version UTEP-CS-12-09a, July 2012
Final version UTEP-CS-12-09b, August 2012
Orders on Intervals Over Partially Ordered Sets: Extending Allen's Algebra and Interval Graph Results
Francisco Zapata, Vladik Kreinovich, Cliff Joslyn, and Emilie Hogan

Published in Soft Computing, 2013, Vol. 17, No. 8, pp. 1379-1391.

To make a decision, we need to compare the values of quantities. In many practical situations, we know the values with interval uncertainty. In such situations, we need to compare intervals. Allen's algebra describes all possible relations between intervals on the real line which are generated by the ordering of endpoints; ordering relations between such intervals have also been well studied. In this paper, we extend this description to intervals in an arbitrary partially ordered set (poset). In particular, we explicitly describe ordering relations between intervals that generalize relations between points. As auxiliary results, we provide a logical interpretation of the relation between intervals, and extend the results about interval graphs to intervals over posets.
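The componentwise order on intervals carries over from the real line to an arbitrary poset simply by replacing <= with the poset's order predicate. A small sketch (illustrative only; the divisibility poset is our assumed example, not one from the paper):

```python
def interval_leq(leq, a, b):
    """Componentwise order on intervals [a1,a2] and [b1,b2] over a
    poset whose order is given by the predicate leq(x, y)."""
    return leq(a[0], b[0]) and leq(a[1], b[1])

def divides(x, y):
    """Order of the divisibility poset on positive integers."""
    return y % x == 0

# [2,4] precedes [2,8] since 2|2 and 4|8; it does not precede [3,12]
print(interval_leq(divides, (2, 4), (2, 8)))   # True
print(interval_leq(divides, (2, 4), (3, 12)))  # False
```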

Original file in pdf
Updated version in pdf
Final version in pdf

Technical Report UTEP-CS-12-08, April 2012
Research-Related Projects for Graduate Students as a Tool to Motivate Graduate Students in Classes Outside Their Direct Interest Areas

Published in: Arunkumar Pennathur, Vivek Tandon, and Louis Everett (eds.), Proceedings of the 2012 Annual Gulf Southwest Conference of the American Society of Engineering Education ASEE-GSW'2012 "Bridging Theory and Practice in Engineering and Technology Education", El Paso, Texas, April 4-6, 2012, pp. 1-9.

In most graduate programs, students are required to take both "depth" classes -- classes in the areas of the student's direct interest -- and "breadth" classes, classes outside their direct interest areas. Naturally, the students' interest in "breadth" classes is often lower than their interest in the "depth" classes. To enhance the students' interest in the "breadth" classes, a natural idea is to make a research-related project an important part of the class -- a project in which the student can apply the skills that he or she learns in the class to the research area of direct interest to this student. In this paper, we describe results of using this idea in Theory of Computation classes.

Technical Report UTEP-CS-12-07, February 2012
Do Constraints Facilitate or Inhibit Creative Problem Solving: A Theoretical Explanation of Two Seemingly Contradictory Experimental Studies
Karen Villaverde, Olga Kosheleva, and Martine Ceberio

Published in Proceedings of the 2012 Annual Conference of the North American Fuzzy Information Processing Society NAFIPS'2012, Berkeley, California, August 6--8, 2012.

Do constraints facilitate or inhibit creative problem solving? Recently, two experimental studies appeared, one showing that removing constraints may enhance creativity, another showing that adding constraints can facilitate creative problem solving. In this paper, we provide a theoretical explanation of these two seemingly contradictory experimental results.

Technical Report UTEP-CS-12-06, February 2012
Validated Templates for Specification of Complex LTL Formulas
Salamah Salamah, Ann Gates, and Vladik Kreinovich

Published in Journal of Systems and Software, 2012, Vol. 85, pp. 1915-1929.

Formal verification approaches that check software correctness against formal specifications have been shown to improve program dependability. Tools such as Specification Pattern System (SPS) and Property Specification (Prospec) support the generation of formal specifications. SPS has defined a set of patterns (common recurring properties) and scopes (system states over which a pattern must hold) that allows a user to generate formal specifications by using direct substitution of propositions into parameters of selected patterns and scopes. Prospec extended SPS to support the definition of patterns and scopes that include the ability to specify parameters with multiple propositions (referred to as composite propositions or CPs), allowing the specification of sequential and concurrent behavior. Prospec generates formal specifications in Future Interval Logic (FIL) using direct substitution of CPs into pattern and scope parameters. While substitution works trivially for FIL, it does not work for Linear Temporal Logic (LTL), a highly expressive language that supports specification of software properties such as safety and liveness. LTL is important because of its use in the model checker Spin, winner of the 2001 ACM Software System Award, and in NuSMV. This paper introduces abstract LTL templates to support automated generation of LTL formulas for complex properties in Prospec. In addition, it presents formal proofs and testing to demonstrate that the templates indeed generate the intended LTL formulas.
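The general idea of pattern/scope templates can be illustrated with a simplified sketch that handles only single-proposition parameters (the easy case; the paper's templates address the harder composite-proposition cases, and this is not Prospec's actual code). Spin-style LTL syntax is used, where [] is "always" and <> is "eventually":

```python
def absence(p, scope="globally", q=None):
    """LTL for 'p never holds' within the given scope."""
    body = "[](!%s)" % p
    if scope == "globally":
        return body
    if scope == "after":                  # after q first holds
        return "[](%s -> %s)" % (q, body)
    raise ValueError("unsupported scope: %s" % scope)

def response(p, s, scope="globally", q=None):
    """LTL for 'every p is eventually followed by s'."""
    body = "[](%s -> <>%s)" % (p, s)
    if scope == "globally":
        return body
    if scope == "after":
        return "[](%s -> %s)" % (q, body)
    raise ValueError("unsupported scope: %s" % scope)

print(absence("p"))                 # [](!p)
print(response("req", "ack"))       # [](req -> <>ack)
```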

Technical Report UTEP-CS-12-05, February 2012
Updated version UTEP-CS-12-05a, May 2012
Locating Local Extrema under Interval Uncertainty: Multi-D Case

Published in Proceedings of the 2012 Annual Conference of the North American Fuzzy Information Processing Society NAFIPS'2012, Berkeley, California, August 6-8, 2012.

In many practical situations, we need to locate local maxima and/or local minima of a function which is only known with interval uncertainty. For example, in radioastronomy, components of a radiosource are usually identified by locations at which the observed brightness reaches a local maximum. In clustering, different clusters are usually identified with local maxima of the probability density function (describing the relative frequency of different combinations of values). In the 1-D case, a feasible (polynomial-time) algorithm is known for locating local extrema under interval (and fuzzy) uncertainty. In this paper, we extend this result to the general multi-dimensional case.
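The interval reasoning behind such algorithms can be sketched in the 1-D case (an illustration of the idea, not the paper's multi-D algorithm): a grid point can possibly be a local maximum if its upper bound is at least the neighbors' lower bounds, and is guaranteed to be one if its lower bound is at least the neighbors' upper bounds.

```python
def possible_local_max(lo, hi, i):
    """Can position i be a local maximum for SOME admissible values
    f_j in [lo_j, hi_j]? Best case: f_i at its upper bound, the
    neighbors at their lower bounds."""
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < len(lo)]
    return all(hi[i] >= lo[j] for j in nbrs)

def guaranteed_local_max(lo, hi, i):
    """Is i a local maximum for ALL admissible values? Worst case:
    f_i at its lower bound, the neighbors at their upper bounds."""
    nbrs = [j for j in (i - 1, i + 1) if 0 <= j < len(lo)]
    return all(lo[i] >= hi[j] for j in nbrs)

lo, hi = [0, 5, 0], [1, 6, 1]
print(guaranteed_local_max(lo, hi, 1))  # True: even worst-case 5 beats 1
print(possible_local_max(lo, hi, 0))    # False: 1 can never reach 5
```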

Original file in pdf
Updated version in pdf

Technical Report UTEP-CS-12-04, January 2012
Updated version UTEP-CS-12-04a, April 2012
Optimizing Computer Representation and Computer Processing of Epistemic Uncertainty for Risk-Informed Decision Making: Finances etc.
Vladik Kreinovich, Nitaya Buntao, and Olga Kosheleva

Published in Proceedings of the International Conference on Probabilistic Safety Assessment and Management / European Safety and Reliability PSAM11/ESREL'12, Helsinki, Finland, June 25-29, 2012.

Uncertainty is usually gauged by using standard statistical characteristics: mean, variance, correlation, etc. Then, we use the known values of these characteristics (or the known bounds on these values) to select a decision. Sometimes, it becomes clear that the selected characteristics do not always describe a situation well; then other known (or new) characteristics are proposed. A good example is description of volatility in finance: it started with variance, and now many descriptions are competing, all with their own advantages and limitations.

In such situations, a natural idea is to come up with characteristics tailored to specific application areas: e.g., to select the characteristic that maximizes the expected utility of the resulting risk-informed decision making.

With the new characteristics comes the need to estimate them when the sample values are only known with interval uncertainty. Algorithms originally developed for estimating traditional characteristics can often be modified to cover new characteristics.

Original file in pdf
Updated file in pdf

Technical Report UTEP-CS-12-03, January 2012
A Framework to Create Ontologies for Scientific Data Management
Leonardo Salayandia, Paulo Pinheiro, and Ann Q. Gates

Scientists often build and use highly customized systems to support observation and analysis efforts. Creating effective ontologies to manage and share data products created from those systems is a difficult task that requires collaboration among domain experts, e.g., scientists, and knowledge representation experts. A framework is presented that scientists can use to create ontologies that describe how customized systems capture and transform data into products that support scientific findings. The framework establishes an abstraction that leverages knowledge representation expertise to describe data transformation processes in a consistent way that highlights properties relevant to data users. The intention is to create effective ontologies for scientific data management by focusing on scientist-driven descriptions. The framework consists of an upper-level ontology specified with description logic and supported with a graphical language with minimal constructs that facilitates use by scientists. Evaluation of the framework's usefulness for scientists is presented.

Technical Report UTEP-CS-12-02, January 2012
Estimating Statistical Characteristics of Lognormal and Delta-Lognormal Distributions under Interval Uncertainty: Algorithms and Computational Complexity
Nitaya Buntao, Sa-aat Niwitpong, and Vladik Kreinovich

Traditional statistical estimates S(x1, ..., xn) for different statistical characteristics S (such as mean, variance, etc.) implicitly assume that we know the sample values x1, ..., xn exactly. In practice, the sample values Xi come from measurements and are, therefore, in general, different from the actual (unknown) values xi of the corresponding quantities. Sometimes, we know the probabilities of different values of the measurement error di = Xi - xi, but often, the only information that we have about the measurement error is the upper bound Di on its absolute value -- provided by the manufacturer of the corresponding measuring instrument. In this case, the only information that we have about the actual values xi is that they belong to the intervals [xi] = [Xi - Di, Xi + Di].

In general, different values xi from the intervals [xi] lead to different values of the corresponding estimate S(x1, ..., xn). In this case, it is desirable to find the range of all possible values of this characteristic.

In this paper, we consider the problem of computing the corresponding range for the cases of lognormal and delta-lognormal distributions. Interestingly, it turns out that, in contrast to the case of normal distribution for which it is feasible to compute the range of the mean, for lognormal and delta-lognormal distributions, computing the range of the mean is an NP-hard problem.
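The "easy" case mentioned above -- the range of the sample mean under interval uncertainty -- takes only a few lines, because the mean is monotonically increasing in each sample value, so its extremes are attained at the interval endpoints (an illustrative sketch, not code from the paper; the lognormal/delta-lognormal case is the NP-hard one):

```python
def mean_range(intervals):
    """Exact range of the sample mean when each x_i is only known to
    lie in [lo_i, hi_i]: the mean is monotone in every x_i, so the
    lower bounds give the minimum and the upper bounds the maximum."""
    n = len(intervals)
    return (sum(lo for lo, _ in intervals) / n,
            sum(hi for _, hi in intervals) / n)

print(mean_range([(0, 1), (2, 4)]))  # (1.0, 2.5)
```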

Technical Report UTEP-CS-12-01, January 2012
Revised version UTEP-CS-12-01a, April 2012
How to Describe and Propagate Uncertainty When Processing Time Series: Metrological and Computational Challenges, with Potential Applications to Environmental Studies
Christian Servin, Martine Ceberio, Aline Jaimes, Craig Tweedie, and Vladik Kreinovich

Published in: Shyi-Ming Chen and Witold Pedrycz (eds.), Time Series Analysis, Modeling and Applications: A Computational Intelligence Perspective, Springer Verlag, 2013, pp. 279-299.

Time series come from measurements, and measurements are never absolutely accurate. Traditionally, when we deal with an individual measurement or with a sample of measurement results, we subdivide a measurement error into random and systematic components: a systematic error does not change from measurement to measurement, while random errors corresponding to different measurements are independent. In a time series, when we measure the same quantity at different times, we can also have correlation between measurement errors corresponding to nearby moments of time. To capture this correlation, environmental science researchers proposed to consider a third type of measurement error: periodic. This extended classification of measurement errors may seem ad hoc at first glance, but it leads to a good description of the actual errors. In this paper, we provide a theoretical explanation for this semi-empirical classification, and we show how to efficiently propagate all types of uncertainty via computations.
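The three-component decomposition can be illustrated by a simple least-squares sketch (an assumption-laden illustration, not the paper's procedure): subtract the mean (systematic part), project onto one sinusoid at a known period (periodic part), and treat the remainder as random. The projection formulas below assume the series length is a whole number of periods.

```python
import math

def decompose_error(errors, period):
    """Split an error series into systematic (constant), periodic
    (one sinusoid at the given, known period), and random parts."""
    n = len(errors)
    systematic = sum(errors) / n
    centered = [e - systematic for e in errors]
    w = 2 * math.pi / period
    # Least-squares amplitudes of cos/sin at frequency w
    # (orthogonality holds when n is a multiple of the period).
    c = sum(e * math.cos(w * t) for t, e in enumerate(centered)) * 2 / n
    s = sum(e * math.sin(w * t) for t, e in enumerate(centered)) * 2 / n
    periodic = [c * math.cos(w * t) + s * math.sin(w * t) for t in range(n)]
    random_part = [e - p for e, p in zip(centered, periodic)]
    return systematic, periodic, random_part

# A purely systematic-plus-periodic error leaves ~zero random residue:
errors = [2.0 + math.sin(2 * math.pi * t / 4) for t in range(8)]
sysc, per, rnd = decompose_error(errors, period=4)
print(round(sysc, 6), max(abs(r) for r in rnd) < 1e-9)
```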

Original file in pdf
Updated version in pdf