Computer Science Department

Abstracts of 2007 Reports

Technical Report UTEP-CS-07-60, December 2007

Reasons why Mobile Telephone Conversations may be Annoying: Considerations and Pilot Studies

Nigel G. Ward, Anais G. Rivera, and Alejandro Vega

Supplement to a paper to appear in *21st International Symposium on
Human Factors in Telecommunication*, Kuala Lumpur, Malaysia, March
17-20, 2008.

Mobile telephone conversations in public places are often annoying to bystanders. Previous work has focused on the psychological and social causes for this, but has not examined the possible role of properties of the communication channel. In our paper "Do Bystanders and Dialog Participants Differ in Preferences for Telecommunications Channels?" (21st International Symposium on Human Factors in Telecommunication, 2008) we consider the possibility that a reason for the annoyance could be that bystander preferences differ from talker preferences, but conclude that this is in fact unlikely to be a major factor. This technical report provides supplemental information, specifically a broader view of the likely causes of annoyance and more details on the ten pilot studies and the data collected.

Technical Report UTEP-CS-07-59, November 2007

How to Estimate, Take Into Account, and Improve Travel Time Reliability in Transportation Networks

Ruey L. Cheu, Vladik Kreinovich, Francois Modave, Gang Xiang, Tao Li, and Tanja Magoc

Published in: Rafi L. Muhanna and Robert L. Mullen (eds.),
*Proceedings of the International Workshop on Reliable
Engineering Computing REC'08*,
Savannah, Georgia, February 20-22, 2008, pp. 289-332.

Many urban areas suffer from traffic congestion. Intuitively, it may seem that a road expansion (e.g., the opening of a new road) should always improve the traffic conditions. However, in reality, a new road can actually worsen traffic congestion. It is therefore extremely important that before we start a road expansion project, we first predict the effect of this project on traffic congestion.

The traditional approach to this prediction is based on the assumption
that for any time of the day, we know the exact amount of traffic
that needs to go from each *o*rigin city zone A to every other
*d*estination city zone B (these values form an
*OD-matrix*), and that we know the exact capacity of each road
segment. Under this assumption, known efficient algorithms produce
the equilibrium traffic flows.

In reality, the road capacity may unpredictably change due to weather conditions, accidents, etc. Drivers take this uncertainty into account when planning their trips: e.g., if a driver does not want to be late, he or she may follow a slower route but with a guaranteed arrival time instead of an (on average) faster but unpredictable one. We must therefore take this uncertainty into account in traffic simulations; in this paper, we describe algorithms that do so.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-58, November 2007

Statistical Hypothesis Testing Under Interval Uncertainty: An Overview

Vladik Kreinovich, Hung T. Nguyen, and Sa-aat Niwitpong

Published in *International Journal of Intelligent
Technologies and Applied Statistics*, 2008, Vol. 1, No. 1,
pp. 1-32.

An important part of statistical data analysis is hypothesis testing. For example, we know the probability distribution of the characteristics corresponding to a certain disease, we have the values of the characteristics describing a patient, and we must make a conclusion whether this patient has this disease. Traditional hypothesis testing techniques are based on the assumption that we know the exact values of the characteristic(s) x describing a patient. In practice, the value X comes from measurements and is thus only known with uncertainty: X ≠ x. In many practical situations, we only know the upper bound D on the (absolute value of the) measurement error dx = X - x. In such situations, after the measurement, the only information that we have about the (unknown) value x of this characteristic is that x belongs to the interval [X - D, X + D].

In this paper, we overview different approaches to testing a hypothesis under such interval uncertainty. This overview is based on a general approach to decision making under interval uncertainty, an approach developed by the 2007 Nobelist L. Hurwicz.
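For readers unfamiliar with it, the Hurwicz criterion combines the best and worst cases of an interval-valued outcome with an "optimism" weight alpha. A minimal sketch (our illustration, with hypothetical numbers, not taken from the paper):

```python
def hurwicz_value(lo, hi, alpha):
    # Hurwicz optimism-pessimism criterion for an outcome known only to
    # lie in [lo, hi]: weight alpha on the best case, 1 - alpha on the worst
    return alpha * hi + (1 - alpha) * lo
```

With alpha = 1 this reduces to pure optimism (take the best case), with alpha = 0 to pure pessimism (take the worst case).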

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-57, November 2007

A Fitness Function to Find Feasible Sequences of Method Calls for Evolutionary Testing of Object-Oriented Programs

Myoung Yee Kim and Yoonsik Cheon

In evolutionary testing of an object-oriented program,
the search objective is to find a sequence of method calls that can
successfully produce a test object of an interesting state.
This is challenging because not all call sequences are feasible;
each call of a sequence has to meet the assumption of the called
method.
The effectiveness of evolutionary testing thus depends in part
on the quality of the so-called *fitness function*
that determines the fitness of a candidate solution.
In this paper, we propose a new fitness function based on assertions
such as method preconditions to find feasible sequences of method calls.
We show through experiments that to obtain the best search result
the fitness function should consider the structures of method call
sequences, which are essentially trees of assertions. We also
provide a framework for combining multiple fitness values and
analyzing different fitness functions.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-56, November 2007

Propagation and Provenance of Probabilistic and Interval Uncertainty in Cyberinfrastructure-Related Data Processing and Data Fusion

Paulo Pinheiro da Silva, Aaron Velasco, Martine Ceberio, Christian Servin, Matthew G. Averill, Nicholas Del Rio, Luc Longpre, and Vladik Kreinovich

Published in Rafi L. Muhanna and Robert L. Mullen (eds.),
*Proceedings of the International Workshop on Reliable
Engineering Computing REC'08*,
Savannah, Georgia, February 20-22, 2008, pp. 199-234.

In the past, communications were much slower than computations. As a result, researchers and practitioners collected different data into huge databases located at a single location such as NASA and US Geological Survey. At present, communications are so much faster that it is possible to keep different databases at different locations, and automatically select, transform, and collect relevant data when necessary. The corresponding cyberinfrastructure is actively used in many applications. It drastically enhances scientists' ability to discover, reuse and combine a large number of resources, e.g., data and services.

Because of this importance, it is desirable to be able to gauge the
uncertainty of the results obtained by using
cyberinfrastructure. This problem is made more urgent by the fact
that the level of uncertainty associated with cyberinfrastructure
resources can vary greatly -- and that scientists have much less
control over the quality of different resources than in the
centralized database. Thus, with the cyberinfrastructure promise
comes the need to analyze how data uncertainty *propagates* via
this cyberinfrastructure.

When the resulting accuracy is too low, it is desirable to produce
the *provenance* of this inaccuracy: to find out which data
points contributed most to it, and how an improved accuracy of these
data points will improve the accuracy of the result. In this paper,
we describe algorithms for propagating uncertainty and for finding
the provenance for this uncertainty.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-55, October 2007

Updated version UTEP-CS-07-55a, April 2008

Impact of Checkpoint Latency on the Optimal Checkpoint Interval and Execution Time

Sarala Arunagiri, Seetharami Seelam, Ron A. Oldfield, Maria Ruiz Varela, Patricia J. Teller, and Rolf Riesen

The massive scale of current and next-generation massively parallel processing (MPP) systems presents significant challenges related to fault tolerance. In particular, the standard approach to fault tolerance, application-directed checkpointing, puts an incredible strain on the storage system and the interconnection network. This results in overheads on the application that severely impact performance and scalability. The checkpoint overhead can be reduced by decreasing the checkpoint latency, which is the time to write a checkpoint file, or by increasing the checkpoint interval, which is the compute time between writing checkpoint files. However, increasing the checkpoint interval may increase execution time in the presence of failures. The relationship among the mean time to interruption (MTTI), the checkpoint parameters, and the expected application execution time can be explored using a model, e.g., the model developed by researchers at Los Alamos National Laboratory (LANL). Such models may be used to calculate the optimal periodic checkpoint interval. In this paper, we use the LANL model of checkpointing and, through mathematical analysis, show the impact of a change in the checkpoint latency on the optimal checkpoint interval and the overall execution time of the application.

For checkpoint latencies, d1 and d2, and the corresponding optimal checkpoint intervals, t1 and t2, our analysis shows the following results: (1) For a given MTTI, if d1 is greater than d2, t1 is greater than or equal to t2. (2) When the checkpoint interval is fixed, a decrease in checkpoint latency results in a decrease in application execution time. (3) A reduction in checkpoint latency, from d1 to d2, and a corresponding change of the checkpoint interval from the optimal checkpoint interval associated with d1, t1, to that associated with d2, t2, translates to reduced application execution time when the difference between t1 and t2 exceeds a certain threshold value, which can be as large as 12% of t_opt.

In terms of application execution times, the approximation error of the optimal checkpoint interval is not significant. However, when we consider other performance metrics of the application, such as network bandwidth consumption and I/O bandwidth consumption, we conjecture that the information obtained by the analysis presented in this report could be of value in reducing resource consumption.
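The qualitative behavior in result (1) can be illustrated with Young's classical first-order approximation of the optimal checkpoint interval, t_opt ≈ sqrt(2 * latency * MTTI). Note this is a well-known approximation used here only for illustration; it is not the LANL model analyzed in the report.

```python
import math

def young_optimal_interval(latency, mtti):
    # Young's first-order approximation of the optimal checkpoint interval,
    # for checkpoint latency `latency` and mean time to interruption `mtti`
    return math.sqrt(2.0 * latency * mtti)

# consistent with result (1): a larger checkpoint latency
# yields a larger (or equal) optimal checkpoint interval
assert young_optimal_interval(2.0, 100.0) >= young_optimal_interval(1.0, 100.0)
```

The monotonicity in latency is immediate from the square root, which is exactly the direction stated in result (1).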

Original file in pdf

Updated version in pdf

Technical Report UTEP-CS-07-54, October 2007

Estimating Quality of Support Vector Machines Learning Under Probabilistic and Interval Uncertainty: Algorithms and Computational Complexity

Canh Hao Nguyen, Tu Bao Ho, and Vladik Kreinovich

Published in: Van-Nam Huynh, Yoshiteru Nakamori, Hiroakira Ono, Jonathan Lawry,
Vladik Kreinovich, and Hung T. Nguyen (eds.), *Interval/Probabilistic
Uncertainty and Non-Classical Logics*, Springer-Verlag,
Berlin-Heidelberg-New York, 2008, pp. 57-69.

Support Vector Machines (SVM) is one of the most widely used techniques in machine learning. After the SVM algorithms process the data and produce a classification, it is desirable to learn how well this classification fits the data. There exist several measures of fit, among which the most widely used is kernel target alignment. These measures, however, assume that the data are known exactly. In reality, whether the data points come from measurements or from expert estimates, they are only known with uncertainty. As a result, even if we know that the classification perfectly fits the nominal data, this same classification can be a bad fit for the actual values (which are somewhat different from the nominal ones). In this paper, we show how to take this uncertainty into account when estimating the quality of the resulting classification.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-53, October 2007

Updated version UTEP-CS-07-53b, December 2007

Interval Computations and Interval-Related Statistical Techniques: Tools for Estimating Uncertainty of the Results of Data Processing and Indirect Measurements

Vladik Kreinovich

Published in Franco Pavese and Alistair B. Forbes (eds.), *Data Modeling for
Metrology and Testing in Measurement Science*, Birkhauser-Springer,
Boston, 2009, pp. 117-145.

In many practical situations, we only know the upper bound D on the (absolute value of the) measurement error d, i.e., we only know that the measurement error is located on the interval [-D,D]. The traditional engineering approach to such situations is to assume that d is uniformly distributed on [-D,D], and to use the corresponding statistical techniques. In some situations, however, this approach underestimates the error of indirect measurements. It is therefore desirable to directly process this interval uncertainty. Such "interval computations" methods have been developed since the 1950s. In this chapter, we provide a brief overview of related algorithms, results, and remaining open problems.

Original file in pdf and
in Compressed Postscript

Updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-52, October 2007

Updated version UTEP-CS-07-52a, February 2008

Final version UTEP-CS-07-52b, May 2008

Application-Motivated Combinations of Fuzzy, Interval, and Probability Approaches, and Their Use in Geoinformatics, Bioinformatics, and Engineering

Vladik Kreinovich

Short version published in *Proceedings of the International Conference on
Information Technology InTech'07*, Sydney, Australia, December
12-14, 2007, pp. 11-20. Full version published in *International
Journal of Automation and Control (IJAAC)*, 2008, Vol. 2, No. 2/3,
pp. 317-339.

Most data processing techniques traditionally used in scientific and engineering practice are statistical. These techniques are based on the assumption that we know the probability distributions of measurement errors, etc. In practice, we often do not know the distributions; we only know the bound D on the measurement accuracy -- hence, after we get the measurement result X, the only information that we have about the actual (unknown) value x of the measured quantity is that x belongs to the interval [X - D, X + D]. Techniques for data processing under such interval uncertainty are called interval computations; these techniques have been developed since the 1950s. Many algorithms have been designed to deal with interval uncertainty.

In many practical problems, we have a combination of different types of uncertainty, where we know the probability distribution for some quantities, intervals for other quantities, and expert information for yet other quantities. It is therefore desirable to extend interval techniques to the situations when, in addition to intervals, we also have a partial probabilistic and/or expert information. We provide an overview of related algorithms, results, and remaining open problems, and we emphasize the following three application areas: computer engineering, bioinformatics, and geoinformatics.

Original file in pdf and
in Compressed Postscript

Updated file in pdf and
in Compressed Postscript

Final version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-51, September 2007

Revised version UTEP-CS-07-51a, October 2007

How to Avoid Gerrymandering: A New Algorithmic Solution

Gregory B. Lush, Esteban Gamez, and Vladik Kreinovich

Published in *Proceedings of the International Conference on
Information Technology InTech'07*, Sydney, Australia, December
12-14, 2007, pp. 146-151.

Subdividing an area into voting districts is often a very controversial issue. If we divide purely geographically, then minority groups may not be properly represented. If we start changing the borders of the districts to accommodate different population groups, we may end up with very artificial borders -- borders which are often set up in such a way as to give an unfair advantage to incumbents. In this paper, we describe redistricting as a precise optimization problem, and we propose a new algorithm for solving this problem.

Original file in pdf and
in Compressed Postscript;

updated file in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-50, September 2007

WDO-It! A Tool for Building Scientific Workflows from Ontologies

Paulo Pinheiro da Silva, Leonardo Salayandia, and Ann Gates

One of the factors that limits scientists from fully adopting e-Science technologies and infrastructure to advance their work is the technical knowledge needed to specify and execute scientific workflows. In this paper we introduce WDO-It!, a scientist-centered tool that facilitates the scientist’s task of encoding discipline knowledge in the form of workflow-driven ontologies (WDOs) and presenting process knowledge in the form of model-based workflows (MBWs). The goal of WDO-It! is to facilitate the adoption of e-Science technologies and infrastructures by allowing scientists to encode their discipline knowledge and process knowledge with minimal assistance from technologists. MBWs have demonstrated potential to facilitate the understanding and transfer of key scientific processes for a wide range of people, including students. An experimental version of WDO-It! has been developed as a proof of concept, and scientists are using the tool to develop experimental WDOs and MBWs for several scientific disciplines.

Technical Report UTEP-CS-07-49, September 2007

Updated version UTEP-CS-07-49a, October 2007

Fast Algorithms for Computing Statistics under Interval Uncertainty: An Overview

Vladik Kreinovich and Gang Xiang

In: Van-Nam
Huynh, Yoshiteru Nakamori, Hiroakira Ono, Jonathan Lawry, Vladik
Kreinovich, and Hung T. Nguyen (eds.), *Interval/Probabilistic
Uncertainty and Non-Classical Logics*, Springer-Verlag,
Berlin-Heidelberg-New York, 2008, pp. 19-31.

In many areas of science and engineering, it is desirable to estimate statistical characteristics (mean, variance, covariance, etc.) under interval uncertainty. For example, we may want to use the measured values x(t) of a pollution level in a lake at different moments of time to estimate the average pollution level; however, we do not know the exact values x(t) -- e.g., if one of the measurement results is 0, this simply means that the actual (unknown) value of x(t) can be anywhere between 0 and the detection limit DL. We must therefore modify the existing statistical algorithms to process such interval data.

Such a modification is also necessary to process data from statistical databases, where, in order to maintain privacy, we only keep interval ranges instead of the actual numeric data (e.g., a salary range instead of the actual salary).

Most resulting computational problems are NP-hard -- which means, crudely speaking, that in general, no computationally efficient algorithm can solve all particular cases of the corresponding problem. In this paper, we overview practical situations in which computationally efficient algorithms exist: e.g., situations when measurements are very accurate, or when all the measurements are done with one (or few) instruments.
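One statistic that always admits an efficient algorithm under interval uncertainty is the sample mean: since the mean is monotone in each data point, its exact range over interval data is attained at the endpoints. A minimal sketch (our illustration, not code from the paper):

```python
def mean_range(intervals):
    # exact range of the sample mean over interval data [lo_i, hi_i]:
    # the mean is increasing in every argument, so the smallest mean uses
    # all lower endpoints and the largest uses all upper endpoints
    # (linear time in the number of data points)
    n = len(intervals)
    return (sum(lo for lo, _ in intervals) / n,
            sum(hi for _, hi in intervals) / n)
```

By contrast, the corresponding problem for the variance is NP-hard in general, which is why the special computationally tractable cases surveyed in the paper matter.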

Original file in pdf and in Compressed Postscript

updated file in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-48, September 2007

Technical Report UTEP-CS-07-48a, October 2007

Trade-Off Between Sample Size and Accuracy: Case of Dynamic Measurements under Interval Uncertainty

Hung T. Nguyen, Olga Kosheleva, Vladik Kreinovich, and Scott Ferson

In: Van-Nam Huynh, Yoshiteru
Nakamori, Hiroakira Ono, Jonathan Lawry, Vladik Kreinovich, and Hung T.
Nguyen (eds.), *Interval/Probabilistic Uncertainty and
Non-Classical Logics*, Springer-Verlag, Berlin-Heidelberg-New York,
2008, pp. 45-56.

In many practical situations, we are not satisfied with the accuracy of the existing measurements. There are two possible ways to improve the measurement accuracy:

- first, instead of a *single* measurement, we can make *repeated* measurements; the additional information coming from these additional measurements can improve the accuracy of the result of this series of measurements;
- second, we can replace the *current* measuring instrument with a *more accurate* one; correspondingly, we can use a more accurate (and more expensive) measurement procedure provided by a measuring lab -- e.g., a procedure that includes the use of a higher quality reagent.

Original file in pdf and
in Compressed Postscript

Updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-47c, June 2008

Trade-Off Between Sample Size and Accuracy: Case of Measurements under Interval Uncertainty

Hung T. Nguyen, Olga Kosheleva, Vladik Kreinovich, and Scott Ferson

Published in *International Journal of Approximate Reasoning*, 2009,
Vol. 50, No. 8, pp. 1164-1176.

In many practical situations, we are not satisfied with the accuracy of the existing measurements. There are two possible ways to improve the measurement accuracy:

- first, instead of a *single* measurement, we can make *repeated* measurements; the additional information coming from these additional measurements can improve the accuracy of the result of this series of measurements;
- second, we can replace the *current* measuring instrument with a *more accurate* one; correspondingly, we can use a more accurate (and more expensive) measurement procedure provided by a measuring lab -- e.g., a procedure that includes the use of a higher quality reagent.

Technical Report UTEP-CS-07-47, September 2007

Updated version UTEP-CS-07-47a, October 2007

Trade-Off Between Sample Size and Accuracy: Case of Static Measurements under Interval Uncertainty

Hung T. Nguyen and Vladik Kreinovich

In: Van-Nam Huynh, Yoshiteru Nakamori, Hiroakira Ono, Jonathan Lawry,
Vladik Kreinovich, and Hung T. Nguyen (eds.), *Interval/Probabilistic
Uncertainty and Non-Classical Logics*, Springer-Verlag,
Berlin-Heidelberg-New York, 2008, pp. 32-44.

In many practical situations, we are not satisfied with the accuracy of the existing measurements. There are two possible ways to improve the measurement accuracy:

- first, instead of a *single* measurement, we can make *repeated* measurements; the additional information coming from these additional measurements can improve the accuracy of the result of this series of measurements;
- second, we can replace the *current* measuring instrument with a *more accurate* one; correspondingly, we can use a more accurate (and more expensive) measurement procedure provided by a measuring lab -- e.g., a procedure that includes the use of a higher quality reagent.

Original file in pdf and
in Compressed Postscript

Updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-46, August 2007

Revised version UTEP-CS-07-46a, October 2007

Aggregation in Biological Systems: Computational Aspects

Vladik Kreinovich and Max Shpak

Published in: Tomasz G. Smolinski,
Mariofanna M. Milanova, and Aboul-Ella Hassanien (eds.),
*Applications of Computational Intelligence in Biology: Current
Trends and Open Problems*, Springer Verlag, Berlin-Heidelberg,
2008, pp. 281-305.

Many biologically relevant dynamical systems are aggregable, in the sense that one can divide their (micro) variables x1,...,xn into several (k) non-intersecting groups and find functions y1,...,yk (k < n) of these groups (macrovariables) whose dynamics depend only on the initial states of the macrovariables. For example, the state of a population genetic system can be described by listing the frequencies xi of different genotypes, so that the corresponding dynamical system describes the effects of mutation, recombination, and natural selection. The goal of aggregation approaches in population genetics is to find macrovariables y1,...,yk to which aggregated mutation, recombination, and selection functions can be applied. Population genetic models are formally equivalent to genetic algorithms, and are therefore of wide interest in the computational sciences.

Another example of a multi-variable biological system of interest arises in ecology. Ecosystems contain many interacting species, and because of the complexity of multi-variable nonlinear systems, it would be of value to derive a formal description that reduces the number of variables to some macrostates that are weighted sums of the densities of individual species.

In this chapter, we explore different computational aspects of aggregability for linear and non-linear systems. Specifically, we investigate the problem of conditional aggregability (i.e., aggregability restricted to modular states) and aggregation of variables in biologically relevant quadratic dynamical systems.

Original file in pdf and
in Compressed Postscript

updated file in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-45, July 2007

Traffic Assignment for Risk Averse Drivers in a Stochastic Network

Ruey Long Cheu, Vladik Kreinovich, and Srinivasa R. Manduva

Published in the *Proceedings of the Annual Meeting of the
Transportation Research
Board of the National Academies TRB'07*, Washington, DC,
January 13-17, 2008.

Most traffic assignment tasks in practice are performed using deterministic network (DN) models, which assume that the link travel time is uniquely determined by a link performance function. In reality, link travel time, at a given link volume, is a random variable. Such stochastic network (SN) models are not widely used because the traffic assignment algorithms are much more computationally complex and difficult for practitioners to understand. In this paper, we derive an equivalent link disutility (ELD) function for the case of risk-averse drivers in a SN, without assuming any distribution of link travel time. We further derive a simpler form of the ELD function in a SN which can easily be implemented in deterministic user equilibrium traffic assignment algorithms, as in a DN. By comparing our two derived ELD functions, we obtain a bound on the coefficient of the simpler ELD function such that drivers will make the same risk-averse route choice decisions. A method to estimate the coefficient of the simpler ELD function is proposed and demonstrated with questionnaire survey data gathered in El Paso, Texas. The results of user equilibrium traffic assignments in a test network using the Bureau of Public Roads (BPR) function and the simpler ELD function are then compared. Our simpler ELD function provides a means for practitioners to use deterministic user equilibrium traffic assignment algorithms to solve the traffic assignment problem in a SN for risk-averse drivers during the peak-hour commute.

Technical Report UTEP-CS-07-44, July 2007

The Gravity Data Ontology: Laying the Foundation for Workflow-Driven Ontologies

Ann Q. Gates, G. Randy Keller, Flor Salcedo, Paulo Pinheiro da Silva, and Leonardo Salayandia

A workflow-driven ontology is an ontology that encodes discipline-specific knowledge in the form of concepts and relationships and that facilitates the composition of services to create products and derive data. Early work on the development of such an ontology resulted in the construction of a gravity data ontology and the categorization of concepts: "Data," "Method," and "Product." "Data" is further categorized as "Raw Data" and "Derived Data," e.g., reduced data. The relationships that are defined capture inputs to and outputs from methods, e.g., derived data and products are output from methods, as well as other associations related to workflow computation. This paper describes the construction of a workflow-driven ontology that documents the methods and processes associated with gravity data and related products. In addition, the paper describes progress on the process of creating workflow-driven ontologies, such that scientists are supported in creating and validating such ontologies, while still enabling the automatic generation of executable workflow applications.

Technical Report UTEP-CS-07-43, July 2007

Updated version UTEP-CS-07-43a, January 2008

Identifying and Explaining Map Quality Through Provenance: A User Study

Nicholas Del Rio and Paulo Pinheiro da Silva

Applications deployed on cyber-infrastructures often rely on multiple data sources and distributed compute resources to access, process, and derive results. When application results are maps, unintentional imperfections can be introduced into the map generation process for several reasons, including the use of low-quality datasets, the use of data filtering techniques incompatible with the kind of map to be generated, or even the use of inappropriate mapping parameters, e.g., low-resolution gridding parameters. Without some means of accessing and visualizing the provenance associated with map generation processes, i.e., metadata about the information sources and methods used to derive the map, it may be impossible for most scientists to discern whether or not a map is of the required quality. Probe-It! is a tool that provides provenance visualization for results from cyber-infrastructure-based applications, including maps. In this paper, we describe a quantitative user study on how Probe-It! can help scientists discriminate between high- and low-quality contour maps. Twenty active scientists from five domains, with different levels of expertise with regard to gravity data and GIS, participated in the study. The study demonstrates that only a very small percentage of the scientists can identify imperfections using maps without the help of knowledge provenance. It also demonstrates that most scientists, whether or not they are GIS experts or subject matter experts (i.e., experts on gravity data maps), can identify and explain several kinds of map imperfections when using provenance to inspect maps.

Original file in pdf

updated version in pdf

Technical Report UTEP-CS-07-42, June 2007

Updated version UTEP-CS-07-42b, August 2008

Towards a Combination of Interval and Ellipsoid Uncertainty

Vladik Kreinovich, Arnold Neumaier, and Gang Xiang

Published in: *Computational Technologies*, 2008, Vol. 13, No. 6, pp.
5-16.

In many real-life situations, we do not know the probability distribution of measurement errors but only upper bounds on these errors. In such situations, once we know the measurement results, we can only conclude that the actual (unknown) value of a quantity belongs to some interval. Based on this interval uncertainty, we want to find the range of possible values of a desired function of the uncertain quantities. In general, computing this range is an NP-hard problem, but in a linear approximation, valid for small uncertainties, there is a linear time algorithm for computing the range. In other situations, we know an ellipsoid that contains the actual values. In this case, we also have a linear time algorithm for computing the range of a linear function.

In some cases, however, we have a combination of interval and ellipsoid uncertainty. In this case, the actual values belong to the intersection of a box and an ellipsoid. In general, estimating the range over the intersection enables us to get a narrower range for the function. In this paper, we provide two algorithms for estimating the range of a linear function over an intersection in linear time: a simpler O(n log(n)) algorithm and a (somewhat more complex) linear time algorithm. Both algorithms can be extended to the l^p-case, when instead of an ellipsoid we have a set defined by a p-norm.
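For the box (interval) part alone, the linear-time range computation is elementary: each coefficient of the linear function independently picks one endpoint of its interval. A sketch of that base case (our illustration; the paper's actual contribution, ranges over the box-ellipsoid intersection, is not reproduced here):

```python
def linear_range_over_box(c, box):
    """Exact range of f(x) = sum_i c_i * x_i over the box
    [a_1, b_1] x ... x [a_n, b_n]: each term is minimized and maximized
    independently -- a positive coefficient takes the lower endpoint for
    the minimum and the upper endpoint for the maximum, and vice versa."""
    lo = sum(ci * (a if ci >= 0 else b) for ci, (a, b) in zip(c, box))
    hi = sum(ci * (b if ci >= 0 else a) for ci, (a, b) in zip(c, box))
    return (lo, hi)
```

Intersecting the box with an ellipsoid couples the coordinates, so this term-by-term argument no longer applies directly; that is what the paper's O(n log n) and linear-time algorithms address.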

Updated version in pdf and in
Compressed Postscript

Technical Report UTEP-CS-07-41, June 2007

Abstraction in Assertion-Based Test Oracles

Yoonsik Cheon

Assertions can be used as test oracles. However, writing effective assertions at the right abstraction level is difficult: on the one hand, detailed assertions are preferred for thorough testing (i.e., to detect as many errors as possible), but on the other hand, abstract assertions are preferred for readability, maintainability, and reusability. As assertions become a practical tool for testing and debugging programs, this is an important and practical problem to solve for the effective use of assertions. We advocate the use of model variables---specification-only variables whose abstract values are given as mappings from concrete program states---to write abstract assertions for test oracles. We performed a mutation testing experiment to evaluate the effectiveness of model variables in assertion-based test oracles. According to our experiment, assertions written in terms of model variables are as effective as assertions written without model variables in detecting (injected) faults, and the execution time overhead of model variables is negligible. Our findings are applicable to other uses of runtime checkable assertions.

File in PDF and in Compressed Postscript

Technical Report UTEP-CS-07-40, June 2007

A Quick Tutorial on JET

Yoonsik Cheon

JET is an automated unit testing tool for Java classes annotated with JML specifications; JML is a formal interface specification language for Java to document the behavior of Java classes and interfaces. JET tests each method of the class under test separately. For each method, it generates a collection of test data, executes them, and decides test results (i.e., pass/fail) by using JML specifications as test oracles, thereby fully automating unit testing of Java classes. This document gives a quick tutorial introduction to JET.
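The generate-execute-judge loop that JET automates can be sketched as follows (a minimal Python illustration of the spec-as-oracle idea; the function names are hypothetical and do not reflect JET's actual API or JML syntax):

```python
import random

# Sketch of automated unit testing with a specification as the oracle:
# generate test data, run the unit under test, and decide pass/fail by
# checking the specified postcondition.

def abs_value(x):                      # unit under test
    return x if x >= 0 else -x

def postcondition(x, result):          # specification used as test oracle
    return result >= 0 and (result == x or result == -x)

def auto_test(unit, oracle, trials=100):
    random.seed(0)                     # reproducible test data
    for _ in range(trials):
        x = random.randint(-1000, 1000)   # generated test datum
        if not oracle(x, unit(x)):
            return f"FAIL for input {x}"
    return "PASS"
```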

File in PDF and in Compressed Postscript

Technical Report UTEP-CS-07-39, June 2007

Updated version UTEP-CS-07-39a, August 2007

Towards a More Physically Adequate Definition of Randomness: A Topological Approach

Vladik Kreinovich

Published in *Journal of Uncertain Systems*, 2007, Vol. 1, No. 4,
pp. 256-266.

The Kolmogorov-Martin-Lof definition describes a random sequence as a sequence which satisfies all the laws of probability. This notion formalizes the intuitive physical idea that if an event has probability 0, then this event cannot occur. Physicists, however, also believe that if an event has a very small probability, then it cannot occur. In our previous papers, we proposed a modification of the Kolmogorov-Martin-Lof definition which formalizes this idea as well. It turns out that our original definition is too general: e.g., it includes some clearly non-physical situations in which the set of all random elements is a one-point set. In this paper, we propose a new definition which avoids such situations and is, thus, a more physically adequate description of randomness.

Original file in pdf and
in Compressed Postscript

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-38, June 2007

Any (True) Statement Can Be Generalized So That It Becomes Trivial: A Simple Formalization of D. K. Faddeev's Belief

Vladik Kreinovich

Published in *Applied Mathematical Sciences*, 2009, Vol. 3, No. 47, pp. 2343-2347.

In his unpublished lectures on general algebra, the well-known algebraist D. K. Faddeev expressed a belief that every true mathematical statement can be generalized in such a way that it becomes trivial. To the best of our knowledge, this belief has never been formalized before. In this short paper, we provide a simple formalization (and proof) of this belief.

Technical Report UTEP-CS-07-37, June 2007

Computing at Least One of Two Roots of a Polynomial Is, in General, Not Algorithmic

Vladik Kreinovich

In our previous work, we provided a theoretical explanation for the empirical fact that it is easier to find a unique root than multiple roots. In this short note, we strengthen that explanation by showing that computing even one of many roots is, in general, not algorithmic.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-36, June 2007

Evaluation Of HF RFID for Implanted Medical Applications

Eric Freudenthal, David Herrera, Frederick Kautz, Carlos Natividad, Alexandria Ogrey, Justin Sipla, Abimael Sosa, Carlos Betancourt, and Leonardo Estevez

Low-cost HF RFID scanner subsystems that both deliver power and provide high-bandwidth bidirectional communication channels have recently become available. These devices are anticipated to become ubiquitous in next-generation cell phones and to enable a wide range of emerging e-commerce applications.

This paper considers the use of HF RFID to power and communicate with implantable medical devices. We successfully communicated with ten transponders that were implanted at three locations within a human cadaver. In this paper, we present measurements collected from four of these transponders that represent a wide range of transponder sizes. We also describe how RFID for medical implantation requires significantly different privacy protections than previously investigated medical uses of RFID.

Our experiments, which measure communication range, detect a strong sensitivity to the thickness of the insulator separating a transponder's antenna from nearby tissue. This thickness can be tuned to achieve communication range similar to non-implanted transponders. This sensitivity can potentially be exploited to construct specialized implantable pressure sensors useful for a variety of applications.

Technical Report UTEP-CS-07-35, June 2007

From (Idealized) Exact Causality-Preserving Transformations to Practically Useful Approximately-Preserving Ones: A General Approach

Vladik Kreinovich and Olga Kosheleva

Published in *International Journal of Theoretical Physics*, 2008, Vol. 47,
No. 4, pp. 1083-1091.

It is known that every causality-preserving transformation of
Minkowski space-time is a composition of
Lorentz transformations, shifts, rotations, and dilations. In
principle, this result means that by only knowing the causality relation,
we can determine the coordinate and metric
structure on the space-time. However,
strictly speaking, the theorem only says that this reconstruction is
possible if we know the *exact* causality relation. In practice,
measurements are never 100% accurate. It is therefore desirable to
prove that if a transformation *approximately* preserves
causality, then it is approximately equal to an above-described
composition.

Such a result was indeed proven, but only for a very particular case of approximate preservation.

In this paper, we prove that simple compactness-related ideas can lead to a transformation of the exact causality-preserving result into an approximately-preserving one.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-34, May 2007

Revised version UTEP-CS-07-34a, December 2007

Computational Complexity of Determining Which Statements about Causality Hold in Different Space-Time Models

Vladik Kreinovich and Olga Kosheleva

Published in *Theoretical Computer Science*, 2008, Vol. 405, No. 1-2,
pp. 50-63.

Causality is one of the most fundamental notions of physics. It is therefore important to be able to decide which statements about causality are correct in different models of space-time. In this paper, we analyze the computational complexity of the corresponding decision problems. In particular, we show that for Minkowski space-time, the decision problem is as difficult as Tarski's decision problem for elementary geometry, while for a natural model of primordial space-time, the corresponding decision problem is of the lowest possible complexity among all possible space-time models.

Original file in pdf and
in Compressed Postscript;

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-33, May 2007

When Are Two Wave Functions Distinguishable: A New Answer to Pauli's Question, with Potential Applications to Quantum Cosmology

Luc Longpre and Vladik Kreinovich

Published in *International Journal of Theoretical Physics*, 2008, Vol. 47,
No. 3, pp. 814-831.

Traditional quantum mechanics (QM) predicts probabilities of different events. If we describe an elementary particle, then, experimentally, these probabilities mean that if we repeat the same measurement procedure with multiple particles in the same state, the resulting sequence of measurement results will be random w.r.t. the corresponding probability measure. In quantum cosmology, QM is used to describe the world as a whole; we have only one copy of the world, so multiple measurements are impossible. How to interpret these probabilities?

In this paper, we use the approach of algorithmic information theory to come up with a reasonable interpretation. This interpretation is in good accordance with the arguments presented by several physicists (such as D. Finkelstein) that a wave function is not always a physically reasonable description of a quantum state.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-32, May 2007

Interval Computations as an Important Part of Granular Computing: An Introduction

Vladik Kreinovich

Published in: W. Pedrycz, A. Skowron, and V. Kreinovich (eds),
*Handbook on Granular Computing*, Wiley, Chichester, UK,
2008, pp. 3-31.

This chapter provides a general introduction to interval computations, especially to interval computations as an important part of granular computing. This introduction is aimed at researchers who would like to learn more about - and eventually use - the main ideas and techniques of granular computing.

We explain how intervals naturally appear in data processing, which techniques exist for processing intervals, and how these techniques have been historically developed.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-31, May 2007

Updated version UTEP-CS-07-31a, August 2007

Towards Efficient Prediction of Decisions under Interval Uncertainty

Van Nam Huynh, Vladik Kreinovich, Yoshiteru Nakamori, and Hung T. Nguyen

Published in: Roman Wyrzykowski, Jack Dongarra,
Konrad Karczewski, and Jerzy Wasniewski (eds.), *Parallel Processing
and Applied Mathematics*, Proceedings of the Seventh International
Conference on Parallel Processing and Applied Mathematics
PPAM'2007, Gdansk, Poland, September 9-12, 2007, Springer Lecture
Notes in Computer Science, 2008, Vol. 4967, pp. 1372-1381.

In many practical situations, users select between n alternatives a1, ..., an, and the only information that we have about the utilities vi of these alternatives is the bounds vi- <= vi <= vi+. In such situations, it is reasonable to assume that the values vi are independent and uniformly distributed on the corresponding intervals [vi-,vi+]. Under this assumption, we would like to estimate, for each i, the probability pi that the alternative ai will be selected. In this paper, we provide efficient algorithms for computing these probabilities.
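The probabilities pi being computed can be illustrated by a brute-force Monte Carlo estimate (the paper's contribution is efficient exact algorithms; this sketch only shows what is being estimated): draw each utility uniformly from its interval and count how often each alternative has the largest value.

```python
import random

# Monte Carlo illustration of the selection probabilities p_i:
# each v_i is drawn uniformly from its interval [v_i-, v_i+], and
# p_i is the fraction of trials in which alternative i wins.

def selection_probabilities(intervals, trials=20000):
    random.seed(0)                      # reproducible estimate
    wins = [0] * len(intervals)
    for _ in range(trials):
        draws = [random.uniform(lo, hi) for lo, hi in intervals]
        wins[draws.index(max(draws))] += 1
    return [w / trials for w in wins]

# Two identical intervals: by symmetry, each alternative wins
# about half of the time.
p = selection_probabilities([(0.0, 1.0), (0.0, 1.0)])
```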

Original file in pdf and
in Compressed Postscript;

updated file in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-30, May 2007

Updated version UTEP-CS-07-30a, September 2007

In Some Curved Spaces, One Can Solve NP-Hard Problems in Polynomial Time

Vladik Kreinovich and Maurice Margenstern

Published in *Notes of Mathematical Seminars of St. Petersburg Department of
Steklov Institute of Mathematics*, 2008, Vol. 358, pp. 224-250; reprinted in
*Journal of Mathematical Sciences*, 2009, Vol. 158, No. 5, pp. 727-740.

In the late 1970s and the early 1980s, Yuri Matiyasevich actively used his knowledge of engineering and physical phenomena to come up with parallelized schemes for solving NP-hard problems in polynomial time. In this paper, we describe one such scheme in which we use parallel computation in curved spaces.

Original file in pdf and
in Compressed Postscript;

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-29, May 2007

Updated version UTEP-CS-07-29b, July 2007

Final version UTEP-CS-07-29c, August 2007

Static Space-Times Naturally Lead to Quasi-Pseudometrics

Hans-Peter A. Kuenzi and Vladik Kreinovich

Published in *Theoretical Computer Science*, 2008, Vol. 405, No. 1-2,
pp. 64-72.

The standard 4-dimensional Minkowski space-time of special relativity is based on the 3-dimensional Euclidean metric. In 1967, H. Busemann showed that similar static space-time models can be based on an arbitrary metric space. In this paper, we search for the broadest possible generalization of a metric under which a construction of a static space-time leads to a physically reasonable space-time model. It turns out that this broadest possible generalization is related to the known notion of a quasi-pseudometric.
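For readers unfamiliar with the notion: a quasi-pseudometric keeps the triangle inequality but drops both symmetry and the requirement that distinct points be at positive distance:

```latex
% Quasi-pseudometric q on a set X:
q : X \times X \to [0, \infty), \qquad
q(x,x) = 0, \qquad
q(x,z) \le q(x,y) + q(y,z).
% Unlike for a metric, symmetry q(x,y) = q(y,x) is not required,
% and q(x,y) = 0 is allowed for x \ne y.
```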

Original file in pdf and
in Compressed Postscript

Updated version in pdf and
in Compressed Postscript

Final version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-28, May 2007

On Probability of Making a Given Decision: A Theoretically Justified Transition from Interval to Fuzzy Uncertainty

Van Nam Huynh, Yoshiteru Nakamori, and Francois Modave

Published in: Marek
Reformat and Michael R. Berthold (eds.),*Proceedings of the 26th
International Conference of the North American Fuzzy Information Processing Society
NAFIPS'2007*, San Diego, California, June 24-27, 2007,
pp. 532-536.

In practice, it is often necessary to make a decision under uncertainty.

In the case of interval uncertainty, for each alternative i, instead of the exact value vi of the objective function, we only have an interval of possible values. In this case, it is reasonable to assume that each value vi is uniformly distributed on the corresponding interval, and to take the probability that vi is the largest as the probability of selecting the i-th alternative.

In some practical situations, we have fuzzy uncertainty, i.e., for every alternative i, we have a fuzzy number describing the value of the objective function. Then, for every degree alpha, we have an interval, the alpha-cut of the corresponding fuzzy number. For each alpha, we can assume the uniform distributions on the corresponding alpha-cuts and get a probability Pi(alpha) that i will be selected for this alpha. From the practical viewpoint, it is desirable to combine these probabilities into a single probability corresponding to fuzzy uncertainty.

In deriving the appropriate combination, we use the fact that fuzzy values are not uniquely defined: different procedures can lead to differently scaled values. It turns out that the only scaling-invariant distribution on the set of all the degrees alpha is the uniform distribution. So, we justify the choice of the integral of Pi(alpha) over alpha as the probability that, under fuzzy uncertainty, alternative i will be selected.
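The combination rule described above can be sketched numerically (an illustration only, using triangular fuzzy numbers (lo, peak, hi) for concreteness and Monte Carlo in place of the exact interval computation):

```python
import random

# Sketch of the combination rule: for each alpha, take the alpha-cuts
# of the fuzzy utilities, estimate P_i(alpha) assuming uniform
# distributions on those cuts, and average P_i(alpha) over alpha
# (uniformly distributed on [0,1]).

def alpha_cut(tri, alpha):
    """Alpha-cut of a triangular fuzzy number (lo, peak, hi)."""
    lo, peak, hi = tri
    return (lo + alpha * (peak - lo), hi - alpha * (hi - peak))

def prob_first_wins(cuts, trials=2000):
    wins = 0
    for _ in range(trials):
        draws = [random.uniform(lo, hi) for lo, hi in cuts]
        if draws.index(max(draws)) == 0:
            wins += 1
    return wins / trials

def fuzzy_selection_prob(fuzzy_numbers, alpha_steps=20):
    """Approximate integral of P_1(alpha) over alpha in [0,1]."""
    random.seed(0)
    alphas = [(k + 0.5) / alpha_steps for k in range(alpha_steps)]
    return sum(prob_first_wins([alpha_cut(f, a) for f in fuzzy_numbers])
               for a in alphas) / alpha_steps

# Two identical fuzzy utilities: by symmetry, the result is about 0.5.
r = fuzzy_selection_prob([(0, 1, 2), (0, 1, 2)])
```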

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-27, May 2007

Towards Optimal Scheduling for Global Computing under Probabilistic, Interval, and Fuzzy Uncertainty, with Potential Applications to Bioinformatics

Roberto Araiza, Michela Taufer, and Ming-Ying Leung

Published in: Marek
Reformat and Michael R. Berthold (eds.), *Proceedings of the 26th
International Conference of the North American Fuzzy Information
Processing Society
NAFIPS'2007*, San Diego, California, June 24-27, 2007,
pp. 520-525.

In many practical situations, in particular in many bioinformatics problems, the amount of required computation is so huge that the only way to perform these computations in reasonable time is to distribute them between multiple processors. The more processors we engage, the faster the resulting computations; thus, in addition to processors exclusively dedicated to this job, systems often use idle time on other processors. The use of these otherwise engaged processors introduces additional uncertainty into the computations.

How should we schedule the computational tasks so as to achieve the best utilization of the computational resources? Because of the presence of uncertainty, this scheduling problem is very difficult not only to solve but even to formalize (i.e., to describe in precise terms). In this paper, we provide the first steps towards formalizing and solving this scheduling problem.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-26, May 2007

Computing Statistical Characteristics When We Know Probabilities with Interval or Fuzzy Uncertainty: Computational Complexity

Gang Xiang and Jim W. Hall

Published in: Marek
Reformat and Michael R. Berthold (eds.), *Proceedings of the 26th
International Conference of the North American Fuzzy Information
Processing Society
NAFIPS'2007*, San Diego, California, June 24-27, 2007,
pp. 576-581.

In traditional statistics, we usually assume that we know the exact probability distributions. In practice, we often only know the probabilities with interval uncertainty.

The main emphasis in taking this uncertainty into account has been on situations in which we know the cumulative distribution function (cdf) with interval uncertainty. However, in some cases, we instead know the probability density function (pdf) with interval uncertainty. We show that in such situations, the exact range of some statistical characteristics can still be efficiently computed. Surprisingly, for some other characteristics, statistical problems which are efficiently solvable for an interval-valued cdf become computationally difficult (NP-hard) for an interval-valued pdf.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-25, May 2007

Detecting Duplicates in Geoinformatics: From Intervals and Fuzzy Numbers to General Multi-D Uncertainty

Scott A. Starks, Luc Longpre, Roberto Araiza, Vladik Kreinovich, and Hung T. Nguyen

Published in: Marek Reformat and Michael R. Berthold (eds.),
*Proceedings of the 26th International Conference of the North American
Fuzzy Information Processing Society NAFIPS'2007*, San Diego,
California, June 24-27, 2007.

Geospatial databases generally consist of measurements related to points (or pixels in the case of raster data), lines, and polygons. In recent years, the size and complexity of these databases have increased significantly and they often contain duplicate records, i.e., two or more close records representing the same measurement result. In this paper, we address the problem of detecting duplicates in a database consisting of point measurements. As a test case, we use a database of measurements of anomalies in the Earth's gravity field that we have compiled.

In our previous papers, we proposed a new fast (O(n log(n))) duplicate deletion algorithm for the case when closeness of two points (x1,y1) and (x2,y2) is described as closeness of both coordinates. In this paper, we extend this algorithm to the case when closeness is described by an arbitrary metric.

Both algorithms have been successfully applied to gravity databases.
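The coordinate-wise closeness case can be illustrated with a simple grid-based detector (a sketch of the general idea, not the paper's algorithm): bucket the points into eps-sized cells so that only points in neighboring cells ever need to be compared.

```python
from collections import defaultdict

# Sketch: flag pairs of points whose x- and y-coordinates both differ
# by less than eps. Bucketing into an eps-grid means each point is
# compared only against points in its own and neighboring cells.

def find_duplicates(points, eps):
    grid = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        grid[(int(x // eps), int(y // eps))].append(idx)
    pairs = set()
    for idx, (x, y) in enumerate(points):
        cx, cy = int(x // eps), int(y // eps)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for jdx in grid.get((cx + dx, cy + dy), []):
                    if idx < jdx:      # report each pair once
                        x2, y2 = points[jdx]
                        if abs(x - x2) < eps and abs(y - y2) < eps:
                            pairs.add((idx, jdx))
    return sorted(pairs)
```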

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-24, April 2007

Von Mises Failure Criterion in Mechanics of Materials: How to Efficiently Use It Under Interval and Fuzzy Uncertainty

Gang Xiang, Andrzej Pownuk, Olga Kosheleva, and Scott A. Starks

Published in: Marek Reformat and Michael R. Berthold (eds.),
*Proceedings of the 26th International Conference of the North American
Fuzzy Information Processing Society NAFIPS'2007*, San Diego,
California, June 24-27, 2007.

One of the main objectives of mechanics of materials is to predict when a material experiences fracture (fails), and to prevent this failure. With this objective in mind, it is desirable to use *ductile* materials, i.e., materials which can sustain large deformations without failure. The von Mises criterion enables us to predict the failure of such ductile materials. To apply this criterion, we need to know the exact stresses applied in different directions. In practice, we only know these stresses with interval or fuzzy uncertainty. In this paper, we describe how we can apply this criterion under such uncertainty, and how to make this application computationally efficient.
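In terms of principal stresses s1, s2, s3, the criterion compares the von Mises equivalent stress to the yield strength; the exact-stress case is a one-liner (the interval/fuzzy case, where this check must hold over whole ranges of stresses, is what the paper addresses):

```python
# Von Mises criterion for exactly known principal stresses:
# failure is predicted when the equivalent stress exceeds the
# yield strength. Under interval uncertainty, s1, s2, s3 become
# ranges and the check must hold over all combinations.

def von_mises_stress(s1, s2, s3):
    return (0.5 * ((s1 - s2) ** 2
                   + (s2 - s3) ** 2
                   + (s3 - s1) ** 2)) ** 0.5

def fails(s1, s2, s3, yield_strength):
    return von_mises_stress(s1, s2, s3) > yield_strength
```

For uniaxial stress (s2 = s3 = 0), the equivalent stress reduces to s1, as expected.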

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-23, April 2007

Under Interval and Fuzzy Uncertainty, Symmetric Markov Chains are More Difficult to Predict

Roberto Araiza, Gang Xiang, Olga Kosheleva, and Damjan Skulj

Published in: Marek Reformat and Michael R. Berthold (eds.),
*Proceedings of the 26th International Conference of the North American
Fuzzy Information Processing Society NAFIPS'2007*, San Diego,
California, June 24-27, 2007.

Markov chains are an important tool for solving practical problems. In particular, Markov chains have been successfully applied in bioinformatics. Traditional statistical tools for processing Markov chains assume that we know the exact probabilities p(i,j) of a transition from the state i to the state j. In reality, we often only know these transition probabilities with interval (or fuzzy) uncertainty. We start the paper with a brief reminder of how the Markov chain formulas can be extended to the cases of such interval and fuzzy uncertainty.

In some practical situations, there is another restriction on the Markov chain--that this Markov chain is symmetric in the sense that for every two states i and j, the probability of transitioning from i to j is the same as the probability of transitioning from j to i: p(i,j)=p(j,i). In general, symmetry assumptions simplify computations. In this paper, we show that for Markov chains under interval and fuzzy uncertainty, symmetry has the opposite effect: it makes the computational problems more difficult.
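The standard update that the interval and fuzzy extensions generalize is the usual distribution step (shown here for a symmetric two-state chain; under interval uncertainty, each entry of the updated distribution becomes a range rather than a number):

```python
# Standard Markov chain update: the state distribution evolves as
# dist' = dist * P, where P[i][j] is the probability of a transition
# from state i to state j.

def step(dist, P):
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]

# A symmetric two-state chain: p(0,1) = p(1,0) = 0.3.
P = [[0.7, 0.3],
     [0.3, 0.7]]
d = step([1.0, 0.0], P)
```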

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-22, April 2007

Throttling I/O Streams to Accelerate File-I/O Performance

Seetharami Seelam, Andre Kerstens, and Patricia Teller

To increase the scale and performance of scientific applications, scientists commonly distribute computation over multiple processors. Often without realizing it, file I/O is parallelized with the computation. An implication of this I/O parallelization is that multiple compute tasks are likely to concurrently access the I/O nodes of an HPC system. When a large number of I/O streams concurrently access an I/O node, I/O performance tends to degrade. In turn, this impacts application execution time.

This paper presents experimental results that show that controlling the number of synchronous file-I/O streams that concurrently access an I/O node can enhance performance. We call this mechanism file-I/O stream throttling. The paper (1) describes this mechanism and demonstrates how it can be applied either at the application or system software layers, and (2) presents results of experiments driven by the cosmology application benchmark MADbench, executed on a variety of computing systems, that demonstrate the effectiveness of file-I/O stream throttling. The results imply that dynamic selection of the number of synchronous file-I/O streams that are allowed to access an I/O node can result in improved application performance. Note that the I/O pattern of MADbench resembles that of a large class of HPC applications.
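At the application layer, stream throttling can be as simple as a counting semaphore that caps concurrent I/O (a hypothetical sketch of the mechanism's structure, not the paper's implementation; in practice the cap would be chosen, ideally dynamically, to match what the I/O node sustains without degradation):

```python
import threading

# Sketch of file-I/O stream throttling: a semaphore limits how many
# compute tasks may issue I/O to a node at once; surplus tasks block
# until a slot frees up.

MAX_CONCURRENT_STREAMS = 4
io_slots = threading.Semaphore(MAX_CONCURRENT_STREAMS)
results = []

def write_checkpoint(task_id):
    with io_slots:                    # at most 4 streams active at a time
        results.append(task_id)       # stand-in for the actual file I/O

threads = [threading.Thread(target=write_checkpoint, args=(t,))
           for t in range(16)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```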

Technical Report UTEP-CS-07-21, April 2007

Fitting a Normal Distribution to Interval and Fuzzy Data

Gang Xiang, Vladik Kreinovich, and Scott Ferson

Published in: Marek Reformat and Michael R. Berthold (eds.),
*Proceedings of the 26th International Conference of the North American
Fuzzy Information Processing Society NAFIPS'2007*, San Diego,
California, June 24-27, 2007.

In traditional statistical analysis, if we know that the distribution is normal, then the most popular way to estimate its mean a and standard deviation s from the data sample x1,...,xn is to equate a and s to the arithmetic mean and sample standard deviation of this sample. Once a and s are estimated in this way, we get the cumulative distribution function F(x)=F0((x-a)/s) of the desired distribution.

In many practical situations, we only know intervals [xi] that contain the actual (unknown) values of xi or, more generally, a fuzzy number that describes xi. Different values of xi lead, in general, to different values of F(x). In this paper, we show how to compute, for every x, the resulting interval [F(x)] of possible values of F(x) - or the corresponding fuzzy numbers.
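For small samples, the quantity in question can be approximated by brute force (an illustration only: this scans the 2^n corners of the data box, and since the extremum of F(x) can in general lie strictly inside the box, it is a heuristic, which is precisely why the paper's algorithms are needed):

```python
import math
from itertools import product

# Brute-force sketch: for interval data [x_i], approximate the range of
# the fitted normal cdf F(x) = F0((x-a)/s) by evaluating F at every
# combination of interval endpoints. NOTE: the true extremum may lie
# inside the box, so this is only an illustration of the problem.

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def fitted_cdf_at(sample, x):
    n = len(sample)
    a = sum(sample) / n                                   # sample mean
    s = (sum((v - a) ** 2 for v in sample) / (n - 1)) ** 0.5  # sample std
    return normal_cdf((x - a) / s)

def cdf_range(intervals, x):
    values = [fitted_cdf_at(list(corner), x)
              for corner in product(*intervals)]
    return (min(values), max(values))

lo, hi = cdf_range([(0, 1), (2, 3), (4, 5)], 2.5)
```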

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-20, April 2007

Towards a General Description of Interval Multiplications: Algebraic Analysis and its Relation to T-Norms

Olga Kosheleva, Guenter Mayer, and Vladik Kreinovich

Published in: Marek Reformat and Michael R. Berthold (eds.),
*Proceedings of the 26th International Conference of the North American
Fuzzy Information Processing Society NAFIPS'2007*, San Diego,
California, June 24-27, 2007.

It is well known that interval computations are very important, both by themselves (as a method for processing data known with interval uncertainty) and as a way to process fuzzy data. In general, the problem of computing the range of a given function under interval uncertainty is computationally difficult (NP-hard). As a result, there exist different methods for estimating such a range: some methods require a longer computation time and lead to more accurate results, while other methods lead to somewhat less accurate results but are much faster. In particular, different methods exist for interval multiplication, i.e., for computing the range of a product of two numbers known with interval uncertainty. To select the method which is the best in a given situation, it is desirable to be able to describe all possible methods. In this paper, we provide a description of all possible operations for interval multiplication; this description is based on the same ideas as a known description of t-norms in fuzzy logic.
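The most accurate of the interval multiplications being classified is the standard one, whose exact range is attained at endpoint products:

```python
# Standard (exact-range) interval multiplication: the range of x*y
# over x in [a,b], y in [c,d] is attained at one of the four
# endpoint products.

def interval_mul(x, y):
    (a, b), (c, d) = x, y
    products = (a * c, a * d, b * c, b * d)
    return (min(products), max(products))
```

Faster but less accurate alternatives (the kind the paper classifies) would return an enclosure of this exact range.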

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-19, April 2007

Set-Valued Extensions of Fuzzy Logic: Classification Theorems

Gilbert Ornelas and Vladik Kreinovich

Published in: Marek Reformat and Michael R. Berthold (eds.),
*Proceedings of the 26th International Conference of the North American
Fuzzy Information Processing Society NAFIPS'2007*, San Diego,
California, June 24-27, 2007.

Experts are often not 100% confident in their
statements. In traditional fuzzy logic, the expert's degree of
confidence in each of his or her statements is described by a number
from the interval [0,1]. However, due to similar uncertainty, an
expert often cannot describe his or her degree by a *single*
number. It is therefore reasonable to describe this degree by, e.g.,
a *set* of numbers. In this paper, we show that under reasonable
conditions, the class of such sets coincides either with the class
of all 1-point sets (i.e., with the traditional fuzzy set set of all
numbers), or with the class of all subintervals of the interval
[0,1], or with the class of all closed subsets of the interval
[0,1]. Thus, if we want to go beyond standard fuzzy logic and
still avoid sets of arbitrary complexity, we have to use intervals.
These classification results shows the importance of interval-valued
fuzzy logics.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-18, April 2007

Architectural Assertions: Checking Architectural Constraints at Run-Time

Hyotaeg Jung, Carlos E. Rubio-Medrano, Eric Wong, and Yoonsik Cheon

The inability to express architectural concepts and constraints explicitly in implementation code invites the problem of architectural drift and corrosion. We propose runtime checks as a solution to mitigate this problem. The key idea of our approach is to express architectural constraints or properties in an assertion language and use the runtime assertion checker of the assertion language to detect any violations of the constraints. The architectural assertions are written in terms of architectural concepts such as components, connectors, and configurations, and thus they can be easily mapped to or traced back to the original high-level constraints written in an architectural description language. We believe that our approach is effective, complements static techniques, and is more practical than them.

Technical Report UTEP-CS-07-17, March 2007

Updated version UTEP-CS-07-17a, May 2007

Interval Approach to Preserving Privacy in Statistical Databases: Related Challenges and Algorithms of Computational Statistics

Luc Longpre, Gang Xiang, Vladik Kreinovich, and Eric Freudenthal

Published in: V. Gorodetsky, I. Kotenko, and V. A. Skormin (eds.),
*Proceedings of the
International Conference "Mathematical Methods,
Models and Architectures for Computer Networks Security"
MMM-ACNS-07*, St. Petersburg, Russia, September 13-15, 2007,
Springer Lecture Notes in Computer Science, 2007, Vol. CCIS-1,
pp. 346-361.

In many practical situations, it is important to store large amounts of data and to be able to statistically process the data. A large part of the data is confidential, so while we welcome statistical data processing, we do not want to reveal sensitive individual data. If we allow researchers to ask all kinds of statistical queries, this can lead to violations of people's privacy. A foolproof way to avoid these privacy violations is to store ranges of values (e.g., between 40 and 50 for age) instead of the actual values. This idea solves the privacy problem, but it leads to a computational challenge: traditional statistical algorithms need exact data, but now we only know data with interval uncertainty. In this paper, we describe new algorithms designed for processing such interval data.
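Once values are stored as ranges, every statistic becomes interval-valued. The mean is the easy case, since its exact range is just the interval of the endpoint means (characteristics such as the variance are the computationally hard part that the paper's algorithms address):

```python
# Sketch: exact range of the sample mean over interval data.
# Because the mean is monotone in each data point, its minimum and
# maximum are attained at the lower and upper endpoints, respectively.

def interval_mean(intervals):
    n = len(intervals)
    return (sum(lo for lo, _ in intervals) / n,
            sum(hi for _, hi in intervals) / n)

# Ages stored as decade-wide ranges for privacy:
m = interval_mean([(40, 50), (20, 30), (30, 40)])
```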

Original file in pdf and
in Compressed Postscript

updated file in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-16, March 2007

Updated version UTEP-CS-07-16a, September 2007

Random Fuzzy Sets

Hung T. Nguyen, Vladik Kreinovich, and Gang Xiang

Published in: Hsiao-Fan Wang (ed.), *Intelligent Data Analysis: Developing
New Methodologies Through Pattern Discovery and Recovery*, IGI Global,
Hershey, Pennsylvania, 2008, pp. 18-44.

It is well known that in decision making under uncertainty, while we are guided by a general (and abstract) theory of probability and of statistical inference, each specific type of observed data requires its own analysis. Thus, while textbook techniques treat precisely observed data in multivariate analysis, there are many open research problems when data are censored (e.g., in medical or bio-statistics), missing, or partially observed (e.g., in bioinformatics). Data can be imprecise due to various reasons, e.g., due to fuzziness of linguistic data. Imprecise observed data are usually called *coarse data*. In this chapter, we consider coarse data which are both random and fuzzy.

Fuzziness is a form of imprecision often encountered in perception-based information. In order to develop statistical inference procedures based on such data, we need to model random fuzzy data as bona fide random elements, i.e., we need to place random fuzzy data completely within the rigorous theory of probability. This chapter presents the most general framework for random fuzzy data, namely the framework of random fuzzy sets. We also describe several applications of this framework.

Original file in pdf and
in Compressed Postscript;

updated version in pdf and
in Compressed Postscript;

Technical Report UTEP-CS-07-15c, June 2007

Final version UTEP-CS-07-15d, September 2007

Verification of Automatically Generated Pattern-Based LTL Specifications

Salamah Salamah, Ann Q. Gates, Vladik Kreinovich, and Steve Roach

Published in *Proceedings of
the 10th IEEE High Assurance Systems Engineering Symposium HASE'07*,
Dallas, Texas, November 14-16, 2007, pp. 341-348.

The use of property classifications and patterns, i.e., high-level abstractions that describe common behavior, has been shown to assist practitioners in generating formal specifications that can be used in formal verification techniques. The Specification Pattern System (SPS) provides descriptions of a collection of patterns. The extent of program execution over which a pattern must hold is described by the notion of scope. SPS provides a manual technique for obtaining formal specifications from a pattern and a scope. The Property Specification Tool (Prospec) extends SPS by introducing Composite Propositions (CPs), a classification for defining sequential and concurrent behavior to represent pattern and scope parameters, and provides a tool to support users.

This work provides general templates for generating formal specifications in Linear Temporal Logic (LTL) for all pattern, scope, and CP combinations. In addition, the work explains the methodology for the verification of the correctness of these templates.

Original file in pdf and
in Compressed Postscript

final version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-15, March 2007

Updated version UTEP-CS-07-15a, June 2007

Final version UTEP-CS-07-15b, August 2007

Using Patterns and Composite Propositions to Automate the Generation of Complex LTL

Salamah Salamah, Ann Q. Gates, Vladik Kreinovich, and Steve Roach

Published in: K. Namjoshi, T. Yoneda,
T. Higashino, and Y. Okamura (Eds.), *Proceedings of the 5th
International Symposium on Automated Technology for Verification and
Analysis ATVA'2007*, Tokyo, Japan, October 22-25, 2007, Springer
Lecture Notes in Computer Science, 2007, Vol. 4762, pp. 533-542.

Property classifications and patterns, i.e., high-level abstractions that describe common behavior, have been used to assist practitioners in specifying properties. The Specification Pattern System (SPS) provides descriptions of a collection of patterns. Each pattern is associated with a scope that defines the extent of program execution over which a property pattern is considered. Based on a selected pattern, SPS provides a specification for each type of scope in multiple formal languages including Linear Temporal Logic (LTL). The Prospec tool extends SPS by introducing the notion of Composite Propositions (CP), which are classifications for defining sequential and concurrent behavior to represent pattern and scope parameters.

In this work, we provide formal definitions of patterns and scopes when defined using CP classes. In addition, we provide general (abstract) LTL formulas that can be used to generate LTL specifications for all combinations of pattern, scope, and CP classes.

Original file in pdf and
in Compressed Postscript

updated version in pdf

final version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-14, March 2007

Updated version UTEP-CS-07-14a, April 2007

Generating Linear Temporal Logic Formulas for Pattern-Based Specifications

Salamah Salamah, Vladik Kreinovich, and Ann Q. Gates

Published in the *Proceedings of the Nineteenth International
Conference on Software Engineering and Knowledge Engineering
SEKE'2007*, Boston, Massachusetts, July 9-11, 2007, pp. 422-427.

Software property classifications and patterns, i.e., high-level abstractions that describe program behavior, have been used to assist practitioners in specifying properties. The Specification Pattern System (SPS) provides descriptions of a collection of patterns. Each pattern is associated with a scope that defines the extent of program execution over which a property pattern is considered. Based on a selected pattern, SPS provides a specification for each type of scope in multiple formal languages including Linear Temporal Logic (LTL). The Property Specification tool (Prospec) extends SPS by introducing the notion of Composite Propositions (CP), which classify sequential and concurrent behavior over pattern and scope parameters.

In this work, we present an approach to support the automated generation of Linear Temporal Logic (LTL) formulas for complex pattern-based software specifications that use CPs. We define general LTL formulas for the Response pattern, and provide formal descriptions of the different CP classes. In addition, we formally describe the Response pattern and the different scopes that use CPs.
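For concreteness, the simplest instance of the Response pattern (with ordinary, non-composite propositions p and q, and the Global scope) corresponds in SPS to the LTL formula

```latex
\Box\,(p \rightarrow \Diamond q)
```

read as: at every point of the execution, whenever the cause p holds, the effect q eventually follows. The formulas in this work generalize such templates to the case where p and q are composite propositions and to the other scopes.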

Original file in pdf and
in Compressed Postscript

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-13, March 2007

Updated version UTEP-CS-07-13b, February 2009

On Decision Making Under Interval Uncertainty: A New Justification of Hurwicz Optimism-Pessimism Approach and its Use in Group Decision Making

Van Nam Huynh, Chenyi Hu, Yoshiteru Nakamori, and Vladik Kreinovich

Published in the *Proceedings of the 39th International Symposium on
Multiple-Valued Logic ISMVL'2009*,
Naha, Okinawa, Japan, May 21-23, 2009, pp. 214-220.

If we know the exact consequences of each action, then we can select the action with the largest value of the objective function. In practice, we often know these values only with interval uncertainty. If two intervals intersect, then some people may prefer the alternative corresponding to the first interval, while others prefer the alternative corresponding to the second one. How can we describe the portion of people who select the first alternative? In this paper, we provide a new theoretical justification for the Hurwicz optimism-pessimism approach, and we show how this approach can be used in group decision making.
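As a reminder of how the Hurwicz approach scores an interval, here is a small illustrative sketch; the optimism coefficient alpha and the sample intervals are our own choices, not taken from the paper:

```python
def hurwicz_value(lo, hi, alpha):
    """Hurwicz optimism-pessimism value of an interval [lo, hi].

    alpha = 1 is pure optimism (use the best case);
    alpha = 0 is pure pessimism (use the worst case).
    """
    return alpha * hi + (1 - alpha) * lo

# Two alternatives whose utility intervals intersect:
a = (2.0, 10.0)   # wide interval
b = (4.0, 7.0)    # narrower interval

# An optimist (alpha = 0.8) prefers the wide interval...
assert hurwicz_value(*a, 0.8) > hurwicz_value(*b, 0.8)
# ...while a pessimist (alpha = 0.2) prefers the narrow one.
assert hurwicz_value(*b, 0.2) > hurwicz_value(*a, 0.2)
```

Different people choosing different alpha values is exactly what makes a portion of the population prefer each alternative.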

Original file in pdf and
in Compressed Postscript

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-12, March 2007

Three Prosodic Features That Cue Back-Channel Feedback in Northern Mexican Spanish

Anais G. Rivera and Nigel G. Ward

In order to demonstrate attentiveness during a conversation it is generally necessary for the listener to provide back-channel feedback. To some extent, the times when back-channel feedback is welcome are determined by the speaker and conveyed to the listener with prosodic cues. In this study we sought to identify the cues used for this purpose in Northern Mexican Spanish. Based on quantitative analysis of a corpus of unstructured conversations, we found three cues, of which the most common is a pitch downslope followed by a pitch rise accompanied by a rate reduction on the last syllable and a drop in energy leading to a slight pause.

Technical Report UTEP-CS-07-11, March 2007

Random Test Data Generation for Java Classes Annotated with JML Specifications

Yoonsik Cheon and Carlos E. Rubio-Medrano

The hidden states of objects create a barrier to designing and generating test data automatically; for example, the state of an object has to be established indirectly through a sequence of method invocations. For a non-trivial class, however, it is extremely unlikely that a randomly-chosen sequence of method invocations can construct an object successfully, as each invocation has to satisfy the state invariants. Nonetheless, automated random testing can reduce the cost of testing dramatically and has the potential for finding errors that are difficult to find in other ways, because it eliminates the subjectiveness in constructing test data. We propose a new approach to generating test data automatically for Java classes annotated with JML specifications. The key idea behind our approach is to construct an object incrementally: each method call in the sequence is checked before the next call is chosen. We use JML's runtime assertion checker to check the validity of a method invocation. Other ingredients of our approach include object pooling and object equivalence checking; these increase the probability of constructing feasible call sequences and remove redundancy among the successfully-built sequences. We have implemented our approach in JET, a fully-automated testing tool for Java, and our experiment with JET showed a promising result: a 10% to 200% increase in the number of generated test cases.
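The incremental-construction idea can be sketched as follows. The BankAccount class, its assert-based checks, and the rollback are illustrative stand-ins: the actual tool works on Java classes and uses JML's runtime assertion checker rather than Python asserts.

```python
import random

class BankAccount:
    """Toy class with a state invariant: balance >= 0."""
    def __init__(self):
        self.balance = 0
    def deposit(self, amount):
        assert amount > 0                   # precondition
        self.balance += amount
        assert self.balance >= 0            # invariant
    def withdraw(self, amount):
        assert 0 < amount <= self.balance   # precondition
        self.balance -= amount
        assert self.balance >= 0            # invariant

def build_sequence(length, rng):
    """Incrementally build a random call sequence: each call is
    checked (here via assert, standing in for JML's runtime
    assertion checker) before the next call is chosen.
    Invalid calls are discarded and a new call is tried."""
    obj = BankAccount()
    sequence = []
    while len(sequence) < length:
        name = rng.choice(["deposit", "withdraw"])
        arg = rng.randint(-5, 20)
        snapshot = obj.balance
        try:
            getattr(obj, name)(arg)
            sequence.append((name, arg))
        except AssertionError:
            obj.balance = snapshot  # roll back the rejected call
    return obj, sequence

obj, seq = build_sequence(10, random.Random(42))
assert len(seq) == 10 and obj.balance >= 0
```

Checking each call before extending the sequence is what makes long feasible sequences likely, where purely random sequences would almost always violate a precondition early on.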

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-10, February 2007

Final version UTEP-CS-07-10a, March 2007

Towards a General Description of Physical Invariance in Category Theory

John Symons, Julio C. Urenda, and Vladik Kreinovich

Published in *Journal of Uncertain Systems*, 2007, Vol. 1, No. 3,
pp. 201-205.

Invariance is one of the most important notions in applications of mathematics. It is one of the key concepts in modern physics, is a computational tool that helps in solving complex equations, etc. In view of its importance, it is desirable to come up with a definition of invariance which is as general as possible. In this paper, we describe how to formulate a general notion of invariance in categorial terms.

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-09, February 2007

Final version UTEP-CS-07-09a, March 2007

Russian Peasant Multiplication Algorithm, RSA Cryptosystem, and a New Explanation of Half-Orders Of Magnitude

J. Ivan Vargas and Olga Kosheleva

Published in *Journal of Uncertain Systems*, 2007, Vol. 1, No. 3,
pp. 178-184.

In his papers, J. Hobbs has observed that when people make crude estimates, they usually feel reasonably comfortable choosing between alternatives which differ by a half order of magnitude (HOM). He also provided an explanation for this level of granularity based on the need for the resulting crude estimates to represent both the original data and the result of processing this data. According to this explanation, HOM are optimal -- when we limit ourselves to these first crude estimates.

In many practical situations, we do not stop with the original estimate; we refine it one or more times by using granules of smaller and smaller size. In this paper, we show that the need to optimally process such refined estimates leads to the same HOM granularity. Thus, we provide a new explanation for this level of granularity.
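Numerically, half-order-of-magnitude granularity means that successive granule sizes differ by a factor of 10^(1/2), roughly 3.16; this small illustrative sketch (not part of the paper itself) makes that concrete:

```python
import math

# Half-order-of-magnitude (HOM) granularity: successive granules
# differ by a factor of 10**0.5 ~ 3.16, so the crude-estimate
# scale runs roughly 1, 3, 10, 30, 100, ...
HOM = math.sqrt(10)

granules = [HOM ** k for k in range(5)]
ratios = [granules[k + 1] / granules[k] for k in range(4)]

assert abs(granules[2] - 10.0) < 1e-9   # every second granule is a power of 10
assert all(abs(r - HOM) < 1e-9 for r in ratios)
```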

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-07-08, February 2007

Logit Discrete Choice Model: A New Distribution-Free Justification

Ruey L. Cheu, Hung T. Nguyen, Tanja Magoc, and Vladik Kreinovich

Published in *Soft Computing*, 2009, Vol. 13, No. 2, pp. 133-137.

According to decision making theory, if we know the user's utility
Ui=U(si) of all possible alternatives si, then we can
uniquely predict the user's preferences. In practice, we often only
know approximate values Vi~Ui of the user's utilities.
Based on these approximate values, we can only make probabilistic
predictions of the user's preferences. It is empirically known that
in many real-life situations, the corresponding probabilities are
described by a *logit* model, in which the probability pi of
selecting the alternative si is equal to
pi=exp(b*Vi)/(exp(b*V1)+...+exp(b*Vn)).
There exist many theoretical explanations of
this empirical formula; some of these explanations led to a 2001
Nobel prize. However, it is known that the logit formula is
empirically valid even when the assumptions behind the existing
justifications do not hold. To cover such empirical situations, it
is therefore desirable to provide a new distribution-free
justification of the logit formula. Such a justification is provided
in this paper.
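For illustration, the logit formula is the softmax of the scaled approximate utilities; in this minimal sketch, the sample utilities and the value of b are our own choices:

```python
import math

def logit_probabilities(v, b=1.0):
    """Logit choice probabilities:
    p_i = exp(b*V_i) / (exp(b*V_1) + ... + exp(b*V_n))."""
    weights = [math.exp(b * vi) for vi in v]
    total = sum(weights)
    return [w / total for w in weights]

p = logit_probabilities([1.0, 2.0, 3.0])
assert abs(sum(p) - 1.0) < 1e-12
assert p[2] > p[1] > p[0]   # higher approximate utility, higher probability
```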

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-07, January 2007

Automated Random Testing to Detect Specification-Code Inconsistencies

Yoonsik Cheon

An interface specification language such as JML provides a means to document precisely the behavior of program modules such as Java classes, and it is being adopted by industry. However, few practical tools exist for programmers to assure the correctness of their interface specifications. Nonetheless, the correctness of an interface specification is a prerequisite for the use of the specification, both as precise API documentation and as a foundation for formal verification of, and reasoning about, the implementation. We propose automated random testing as a practical tool to assure the correctness of interface specifications. The key idea of our approach is to fully automate dynamic, random testing to detect as many inconsistencies as possible between the specification and its implementation. For this, we use a runtime assertion checker as a test oracle, and the goal of our testing is to generate as many non-duplicate test cases as possible that incur a certain type of runtime assertion violation. Our approach has been implemented for Java/JML in a prototype tool called JET, and a preliminary experiment shows that it has the potential to be a valuable testing tool for Java/JML. Our approach can be adapted to other interface specification languages.

Technical Report UTEP-CS-07-06, January 2007

Building a Seismology Workflow-Driven Ontology: A Case Study

Leo Salayandia and Aaron Velasco

Technical Report UTEP-CS-07-05, January 2007

Baby-Steps Towards Building a Spanglish Language Model

Juan C. Franco and Thamar Solorio

Spanglish is the simultaneous use, or alternation, of traditional Spanish and English within the same conversational event. This interlanguage is commonly used in U.S. populations with large percentages of Spanish speakers. Despite the popularity of this dialect and the widespread use of automated voice systems, there are currently no spoken dialog applications that can process Spanglish. In this paper we present the first attempt towards creating a Spanglish language model.

Technical Report UTEP-CS-07-04, January 2007

Exponential Disutility Functions in Transportation Problems: A New Theoretical Justification

Ruey L. Cheu and Vladik Kreinovich

In modeling drivers' route choice in stochastic networks, several researchers have successfully used exponential disutility functions. A usual justification for these functions is that they are consistent with common sense and that they lead to simpler computations than several other alternative disutility functions. In principle, such a justification leaves open a possibility that there is some other (yet un-tried) disutility function which is also consistent with common sense and for which the computations are even simpler than for the exponential function. In this paper, we prove that exponential disutility functions are the only ones that are consistent with the (appropriately formalized) common sense and the only ones for which computations can be simplified.
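To illustrate how an exponential disutility function captures risk attitude in route choice, here is a small sketch; the routes, probabilities, and the parameter theta are illustrative assumptions, not taken from the paper:

```python
import math

def expected_exponential_disutility(travel_times, probabilities, theta):
    """Expected disutility E[exp(theta * T)] of a discrete
    travel-time distribution; theta > 0 models a risk-averse driver."""
    return sum(p * math.exp(theta * t)
               for t, p in zip(travel_times, probabilities))

# Two routes with the same mean travel time (10 minutes) but
# different variability:
steady   = expected_exponential_disutility([10.0], [1.0], 0.1)
variable = expected_exponential_disutility([5.0, 15.0], [0.5, 0.5], 0.1)

# A risk-averse driver assigns higher expected disutility to the
# variable route, even though the mean travel times are equal
# (a consequence of the convexity of exp, i.e., Jensen's inequality).
assert variable > steady
```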

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-03, January 2007

The Mathematical Analysis of T-Norms is Logically Non-Trivial

Sandor Jenei, Vladik Kreinovich, David Ponevac, and Yaffa Al Bayyari

It is well known that t-norms are widely applicable in models that
describe human reasoning about uncertainty, and that
for different applications, different t-norms fit better. Thus,
given a practical problem, it is important to be able to find a
t-norm which is the most suitable for that particular problem. To
solve such optimization problems, it would be desirable to know the
structure of the class of all possible t-norms. Toward this --
probably unreachable -- goal there are many interesting open
problems. If the corresponding mathematical problems are expressed
in terms of quantifiers and logical connectives, then we get
formulas which are very similar to formulas about real numbers. A.
Tarski proved that there is a *deciding* algorithm for real
numbers -- i.e., an algorithm that, given a formula about real
numbers, decides whether it is true or not. So, the natural question
is whether we can extend Tarski's algorithm to a class of
mathematical statements about t-norms. The answer is "no": once we
allow quantifiers over t-norms, no deciding algorithm exists. In
this sense, in general, the analysis of the mathematical properties
of t-norms is logically non-trivial.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-07-02, January 2007

2006 Survey Results for CS1310: Introduction to Programming Using Media Computation

Aida Gandara

In Spring 2006, the Computer Science Department at the University of Texas at El Paso began an initiative to offer a course named CS 1310 Introduction to Computing, focused on media computation. The course was directed toward Liberal Arts and other non-CS students, with the main goal of exposing these students to fundamental concepts of computer science using a high-level language called Python. The course was structured similarly to the CS 1315 Introduction to Media Computation course offered at the Georgia Institute of Technology, designed by Associate Professor Mark Guzdial.

The remainder of this document evaluates the surveys and coursework of CS 1310’s first two semesters. Each semester, two surveys were collected from the students: one during the first week and the other during the final week. These surveys, as well as the corresponding class observations, are the basis for the tables, diagrams, and written evaluations that follow.

Our goal in publishing these results is to begin to assess both the academic and personal impact of offering a course of this type to students of various disciplines at UTEP. Exposing students of all disciplines to computing reflects the fact that computing, and the need for computer-adept professionals, is a multi-disciplinary requisite. This need can be addressed by providing courses focused on the fundamentals of computing, using techniques that promote interest, challenge, and creativity for the students.

File in pdf (caution: it is 3MB) and in compressed (zipped) pdf

Technical Report UTEP-CS-07-01, January 2007

Updated version UTEP-CS-07-01a, February 2007

Use of Deterministic Traffic Assignment Algorithms in Stochastic Networks: Analysis of Equivalent Link Disutility Functions

Ruey L. Cheu and Vladik Kreinovich

At present, most traffic assignment tasks are performed in practice by using deterministic network (DN) models, which assume that the link travel time is uniquely determined by the link volume and link capacity. In reality, for the same link volume and link capacity, a link may have different travel times. However, the corresponding stochastic network (SN) models are not widely used because they are much more computationally complex than the DN models. Past research has shown that in the important particular case when the link travel time follows a Gamma distribution, the traffic assignment problem for SN can be reformulated in terms of a deterministic equivalent link disutility function; thus, in this case, the traffic assignment problem can be solved by the standard Frank-Wolfe algorithm. In this paper, we show that a similar equivalent link disutility function exists in the general case of an arbitrary distribution of link travel time. Therefore, we can use the Frank-Wolfe algorithm in the general SN case, for both risk-averse and risk-prone driver route choice behavior. We also provide an explicit expression for this equivalent link disutility function in terms of the link volume and link capacity.

Original file in pdf

updated file in pdf