Computer Science Department

Abstracts of 2006 Reports

Technical Report UTEP-CS-06-50e, June 2009

Updated version UTEP-CS-06-50f, August 2009

From Interval Computations to Constraint-Related Set Computations: Towards Faster Estimation of Statistics and ODEs Under Interval and p-Box Uncertainty

Vladik Kreinovich

Short version published in: Andrej Bauer, Ruth Dillhage, Peter Hertling, Ker-I Ko, and
Robert Rettinger (eds.), *Proceedings of the Sixth International
Conference on Computability and Complexity in Analysis CCA'2009*,
Ljubljana, Slovenia, August 18-22, 2009, pp. 4-15; full version
published in: Alexander K. Belyaev and Robin S. Langley (eds.), Proceedings of
the International Union for Theoretical and Applied Mechanics
(IUTAM) Symposium on The Vibration Analysis of Structures with
Uncertainties, Saint Petersburg, Russia, July 6-10, 2009,
Springer Verlag, Dordrecht, Heidelberg, London, New York, 2011,
pp. 85-98.

Original file in pdf

Updated version in pdf

Technical Report UTEP-CS-06-50, December 2006

1st updated version UTEP-CS-06-50a, January 2007

2nd updated version UTEP-CS-06-50b, February 2008

short version UTEP-CS-06-50d, May 2009

From Interval Computations to Constraint-Related Set Computations: Towards Faster Estimation of Statistics and ODEs Under Interval, P-Box, and Fuzzy Uncertainty

Martine Ceberio, Vladik Kreinovich, Andrzej Pownuk, and Barnabas Bede

Short version published in: Patricia Melin, Oscar Castillo, Luis T. Aguilar, Janusz Kacprzyk, and Witold Pedrycz (eds.), Foundations of Fuzzy Logic and Soft Computing, Proceedings of the World Congress of the International Fuzzy Systems Association IFSA'2007, Cancun, Mexico, June 18-21, 2007, Springer Lecture Notes in Artificial Intelligence, 2007, Vol. 4529, pp. 33-42; full version published in: JinTao Yao (ed.), Novel Developments in Granular Computing: Applications for Advanced Human Reasoning and Soft Computation, IGI Global Publisher, 2010, pp. 131-147.

In interval computations, at each intermediate stage of the computation, we have intervals of possible values of the corresponding quantities. In our previous papers, we proposed an extension of this technique to set computations, where at each stage, in addition to intervals of possible values of the quantities, we also keep sets of possible values of pairs (triples, etc.). In this paper, we show that for several practical problems, such as estimating statistics (variance, correlation, etc.) and solutions to ordinary differential equations (ODEs) with given accuracy, this new formalism enables us to find estimates in feasible (polynomial) time.
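
A toy sketch (with names of our choosing, not the paper's algorithm) of the difference between the two formalisms: plain interval arithmetic loses the dependency between a quantity and its square, while keeping a (here, discretized) set of possible pairs preserves it.

```python
# Toy illustration: compare a plain interval evaluation of f(x) = x - x^2
# on [0, 1] with a set computation that keeps possible pairs (x, x^2).

def interval_sub(a, b):
    # [a1, a2] - [b1, b2] = [a1 - b2, a2 - b1]
    return (a[0] - b[1], a[1] - b[0])

def interval_sqr(a):
    # square of an interval of nonnegative numbers
    return (a[0] ** 2, a[1] ** 2)

x = (0.0, 1.0)

# Interval computation: the two occurrences of x are treated as independent,
# so the result (-1, 1) overestimates the true range [0, 0.25].
naive = interval_sub(x, interval_sqr(x))

# Set computation: keep the set of possible pairs (x, x^2) (discretized here),
# so the dependency between x and x^2 is preserved.
n = 1000
pairs = [(i / n, (i / n) ** 2) for i in range(n + 1)]
values = [u - v for (u, v) in pairs]
tight = (min(values), max(values))

print(naive, tight)
```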

Original file in pdf and in Compressed Postscript

1st updated version in pdf and in Compressed Postscript

2nd updated version in pdf and in Compressed Postscript

short version in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-49, December 2006

Estimating Variance under Interval and Fuzzy Uncertainty: Case of Hierarchical Estimation

Gang Xiang and Vladik Kreinovich

Published in: Patricia Melin, Oscar Castillo, Luis T. Aguilar, Janusz Kacprzyk, and Witold Pedrycz (eds.), Foundations of Fuzzy Logic and Soft Computing, Proceedings of the World Congress of the International Fuzzy Systems Association IFSA'2007, Cancun, Mexico, June 18-21, 2007, Springer Lecture Notes in Artificial Intelligence, 2007, Vol. 4529, pp. 3-12.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-48, December 2006

Fast Algorithms for Computing Statistics under Interval and Fuzzy Uncertainty, and Their Applications

Gang Xiang and Vladik Kreinovich

Published in *Proceedings of the International
Conference on Fuzzy Mathematics and Its Applications,*
Ahmednagar, Maharashtra, India, January 27-29, 2007.

In many engineering applications, we have to combine probabilistic, interval, and fuzzy uncertainty. For example, in environmental analysis, we observe a pollution level x(t) in a lake at different moments of time t, and we would like to estimate standard statistical characteristics such as mean, variance, autocorrelation, and correlation with other measurements. In environmental measurements, we can often only measure the values with interval uncertainty. We must therefore modify the existing statistical algorithms to process such interval data.

In this paper, we provide a brief survey of algorithms for computing various statistics under interval (and fuzzy) uncertainty and of their applications, including applications to the seismic inverse problem in geosciences, to chip design in computer engineering, and to radar data processing.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-47, December 2006

For Piecewise Smooth Signals, L1 Method is the Best Among Lp: An Interval-Based Justification of an Empirical Fact

Vladik Kreinovich and Arnold Neumaier

Published in *Journal of Computational Technologies*, 2017,
Vol. 22, No. 2, pp. 50-58.

Traditional engineering techniques use the Least Squares method (i.e., in mathematical terms, the l2-norm) to process data. It is known that in many practical situations, lp-methods with p =/= 2 lead to better results, and in different practical situations, different values of p are optimal. In particular, in several situations when we need to reconstruct a piecewise smooth signal, the empirically optimal value of p is close to 1. In this paper, we provide a new interval-based theoretical explanation for this empirical fact.
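
A minimal numeric illustration of the general phenomenon (our own example, not from the paper): for a constant segment of a signal, the l1-optimal estimate is the median, which ignores a single spike, while the l2 (Least Squares) estimate is the mean, which is pulled far from the true level.

```python
import statistics

# A constant segment of a signal contaminated by one spike.
segment = [1.0] * 9 + [100.0]

# l2 (Least Squares) fit of a constant: the mean, pulled away from the level 1.
l2_fit = statistics.mean(segment)

# l1 fit of a constant: the median, essentially unaffected by the spike.
l1_fit = statistics.median(segment)

print(l1_fit, l2_fit)
```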

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-46, December 2006

Updated version UTEP-CS-06-46a, April 2007

Adding Constraints to Situations when, in Addition to Intervals, We Also Have Partial Information about Probabilities

Martine Ceberio, Vladik Kreinovich, Gang Xiang, Scott Ferson, and Cliff Joslyn

Published in *IEEE Proceedings of the 12th GAMM-IMACS
International Symposium on Scientific Computing, Computer Arithmetic
and Validated Numerics*, Duisburg, Germany, September 26-29, 2006.

In many practical situations, we need to combine probabilistic and interval uncertainty. For example, we need to compute statistics like population mean E=(x1+...+xn)/n or population variance V=(x1^2+...+xn^2)/n-E^2 in the situations when we only know intervals [xi] of possible values of xi. In this case, it is desirable to compute the range of the corresponding characteristic.
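
For the population mean, the range is easy to compute exactly, since E is increasing in every xi; a minimal sketch (the function name is ours):

```python
# Exact range of the population mean E = (x1+...+xn)/n when each xi is only
# known to lie in an interval: since E is increasing in every xi, the bounds
# are attained at the left and right endpoints.

def mean_range(intervals):
    n = len(intervals)
    return (sum(lo for lo, _ in intervals) / n,
            sum(hi for _, hi in intervals) / n)

print(mean_range([(0.0, 1.0), (1.0, 3.0), (2.0, 2.0)]))  # (1.0, 2.0)
```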

Some range computation problems are NP-hard; for these problems, in general, only an enclosure is possible. For other problems, there are efficient algorithms. In many practical situations, we have additional information that can be used as constraints on possible cumulative distribution functions (cdfs). For example, we may know that the actual (unknown) cdf is Gaussian. In this paper, we show that such constraints enable us to drastically narrow down the resulting ranges -- and sometimes, transform the originally intractable (NP-hard) computational problem of computing the exact range into an efficiently solvable one.

This possibility is illustrated on the simplest example of an NP-hard problem from interval statistics: the problem of computing the range [V] of the variance V.

We also describe how we can estimate the amount of information under such combined intervals-and-constraints uncertainty.

Original file in pdf and in Compressed Postscript

updated version in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-45, December 2006

Updated version UTEP-CS-06-45a, April 2007

Fern: An Updatable Authenticated Dictionary Suitable for Distributed Caching

Eric Freudenthal, David Herrera, Steve Gutstein, Ryan Spring, and Luc Longpre

Fern is an updatable cryptographically authenticated dictionary developed to propagate identification and authorization information within and among distributed systems. Conventional authenticated dictionaries permit authorization information to be disseminated by untrusted proxies; however, these proxies must maintain full duplicates of the dictionary structure. In contrast, Fern incrementally distributes components of its dictionary as required to satisfy client requests and thus is suitable for deployments where clients are likely to require only a small fraction of a dictionary's contents and connectivity may be limited.

When dictionary components must be obtained remotely, the latency of lookup and validation operations is dominated by communication time. This latency can be reduced by exploiting locality-sensitive caching of dictionary components. The Fern dictionary's components are suitable for caching and distribution via autonomic, scalable, locality-aware Content Distribution Networks (CDNs), and Fern can therefore provide these properties without requiring the provisioning of a dedicated distribution infrastructure.

Others have proposed the construction of incrementally distributed
authenticated dictionaries that utilize either trees that dynamically
re-balance or skiplists. The structural changes that result from tree
rebalancing can reduce the effectiveness of caching. Skiplists do not
require balancing and thus are more amenable to caching. However, a
client lookup from a skiplist-based dictionary must sequentially
transfer two to three times as many components as a client of a
dictionary based on self-balancing trees. In both cases, these
transfers are *necessarily serialized*, and thus skiplists will
incur proportionally increased latency.

Fern's dictionary structure utilizes a novel randomized trie that has
the desirable characteristics of both of these approaches. While
Fern's algorithm is far simpler than those of self-balancing trees, a
Fern trie has similarly short (average and expected worst-case) path
lengths, and thus requires that clients obtain approximately the same
number of vertices. Furthermore, like skiplists, Fern's trie *does not
require rebalancing* and thus is similarly amenable to caching.
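
The following is only an illustrative sketch of the general idea, not Fern's actual data structure: a binary trie addressed by the hash bits of each key. Since hash bits behave like random bits, expected path lengths stay near log2(n), and no rebalancing ever moves an entry.

```python
import hashlib

def hash_bits(key):
    # Bits of SHA-256(key), used as a random-looking path in the trie.
    for byte in hashlib.sha256(key.encode()).digest():
        for shift in range(7, -1, -1):
            yield (byte >> shift) & 1

class TrieNode:
    def __init__(self):
        self.key = None
        self.value = None
        self.children = [None, None]

def insert(root, key, value):
    # Store the entry at the first free node along the key's hash path.
    node = root
    for bit in hash_bits(key):
        if node.key is None or node.key == key:
            node.key, node.value = key, value
            return
        if node.children[bit] is None:
            node.children[bit] = TrieNode()
        node = node.children[bit]

def lookup(root, key):
    # Follow the same hash path; entries never move, which favors caching.
    node = root
    for bit in hash_bits(key):
        if node is None:
            return None
        if node.key == key:
            return node.value
        node = node.children[bit]
    return None

root = TrieNode()
for k, v in [("alice", 1), ("bob", 2), ("carol", 3)]:
    insert(root, k, v)
print(lookup(root, "bob"), lookup(root, "dave"))
```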

A prototype implementation of Fern has been constructed that utilizes the CoralCDN scalable, locality-aware, and autonomic content distribution network. We provide an informal analysis of bandwidth requirements for the Fern authenticated dictionary that agrees with experimental results. We are not aware of other implemented systems with similar properties or comparable analysis of such systems' performance and bandwidth requirements.

The potential integration of Fern within the CDN on which it relies could yield symbiotic benefits. The indexing infrastructure of autonomic CDNs such as Coral is vulnerable to disruption by malicious participants. Therefore, a CDN's integrity could be guarded against malicious interference through the dissemination of up-to-date authorization information provided by Fern. In a complementary manner, a CDN so fortified by Fern could potentially provide more reliable content distribution service to Fern and thus also improve Fern's availability and performance.

Original file in pdf

Updated version in pdf

Technical Report UTEP-CS-06-44, November 2006

Updated version UTEP-CS-06-44d, March 2008

Towards More Adequate Representation of Uncertainty: From Intervals to Set Intervals, with the Possible Addition of Probabilities and Certainty Degrees

J. T. Yao, Y. Y. Yao, V. Kreinovich, P. Pinheiro da Silva, S. A. Starks, G. Xiang, and H. T. Nguyen

Published in *Proceedings of the IEEE World Congress on
Computational Intelligence WCCI'2008*, Hong Kong, China,
June 1-6, 2008, pp. 983-990.

In the ideal case of complete knowledge, for each property Pi
(such as "high fever", "headache", etc.), we know the exact set
Si of all the objects that satisfy this property. In practice, we
usually only have partial knowledge. In this case, we only know the
set Si- of all the objects about which we know that
Pi holds and the set Si+ of all the objects for which
Pi may hold (i.e., equivalently, for which we have not yet excluded
the possibility of Pi). This pair of sets is called a *set
interval*.

Based on the knowledge of the original properties, we would like to
describe the set S of all the objects that satisfy some combination
of the original properties: e.g., high fever *and* headache
*and not* rash. In the ideal case when we know the exact set Si of
all the objects satisfying each property, it is sufficient to apply
the corresponding set operation (composition of union, intersection,
and complement) to the known sets Si. Under partial knowledge, however,
different possible sets Si lead to different possible results S. In this
paper, we describe how to compute the class [S] of all possible sets S.
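
A small sketch of componentwise operations on set intervals (names are ours; the paper's contribution is characterizing the exact class [S], while componentwise bounds are exact only for independent arguments and otherwise give an enclosure):

```python
# Componentwise operations on set intervals [S-, S+], represented as pairs
# (known, possible) of frozensets with known a subset of possible.

def si_and(a, b):
    return (a[0] & b[0], a[1] & b[1])

def si_or(a, b):
    return (a[0] | b[0], a[1] | b[1])

def si_not(a, universe):
    return (universe - a[1], universe - a[0])

U = frozenset(range(6))
fever = (frozenset({0, 1}), frozenset({0, 1, 2}))       # known / possible
headache = (frozenset({1, 2}), frozenset({1, 2, 3}))

# Objects with high fever *and not* headache:
known, possible = si_and(fever, si_not(headache, U))
print(sorted(known), sorted(possible))
```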

Original file in pdf and in Compressed Postscript

Updated version in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-43, October 2006

Updated version UTEP-CS-06-43a, December 2006

Final version UTEP-CS-06-43b, April 2007

Why Intervals? Why Fuzzy Numbers? Towards a New Justification

Vladik Kreinovich

Short version published in *Proceedings of the IEEE Conference on Foundations of
Computational Intelligence FOCI'07*, Honolulu, Hawaii,
April 1-5, 2007, pp. 113-119; full version published in
*International Journal of Information Technology
and Intelligent Computing*, 2007, Vol. 2, No. 1.

The purpose of this paper is to present a new characterization of the set of all intervals (and of the corresponding set of fuzzy numbers). This characterization is based on several natural properties useful in mathematical modeling; the main one of these properties is the ability to combine (fuse) several pieces of knowledge.

Original file in pdf and in Compressed Postscript

Updated version in pdf and in Compressed Postscript

Final version in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-42, September 2006

compPknots: A Framework for Parallel Prediction and Comparison of RNA Secondary Structures with Pseudoknots

Trilce Estrada, Abel Licon, and Michela Taufer

Codes for RNA structure prediction based on energy minimization are usually very time and resource intensive. For this reason, several codes have been significantly simplified: in some cases they are unable to predict complex secondary structures such as pseudoknots, in other cases they can only handle structures of limited length, and in still others they can only predict some elementary and simple pseudoknots. Each of the existing codes has its strengths and weaknesses. Providing scientists with tools that combine the strengths of several codes is a worthwhile objective.

To address this need, we present compPknots, a parallel framework that uses a combination of existing codes such as Pknots-RE and Pknots-RG, to predict RNA secondary structures concurrently and automatically compare them with reference structures from databases or literature. In this paper compPknots is used to compare and contrast the prediction accuracies of 217 RNA secondary structures from the PseudoBase database using Pknots-RE and Pknots-RG separately, or both together. Its parallel master-slave architecture allowed us to prove that combinations of prediction codes can provide scientists with higher prediction accuracies in a very short time.

Technical Report UTEP-CS-06-41, September 2006

Revised version UTEP-CS-06-41a, April 2007

Decomposable Aggregability in Population Genetics and Evolutionary Computations: Algorithms and Computational Complexity

Vladik Kreinovich and Max Shpak

Published in: Arpad Kelemen, Ajith Abraham, and Yulan Liang (Eds.), Computational Intelligence in Medical Informatics, Springer-Verlag, Berlin-Heidelberg, 2008, pp. 69-92.

Many dynamical systems are decomposably *aggregable* in the sense that one can
divide their (micro)variables x1,...,xn into several (k)
non-overlapping blocks and find combinations y1,...,yk of
variables
from these blocks (*macrovariables*) whose dynamics depend
only on the initial values of
the macrovariables. For example, the state of a biological
population can be described by listing the frequencies xi of
different genotypes i; in this example, the corresponding
functions fi(x1,...,xn) describe the effects of mutation,
recombination, and natural selection in each generation.

Another example of a system where detecting aggregability is important is one that describes the dynamics of an evolutionary algorithm, which is formally equivalent to models from population genetics.

For very large systems, finding such an aggregation is often the only way to perform a meaningful analysis of such systems. Since aggregation is important, researchers have been trying to find a general efficient algorithm for detecting aggregability.

In this paper, we show that in general, detecting aggregability is
NP-hard even for linear systems, and thus (unless P=NP), we can only
hope to find efficient detection algorithms for specific classes of
systems. Moreover, even detecting *approximate* aggregability is
NP-hard.

We also show that in the linear case, once the groups are known, it is possible to efficiently find appropriate linear combinations ya.
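
A toy check (our own illustration, not from the paper) for the special case where each macrovariable ya is the plain sum of its block's microvariables: with linear dynamics x' = A x, the macrodynamics closes on the ya exactly when, within each block of columns, the column sums of A over each block of rows coincide. The matrix below is built to satisfy this.

```python
def step(A, x):
    # One step of the linear dynamics x' = A x.
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def macro(x, blocks):
    # Macrovariables: the sum of the microvariables in each block.
    return [sum(x[i] for i in block) for block in blocks]

A = [
    [0.2, 0.3, 0.1, 0.1],
    [0.5, 0.4, 0.2, 0.2],   # rows {0,1}: column sums 0.7, 0.7, 0.3, 0.3
    [0.1, 0.2, 0.3, 0.1],
    [0.2, 0.1, 0.4, 0.6],   # rows {2,3}: column sums 0.3, 0.3, 0.7, 0.7
]
blocks = [(0, 1), (2, 3)]

# Two different microstates with the same macrostate y = (4.0, 4.0):
x1 = [1.0, 3.0, 2.0, 2.0]
x2 = [4.0, 0.0, 0.5, 3.5]
y1 = macro(step(A, x1), blocks)
y2 = macro(step(A, x2), blocks)
print(y1, y2)  # the two macrostates agree
```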

Original file in pdf and in Compressed Postscript

Updated file in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-40, August 2006

Wavesurfer: A Tool for Sound Analysis

Ernesto Medina and Thamar Solorio

Researchers in the Interactive Systems Group at UTEP have been using a research tool called Didi for some time now. It was originally designed to be easily adaptable, and it has indeed been adapted by different researchers to suit particular needs. As a result, multiple versions of the program exist. In addition, the tool only works on Linux and has grown quite a bit. To solve these problems, the different versions could have been consolidated into one program and modified to produce a version that works on other platforms, or another program could be customized instead. In order to make an informed decision about which alternative was more feasible, we investigated a potential replacement candidate called Wavesurfer. This paper briefly describes Didi, Wavesurfer, the initial modifications to Wavesurfer, and possible future modifications to Wavesurfer.

Technical Report UTEP-CS-06-39, August 2006

A Model-Based Workflow Approach for Scientific Applications

Leonardo Salayandia, Paulo Pinheiro da Silva, Ann Q. Gates, and Alvaro Rebellon

Productive design of scientific workflows often depends on the effectiveness of the communication between discipline domain experts and computer scientists, including their ability to share their specific needs in the design of the workflow. Discipline domain experts and computer scientists, however, tend to have distinct needs for designing workflows, including terminology, level of abstraction, and the workflow aspects that should be included in the design. This paper discusses the use of a Model-Based Workflow (MBW) approach as an abstract way to specify workflows that reconciles the needs of domain and computer scientists. Within the context of GEON, an NSF cyberinfrastructure for the Earth Sciences, the paper discusses the benefits of using a Gravity Map MBW generated from an ontology about gravity. The Gravity Map MBW is based on terms derived from the gravity ontology that was developed by geophysicists; it does not include some of the workflow properties that tend to make workflow specifications look too complex for discipline domain experts to understand; and it provides a framework for developing strategies to derive executable Gravity Map workflow encodings with only limited interaction from computer scientists.

Technical Report UTEP-CS-06-38, August 2006

Workflow-Driven Ontologies: An Earth Sciences Case Study

Leonardo Salayandia, Paulo Pinheiro da Silva, Ann Q. Gates, and Flor Salcedo

A goal of the Geosciences Network (GEON) is to develop cyber-infrastructure that will allow earth scientists to discover, access, integrate, and disseminate knowledge in distributed environments such as the Web, changing the way in which research is conducted. The earth sciences community has begun the complex task of creating ontologies to support this effort. A challenge is to coalesce the needs of the earth scientists, who wish to capture knowledge in a particular discipline through the ontology, with the need to leverage the knowledge to support technology that will facilitate computation, for example, by helping the composition of services. This paper describes an approach for defining workflow-driven ontologies that capture classes and relationships from domain experts and use that knowledge to support the composition of services. To demonstrate the capability afforded by this type of ontology, the paper presents examples of workflow specifications generated from a workflow-driven ontology that has been defined for representing knowledge about gravity data.

Technical Report UTEP-CS-06-37, August 2006

Finding Least Expensive Tolerance Solutions and Least Expensive Tolerance Revisions: Algorithms and Computational Complexity

Inna Pivkina and Vladik Kreinovich

For an engineering design, tolerances in design parameters are selected so that within these tolerances, we guarantee the desired functionality. Feasible algorithms are known for solving the corresponding computational problems: the problem of finding tolerances that guarantee the given functionality, and the problem of checking whether given tolerances guarantee this functionality.

In this paper, we show that in many practical problems, the problem of choosing the optimal tolerances can also be solved by a feasible algorithm. We prove that a slightly different problem of finding the optimal tolerance revision is, in contrast, computationally difficult (namely, NP-hard). We also show that revision programming algorithms can be used to check whether a given combination of tolerance changes is optimal under given constraints -- and even to find a combination that is optimal under given constraints.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-36, August 2006

Canica: an IDE for the Java Modeling Language

Angelica B. Perez, Yoonsik Cheon, and Ann Q. Gates

Canica is an integrated development environment for the Java Modeling Language (JML), a formal behavioral interface specification language for Java. The JML distribution includes several support tools, such as a syntax checker, a compiler, and a document generator, and there are several third-party tools available for JML. However, most of these tools are command-line-based and work in isolation. Canica glues and streamlines these tools to provide a GUI-based, integrated environment for JML; for example, it automates unit testing completely from test data generation to test execution and test result determination. In this paper, we describe the key features of Canica and explain its design and implementation. We also discuss the lessons that we learned from our development effort.

Technical Report UTEP-CS-06-35, July 2006

Updated version UTEP-06-35a, September 2007

Fuzzy Prediction Models in Measurement

Leon Reznik and Vladik Kreinovich

Published in *IEEE Transactions on Fuzzy Systems*, 2008, Vol. 16, No. 4,
pp. 851-862.

The paper investigates the feasibility of applying fuzzy models in measurement procedures. It considers the problem of fusing measurement information from different sources, when one of the sources provides predictions about the approximate values of the measured variables or their combinations. Typically, this information is given by an expert, but it may also be mined from available data. This information is formalized as fuzzy prediction models and is used in combination with the measurement results to improve measurement accuracy. The properties of the modified estimates are studied in comparison with the conventional ones. We derive the conditions under which the application of fuzzy models can achieve a significant accuracy gain, evaluate the value of this gain, and give recommendations on producing and formalizing fuzzy prediction models in practical applications.

Original file in pdf

updated version in pdf

Technical Report UTEP-CS-06-34, July 2006

Updated version UTEP-CS-06-34c, January 2007

Towards Adding Probabilities and Correlations to Interval Computations

Daniel Berleant, Martine Ceberio, Gang Xiang, and Vladik Kreinovich

Published in *International Journal of Approximate Reasoning*,
2007, Vol. 46, No. 3, pp. 499-510.

This paper continues our previous work on using probability information in interval computations. While our previous work took into account bounds on the first-order moments, the contribution of this article is to deal with correlations as well. Specifically, we develop a new method that takes into account both correlations among the measured parameters and bounds on their expected values when performing interval computations.

Original file in pdf and
in Compressed Postscript

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-06-33, July 2006

Fast Computation of Exact Ranges of Symmetric Convex and Concave Functions under Interval Uncertainty

Gang Xiang

Many statistical characteristics y=f(x1,...,xn) are continuous, symmetric, and either concave or convex; examples include the population variance V=(1/n)*(x1^2+...+xn^2)-E^2 (where E=(1/n)*(x1+...+xn)), Shannon's entropy S=-p1*log(p1)-...-pn*log(pn), and many other characteristics. In practice, we often only know the intervals Xi=[xi-,xi+] that contain the (unknown) actual inputs xi. Since different values xi from Xi lead, in general, to different values of f(x1,...,xn), we need to find the range Y={f(x1,...,xn): x1 in X1, ..., xn in Xn}, i.e., the maximum and the minimum of f(x1,...,xn) over the box X1 x ... x Xn. It is known that for convex functions, there exists a feasible (polynomial-time) algorithm for computing the minimum, but computing the maximum is, in general, NP-hard. It is therefore desirable to find feasible algorithms that compute the maximum in practically reasonable situations. For the variance and the (negative) entropy, such algorithms are known for the case when the inputs satisfy the following *subset property*: [xi-,xi+] is not contained in (xj-,xj+) for all i and j. In this paper, we show that these algorithms can be extended to the case of general symmetric convex characteristics.
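
A direct check of the subset property follows straight from its definition (the function name is ours): no input interval [xi-,xi+] may lie strictly inside another interval's interior.

```python
# Check whether a list of intervals satisfies the subset property: no
# interval [a_lo, a_hi] is contained in the open interior (b_lo, b_hi)
# of another interval.

def subset_property_holds(intervals):
    for a_lo, a_hi in intervals:
        for b_lo, b_hi in intervals:
            if b_lo < a_lo and a_hi < b_hi:
                return False
    return True

print(subset_property_holds([(0, 2), (1, 3), (2, 5)]))   # True
print(subset_property_holds([(0, 10), (3, 4)]))          # False
```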

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-32, July 2006

Detecting Filled Pauses in Tutorial Dialogs

Gaurav Garg and Nigel Ward

As dialog systems become more capable, users tend to talk more spontaneously and less formally. Spontaneous speech includes features which convey information about the user's state. In particular, filled pauses, such as `um' and `uh', can indicate that the user is having trouble, wants more time, wants to hold the floor, or is uncertain. In this paper we present a first study of the acoustic characteristics of filled pauses in tutorial dialogs. We show that in this domain, as in other domains, filled pauses typically have flat pitch and fairly constant energy. We present a simple algorithm based on these features which detects filled pauses with 80% coverage and 67% accuracy. Analysis of the prediction failures shows that some are due to filled pauses of unusual types and related phenomena: filled pauses marking a change of state, cases where uncertainty is marked by lengthening a vowel in a word, and filled pauses which segue directly into a word.

Technical Report UTEP-CS-06-31, July 2006

For Complex Intervals, Exact Range Computation is NP-Hard Even For Single Use Expressions (Even for the Product)

Martine Ceberio, Vladik Kreinovich, and Guenter Mayer

One of the main problems of interval computations is to compute the range Y of the given function f(x1,...,xn) under interval uncertainty. Interval computations started with the invention of straightforward interval computations, when we simply replace each elementary arithmetic operation in the code for f with the corresponding operation from interval arithmetic. In general, this technique only leads to an enclosure for the desired range, but in the important case of single use expressions (SUE), in which each variable occurs only once, we get the exact range. Thus, for SUE expressions, there exists a feasible (polynomial-time) algorithm for computing the exact range.
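
A minimal sketch of straightforward interval computations over real intervals (helper names are ours), showing the SUE exactness described above and the overestimation that appears once a variable repeats:

```python
# Interval multiplication for real intervals: the range of a*b over the box
# is attained at endpoint combinations.

def i_mul(a, b):
    products = [a[0] * b[0], a[0] * b[1], a[1] * b[0], a[1] * b[1]]
    return (min(products), max(products))

x, y = (-1.0, 2.0), (3.0, 4.0)

# SUE: each variable occurs once, so the computed interval is the exact range.
print(i_mul(x, y))   # (-4.0, 8.0)

# Not SUE: x occurs twice, so we only get an enclosure (-2, 4) of the
# exact range [0, 4] of x*x.
print(i_mul(x, x))
```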

We show that in the complex-valued case, computing the exact range is NP-hard even for SUE expressions. Moreover, it is NP-hard even for such simple expressions as the product f(z1,...,zn)=z1*...*zn.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-30, July 2006

The Effectiveness of Threshold-Based Scheduling Policies on BOINC Projects

Trilce Estrada, David A. Flores, Michela Taufer, Patricia J. Teller, Andre Kerstens, and David P. Anderson

Several scientific projects use BOINC (Berkeley Open Infrastructure for Network Computing) to perform large-scale simulations using volunteers’ computers (workers) across the Internet. In general, the scheduling of tasks in BOINC uses a First-Come-First-Serve policy and no attention is paid to workers’ past performance, such as whether they have tended to perform tasks promptly and correctly. In this paper we use SimBA, a discrete-event simulator of BOINC applications, to study new threshold-based scheduling strategies for BOINC projects that use availability and reliability metrics to classify workers and distribute tasks according to this classification. We show that if availability and reliability thresholds are selected properly, then the workers’ throughput of valid results increases significantly in BOINC projects.

Technical Report UTEP-CS-06-29, July 2006

Aggregability is NP-Hard

Vladik Kreinovich and Max Shpak

Published in *ACM SIGACT News*, 2006, Vol. 37, No. 3, pp. 97-104.

Many dynamical systems are *aggregable* in the sense that we can
divide their variables x1,...,xn into several (k)
non-intersecting groups and find combinations y1,...,yk of
variables
from these groups (*macrovariables*)
whose dynamics depend only on the initial values of
the macrovariables. For very large systems, finding such an
aggregation is often the only way to perform a meaningful analysis
of such systems. Since aggregation is important, researchers have
been trying to find a general efficient algorithm for detecting
aggregability. In this paper, we show that in general, detecting
aggregability is NP-hard even for linear systems, and thus (unless
P=NP), we can only hope to find efficient detection algorithms for
specific classes of systems.

We also show that in the linear case, once the groups are known, it is possible to efficiently find appropriate combinations ya.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-28, June 2006

Updated version UTEP-CS-06-28b, November 2006

Second updated version UTEP-CS-06-28c, October 2007

Final version UTEP-CS-06-28d, November 2007

Computing Population Variance and Entropy under Interval Uncertainty: Linear-Time Algorithms

Gang Xiang, Martine Ceberio, and Vladik Kreinovich

Published in *Reliable Computing*, 2007, Vol. 13,
No. 6, pp. 467-488.

In statistical analysis of measurement results, it is often necessary to compute the range [V-,V+] of the population variance V=((x1-E)^2+...+(xn-E)^2)/n (where E=(x1+...+xn)/n) when we only know the intervals [Xi-Di,Xi+Di] of possible values of the xi. While V- can be computed efficiently, the problem of computing V+ is, in general, NP-hard. In our previous paper "Population Variance under Interval Uncertainty: A New Algorithm" (Reliable Computing, 2006, Vol. 12, No. 4, pp. 273-280), we showed that in a practically important case, we can use constraints techniques to compute V+ in time O(n*log(n)). In this paper, we provide new algorithms that compute V- and, for the above case, V+ in linear time O(n).
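
As a contrast to the linear-time algorithms, here is a brute-force baseline (our own illustration, not the paper's method): since V is convex in each xi, its maximum over the box is attained at interval endpoints, so for small n one can compute V+ exactly by checking all 2^n corner points, which is exactly why efficient algorithms matter for large n.

```python
import itertools

def variance(xs):
    # Population variance V = ((x1-E)^2 + ... + (xn-E)^2)/n.
    n = len(xs)
    e = sum(xs) / n
    return sum((x - e) ** 2 for x in xs) / n

def v_plus_bruteforce(intervals):
    # V is convex in each xi, so max V over the box is attained at a corner;
    # enumerate all 2^n corners (feasible only for small n).
    return max(variance(corner) for corner in itertools.product(*intervals))

v_plus = v_plus_bruteforce([(0.0, 1.0), (0.0, 1.0), (0.0, 1.0)])
print(v_plus)   # 2/9, attained e.g. at (0, 0, 1)
```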

Similar linear-time algorithms are described for computing the range of the entropy S=-p1*log(p1)-...-pn*log(pn) when we only know the intervals [pi-,pi+] of possible values of the probabilities pi.

In general, a statistical characteristic f can be more complex so that even computing f can take much longer than linear time. For such f, the question is how to compute the range [y-,y+] in as few calls to f as possible. We show that for convex symmetric functions f, we can compute y+ in n calls to f.

Original file in pdf and
in Compressed Postscript;

updated file in pdf and
in Compressed Postscript;

second updated file in pdf and
in Compressed Postscript;

final file in pdf and
in Compressed Postscript

Technical Report UTEP-CS-06-27, June 2006

Statistical Data Processing under Interval Uncertainty: Algorithms and Computational Complexity

Vladik Kreinovich

Published in Jonathan
Lawry, Enrique Miranda, Alberto Bugarin, Shoumei Li, Maria Angeles
Gil, Przemyslaw Grzegorzewski, and Olgierd Hryniewicz (eds.),
*Soft Methods for Integrated Uncertainty Modeling*, Springer-Verlag,
2006, pp. 11-26.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-26, June 2006

Automatic Labeling of Back Channels

Udit Sajjanhar and Nigel Ward

In dialog, the proper production of back-channels is an important way for listeners to cooperate with speakers. Developing quantitative models of this process is important both for improving spoken dialog systems and for teaching second language learners. An essential step for the development of such models is labeling all back-channels in corpora of human-human dialogs. Currently this is done by hand. This report describes a method for automatically identifying back-channels in conversation corpora, using only the patterns of speech and silence by the speaker and the listener in the local context. Tested on Arabic, Spanish, and English, this method identifies most of the actual back-channels, but it also mistakenly identifies many utterances which are not back-channels: across 293 minutes of data in these three languages, the coverage is 70.2% and the accuracy is 40.8%. Thus this method is probably useful not as a fully automatic method, but only as a way to reduce the amount of human labor required. The method is parameterized, and it is possible to obtain slightly better performance by replacing the generic, language-independent parameters with language-specific parameters.

Technical Report UTEP-CS-06-25, June 2006

Measuring Privacy Loss in Statistical Databases

Vinod Chirayath, Luc Longpre, and Vladik Kreinovich

Published in H. Leung and G. Pighizzini (Eds.), *Proceedings of
the Workshop on Descriptional Complexity of Formal Systems DCFS
2006*, Las Cruces, New Mexico, June 21-23, 2006, pp. 16-25

Protection of privacy in databases has become of increasing importance. While a number of techniques have been proposed to query databases while preserving privacy of individual records in the database, very little is done to define a measure on how much privacy is lost after statistical releases. We suggest a definition based on information theory. Intuitively, the privacy loss is proportional to how much the descriptional complexity of a record decreases relative to the statistical release. There are some problems with this basic definition and we suggest ways to address these problems.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-24a, October 2006

Two Etudes on Combining Probabilistic and Interval Uncertainty: Processing Correlations and Measuring Loss of Privacy

Martine Ceberio, Gang Xiang, Luc Longpre, Vladik Kreinovich, Hung T. Nguyen, and Daniel Berleant

Published in *Proceedings of the 7th International
Conference on Intelligent Technologies InTech'06*, Taipei, Taiwan,
December 13-15, 2006, pp. 8-17.

In many practical situations, there is a need to combine interval and probabilistic uncertainty. The need for such a combination leads to two types of problems: (1) how to process the given combined uncertainty, and (2) how to gauge the amount of uncertainty and -- a related question -- how to best decrease this uncertainty. In our research, we concentrate on these two types of problems. In this paper, we present two examples that illustrate how the corresponding problems can be solved.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-24, June 2006

How to Measure Loss of Privacy

Luc Longpre and Vladik Kreinovich

To compare different schemes for preserving privacy, it is important to be able to gauge loss of privacy. Since loss of privacy means that we gain new information about a person, it seems natural to measure the loss of privacy by the amount of information that we gained. However, this seemingly natural definition is not perfect: when we originally know that a person's salary is between $10,000 and $20,000 and later learn that the salary is between $10,000 and $15,000, we gained exactly as much information (one bit) as when we learn that the salary is an even number -- however, intuitively, in the first case, we have a substantial privacy loss while in the second case, the privacy loss is minimal. In this paper, we propose a new definition of privacy loss that is in better agreement with our intuition. This new definition is based on estimating worst-case financial losses caused by the loss of privacy.
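The contrast between the two measures can be made concrete with a toy computation. The width-based measure below is only a simplified proxy for the paper's worst-case financial-loss definition, and all function names are ours:

```python
import math

def info_gain_bits(size_before, size_after):
    """Information gained, in bits, when the set of possible values shrinks."""
    return math.log2(size_before / size_after)

def worst_case_loss(width_before, width_after):
    """A width-based proxy for privacy loss: how much the adversary's
    worst-case uncertainty about the value has shrunk."""
    return width_before - width_after

# Case 1: salary narrowed from [10000, 20000] to [10000, 15000]
bits1 = info_gain_bits(10001, 5001)       # about 1 bit gained
loss1 = worst_case_loss(10000, 5000)      # the spread halves

# Case 2: learning that the salary is even also halves the candidate
# set (about 1 bit), but the spread of possible salaries is unchanged
bits2 = info_gain_bits(10001, 5001)
loss2 = worst_case_loss(10000, 10000)
```

Both disclosures yield about one bit of information, but only the width-based measure registers the intuitively larger privacy loss of the first case.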

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-23, May 2006

Bilinear Models from System Approach Justified for Classification, with Potential Applications to Bioinformatics

Richard Alo, Francois Modave, Vladik Kreinovich, David Herrera, and Xiaojing Wang

Published in the Proceedings of the *Second International
Conference on Fuzzy Sets and Soft Computing in Economics and Finance
FSSCEF'2006*, St. Petersburg, Russia, June 28 - July 1, 2006, pp.
55-62.

When we do not know the dynamics of a complex system, it is natural to use common sense to get a reasonable first approximation -- which turns out to be a bilinear dynamics. Surprisingly, for classification problems, a similar bilinear approximation turns out to be unexpectedly accurate. In this paper, we provide an explanation for this accuracy.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-22, May 2006

Interval and Fuzzy Techniques in Business-Related Computer Security: Intrusion Detection, Privacy Protection

Mohsen Beheshti, Jianchao Han, Luc Longpre, Scott A. Starks, J. Ivan Vargas, and Gang Xiang

Published in the Proceedings of the *Second International
Conference on Fuzzy Sets and Soft Computing in Economics and Finance
FSSCEF'2006*, St. Petersburg, Russia, June 28 - July 1, 2006, pp.
23--30.

E-commerce plays an increasingly large role in business. As a result, business-related computer security becomes more and more important. In this talk, we describe how interval and fuzzy techniques can help in solving related computer security problems.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-21, May 2006

Economics of Engineering Design under Interval (and Fuzzy) Uncertainty: Case Study of Building Design

Carlos Ferregut, Jan Beck, Araceli Sanchez, and Vladik Kreinovich

Published in the Proceedings of the *Second International
Conference on Fuzzy Sets and Soft Computing in Economics and Finance
FSSCEF'2006*, St. Petersburg, Russia, June 28 - July 1, 2006, pp.
39-46.

One of the main objectives of engineering design is to find a design that is the cheapest among all designs that satisfy given constraints. Most of the constraints must be satisfied under all possible values within certain ranges. Checking all possible combinations of values is often very time-consuming. In this paper, we propose a faster algorithm for checking such constraints.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-20, May 2006

Growth Rates under Interval Uncertainty

Janos Hajagos and Vladik Kreinovich

Published in the Proceedings of the

For many real-life systems ranging from financial to population-related to medical, dynamics is described by a system of linear equations. For such systems, the growth rate lambda can be determined as the largest eigenvalue of the corresponding matrix A. In many practical situations, we only know the components of the matrix A with interval (or fuzzy) uncertainty. In such situations, it is desirable to find the range of possible values of lambda. In this paper, we propose an efficient algorithm for computing lambda for a practically important case when all the components of the matrix A are non-negative.
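The nonnegative case is tractable because, by Perron-Frobenius theory, the largest eigenvalue of a nonnegative matrix is monotone in its entries, so the range endpoints are attained at the two endpoint matrices. A sketch with a hypothetical 2x2 example (the paper's algorithm may differ in details):

```python
def perron_root(a, iters=200):
    """Largest eigenvalue of a positive matrix via power iteration
    with max-norm normalization (converges to the Perron root)."""
    n = len(a)
    v = [1.0] * n
    lam = 1.0
    for _ in range(iters):
        w = [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = max(w)
        v = [x / lam for x in w]
    return lam

def growth_rate_range(a_lower, a_upper):
    """For an entrywise-nonnegative interval matrix [a_lower, a_upper],
    the Perron root is monotone in each entry, so the range of the
    growth rate lambda is attained at the two endpoint matrices."""
    return perron_root(a_lower), perron_root(a_upper)

# Hypothetical 2x2 interval matrix (all entries nonnegative)
A_lo = [[0.5, 0.2], [0.3, 0.4]]
A_hi = [[0.7, 0.4], [0.5, 0.6]]
lam_lo, lam_hi = growth_rate_range(A_lo, A_hi)
```

For these matrices the exact eigenvalues are 0.7 and 1.1, so the growth rate lies in [0.7, 1.1].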

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-19, May 2006

Updated version UTEP-CS-06-19b, April 2007

Towards Combining Probabilistic, Interval, Fuzzy Uncertainty, and Constraints: On the Example of Inverse Problem in Geophysics

G. Randy Keller, Scott A. Starks, Aaron Velasco, Matthew Averill, Roberto Araiza, Gang Xiang, and Vladik Kreinovich

Original paper published in the *IEEE Proceedings of the 12th
GAMM-IMACS International Symposium on Scientific Computing,
Computer Arithmetic and Validated Numerics*, Duisburg, Germany,
September 26-29, 2006.

In many real-life situations, we have several types of uncertainty: measurement uncertainty can lead to probabilistic and/or interval uncertainty, expert estimates come with interval and/or fuzzy uncertainty, etc. In many situations, in addition to measurement uncertainty, we also have prior knowledge coming from earlier data processing and prior knowledge in the form of interval constraints. In this paper, on the example of the seismic inverse problem, we show how to combine these different types of uncertainty.

Original file in pdf and
in Compressed Postscript

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-06-18, April 2006

Topaz: A Firefox Protocol Extension for GridFTP Based on Data Flow Diagrams

Richard Zamudio, Daniel Catarino, Michela Taufer, Brent Stearn, and Karan Bhatia

As grid infrastructures mature, an increasing challenge is to provide end-user scientists with intuitive interfaces to computational services, data management capabilities, and visualization tools. The current approach used in a number of cyber-infrastructure projects is to leverage the capabilities of the Mozilla framework to provide rich end-user tools that seamlessly integrate with remote resources such as web/grid services and data repositories.

In this paper we apply rigorous software engineering tools, Data Flow Diagrams or DFDs, to guide the design, implementation, and performance analysis of Topaz, a GridFTP protocol extension to the Firefox browser. GridFTP servers, similar to FTP servers used on the Internet, provide a data repository for files and are optimized for grid use (support for very large file sizes, high-performance data transfer, third-party transfer, integration with Grid Security Infrastructure). Topaz provides users with a familiar and user-friendly interface with which to access arbitrary GridFTP servers by providing upload and download functionalities, as well as by obtaining and managing certificates.

Since it is integrated into the Firefox browser, Topaz introduces an additional layer of abstraction in the communication between client and server. In this paper, we compare upload and download transmission times of our protocol extension with other traditionally used tools such as UberFTP, the globus-url-copy command, an FTP client, and the Secure Copy Protocol (scp) command. DFDs enable us to identify the causes of performance slow-downs in Topaz when transferring small files with respect to grid tools such as UberFTP and the globus-url-copy command. For large files, Topaz performs as well as the traditionally used tools while, in addition, providing scientists with a user-friendly interface.

Index Terms: Distributed computing, Data communication, Data access and management, Middleware and toolkits, Grid integration issues.

Technical Report UTEP-CS-06-17, April 2006

Updated version UTEP-CS-06-17a, October 2006

How to Efficiently Process Uncertainty Within a Cyberinfrastructure without Sacrificing Privacy and Confidentiality

Luc Longpre and Vladik Kreinovich

Published in: Nadia Nedjah, Ajith Abraham, and Luiza de Macedo
Mourelle (Eds.), *Computational Intelligence in Information
Assurance and Security*, Springer-Verlag, 2007, pp. 155-173.

In this paper, we propose a simple solution to the problem of estimating uncertainty of the results of applying a black-box algorithm -- without sacrificing privacy and confidentiality of the algorithm.

Original file in pdf and
in Compressed Postscript;

updated file in pdf and
in Compressed Postscript

Technical Report UTEP-CS-06-16, April 2006

Revised version UTEP-CS-06-16a, June 2006

Unimodality, Independence Lead to NP-Hardness of Interval Probability Problems

Daniel J. Berleant, Olga Kosheleva, Vladik Kreinovich, and Hung T. Nguyen

Published in *Reliable Computing*, 2007, Vol. 13,
No. 3, pp. 261-282.

In many real-life situations, we only have partial information about probabilities. This information is usually described by bounds on moments, on probabilities of certain events, etc. -- i.e., by characteristics c(p) which are linear in terms of the unknown probabilities pj. If we know interval bounds on some such characteristics ai <= ci(p) <= Ai, and we are interested in a characteristic c(p), then we can find the bounds on c(p) by solving a linear programming problem.

In some situations, we also have additional conditions on the probability distribution -- e.g., we may know that the two variables x1 and x2 are independent, or that the distribution of x1 and x2 is unimodal. We show that adding each of these conditions makes the corresponding interval probability problem NP-hard.

Original file in pdf and
in Compressed Postscript

Revised version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-06-15, April 2006

Optimized Sampling Frequencies for Weld Reliability Assessments of Long Pipeline Segments

Cesar J. Carrasco and Vladik Kreinovich

Published in the *Proceedings of the 25th International Conference of the
North American Fuzzy Information Processing Society NAFIPS'2006*,
Montreal, Quebec, Canada, June 3-8, 2006, pp. 605-610.

In this paper, we describe new faster algorithms that design an optimal testing strategy for long pipeline segments.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-14, April 2006

Fast Computation of Centroids for Constant-Width Interval-Valued Fuzzy Sets

Jerry M. Mendel, Hongwei Wu, Vladik Kreinovich, and Gang Xiang

Published in the *Proceedings of the 25th International Conference of the
North American Fuzzy Information Processing Society NAFIPS'2006*,
Montreal, Quebec, Canada, June 3-8, 2006, pp. 621-626.

Interval-valued fuzzy sets provide a more adequate description of uncertainty than traditional fuzzy sets; it is therefore important to use interval-valued fuzzy sets in applications. One of the main applications of fuzzy sets is fuzzy control, and one of the most computationally intensive parts of fuzzy control is defuzzification. Since a transition to interval-valued fuzzy sets usually increases the amount of computations, it is vitally important to design faster algorithms for the corresponding defuzzification. In this paper, we provide such an algorithm for a practically important case of constant-width interval-valued fuzzy sets.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-13, April 2006

A Formal Specification in JML of the Java Security Package

Poonam Agarwal, Carlos E. Rubio-Medrano, Yoonsik Cheon, and Patricia J. Teller

The Java security package allows a programmer to add security features to Java applications. Although the package provides a complex application programming interface (API), its informal description, e.g., Javadoc comments, is often ambiguous or imprecise. Nonetheless, the security of an application can be compromised if the package is used without a concrete understanding of the precise behavior of the API classes and interfaces, which can be attained via formal specification. In this paper, we present our experiences in formally specifying the Java security package in JML, a formal behavior interface specification language for Java. We illustrate portions of our JML specifications and discuss the lessons that we learned, from this specification effort, about specification patterns and the effectiveness of JML. In addition, we put forward our JML wish list. Our specifications are not only a precise document for the API but also provide a foundation for formally reasoning and verifying the security aspects of applications. We believe that our specification techniques and patterns can be used to specify other Java packages and frameworks.

File in PDF and in Compressed Postscript

Technical Report UTEP-CS-06-12, March 2006

Computing Variance under Interval Uncertainty: A New Algorithm and its Potential Application to Privacy in Statistical Databases

Richard Alo, Mohsen Beheshti, and Gang Xiang

Published in the *Proceedings
of the International Conference on Information Processing and
Management of Uncertainty in Knowledge-Based Systems IPMU'06*,
Paris, France, July 2-7, 2006, pp. 810-816.

Computation of population mean E=(x1+...+xn)/n and population variance V=(x1^2+...+xn^2)/n -E^2 is an important first step in statistical analysis. In many practical situations, we do not know the exact values of the sample quantities xi, we only know the intervals [Xi-Di, Xi+Di] that contain the actual (unknown) values of xi. Different values of xi from these intervals lead, in general, to different values of the population variance. It is therefore desirable to compute the range [V]=[V-,V+] of possible values of V.

This problem of computing population variance under interval uncertainty is, in general, NP-hard. It is known that in some reasonable cases, there exist feasible algorithms for computing the interval [V]: e.g., such algorithms are known for the case when for some constant c, any collection of more than c "narrowed" intervals [Xi - Di/n, Xi + Di/n] has no common intersection, and for the case when no narrowed interval is a subset of another.

In this paper, we provide a new polynomial time algorithm for computing population variance under interval uncertainty, an algorithm that is applicable to all situations in which feasible algorithms were previously known.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-11, March 2006

3-D Image Registration Using Fast Fourier Transform, with Potential Applications to Geoinformatics and Bioinformatics

Roberto Araiza, Matthew G. Averill, G. Randy Keller, Scott A. Starks, and Chandrajit Bajaj

Published in the *Proceedings
of the International Conference on Information Processing and
Management of Uncertainty in Knowledge-Based Systems IPMU'06*,
Paris, France, July 2-7, 2006, pp. 817-824.

FFT-based techniques are actively used to register 2-D images, i.e., to find the shift, rotation, and scaling necessary to align one image with the other. It is desirable to extend these techniques to the problem of registering 3-D images. Registration of 3-D images is an important problem in areas such as bioinformatics (e.g., in protein docking) and geoinformatics (e.g., in earth modeling).
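A standard FFT-based registration primitive is phase correlation. Here is a 1-D sketch using a naive O(n^2) DFT for clarity (function names are ours, and this is not the paper's 3-D algorithm; a real implementation would use an FFT):

```python
import cmath

def dft(x, invert=False):
    """Naive O(n^2) discrete Fourier transform (clarity over speed)."""
    n = len(x)
    sign = 1 if invert else -1
    out = [sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
               for t in range(n)) for k in range(n)]
    return [v / n for v in out] if invert else out

def find_cyclic_shift(f, g):
    """Phase correlation: if g is a cyclic shift of f, the normalized
    cross-power spectrum is a pure phase ramp whose inverse DFT peaks
    exactly at the shift amount."""
    F, G = dft(f), dft(g)
    cross = [gk * fk.conjugate() for gk, fk in zip(G, F)]
    r = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    corr = dft(r, invert=True)
    return max(range(len(corr)), key=lambda i: corr[i].real)

signal = [0.0, 1.0, 3.0, 7.0, 2.0, 5.0, 4.0, 6.0]
shifted = signal[-3:] + signal[:-3]   # cyclically shifted by 3 samples
shift = find_cyclic_shift(signal, shifted)
```

The same idea extends dimension by dimension to 2-D and 3-D grids, which is what makes FFT-based registration attractive for volumetric data.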

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-10, March 2006

Updated version UTEP-CS-06-10a, April 2006

Estimating Information Amount under Interval Uncertainty: Algorithmic Solvability and Computational Complexity

Gang Xiang, Olga Kosheleva, and George J. Klir

Short version published in the *Proceedings
of the International Conference on Information Processing and
Management of Uncertainty in Knowledge-Based Systems IPMU'06*,
Paris, France, July 2-7, 2006, pp. 840-847.

In most real-life situations, we have uncertainty: we do not know the exact state of the world; instead, there are several (n) different states which are consistent with our knowledge. In such situations, it is desirable to gauge how much information we need to gain to determine the actual state of the world. A natural measure of this amount of information is the average number of "yes"-"no" questions that we need to ask to find the exact state. When we know the probabilities p1,...,pn of different states, then, as Shannon has shown, this number of questions can be determined as S=-p1 log(p1)-...-pn log(pn).

In many real-life situations, we only have partial information about the probabilities; for example, we may only know intervals [pi]=[p-i,p+i] of possible values of pi. For different values pi from these intervals, we get different values of S. So, to gauge the corresponding uncertainty, we must find the range [S]=[S-,S+] of possible values of S. In this paper, we show that the problem of computing [S] is, in general, NP-hard, and we provide algorithms that efficiently compute [S] in many practically important situations.
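For n=2 states (where p2 = 1 - p1), the range of S can be found by a direct scan; this toy sketch (names ours) only illustrates the problem statement, since for general n the range computation is NP-hard:

```python
import math

def entropy(ps):
    """Shannon entropy in bits: S = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in ps if p > 0)

def entropy_range_two(p1_lo, p1_hi, steps=10001):
    """Range [S-, S+] of S over p1 in [p1_lo, p1_hi] with p2 = 1 - p1,
    by scanning a fine grid (feasible only because n = 2)."""
    best_lo, best_hi = float("inf"), 0.0
    for k in range(steps):
        p1 = p1_lo + (p1_hi - p1_lo) * k / (steps - 1)
        s = entropy([p1, 1 - p1])
        best_lo, best_hi = min(best_lo, s), max(best_hi, s)
    return best_lo, best_hi

# p1 known only to lie in [0.1, 0.4]
s_lo, s_hi = entropy_range_two(0.1, 0.4)
```

Since S is increasing in p1 on [0, 0.5], the endpoints here are attained at p1 = 0.1 (about 0.469 bits) and p1 = 0.4 (about 0.971 bits).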

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-06-09, March 2006

Revised version UTEP-CS-06-09c, March 2009

Fast Convolution and Fast Fourier Transform under Interval Uncertainty

Guoqing Liu and Vladik Kreinovich

Published in *Journal of Computer and System Sciences*, 2010, Vol. 76, No. 1, pp. 63-76.

Convolution y(t), defined as an integral of a(t-s)*x(s) ds, is one of the main techniques in digital signal processing. A straightforward computation of the convolution y(t) requires O(n^2) steps, where n is the number of observations x(t0),...,x(t(n-1)). It is well known that by using the Fast Fourier Transform (FFT) algorithm, we can compute convolution much faster, with computation time O(n*log(n)).

In practice, we only know the signal x(t) and the function a(t) with uncertainty; sometimes, we know them with interval uncertainty, i.e., we know intervals [x-(t),x+(t)] and [a-(t),a+(t)] that contain the actual (unknown) functions x(t) and a(t). In such situations, it is desirable, for every t, to compute the range [y-(t),y+(t)] of possible values of y(t). Of course, it is possible to use straightforward interval computations to compute this range, i.e., replace every computational step in FFT by the corresponding operations of interval arithmetic. However, the resulting enclosure is too wide.

In this paper, we show how to provide asymptotically accurate ranges for y(t) in time O(n*log(n)).
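The O(n*log(n)) speedup itself can be sketched with a textbook radix-2 FFT and the convolution theorem. This handles only plain real-valued convolution; the paper's contribution, accurate enclosures under interval uncertainty, is not reproduced here:

```python
import cmath

def fft(a, invert=False):
    """Recursive radix-2 FFT; len(a) must be a power of two.
    The inverse transform here omits the final 1/n scaling."""
    n = len(a)
    if n == 1:
        return list(a)
    even, odd = fft(a[0::2], invert), fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k], out[k + n // 2] = even[k] + w, even[k] - w
    return out

def convolve_fft(a, x):
    """O(n log n) convolution: zero-pad, transform, multiply pointwise,
    invert, and apply the deferred 1/size scaling."""
    m = len(a) + len(x) - 1
    size = 1
    while size < m:
        size *= 2
    fa = fft([complex(v) for v in a] + [0j] * (size - len(a)))
    fx = fft([complex(v) for v in x] + [0j] * (size - len(x)))
    y = fft([p * q for p, q in zip(fa, fx)], invert=True)
    return [v.real / size for v in y[:m]]

def convolve_direct(a, x):
    """Straightforward O(n^2) convolution, for comparison."""
    m = len(a) + len(x) - 1
    return [sum(a[j] * x[k - j] for j in range(len(a)) if 0 <= k - j < len(x))
            for k in range(m)]

a, x = [1.0, 2.0, 3.0], [4.0, 5.0, 6.0, 7.0]
y_fast, y_slow = convolve_fft(a, x), convolve_direct(a, x)
```

The interval version is harder precisely because replacing each step above with interval arithmetic (complex rotations included) inflates the enclosure far beyond the true range.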

Original file in pdf and
in Compressed Postscript

Updated file in pdf and
in Compressed Postscript

Technical Report UTEP-CS-06-08, March 2006

Updated version UTEP-CS-06-08b, May 2006

Helping Students to Become Researchers: What We Can Gain from Russian Experience

Vladik Kreinovich, Ann Gates, and Olga Kosheleva

Shorter version published in the *Proceedings of the 2006
International Sun Conference on Teaching and Learning*, El Paso,
Texas, March 3-4, 2006,
http://sunconference.utep.edu/SunHome/2006/proceedings2006.html;
final version published in the *Proceedings of the 36th ASEE/IEEE
Frontiers in Education Conference FIE'2006*, San Diego, California,
October 28-31, 2006, pp. M3G-26 - M3G-31.

The fact that many internationally renowned scientists have been educated in the former Soviet Union shows that many features of its education system were good. In this session, we briefly describe the features that we believe to have been good. Some of these features have already been successfully implemented (with appropriate adjustments) in affinity research groups at the Department of Computer Science of the University of Texas at El Paso (UTEP).

Original file in pdf and
in Compressed Postscript

Final version in pdf

Technical Report UTEP-CS-06-07, March 2006

Towards Secure Cyberinfrastructure for Sharing Border Information

Ann Gates, Vladik Kreinovich, Luc Longpre, Paulo Pinheiro da Silva, and G. Randy Keller

Published in *Proceedings of the Lineae Terrarum: International Border
Conference*, El Paso, Las Cruces, and Cd. Juarez, March 27-30,
2006.

In many border-related issues ranging from economic collaboration to border security, it is extremely important that bordering countries share information. One reason why such sharing is difficult is that different countries use different information formats and data structures. It is therefore desirable to design infrastructure to facilitate this information sharing.

UTEP is a lead institution in a similar NSF-sponsored multi-million-dollar geoinformatics project, whose goal is to combine diverse and complex geophysical and geographical data stored in different formats and data structures. We describe our experience in using and developing related web service techniques, and we explain how this experience can be applied to border collaboration.

Since many security-related data are sensitive, we describe how to make sure that the designed cyberinfrastructure provides secure sharing.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-06, January 2006

Updated version UTEP-CS-06-06a, November 2006

How to Take into Account Dependence Between the Inputs: From Interval Computations to Constraint-Related Set Computations, with Potential Applications to Nuclear Safety, Bio- and Geosciences

Martine Ceberio, Scott Ferson, Vladik Kreinovich, Sanjeev Chopra, Gang Xiang, Adrian Murguia, and Jorge Santillan

The first version published in *Proceedings of the Second International
Workshop on Reliable Engineering Computing*,
Savannah, Georgia, February 22-24, 2006, pp. 127-154; a revised
version published in *Journal of Uncertain Systems*, 2007,
Vol. 1, No. 1, pp. 11-34.

In many real-life situations, in addition to knowing the intervals Xi of possible values of each variable xi, we also know additional restrictions on the possible combinations of xi; in this case, the set X of possible values of x=(x1,..,xn) is a proper subset of the original box X1 x ... x Xn. In this paper, we show how to take into account this dependence between the inputs when computing the range of a function f(x1,...,xn).
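The gain from tracking dependence can be seen on a toy example: for f(x1,x2) = x1 - x2 on the box [0,1] x [0,1], the interval (box-based) range is [-1,1], while adding the constraint x1 = x2 shrinks it to {0}. A grid-search sketch (ours, not the paper's set-computation algorithm):

```python
def range_over(f, points):
    """Min and max of f over a finite set of sample points."""
    vals = [f(*p) for p in points]
    return min(vals), max(vals)

f = lambda x1, x2: x1 - x2

n = 101
grid = [i / (n - 1) for i in range(n)]
# Full box [0,1] x [0,1]: all pairings allowed
box_points = [(u, v) for u in grid for v in grid]
# With the extra constraint x1 = x2, only the diagonal is feasible
diag_points = [(u, u) for u in grid]

box_lo, box_hi = range_over(f, box_points)    # the interval answer
set_lo, set_hi = range_over(f, diag_points)   # exact once dependence is used
```

Keeping the set of feasible pairs (rather than two separate intervals) is exactly what lets set computations return the tight range.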

updated version in pdf and
in Compressed Postscript

Technical Report UTEP-CS-06-05a, January 2006

Static Timing Analysis Based on Partial Probabilistic Description of Delay Uncertainty

Wei-Shen Wang, Vladik Kreinovich, and Michael Orshansky

Published in the *Proceedings of the 43rd Design Automation
Conference*, San Francisco, California, July 24-28, 2006,
pp. 161-166.

Technical Report UTEP-CS-06-05, January 2006

Interval-Based Robust Statistical Techniques for Non-Negative Convex Functions with Application to Timing Analysis of Computer Chips

Michael Orshansky, Wei-Shen Wang, Gang Xiang, and Vladik Kreinovich

Published in *Proceedings of the Second International
Workshop on Reliable Engineering Computing*,
Savannah, Georgia, February 22-24, 2006, pp. 197-212.

In chip design, one of the main objectives is to decrease its clock cycle; however, the existing approaches to timing analysis under uncertainty are based on fundamentally restrictive assumptions. Statistical timing analysis techniques assume that the full probabilistic distribution of timing uncertainty is available; in reality, the complete probabilistic distribution information is often unavailable. Additionally, the existing alternative of treating uncertainty as interval-based, or affine, is limited since it cannot handle probabilistic information in principle.

In this paper, a fundamentally new paradigm for timing uncertainty description is proposed as a way to consistently and rigorously handle partially available descriptions of timing uncertainty. The paradigm is based on a formal theory of interval probabilistic models that permit handling parameter uncertainty that is described in a distribution-free mode - just via the range, the mean, and the variance. This strategy permits effectively handling multiple real-life challenges, including imprecise and limited information about the distributions of process parameters, parameters coming from different populations, and sources of uncertainty that are too difficult to handle via full probabilistic measures (e.g., on-chip supply voltage variation). Specifically, analytical techniques for bounding the distributions of probabilistic interval variables are proposed. Also, a provably correct strategy for fast Monte Carlo simulation based on probabilistic interval variables is introduced. A path-based timing algorithm implementing the novel modeling paradigm, as well as handling the traditional variability descriptions, has been developed.

The results indicate that the proposed technique can improve the upper bound of the 95th-percentile circuit delay, on average, by 4.8% across the ISCAS'85 benchmark circuits, compared to the worst-case timing analysis that uses only the interval information of the partially specified parameters.
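The distribution-free flavor of such bounds can be illustrated with the classical Cantelli (one-sided Chebyshev) inequality, which bounds a tail probability using only the mean and the variance. This is a generic sketch with hypothetical numbers, not the paper's specific analytical technique:

```python
import math

def cantelli_tail(var, t):
    """Cantelli's (one-sided Chebyshev) inequality: for *any* distribution
    with variance var, P(X - E[X] >= t) <= var / (var + t^2) for t > 0."""
    return var / (var + t * t)

def delay_bound(mean, var, alpha):
    """Smallest d guaranteed by Cantelli's inequality to satisfy
    P(delay >= d) <= alpha: d = mean + sigma * sqrt((1 - alpha) / alpha)."""
    return mean + math.sqrt(var) * math.sqrt((1 - alpha) / alpha)

# Hypothetical path delay known only by its mean (10 ns) and variance (4 ns^2)
bound95 = delay_bound(10.0, 4.0, 0.05)
check = cantelli_tail(4.0, bound95 - 10.0)   # equals alpha by construction
```

Because the bound holds for every distribution with the given mean and variance, it is valid under exactly the kind of partial (distribution-free) information the paper considers.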

Technical Report UTEP-CS-06-04, January 2006

Modeling Correlation and Dependence Among Intervals

Scott Ferson and Vladik Kreinovich

Published in *Proceedings of the Second International
Workshop on Reliable Engineering Computing*,
Savannah, Georgia, February 22-24, 2006, pp. 115-126.

This note introduces the notion of dependence among intervals to account for observed or theoretical constraints on the relationships among uncertain inputs in mathematical calculations. We define dependence as any restriction on the possible pairings of values within respective intervals and define nondependence as the degenerate case of no restrictions (which we carefully distinguish from independence in probability theory). Traditional interval calculations assume nondependence, but alternative assumptions are possible, including several which might be practical in engineering settings that would lead to tighter enclosures on arithmetic functions of intervals. We give best possible formulas for addition of intervals under several of these dependencies. We also suggest some potentially useful models of correlation, which are single-parameter families of dependencies, often ranging from the identity dependence (u=v) representing maximal correlation, through nondependence, to opposite dependence (1-u=v) representing maximally negative correlation.
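Two of the addition rules can be sketched directly from the parametrization; the opposite-dependence formula below is our own derivation from the linear parametrization (interval endpoints are hypothetical), not copied from the paper:

```python
def add_nondependent(a, b):
    """No restriction on how values pair up: the standard interval sum."""
    return (a[0] + b[0], a[1] + b[1])

def add_opposite(a, b):
    """Opposite dependence (1-u = v): x = a0 + u*(a1-a0) pairs with
    y = b1 - u*(b1-b0).  The sum is linear in u, so its range is the
    interval between the two endpoint sums a0+b1 and a1+b0."""
    ends = (a[0] + b[1], a[1] + b[0])
    return (min(ends), max(ends))

x, y = (0.0, 4.0), (1.0, 3.0)
wide = add_nondependent(x, y)   # (1.0, 7.0)
tight = add_opposite(x, y)      # (3.0, 5.0): strictly tighter enclosure

# Sanity check directly against the parametrization
samples = [(x[0] + u * (x[1] - x[0])) + (y[1] - u * (y[1] - y[0]))
           for u in (k / 1000 for k in range(1001))]
```

For addition, identity dependence (u=v) gives the same endpoints as nondependence, since both summands increase together; opposite dependence is where the enclosure tightens, collapsing to a point when the two widths are equal.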

Technical Report UTEP-CS-06-03, January 2006

Interval Finite Element Methods: New Directions

Rafi Muhanna, Vladik Kreinovich, Pavel Solin, Jack Chessa, Roberto Araiza, and Gang Xiang

Published in *Proceedings of the Second International
Workshop on Reliable Engineering Computing*,
Savannah, Georgia, February 22-24, 2006, pp. 229-243.

File in pdf and in Compressed Postscript

Technical Report UTEP-CS-06-02, January 2006

The UTEP Corpus of Iraqi Arabic

Nigel G. Ward, David G. Novick, and Salamah I. Salamah

The rules governing turn-taking phenomena are not well understood in general and almost completely undocumented for Arabic. As the first step to modeling these phenomena, we have collected a small corpus of Iraqi Arabic spoken dialogs. The corpus is in three parts. Part A is 110 minutes of unstructured conversations. Parts B1 and B2 are 176 minutes of direction-giving dialogs, most including a greeting phase, a smalltalk phase, a request phase, and a direction-giving phase. Parts A and B1 were recorded with 13 native speakers of Iraqi Arabic, interacting in pairs. In Part B2 the direction-getter is an American with only very basic Arabic knowledge. This document describes the intended uses of the corpus, the dialog types, the speakers, and the details of the data collection.

Technical Report UTEP-CS-06-01, January 2006

Updated version UTEP-CS-06-01b, March 2006

Towards Optimal Use of Multi-Precision Arithmetic: A Remark

Vladik Kreinovich and Siegfried Rump

Published in *Reliable Computing*,
2006, Vol. 12, No. 5, pp. 365-369.

If standard-precision computations do not lead to the desired accuracy, then it is reasonable to increase precision until we reach this accuracy. What is the optimal way of increasing precision? One possibility is to choose a constant q>1, so that if the precision which requires the time t did not lead to a success, we select the next precision that requires time q*t. It was shown that among such strategies, the optimal (worst-case) overhead is attained when q=2. In this paper, we show that this "time-doubling" strategy is optimal among all possible strategies, not only among the ones in which we always increase time by a constant q>1.
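The q=2 result for geometric strategies can be checked numerically: if the needed time lands just above q^k * t, the total spent is t*(q^(k+2)-1)/(q-1), so the worst-case overhead tends to q^2/(q-1), which is minimized at q=2 with value 4. A small sketch of this analysis (ours):

```python
def worst_case_overhead(q, k=60):
    """Worst-case (total time) / (needed time) for the strategy that runs
    precisions costing t, q*t, q^2*t, ...: the adversary sets the needed
    time just above q^k * t, forcing one more step; the ratio approaches
    q**2 / (q - 1) as k grows."""
    total = sum(q ** i for i in range(k + 2))
    return total / q ** k

# Scan q in (1, 4): the minimum sits at q = 2 with overhead ~4,
# matching the known optimality of time-doubling among geometric strategies
qs = [1.0 + 0.01 * i for i in range(10, 300)]
best_q = min(qs, key=worst_case_overhead)
```

The paper's stronger result is that this factor-4 worst case cannot be beaten even by non-geometric strategies.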

Original file in pdf and
in Compressed Postscript

Updated version in pdf and
in Compressed Postscript