4th Joint UTEP/NMSU Workshop on Mathematics,

Computer Science, and Computational Sciences

El Paso, Texas

Saturday, November 8, 2008

Roberto Araiza

(Bioinformatics Program, UTEP)

USING INTERVALS FOR THE CONCISE REPRESENTATION OF INVERTED REPEATS IN RNA

Ribonucleic acid, or RNA, is a nucleic acid consisting of a long sequence of chained nucleotide units. Nucleotides consist of a sugar, a phosphate group, and a nitrogenous base, which in RNA is generally adenine (A), cytosine (C), guanine (G), or uracil (U). RNA plays many important roles within a cell; for example, messenger RNA (mRNA) carries the protein-building blueprint from the DNA template to the ribosomes (the protein-building sites); transfer RNA (tRNA) transfers amino acids (the protein-building blocks) to the ribosomes; and ribosomal RNA (rRNA), the central component of the ribosome, decodes mRNA and builds the protein by chaining together the amino acids transferred by tRNA.

To achieve functionality, RNA molecules often require a specific tertiary structure, defined by the three-dimensional atomic coordinates. The scaffold for the tertiary structure is provided by the secondary structure, defined by the hydrogen bonds between nitrogenous bases. In RNA, typically cytosine (C) bonds with guanine (G) and adenine (A) bonds with uracil (U), although other base-pairs are possible.

Most RNA secondary structures are defined by these base-pairings between nucleotides. If we represent an RNA molecule as a string over the alphabet {G, U, C, A}, then in the RNA molecule 'GUCACCCCUGAC' positions 1-4 can base-pair with positions 9-12 to create a stem-loop (where the pairs form the stem, or helix, and the unpaired positions form the loop), a common secondary-structure pattern that is a building block for more complex secondary structures.

Several approaches to secondary structure prediction consider finding these
palindromic inverted repeats, which equate to base-pairs in a molecule. For
example, EMBOSS ("European Molecular Biology Open Software Suite") provides a
*palindrome* utility that searches for inverted repeats.

Results from EMBOSS's palindrome utility are correct but sometimes not very concise. For example, for a sequence 'AAAACCCCUUUUUUUUUU', palindrome reports possible base-pairings 1-4:9-12, 1-4:10-13, 1-4:11-14, 1-4:12-15, 1-4:13-16, 1-4:14-17, and 1-4:15-18. If we use intervals to represent this data, we can report instead base-pairs [1, 4]:[9, 18].
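As a sketch of the idea (our illustration, not the actual EMBOSS or interval-tool code), the seven reported pairings for this sequence can be collapsed into a single interval pair:

```python
# Sketch: collapse palindrome-style reports 1-4:9-12, 1-4:10-13, ...,
# 1-4:15-18 into one interval pair [1, 4]:[9, 18].

def merge_pairings(pairings):
    """Group pairings that share a left range and merge their right
    ranges into a single interval [min_start, max_end]."""
    merged = {}
    for (l_start, l_end), (r_start, r_end) in pairings:
        lo, hi = merged.get((l_start, l_end), (r_start, r_end))
        merged[(l_start, l_end)] = (min(lo, r_start), max(hi, r_end))
    return merged

# Reports for 'AAAACCCCUUUUUUUUUU' (1-based positions, as in the abstract):
reports = [((1, 4), (s, s + 3)) for s in range(9, 16)]
print(merge_pairings(reports))  # {(1, 4): (9, 18)}
```

The same grouping step applies unchanged when the left ranges themselves vary, as in the multi-base helices discussed next.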

The approach can be extended to more complex sequences where the helices consist of more than one base. For example, for the sequence 'CAAAACCCCUUUUUUUUUUG' we have pairings [1, 5]:[10, 20] with bulges of sizes [0, 6]. Bulges arise because stem halves need not be contiguous; for example, in the above sequence we can have pairings 1:20 and 2-5:14-17, with positions 18 and 19 lying in a bulge. Further constraints, such as reasonable loop and bulge sizes, can be developed as well, also using intervals.

Sarala Arunagiri, Patricia J. Teller, Seetharami Seelam, Ron A. Oldfield, Maria Ruiz Varela, and Rolf Riesen

(Presenter from the Department of Computer Science, UTEP)

ANALYSIS OF A CHECKPOINT RESTART MODEL TO UNCOVER PROPERTIES THAT CAN LEAD TO IMPROVED APPLICATION PERFORMANCE

The massive scale of current and next-generation massively parallel processing (MPP) systems presents significant challenges related to fault tolerance. In particular, the standard approach to fault tolerance, application-directed, periodic checkpointing, puts an incredible strain on the storage system and the interconnection network. This results in overheads that severely impact application performance and scalability. Checkpoint overhead can be reduced by decreasing the checkpoint latency, which is the time to write a checkpoint file, or by increasing the length of the checkpoint interval, which defines both the compute time between checkpoints and the frequency with which checkpoint files are written. However, in the presence of failures, increasing the length of the checkpoint interval may increase application execution time.

In this paper, we analyze a well known mathematical model, developed by researchers at Los Alamos National Laboratory (LANL), of the execution time of applications that perform periodic checkpoints. Via this analysis, we derive properties that demonstrate that the length of the checkpoint interval can be increased beyond the optimal value defined by the model with only a small bounded increase in execution time. While it is clear that increasing the length of the checkpoint interval decreases the frequency of checkpoint file writes, it is not clear if the aggregate number of checkpoint I/O operations decreases. This is because increasing the length of the checkpoint interval increases the execution time, which, in turn, increases the expected number of failures, thereby increasing the expected number of checkpoint file read operations. The paper presents simulation results which show that the number of checkpoint I/O operations indeed decreases as the length of the checkpoint interval increases. In these simulations we have included a few uncertainties that more closely reflect reality and, thereby, relax some of the idealizations of analytical modeling. In addition, the simulation results establish a need for additional analytical modeling work related to the aggregate number of checkpoint operations. Finally, we describe other possible uses of our work in high performance computing.
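To illustrate the kind of model being analyzed (a first-order sketch in the spirit of the checkpointing literature, not the LANL model itself; the constants are made up), Young's classic approximation already exhibits the claimed behavior: the expected overhead grows only slowly as the interval is stretched past its optimum.

```python
import math

# Illustrative sketch: Young's first-order approximation of the optimal
# checkpoint interval, and the expected overhead as a function of the
# interval length tau. delta = checkpoint latency (s), mtbf = mean time
# between failures (s). This is NOT the LANL model analyzed in the paper.

def optimal_interval(delta, mtbf):
    return math.sqrt(2.0 * delta * mtbf)

def expected_overhead(tau, delta, mtbf):
    """First-order expected overhead fraction: checkpoint cost per
    interval plus expected rework (on average, half an interval plus one
    checkpoint is lost per failure)."""
    return delta / tau + (tau + delta) / (2.0 * mtbf)

delta, mtbf = 300.0, 24 * 3600.0         # 5-min checkpoints, 24-h MTBF
tau_opt = optimal_interval(delta, mtbf)  # 7200 s for these constants
# Overhead grows only slowly past tau_opt: doubling the interval costs
# only a few percentage points of extra overhead.
for tau in (tau_opt, 1.5 * tau_opt, 2.0 * tau_opt):
    print(round(tau), round(expected_overhead(tau, delta, mtbf), 4))
```

Doubling tau here roughly halves the number of checkpoint writes while adding only about two percentage points of overhead, which is the trade-off the paper's analysis makes precise.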

Neelabh Baijal and Luc Longpre

(Department of Computer Science, UTEP)

PRESERVING PRIVACY IN STATISTICAL DATABASES BY USING INTERVALS INSTEAD OF CELL SUPPRESSION

An important challenge with statistical databases is to protect the privacy of an individual when aggregate data is released. Cell suppression is a commonly used technique to this end. It involves the suppression of additional non-sensitive data to restrict inferences about the sensitive data. We suggest another approach, which we call cell blurring. Instead of completely suppressing some cells, we can replace the value in a cell by an interval. This has the advantage of distributing the uncertainty introduced by the suppression more evenly across the entire data. We propose a number of algorithms that, given a statistical database, a table to be published, the cells in the database that are considered sensitive, and a degree of privacy that needs to be guaranteed on the sensitive cells, compute a new table of intervals that may be published while preserving privacy.
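A minimal sketch of the idea (our own illustration; the function name and the even split of the uncertainty budget are assumptions, not the authors' algorithms):

```python
# Hypothetical sketch of "cell blurring": instead of suppressing cells,
# publish each cell as an interval, spreading a fixed amount of
# uncertainty evenly across the row, so that aggregates stay usable but
# no single value can be pinned down exactly.

def blur_row(row, budget):
    """Replace each value v by the interval [v - w, v + w], where the
    half-width w = budget / len(row) is shared evenly by all cells."""
    w = budget / len(row)
    return [(v - w, v + w) for v in row]

row = [12.0, 7.0, 30.0, 5.0]
blurred = blur_row(row, budget=8.0)
print(blurred)  # each cell becomes an interval of width 4
# The true total (54) lies inside the interval sum [46, 62]:
print(sum(a for a, _ in blurred), sum(b for _, b in blurred))  # 46.0 62.0
```

A real algorithm would of course size and place the intervals to meet the stated privacy guarantee; this sketch only shows the interval representation itself.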

Martine Ceberio and Vladik Kreinovich

(Department of Computer Science, UTEP)

COMPUTING WITH TENSORS: POTENTIAL APPLICATIONS OF PHYSICS-MOTIVATED MATHEMATICS TO COMPUTER SCIENCE

The main objective of our talk is to show that tensors can help in computing - and to explain how.

Let us first recall what tensors are, what their origin in physics is,
and why they can be helpful in computing. Physics starts with measuring and
describing the values of different physical quantities -- and goes on to
equations which enable us to predict the values of these quantities.
A measuring instrument usually returns a single numerical value. For some
physical quantities (like mass m), the single measured value is sufficient to
describe the quantity. For other quantities, we need several values:
e.g., to describe the electric field E at a given point, we need to
know the values of three components of this field: E_{x},
E_{y}, and E_{z}, corresponding to three
different coordinate axes. To know the characteristics
related to tension inside a solid body, we need to know even more
parameters, etc.
In the 19th century, the equations of physics (e.g., Maxwell's equations)
contained a separate equation for each component of the field. These equations
were cumbersome to describe and difficult to solve.

Physical equations
-- and the resulting computations -- became very much simplified when it
turned out that we can describe all the components of a
physical field as a single mathematical object -- a vector a_{i},
or, more generally, a tensor a_{ij}, a_{ijk}, ...
Originally, mostly vectors (rank-1 tensors)
were used, but in the 20th century quantum physics
brought in matrices (rank-2 tensors), and relativity theory brought in
even higher-order tensors such as the rank-4 curvature tensor.

In many applications of computing, we are now facing a problem similar to what physics faced in the 19th century: that we need to store and process large amounts of data. We show that in many such situations, tensors can help.

Our optimism is justified by the fact that
in many of these situations, there are already heuristic techniques that
help -- techniques which are, in effect, tensor-related. For example,
the standard way of multiplying two matrices C = AB is very inefficient
for large matrices A and B which cannot be placed into the fast-to-access part
of the computer memory ("cache") - because this way requires a lot of
time-consuming data transfers ("cache misses") between different memory parts.
To speed up computations, practitioners represent each large matrix A as
a "matrix" of block matrices A_{ij}, and use the fact that
matrix multiplication
can be represented in terms of these blocks
C_{ij} = A_{i1}B_{1j} + ... + A_{in}B_{nj}.
In effect, this means that each element of the original matrix is now
represented as an (a,b)-th element of a block A_{ij}, i.e., as an element of
a rank-4 tensor (A_{ij})_{ab}. (The
fact that an increase in rank improves efficiency
should not be surprising: a representation of a rank-1
vector as a rank-2 spinor works in relativistic quantum physics.)
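The block decomposition described above can be sketched as follows (a toy implementation for square matrices whose size is a multiple of the block size; production BLAS-style code additionally tunes the block size to the cache):

```python
import numpy as np

# Sketch of the cache-friendly block decomposition: each matrix is viewed
# as a rank-4 tensor (A_{ij})_{ab} of b x b blocks, and
# C_{ij} = A_{i1}B_{1j} + ... + A_{in}B_{nj} is computed block by block,
# so each block is reused while it is resident in fast memory.

def block_matmul(A, B, b):
    n = A.shape[0]                  # assume square, n a multiple of b
    C = np.zeros((n, n))
    for i in range(0, n, b):
        for j in range(0, n, b):
            for k in range(0, n, b):
                C[i:i+b, j:j+b] += A[i:i+b, k:k+b] @ B[k:k+b, j:j+b]
    return C

rng = np.random.default_rng(0)
A, B = rng.random((8, 8)), rng.random((8, 8))
assert np.allclose(block_matmul(A, B, b=4), A @ B)  # same product
```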

Tensors can be used to naturally describe *constraints*. Indeed, a general
constraint between n real-valued quantities is a subset
of R^{n}, and such a subset can be (approximately) described
block-by-block - i.e., as a
boolean-valued tensor. It turns out that
processing such constraint-related sets can also be naturally described
in tensor terms.
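A minimal sketch of this boolean-tensor view (our illustration): discretize the domain into cells, mark the cells meeting each constraint, and note that intersecting two constraints becomes an elementwise AND of the two tensors.

```python
import numpy as np

# Sketch: two constraints on [-1, 1]^2, each discretized block by block
# into a boolean-valued rank-2 tensor; their conjunction is elementwise AND.
n = 64
xs = np.linspace(-1, 1, n)
X, Y = np.meshgrid(xs, xs, indexing="ij")
disk = X**2 + Y**2 <= 1.0     # constraint 1 as a boolean tensor
halfplane = X + Y >= 0.0      # constraint 2 as a boolean tensor
both = disk & halfplane       # processing constraints = tensor operations
print(both.sum(), "of", n * n, "cells satisfy both constraints")
```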

Tensors also naturally appear in an efficient Taylor series approach to
*uncertainty* propagation. Here, the resulting tensors are symmetric,
which poses a special computational challenge: reducing the r!-fold duplication
of the standard computer representation. Tensors naturally appear in
*quantum computing* since in quantum physics,
the set of states of a composite system is the tensor product of
the corresponding sets of states.

So far, we have shown that tensors can help computing, but the relation between tensors and computing can also help physics. For example, to describe Kaluza-Klein-type high-dimensional space-time models of modern physics, Einstein himself pioneered the use of "tensors" with integer or circular values; from the mathematical viewpoint, such "tensors" are unusual, but in computer terms, integer or circular data types are very natural - and even more efficient to process than standard real numbers.

Manali Chakraborty and Olac Fuentes

(Department of Computer Science, UTEP)

REAL-TIME IMAGE-BASED MOTION DETECTION USING COLOR AND STRUCTURE

In this paper we propose a method for automating the process of detecting regions of motion in a video sequence in real time. The main idea of this work is to detect motion based on both structure and color. The detection using structure is carried out with the aid of information gathered from the Census Transform, a method that characterizes the local intensity patterns in an image region. Color-based detection is done using color histograms, which allow efficient characterization without prior assumptions about color distribution in the scene. The probabilities from the Census Transform and Color histogram are combined in a robust way to detect the zones of active motion.
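For concreteness, here is a sketch of the standard 3x3 Census Transform (our illustration; the paper's exact variant and the probabilistic combination with color histograms are not reproduced here):

```python
import numpy as np

# Sketch of a 3x3 Census Transform: each interior pixel is replaced by an
# 8-bit signature recording which of its neighbors are darker than the
# center, characterizing local structure independently of absolute
# brightness (hence its robustness to illumination changes).

def census_transform(img):
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    center = img[1:-1, 1:-1]
    bit = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue  # skip the center pixel itself
            neighbor = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
            out |= (neighbor < center).astype(np.uint8) << bit
            bit += 1
    return out

# A uniform patch yields signature 0; a brightness gradient does not:
print(census_transform(np.full((3, 3), 5)).item())        # 0
print(census_transform(np.arange(9).reshape(3, 3)).item()) # 15
```

Because the signature depends only on intensity *order*, not intensity values, comparing signatures between frames flags structural change rather than mere lighting change.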

We demonstrate the effectiveness of our approach by showing initial results of its application as an initial step to detect regions of interest in a foveal vision system. In this foveal system, high resolution is applied to regions where motion is detected, while the rest of the image is sensed, transmitted and stored in low-resolution.

Behzad Djafari-Rouhani

(Department of Mathematical Sciences, UTEP)

ASYMPTOTIC BEHAVIOR OF SOLUTIONS TO A SECOND ORDER EVOLUTION EQUATION

Art Duval

(Department of Mathematical Sciences, UTEP)

SIMPLICIAL SPANNING TREES

Cayley's famous result that there are n^(n-2) spanning trees in a complete graph on n vertices has been proved, and generalized, in many ways. One particularly nice way is the Matrix-Tree Theorem, that the number of spanning trees of an arbitrary graph can be enumerated by the determinant of a reduced Laplacian matrix.
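The Matrix-Tree Theorem is easy to put into code; the following sketch (ours) counts the spanning trees of K_4 and recovers Cayley's value 4^(4-2) = 16:

```python
import numpy as np

# Matrix-Tree Theorem: the number of spanning trees of a graph equals any
# cofactor of its Laplacian L = D - A (delete one row and the matching
# column, then take the determinant of what remains).

def spanning_tree_count(adj):
    A = np.array(adj, dtype=float)
    L = np.diag(A.sum(axis=1)) - A   # Laplacian: degrees minus adjacency
    reduced = L[1:, 1:]              # delete row 0 and column 0
    return round(np.linalg.det(reduced))

# Complete graph K_4: Cayley's formula gives 4^(4-2) = 16 spanning trees.
K4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
print(spanning_tree_count(K4))  # 16
```

The simplicial generalization described below replaces this reduced graph Laplacian by Laplacian matrices of the complex.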

Building upon the work of Kalai and Adin, we extend the concept of a spanning tree from graphs to simplicial complexes, which are just higher-dimensional analogs of graphs. For all complexes K satisfying a mild technical condition, we show that the simplicial spanning trees of K can be enumerated using its Laplacian matrices, generalizing the Matrix-Tree theorem. As in the graphic case, replacing the Laplacian with a weighted analogue yields extra information about the simplicial spanning trees being counted. As an example, we find a nice expression for the resulting weighted tree enumerator of shifted complexes, by generalizing a formula for the Laplacian eigenvalues of a shifted complex to the weighted case.

This is joint work with Carly Klivans and Jeremy Martin.

John Harding

(Department of Mathematical Sciences, NMSU)

QUANTUM LOGIC

We give a brief overview of quantum logic and the recently introduced subject of categorical quantum mechanics. We outline a recently discovered link between the two subjects, and discuss several examples. One example also has a link to coding theory.

Naga Suman Kanagala, Edgar Padilla, Essau Ramirez, Angel Silva, Cesar Valenzuela, and Martine Ceberio

(Department of Computer Science, UTEP)

GAIT ANALYSIS USING OPTIMIZATION AND CONSTRAINT PROGRAMMING TECHNIQUES

Many neurological diseases, such as cerebral palsy in children, strokes, or Parkinson's disease, can cause abnormal changes in gait, or "gait disorders", and significantly affect the ability of patients to walk without assistance. Intensive rehabilitation therapy is usually the key for patients to improve their walking ability. However, therapy is usually long and relies on human experts to perform it: it involves a lot of repetition of exercises guided by therapists. As a result, it is not as efficient as one would expect, and it is very costly. The key to improving rehabilitation is to automate the diagnosis of gait pathologies as well as the therapy.

Automatically diagnosing gait pathologies requires the ability to assess gait. Nowadays, gait is assessed through observation and/or computer-based analysis of some characteristics, e.g., the angle between the shinbone and the femur, or the height of the ankle (using markers placed on the patient's joints: typically the ankle, knee, and hip). Current diagnosis techniques are based on observations; current research on automating gait diagnosis makes use of video and data recordings. This research is mostly oriented towards the use of fuzzy logic or neural networks, in order to identify features that help differentiate between healthy and unhealthy gaits. Since diagnosis is also expected to involve comparing an observed gait to known healthy gaits, current research is also concerned with recognizing scaling and shifting, so that the comparison can be performed properly. However, although important, the problem of inconsistencies in the gait cycles of the same patient (e.g., patients walking faster or slower, or patients having a very distorted gait) has not been addressed yet, nor has the need for determining models (of both healthy and unhealthy patients).

In this work, we focus on the diagnosis part of the problem. Our work makes use of optimization and constraint programming techniques, aiming at extracting characteristics (models) and at being independent of inconsistencies in the cycles / patients (e.g., height, weight, age) / speed. In particular, we propose a new technique to automatically detect gait cycles, and to determine whether or not a gait is unhealthy.

The work we describe in this abstract was performed on data recorded from healthy patients, obtained from markers placed on their ankle, knee, and hip joints. We developed a simple filtering algorithm that selects, online, the significant points among the recorded data and makes the pattern of a normal gait apparent. From the result of this algorithm, we were also able to isolate cycles and correlate them between data from different markers, in order to validate our patterns. The patterns were then used to define constraint-solving problems: this step allows us to check whether a gait is unhealthy, i.e., does not match the expected pattern.

The novelty of this work for gait analysis lies in the fact that we are interested only in patterns, which leads us to a qualitative assessment of the gait, rather than a quantitative assessment that, in our opinion, relies too heavily on the data being perfect (no inconsistencies).

Preliminary results show the promise of our work, and the developed tool can be extended to support the detection of more kinds of inconsistencies in the recordings than those reported in this article.

Amine Khamsi

(Department of Mathematical Sciences, UTEP)

REMARKS ON CARISTI'S FIXED POINT THEOREM

In this work, we give a characterization of the existence of minimal elements in partially ordered sets in terms of fixed points of multivalued maps. This characterization shows that the assumptions in Caristi's fixed point theorem can, a priori, be weakened. Finally, we discuss Kirk's problem on an extension of Caristi's theorem and prove a new positive result which illustrates the weakening mentioned before.

Jaime Nava, Vladik Kreinovich, Guillermo Restrepo, and Douglas J. Klein

(presenter from Department of Computer Science, UTEP)

DISCRETE TAYLOR SERIES AS A SIMPLE WAY TO PREDICT CHARACTERISTICS OF CHEMICAL SUBSTANCES LIKE BENZENES AND CUBANES

In many applications, we have numerous molecules that are obtained from a
"template"
molecule like benzene C_{6}H_{6} or cubane C_{8}H_{8}
by replacing some of its hydrogen atoms with other atoms or atom groups (called
ligands). Depending on how many original atoms are replaced and which ones are
replaced, we get a large number of different chemical substances. It is desirable
to be able, based on the measurements performed on a small number of such
substances, to accurately predict the characteristics (such as energy)
of all similar substances.

Such predictions are very important, since, e.g., cubanes, while kinetically stable, are highly reactive. As a result, at present, they are actively used as high-density, high-energy fuels and explosives, and researchers are investigating the potential of using cubanes (and similarly high-energy molecules) in medicine and nanotechnology.

In their 2007 paper in the Journal of Mathematical Chemistry, Teodora Ivanciuc and Douglas J. Klein use complex poset-related ideas of Gian-Carlo Rota to make such predictions. In our talk, we show that similar predictions can be made by using much simpler Taylor series techniques.
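To illustrate the flavor of such a technique (a synthetic example of ours, not the authors' actual data or model), a first-order discrete Taylor expansion treats each substitution site as contributing an independent increment that can be fitted by least squares:

```python
import numpy as np

# Hypothetical illustration of the discrete-Taylor idea (the data below
# is synthetic): represent each substituted molecule by a 0/1 vector x
# (which of the template's sites carry a ligand) and expand a property as
#   E(x) ~ c0 + sum_i c_i x_i   (first-order discrete Taylor series),
# fitting the coefficients from a few measured substances.

def fit_first_order(X, E):
    """X: rows are 0/1 substitution patterns; E: measured values."""
    design = np.hstack([np.ones((len(X), 1)), X])  # prepend constant term
    coef, *_ = np.linalg.lstsq(design, E, rcond=None)
    return coef                                    # [c0, c1, ..., cn]

def predict(coef, x):
    return coef[0] + np.dot(coef[1:], x)

# Synthetic "measurements" for a 3-site template with additive effects:
X = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]])
E = np.array([10.0, 12.0, 11.5, 13.0, 13.5])
coef = fit_first_order(X, E)
print(round(predict(coef, np.array([1, 0, 1])), 2))  # 15.0 (unmeasured pattern)
```

Second-order terms c_{ij} x_i x_j would capture interactions between ligands at pairs of sites, at the cost of requiring more measurements.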

Inna Pivkina

(Department of Computer Science, NMSU)

TEACHING MATHEMATICS AND COMPUTER SCIENCE WITH PRIMARY HISTORICAL SOURCES

Curricular materials in discrete mathematics, algorithm design, automata, graph theory, and logic have been developed by a team consisting of Guram Bezhanishvili, Hing Leung, Jerry Lodder, David Pengelley, Inna Pivkina, Desh Ranjan from New Mexico State University, and Janet Barnett from Colorado State University - Pueblo. The materials (projects) are based on primary historical sources. Using original historical sources provides context and direction to subject matter, honing the students' verbal and deductive skills through reading the work of some of the greatest minds in history. Classroom testing of the projects shows improved student attitude and improved performance after completing a course with historical projects. More information is available at http://www.cs.nmsu.edu/historical-projects/

Andrzej Pownuk

(Department of Mathematical Sciences, UTEP)

MATHEMATICAL ASPECTS OF GRADING STUDENTS' HOMEWORK IN ON-LINE WEB APPLICATIONS

Automatic grading of students' homework improves the quality of teaching at the university. There are many commercial companies which provide on-line web applications for doing this (WebAssign, MyMathLab, etc.). It is also possible to get some free software (e.g., WeBWorK).

In this presentation, the author would like to show an example of an on-line grading system written in ASP.NET (Web Forms), C#, and MS SQL Server Express. The structure of the system can be easily customized to satisfy the needs of each particular subject. This feature is particularly important in the teaching of graduate courses.

Homework can be created using C#, VB, and LaTeX.

In order to efficiently grade on-line homework, it is necessary to estimate the quality of students' answers. This process requires special mathematical tools. A grade is a measure which tells the teacher how close the student's knowledge is to the knowledge necessary to pass each particular test. Because of this, the process of on-line grading can be described using fuzzy set theory and the theory of imprecise probability.

The system which automatically grades students' homework has the following internet address

Karen Villaverde and Gilbert Ornelas

(Department of Computer Science, NMSU, and Department of Computer Science, UTEP)

BEYOND INTERVALS: PHASE TRANSITIONS LEAD TO MORE GENERAL RANGES

One of the main tasks of science and engineering is to use the current values of the physical quantities for predicting the future values of the desired quantities. Due to the (inevitable) measurement inaccuracy, we usually know the current values of the physical quantities with interval uncertainty. Traditionally, it is assumed that all the processes are continuous; as a result, the range of possible values of the future quantities is also known with interval uncertainty. However, in many practical situations (such as phase transitions), the dependence of the future values on the current ones becomes discontinuous. We show that in such cases, initial interval uncertainties can lead to arbitrary bounded closed ranges of possible values of the future quantities. We also show that the possibility of such a discontinuity may drastically increase the computational complexity of the corresponding range prediction problem.
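A one-dimensional sketch of this phenomenon (our toy example): a continuous map sends an interval to an interval, but a map with a jump, such as one modeling a phase transition, can send an interval to a range that is not an interval.

```python
# Toy example: the map below is continuous except for a jump at x = 0,
# so it sends the interval [-1, 1] to [-1, 0) union [1, 2] -- a bounded
# closed-ish range with a hole, not an interval.

def f(x):
    return x if x < 0 else x + 1.0   # output jumps by 1 at x = 0

xs = [i / 1000.0 for i in range(-1000, 1001)]   # sample [-1, 1]
ys = sorted(f(x) for x in xs)
gap = max(b - a for a, b in zip(ys, ys[1:]))    # largest hole in the range
print(round(gap, 3))  # 1.001: no sampled output falls in (0, 1)
```

Interval arithmetic alone cannot represent such a range exactly, which motivates the more general ranges studied in the talk.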

Piotr Wojciechowski

(Department of Mathematical Sciences, UTEP)

STRONG COMPLETENESS OF ŁUKASIEWICZ AXIOMS

John Harding and Qin Yang

(Department of Mathematical Sciences, NMSU)

REGULAR COMPLETIONS

Jun Zheng and Olac Fuentes

(Department of Computer Science, UTEP)

A STOCHASTIC METHOD FOR RESOLUTION-ENHANCEMENT OF FACE IMAGES

When provided with high-resolution, occlusion-free images of faces in frontal view, automated face recognition systems can provide accuracies that are comparable to those obtained by humans. However, in surveillance applications, cameras are usually set up with wide fields of view in order to capture as much of the scene as possible. This normally results in low-resolution images of the objects of interest. A new family of approaches, based on statistical machine learning, aims at analyzing large data sets of images of a particular class, for example faces, and learning the mapping from low-quality to high-quality images of that class. This enables them to infer, for example, the most likely high-resolution face image depicting the same person as a low-resolution image given as input. These resolution-enhancement algorithms are time-consuming, due to the need for exhaustive search in a database of models. This work improves the efficiency of face hallucination using stochastic search in local modeling. Experimental results show that the proposed algorithm can generate high-quality face images from low-resolution input while reducing the computational time dramatically.