Final exam, CS 5353, Fall 2008
Date: Tuesday, December 9, 2008.
Name (please print):
___________________________________________________________________________________
1. Motivations.
- Explain what cyberinfrastructure is, why it is useful, and why it did not appear
many years ago, when computer communications were slower.
- Explain why we cannot directly measure
all physical quantities of interest to us, and why we need data processing. Give a
real-life example.
2. Statistical foundations.
Assume that the probability
p(A) of the event A is 0.6, and the probability p(B) of the event B is 0.5.
- What is the probability p(A & B) that both A and B hold if these
events are independent? Provide the derivation of the corresponding formula.
- What is the range of possible values of p(A & B)
when we do not have any information about the dependence between A and B? Draw
examples illustrating the possibilities of the smallest and the largest values
from this range.
- What is the probability p(A \/ B) that one of the events
A or B holds if these events are independent?
Provide the derivation of the corresponding formula.
- What is the range of
possible values of p(A \/ B) when we do not have any information about the
dependence between A and B? Draw examples illustrating the possibilities of the
smallest and the largest values from this range.
3. Mathematical techniques.
- What is the Maximum Likelihood Method? Provide at least two examples of
where this method is used. Describe one of these examples in detail.
- What is bisection? Give a numerical example
(two steps are enough) of how bisection can be used to find the solution to an
equation, i.e., the value x for which f(x) = 0.
- Provide an example where both the Maximum Likelihood Method and bisection are used in
estimating the uncertainty of the result of data processing.
4. Data fusion.
Let us assume that we have measured the same quantity with
two different measurement instruments. The result of the first measurement is
1.2, the result of the second
measurement is 0.8. Combine these two results into a single "fused" value, in
the following two situations:
- Probabilistic situation. In this situation, we assume that
the measurement errors of both measuring instruments have 0 mean;
the first instrument has standard deviation 0.2, the second has standard deviation
0.3.
- Interval situation. In this situation, we have no information about the
probabilities, we only know that the measurement error of the first instrument
does not exceed 0.2, and the measurement error of the second instrument does not
exceed 0.3.
Where do the formulas that you used come from? (No need for detailed derivations;
just explain the main ideas.)
5. Linearization.
- Using the function
y = x1^2 - x2,
with the measurement results
x1 = 1.0 and x2 = 2.0 and measurement errors
Δx1 = 0.1 and Δx2 = -0.2,
show what the error Δy is as estimated by the linearization method, and how
this estimated error compares with the actual value of this error. Compare
the analytical expressions for the corresponding partial derivatives with the
results of numerical differentiation.
- For the same function and the same measurement results, assume that we
only know the bounds Δ1 = 0.1 and Δ2 = 0.2 on the
measurement errors. Use the monotonicity of the function f to find the exact
range of the corresponding value y, and compare this range with the results of
applying a linearized formula for the bound Δ on the error Δy of
the result of data processing.
6. Uncertainty in data processing: computational aspects.
For the formulas for
computing the uncertainty of the result of data processing,
explain how the computational complexity (= number of computational steps)
depends on the choice of the
parameters hi used in numerical differentiation, and which choice
makes the computational complexity the smallest:
- for the case of statistical uncertainty, when we know the standard deviations
σi of the corresponding measurement errors, and
- for the case of interval uncertainty, when we only know the upper bounds
Δi of the corresponding measurement errors.
7. Uncertainty in data processing: Monte-Carlo method.
Explain why the Monte-Carlo method is useful in estimating the uncertainty of the
result of data processing, and for what number of inputs it is useful:
- for the case of statistical uncertainty, and
- for the case of interval uncertainty.
Provide a numerical example of the number of iterations that are
needed to achieve a given accuracy.
8. Estimating reliability and trust: Monte-Carlo method.
- Explain why the Monte-Carlo method is useful in estimating the degree of trust.
- Explain why, for very
reliable components, we cannot directly use the Monte-Carlo method and need
re-scaling instead. Describe the main idea of the re-scaling
and how it helps.
9. Reliability.
For each of the following two
cases:
- case when f trusts t with probability p1 = 1 -
Δp1 and t trusts s with probability p2 = 1 -
Δp2;
- case when f has two reasons for trusting s: with
probability p1 = 1 - Δp1 and with probability
p2 = 1 - Δp2;
estimate two values:
- the worst-case
probability Δpw that f does not trust s, and
- the probability Δpi that f does not trust s under the
independence assumption.
10. Describe the contents of one of the class projects -- different
from your own project.