Final exam, CS 5353, Fall 2008

Date: Tuesday, December 9, 2008.

Name (please print): ___________________________________________________________________________________

1. Motivations.

Explain what cyberinfrastructure is, why it is useful, and why it did not appear many years ago, when computer communications were slower.
Explain why we cannot directly measure all physical quantities of interest to us, and why we therefore need data processing. Give a real-life example.

2. Statistical foundations.
Assume that the probability p(A) of the event A is 0.6, and the probability p(B) of the event B is 0.5.

What is the probability p(A & B) that both A and B hold if these events are independent? Provide the derivation of the corresponding formula.
What is the range of possible values of p(A & B) when we do not have any information about the dependence between A and B? Draw examples illustrating the possibilities of the smallest and the largest values from this range.
What is the probability p(A \/ B) that at least one of the events A or B holds if these events are independent? Provide the derivation of the corresponding formula.
What is the range of possible values of p(A \/ B) when we do not have any information about the dependence between A and B? Draw examples illustrating the possibilities of the smallest and the largest values from this range.
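
As a numerical cross-check for this problem, the following Python sketch evaluates the product formula for independent events and the Fréchet bounds for the case when the dependence is unknown. The function names are illustrative assumptions; the sketch does not replace the requested derivations and drawings.

```python
def and_independent(p_a, p_b):
    # For independent events, p(A & B) = p(A) * p(B).
    return p_a * p_b

def or_independent(p_a, p_b):
    # p(A \/ B) = p(A) + p(B) - p(A & B), with p(A & B) = p(A) * p(B).
    return p_a + p_b - p_a * p_b

def and_range(p_a, p_b):
    # Unknown dependence: smallest overlap when A and B are pushed apart,
    # largest overlap when one event is contained in the other.
    return max(0.0, p_a + p_b - 1.0), min(p_a, p_b)

def or_range(p_a, p_b):
    # Unknown dependence: smallest union when one event contains the other,
    # largest union when the events overlap as little as possible.
    return max(p_a, p_b), min(1.0, p_a + p_b)

p_a, p_b = 0.6, 0.5
print(and_independent(p_a, p_b), and_range(p_a, p_b))
print(or_independent(p_a, p_b), or_range(p_a, p_b))
```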

3. Mathematical techniques.

What is the Maximum Likelihood Method? Provide at least two examples of where this method is used. Describe one of these examples in detail.
What is bisection? Give a numerical example (two steps are enough) of how bisection can be used to find the solution to an equation, i.e., the value x for which f(x) = 0.
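
A minimal Python sketch of two bisection steps is given here for illustration. The equation f(x) = x² - 2 = 0 and the starting interval [1, 2] are assumptions chosen for this sketch; any continuous function with opposite signs at the endpoints would work the same way.

```python
def bisect(f, lo, hi, steps):
    # Bisection needs a sign change of f on [lo, hi].
    if f(lo) * f(hi) > 0:
        raise ValueError("f must have opposite signs at the endpoints")
    for _ in range(steps):
        mid = (lo + hi) / 2.0
        if f(lo) * f(mid) <= 0:
            hi = mid  # the root lies in the left half
        else:
            lo = mid  # the root lies in the right half
    return lo, hi

def f(x):
    # Hypothetical example equation: f(x) = x^2 - 2.
    return x * x - 2.0

# Step 1: midpoint 1.5, f(1.5) = 0.25 > 0, so the interval becomes [1, 1.5].
# Step 2: midpoint 1.25, f(1.25) = -0.4375 < 0, so it becomes [1.25, 1.5].
print(bisect(f, 1.0, 2.0, 2))
```

Each step halves the interval that is known to contain the solution, so k steps reduce the initial width by a factor of 2^k.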

Provide an example where both the Maximum Likelihood Method and bisection are used in estimating the uncertainty of the result of data processing.

4. Data fusion.
Let us assume that we have measured the same quantity with two different measurement instruments. The result of the first measurement is 1.2, the result of the second measurement is 0.8. Combine these two results into a single "fused" value, in the following two situations:

Probabilistic situation. In this situation, we assume that the measurement errors of both measuring instruments have 0 mean; the first instrument has standard deviation 0.2, the second has standard deviation 0.3.
Interval situation. In this situation, we have no information about the probabilities; we only know that the measurement error of the first instrument does not exceed 0.2, and the measurement error of the second instrument does not exceed 0.3.

Where do the formulas that you used come from? (No need for detailed derivations; just explain the main ideas.)
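
The two fusion rules can be sketched in Python as follows. This is only a sketch under stated assumptions: in the probabilistic situation the measurement errors are taken to be independent and normally distributed (in which case the Maximum Likelihood Method leads to inverse-variance weighting), and in the interval situation the fused interval is the intersection of the two individual intervals. The function names are illustrative.

```python
import math

def fuse_probabilistic(x1, sigma1, x2, sigma2):
    # Assumes independent normal errors: weights are inverse variances.
    w1 = 1.0 / sigma1 ** 2
    w2 = 1.0 / sigma2 ** 2
    fused = (w1 * x1 + w2 * x2) / (w1 + w2)
    fused_sigma = 1.0 / math.sqrt(w1 + w2)
    return fused, fused_sigma

def fuse_interval(x1, delta1, x2, delta2):
    # The actual value lies in both intervals, hence in their intersection.
    lo = max(x1 - delta1, x2 - delta2)
    hi = min(x1 + delta1, x2 + delta2)
    if lo > hi:
        raise ValueError("the two intervals are inconsistent")
    return lo, hi

print(fuse_probabilistic(1.2, 0.2, 0.8, 0.3))  # fused value close to 1.077
print(fuse_interval(1.2, 0.2, 0.8, 0.3))       # close to the interval [1.0, 1.1]
```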

5. Linearization.

For the function y = f(x₁, x₂) = x₁² - x₂, with measurement results x₁ = 1.0 and x₂ = 2.0 and measurement errors Δx₁ = 0.1 and Δx₂ = -0.2, find the error Δy as estimated by the linearization method, and compare this estimate with the actual value of this error. Compare the analytical expressions for the corresponding partial derivatives with the results of numerical differentiation.
For the same function and the same measurement results, assume that we only know the bounds Δ₁ = 0.1 and Δ₂ = 0.2 on the measurement errors. Use the monotonicity of the function f to find the exact range of the corresponding value y, and compare this range with the result of applying the linearized formula for the bound Δ on the error Δy of the result of data processing.
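
The computations in this problem can be checked with a short Python sketch. It assumes the convention that a measurement error is the measured value minus the actual value, so the actual inputs are 1.0 - 0.1 and 2.0 - (-0.2); the step h used for numerical differentiation is also an assumption of the sketch.

```python
def f(x1, x2):
    return x1 ** 2 - x2

x1_m, x2_m = 1.0, 2.0   # measurement results
dx1, dx2 = 0.1, -0.2    # measurement errors (measured minus actual, assumed)

# Analytical partial derivatives at the measured point.
c1 = 2.0 * x1_m
c2 = -1.0

# Numerical differentiation with a small step h (assumed value).
h = 1e-6
c1_num = (f(x1_m + h, x2_m) - f(x1_m, x2_m)) / h
c2_num = (f(x1_m, x2_m + h) - f(x1_m, x2_m)) / h

# Linearized estimate versus the actual error of the result.
dy_lin = c1 * dx1 + c2 * dx2                               # about 0.4
dy_actual = f(x1_m, x2_m) - f(x1_m - dx1, x2_m - dx2)      # about 0.39

# Interval case: f increases in x1 (for x1 > 0) and decreases in x2.
D1, D2 = 0.1, 0.2
y_lo = f(x1_m - D1, x2_m + D2)                             # about -1.39
y_hi = f(x1_m + D1, x2_m - D2)                             # about -0.59
D_lin = abs(c1) * D1 + abs(c2) * D2                        # about 0.4

print(c1_num, c2_num, dy_lin, dy_actual, (y_lo, y_hi), D_lin)
```

The linearized bound gives the range from about -1.4 to -0.6 around the value -1.0, slightly different from the exact range because the quadratic term is ignored.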

6. Uncertainty in data processing: computational aspects.
For the formulas for computing the uncertainty of the result of data processing, explain how the computational complexity (= number of computational steps) depends on the choice of the parameters h_i used in numerical differentiation, and for which choice the computational complexity is the smallest:

for the case of statistical uncertainty, when we know the standard deviations σ_i of the corresponding measurement errors, and
for the case of interval uncertainty, when we only know the upper bounds Δ_i of the corresponding measurement errors.

7. Uncertainty in data processing: Monte-Carlo method.
Explain why the Monte-Carlo method is useful in estimating the uncertainty of the result of data processing, and for what number of inputs it is useful:

for the case of statistical uncertainty, and
for the case of interval uncertainty.

Provide a numerical example of the number of iterations that are needed to achieve a given accuracy.
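
For the case of statistical uncertainty, the idea can be sketched in Python as follows. The sketch makes several assumptions that are not part of the problem statement: the test function is the one from Problem 5, the standard deviations 0.1 and 0.2 are hypothetical, the measurement errors are independent and normally distributed, and the iteration count uses the rule of thumb that the relative standard deviation of the estimate after N iterations is about 1/sqrt(2N).

```python
import math
import random

def f(x1, x2):
    # Function from Problem 5, reused here as a test case.
    return x1 ** 2 - x2

def monte_carlo_sigma(f, x_meas, sigmas, n_iter, seed=0):
    # Simulate measurement errors, and average the squared errors of the result.
    rng = random.Random(seed)
    y_meas = f(*x_meas)
    total = 0.0
    for _ in range(n_iter):
        simulated = [x - rng.gauss(0.0, s) for x, s in zip(x_meas, sigmas)]
        dy = y_meas - f(*simulated)
        total += dy * dy
    return math.sqrt(total / n_iter)

def iterations_needed(rel_accuracy):
    # Rule of thumb (assumed): relative accuracy is about 1 / sqrt(2 N).
    return math.ceil(1.0 / (2.0 * rel_accuracy ** 2))

print(monte_carlo_sigma(f, [1.0, 2.0], [0.1, 0.2], 10000))
print(iterations_needed(0.2))
```

Under this rule of thumb, the number of calls to f depends only on the desired accuracy and not on the number of inputs, whereas numerical differentiation needs one additional call to f per input.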

8. Estimating reliability and trust: Monte-Carlo method.

Explain why the Monte-Carlo method is useful in estimating the degree of trust.
Explain why, for very reliable components, we cannot directly use the Monte-Carlo method and instead need re-scaling. Describe the main idea of the re-scaling and how it helps.

9. Reliability.
For each of the following two cases:

case when f trusts t with probability p₁ = 1 - Δp₁ and t trusts s with probability p₂ = 1 - Δp₂;
case when f has two reasons for trusting s: one holds with probability p₁ = 1 - Δp₁ and the other with probability p₂ = 1 - Δp₂;

estimate two values:

the worst-case probability Δp_w that f does not trust s, and
the probability Δp_i that f does not trust s under the independence assumption.
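
One possible reading of the two cases is sketched below in Python. It assumes that in the first case f trusts s only if both links of the chain hold, and that in the second case f trusts s if at least one of the two reasons holds; the worst-case values then follow from the Fréchet bounds. The function names and the sample values 0.01 and 0.02 are hypothetical.

```python
def chain_distrust(dp1, dp2):
    # Chain f -> t -> s: trust fails if at least one link fails.
    worst = min(1.0, dp1 + dp2)                  # failures never coincide
    indep = 1.0 - (1.0 - dp1) * (1.0 - dp2)      # equals dp1 + dp2 - dp1 * dp2
    return worst, indep

def two_reasons_distrust(dp1, dp2):
    # Two reasons: trust fails only if both reasons fail.
    worst = min(dp1, dp2)                        # failures always coincide
    indep = dp1 * dp2
    return worst, indep

print(chain_distrust(0.01, 0.02))
print(two_reasons_distrust(0.01, 0.02))
```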

10. Describe the contents of one of the class projects -- different from your own project.