Test 1 for CS 5383, June 21, 2006

Name: ___________________________________________________________________

Please use as many sheets of paper as you need; do not forget to place your name on all these sheets.

1-2. Security and Privacy

1. One of the methods proposed to preserve privacy in statistical databases is to only allow queries that combine at least m records (where m is a given threshold). Illustrate the limitations of this method on the following example of a database that contains age and GPA of several students; in this example, m = 5.

Based on this information, there is exactly one student who is 22. Use the above answers to the query to find this student's GPA. (The possibility to reconstruct this student's GPA shows that this method does not provide a perfect protection of privacy.)

2. Another method of protecting privacy is adding a random error to the answer. What are the limitations of this method? How can we avoid the limitations of these privacy protection methods? Hint:instead of storing, e.g., the exact age of each student, we store ...

3--10. Uncertainty

3. Formulate the general problem of estimating the the upper bound D on the approximation error of result of data processing: what is known, and what we need to compute.

4-6. For the case when the data processing algorithm is f(x1,x2) = x1 - 2*x2, the measured values are X1 = 3.0 and X2 = 1.0, and the upper bounds on the measurement errors are D1 = D2 = 0.1, find the bound on D by using the following three methods:

4. use monotonicity to find the exact range and then, the value D;

5. use analytical differentiation;

6. use numerical differentiation.

7. Explain where bisection is used in a Monte-Carlo approach of estimating the upper bound D on the approximation error. What is the advantage of the Monte-Carlo approach? Why cannot we simply use numerical differentiation?

8. Use bisection to find the solution to the equation x^2=46 on the interval [0,8]. (It is sufficient to find the solution with an accuracy of 1.) Explain where bisection is used in a Monte-Carlo approach of estimating the upper bound D on the approximation error of result of data processing.

Turn over, please.
9-10. For the case when the data processing algorithm is f(x1,x2) = x1 - 2*x2, the measured values are X1 = 3.0 and X2 = 1.0, and the standard deviations of the measurement errors are s1 = 0.03, s2 = 0.02, find the standard deviation s of y - Y by using the following two methods:

9. use analytical differentiation;

10. use numerical differentiation.