## Test 1 for CS 5383,
June 21, 2006

Name: ___________________________________________________________________
Please use as many sheets of paper as you need; do not forget to
place your name on all these sheets.

1-2. *Security and Privacy*

1. One of the methods proposed to preserve privacy in statistical
databases is to only allow queries that combine at least *m*
records (where *m* is a given threshold). Illustrate the
limitations of this method on the following example of a database
that contains age and GPA of several students; in this example,
*m* = 5.

- When a user asks for the number of students
who are younger than 22 and for the average GPA of these students,
the answer is 5 students and 3.0.
- When a user asks for the number of students
who are younger than 23 and for the average GPA of these students,
the answer is 6 students and 2.8.

Based on this information, there is exactly one student who is 22.
Use the above answers to the query to find this student's GPA. (The
possibility to reconstruct this student's GPA shows that this method
does not provide a perfect protection of privacy.)
2.
Another method of protecting privacy is adding a random error to the
answer. What are the limitations of this method? How can we avoid the
limitations of these privacy protection methods? *Hint:*instead
of storing, e.g., the exact age of each student, we store ...

3--10. *Uncertainty*

3. Formulate the general problem of estimating the
the upper bound D on the approximation error of result of data processing:
what is known, and what we need to compute.

4-6. For the case when the data processing algorithm is f(x1,x2) = x1 - 2*x2,
the measured values are X1 = 3.0 and X2 = 1.0, and the upper bounds on
the measurement errors are D1 = D2 = 0.1, find the bound on D by using
the following three methods:

4. use monotonicity to find the exact range and then, the value D;

5. use analytical differentiation;

6. use numerical differentiation.

7. Explain where bisection is used in a Monte-Carlo approach of
estimating the upper bound D on the approximation error. What is the
advantage of the Monte-Carlo approach? Why cannot we simply use
numerical differentiation?

8. Use bisection to find the solution to the equation x^2=46 on the
interval [0,8]. (It is sufficient to find the solution with an accuracy
of 1.) Explain where bisection is used in a Monte-Carlo approach of
estimating the upper bound D on the approximation error of result
of data processing.

*Turn over, please.*
9-10. For the case when the data processing algorithm is f(x1,x2) = x1 - 2*x2,
the measured values are X1 = 3.0 and X2 = 1.0, and the standard deviations
of
the measurement errors are s1 = 0.03, s2 = 0.02, find the standard
deviation s of y - Y by using
the following two methods:
9. use analytical differentiation;

10. use numerical differentiation.