Please use as many sheets of paper as you need; do not forget to place your name on all these sheets.
1-2. Security and Privacy
1. One of the methods proposed to preserve privacy in statistical databases is to only allow queries that combine at least m records (where m is a given threshold). Illustrate the limitations of this method on the following example of a database that contains age and GPA of several students; in this example, m = 5.
2. Another method of protecting privacy is adding a random error to the answer. What are the limitations of this method? How can we avoid the limitations of these privacy protection methods? Hint:instead of storing, e.g., the exact age of each student, we store ...
3--10. Uncertainty
3. Formulate the general problem of estimating the the upper bound D on the approximation error of result of data processing: what is known, and what we need to compute.
4-6. For the case when the data processing algorithm is f(x1,x2) = x1 - 2*x2, the measured values are X1 = 3.0 and X2 = 1.0, and the upper bounds on the measurement errors are D1 = D2 = 0.1, find the bound on D by using the following three methods:
4. use monotonicity to find the exact range and then, the value D;
5. use analytical differentiation;
6. use numerical differentiation.
7. Explain where bisection is used in a Monte-Carlo approach of estimating the upper bound D on the approximation error. What is the advantage of the Monte-Carlo approach? Why cannot we simply use numerical differentiation?
8. Use bisection to find the solution to the equation x^2=46 on the interval [0,8]. (It is sufficient to find the solution with an accuracy of 1.) Explain where bisection is used in a Monte-Carlo approach of estimating the upper bound D on the approximation error of result of data processing.
9. use analytical differentiation;
10. use numerical differentiation.