1-2. In class, we learned the following algorithm for eliminating
outliers:
1. Use this algorithm to eliminate outliers from the following
database:
2. Explain how outlier elimination is used in computer
security.
3. In class, we derived a general formula for the optimal sizes
Δ
i of a privacy-enhancing box for which the
inaccuracy in the resulting estimation of a statistical
characteristic C is the smallest possible under the condition of
k-anonymity: Δ
i = c / a
i, where
a
i is the absolute value of the partial derivative of C
with respect to the i-th variable x
i, and c = (1/2)
*
√(k * a1
* a2 * ...) / ρ(x) , where
ρ(x) is the data density, i.e., number of records per unit
volume.
Use the general formula to find the sizes of the cell that
provides the smallest possible inaccuracy in computing the
covariance between weight and age for adults. Assume that:
- we are looking for k-anonymity with k = 10,
- we deal with
a population of El Paso, N = 700,000 students,
- age is
uniformly distributed on the interval [20, 90];
- weight is
uniformly distributed on the interval [50, 90]; and
- we are
interested in the cell that covers a record with age x1
= 65 and weight x2 = 80 kg.
For covariance, the
derivative with respect to x
1 is equal to
(x
2 − μ
2) / N and the derivative
with respect to x
2 is equal to (x
1 −
μ
1) / N, where μi are the sample
means of the corresponding values.
4. What if in Problem 3, in addition to k-anonymity, we also
require l-diversity, with l = 2, and the thresholds
ε
1 = 0.2 and ε
2 = 0.01?
Reminder:
- If for the k-anonymous solution, we have
2Δi ≥ l
* εi, then l-anonymity is also satisfied.
- If for some of the variables xi, this inequality
is not satisfied, then for this variable, we select:
- for
this variable, we select Δi = (1/2) * l
* εi, and
- for other variables, we select Δj from the
condition that the cell contain k records, i.e., that ρ(x)
* 2n * Δ1 * Δ2 * ...
= k.
5. What percentage of privacy do we lose if someone detects the
first digit x in the age xy? the second digit y? Assume that the
age is between 20 and 90.
6. Based on the Problem 5, it may look like the last decimal digit
-- the remainder of dividing a number by 10 -- is irrelevant, and
should not be protected much. Explain that it is not so: what if
someone knows the remainder by 10 and the remainder in the hex
system (when divided by 16) -- can we then uniquely determine the
adult age (from 20 to 90)? Explain your answer.
7. Use the efficient raising-to-the-power algorithm to compute
5
21 mod 11. Where is this algorithm used in RSA
coding?
8-9. Use Euclid's algorithm to compute the greatest common divisor
gcd(11, 21). Then find a number d for which d * 11 mod 21 = 1.
Where is this used in RSA coding?
10. Briefly describe your project for this class, and what you
have done so far for this project.