## Data Processing under Security and Privacy Homeworks for the course CS 4365/CS 5354, Summer 2016

1. (Due June 8) Similarly to how we used Newton's method to design algorithms for computing square root and cubic root, design an algorithm for computing the fourth order root.

2. (Due June 10) Write a method (program) for solving a system of linear equations. Turn in a printout of the code and a printout of an example on which you tested your method.

3. (Due June 8) Use the algorithm for computing 1/b that we had in class (and that is implemented in the computers) to perform the few first steps of computing 1/1.25.

4. (Due June 8) Use numerical differentiation to compute the derivative of the function x3 − a when a = 2 and x = 1.

5. (Due June 9) Use Newton's method to solve the following system of non-linear equations: x1 * x2 = 2, x12 − x22 = −3. One iteration is good enough, two iterations for extra credit.

7. (Due June 30) Find the point closest the origin on the line x1 + x2 = 1. In other words, find the values x1 and x2 for which the sum (x1)2 + (x2)2 attains the smallest possible value under the constraint x1 + x2 = 1.

8. (Due July 5) Similarly to what we did in class for selecting the optimal box for interval privacy protection, find the standard deviations σi that describe the optimal probabilistic disruption. In precise terms, find the values σ1, ..., σn for which the sum σ2 = (c1)2 * (σ1)2 + ... + (cn)2 * (σn)2 is the smallest possible under the constraint ρ(x) * 2n * σ1 * ... * σn = k. Here, ci is the partial derivative of the desired statistical characteristic C(x1, ..., xn) with respect to the i-th variable xi.

9. (Due July 5, 2016) Use linearization techniques to estimate the range of the function f(x1, x2) = (x1)2 + 4 * x1 * x2 + 4 * (x2)2 when x1 is in the interval [−1.2, −0.8] and x2 is in the interval [0.9, 1.1].

10. (Due July 5, 2016) Use naive (straightforward) interval computations to estimate the range from Problem 9.

11. (Due July 6, 2016) Write a program that, given a function f(x1, ..., xn) and n intervals, uses linearization techniques to estimate the range of the given function on the given intervals.

12. (Due July 11, 2016) Use the algorithm that we discussed in class to eliminate outliers from the following database:

• we have 100 records each equal to 1.8;
• we have 100 records each equal to 2.2;
• we have one record with value 10; and
• we have one record with value 1,000.
The algorithm is straightforward:
• we compute the sample mean μ = (x1 + ... + xn) / n,
• we compute the variance V = ((x1 − μ)2 + ... + (xn − μ)2) / (n − 1), and the sample standard deviation σ as the square root of V;
• then, we dismiss all the records which are not in the 2-sigma interval [μ − 2σ, μ − 2σ] as outliers;
• after that, we repeat the same procedure for the new database, with some outliers deleted: we re-compute μ and σ and, if needed, eliminate values which are not in the new 2-sigma interval, etc.;
• this process continues until, at some stage, no more outliers are eliminated.

Comment: In this homeworks, we selected 2-sigma intervals, but we could select 3-sigma, 6-sigma, etc.

13. (Due July 11, 2016) Use the general formula to find the sizes of the cell that provides the smallest possible inaccuracy in computing the covariance between height and weight; assume that:

• we are looking for k-anonymity with k = 5,
• we deal with a population of N = 20,000 students,
• height is uniformly distributed on the interval [150, 180];
• weight is uniformly distributed on the interval [50, 90]; and
• we are interested in the cell that covers a record with height x1 = 170 cm and weight x2 = 60 kg.

14. (Due July 14, 2016) Use the general algorithm to find the sizes of the call that provides the smallest possible inaccuracy in computing the covariance between height and weight; assume that:

• we are looking for k-anonymity with k = 5 and l-diversity with l = 3 and ε1 = ε2 = 0.1;
• we deal with a population of N = 20,000 students,
• height is uniformly distributed on the interval [145, 185];
• weight is uniformly distributed on the interval [50, 90]; and
• we are interested in the cell that covers a record with height x1 = 170 cm and weight x2 = 80 kg.

15. (Due July 14, 2016) How much privacy do we lose if someone detects the second digit x in the height 1xy? the third digit y? Assume that the height is between 145 and 185 cm.

16. (Due July 15, 2016) Use the efficient raising-to-the-power algorithm that we learned in class to compute 52016 mod 11.

Reminder: To raise a number n to the power p mod m, we first represent p in the binary code, as the sum of powers of two. Then, we compute n2 = n * n mod m, n4 = n2 * n2 mod m, n8 = n4 * n4 mod m, until we get all needed powers of two. Then, we multiply the resulting powers of two mod m.

Example: suppose that we want co compute 711 mod 9. In binary, 11 is 10112, i.e., 11 = 8 + 2 + 1. Thus, 711 = 78 * 72 * 71. Following the above algorithm, we first compute:

• 72 = 7 * 7 = 49 = 4 mod 9;
• 74 = 72 * 72 = 4 * 4 = 16 = 7 mod 9;
• 78 = 74 * 74 = 7 * 7 = 49 = 4 mod 9.
After that, we compute 711 = 78 * 72 * 71 = 4 * 4 * 7 mod 9. Here, 4 * 4 = 16 = 7 mod 9, so (4 * 4) * 7 = 7 * 7 = 49 = 4 mod 9. So, the answer is that 711 mod 9 = 4.

17. (Due July 15, 2016) Use Euclid's algorithm to compute the greatest common divisor gcd(2016, 361).

Reminder: Euclid's algorithm means that to compute gcd(a, b), where a > b, we repeatedly take into account that gcd(a, b) = gcd(b, a % b). We stop when we get remainder 0, in which case the last non-zero number id the desired gcd.

Example:

• To find gcd(128, 26), we first compute the remainder 128 % 26 = 24. This means that the desired gcd is equal to gcd(26, 24).
• Dividing once again, we get 16 % 24 = 2, this means that the desired gcd is equal to gcd(24, 2).
• Here, 24 % 2 = 0, so the last non-zero remainder 2 is the answer: gcd(128, 26) = 2.

18. (Due July 18, 2016) Use Euclid's algorithm to find a number d for which d * 50 mod 121 = 1.

Solution: First, we use Euclid's algorithm to check that indeed gcd(121, 50) = 1:

• 121 = 2 * 50 + 21;
• 50 = 2 + 21 = 8;
• 21 = 2 * 8 + 5;
• 8 = 1 * 5 + 3;
• 5 = 1 * 3 + 2;
• 3 = 1 * 2 + 1.
The last remainder is 1, so indeed gcd(121, 50) = 1.

Next, for each of the remainders 21, 8, etc., we represent it as a linear combination of the original numbers 121 and 50, until we get such a representation for remainder 1:

• 21 = 121 − 2 * 50;
• 8 = 50 − 2 * 21 = 50 − 2 * (121 − 2 * 50) = 50 − 2 * 121 + 4 * 50 = 5 * 50 − 2 * 121;
• 5 = 21 − 2 * 8 = (121 − 2 * 50) − 2 * (5 * 50 − 2 * 121) = 121 − 2 * 50 − 10 * 50 + 4 * 121;
so, 5 = 5 * 121 − 12 * 50;
• 3 = 8 − 5 = (5 * 50 − 2 * 121) − (5 * 121 − 12 * 50) = 17 * 50 − 7 * 121;
• 2 = 5 − 3 = (5 * 121 − 12 * 50) − (17 * 50 − 7 * 121) = 12 * 121 − 29 * 50;
• 1 = 3 − 2 = (17 * 50 − 7 * 121) − 12 * 121 − 29 * 50) = 46 * 50 − 19 * 121.
Here, 46 * 50 − 19 * 121 = 1, so 46 * 50 = 1 mod 121. The answer is d = 46.

19. (Due July 19, 2016) Show, step-by-step, how the RSA algorithm works, on the following example. We select p = 5, q = 7, and e = 11, and want to encode and decode the message m = 12. What you need to do:

• apply the general algorithm to find the secret code d that will be needed for decoding;
• then apply the public code values n and e to the message, to get the encoded message c; and
• finally, apply the secret code to the encoded message to recover the original message m.
Reminder: in preparation to the RSA encoding:
• first, we select two prime numbers p and q; these prime numbers are kept secret, but their product n = p * q is the first part of the public code;
• we then compute the auxiliary product φ(n) = (p − 1) * (q − 1), and select a number e for which gcd(e, φ(n)) = 1; this number e will be the second part of the public code;
• finally, we use the Euclid's algorithm to find a number d for which d * e = 1 mod φ(n) (i.e., for which the remainder d * e % φ(n) is equal to 1); this number d is a secret code -- that is used to decode encoded messages.
Once the public code is selected, the original message m is replaced by the encoded message which is computed as c = me mod n. On receiving the encoded message c, the recipient (who knows the secret code d) can decode the secret message as m = cd mod n.

20. (Due July 20, 2016) Use the Chinese remainder theorem algorithm to find a number n for which:

• n mod 7 = 1 and
• n mod 11 = 6.