## Homework assignments, CS 5353, Fall 2008

1. (Due September 2) Write a program that simulates normally distributed random variables with 0 mean and standard deviation 1. Check that the resulting numbers indeed have the right mean and the right standard deviation.

2. (Due September 9) Program a simulation of data fusion, and compare the accuracy of the simulation results with the theoretical estimate for this accuracy. Specifically:

• Assume some actual value of the measured quantity, and pick several standard deviations.
• Simulate measurement results by using the Gaussian random number generator that we developed in Problem 1.
• Then use the algorithm for data fusion to fuse these results.
• Repeat this process several times and compute the standard deviation of the fused result from the actual value.
• Compare this empirical standard deviation with the formula for the standard deviation of the results of data fusion.

3. (Due September 16)

a) Illustrate, on the example of the function y = x12 + x22, with the nominal values x1 = 1.0 and x2 = 2.0 and measurement errors Δx1 = 0.1 and Δx2 = -0.1, what will be the error Δy as estimated by the linearization method, and how this estimated error compares with the actual value of this error. Perform computations by hand on a sheet of paper.

b) Answer the same question for y = x1/x2. You can use a calculator or a computer for this part of the assignment.

4. (Due September 18, for extra credit): write down a detailed derivation of the formula for the standard deviation of the result of data processing.

5. (Due September 23)

a) Write a code that, given a data processing function f(x1,...,xn), measurement results X1, ..., Xn, and standard deviations σ1, ..., σn of the corresponding measurement errors, estimates the standard deviation of the error of the resulting indirect measurement. Make sure that this code does not use any specific function, keep f as a generic name, so that we will be able to use for any function. Make sure that the code does not use any specific value n: use arrays of X and arrays of σ as input.

b) Check that this code is correct by running a computer simulation of the measurement and data processing process:

• compute the actual value y = f(x1,...,xn);
• simulate measurement errors Δxi with given standard deviations σi;
• simulate measurement results Xi = xi + Δxi;
• simulate the result Y = f(X1,...,Xn) of data processing;
• estimate the simulated error Δy = Y - y.
Repeat this procedure several times, and use the resulting values Δy to estimate the desired standard deviation σ.

6. (Due September 25) Write a code that, given a data processing function f(x1,...,xn), measurement results X1, ..., Xn, and standard deviations σ1, ..., σn of the corresponding measurement errors, uses the Monte-Carlo algorithm described in the class (and in the handouts) to estimate the standard deviation of the error of the resulting indirect measurement. Make sure that this code does not use any specific function, keep f as a generic name, so that we will be able to use for any function. Make sure that the code does not use any specific value n: use arrays of X and arrays of σ as input.

This algorithm is as follows:

• compute the result Y = f(X1,...,Xn) of data processing;
• simulate measurement errors Δxi with given standard deviations σi;
• simulate actual values xi = Xi - Δxi;
• simulate the actual value y = f(x1,...,xn) of data processing;
• estimate the simulated error Δy = Y - y.
Repeat this procedure several times, and use the resulting values Δy to estimate the desired standard deviation σ.

7. (Due October 21) On a simple example of a function which is monotonic (increasing or decreasing) with respect to each of its variables, compute the range of this function on given intervals, and compute the actual result with the results of linearized computations.

8. (Due October 21)

a) Write a code that, given a data processing function f(x1,...,xn), measurement results X1, ..., Xn, and upper bounds Δ1, ..., Δn on the (absolute values of the) corresponding measurement errors, estimates the upper bound on the error of the resulting indirect measurement. Make sure that this code does not use any specific function, keep f as a generic name, so that we will be able to use for any function. Make sure that the code does not use any specific value n: use arrays of X and arrays of Δ as input.

b) Check that this code is correct by selecting, say, 10 equally spaced points within each interval [Xi - Δi, Xi + Δi], computing the value of f for all possible combinations of these points, and finding the smallest and the largest of these values.

9. (Due October 28) Write a code that implements the Cauchy-based Monte-Carlo algorithm for estimating the upper bound on the error of the result of data processing (this algorithm is given in the handout).

10. (Due November 4) Write a code that computes the uncertainty caused by the possible un-reliability ("mistrust") of the measurement results. We start with the set S of measurement results and the corresponding locations. This set contains m values vi of a certain quantity measured at m different spatial locations, with coordinates xi and yi.

Based on this set S of measurement results, we interpolate the values from the measurement locations to the points from a rectangular grid, i.e., to the points with integer coordinates x = 0, 1, 2, ..., and y = 0, 1, ... In other words, based on the set S of measurement results, we estimate the value of our quantity at all the points from the grid. In your program, feel free to assume that we have a 10 x 10 grid, i.e., a grid formed by points (x,y) with x = 0, 1, ..., 9 and y = 0, 1, ..., 9.

As an interpolation algorithm, we use the 2-Nearest Neighbor algorithm (2-NN). In this algorithm, as a value of the desired quantity at a point (x,y) from the grid, we take the arithmetic average of the values vi in two locations (xi,yi) which are the closest to the point (x,y). In the algorithm, we will apply this 2-NN algorithm not only to the original set of measurement results, but also to its subsets.

Let us now assume that for each value vi, we also know the probability pi that this values can be trusted. Your program should use the Monte-Carlo simulation technique to estimate, for each grid point, the standard deviation of the interpolated value caused by the possible mistrust. Specifically, for a large number of iterations k = 1, ..., N, repeat the following procedure:

• First, we create a subset S(k) of "correct" measurement results as follows. Each of m measurement locations (xi,yi) is either included or not included into this subset S(k). This inclusion or not-inclusion is done "at random":
• the i-th location (and the corresponding measurement result) is included into the subset with the probability pi;
• the i-th location is not included into the subset with the remaining probability 1 - pi.
(See below an explanation of how to include or not include a location with a given probability.)
• After the subset S(k) of measurement results (and measurement locations) is formed, we apply the same 2-NN interpolation algorithm to this new subset S(k). As a result, for each grid point (x,y), we compute the interpolated value v(k)(x,y).
After N iterations of this procedure, for each grid point (x,y), we have N different interpolated values v(1)(x,y), ..., v(N)(x,y) coming from different iterations. We can then compute the arithmetic average m(x,y) of these N values, as

m(x,y) = (v(1)(x,y) + ... + v(N)(x,y))/N,

and compute the standard deviation s(x,y) in the usual way -- as the square root of the expression

s2(x,y) = ((v(1)(x,y) - m(x,y))2 + ... + (v(N)(x,y) - m(x,y))2)/N.

This standard deviation s(x,y) describes the uncertainty at this particular grid point (x,y) caused by the possible un-reliability of the measurement results vi.

Reminder: one way to include a location (xi,yi) with the given probability pi is to run the standard random number generator (which generates numbers uniformly distributed on the interval [0,1]). Then:

• if the resulting random value r is smaller than pi, we include the i-th location in the k-th subset;
• otherwise, if the random value r is not smaller than pi, then we do not include the i-th location in the k-th subset.
Here, the probability that we include the i-th location is equal to the probability that the random value r is smaller than pi, i.e., to the probability that r belongs to the interval [0, pi). Since r is distributed according to the uniform distribution, this probability is equal to the width of the interval [0, pi), i.e., to the desired value pi. Thus, if we follow this procedure, we include the i-th location with the probability pi.

As a result, for this same procedure, the probability that we do not include the i-th location is equal to 1 - pi.

11. (Due November 6) In class, we have shown that if we know the probabilities p(A) and p(B) of the events A and B, but we do not know whether they are dependent or not, then the probability p(A&B) can take any value from max(p(A) + p(B) -1,0) to min(p(A),p(B)). The largest value min(p(A),p(B)) occurs when one of the events is a subset of another one, the smallest possible value occurs when the largest possible part of A is outside of B: all of it if p(A) + p(B) <= 1 and as much as possible otherwise. As a homework, pick any two values p(A) and p(B), compute the smallest and the largest values of p(A&B), and show, e.g., on the example of subintervals from the interval [0,1], when these values are attained.

12. (Due November 11) Write a program that performs the re-scaled Monte-Carlo approach to solve the following problem. We are given a directed graph, in which a probability is assigned to every edge. The vertices in this graph describe agents, and the probability p(e) assigned to the edge e that connects vertices U and V is the probability which which the agent U (directly) trusts the agent V. Direct trusts lead to indirect ones: e.g., if U trusts V, and V trusts W, then U trusts W. We assume that all the trusts are independent. We are given two agents A and B, and we want to find the trust p with which A trusts B. Equivalently, we want to find the probability Δp = 1 - p that A does not trust B.

In principle, we can solve this problem by using standard Monte-Carlo simulations. Specifically, N times we form a sub-graph of the original graph by including each vertex e with the corresponding probability p(e). We then estimate the probability that A trusts B as the probability that in the resulting sub-graph, there is a path from A to B -- i.e., as the ratio N(A --> B)/N, where N(A --> B) is the number of simulations in which there is a path from A to B.

However, when the probabilities p(e) are close to 1, the resulting probability p is also close to 1. To get a meaningful result, we must find p with a high accuracy -- and this requires unrealistically many iterations N (e.g., if the probability of mistrust is 0.1%, we need N =106 iterations).

To speed up this process, we use re-scaling. Namely, instead of the original probabilities of mistrust Δp(e) = 1 - p(e), we take re-scaled probabilities p'(e) = 1 - λ * Δp(e), where λ is chosen in such a way that λ * Δp(e) is closer to 1 -- e.g., as λ = 0.1/(max Δp(e)). Based on these new probabilities p'(e), we apply the Monte-Carlo method and find the probability p' that A trusts B, and the remaining probability Δp' = 1 - p' that A does not trust B.

After that, we take new re-scaled probabilities p''(e) = 1 - 2λ * Δp(e) and find the new value Δp''. We have mentioned, in class, that in a good approximation, Δp' = Δp * λd for some natural number d, and that Δp'' = Δp * (2λ)d. Thus, Δp''/Δp' = 2d and hence, we can find d as the integer which is the closest to the logarithm log2(Δp''/Δp'). Once we know d, we can estimate Δp as Δp = Δp'/λd.