2. (Due September 9) Program a simulation of data fusion, and compare the accuracy of the simulation results with the theoretical estimate for this accuracy. Specifically:

- Assume some actual value of the measured quantity, and pick several standard deviations.
- Simulate measurement results by using the Gaussian random number generator that we developed in Problem 1.
- Then use the algorithm for data fusion to fuse these results.
- Repeat this process several times and compute the standard deviation of the fused result from the actual value.
- Compare this empirical standard deviation with the formula for the standard deviation of the results of data fusion.
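The steps above can be sketched as follows, assuming the standard precision-weighted fusion formula from class (fused value = Σ(X_i/σ_i²)/Σ(1/σ_i²), with standard deviation 1/sqrt(Σ 1/σ_i²)); the actual value 5.0 and the three σ's are arbitrary choices, and `random.gauss` stands in for the Gaussian generator from Problem 1:

```python
import random
import math

def fuse(X, sigmas):
    """Precision-weighted average of the measurement results."""
    w = [1.0 / s**2 for s in sigmas]
    return sum(wi * xi for wi, xi in zip(w, X)) / sum(w)

def fused_sigma(sigmas):
    """Theoretical standard deviation of the fused result."""
    return 1.0 / math.sqrt(sum(1.0 / s**2 for s in sigmas))

actual = 5.0                 # assumed actual value of the measured quantity
sigmas = [0.1, 0.2, 0.05]    # several assumed standard deviations

N = 100_000
sq = 0.0
for _ in range(N):
    # simulate measurement results; random.gauss plays the role of
    # the Problem 1 Gaussian random number generator
    X = [random.gauss(actual, s) for s in sigmas]
    sq += (fuse(X, sigmas) - actual) ** 2
empirical = math.sqrt(sq / N)

print(empirical, fused_sigma(sigmas))   # the two values should be close
```

For a large N, the printed empirical standard deviation should agree with the theoretical formula to within a fraction of a percent.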

3. (Due September 16)

a) Illustrate, on the example of the function y = x_{1}^{2} + x_{2}^{2}, with the nominal values x_{1} = 1.0 and x_{2} = 2.0 and measurement errors Δx_{1} = 0.1 and Δx_{2} = -0.1, what the error Δy estimated by the linearization method will be, and how this estimated error compares with the actual value of this error. Perform the computations by hand on a sheet of paper.

b) Answer the same question for y = x_{1}/x_{2}. You can use a
calculator or a
computer for this part of the assignment.
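As a quick way to double-check the hand computation (not a substitute for it), both parts can be verified numerically; the derivatives below are just the partial derivatives of y evaluated at the nominal values:

```python
# A quick numerical double-check of the hand computation.
x1, x2 = 1.0, 2.0
dx1, dx2 = 0.1, -0.1

# Part a): y = x1^2 + x2^2; linearization uses dy/dx1 = 2*x1, dy/dx2 = 2*x2.
lin_a = 2 * x1 * dx1 + 2 * x2 * dx2                        # estimated error: -0.2
act_a = ((x1 + dx1)**2 + (x2 + dx2)**2) - (x1**2 + x2**2)  # actual error: -0.18
print(lin_a, act_a)

# Part b): y = x1/x2; linearization uses dy/dx1 = 1/x2, dy/dx2 = -x1/x2^2.
lin_b = dx1 / x2 - (x1 / x2**2) * dx2                      # estimated error: 0.075
act_b = (x1 + dx1) / (x2 + dx2) - x1 / x2                  # actual error: ~0.0789
print(lin_b, act_b)
```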

4. (Due September 18, for extra credit): write down a detailed derivation of the formula for the standard deviation of the result of data processing.
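The derivation rests on the linearization step; as a sketch of the starting point only (the full argument, including where the independence of the errors is used, is what you are asked to write down):

```latex
\Delta y = f(x_1 + \Delta x_1, \ldots, x_n + \Delta x_n) - f(x_1, \ldots, x_n)
  \approx \sum_{i=1}^{n} c_i \,\Delta x_i,
  \qquad c_i = \frac{\partial f}{\partial x_i};
\qquad
\sigma^2 = E\big[(\Delta y)^2\big] \approx \sum_{i=1}^{n} c_i^2\,\sigma_i^2 .
```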

5. (Due September 23)

a) Write a code that, given a data processing function f(x_{1},...,x_{n}), measurement results X_{1}, ..., X_{n}, and standard deviations σ_{1}, ..., σ_{n} of the corresponding measurement errors, estimates the standard deviation of the error of the resulting indirect measurement. Make sure that this code does not use any specific function: keep f as a generic name, so that we will be able to use it for any function. Make sure that the code does not use any specific value of n: use arrays of X and arrays of σ as input.
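One possible sketch of such a code, assuming the linearization formula σ ≈ sqrt(Σ (∂f/∂x_{i} · σ_{i})²), with the partial derivatives estimated by numerical differentiation (the step size h is an implementation choice, and the sample f at the bottom is only an example):

```python
import math

def sigma_estimate(f, X, sigmas, h=1e-6):
    """Linearization: sigma^2 = sum over i of (df/dx_i * sigma_i)^2,
    with the partial derivatives estimated numerically."""
    y = f(X)
    total = 0.0
    for i in range(len(X)):
        Xh = list(X)
        Xh[i] += h
        c_i = (f(Xh) - y) / h          # forward-difference partial derivative
        total += (c_i * sigmas[i]) ** 2
    return math.sqrt(total)

f = lambda x: x[0]**2 + x[1]**2   # sample function; the code itself is generic
print(sigma_estimate(f, [1.0, 2.0], [0.1, 0.1]))  # ≈ sqrt(0.2) ≈ 0.447
```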

b) Check that this code is correct by running a computer simulation of the measurement and data processing process:

- start with actual values x_{1}, ..., x_{n};
- compute the actual value y = f(x_{1},...,x_{n});
- simulate measurement errors Δx_{i} with given standard deviations σ_{i};
- simulate measurement results X_{i} = x_{i} + Δx_{i};
- simulate the result Y = f(X_{1},...,X_{n}) of data processing;
- estimate the simulated error Δy = Y - y.
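These checking steps can be sketched as follows (the sample function and all numeric values are arbitrary placeholders; the empirical standard deviation of the simulated errors should be close to the part-a estimate):

```python
import random
import math

def simulate_errors(f, x, sigmas, N=100_000):
    """Simulate the measurement/data-processing chain N times and return
    the empirical standard deviation of Delta y = Y - y."""
    y = f(x)                                            # actual value
    sq = 0.0
    for _ in range(N):
        dx = [random.gauss(0.0, s) for s in sigmas]     # measurement errors
        X = [xi + di for xi, di in zip(x, dx)]          # measurement results
        sq += (f(X) - y) ** 2                           # simulated error squared
    return math.sqrt(sq / N)

f = lambda x: x[0]**2 + x[1]**2   # sample function; any f taking a list works
print(simulate_errors(f, [1.0, 2.0], [0.1, 0.1]))
```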

6. (Due September 25) Write a code that, given a data processing function f(x_{1},...,x_{n}), measurement results X_{1}, ..., X_{n}, and standard deviations σ_{1}, ..., σ_{n} of the corresponding measurement errors, uses the Monte-Carlo algorithm described in class (and in the handouts) to estimate the standard deviation of the error of the resulting indirect measurement. Make sure that this code does not use any specific function: keep f as a generic name, so that we will be able to use it for any function. Make sure that the code does not use any specific value of n: use arrays of X and arrays of σ as input.

This algorithm is as follows:

- start with measured values X_{1}, ..., X_{n};
- compute the result Y = f(X_{1},...,X_{n}) of data processing;
- simulate measurement errors Δx_{i} with given standard deviations σ_{i};
- simulate actual values x_{i} = X_{i} - Δx_{i};
- simulate the actual value y = f(x_{1},...,x_{n}) of data processing;
- estimate the simulated error Δy = Y - y.
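A sketch of the listed steps (the handout version is authoritative; the sample function and numbers below are placeholders):

```python
import random
import math

def monte_carlo_sigma(f, X, sigmas, N=10_000):
    """Monte-Carlo estimate of the standard deviation of the error of
    the indirect measurement, following the steps listed above."""
    Y = f(X)                                            # result of data processing
    sq = 0.0
    for _ in range(N):
        dx = [random.gauss(0.0, s) for s in sigmas]     # simulated measurement errors
        x = [Xi - di for Xi, di in zip(X, dx)]          # simulated actual values
        sq += (Y - f(x)) ** 2                           # simulated error Delta y
    return math.sqrt(sq / N)

f = lambda x: x[0] * x[1]     # sample function; any f taking a list works
print(monte_carlo_sigma(f, [1.0, 2.0], [0.1, 0.1]))
```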

7. (Due October 21) On a simple example of a function which is monotonic (increasing or decreasing) with respect to each of its variables, compute the range of this function on given intervals, and compare the actual result with the results of linearized computations.
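For instance (a hypothetical example; any monotonic function and intervals work the same way), for a function that is increasing in one variable and decreasing in the other, the range is attained at the interval endpoints:

```python
# f(x1, x2) = x1 - x2 is increasing in x1 and decreasing in x2,
# so on the box [1, 2] x [3, 5] its range is attained at the endpoints:
f = lambda x1, x2: x1 - x2
lo = f(1, 5)   # minimum: smallest x1, largest x2
hi = f(2, 3)   # maximum: largest x1, smallest x2
print(lo, hi)  # -4 -1
# Linearization at the midpoints (1.5, 4) with half-widths (0.5, 1) gives the
# half-width |1|*0.5 + |-1|*1 = 1.5, which here equals (hi - lo)/2 exactly,
# since f is linear; for nonlinear monotonic f the two can differ.
```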

8. (Due October 21)

a) Write a code that, given a data processing function f(x_{1},...,x_{n}), measurement results X_{1}, ..., X_{n}, and upper bounds Δ_{1}, ..., Δ_{n} on the (absolute values of the) corresponding measurement errors, estimates the upper bound on the error of the resulting indirect measurement. Make sure that this code does not use any specific function: keep f as a generic name, so that we will be able to use it for any function. Make sure that the code does not use any specific value of n: use arrays of X and arrays of Δ as input.
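A possible sketch, assuming the linearized bound Δ ≈ Σ |∂f/∂x_{i}| · Δ_{i} from class, with the partial derivatives estimated numerically (the step size h and the sample function are implementation choices):

```python
def error_bound(f, X, Deltas, h=1e-6):
    """Linearized upper bound on |Delta y|: sum of |df/dx_i| * Delta_i."""
    y = f(X)
    bound = 0.0
    for i in range(len(X)):
        Xh = list(X)
        Xh[i] += h
        c_i = (f(Xh) - y) / h          # numerical partial derivative
        bound += abs(c_i) * Deltas[i]
    return bound

f = lambda x: x[0] * x[1]   # sample function; the routine itself is generic
print(error_bound(f, [1.0, 2.0], [0.1, 0.1]))  # ≈ |2|*0.1 + |1|*0.1 = 0.3
```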

b) Check that this code is correct by selecting, say,
10 equally spaced points within each interval [X_{i} - Δ_{i},
X_{i} + Δ_{i}], computing the value of
f for all possible combinations of these points, and finding the smallest and the
largest of these values.
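The brute-force check can be sketched with itertools.product (the sample function is a placeholder; the half-width (max - min)/2 should be close to the part-a bound):

```python
from itertools import product

def range_check(f, X, Deltas, points=10):
    """Evaluate f on `points` equally spaced values in each interval
    [X_i - Delta_i, X_i + Delta_i], over all combinations, and return
    the smallest and the largest of these values."""
    grids = []
    for Xi, Di in zip(X, Deltas):
        lo, hi = Xi - Di, Xi + Di
        step = (hi - lo) / (points - 1)
        grids.append([lo + j * step for j in range(points)])
    values = [f(list(combo)) for combo in product(*grids)]
    return min(values), max(values)

f = lambda x: x[0] * x[1]   # sample function
print(range_check(f, [1.0, 2.0], [0.1, 0.1]))  # (min, max) over the sampled box
```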

9. (Due October 28) Write a code that implements the Cauchy-based Monte-Carlo algorithm for estimating the upper bound on the error of the result of data processing (this algorithm is given in the handout).
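The handout is authoritative; as a hedged sketch, one common version of the Cauchy-deviate technique works as follows: if the simulated errors δx_{i} are Cauchy-distributed with scale parameters Δ_{i}, then (under linearization) Δy is Cauchy-distributed with scale Σ |c_{i}| · Δ_{i} -- exactly the desired bound -- and that scale can be recovered from the simulated Δy values by maximum likelihood. For nonlinear f the handout's version may add safeguards (e.g., re-scaling the deviates); the linear test function below avoids that issue:

```python
import random
import math

def cauchy_bound(f, X, Deltas, N=2000):
    """Cauchy-based Monte-Carlo estimate of the bound on |Delta y|.
    Simulates Cauchy errors, then fits the Cauchy scale D of the resulting
    Delta y values by maximum likelihood (bisection on the MLE equation
    sum(dy^2 / (D^2 + dy^2)) = N / 2)."""
    Y = f(X)
    dys = []
    for _ in range(N):
        # Cauchy deviate with scale D: D * tan(pi * (uniform - 1/2))
        dx = [D * math.tan(math.pi * (random.random() - 0.5)) for D in Deltas]
        x = [Xi - di for Xi, di in zip(X, dx)]
        dys.append(Y - f(x))
    lo, hi = 1e-12, max(abs(d) for d in dys) + 1.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(d * d / (mid * mid + d * d) for d in dys) > N / 2:
            lo = mid        # scale too small
        else:
            hi = mid        # scale too large
    return 0.5 * (lo + hi)

f = lambda x: x[0] + 2 * x[1]   # linear sample function: the exact bound is 0.3
print(cauchy_bound(f, [1.0, 2.0], [0.1, 0.1]))
```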

10. (Due November 4) Write a code that computes the uncertainty caused by
the possible un-reliability ("mistrust") of the measurement results. We start with
the set S of measurement results and the corresponding locations.
This set contains m values v_{i} of a certain quantity measured at
m different spatial
locations, with coordinates
x_{i} and y_{i}.

Based on this set S of measurement results, we interpolate the values from the measurement locations to the points from a rectangular grid, i.e., to the points with integer coordinates x = 0, 1, 2, ..., and y = 0, 1, ... In other words, based on the set S of measurement results, we estimate the value of our quantity at all the points from the grid. In your program, feel free to assume that we have a 10 x 10 grid, i.e., a grid formed by points (x,y) with x = 0, 1, ..., 9 and y = 0, 1, ..., 9.

As an interpolation algorithm, we use the 2-Nearest
Neighbor algorithm (2-NN). In this algorithm,
as a value of the desired quantity at a point (x,y) from the grid,
we take the arithmetic
average of the values v_{i} in two locations (x_{i},y_{i})
which are the closest to the point (x,y). In the algorithm, we will apply
this 2-NN algorithm not only to the original set of measurement results, but also
to its subsets.

Let us now assume that for each value v_{i}, we also know the
probability p_{i} that this value can be trusted. Your program should
use the Monte-Carlo simulation technique to estimate, for each grid point, the
standard deviation of the interpolated value caused by the possible mistrust.
Specifically, for a large number of iterations k = 1, ..., N,
repeat the following procedure:

- First, we create a subset S^{(k)} of "correct" measurement results as follows. Each of the m measurement locations (x_{i},y_{i}) is either included or not included into this subset S^{(k)}. This inclusion or non-inclusion is done "at random":
  - the i-th location (and the corresponding measurement result) is included into the subset with the probability p_{i};
  - the i-th location is not included into the subset with the remaining probability 1 - p_{i}.
- After the subset S^{(k)} of measurement results (and measurement locations) is formed, we apply the same 2-NN interpolation algorithm to this new subset S^{(k)}. As a result, for each grid point (x,y), we compute the interpolated value v^{(k)}(x,y).

Then, for each grid point (x,y), compute the mean value v̄(x,y) = (v^{(1)}(x,y) + ... + v^{(N)}(x,y))/N, and compute the standard deviation s(x,y) in the usual way -- as the square root of the expression ((v^{(1)}(x,y) - v̄(x,y))^{2} + ... + (v^{(N)}(x,y) - v̄(x,y))^{2})/N.

This standard deviation
s(x,y) describes the
uncertainty at this particular grid point (x,y) caused by the possible un-reliability
of the measurement results v_{i}.

*Reminder:* one way to include a location (x_{i},y_{i})
with the given probability p_{i} is
to run the standard random
number generator (which generates numbers uniformly distributed on the interval
[0,1]). Then:

- if the resulting random value r is smaller than p_{i}, we include the i-th location in the k-th subset;
- otherwise, if the random value r is not smaller than p_{i}, then we do not include the i-th location in the k-th subset.

As a result, for this same procedure, the probability that we *do not* include
the i-th location is equal to 1 - p_{i}.
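Putting the pieces together, one possible sketch of the whole program (the data at the bottom are hypothetical; the treatment of iterations in which fewer than two locations are trusted is a modeling choice -- here such iterations are simply skipped):

```python
import random
import math

def two_nn(xs, ys, vs, x, y):
    """2-NN interpolation: average of the two values measured closest to (x, y)."""
    order = sorted(range(len(vs)), key=lambda i: (xs[i] - x) ** 2 + (ys[i] - y) ** 2)
    return 0.5 * (vs[order[0]] + vs[order[1]])

def mistrust_std(xs, ys, vs, ps, grid=10, N=1000):
    """For each grid point, the standard deviation of the 2-NN interpolated
    value over random 'trusted' subsets (location i kept with probability p_i)."""
    vals = [[[] for _ in range(grid)] for _ in range(grid)]
    for _ in range(N):
        keep = [i for i in range(len(vs)) if random.random() < ps[i]]
        if len(keep) < 2:        # 2-NN needs at least two trusted locations
            continue
        kx = [xs[i] for i in keep]
        ky = [ys[i] for i in keep]
        kv = [vs[i] for i in keep]
        for gx in range(grid):
            for gy in range(grid):
                vals[gx][gy].append(two_nn(kx, ky, kv, gx, gy))
    s = [[0.0] * grid for _ in range(grid)]
    for gx in range(grid):
        for gy in range(grid):
            v = vals[gx][gy]
            if not v:
                continue
            mean = sum(v) / len(v)
            s[gx][gy] = math.sqrt(sum((u - mean) ** 2 for u in v) / len(v))
    return s

# hypothetical data: 4 locations near the grid corners, trust probability 0.8 each
xs, ys = [0, 9, 0, 9], [0, 0, 9, 9]
vs, ps = [1.0, 2.0, 3.0, 4.0], [0.8, 0.8, 0.8, 0.8]
s = mistrust_std(xs, ys, vs, ps)
print(s[0][0], s[5][5])
```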

11. (Due November 6) In class, we have shown that if we know the probabilities p(A) and p(B) of the events A and B, but we do not know whether they are dependent or not, then the probability p(A&B) can take any value from max(p(A) + p(B) -1,0) to min(p(A),p(B)). The largest value min(p(A),p(B)) occurs when one of the events is a subset of another one, the smallest possible value occurs when the largest possible part of A is outside of B: all of it if p(A) + p(B) <= 1 and as much as possible otherwise. As a homework, pick any two values p(A) and p(B), compute the smallest and the largest values of p(A&B), and show, e.g., on the example of subintervals from the interval [0,1], when these values are attained.
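For example, with the (arbitrary) choices p(A) = 0.6 and p(B) = 0.7, the bounds and the subintervals of [0,1] on which they are attained look as follows:

```python
pA, pB = 0.6, 0.7                 # arbitrary example values
lo = max(pA + pB - 1, 0)          # smallest possible p(A&B): 0.3
hi = min(pA, pB)                  # largest possible p(A&B): 0.6
# Largest value: attained when A is a subset of B, e.g.
#   A = [0, 0.6], B = [0, 0.7]   =>  A & B = A,          p(A&B) = 0.6.
# Smallest value: attained when as much of A as possible lies outside B, e.g.
#   A = [0, 0.6], B = [0.3, 1.0] =>  A & B = [0.3, 0.6], p(A&B) = 0.3.
print(lo, hi)
```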

12. (Due November 11) Write a program that performs the re-scaled Monte-Carlo approach to solve the following problem. We are given a directed graph, in which a probability is assigned to every edge. The vertices in this graph describe agents, and the probability p(e) assigned to the edge e that connects vertices U and V is the probability with which the agent U (directly) trusts the agent V. Direct trusts lead to indirect ones: e.g., if U trusts V, and V trusts W, then U trusts W. We assume that all the trusts are independent. We are given two agents A and B, and we want to find the trust p with which A trusts B. Equivalently, we want to find the probability Δp = 1 - p that A does not trust B.

In principle, we can solve this problem by using standard Monte-Carlo simulations. Specifically, N times we form a sub-graph of the original graph by including each edge e with the corresponding probability p(e). We then estimate the probability that A trusts B as the probability that in the resulting sub-graph, there is a path from A to B -- i.e., as the ratio N(A --> B)/N, where N(A --> B) is the number of simulations in which there is a path from A to B.

However, when the probabilities p(e) are close to 1, the resulting probability p
is also close to 1. To get a meaningful result, we must find p with high
accuracy -- and this requires unrealistically many iterations N (e.g., if the
probability of mistrust is 0.1%, we need N = 10^{6} iterations).

To speed up this process, we use re-scaling. Namely, instead of the original probabilities of mistrust Δp(e) = 1 - p(e), we take re-scaled probabilities p'(e) = 1 - λ * Δp(e), where λ is chosen in such a way that the re-scaled mistrust probabilities λ * Δp(e) are no longer too small -- e.g., as λ = 0.1/(max Δp(e)), which makes the largest of them equal to 0.1. Based on these new probabilities p'(e), we apply the Monte-Carlo method and find the probability p' that A trusts B, and the remaining probability Δp' = 1 - p' that A does not trust B.

After that, we take new re-scaled probabilities
p''(e) = 1 - 2λ * Δp(e) and find the new value Δp''.
We have mentioned, in class, that in a good approximation,
Δp' = Δp * λ^{d} for some
natural number d, and that Δp'' = Δp * (2λ)^{d}. Thus,
Δp''/Δp' = 2^{d} and hence, we can find d as the integer
which is the closest to the logarithm log_{2}(Δp''/Δp').
Once we know d, we can estimate Δp as
Δp = Δp'/λ^{d}.
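Both the plain and the re-scaled versions can be sketched as follows (the edge-list graph representation is an arbitrary choice; N and the factor 0.1 in λ follow the description above; for simplicity, the sketch does not guard against the unlikely case Δp' = 0):

```python
import random
import math

def reaches(edges, keep, A, B):
    """Is there a path from A to B using only the kept edges? (depth-first search)"""
    adj = {}
    for (u, v), k in zip(edges, keep):
        if k:
            adj.setdefault(u, []).append(v)
    stack, seen = [A], {A}
    while stack:
        u = stack.pop()
        for w in adj.get(u, []):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return B in seen

def mc_mistrust(edges, probs, A, B, N=10_000):
    """Plain Monte-Carlo estimate of Delta p = P(no path from A to B)."""
    fails = sum(1 for _ in range(N)
                if not reaches(edges, [random.random() < p for p in probs], A, B))
    return fails / N

def rescaled_mistrust(edges, probs, A, B, N=10_000):
    """Re-scaled Monte-Carlo: amplify the small mistrusts by lam and 2*lam,
    recover d from the ratio Delta p'' / Delta p', and undo the scaling."""
    dps = [1 - p for p in probs]
    lam = 0.1 / max(dps)
    dp1 = mc_mistrust(edges, [1 - lam * d for d in dps], A, B, N)
    dp2 = mc_mistrust(edges, [1 - 2 * lam * d for d in dps], A, B, N)
    d = round(math.log2(dp2 / dp1))
    return dp1 / lam ** d

# simplest case: a single edge A -> B with p = 0.999, so Delta p should be ≈ 0.001
print(rescaled_mistrust([("A", "B")], [0.999], "A", "B"))
```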

Test your algorithms on simple examples that we had in class:

- a simple graph with two nodes, in which A trusts B with probability p; your algorithm should return this same probability;
- a graph in which A trusts C with probability p_{1} and C trusts B with probability p_{2}; for this graph, we should have p = p_{1} * p_{2};
- a graph in which two different edges connect A and B: one edge has the probability of direct trust p_{1}, another has the probability of direct trust p_{2}; for this graph, we should have p = p_{1} + p_{2} - p_{1} * p_{2}.

13. (Due November 13) Write a program that implements an algorithm (given in the
handout) for estimating
the worst-case trust for the case of possible dependence. We start with a graph
in which vertices represent agents, and an edge from an agent a to an agent b
means that agent a has reasons to directly trust agent b.
For every edge e that connects nodes a and b,
we are given the probability p_{0}(e) with which, based on the given
evidence, the agent a directly trusts the agent b.
We are also given two agents f (first) and s (second), and we want to find the
worst-case value p of the probability with which the agent f should trust the
agent s.

The algorithm works as follows. Based
on the given probabilities, we compute the "length" l(e) of each edge as
l(e) = 1 - p_{0}(e). Based on these edge lengths, we compute the length
d of the shortest path from f to s. The desired probability is then equal to
max(1 - d,0).
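A sketch of this algorithm, using Dijkstra's shortest-path method (the edge-list representation and the example at the bottom are arbitrary choices):

```python
import heapq

def worst_case_trust(edges, p0, f, s):
    """Worst-case trust under possible dependence: set the edge lengths
    l(e) = 1 - p0(e), find the shortest-path distance d from f to s
    (Dijkstra), and return max(1 - d, 0)."""
    adj = {}
    for (a, b), p in zip(edges, p0):
        adj.setdefault(a, []).append((b, 1.0 - p))
    dist = {f: 0.0}
    heap = [(0.0, f)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v, length in adj.get(u, []):
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    d = dist.get(s, float("inf"))   # unreachable s gives d = infinity, i.e. p = 0
    return max(1.0 - d, 0.0)

# example: f -> c -> s with direct-trust probabilities 0.9 and 0.8,
# so d = 0.1 + 0.2 and the worst-case trust is 1 - 0.3 = 0.7
print(worst_case_trust([("f", "c"), ("c", "s")], [0.9, 0.8], "f", "s"))
```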