Appendix: Construction of Data

Where the domain is D{a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p}, which has 16 elements, a 75% imprecise record contains 12 elements. An instance of a 75% imprecise record R can therefore be any subset of D with twelve randomly chosen elements. Consider such a 75% imprecise record R{a, c, d, f, g, h, j, k, m, n, o, p}. We construct records with lower percentages of imprecision by repeatedly removing a randomly chosen element from R. This is illustrated in Table 1.
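The percentages used here are consistent with reading imprecision as the share of the domain that a record covers (12 of 16 elements gives 75%). A minimal sketch of that arithmetic follows; the function name `imprecision_percent` is hypothetical, and the formula is inferred from the numbers in this appendix rather than quoted from a definition.

```python
def imprecision_percent(record, domain):
    """Percentage of the domain covered by a record (assumed definition)."""
    record, domain = set(record), set(domain)
    if not record <= domain:
        raise ValueError("record must be a subset of the domain")
    return 100 * len(record) / len(domain)


D = "abcdefghijklmnop"   # the 16-element domain from the text
R = "acdfghjkmnop"       # the 12-element example record

print(imprecision_percent(R, D))       # 75.0
print(imprecision_percent("dhkm", D))  # 25.0
```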

Table 1. Construction of different percentages of imprecise data.

| Percentage of imprecision | Record |
|---|---|
| 75% | a, c, d, f, g, h, j, k, m, n, o, p |
| 68.75% | a, c, d, f, h, j, k, m, n, o, p |
| 62.5% | a, c, d, f, h, j, k, m, n, p |
| 56.25% | a, d, f, h, j, k, m, n, p |
| 50% | a, d, f, h, j, k, m, p |
| 43.75% | a, d, f, h, k, m, p |
| 37.5% | d, f, h, k, m, p |
| 31.25% | d, f, h, k, m |
| 25% | d, h, k, m |
| 18.75% | d, h, k |
| 12.5% | h, k |
| 6.25% (a single-element record is in fact precise, so this case effectively means 0% imprecision) | k |

To preserve randomness at large scale, a total of 30000 records were generated for each percentage file. Starting from a 75% imprecise record, one randomly chosen element is eliminated at each step until the record is reduced to a single (atomic) element. This is how the different percentages of imprecision are generated from a 75% imprecise record.

Hence, for each record in the 75% imprecise data file, there is a corresponding record in each of the other percentage files, randomly shortened by a fixed number of elements at each percentage. That is, if the 75% imprecise file contains a record R{a, c, e, f, g, h, j, k, m, n, o, p}, then the 68.75% imprecise file contains a corresponding record that is a subset of R of length |R|-1; similarly, the 62.5% imprecise file contains a corresponding record that is again a subset of R, of length |R|-2. This process continues until the size of the record becomes 1.
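The shrinking procedure described above can be sketched as follows. This is an illustrative reconstruction, not the generator actually used for the experiments: the function name `shrink_chain`, its parameters, and the use of Python's `random` module are all assumptions.

```python
import random


def shrink_chain(domain, start_fraction=0.75, rng=None):
    """Build one chain of records: a random start record covering
    `start_fraction` of the domain, then one random removal per step
    until a single element is left."""
    if rng is None:
        rng = random.Random()
    start_len = int(len(domain) * start_fraction)
    record = rng.sample(sorted(domain), start_len)
    chain = [sorted(record)]
    while len(record) > 1:
        record.remove(rng.choice(record))  # drop one random element
        chain.append(sorted(record))
    return chain


domain = "abcdefghijklmnop"
for rec in shrink_chain(domain, rng=random.Random(0)):
    percent = 100 * len(rec) / len(domain)
    print(f"{percent:6.2f}%  {', '.join(rec)}")
```

Under this sketch, the k-th record of a chain would go into the k-th percentage file, so repeating the call 30000 times would give each file 30000 records, each a subset of its counterpart in the file before it.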

Each of our test-case data files contained 30000 records. All the data files were run using different types of hierarchies. The domain size in our test cases is 32; hence, each record in the 75% imprecise data file contains 24 elements.

Data Files (total 24 files):

| Imprecision | Data file |
|---|---|
| 75% | 0.txt |
| 71.875% | 1.txt |
| 68.75% | 2.txt |
| 65.625% | 3.txt |
| 62.5% | 4.txt |
| 59.375% | 5.txt |
| 56.25% | 6.txt |
| 53.125% | 7.txt |
| 50% | 8.txt |
| 46.875% | 9.txt |
| 43.75% | 10.txt |
| 40.625% | 11.txt |
| 37.5% | 12.txt |
| 34.375% | 13.txt |
| 31.25% | 14.txt |
| 28.125% | 15.txt |
| 25% | 16.txt |
| 21.875% | 17.txt |
| 18.75% | 18.txt |
| 15.625% | 19.txt |
| 12.5% | 20.txt |
| 9.375% | 21.txt |
| 6.25% | 22.txt |
| 3.125% | 23.txt |
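The file numbering follows a regular pattern: each step removes one of the 32 domain elements, so the imprecision drops by 3.125 percentage points per file. The sketch below reproduces that mapping. The helper `file_profile` is hypothetical, and the per-file record length of 24 minus the file index is inferred from the listed percentages rather than stated in the text.

```python
DOMAIN_SIZE = 32    # domain size stated in the text
START_LENGTH = 24   # 75% of 32 elements


def file_profile(index):
    """Return (record length, imprecision %) for data file `<index>.txt`."""
    if not 0 <= index < START_LENGTH:
        raise ValueError("index must be between 0 and 23")
    length = START_LENGTH - index  # assumed: one element fewer per file
    return length, 100 * length / DOMAIN_SIZE


for i in (0, 1, 12, 23):
    length, percent = file_profile(i)
    print(f"{i}.txt: {length} elements, {percent}% imprecision")
```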

Partition Tree Files (total 12 files):

h1.shr

h2.shr

h3.shr

h4.shr

h5.shr

h6.shr

h7.shr

h8.shr

h9.shr

h10.shr

h11.shr

h12.shr