Design and Implementation of Programming Languages: Why-Questions

Question. Why do we need to represent a language in BNF form? What is wrong with describing it in a natural language?

Answer. There exist algorithms that, given a formally described language, generate a compiler for this language. They are known as compiler compilers; the most well-known is yacc, which stands for Yet Another Compiler Compiler. These algorithms do not understand natural language; they need a formal description.

Question. Why do computers use 2's complement? Why not use an easier representation: one bit for the sign and the rest for the value?

Answer. In the sign-and-value representation, to add two numbers, we need to use different operations depending on whether the numbers have the same sign or different signs: if both numbers are positive, it is addition; if one of them is negative, it is subtraction. If we use 2's complement, we can use the same addition algorithm in all cases.
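This can be seen directly in Java, as a small sketch: if we add the 8-bit two's-complement patterns of −3 and 5 as plain unsigned numbers and keep only 8 bits, we get the pattern for 2, with no special case for the negative operand:

```java
public class TwosComplement {
    public static void main(String[] args) {
        int minus3 = 0b11111101;  // 8-bit two's-complement pattern for -3
        int plus5  = 0b00000101;  // 8-bit pattern for 5
        // Ordinary addition of the raw bit patterns, truncated to 8 bits:
        int sum = (minus3 + plus5) & 0xFF;
        System.out.println(Integer.toBinaryString(sum)); // 10, i.e., binary for 2
        System.out.println(sum); // 2
    }
}
```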

Question. Why do most computers use the lowest-bit-first representation of integers? When we write down numbers, we put the highest digit first: e.g., 2017 starts with thousands.

Answer. When we perform arithmetic operations such as addition, multiplication, etc., we start with the lowest bit -- this enables us to immediately find the value of the lowest bit of the result. Thus, when the number is presented lowest bit first, we can start performing the operation while the higher bits are still loading -- thus saving time. In the highest-bit-first representation, we would have to wait until the whole number is loaded and only then start the operation.

Question. Why do some programming languages allow unlimited integers?

Answer. From the viewpoint of data processing, standard 4-byte integers are usually enough: they cover numbers up to 2^32 − 1, which is approximately 4 billion -- more than enough for most applications. If we need larger numbers, we can use the longint data type, where 63 bits allow us to get numbers up to 2^63 − 1, which is approximately 9 * 10^18.

Long integers are, however, crucial for security: the RSA algorithm that is used for encryption (e.g., when we buy online by using https) requires processing 200-digit decimal numbers -- which means integers with up to 700 binary digits. Processing such numbers in usual programming languages is difficult. Built-in operations with unlimited integers allow us to program such computations easily.

A disadvantage of using unlimited-size integers is that operations on them take longer, since they cannot be performed by a single hardware instruction.
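For example, Java provides unlimited integers through the BigInteger class; a sketch showing that a 700-bit number -- far beyond the 64-bit range of long -- is handled directly:

```java
import java.math.BigInteger;

public class LongIntegers {
    public static void main(String[] args) {
        // 2^700 - 1: a 700-bit integer, of the size used in RSA-style computations
        BigInteger big = BigInteger.valueOf(2).pow(700).subtract(BigInteger.ONE);
        System.out.println(big.bitLength()); // 700
    }
}
```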

Question. Why does the IEEE standard for representing real numbers in a computer allow dividing a non-zero number by 0 -- with infinity as the result?

Answer. We often transform expressions without thinking that the transformation may sometimes lead to division by 0. For example, if we divide both the numerator and the denominator of the expression a / (a + b) by a, we get a new expression 1 / (1 + b / a). When a is different from 0, these two expressions are equal. However, when a = 0, the new expression is not mathematically correct, since it includes division by 0.

However, if we use IEEE computations, we still get the correct result even when a = 0. Indeed, here b / a is infinity. 1 plus infinity is infinity, so the denominator of the new expression is also infinity, and 1 divided by infinity is 0 -- as desired.
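A minimal sketch of this computation in Java, whose double type follows the IEEE standard:

```java
public class IeeeDivision {
    public static void main(String[] args) {
        double a = 0.0, b = 2.0;
        double original = a / (a + b);         // 0.0 / 2.0 = 0.0
        double bOverA = b / a;                 // non-zero divided by 0 gives Infinity
        double transformed = 1 / (1 + bOverA); // 1 / Infinity = 0.0
        System.out.println(bOverA);                  // Infinity
        System.out.println(original == transformed); // true
    }
}
```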

Question. Why do some programming languages -- such as Cobol -- have a decimal data type?

Answer. In computing related to finance, accuracy is important. If we start with 10 dollars and we gain 2.1%, we want to get exactly $10.21. With usual binary numbers, this is difficult, since most decimal fractions cannot be represented exactly in binary. As a result, even if we store the number in binary form and then transform it back into decimal, we may get a slightly different value. Being able to store and process data in decimal form, without the need to translate it into binary, solves this problem.
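Java offers the same facility through the BigDecimal class; a sketch of the $10.21 example, computed exactly in decimal:

```java
import java.math.BigDecimal;

public class DecimalMoney {
    public static void main(String[] args) {
        // $10.00 increased by 2.1%, computed exactly in decimal arithmetic:
        BigDecimal amount = new BigDecimal("10.00");
        BigDecimal result = amount.multiply(new BigDecimal("1.021"));
        // compareTo ignores trailing zeros, so 10.21000 equals 10.21:
        System.out.println(result.compareTo(new BigDecimal("10.21")) == 0); // true
    }
}
```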

Question. Why do we need complex numbers?

Answer. Complex numbers are actively used in physics and engineering. In programming languages without a complex number data type, we can write programs for processing complex numbers ourselves, but of course it is more convenient and more efficient if the language already has complex number operations implemented.

Question. Why, in most languages, is the space allocated to each data type a power of two: 1, 2, 4, or 8 bytes?

Answer. Often, we have many different values. In this case, a natural idea is to place them in an array. If we place numbers a[0], a[1], ... one by one in computer memory, then it is easy to find the address of each element a[k] of the array: address(a[k]) = address(a[0]) + k * size-of-the-type. When the size of the type is a power of two, multiplication by this size is easy: it is simply a shift by the corresponding number of bits.
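The shift trick can be checked directly in Java: multiplying by 8 = 2^3 gives the same result as shifting left by 3 bits.

```java
public class ShiftMultiply {
    public static void main(String[] args) {
        int k = 5; // a hypothetical array index
        // Multiplication by a power of two is the same as a left shift
        // by the exponent -- a very fast operation:
        System.out.println(k * 8);  // 40
        System.out.println(k << 3); // 40
    }
}
```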

Question. What is aliasing, and what are its advantages and disadvantages?

Answer. Aliasing means that we have two names for the same memory location. Some implicit aliasing exists in most languages: e.g., in Java, when we process lists, we often have two variables pointing to the same location. Some languages -- like Fortran -- go further and allow explicit aliasing.
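A minimal sketch of such implicit aliasing in Java: two variables referring to the same array, so a change through one name is visible through the other.

```java
public class Aliasing {
    public static void main(String[] args) {
        int[] a = {1, 2, 3};
        int[] b = a;   // b is an alias: both names refer to the same array
        b[0] = 99;     // a change made through one name...
        System.out.println(a[0]); // 99 -- ...is visible through the other
    }
}
```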

Question. What is deallocation, and what are its advantages and disadvantages?

Answer. In most programming languages, a variable is valid within its pre-defined scope. Some languages allow the programmer to explicitly release the space allocated to a variable when this variable is no longer needed -- even while its scope still continues. The advantage is that the released memory can be reused right away; the disadvantage is that if we release the memory too early, other parts of the program may still try to use it, leading to hard-to-find errors.

Question. What is the main advantage of an array?

Answer. The main advantage of storing many values in an array is fast access. The address of the i-th element of the array is equal to address(a[i]) = address(a[0]) + i * size-of-type. Since the size of each type is a power of two, multiplication by the size is simply a shift -- a very fast operation.

Question. What are advantages and disadvantages of jagged arrays?

Answer. In many practical situations, we use 2-D arrays. For example, if every day we make several temperature measurements, we can store the j-th measurement of the i-th day at a location a[i][j]. A standard rectangular 2-D array assumes that we have the same number of records each day. However, in many practical situations, we have a different number of measurements on different days. A jagged array -- an array whose rows can have different lengths -- handles this case without wasting space on unused entries.
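In Java, a jagged array can be created by allocating each row separately; a sketch with hypothetical temperature values:

```java
public class JaggedArray {
    public static void main(String[] args) {
        double[][] temps = new double[3][];        // 3 days, rows not yet allocated
        temps[0] = new double[]{61.2, 65.0, 70.1}; // 3 measurements on day 0
        temps[1] = new double[]{59.8};             // 1 measurement on day 1
        temps[2] = new double[]{62.5, 66.3};       // 2 measurements on day 2
        System.out.println(temps[1].length); // 1
    }
}
```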

Question. What are advantages and disadvantages of using named constants?

Answer. A named constant attaches a meaningful name to a fixed value. The main advantages are readability -- the name explains what the value means -- and maintainability: if the value ever needs to change, we change it in only one place. The main disadvantage is the same as for every additional feature: it makes the language and the compiler somewhat more complex.
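A minimal sketch of a named constant in Java, using a hypothetical DAYS_PER_WEEK constant:

```java
public class NamedConstants {
    // The name documents the meaning of the value; if the value
    // ever changes, only this one line needs to change.
    static final int DAYS_PER_WEEK = 7;

    public static void main(String[] args) {
        int weeks = 4;
        System.out.println(weeks * DAYS_PER_WEEK); // 28
    }
}
```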

Question. What are advantages and disadvantages of static vs. dynamic memory allocation? Is memory allocation in Java static or dynamic?

Answer. Static memory allocation means that we allocate all the memory from the very beginning. This was done, e.g., in the first versions of Fortran. Dynamic means that we allocate at least some memory during run-time. For example, in Java, we may read data from a file, count the number of records, and then create an array with exactly that many elements.

The main idea behind C is to save time, so in C -- and in its descendants like Java -- memory allocation is dynamic.
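A sketch of dynamic allocation in Java: the array size is a run-time value (in practice it would come, e.g., from counting records in a file; here it is a hypothetical value).

```java
public class DynamicAllocation {
    public static void main(String[] args) {
        int n = 5;                  // a value known only at run time
        int[] records = new int[n]; // memory allocated during execution
        System.out.println(records.length); // 5
    }
}
```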

Question. What is a subtype? What are advantages and disadvantages of using subtypes?

Answer. In some languages, such as Ada, in addition to having a type like integer, we can also define its subtype, e.g.,

  subtype percent is integer range 0..100;
We can do the same operations on values of this subtype; the only difference is that every time we assign a new value to a variable of this subtype, it is checked whether this value is within the given bounds, and an error message is produced (or an exception is raised) if it is not. This is similar to the checking of array indices in Java, and it has similar advantages and disadvantages: the checks help catch errors early, but they take additional computation time.
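A sketch, in Java, of the range check that Ada inserts automatically for the percent subtype above (the helper method assignPercent is hypothetical -- in Ada no such method needs to be written):

```java
public class SubtypeCheck {
    // Simulates the check performed on every assignment to a variable of
    // "subtype percent is integer range 0..100":
    static int assignPercent(int value) {
        if (value < 0 || value > 100) {
            throw new IllegalArgumentException("value " + value + " is not in 0..100");
        }
        return value;
    }

    public static void main(String[] args) {
        int p = assignPercent(75); // within bounds: accepted
        System.out.println(p);     // 75
        // assignPercent(150);     // out of bounds: would raise an exception
    }
}
```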

Question. What is a derived type? What are advantages and disadvantages of using derived types?

Answer. In real-life computations, to describe a value of a quantity, we need not only the number, but also the unit: e.g., 2.5 m is different from 2.5 sec. An explicit description of these units helps avoid mistakes: e.g., it does not make sense to add distance and time. To avoid such mistakes, some languages such as Ada allow the programmer to use a basic type like integer or double to generate new types: meters, seconds, etc. Then, for example, an attempt to add a value of type meters to a value of type seconds is flagged as an error.

Advantages and disadvantages are the same as for subtypes: the additional checks help catch errors, at the cost of extra checking.
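In Java there are no derived types as such, but a similar effect can be sketched with wrapper classes (the Meters and Seconds records below are hypothetical illustrations):

```java
public class DerivedTypes {
    record Meters(double value) {
        Meters plus(Meters other) { return new Meters(value + other.value); }
    }
    record Seconds(double value) { }

    public static void main(String[] args) {
        Meters d = new Meters(2.5).plus(new Meters(1.5));
        System.out.println(d.value()); // 4.0
        // new Meters(2.5).plus(new Seconds(1.5)); // would not compile:
        // the type system prevents mixing distance and time
    }
}
```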

Question. What is the difference between a pointer and a reference?

Answer. A pointer, by definition, is an address. A reference can be an address -- for most Java compilers, it is an address -- but it does not have to be an address; it can be some information based on which we can reconstruct an address. As a result, accessing data through a reference may take an extra step, but the system gains flexibility: e.g., the run-time system can move objects in memory without invalidating the references.

Question. What is Algol's call-by-name, and what are its advantages and disadvantages?

Answer. Most programming languages use either call-by-value, when the value is passed to the method, or call-by-reference, when a reference (usually, an address) is passed to the method, or a combination of both. For example, Java uses call-by-value for basic types such as int, double, etc., and call-by-reference for all other types. In Algol, there was a third way of calling a method: call-by-name, when an expression is substituted into the code of the method before compiling. The advantage is that it is close to the usual mathematical notation. For example, the usual expression for the sum of an expression over i ranging from lower to upper, for any expression, is described by the following Algol-style code (I am using the Java syntax):

  public static double sum(int variable, int lower, int upper,
    double expression){
      double result = 0.0;
      for(int variable = lower; variable <= upper; variable++){
        result += expression;}
      return result;
  }
For example, to compute the sum of sin(k − m) over k ranging from 3 to m, we simply write
  double r = sum(k, 3, m, sin(k - m));
The compiler will substitute k for variable, 3 for lower, m for upper, and sin(k - m) for expression, resulting in the following code:
    double result = 0.0;
    for(int k = 3; k <= m; k++){
      result += sin(k - m);}
    return result;
Notice that the result is different from using call-by-value or call-by-reference, because with these two calls, the same value would be used in all iterations, while with call-by-name, on different iterations, different values are added to result.

The main disadvantage is that compiling becomes more time-consuming; this is the main reason why this feature is not used in modern programming languages.
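Although call-by-name itself is absent from modern Java, a similar effect can be sketched by passing the expression as a lambda, which is re-evaluated on every iteration (this is an approximation, not Algol's actual substitution mechanism):

```java
import java.util.function.IntToDoubleFunction;

public class CallByName {
    // The expression is passed as a function of the loop variable, so it is
    // recomputed on each iteration -- mimicking call-by-name behavior:
    static double sum(int lower, int upper, IntToDoubleFunction expression) {
        double result = 0.0;
        for (int k = lower; k <= upper; k++) {
            result += expression.applyAsDouble(k);
        }
        return result;
    }

    public static void main(String[] args) {
        int m = 5;
        double r = sum(3, m, k -> Math.sin(k - m)); // sin(-2) + sin(-1) + sin(0)
        System.out.println(r);
    }
}
```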

Question. What is parallel-for (to be more precise, parallel-do)?

Answer. Parallelization is needed when the program takes too long. Usual straight-line (non-loop) operations are very fast; what is time-consuming is loops. Thus, parallelizing loops is very important.

When you write a for-loop with a counter variable, the usual intent is that the operations are performed in the prescribed order. For example, if the counter variable is increased by 1 every cycle (e.g., by using i++ in Java), this means that we start with the original value i0 of the counter variable, then perform the operation with the next value i0 + 1, etc.

In some cases, this order is important: changing the order will ruin the results. In other cases, however, the order is irrelevant. In such cases, if we have several processors, we can run some of the iterations on one of the processors and some on others, without changing the result. An example is the loop

 for(int i = 0; i < a.length; i++){
   c[i] = a[i] + b[i];}
The iteration corresponding to i = 0 computes c[0] as a[0] + b[0], the iteration corresponding to i = 1 computes c[1] as a[1] + b[1], etc. These computations do not depend on each other, so the order does not matter, and this loop can be easily parallelized.

For a computer, it is difficult to tell when such a parallelization is possible. Thus, Fortran allows the programmers to explicitly indicate, to the compiler, that a loop is parallelizable. This is the parallel-for construction (in Fortran, it is actually called parallel-do, since in Fortran for-loops are called do-loops).

The advantage is that we can parallelize. A disadvantage is that if the programmer is wrong, the computer will still parallelize -- it does not check whether parallelization is indeed possible; thus, this feature increases the probability of making a mistake.

It should be mentioned that Java has a similar construction: a version of the for-loop in which the index runs over a set. The main difference is that in Java, there is no automatic parallelization.
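Since Java 8, the programmer can, however, explicitly request parallel execution of independent iterations via streams -- playing the same role as Fortran's parallel-do; a sketch for the element-wise addition above:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class ParallelAdd {
    public static void main(String[] args) {
        int[] a = {1, 2, 3, 4};
        int[] b = {10, 20, 30, 40};
        int[] c = new int[a.length];
        // The iterations are independent, so the programmer explicitly
        // asks for them to be distributed over the available processors:
        IntStream.range(0, a.length).parallel()
                 .forEach(i -> c[i] = a[i] + b[i]);
        System.out.println(Arrays.toString(c)); // [11, 22, 33, 44]
    }
}
```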

Question. What are multiple assignments?

Answer. In Java, a single statement can assign a new value to only one variable. For example, x = x + 3 means that to the variable x we assign a new value: the old value of x plus 3.

In Java, if you need to assign new values to two or more variables, you need to write two or more assignment statements. In Perl, you can assign new values to several variables at a time, by a single assignment statement.

For example, in Perl you can write an assignment statement like this (I am using Java-style syntax, just describing the idea): (x, y) = (y, x). This means that to the variable x we assign a new value -- the old value of y -- and to the variable y, we assign the old value of the variable x. This is the usual swap operation, which is frequently used in programming practice. In Java, we have to implement it ourselves, by defining a new temporary variable. With multiple assignments, the swap is automatic.
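A sketch of the swap as it must be written in Java, with the temporary variable made explicit:

```java
public class Swap {
    public static void main(String[] args) {
        int x = 1, y = 2;
        // Java has no (x, y) = (y, x); the swap needs a temporary variable:
        int temp = x;
        x = y;
        y = temp;
        System.out.println(x + " " + y); // 2 1
    }
}
```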

Main disadvantage: like every other additional feature, it slows down compiling and computation.

Question. What are guarded commands?

Answer. In Java, whether you write a multiple if-then-else statement or a case-of statement, the computer checks the conditions in the order in which the options are presented. This does not allow the possibility of parallelization: we cannot check the next condition until we have checked the previous ones.

Guarded commands are another way to set up conditional statements, in which the order of the options is not relevant. This way, we can have a computer check several conditions in parallel, and if one of the conditions is satisfied, we follow the corresponding instructions. The conditions are called guards, and the instructions are called commands.

In this case, it is possible that the same value is covered by several different conditions. For example, we can write the computation of absolute value y = |x| as follows (using Java-like syntax):

 guarded commands{
   if (x >= 0) {y = x;}
   if (x <= 0) {y = -x;}
 }
When x is equal to 0, two different statements are applicable, but they both lead to the same result: to the variable y we assign the new value 0.
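Java has no guarded commands, but their nondeterministic choice can be sketched by collecting the commands whose guards hold and picking one at random (a simulation, not the actual parallel mechanism):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class GuardedAbs {
    // Collect the results of all commands whose guards hold,
    // then pick one of them arbitrarily:
    static int abs(int x) {
        List<Integer> results = new ArrayList<>();
        if (x >= 0) results.add(x);   // guard 1: x >= 0
        if (x <= 0) results.add(-x);  // guard 2: x <= 0
        return results.get(new Random().nextInt(results.size()));
    }

    public static void main(String[] args) {
        System.out.println(abs(-3)); // 3
        System.out.println(abs(0));  // 0 -- both guards apply, same result
    }
}
```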

The advantage is that the resulting parallelization speeds up computations. The disadvantage is that if we made a mistake -- so that for a value covered by several guards, different commands lead to different results -- the computer may still produce the correct result during testing, and we will never find out about the mistake. For example, if we write

 guarded commands{
   if (x >= 0) {y = x;}
   if (x == 0) {y = 2;}
   if (x <= 0) {y = -x;}
 }

the middle command is a mistake, but when we run the program to test its correctness, the computer may never choose the second line. This is a typical problem with non-deterministic constructions, including parallel ones: the results are obtained more efficiently, but the probability of an undetected error increases.