next up previous contents
Next: Execution and Linking Up: No Title Previous: Class File format

Interpreter

The Interpreter is the corner stone of the JVM. This is where the opcodes are executed.

In this chapter we will describe the concepts of objects as well as frames since they are a vital part of the implementation. We will also describe what happens to the opcodes as they enter the Interpreter. A Java instruction consists of an opcode specifying the action to take, followed by zero or more operands. Operations most often take their operands from the stack. After the computation is done the result is put back on the stack. A short description on the different types of opcodes used is found in appendix A.

Objects and frames are allocated with malloc() and deallocated with free() in our implementation. In a more advanced version of our JVM one will have to implement memory management (garbage collector), thus a new form of allocation needs to be employed.

Objects

Since Java is an object-oriented language, the JVM must be able to handle objects. By handle we mean that the JVM is responsible for creating, using and terminating objects. More specifically, it is the Interpreter's job to handle objects.

The implementor of the JVM may decide how he or she wants to represent a Java object in their implementation. Our Java objects are described by the C struct below:

typedef struct object_info
{
  void* data;           
  uword size;           
  ArrayType array_type; 
  ClassFile* CF_p;   
  SyncObject* lock;
} object_info;

The data is a pointer to an array of data. This is where the instance variables are stored. It is declared void* since the type of the instance variables may vary.

To access the object's methods we use CF_p, which is a pointer to the ClassFile struct for the object's class. The methods are stored in the class file. Every object created from the same class use the same methods. This can be done since methods are not changed during execution.

The variables are unique thus every object must have its own set of variables, except for those declared static. Some of the variables are inherited from the object's super classes and the rest are defined by the class. To allocate the right amount of memory for the instance variables we need to check all the super classes' instance variables. The search ends when class Object is reached. All classes extend class Object like in Smalltalk.

Static variables are shared among the objects of the same class, thus they are only stored in the ClassFile struct, not in the object_info. The static variables are reached through a pointer called static_p in the ClassFile struct. Actually, in every entry in the fields array.

The number of entries in the data array is given by size. Each element in the array is 32-bits wide even though it is unnecessary in some cases. If an element represents a variable of type short there are 16-bits not actually used. A variable declared short is a 16-bits entity. The reason for this is alignment problems with the Etrax chip. Every operand that the chip uses must be 32-bits wide. Instead of converting every variable to fit the processor, a direct copy can be done. The variables double and long are specified to be 64-bits wide. To fit them in the array two neighboring entries in the data array are used. The most significant 32-bits are stored at the lower index in the array and the least significant 32-bits are stored directly afterwards. Every double and long variable thus increase the size by two.

The object_info is used both as a head for objects and arrays. The entry called array_type indicates if the object represents an array or an object. If it is an array, then the array_type holds information about the type of the elements in the array. An array containing integers will have an array_type equal to T_INT. If the object_info represents a string object the array_type will equal T_OBJECT. If array_type equals T_ARRAY, then the object_info represents a multidimensional array.

   figure281
Figure 3: The object_info structure

Frames

  As described earlier an object contains variables and methods. Methods can be viewed as the actions an object is capable of. An object MyCircle may be drawn and moved. When you want to draw a circle you create the object MyCircle and then you invoke the method Draw. The method Draw must be implemented in class MyCircle or one of its super classes.

When the Interpreter wants to execute a method it needs to create a frame associated with the method. For every method invocation a new frame is created. A frame is actually an execution environment that contains an operand stack, some local variables and its own program counter (pc). Our frames are described by the C struct below:

typedef struct frame
{
  uword method_index;   
  uword code_index;  
  byte* pc;
  dword* vars;
  dword* optop;
  dword* optop_start;
  byte nargs;          
  ClassFile* CF_p;
  struct frame* return_frame_p;
  object_info* error_p;  
} frame;

A pointer to the class file (i.e. our ClassFile struct) is stored in CF_p.

Every method that the class implements is described by a method_info (see appendix B.2) in the class file. The method_info gives us the name of the method and its descriptor. The descriptor is a specification of the arguments that the method needs and its return type. The number of arguments is stored in nargs.

   figure306
Figure 4: The frame with its pointers to a ClassFile.

Method call
When a method is invoked the class file where the method is implemented must be found. The invoking opcode is followed by an index. The index is used into the current class' constant_pool where a CONSTANT_Methodref is found. This gives us the name of the class as well as the name of the method and its descriptor.

If the invoking opcode is invokevirtual then the wanted class is based on the run-time state of the object on the stack. The other invoking opcodes (for static, final etc) gets the class from the Constant_Methodref as described above since the compiler can determine this at compile time.

The wanted class file is searched for the corresponding method_info which is found in the array methods. One could say that we compare our CONSTANT_Methodref with each method_info.

When the method is found, a frame is allocated and the index into the methods array is stored in method_index.

The method_info contains an array of attributes and there must be one and only one Code_attribute unless it's a native method. The index into the attributes array is saved in code_index.

The Code_attribute has a Code pointer which points to the beginning of this method's code block (i.e. the opcodes) hence we set pc to refer to this block. Figure 4 shows the elements used in the frame too keep track of the method. Also the Code_attribute contains information on the number of local variables and the size of the stack. These numbers are needed to determine the size of the frame.

When this is done we set vars to refer to the beginning of the array holding local variables and optop to refer to the top of the stack. The operand stack and the local variable area are both allocated in each frame. We don't use a global stack and a global variable area.

Method return
Since a method always is invoked by another method we need to be able to reach this calling method when the invoked method is finished (i.e. a return is encountered). This is done through the pointer return_frame_p.

The finished frame is deleted and the calling frame is sent to the Interpreter where execution is continued at the calling frame's pc. This also enables recursive calls since a new frame is created even if it is the same method.

Passing arguments
To pass arguments to an invoked frame one simply moves the values from the calling frame's stack (optop) to the invoked frame's local variables (vars). To return a value one pops the value from the invoked frame's stack and pushes it on the calling frame's stack.

Finally, error_p is used when dealing with exceptions which is described in chapter 7.

The structure of the Interpreter

The only task of the Interpreter is to execute the instruction it is given. It is not up to the Interpreter to keep track of threads, frames and memory management. Every opcode is represented by a 2-digit hex number, thus the Interpreter's basic contents is one big switch statement. One entry for each opcode.

The registers

When instructions are being executed the Interpreter can be viewed as a software or virtual processor. Since it works as a processor it needs registers. The Java Interpreter uses only four registers, namely pc, optop, vars and frame. The frame register points to the frame struct of the currently executing method.

The program counter (pc) always points to the next instruction to be executed. The code is stored in method_info structs, one struct for every method that the class implements. The Interpreter makes sure that the pc is incremented the right amount for every instruction executed.

The address to the section in memory that contains the current method's local variables is stored in the vars register. The vars register always points to the first local variable of the currently executing method.

The operand stack can be viewed as a working area for methods since this is where the operands and the results from computations are placed. The optop register refers to the top of the operand stack of the currently executing method. It is incremented by the Interpreter.

At a method call, the current values of pc, optop and vars are stored in the frame of the calling method, so that these registers can be restored when control is returned to the calling method.

The execution

As mentioned earlier every opcode has its own entry in the switch statement. In each of these entries a number of actions are performed. These actions are executed in a C environment, i.e. some compiled C block is executed, and the result is placed on the current frame's stack. For instance, if a subtraction is to be performed the Interpreter pops the current frame's stack twice in order to get the two operands. The subtraction is done and the result is pushed onto the frame's stack. Some actions require little work and the Interpreter needs no help. In other cases the Interpreter has to rely on other parts of the JVM. If a method invocation is to be performed a new frame representing the invoked method has to be created. This is done outside the Interpreter by a function that will return this new frame and execution will continue.

Resolution

Resolution is a way to increase the performance of the JVM. Interpretation is a tedious procedure since many checks have to be done to ensure that the execution can proceed properly. Many of these checks need only to be done once. Resolution makes sure that these checks are done only once, thus gaining speed if the same code is executed again. This happens in a for or a while loop, or if a method is called more than once. Resolution is not part of the JVM specification, but it is described as an optimization in [ArnGos96].

Resolution algorithm

Some opcodes refer to the constant_pool array of a class file where the symbolic references are stored. The reference must be looked up to gain access to the real value. This look-up is time consuming.

For example, the opcode putfield sets a variable in an object. The stack contains a reference to the object and the new value of the instance variable. To set the variable, the JVM must find the offset of the instance variable in the object. The offset indicates where the variable is located in the instance variable array called data. Every instantiated object contains the data element. The putfield opcode's operand is an index into the constant_pool array of the class containing the opcode. The entry contains a symbolic reference to the class containing the variable, the symbolic name and descriptor of the variable. The class must be loaded and the variable must be found in order to reach the offset. The resolved corresponding entry contains the offset, thus the time consuming procedure described above is avoided.

When the symbolic reference is ``resolved'' it should be stored for future use since other opcodes may have to use the same reference. Figure 5 describes how the symbolic reference to a class is resolved.

   figure390
Figure 5: The work to resolve a symbolic class reference.

One problem remains. The opcode is designed to be used with the symbolic reference. Every time it is executed the JVM will presume that the symbolic reference must be looked up. The way to circumvent this problem is to design new opcodes that use the direct reference at once.

The overall procedure to interpret an opcode which can be resolved is:

  1. Resolve the specified item in the constant_pool.
  2. Throw an exception if the resolution didn't work out.
  3. Change the opcode to a _quick variant, that uses the real value instead of the symbolic.
  4. Execute the _quick opcode.

To keep track of the resolved items in the constant_pool a new array at constant_pool[0] can be allocated. The elements in that array shows if the corresponding entry in the constant_pool is resolved or not.

JAVAX doesn't use an array to mark resolved objects. Instead, the tag field in the resolved cp_info is marked.

Resolved items

When an entry in the constant_pool is resolved it is replaced with the resolved item. The corresponding entry in the constant_pool[0] array is marked as resolved.

The different entries in the constant_pool that can be resolved are listed below. Appendix B.1 contains a description of these entries. The work that is needed to resolve the item is described directly after the name of the entry.

CONSTANT_Class_info
The class that this symbolic reference describes has to be located.
If the ClassLoader doesn't find the class in its list, the class is loaded and initialized.
CONSTANT_Fieldref_info
The type and the location of the field, i.e. variable, has to be found.
The class containing the field is symbolically referenced and has to be found. The procedure is the same as with CONSTANT_Class_info. When the class is found, it is searched for the field.
If the field is not declared static the type and offset is stored as the resolved item in the constant_pool and the entry is marked resolved. The offset shows where the field is located in the object.
If the field is static the pointer to the static field is stored and the element is marked resolved.
CONSTANT_InterfaceMethodref_info,
CONSTANT_Methodref_info
The resolved item contains information to help the JVM build a frame for this method.
The class containing the method is symbolically referenced and has to be found. The procedure is the same as with CONSTANT_Class_info.
The method has to be found in the class. The method_info contains all the information needed to build the frame.
In our implementation we store different resolved items depending on the type of method.
If the method is declared native we store an index to the native method. The NativeHandler uses the index to access the native method. In any other case we store information about the index of the method_info, and its code attribute index. The number of arguments and a pointer to the class file that contains the method are also stored. The frame can easily be built with this information.
CONSTANT_String_info
The String object has to be created. A pointer to the String is stored.
CONSTANT_Integer_info,
CONSTANT_Float_info
The value corresponding to the number is stored.
CONSTANT_Long_info,
CONSTANT_Double_info
The value corresponding to the number is stored. We store the value at two consecutive constant_pool entries. The entry following one of these two cp_info entries is invalidgif. We use it to store the 32 least significant bits of the value.

Opcode replacement

The opcodes that use the same type of symbolic reference, e.g. constants, can be changed in a similar way. The following list describe how opcodes that uses symbolic references are transformed to the corresponding ``real reference'' opcode.

Field access

To access a field in an object the offset and the type of the object is needed. The offset gives the object's internal place for the field. The resolved entry in the constant_pool contains this information.

The index into the constant_pool is replaced by the offset. The opcode is changed to hold information about the size of the field. There are many different types in Java but there are only two different sizes they use in memory, namely 32- and 64-bits.

For instance, the putfield opcode could be changed to putfield_quick_w to show that the operand holds the offset and not the index. The _w extension shows that the element is 64-bits wide and thus it is stored in two subsequent entries.

Constants

The opcodes that get constants from the constant_pool are changed to include the size of the constant. The _quick opcode does a simple copy operation without checking the type.

Another optimization would be to store the constant as operand to the opcode. A problem is that the size of the constant may not fit in the operand area.

Class references

Opcodes that use symbolic class references are changed to use realgif references to classes.

The pointer to the class could replace the operands, if the pointer is not larger than the operands in size. Otherwise the operands are left unchanged.

Resolution analysis

Resolution increases the performance of the Interpreter but there is still some time consuming overhead as the application is started. It can't be avoided because the symbolic references must be resolved.

If a Just-In-Time Compiler JVM is built, the symbol table can be omitted when every opcode that uses the table is resolved. The resolved item can be stored as an operand. This removes the extra lookup in the constant_pool to find the resolved item.

The only opcodes that are referring to a class file's symbol table are the ones located in the class file. When these opcodes are resolved there is little use for the symbol table. There are still some symbolic references left in the various attributes, but these are not difficult to replace with the real reference since the procedure is the same as for opcode resolving.

Native methods

The language Java and the Java compiler, are both implemented in Java to make it as portable as possible. It does not however make it entirely portable. At some point the JVM has to hand over the control to some platform specific code. These methods are called native.

Native methods are the means for a Java program to interact with its environment like operating system, window system etc.

Usage

There are two types of native methods:

All native methods have in common that they use the reserved keyword native in their declaration. Also there is just a semicolon instead of a body.
  public native void doSomething();
At run-time the library where the method is stored will be loaded dynamically. This is done by methods implemented in class System.

Problems

The implementation of the native methods in the Java APIgif requires a lot of work. The main part of these methods are declared private. Private methods may only be used by methods in the same class and are therefore not specified in reference books. This made it hard to estimate the amount of native methods needed to implement the Java API. This also meant that we did not know their names, descriptors nor their functionality. A descriptor specifies the type of the the parameters and the return value. In lack of anything better we used the following strategy:

  1. Write a Java program that uses native methods (e.g. print something on the screen).
  2. Compile the program and run it through our JVM. Exit the program when a native method is found (i.e. its access_flags are declared native).
  3. Print the name of the method as well as its descriptor and class.
  4. Try and figure out what the method is supposed to do. This is hopefully possible just by looking at the name.
  5. Implement the method and link it into the JVM statically.
  6. Expand the Java program and repeat the procedure.

Linking

To make our JVM complete, all of the native methods have to be implemented. Also they should be linked dynamically instead of statically. We spent time and effort trying to accomplish this but there were too many problems. The biggest one was that Etrax does not support dynamic libraries. JAVAX does not support Java programs that use native code written by the programmer due to this restriction. Also our JVM will be bigger than if we had used dynamic linking. The upside is that our JVM will be faster since we can gain access to native methods without loading them.

Implementation status

The main reason we wanted to use native methods, other than making our JVM complete, was to make testing easier. To test our Interpreter's arithmetic skills it is vital to be able to print the result on the screen. Earlier we had to use the C debugger to view the stack of the executing frame.

We have implemented methods in the following packages: java.lang, java.io and java.net. The total amount of methods implemented is about 50. We have focused on the classes dealing with sockets since we wanted to run a simple server written Java.


next up previous contents
Next: Execution and Linking Up: No Title Previous: Class File format

Magnus Hjersing
Wed Dec 18 18:44:00 MET 1996