The Interpreter is the corner stone of the JVM. This is where the opcodes are executed.
In this chapter we will describe the concepts of objects as well as frames since they are a vital part of the implementation. We will also describe what happens to the opcodes as they enter the Interpreter. A Java instruction consists of an opcode specifying the action to take, followed by zero or more operands. Operations most often take their operands from the stack. After the computation is done the result is put back on the stack. A short description on the different types of opcodes used is found in appendix A.
Objects and frames are allocated with malloc() and deallocated with free() in our implementation. In a more advanced version of our JVM one will have to implement memory management (garbage collector), thus a new form of allocation needs to be employed.
Since Java is an object-oriented language, the JVM must be able to handle objects. By handle we mean that the JVM is responsible for creating, using and terminating objects. More specifically, it is the Interpreter's job to handle objects.
The implementor of the JVM may decide how he or she wants to represent a Java object in their implementation. Our Java objects are described by the C struct below:
typedef struct object_info
{
void* data;
uword size;
ArrayType array_type;
ClassFile* CF_p;
SyncObject* lock;
} object_info;
The data is a pointer to an array of data. This is where the instance variables are stored. It is declared void* since the type of the instance variables may vary.
To access the object's methods we use CF_p, which is a pointer to the ClassFile struct for the object's class. The methods are stored in the class file. Every object created from the same class use the same methods. This can be done since methods are not changed during execution.
The variables are unique thus every object must have its own set of variables, except for those declared static. Some of the variables are inherited from the object's super classes and the rest are defined by the class. To allocate the right amount of memory for the instance variables we need to check all the super classes' instance variables. The search ends when class Object is reached. All classes extend class Object like in Smalltalk.
Static variables are shared among the objects of the same class, thus they are only stored in the ClassFile struct, not in the object_info. The static variables are reached through a pointer called static_p in the ClassFile struct. Actually, in every entry in the fields array.
The number of entries in the data array is given by size. Each element in the array is 32-bits wide even though it is unnecessary in some cases. If an element represents a variable of type short there are 16-bits not actually used. A variable declared short is a 16-bits entity. The reason for this is alignment problems with the Etrax chip. Every operand that the chip uses must be 32-bits wide. Instead of converting every variable to fit the processor, a direct copy can be done. The variables double and long are specified to be 64-bits wide. To fit them in the array two neighboring entries in the data array are used. The most significant 32-bits are stored at the lower index in the array and the least significant 32-bits are stored directly afterwards. Every double and long variable thus increase the size by two.
The object_info is used both as a head for objects and arrays. The entry called array_type indicates if the object represents an array or an object. If it is an array, then the array_type holds information about the type of the elements in the array. An array containing integers will have an array_type equal to T_INT. If the object_info represents a string object the array_type will equal T_OBJECT. If array_type equals T_ARRAY, then the object_info represents a multidimensional array.
Figure 3: The object_info structure
As described earlier an object contains variables and methods. Methods can be viewed as the actions an object is capable of. An object MyCircle may be drawn and moved. When you want to draw a circle you create the object MyCircle and then you invoke the method Draw. The method Draw must be implemented in class MyCircle or one of its super classes.
When the Interpreter wants to execute a method it needs to create a frame associated with the method. For every method invocation a new frame is created. A frame is actually an execution environment that contains an operand stack, some local variables and its own program counter (pc). Our frames are described by the C struct below:
typedef struct frame
{
uword method_index;
uword code_index;
byte* pc;
dword* vars;
dword* optop;
dword* optop_start;
byte nargs;
ClassFile* CF_p;
struct frame* return_frame_p;
object_info* error_p;
} frame;
A pointer to the class file (i.e. our ClassFile struct) is stored in CF_p.
Every method that the class implements is described by a method_info (see appendix B.2) in the class file. The method_info gives us the name of the method and its descriptor. The descriptor is a specification of the arguments that the method needs and its return type. The number of arguments is stored in nargs.
Figure 4: The frame with its pointers to a ClassFile.
If the invoking opcode is invokevirtual then the wanted class is based on the run-time state of the object on the stack. The other invoking opcodes (for static, final etc) gets the class from the Constant_Methodref as described above since the compiler can determine this at compile time.
The wanted class file is searched for the corresponding method_info which is found in the array methods. One could say that we compare our CONSTANT_Methodref with each method_info.
When the method is found, a frame is allocated and the index into the methods array is stored in method_index.
The method_info contains an array of attributes and there must be one and only one Code_attribute unless it's a native method. The index into the attributes array is saved in code_index.
The Code_attribute has a Code pointer which points to the beginning of this method's code block (i.e. the opcodes) hence we set pc to refer to this block. Figure 4 shows the elements used in the frame too keep track of the method. Also the Code_attribute contains information on the number of local variables and the size of the stack. These numbers are needed to determine the size of the frame.
When this is done we set vars to refer to the beginning of the array holding local variables and optop to refer to the top of the stack. The operand stack and the local variable area are both allocated in each frame. We don't use a global stack and a global variable area.
The finished frame is deleted and the calling frame is sent to the Interpreter where execution is continued at the calling frame's pc. This also enables recursive calls since a new frame is created even if it is the same method.
The only task of the Interpreter is to execute the instruction it is given. It is not up to the Interpreter to keep track of threads, frames and memory management. Every opcode is represented by a 2-digit hex number, thus the Interpreter's basic contents is one big switch statement. One entry for each opcode.
When instructions are being executed the Interpreter can be viewed as a software or virtual processor. Since it works as a processor it needs registers. The Java Interpreter uses only four registers, namely pc, optop, vars and frame. The frame register points to the frame struct of the currently executing method.
The program counter (pc) always points to the next instruction to be executed. The code is stored in method_info structs, one struct for every method that the class implements. The Interpreter makes sure that the pc is incremented the right amount for every instruction executed.
The address to the section in memory that contains the current method's local variables is stored in the vars register. The vars register always points to the first local variable of the currently executing method.
The operand stack can be viewed as a working area for methods since this is where the operands and the results from computations are placed. The optop register refers to the top of the operand stack of the currently executing method. It is incremented by the Interpreter.
At a method call, the current values of pc, optop and vars are stored in the frame of the calling method, so that these registers can be restored when control is returned to the calling method.
As mentioned earlier every opcode has its own entry in the switch statement. In each of these entries a number of actions are performed. These actions are executed in a C environment, i.e. some compiled C block is executed, and the result is placed on the current frame's stack. For instance, if a subtraction is to be performed the Interpreter pops the current frame's stack twice in order to get the two operands. The subtraction is done and the result is pushed onto the frame's stack. Some actions require little work and the Interpreter needs no help. In other cases the Interpreter has to rely on other parts of the JVM. If a method invocation is to be performed a new frame representing the invoked method has to be created. This is done outside the Interpreter by a function that will return this new frame and execution will continue.
Resolution is a way to increase the performance of the JVM. Interpretation is a tedious procedure since many checks have to be done to ensure that the execution can proceed properly. Many of these checks need only to be done once. Resolution makes sure that these checks are done only once, thus gaining speed if the same code is executed again. This happens in a for or a while loop, or if a method is called more than once. Resolution is not part of the JVM specification, but it is described as an optimization in [ArnGos96].
Some opcodes refer to the constant_pool array of a class file where the symbolic references are stored. The reference must be looked up to gain access to the real value. This look-up is time consuming.
For example, the opcode putfield sets a variable in an object. The stack contains a reference to the object and the new value of the instance variable. To set the variable, the JVM must find the offset of the instance variable in the object. The offset indicates where the variable is located in the instance variable array called data. Every instantiated object contains the data element. The putfield opcode's operand is an index into the constant_pool array of the class containing the opcode. The entry contains a symbolic reference to the class containing the variable, the symbolic name and descriptor of the variable. The class must be loaded and the variable must be found in order to reach the offset. The resolved corresponding entry contains the offset, thus the time consuming procedure described above is avoided.
When the symbolic reference is ``resolved'' it should be stored for future use since other opcodes may have to use the same reference. Figure 5 describes how the symbolic reference to a class is resolved.
Figure 5: The work to resolve a symbolic class reference.
One problem remains. The opcode is designed to be used with the symbolic reference. Every time it is executed the JVM will presume that the symbolic reference must be looked up. The way to circumvent this problem is to design new opcodes that use the direct reference at once.
The overall procedure to interpret an opcode which can be resolved is:
To keep track of the resolved items in the constant_pool a new array at constant_pool[0] can be allocated. The elements in that array shows if the corresponding entry in the constant_pool is resolved or not.
JAVAX doesn't use an array to mark resolved objects. Instead, the tag field in the resolved cp_info is marked.
When an entry in the constant_pool is resolved it is replaced with the resolved item. The corresponding entry in the constant_pool[0] array is marked as resolved.
The different entries in the constant_pool that can be resolved are listed below. Appendix B.1 contains a description of these entries. The work that is needed to resolve the item is described directly after the name of the entry.
. We use it to
store the 32 least significant bits of the value.
The opcodes that use the same type of symbolic reference, e.g. constants, can be changed in a similar way. The following list describe how opcodes that uses symbolic references are transformed to the corresponding ``real reference'' opcode.
To access a field in an object the offset and the type of the object is needed. The offset gives the object's internal place for the field. The resolved entry in the constant_pool contains this information.
The index into the constant_pool is replaced by the offset. The opcode is changed to hold information about the size of the field. There are many different types in Java but there are only two different sizes they use in memory, namely 32- and 64-bits.
For instance, the putfield opcode could be changed to putfield_quick_w to show that the operand holds the offset and not the index. The _w extension shows that the element is 64-bits wide and thus it is stored in two subsequent entries.
The opcodes that get constants from the constant_pool are changed to include the size of the constant. The _quick opcode does a simple copy operation without checking the type.
Another optimization would be to store the constant as operand to the opcode. A problem is that the size of the constant may not fit in the operand area.
Opcodes that use symbolic class references
are changed to use real
references to
classes.
The pointer to the class could replace the operands, if the pointer is not larger than the operands in size. Otherwise the operands are left unchanged.
Resolution increases the performance of the Interpreter but there is still some time consuming overhead as the application is started. It can't be avoided because the symbolic references must be resolved.
If a Just-In-Time Compiler JVM is built, the symbol table can be omitted when every opcode that uses the table is resolved. The resolved item can be stored as an operand. This removes the extra lookup in the constant_pool to find the resolved item.
The only opcodes that are referring to a class file's symbol table are the ones located in the class file. When these opcodes are resolved there is little use for the symbol table. There are still some symbolic references left in the various attributes, but these are not difficult to replace with the real reference since the procedure is the same as for opcode resolving.
The language Java and the Java compiler, are both implemented in Java to make it as portable as possible. It does not however make it entirely portable. At some point the JVM has to hand over the control to some platform specific code. These methods are called native.
Native methods are the means for a Java program to interact with its environment like operating system, window system etc.
There are two types of native methods:
public native void doSomething();At run-time the library where the method is stored will be loaded dynamically. This is done by methods implemented in class System.
The implementation of the native methods in the Java API
requires a lot of work. The main part of these
methods are declared private. Private methods may only be used by
methods in the same class and are therefore not specified in reference
books. This made it hard to estimate the amount of native methods needed to
implement the Java API. This also meant that we did not know their names,
descriptors nor their functionality. A descriptor specifies the type of the the
parameters and the return value. In lack of anything better we used the
following strategy:
To make our JVM complete, all of the native methods have to be implemented. Also they should be linked dynamically instead of statically. We spent time and effort trying to accomplish this but there were too many problems. The biggest one was that Etrax does not support dynamic libraries. JAVAX does not support Java programs that use native code written by the programmer due to this restriction. Also our JVM will be bigger than if we had used dynamic linking. The upside is that our JVM will be faster since we can gain access to native methods without loading them.
The main reason we wanted to use native methods, other than making our JVM complete, was to make testing easier. To test our Interpreter's arithmetic skills it is vital to be able to print the result on the screen. Earlier we had to use the C debugger to view the stack of the executing frame.
We have implemented methods in the following packages: java.lang, java.io and java.net. The total amount of methods implemented is about 50. We have focused on the classes dealing with sockets since we wanted to run a simple server written Java.