At the core of any Java virtual machine implementation is its execution engine. In the Java virtual machine specification, the behavior of the execution engine is defined in terms of an instruction set. For each instruction, the specification describes in detail what an implementation should do when it encounters the instruction as it executes bytecodes, but says very little about how. As mentioned in previous chapters, implementation designers are free to decide how their implementations will execute bytecodes. Their implementations can interpret, just-in-time compile, execute natively in silicon, use a combination of these, or dream up some brand new technique.
Similar to the three senses of the term "Java virtual machine" described at the beginning of this chapter, the term "execution engine" can also be used in any of three senses: an abstract specification, a concrete implementation, or a runtime instance. The abstract specification defines the behavior of an execution engine in terms of the instruction set. Concrete implementations, which may use a variety of techniques, are either software, hardware, or a combination of both. A runtime instance of an execution engine is a thread.
Each thread of a running Java application is a distinct instance of the virtual machine's execution engine. From the beginning of its lifetime to the end, a thread is either executing bytecodes or native methods. A thread may execute bytecodes directly, by interpreting or executing natively in silicon, or indirectly, by just-in-time compiling and executing the resulting native code. A Java virtual machine implementation may use other threads invisible to the running application, such as a thread that performs garbage collection. Such threads need not be "instances" of the implementation's execution engine. All threads that belong to the running application, however, are execution engines in action.
A method's bytecode stream is a sequence of instructions for the Java virtual machine. Each instruction consists of a one-byte opcode followed by zero or more operands. The opcode indicates the operation to be performed. Operands supply extra information needed by the Java virtual machine to perform the operation specified by the opcode. The opcode itself indicates whether or not it is followed by operands, and the form the operands (if any) take. Many Java virtual machine instructions take no operands, and therefore consist only of an opcode. Depending upon the opcode, the virtual machine may refer to data stored in other areas in addition to (or instead of) operands that trail the opcode. When it executes an instruction, the virtual machine may use entries in the current constant pool, entries in the current frame's local variables, or values sitting on the top of the current frame's operand stack.
The abstract execution engine runs by executing bytecodes one instruction at a time. This process takes place for each thread (execution engine instance) of the application running in the Java virtual machine. An execution engine fetches an opcode and, if that opcode has operands, fetches the operands. It executes the action requested by the opcode and its operands, then fetches another opcode. Execution of bytecodes continues until a thread completes either by returning from its starting method or by not catching a thrown exception.
From time to time, the execution engine may encounter an instruction that requests a native method invocation. On such occasions, the execution engine will dutifully attempt to invoke that native method. When the native method returns (if it completes normally, not by throwing an exception), the execution engine will continue executing the next instruction in the bytecode stream.
One way to think of native methods, therefore, is as programmer-customized extensions to the Java virtual machine's instruction set. If an instruction requests an invocation of a native method, the execution engine invokes the native method. Running the native method is how the Java virtual machine executes the instruction. When the native method returns, the virtual machine moves on to the next instruction. If the native method completes abruptly (by throwing an exception), the virtual machine follows the same steps to handle the exception as it does when any instruction throws an exception.
Part of the job of executing an instruction is determining the next instruction to execute. An execution engine determines the next opcode to fetch in one of three ways. For many instructions, the next opcode to execute directly follows the current opcode and its operands, if any, in the bytecode stream. For some instructions, such as `goto` or `return`, the execution engine determines the next opcode as part of its execution of the current instruction. If an instruction throws an exception, the execution engine determines the next opcode to fetch by searching for an appropriate catch clause.
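The first two of these three ways can be sketched as a toy fetch-execute loop. This is only an illustration under stated assumptions, not how any real implementation works: the class `TinyEngine` and its `run` method are hypothetical names, only seven opcodes are handled (using their standard opcode values), a step limit is added so that a looping program terminates, and exception dispatch is not modeled.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter; it understands only the seven opcodes below.
class TinyEngine {
    static final int ICONST_0 = 0x03, ICONST_2 = 0x05, ILOAD_0 = 0x1a,
                     ISTORE_0 = 0x3b, IMUL = 0x68, IINC = 0x84, GOTO = 0xa7;

    /** Executes at most maxSteps instructions and returns local variable 0. */
    static int run(byte[] code, int maxSteps) {
        int[] locals = new int[1];
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        for (int step = 0; step < maxSteps && pc < code.length; step++) {
            int opcode = code[pc] & 0xff;              // fetch the opcode
            switch (opcode) {
                // No operands: the next opcode directly follows this one.
                case ICONST_0: stack.push(0);          pc += 1; break;
                case ICONST_2: stack.push(2);          pc += 1; break;
                case ILOAD_0:  stack.push(locals[0]);  pc += 1; break;
                case ISTORE_0: locals[0] = stack.pop(); pc += 1; break;
                case IMUL:     stack.push(stack.pop() * stack.pop()); pc += 1; break;
                // Two one-byte operands: local variable index, signed increment.
                case IINC:
                    locals[code[pc + 1] & 0xff] += code[pc + 2];
                    pc += 3;
                    break;
                // The instruction itself determines the next opcode:
                // a signed 16-bit offset is added to the current pc.
                case GOTO:
                    pc += (short) (((code[pc + 1] & 0xff) << 8) | (code[pc + 2] & 0xff));
                    break;
                default:
                    throw new IllegalStateException("unsupported opcode " + opcode);
            }
        }
        return locals[0];
    }

    public static void main(String[] args) {
        // iconst_0, istore_0, iinc 0 1, iload_0, iconst_2, imul, istore_0, goto -7
        byte[] code = {0x03, 0x3b, (byte) 0x84, 0x00, 0x01, 0x1a, 0x05, 0x68,
                       0x3b, (byte) 0xa7, (byte) 0xff, (byte) 0xf9};
        System.out.println(run(code, 14));
    }
}
```

The step limit exists only because the sample program loops forever; an actual execution engine simply keeps fetching until the thread completes.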
Several instructions can throw exceptions. The `athrow` instruction, for example, throws an exception explicitly. This instruction is the compiled form of the `throw` statement in Java source code. Every time the `athrow` instruction is executed, it will throw an exception. Other instructions throw exceptions only when certain conditions are encountered. For example, if the Java virtual machine discovers, to its chagrin, that the program is attempting to perform an integer divide by zero, it will throw an `ArithmeticException`. This can occur while executing any of four instructions--`idiv`, `ldiv`, `irem`, and `lrem`--which perform divisions or calculate remainders on `int`s or `long`s.
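The conditional case can be observed from Java source. The following is a small sketch; the class and method names are hypothetical, and the comments naming the instruction assume the usual compilation of `/` and `%` on `int` and `long` operands.

```java
// Hypothetical demo class: shows integer division and remainder by zero
// completing abruptly with ArithmeticException.
class DivideByZeroDemo {

    /** Returns true if dividing the two ints throws ArithmeticException. */
    static boolean intDivideThrows(int a, int b) {
        try {
            int unused = a / b;      // int division: idiv
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    /** Returns true if taking the long remainder throws ArithmeticException. */
    static boolean longRemainderThrows(long a, long b) {
        try {
            long unused = a % b;     // long remainder: lrem
            return false;
        } catch (ArithmeticException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(intDivideThrows(1, 0));
        System.out.println(intDivideThrows(6, 3));
        System.out.println(longRemainderThrows(5L, 0L));
    }
}
```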
Each type of opcode in the Java virtual machine's instruction set has a mnemonic. In the typical assembly language style, streams of Java bytecodes can be represented by their mnemonics followed by (optional) operand values.
For an example of a method's bytecode stream and mnemonics, consider the `doMathForever()` method of this class:
```java
// On CD-ROM in file jvm/ex4/Act.java
class Act {

    public static void doMathForever() {
        int i = 0;
        for (;;) {
            i += 1;
            i *= 2;
        }
    }
}
```
The stream of bytecodes for `doMathForever()` can be disassembled into mnemonics as shown next. The Java virtual machine specification does not define any official syntax for representing the mnemonics of a method's bytecodes. The code shown next illustrates the manner in which streams of bytecode mnemonics will be represented in this book. The left hand column shows the offset in bytes from the beginning of the method's bytecodes to the start of each instruction. The center column shows the instruction and any operands. The right hand column contains comments, which are preceded with a double slash, just as in Java source code.
```
// Bytecode stream: 03 3b 84 00 01 1a 05 68 3b a7 ff f9
// Disassembly:
// Method void doMathForever()
// Left column: offset of instruction from beginning of method
// |   Center column: instruction mnemonic and any operands
// |   |                  Right column: comment
   0   iconst_0           // 03
   1   istore_0           // 3b
   2   iinc 0, 1          // 84 00 01
   5   iload_0            // 1a
   6   iconst_2           // 05
   7   imul               // 68
   8   istore_0           // 3b
   9   goto 2             // a7 ff f9
```
This way of representing mnemonics is very similar to the output of the `javap` program of Sun's Java 2 SDK. `javap` allows you to look at the bytecode mnemonics of the methods of any class file. Note that jump addresses are given as offsets from the beginning of the method. The `goto` instruction causes the virtual machine to jump to the instruction at offset two (an `iinc`). The actual operand in the stream is minus seven. To execute this instruction, the virtual machine adds the operand to the current contents of the pc register. The result is the address of the `iinc` instruction at offset two. To make the mnemonics easier to read, the operands for jump instructions are shown as if the addition has already taken place. Instead of saying "`goto -7`," the mnemonics say, "`goto 2`."
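The arithmetic behind that display convention can be checked with a few lines of Java. This is a minimal sketch with hypothetical names (`BranchTarget`, `branchOffset`, `target`); it only decodes the two operand bytes of the `goto` shown above as a signed 16-bit value and adds the result to the offset of the `goto` opcode.

```java
// Hypothetical helper: decodes a goto operand and computes the jump target.
class BranchTarget {

    /** Combines the two operand bytes into a signed 16-bit branch offset. */
    static int branchOffset(byte high, byte low) {
        return (short) (((high & 0xff) << 8) | (low & 0xff));
    }

    /** Adds the branch offset to the offset (pc) of the goto opcode itself. */
    static int target(int pc, byte high, byte low) {
        return pc + branchOffset(high, low);
    }

    public static void main(String[] args) {
        // The goto sits at offset 9 and is followed by the operand bytes ff f9.
        System.out.println(branchOffset((byte) 0xff, (byte) 0xf9)); // -7
        System.out.println(target(9, (byte) 0xff, (byte) 0xf9));    // 2
    }
}
```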
The central focus of the Java virtual machine's instruction set is the operand stack. Values are generally pushed onto the operand stack before they are used. Although the Java virtual machine has no registers for storing arbitrary values, each method has a set of local variables. The instruction set treats the local variables, in effect, as a set of registers that are referred to by indexes. Nevertheless, other than the `iinc` instruction, which increments a local variable directly, values stored in the local variables must be moved to the operand stack before being used.
For example, to divide one local variable by another, the virtual machine must push both onto the stack, perform the division, and then store the result back into the local variables. To move the value of an array element or object field into a local variable, the virtual machine must first push the value onto the stack, then store it into the local variable. To set an array element or object field to a value stored in a local variable, the virtual machine must follow the reverse procedure. First, it must push the value of the local variable onto the stack, then pop it off the stack and into the array element or object field on the heap.
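The division case can be made concrete with a short method. The class and method names below are hypothetical, and the instruction sequence in the comments is what a compiler would typically emit for this static method; it is an assumption that can be checked by running `javap -c` on the compiled class.

```java
// Hypothetical example: dividing one local variable by another.
class LocalDivide {

    static int divide(int a, int b) {
        // Typical bytecodes for the two statements below (assumed):
        //   iload_0    // push local variable 0 (a) onto the operand stack
        //   iload_1    // push local variable 1 (b)
        //   idiv       // pop both, push the quotient
        //   istore_2   // pop the quotient into local variable 2 (c)
        //   iload_2    // push c again for the return
        //   ireturn
        int c = a / b;
        return c;
    }

    public static void main(String[] args) {
        System.out.println(divide(7, 2)); // 3
    }
}
```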
Several goals--some conflicting--guided the design of the Java virtual machine's instruction set. These goals are basically the same as those described in Part I of this book as the motivation behind Java's entire architecture: platform independence, network mobility, and security.
The platform independence goal was a major influence in the design of the instruction set. The instruction set's stack-centered approach, described previously, was chosen over a register-centered approach to facilitate efficient implementation on architectures with few or irregular registers, such as the Intel 80X86. This feature of the instruction set--the stack-centered design--makes it easier to implement the Java virtual machine on a wide variety of host architectures.
Another motivation for Java's stack-centered instruction set is that compilers usually use a stack-based architecture to pass an intermediate compiled form or the compiled program to a linker/optimizer. The Java class file, which is in many ways similar to the UNIX `.o` or Windows `.obj` file emitted by a C compiler, really represents an intermediate compiled form of a Java program. In the case of Java, the virtual machine serves as (dynamic) linker and may serve as optimizer. The stack-centered architecture of the Java virtual machine's instruction set facilitates the optimization that may be performed at run-time in conjunction with execution engines that perform just-in-time compiling or adaptive optimization.
As mentioned in Chapter 4, "Network Mobility," one major design consideration was class file compactness. Compactness is important because it facilitates speedy transmission of class files across networks. In the bytecodes stored in class files, all instructions--except two that deal with table jumping--are aligned on byte boundaries. The total number of opcodes is small enough so that opcodes occupy only one byte. This design strategy favors class file compactness possibly at the cost of some performance when the program runs. In some Java virtual machine implementations, especially those executing bytecodes in silicon, the single-byte opcode may preclude certain optimizations that could improve performance. Also, better performance may have been possible on some implementations if the bytecode streams were word-aligned instead of byte-aligned. (An implementation could always realign bytecode streams, or translate opcodes into a more efficient form as classes are loaded. Bytecodes are byte-aligned in the class file and in the specification of the abstract method area and execution engine. Concrete implementations can store the loaded bytecode streams any way they wish.)
Another goal that guided the design of the instruction set was the ability to do bytecode verification, especially all at once by a data flow analyzer. The verification capability is needed as part of Java's security framework. The ability to use a data flow analyzer on the bytecodes when they are loaded, rather than verifying each instruction as it is executed, facilitates execution speed. One way this design goal manifests itself in the instruction set is that most opcodes indicate the type they operate on.
For example, instead of simply having one instruction that pops a word from the operand stack and stores it in a local variable, the Java virtual machine's instruction set has two. One instruction, `istore`, pops and stores an `int`. The other instruction, `fstore`, pops and stores a `float`. Both of these instructions perform the exact same function when executed: they pop a word and store it. Distinguishing between popping and storing an `int` versus a `float` is important only to the verification process.
For many instructions, the virtual machine needs to know the types being operated on to know how to perform the operation. For example, the Java virtual machine supports two ways of adding two words together, yielding a one-word result. One addition treats the words as `int`s, the other as `float`s. The difference between these two instructions facilitates verification, but also tells the virtual machine whether it should perform integer or floating point arithmetic.
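To see why the type matters for addition, the same pair of 32-bit words can be added both ways. The sketch below uses hypothetical names (`WordAddition`, `addAsInts`, `addAsFloats`) and models a word as a Java `int`; the conversions use the standard `Float.intBitsToFloat` and `Float.floatToIntBits` methods.

```java
// Hypothetical illustration: one pair of words, two different additions.
class WordAddition {

    /** Treats both words as ints, as an integer addition would. */
    static int addAsInts(int word1, int word2) {
        return word1 + word2;
    }

    /** Treats both words as float bit patterns, as a float addition would. */
    static int addAsFloats(int word1, int word2) {
        float sum = Float.intBitsToFloat(word1) + Float.intBitsToFloat(word2);
        return Float.floatToIntBits(sum);
    }

    public static void main(String[] args) {
        int one = Float.floatToIntBits(1.0f);   // 0x3f800000
        System.out.println(Integer.toHexString(addAsInts(one, one)));   // 7f000000
        System.out.println(Integer.toHexString(addAsFloats(one, one))); // 40000000, the bits of 2.0f
    }
}
```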
A few instructions operate on any type. The `dup` instruction, for example, duplicates the top word of a stack irrespective of its type. Some instructions, such as `goto`, don't operate on typed values. The majority of the instructions, however, operate on a specific type. The mnemonics for most of these "typed" instructions indicate their type by a single character prefix that starts their mnemonic. Table 5-2 shows the prefixes for the various types. A few instructions, such as `arraylength` or `instanceof`, don't include a prefix because their type is obvious. The `arraylength` opcode requires an array reference. The `instanceof` opcode requires an object reference.
Type | Code | Example | Description |
---|---|---|---|
`byte` | `b` | `baload` | load byte from array |
`short` | `s` | `saload` | load short from array |
`int` | `i` | `iaload` | load int from array |
`long` | `l` | `laload` | load long from array |
`char` | `c` | `caload` | load char from array |
`float` | `f` | `faload` | load float from array |
`double` | `d` | `daload` | load double from array |
reference | `a` | `aaload` | load reference from array |

Table 5-2. Type prefixes of bytecode mnemonics
Values on the operand stack must be used in a manner appropriate to their type. It is illegal, for example, to push four `int`s, then add them as if they were two `long`s. It is illegal to push a `float` value onto the operand stack from the local variables, then store it as an `int` in an array on the heap. It is illegal to push a `double` value from an object field on the heap, then store the topmost of its two words into the local variables as a value of type `reference`. The strict type rules that are enforced by Java compilers must also be enforced by Java virtual machine implementations.
Implementations must also observe rules when executing instructions that perform generic stack operations independent of type. As mentioned previously, the `dup` instruction pushes a copy of the top word of the stack, irrespective of type. This instruction can be used on any value that occupies one word: an `int`, `float`, `reference`, or `returnAddress`. It is illegal, however, to use `dup` when the top of the stack contains either a `long` or `double`, the data types that occupy two consecutive operand stack locations. A `long` or `double` sitting on the top of the operand stack can be duplicated in its entirety by the `dup2` instruction, which pushes a copy of the top two words onto the operand stack. The generic instructions cannot be used to split up dual-word values.
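The difference between the two instructions can be modeled with a toy word stack. Everything here is an assumption made for illustration: the class `WordStack` is hypothetical, a word is modeled as a Java `int`, and a `long` is stored with its high word pushed first, a layout chosen for this sketch only. The model does not track types, so it cannot reject an illegal `dup` on half of a `long`; enforcing that rule is the job of verification.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical word-oriented operand stack; each element is one 32-bit word.
class WordStack {
    private final List<Integer> words = new ArrayList<>();

    void pushInt(int value) {
        words.add(value);
    }

    /** Pushes a long as two words (high word first; layout assumed for this sketch). */
    void pushLong(long value) {
        words.add((int) (value >>> 32));
        words.add((int) value);
    }

    int popInt() {
        return words.remove(words.size() - 1);
    }

    long popLong() {
        int low = popInt();
        int high = popInt();
        return ((long) high << 32) | (low & 0xffffffffL);
    }

    /** Models dup: pushes a copy of the top word. */
    void dup() {
        words.add(words.get(words.size() - 1));
    }

    /** Models dup2: pushes copies of the top two words, preserving their order. */
    void dup2() {
        int n = words.size();
        words.add(words.get(n - 2));
        words.add(words.get(n - 1));
    }

    int depth() {
        return words.size();
    }
}
```

With this model, pushing a `long` and calling `dup2()` leaves four words on the stack, and two successive `popLong()` calls return the same value.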
To keep the instruction set small enough to enable each opcode to be represented by a single byte, not all operations are supported on all types. Most operations are not supported for types `byte`, `short`, and `char`. These types are converted to `int` when moved from the heap or method area to the stack frame. They are operated on as `int`s, then converted back to `byte`, `short`, or `char` before being stored back into the heap or method area.
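The same pattern is visible in Java source, where arithmetic on the narrow types yields an `int` that must be cast back explicitly. The following sketch uses hypothetical names; the comments naming `i2b` and `i2c` assume the usual compilation of the narrowing casts.

```java
// Hypothetical demo: byte and char arithmetic is carried out as int arithmetic.
class NarrowTypes {

    static byte increment(byte b) {
        // b is widened to int, the addition is an int addition, and the cast
        // narrows the result back to byte (typically an i2b instruction).
        return (byte) (b + 1);
    }

    static char next(char c) {
        // Same pattern for char (typically an i2c instruction).
        return (char) (c + 1);
    }

    public static void main(String[] args) {
        System.out.println(increment((byte) 10));   // 11
        System.out.println(increment((byte) 127));  // -128: the int 128 narrowed to byte
        System.out.println(next('a'));              // b
    }
}
```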
Table 5-3 shows the computation types that correspond to each storage type in the Java virtual machine. As used here, a storage type is the manner in which values of the type are represented on the heap. The storage type corresponds to the type of the variable in Java source code. A computation type is the manner in which the type is represented on the Java stack frame.
Storage Type | Minimum Bits in Heap or Method Area | Computation Type | Words in the Java Stack Frame |
---|---|---|---|
`byte` | 8 | `int` | 1 |
`short` | 16 | `int` | 1 |
`int` | 32 | `int` | 1 |
`long` | 64 | `long` | 2 |
`char` | 16 | `int` | 1 |
`float` | 32 | `float` | 1 |
`double` | 64 | `double` | 2 |
`reference` | | `reference` | 1 |

Table 5-3. Storage and computation types inside the Java virtual machine
Implementations of the Java virtual machine must in some way ensure that values are operated on by instructions appropriate to their type. They can verify bytecodes up front as part of the class verification process, on the fly as the program executes, or some combination of both. Bytecode verification is described in more detail in Chapter 7, "The Lifetime of a Type." The entire instruction set is covered in detail in Chapters 10 through 20.