Welcome to another edition of Under The Hood. This column focuses on Java's underlying technologies. It aims to give developers a glimpse of the mechanisms that make their Java programs run. This month's article takes a look at the bytecodes that deal with objects and arrays.
Object-oriented machine
The Java virtual machine (JVM) works with data in three forms: objects, object references, and primitive types. Objects reside on the garbage-collected heap. Object references and primitive types reside either on the Java stack as local variables, on the heap as instance variables of objects, or in the method area as class variables.
In the Java virtual machine, memory is allocated on the garbage-collected heap only as objects. There is no way to allocate memory for a primitive type on the heap, except as part of an object. If you want to use a primitive type where an Object reference is needed, you can allocate a wrapper object for the type from the java.lang package. For example, the Integer class wraps an int value in an object. Only object references and primitive types can reside on the Java stack as local variables. Objects can never reside on the Java stack.
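The short sketch below shows the wrapper idea in Java source. The class and method names (`BoxingDemo`, `wrap`, `unwrap`) are hypothetical. Note that `Integer.valueOf` may hand back a cached Integer for small values instead of allocating a fresh one, but either way the list receives an object reference, never the raw int.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a primitive int is wrapped in a java.lang.Integer
// object wherever an Object reference is required.
class BoxingDemo {

    // Wrap the primitive in an Integer and hand it out as a plain Object reference.
    static Object wrap(int value) {
        return Integer.valueOf(value);
    }

    // Cast the reference back to Integer and extract the primitive value.
    static int unwrap(Object boxed) {
        return ((Integer) boxed).intValue();
    }

    public static void main(String[] args) {
        int primitive = 42;                // a primitive held in a local variable
        List<Object> refs = new ArrayList<>();
        refs.add(wrap(primitive));         // the list stores only the reference
        System.out.println(refs.get(0).getClass().getName()); // java.lang.Integer
        System.out.println(unwrap(refs.get(0)));              // 42
    }
}
```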
The architectural separation of objects and primitive types in the JVM is reflected in the Java programming language, in which objects cannot be declared as local variables. Only object references can be declared as such. Upon declaration, an object reference refers to nothing. Only after the reference has been explicitly initialized, either with a reference to an existing object or with a call to new, does the reference refer to an actual object.
In the JVM instruction set, all objects are instantiated and accessed with the same set of opcodes, except for arrays. In Java, arrays are full-fledged objects, and, like any other object in a Java program, are created dynamically. Array references can be used anywhere a reference to type Object is called for, and any method of Object can be invoked on an array. Yet, in the Java virtual machine, arrays are handled with special bytecodes.
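As a quick illustration that arrays really are objects, this hypothetical `ArraysAreObjects` sketch stores an array in an Object variable and calls Object methods on it. The class names printed (`[I`, `[[[I`) are the JVM's internal names for array classes, the same form that appears later in the `multianewarray` listing.

```java
// Hypothetical sketch: arrays are objects, so an array reference fits in an
// Object variable and Object's methods can be invoked on it.
class ArraysAreObjects {

    // Returns the JVM's name for the runtime class of the given object.
    static String className(Object array) {
        return array.getClass().getName();
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 3};
        Object asObject = numbers;                         // int[] widens to Object
        System.out.println(className(asObject));           // [I
        System.out.println(className(new int[1][1][1]));   // [[[I
        int[] copy = numbers.clone();                      // clone() is public on arrays
        System.out.println(copy != numbers);               // true: a distinct array object
        System.out.println(asObject.equals(numbers));      // true: Object.equals is identity
    }
}
```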
As with any other object, arrays cannot be declared as local variables; only array references can. Array objects themselves always contain either an array of primitive types or an array of object references. If you declare an array of objects, you get an array of object references. The objects themselves must be explicitly created with new and assigned to the elements of the array.
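The sketch below makes the last point concrete, assuming a hypothetical `ObjectArrayDemo` class. Allocating `new StringBuilder[3]` yields one array object holding three null references; each StringBuilder has to be created with its own `new` and stored into the array.

```java
// Hypothetical sketch: an array of objects is really an array of references,
// and every element starts out null until an object is assigned to it.
class ObjectArrayDemo {

    // Allocates the reference array, then creates and stores each object.
    static StringBuilder[] makeFilled(int n) {
        StringBuilder[] builders = new StringBuilder[n]; // n null references
        for (int i = 0; i < builders.length; i++) {
            builders[i] = new StringBuilder("item " + i); // one new object per element
        }
        return builders;
    }

    // Counts how many elements of the array are still null.
    static int countNulls(Object[] array) {
        int nulls = 0;
        for (Object element : array) {
            if (element == null) {
                nulls++;
            }
        }
        return nulls;
    }

    public static void main(String[] args) {
        System.out.println(countNulls(new StringBuilder[3])); // 3
        System.out.println(countNulls(makeFilled(3)));        // 0
    }
}
```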
Opcodes for objects
Instantiation of new objects is accomplished via the new opcode. Two one-byte operands follow the new opcode. These two bytes are combined to form a 16-bit index into the constant pool. The constant pool element at the specified index gives information about the class of the new object. The JVM creates a new instance of that class on the heap and pushes the reference to the new object onto the stack, as shown below.
Opcode | Operand(s) | Description |
---|---|---|
new | indexbyte1, indexbyte2 | creates a new object on the heap, pushes reference |
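The new instruction only allocates the object; the constructor runs afterward through a separate invokespecial instruction. The hypothetical `NewDemo` class below annotates a `new` expression with the sequence javac typically emits for it. The `#n` and `#m` constant pool indexes are placeholders, since the real values depend on the compiled class file.

```java
// Hypothetical sketch: what a "new" expression typically compiles to.
class NewDemo {
    private final int x;

    NewDemo(int x) {
        this.x = x;
    }

    static NewDemo make() {
        // new #n            allocate a NewDemo on the heap, push its reference
        // dup               keep a second copy for the constructor call
        // bipush 7          push the constructor argument
        // invokespecial #m  run the constructor <init>(I)V on the new object
        // areturn           return the remaining reference
        return new NewDemo(7);
    }

    int x() {
        return x;
    }

    public static void main(String[] args) {
        System.out.println(make().x()); // 7
    }
}
```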
The next table shows the opcodes that put and get object fields. These opcodes, putfield and getfield, operate only on fields that are instance variables. Static variables are accessed by putstatic and getstatic, which are described later. The putfield and getfield instructions each take two one-byte operands, which are combined to form a 16-bit index into the constant pool. The constant pool item at that index is a symbolic reference to the field, giving its class, name, and type; the JVM resolves this reference to the field's actual location in the object. Both putfield and getfield take the object reference from the stack. The putfield instruction also pops the value to assign to the instance variable, and the getfield instruction pushes the retrieved instance variable value onto the stack.
Opcode | Operand(s) | Description |
---|---|---|
putfield | indexbyte1, indexbyte2 | sets field, indicated by index, of object to value (both taken from stack) |
getfield | indexbyte1, indexbyte2 | pushes field, indicated by index, of object (taken from stack) |
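Here is a minimal sketch, using a hypothetical `FieldDemo` class, of how instance field access in Java source maps onto these two opcodes. The comments show the sequence javac typically generates; `#n` stands in for the real constant pool index.

```java
// Hypothetical sketch: instance field writes and reads compile to putfield/getfield.
class FieldDemo {
    private int count; // instance variable, stored inside each FieldDemo object

    void set(int value) {
        // aload_0       push 'this' (the object reference)
        // iload_1       push the new value
        // putfield #n   pop value and objectref, store value into the field
        this.count = value;
    }

    int get() {
        // aload_0       push 'this'
        // getfield #n   pop objectref, push the field's current value
        // ireturn
        return count;
    }

    public static void main(String[] args) {
        FieldDemo demo = new FieldDemo();
        demo.set(5);
        System.out.println(demo.get()); // 5
    }
}
```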
Class variables are accessed via the getstatic and putstatic opcodes, as shown in the table below. Both getstatic and putstatic take two one-byte operands, which are combined by the JVM to form a 16-bit unsigned index into the constant pool. The constant pool item at that location gives information about one static field of a class. Because there is no particular object associated with a static field, neither getstatic nor putstatic uses an object reference. The putstatic instruction takes the value to assign from the stack. The getstatic instruction pushes the retrieved value onto the stack.
Opcode | Operand(s) | Description |
---|---|---|
putstatic | indexbyte1, indexbyte2 | sets static field, indicated by index, to value (taken from stack) |
getstatic | indexbyte1, indexbyte2 | pushes static field, indicated by index |
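For contrast with putfield and getfield, the hypothetical `StaticDemo` below increments a class variable. The comments sketch the instructions javac typically emits for `instances++` on a static int (the constructor's implicit call to `Object.<init>` is omitted); no object reference is involved in the getstatic/putstatic pair.

```java
// Hypothetical sketch: class variable access compiles to getstatic/putstatic.
class StaticDemo {
    static int instances; // class variable, not stored in any object

    StaticDemo() {
        // getstatic #n   push the class variable (no objectref needed)
        // iconst_1       push constant int 1
        // iadd           add them
        // putstatic #n   pop the sum and store it back into the class variable
        instances++;
    }

    public static void main(String[] args) {
        new StaticDemo();
        new StaticDemo();
        System.out.println(StaticDemo.instances); // 2
    }
}
```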
The following opcodes check whether the object reference on the top of the stack refers to an instance of the class or interface indexed by the operands following the opcode. The checkcast instruction throws ClassCastException if the object is not an instance of the specified class or interface. Otherwise, checkcast does nothing: the object reference remains on the stack and execution continues at the next instruction. A null reference always passes checkcast. This instruction ensures that casts are safe at run time and forms part of the JVM's security blanket.
The instanceof instruction pops the object reference from the top of the stack and pushes an int result: 1 (true) if the object is an instance of the specified class or interface, otherwise 0 (false). A null reference always yields 0. The instanceof instruction is used to implement Java's instanceof keyword, which allows programmers to test whether an object is an instance of a particular class or interface.
Opcode | Operand(s) | Description |
---|---|---|
checkcast | indexbyte1, indexbyte2 | throws ClassCastException if objectref on stack cannot be cast to class at index |
instanceof | indexbyte1, indexbyte2 | pops objectref; pushes 1 if it is an instance of class at index, else pushes 0 |
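The following sketch, built around a hypothetical `CastDemo` class, shows the Java constructs that compile to these two instructions: a cast expression becomes checkcast and the instanceof operator becomes instanceof. It also demonstrates the null behavior described above.

```java
// Hypothetical sketch: Java casts use checkcast, the instanceof operator uses instanceof.
class CastDemo {

    // instanceof: pops the reference, pushes 1 or 0 (seen here as true/false).
    static boolean isString(Object o) {
        return o instanceof String;
    }

    // checkcast: leaves the reference on the stack if the cast is legal.
    static String asString(Object o) {
        return (String) o;
    }

    // Reports whether the cast in asString throws ClassCastException.
    static boolean castFails(Object o) {
        try {
            asString(o);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(isString("hi"));                // true
        System.out.println(isString(null));                // false: null is never an instance
        System.out.println(asString(null));                // null: checkcast lets null through
        System.out.println(castFails(Integer.valueOf(1))); // true
    }
}
```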
Opcodes for arrays
Instantiation of new arrays is accomplished via the newarray, anewarray, and multianewarray opcodes. The newarray opcode creates arrays of primitive types. The particular primitive type is specified by a single one-byte operand, atype, following the newarray opcode. The newarray instruction can create arrays of byte, short, char, int, long, float, double, or boolean.
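The atype operand is a small numeric code. The hypothetical `atype` helper below maps each Java primitive class to the code listed in the JVM specification's table for newarray (T_BOOLEAN = 4 through T_LONG = 11). It is only a lookup table for illustration, not part of any library.

```java
// Hypothetical sketch: atype codes used by the newarray instruction,
// taken from the JVM specification's newarray table.
class NewArrayTypes {

    static int atype(Class<?> componentType) {
        if (componentType == boolean.class) return 4;  // T_BOOLEAN
        if (componentType == char.class)    return 5;  // T_CHAR
        if (componentType == float.class)   return 6;  // T_FLOAT
        if (componentType == double.class)  return 7;  // T_DOUBLE
        if (componentType == byte.class)    return 8;  // T_BYTE
        if (componentType == short.class)   return 9;  // T_SHORT
        if (componentType == int.class)     return 10; // T_INT
        if (componentType == long.class)    return 11; // T_LONG
        throw new IllegalArgumentException("newarray does not handle " + componentType);
    }

    public static void main(String[] args) {
        // new long[4] typically compiles to: iconst_4, newarray long
        long[] longs = new long[4];
        System.out.println(atype(longs.getClass().getComponentType())); // 11
    }
}
```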
The anewarray instruction creates an array of object references. Two one-byte operands follow the anewarray opcode and are combined to form a 16-bit index into the constant pool. A description of the class of object for which the array is to be created is found in the constant pool at the specified index. This instruction allocates space for the array of object references and initializes the references to null.
The multianewarray instruction is used to allocate multidimensional arrays, which are simply arrays of arrays and could also be allocated with repeated use of the anewarray and newarray instructions. The multianewarray instruction simply compresses the bytecodes needed to create a multidimensional array into one instruction. Two one-byte operands follow the multianewarray opcode and are combined to form a 16-bit index into the constant pool. The constant pool entry at that index describes the array class to be created (for example, [[[I for a three-dimensional array of ints). Immediately following the two one-byte operands that form the constant pool index is a one-byte operand that specifies the number of dimensions to allocate. The size of each dimension is popped off the stack. This instruction allocates space for all the arrays needed to implement the multidimensional array.
Opcode | Operand(s) | Description |
---|---|---|
newarray | atype | pops length, allocates new array of primitive types of type indicated by atype, pushes objectref of new array |
anewarray | indexbyte1, indexbyte2 | pops length, allocates a new array of object references of class indicated by indexbyte1 and indexbyte2, pushes objectref of new array |
multianewarray | indexbyte1, indexbyte2, dimensions | pops dimensions number of array lengths, allocates a new multidimensional array of class indicated by indexbyte1 and indexbyte2, pushes objectref of new array |
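To see what multianewarray saves, the hypothetical `byHand` method below builds the same three-dimensional array one level at a time. When only the outermost length is given, as in `new int[d0][][]`, javac typically emits anewarray rather than multianewarray, so this version roughly corresponds to the "repeated anewarray and newarray" alternative described above.

```java
import java.util.Arrays;

// Hypothetical sketch: a 3D int array built from one-dimensional allocations,
// compared with the single-instruction multianewarray form.
class MultiArrayDemo {

    static int[][][] byHand(int d0, int d1, int d2) {
        int[][][] base = new int[d0][][];     // anewarray of int[][], elements null
        for (int i = 0; i < d0; i++) {
            base[i] = new int[d1][];          // anewarray of int[], elements null
            for (int j = 0; j < d1; j++) {
                base[i][j] = new int[d2];     // newarray int, elements 0
            }
        }
        return base;
    }

    public static void main(String[] args) {
        int[][][] compact = new int[5][4][3]; // one multianewarray instruction
        int[][][] manual = byHand(5, 4, 3);
        System.out.println(Arrays.deepEquals(compact, manual)); // true
    }
}
```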
The next table shows the instruction that pops an array reference off the top of the stack and pushes the length of that array.
Opcode | Operand(s) | Description |
---|---|---|
arraylength | (none) | pops objectref of an array, pushes length of that array |
The following opcodes retrieve an element from an array. The array index and array reference are popped from the stack, and the value at the specified index of the specified array is pushed back onto the stack.
Opcode | Operand(s) | Description |
---|---|---|
baload | (none) | pops index and arrayref of an array of bytes or booleans, pushes arrayref[index] |
caload | (none) | pops index and arrayref of an array of chars, pushes arrayref[index] |
saload | (none) | pops index and arrayref of an array of shorts, pushes arrayref[index] |
iaload | (none) | pops index and arrayref of an array of ints, pushes arrayref[index] |
laload | (none) | pops index and arrayref of an array of longs, pushes arrayref[index] |
faload | (none) | pops index and arrayref of an array of floats, pushes arrayref[index] |
daload | (none) | pops index and arrayref of an array of doubles, pushes arrayref[index] |
aaload | (none) | pops index and arrayref of an array of objectrefs, pushes arrayref[index] |
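A small sketch of these loads in practice, using a hypothetical `ArrayLoadDemo` class. The comments note the instructions javac typically uses for `a.length` and `a[i]`, and the boolean array illustrates that boolean elements are read with baload. An out-of-range index makes these loads throw ArrayIndexOutOfBoundsException, and a null array reference makes them throw NullPointerException.

```java
// Hypothetical sketch: array reads compile to arraylength and the typed load opcodes.
class ArrayLoadDemo {

    // Sums an int array.
    static long sum(int[] a) {
        long total = 0;
        for (int i = 0; i < a.length; i++) { // a.length: arraylength
            total += a[i];                   // a[i]: iaload, then i2l and ladd
        }
        return total;
    }

    // Counts true entries; boolean elements are read with baload.
    static int countTrue(boolean[] flags) {
        int n = 0;
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {                  // baload
                n++;
            }
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[] {1, 2, 3, 4}));                  // 10
        System.out.println(countTrue(new boolean[] {true, false, true})); // 2
    }
}
```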
The next table shows the opcodes that store a value into an array element. The value, index, and array reference are popped from the top of the stack.
Opcode | Operand(s) | Description |
---|---|---|
bastore | (none) | pops value, index, and arrayref of an array of bytes or booleans, assigns arrayref[index] = value |
castore | (none) | pops value, index, and arrayref of an array of chars, assigns arrayref[index] = value |
sastore | (none) | pops value, index, and arrayref of an array of shorts, assigns arrayref[index] = value |
iastore | (none) | pops value, index, and arrayref of an array of ints, assigns arrayref[index] = value |
lastore | (none) | pops value, index, and arrayref of an array of longs, assigns arrayref[index] = value |
fastore | (none) | pops value, index, and arrayref of an array of floats, assigns arrayref[index] = value |
dastore | (none) | pops value, index, and arrayref of an array of doubles, assigns arrayref[index] = value |
aastore | (none) | pops value, index, and arrayref of an array of objectrefs, assigns arrayref[index] = value; throws ArrayStoreException if value is incompatible with the array's component type |
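The hypothetical `ArrayStoreDemo` below shows a plain iastore loop and the run-time type check that aastore performs. Because a String[] can be referenced through an Object[] variable, the compiler accepts storing an Integer into it, and the JVM rejects the store at run time with ArrayStoreException.

```java
import java.util.Arrays;

// Hypothetical sketch: typed array stores, including aastore's run-time check.
class ArrayStoreDemo {

    // Stores i * i into each element with iastore.
    static int[] squares(int n) {
        int[] result = new int[n];
        for (int i = 0; i < n; i++) {
            result[i] = i * i;               // iastore
        }
        return result;
    }

    // Returns true if aastore rejects an incompatible value.
    static boolean storeRejected() {
        Object[] objects = new String[1];    // legal: a String[] is an Object[]
        try {
            objects[0] = Integer.valueOf(1); // aastore: Integer is not a String
            return false;
        } catch (ArrayStoreException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(squares(4))); // [0, 1, 4, 9]
        System.out.println(storeRejected());             // true
    }
}
```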
Three-dimensional array: a Java virtual machine simulation
The Three Dimensional Array applet that accompanies this article demonstrates a Java virtual machine executing a sequence of bytecodes. The bytecode sequence in the simulation was generated by javac for the initAnArray() method of the class shown below:
```java
class ArrayDemo {
    static void initAnArray() {
        int[][][] threeD = new int[5][4][3];
        for (int i = 0; i < 5; ++i) {
            for (int j = 0; j < 4; ++j) {
                for (int k = 0; k < 3; ++k) {
                    threeD[i][j][k] = i + j + k;
                }
            }
        }
    }
}
```
The bytecodes generated by javac for initAnArray() are shown below:
```
0 iconst_5 // Push constant int 5.
1 iconst_4 // Push constant int 4.
2 iconst_3 // Push constant int 3.
// Create a new multidimensional array using constant pool
// entry #2 as the class (which is [[[I, a 3D array of ints)
// with 3 dimensions.
3 multianewarray #2 dim #3 <Class [[[I>
7 astore_0 // Pop object ref into local variable 0: int[][][] threeD = new int[5][4][3];
8 iconst_0 // Push constant int 0.
9 istore_1 // Pop int into local variable 1: int i = 0;
10 goto 54 // Go to section of code that tests outer loop.
13 iconst_0 // Push constant int 0.
14 istore_2 // Pop int into local variable 2: int j = 0;
15 goto 46 // Go to section of code that tests middle loop.
18 iconst_0 // Push constant int 0.
19 istore_3 // Pop int into local variable 3: int k = 0;
20 goto 38 // Go to section of code that tests inner loop.
23 aload_0 // Push object ref from local variable 0.
24 iload_1 // Push int from local variable 1 (i).
25 aaload // Pop index and arrayref, push object ref at arrayref[index] (gets threeD[i]).
26 iload_2 // Push int from local variable 2 (j).
27 aaload // Pop index and arrayref, push object ref at arrayref[index] (gets threeD[i][j]).
28 iload_3 // Push int from local variable 3 (k).
// Now calculate the int that will be assigned to threeD[i][j][k]
29 iload_1 // Push int from local variable 1 (i).
30 iload_2 // Push int from local variable 2 (j).
31 iadd // Pop two ints, add them, push int result (i + j).
32 iload_3 // Push int from local variable 3 (k).
33 iadd // Pop two ints, add them, push int result (i + j + k).
34 iastore // Pop value, index, and arrayref; assign arrayref[index] = value: threeD[i][j][k] = i + j + k;
35 iinc 3 1 // Increment by 1 the int in local variable 3: ++k;
38 iload_3 // Push int from local variable 3 (k).
39 iconst_3 // Push constant int 3.
40 if_icmplt 23 // Pop right and left ints, jump if left < right: for (...; k < 3;...)
43 iinc 2 1 // Increment by 1 the int in local variable 2: ++j;
46 iload_2 // Push int from local variable 2 (j).
47 iconst_4 // Push constant int 4.
48 if_icmplt 18 // Pop right and left ints, jump if left < right: for (...; j < 4;...)
51 iinc 1 1 // Increment by 1 the int in local variable 1: ++i;
54 iload_1 // Push int from local variable 1 (i).
55 iconst_5 // Push constant int 5.
56 if_icmplt 13 // Pop right and left ints, jump if left < right: for (...; i < 5;...)
59 return
```
The initAnArray() method merely allocates and initializes a three-dimensional array. This simulation demonstrates how the Java virtual machine handles multidimensional arrays. In response to the multianewarray instruction, which in this example requests the allocation of a three-dimensional array, the JVM creates a tree of one-dimensional arrays. The reference returned by the multianewarray instruction refers to the base one-dimensional array in the tree. In the initAnArray() method, the base array has five components, threeD[0] through threeD[4]. Each component of the base array is itself a reference to a one-dimensional array of four components, accessed by threeD[0][0] through threeD[4][3]. The components of these five arrays are also references to arrays, each of which has three components. These components are ints, the elements of this multidimensional array, and they are accessed by threeD[0][0][0] through threeD[4][3][2].
In response to the multianewarray instruction in the initAnArray() method, the Java virtual machine creates one five-element array of arrays, five four-element arrays of arrays, and twenty three-element arrays of ints. The JVM allocates these 26 arrays on the heap, initializes their components so that they form a tree, and returns the reference to the base array.
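The count of 26 can be checked from Java code. The hypothetical `ArrayTreeCount` sketch walks the tree produced by `new int[5][4][3]` and collects every array object it reaches into an identity-based set, so each distinct array is counted once: 1 base array + 5 branch arrays + 20 leaf arrays.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical sketch: count the distinct array objects in a 3D int array tree.
class ArrayTreeCount {

    static int countArrays(int[][][] threeD) {
        // Identity semantics: two arrays count separately even if their contents match.
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        seen.add(threeD);                 // the base array
        for (int[][] branch : threeD) {
            seen.add(branch);             // one branch array per base component
            for (int[] leaf : branch) {
                seen.add(leaf);           // one leaf array of ints per branch component
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println(countArrays(new int[5][4][3])); // 26
    }
}
```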
To assign an int value to an element of the three-dimensional array, the JVM uses aaload to get a component of the base array. That component, threeD[i], is itself an array of arrays, so the JVM uses aaload again to get a component of this branch array. The result, threeD[i][j], is a reference to a leaf array of ints. Finally, the JVM uses iastore to assign an int value to an element of the leaf array. In other words, the JVM carries out operations on multidimensional arrays as a series of one-dimensional array accesses.
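The same three steps can be written out explicitly in Java source. In the hypothetical `ThreeDAccess` sketch, `store` performs the two aaload steps and the final iastore as separate statements; apart from the extra local-variable stores and loads javac adds for `branch` and `leaf`, it matches what the listing does at offsets 23 through 34.

```java
// Hypothetical sketch: threeD[i][j][k] = value as two aaload steps plus one iastore.
class ThreeDAccess {

    static void store(int[][][] threeD, int i, int j, int k, int value) {
        int[][] branch = threeD[i]; // aaload on the base array
        int[] leaf = branch[j];     // aaload on the branch array
        leaf[k] = value;            // iastore on the leaf array
    }

    // Same work as ArrayDemo.initAnArray(), but routed through store().
    static int[][][] fill() {
        int[][][] threeD = new int[5][4][3];
        for (int i = 0; i < 5; ++i) {
            for (int j = 0; j < 4; ++j) {
                for (int k = 0; k < 3; ++k) {
                    store(threeD, i, j, k, i + j + k);
                }
            }
        }
        return threeD;
    }

    public static void main(String[] args) {
        System.out.println(fill()[4][3][2]); // 9
    }
}
```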
To drive the simulation, just press the Step button. Each press of this button causes the Java virtual machine to execute one bytecode instruction. To start the simulation over, press the Reset button. To have the JVM repeatedly execute bytecodes with no further coaxing on your part, press the Run button; the JVM will then execute the bytecodes until the Stop button is pressed. The return instruction in the bytecode sequence generated by javac has been replaced by a breakpoint instruction in the simulation's bytecode sequence. In this case, the breakpoint instruction just causes the simulator to stop. The text area at the bottom of the applet describes the next instruction to be executed. Happy clicking.
To view the Three Dimensional Array applet, visit the interactive illustrations of Inside the Java Virtual Machine at:
http://www.artima.com/insidejvm/applets/ThreeDArray.html
This article was first published under the name "Objects and arrays" in JavaWorld, a division of Web Publishing, Inc., December 1996.
Bill Venners has been writing software professionally for 12 years. Based in Silicon Valley, he provides software consulting and training services under the name Artima Software Company. Over the years he has developed software for the consumer electronics, education, semiconductor, and life insurance industries. He has programmed in many languages on many platforms: assembly language on various microprocessors, C on Unix, C++ on Windows, Java on the Web. He is the author of the book Inside the Java Virtual Machine, published by McGraw-Hill.