Java offers all the control-flow constructs that C++ programmers found endearing: if, if-else, while, do-while, for, and switch. (Java doesn't offer the goto, but that was never endearing, not to real C++ programmers anyway.)
Decisions, decisions: keep it simple
The simplest control-flow construct Java offers is the if statement. But in bytecodes, the if is not so simple. When a Java program is compiled, the if statement may be translated to a variety of opcodes. Each opcode pops one or two values from the top of the stack and does a comparison. The opcodes that pop only one value off the top of the stack compare that value with zero. The opcodes that pop two values off the stack compare one of the popped values to the other popped value. If the comparison succeeds (success is defined differently by each individual opcode), the Java virtual machine (JVM) branches -- or jumps -- to the offset given as an operand to the comparison opcode. In this manner, the if statement provides many ways for you to make the Java virtual machine decide between two alternative paths of program flow.
All you ever wanted to know about the if opcode
One family of if opcodes performs integer comparisons against zero. When the JVM encounters one of these opcodes, it pops one int off the stack and compares it with zero.
Opcode | Operand(s) | Description |
---|---|---|
ifeq | branchbyte1, branchbyte2 | pop int value, if value == 0, branch to offset |
ifne | branchbyte1, branchbyte2 | pop int value, if value != 0, branch to offset |
iflt | branchbyte1, branchbyte2 | pop int value, if value < 0, branch to offset |
ifle | branchbyte1, branchbyte2 | pop int value, if value <= 0, branch to offset |
ifgt | branchbyte1, branchbyte2 | pop int value, if value > 0, branch to offset |
ifge | branchbyte1, branchbyte2 | pop int value, if value >= 0, branch to offset |
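As a rough illustration of how source-level tests against zero map onto this family, the sketch below pairs two small methods with the bytecode a compiler would plausibly emit for them. The class and method names are made up for this example, and the bytecode in the comments is an assumption: compilers commonly branch on the inverse of the source condition so that the branch skips the body, but the exact output varies by compiler and version.

```java
// Hypothetical demo class; the bytecode comments are illustrative only.
final class ZeroCompareDemo {

    // A compiler would plausibly emit something like:
    //   iload_0
    //   ifne   L1      // inverse test: skip the body when x != 0
    //   iconst_1
    //   ireturn
    // L1:
    //   iconst_0
    //   ireturn
    static boolean isZero(int x) {
        if (x == 0) {
            return true;
        }
        return false;
    }

    // Here the inverse test would be ifge: skip the body when x >= 0.
    static boolean isNegative(int x) {
        if (x < 0) {
            return true;
        }
        return false;
    }
}
```

Running javap -c on the compiled class is the reliable way to see which opcode a particular compiler actually chose.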
Another family of if opcodes pops two integers off the top of the stack and compares them against one another. The Java virtual machine branches if the comparison succeeds. Just before these opcodes are executed, value2 is on the top of the stack; value1 is just beneath value2.
Opcode | Operand(s) | Description |
---|---|---|
if_icmpeq | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 == value2, branch to offset |
if_icmpne | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 != value2, branch to offset |
if_icmplt | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 < value2, branch to offset |
if_icmple | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 <= value2, branch to offset |
if_icmpgt | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 > value2, branch to offset |
if_icmpge | branchbyte1, branchbyte2 | pop int value2 and value1, if value1 >= value2, branch to offset |
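The operand order is easy to get backwards, so here is a minimal sketch of it. It models the operand stack with a java.util.ArrayDeque, which is purely an assumption made for illustration (a real JVM does not work this way internally), and the class and method names are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the operand stack; not how a real JVM is implemented.
final class IcmpDemo {

    // Models if_icmplt: pops value2 (top of stack), then value1, and
    // reports whether the branch would be taken (value1 < value2).
    static boolean ifIcmplt(Deque<Integer> stack) {
        int value2 = stack.pop(); // was on top of the stack
        int value1 = stack.pop(); // was just beneath value2
        return value1 < value2;
    }

    // Evaluates "a < b" in stack order: push a, push b, then compare.
    static boolean lessThan(int a, int b) {
        Deque<Integer> stack = new ArrayDeque<>();
        stack.push(a); // becomes value1
        stack.push(b); // becomes value2
        return ifIcmplt(stack);
    }
}
```

The left-hand operand of the source expression is pushed first, so it ends up as value1 even though it is popped second.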
The opcodes shown above operate on ints. These opcodes also are used for comparisons of types short, byte, and char -- the JVM always manipulates types smaller than int by first converting them to ints and then manipulating the ints.
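One visible consequence of this widening is that signedness is decided by the source type before the int comparison runs. The following sketch (class and method names are invented for the example) shows a byte and a char that look alike as bit patterns but land on opposite sides of zero once converted to int.

```java
// Hypothetical demo of comparisons on types narrower than int.
final class SmallTypeDemo {

    // byte is signed: the bit pattern 0xFF widens to the int -1,
    // so the int comparison against zero succeeds.
    static boolean byteIsNegative(byte b) {
        return b < 0;
    }

    // char is unsigned: 0xFFFF widens to the int 65535,
    // so this comparison can never succeed.
    static boolean charIsNegative(char c) {
        return c < 0;
    }

    // Both shorts are widened to int before being compared.
    static boolean shortLess(short a, short b) {
        return a < b;
    }
}
```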
A third family of opcodes takes care of comparisons of the other primitive types: long, float, and double. These opcodes don't cause a branch by themselves. Instead, they push the int value that represents the result of the comparison -- 0 for equal to, 1 for greater than, and -1 for less than -- and the compiler follows them with one of the int compare opcodes introduced above to force the actual branch.
Opcode | Operand(s) | Description |
---|---|---|
lcmp | (none) | pop long value2 and value1, compare, push int result |
fcmpg | (none) | pop float value2 and value1, compare, push int result (1 if either value is NaN) |
fcmpl | (none) | pop float value2 and value1, compare, push int result (-1 if either value is NaN) |
dcmpg | (none) | pop double value2 and value1, compare, push int result (1 if either value is NaN) |
dcmpl | (none) | pop double value2 and value1, compare, push int result (-1 if either value is NaN) |
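The two-step pattern for long comparisons can be sketched as follows. The class and method names are hypothetical; the first method stands in for lcmp and the second for an lcmp followed by iflt.

```java
// Hypothetical model of the compare-then-branch pattern for longs.
final class LcmpDemo {

    // Models lcmp: returns the int that would be pushed for value1 vs value2.
    static int lcmp(long value1, long value2) {
        if (value1 > value2) {
            return 1;
        }
        if (value1 < value2) {
            return -1;
        }
        return 0;
    }

    // Models "lcmp" followed by "iflt": the branch is taken when value1 < value2.
    static boolean branchIfLess(long value1, long value2) {
        int result = lcmp(value1, value2); // step 1: push the int result
        return result < 0;                 // step 2: int compare against zero
    }
}
```

Comparing directly, rather than subtracting and testing the sign, keeps the result correct even for operands such as Long.MIN_VALUE where a subtraction would overflow.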
The two opcodes for float comparisons (fcmpg and fcmpl) differ only in how they handle NaN ("not a number"). In the Java virtual machine, comparisons of floating-point numbers always fail if one of the values being compared is NaN. If neither value being compared is NaN, both fcmpg and fcmpl instructions push a 0 if the values are equal, a 1 if value1 is greater than value2, and a -1 if value1 is less than value2. But if one or both of the values is NaN, the fcmpg instruction pushes a 1, whereas the fcmpl instruction pushes a -1. Because both of these opcodes are available, a compiler can pick the one whose NaN result makes the comparison fail, so any comparison between two float values pushes the same result onto the stack whether it failed normally or because of a NaN. This is also true for the two opcodes that compare double values: dcmpg and dcmpl.
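To see why two variants are useful, here is a minimal sketch of both float opcodes and of how a compiler might pair them with source operators. The names are hypothetical, and the pairing shown (fcmpg for less-than, fcmpl for greater-than) is an assumption about typical compiler output rather than something the instruction set requires.

```java
// Hypothetical model of fcmpg/fcmpl and how NaN makes comparisons fail.
final class FcmpDemo {

    // Models fcmpg: NaN in either operand yields 1.
    static int fcmpg(float value1, float value2) {
        if (Float.isNaN(value1) || Float.isNaN(value2)) {
            return 1;
        }
        return compareOrdered(value1, value2);
    }

    // Models fcmpl: NaN in either operand yields -1.
    static int fcmpl(float value1, float value2) {
        if (Float.isNaN(value1) || Float.isNaN(value2)) {
            return -1;
        }
        return compareOrdered(value1, value2);
    }

    private static int compareOrdered(float value1, float value2) {
        if (value1 > value2) {
            return 1;
        }
        if (value1 < value2) {
            return -1;
        }
        return 0;
    }

    // "a < b" via fcmpg then ifge: a NaN pushes 1, so the body is skipped.
    static boolean lessThan(float a, float b) {
        boolean skipBody = fcmpg(a, b) >= 0;
        return !skipBody;
    }

    // "a > b" via fcmpl then ifle: a NaN pushes -1, so the body is skipped.
    static boolean greaterThan(float a, float b) {
        boolean skipBody = fcmpl(a, b) <= 0;
        return !skipBody;
    }
}
```

With this pairing, both lessThan and greaterThan report false whenever an operand is NaN, which matches what the Java operators < and > do.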
A fourth family of if opcodes pops one object reference off the top of the stack and compares it with null. If the comparison succeeds, the JVM branches.
Opcode | Operand(s) | Description |
---|---|---|
ifnull | branchbyte1, branchbyte2 | pop reference value, if value == null, branch to offset |
ifnonnull | branchbyte1, branchbyte2 | pop reference value, if value != null, branch to offset |
The last family of if opcodes pops two object references off the stack and compares them with each other. In this case, there are only two comparisons that make sense: "equals" and "not equals." If the references are equal, then they refer to the exact same object on the heap. If not, they refer to two different objects. As with all the other if opcodes, if the comparison succeeds, the JVM branches.
Opcode | Operand(s) | Description |
---|---|---|
if_acmpeq | branchbyte1, branchbyte2 | pop reference value2 and value1, if value1 == value2, branch to offset |
if_acmpne | branchbyte1, branchbyte2 | pop reference value2 and value1, if value1 != value2, branch to offset |
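At the source level, these reference opcodes correspond to == and != applied to object references. The short sketch below (with invented class and method names) shows that such a comparison is about identity, not content.

```java
// Hypothetical demo of reference comparisons.
final class ReferenceCompareDemo {

    // A reference comparison: true only when both refer to the same object.
    static boolean sameObject(Object a, Object b) {
        return a == b;
    }

    // A null test on a single reference.
    static boolean isNull(Object o) {
        return o == null;
    }
}
```

Two distinct String objects with the same characters are equal according to equals() but are not the same object, so sameObject returns false for them.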
It's unconditional: goto opcodes
Those are all of the opcodes that cause the Java virtual machine to branch conditionally. One other family of opcodes, however, causes the JVM to branch unconditionally. Not surprisingly, these opcodes are called "goto." Although goto is a reserved word in the Java programming language, it can't be used in your programs because it won't compile. The reason goto is a reserved word is so that a mischievous programmer can't make a variable named "goto" in order to freak out their peers. But, when you compile a Java program, the bytecodes generated will likely contain lots of goto instructions.
Opcode | Operand(s) | Description |
---|---|---|
goto | branchbyte1, branchbyte2 | branch to offset |
goto_w | branchbyte1, branchbyte2, branchbyte3, branchbyte4 | branch to offset |
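The branchbyte operands in these tables combine into a signed offset that is added to the address of the branching opcode itself. That detail comes from the JVM specification rather than from the text above, and the class and method names in this sketch are hypothetical.

```java
// Hypothetical helper for decoding branch operands.
final class BranchOffsetDemo {

    // Two-byte form (if* opcodes and goto): a signed 16-bit offset.
    static int offset16(int branchbyte1, int branchbyte2) {
        return (short) (((branchbyte1 & 0xFF) << 8) | (branchbyte2 & 0xFF));
    }

    // Four-byte form (goto_w): a signed 32-bit offset.
    static int offset32(int b1, int b2, int b3, int b4) {
        return ((b1 & 0xFF) << 24) | ((b2 & 0xFF) << 16)
                | ((b3 & 0xFF) << 8) | (b4 & 0xFF);
    }

    // The branch target is relative to the address of the opcode.
    static int target(int opcodeAddress, int offset) {
        return opcodeAddress + offset;
    }
}
```

For example, a goto at address 26 whose operand bytes are 0xFF and 0xE8 carries the offset -24 and therefore branches back to address 2.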
The above opcodes, which perform comparisons and both conditional and unconditional branches, are sufficient to express to a Java virtual machine the control flow of any if, if-else, while, do-while, or for statement in Java source code. The above opcodes also could be used to express a switch statement, but the JVM's instruction set includes two opcodes specially designed for the switch statement: tableswitch and lookupswitch.
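To make the "sufficient" claim concrete, here is a toy interpreter that runs a for loop using nothing but one conditional branch and one goto. Everything about it is an assumption made for illustration: the instruction encoding is invented, instructions are addressed by index rather than by byte, and branch operands are absolute targets rather than the relative offsets real bytecode uses.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical toy interpreter; the encoding is invented for this sketch.
final class MiniBranchInterpreter {
    static final int ICONST = 0, ILOAD = 1, ISTORE = 2, IADD = 3,
            IF_ICMPGE = 4, GOTO = 5, IRETURN = 6;

    // Equivalent of: sum = 0; for (i = 0; i < n; i++) sum += i; return sum;
    // Locals: 0 = n, 1 = sum, 2 = i. Each row is {opcode, operand}.
    static final int[][] SUM_BELOW_N = {
        {ICONST, 0}, {ISTORE, 1},              //  0- 1: sum = 0
        {ICONST, 0}, {ISTORE, 2},              //  2- 3: i = 0
        {ILOAD, 2}, {ILOAD, 0},                //  4- 5: push i, push n
        {IF_ICMPGE, 16},                       //  6: leave loop when i >= n
        {ILOAD, 1}, {ILOAD, 2}, {IADD, 0},     //  7- 9: sum + i
        {ISTORE, 1},                           // 10: sum = sum + i
        {ILOAD, 2}, {ICONST, 1}, {IADD, 0},    // 11-13: i + 1
        {ISTORE, 2},                           // 14: i = i + 1
        {GOTO, 4},                             // 15: back to the loop test
        {ILOAD, 1}, {IRETURN, 0}               // 16-17: return sum
    };

    static int run(int[][] program, int[] locals) {
        Deque<Integer> stack = new ArrayDeque<>();
        int pc = 0;
        while (true) {
            int op = program[pc][0];
            int arg = program[pc][1];
            pc++; // fall through to the next instruction unless a branch fires
            switch (op) {
                case ICONST: stack.push(arg); break;
                case ILOAD: stack.push(locals[arg]); break;
                case ISTORE: locals[arg] = stack.pop(); break;
                case IADD: stack.push(stack.pop() + stack.pop()); break;
                case IF_ICMPGE: {
                    int value2 = stack.pop();
                    int value1 = stack.pop();
                    if (value1 >= value2) {
                        pc = arg;
                    }
                    break;
                }
                case GOTO: pc = arg; break;
                case IRETURN: return stack.pop();
                default: throw new IllegalStateException("unknown op " + op);
            }
        }
    }

    static int sumBelow(int n) {
        return run(SUM_BELOW_N, new int[] {n, 0, 0});
    }
}
```

Calling MiniBranchInterpreter.sumBelow(5) walks the loop five times and returns 10. The conditional branch at index 6 plays the role of the loop test, and the goto at index 15 closes the loop.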
The nitty gritty of tableswitch and lookupswitch
The tableswitch and lookupswitch instructions both include one default branch offset and a variable-length set of case value/branch offset pairs. Both instructions pop the key (the value of the expression in the parentheses immediately following the switch keyword) from the stack. The key is compared with all the case values. If a match is found, the branch offset associated with the case value is taken. If no match is found, the default branch offset is taken.
The difference between tableswitch and lookupswitch is in how they indicate the case values. The lookupswitch instruction is more general-purpose than tableswitch, but tableswitch is usually more efficient. Both instructions are followed by zero to three bytes of padding -- enough so that the byte immediately following the padding starts at an address that is a multiple of four bytes from the beginning of the method. (These two instructions, by the way, are the only ones in the entire Java virtual machine instruction set that involve alignment on a greater than one-byte boundary.) For both instructions, the next four bytes after the padding are the default branch offset.
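The amount of padding follows directly from the address of the switch opcode. A minimal sketch of that arithmetic, with a hypothetical class and method name:

```java
// Hypothetical helper for the alignment rule of tableswitch and lookupswitch.
final class SwitchPaddingDemo {

    // opcodeAddress is the byte offset of the switch opcode from the
    // beginning of the method. The result is the number of pad bytes
    // (0 to 3) needed so that the default branch offset starts at a
    // multiple of four.
    static int paddingBytes(int opcodeAddress) {
        int afterOpcode = opcodeAddress + 1; // first byte following the opcode
        return (4 - (afterOpcode % 4)) % 4;
    }
}
```

An opcode at address 3 needs no padding, because the following byte already sits at address 4, whereas an opcode at address 0 or 4 needs the full three bytes.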
After the zero- to three-byte padding and the four-byte default branch offset, the lookupswitch opcode is followed by a four-byte value, npairs, which indicates the number of case value/branch offset pairs that will follow. The case value is an int; this highlights the fact that switch statements in Java require a key expression that is an int, short, char, or byte. If you attempt to use a long, float, or double as a switch key, your program won't compile. The branch offset associated with each case value is another four-byte offset.
In the tableswitch instruction, the zero- to three-byte padding and the four-byte default branch offset are followed by low and high int values. The low and high values indicate the endpoints of a range of case values included in this tableswitch instruction. Following the low and high values are high - low + 1 branch offsets -- one branch offset for high, one for low, and one for each integer case value in between high and low. The branch offset for low immediately follows the high value.
Thus, when the Java virtual machine encounters a lookupswitch instruction, it must check the key against each case value until it finds a match or runs out of case values. If it runs out of case values, it uses the default branch offset. On the other hand, when the JVM encounters a tableswitch instruction, it can simply check to see if the key is within the range defined by low and high. If not, it takes the default branch offset. If so, it just subtracts low from key to get an offset into the list of branch offsets. In this manner, it can determine the appropriate branch offset without having to check each case value.
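The two dispatch strategies can be sketched side by side. This is a simplified model, not JVM code: the class and method names are hypothetical, the instruction operands are passed in as plain arrays, and the returned values are whatever branch offsets the caller supplies.

```java
// Hypothetical model of how the two switch instructions pick a branch.
final class SwitchDispatchDemo {

    // lookupswitch: scan the case values for a match.
    static int lookupswitch(int key, int defaultOffset,
                            int[] caseValues, int[] offsets) {
        for (int i = 0; i < caseValues.length; i++) {
            if (caseValues[i] == key) {
                return offsets[i];
            }
        }
        return defaultOffset;
    }

    // tableswitch: one range check, then a direct index into the offsets.
    static int tableswitch(int key, int defaultOffset,
                           int low, int high, int[] offsets) {
        if (key < low || key > high) {
            return defaultOffset;
        }
        return offsets[key - low];
    }
}
```

The linear scan mirrors the description above. Since the JVM specification requires lookupswitch pairs to be sorted by case value, an implementation is free to use a faster search, but it still cannot match the single index operation of tableswitch.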
Opcode | Operand(s) | Description |
---|---|---|
lookupswitch | <0-3 byte pad>, defaultbyte1, defaultbyte2, defaultbyte3, defaultbyte4, npairs1, npairs2, npairs3, npairs4, case value/branch offset pairs... | pop key, match key with case values, if match found jump to associated branch offset, else jump to default branch offset |
tableswitch | <0-3 byte pad>, defaultbyte1, defaultbyte2, defaultbyte3, defaultbyte4, lowbyte1, lowbyte2, lowbyte3, lowbyte4, highbyte1, highbyte2, highbyte3, highbyte4, branch offsets... | pop key, if not in low/high range jump to default branch offset, else get the (key - low) branch offset and jump |
Other than the opcodes described above, the only Java virtual machine instructions that affect control flow are those that deal with throwing and catching exceptions, try-finally clauses, and invoking and returning from methods. The bytecodes for exceptions and try-finally clauses were discussed in the previous two installments of this column (see Resources). The bytecodes that deal with invoking and returning from methods will be treated in a future installment.
SayingTomato: a Java virtual machine simulation
The applet below demonstrates a Java virtual machine executing a sequence of bytecodes. The bytecode sequence in the simulation was generated by the javac compiler for the argue() method of the class shown below:

    class Argument {
        public final static int TOMAYTO = 0;
        public final static int TOMAHTO = 1;

        static void argue() {
            int say = TOMAYTO;
            while (true) {
                switch (say) {
                case TOMAYTO:
                    say = TOMAHTO;
                    break;
                case TOMAHTO:
                    say = TOMAYTO;
                    break;
                }
            }
        }
    }

The bytecodes generated by javac for the argue() method are shown below:

     0 iconst_0        // Push constant 0 (TOMAYTO)
     1 istore_0        // Pop into local var 0: int say = TOMAYTO;
     2 iload_0         // Push key for switch from local var 0
                       // Perform switch statement: switch (say) {...
                       // Low case value is 0, high case value is 1
                       // Default branch offset will goto 2
     3 tableswitch 0 to 1: default=2
           0: 24       // case 0 (TOMAYTO): goto 24
           1: 29       // case 1 (TOMAHTO): goto 29
                       // Note that the next instruction starts at address 24,
                       // which means that the tableswitch took up 21 bytes
    24 iconst_1        // Push constant 1 (TOMAHTO)
    25 istore_0        // Pop into local var 0: say = TOMAHTO
    26 goto 2          // Branch unconditionally to 2, top of while loop
    29 iconst_0        // Push constant 0 (TOMAYTO)
    30 istore_0        // Pop into local var 0: say = TOMAYTO
    31 goto 2          // Branch unconditionally to 2, top of while loop

The argue() method merely switches the value of say back and forth between TOMAYTO and TOMAHTO. Because the values of TOMAYTO and TOMAHTO are consecutive (TOMAYTO is 0 and TOMAHTO is 1), the javac compiler used a tableswitch. The tableswitch is a more efficient instruction than a lookupswitch, and the equivalent lookupswitch instruction would occupy 28 bytes -- 4 bytes more than the tableswitch instruction. (These figures assume the maximum three bytes of padding; in the listing above the tableswitch happens to need no padding, which is why it occupies only 21 bytes.)
It turns out that even if TOMAYTO were a 0 and TOMAHTO were a 2, the javac compiler still would have used a tableswitch, because even with the extra default branch offset in there for a 1, the tableswitch instruction would occupy only 28 bytes -- the same number of bytes as the equivalent lookupswitch. Both instructions occupy the same number of bytes, but tableswitch is more efficient, so it is used. As soon as you make TOMAHTO a 3, however, javac starts using a lookupswitch. This is because a tableswitch now would need two default branch offsets in its list (for 1 and 2), which would push its size up to 32 bytes. Thus, a lookupswitch now would require fewer bytes than a tableswitch -- so javac would choose the lookupswitch.
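The byte counts quoted above can be reproduced from the instruction layouts. The sketch below uses hypothetical names and takes the padding as a parameter; the 24-, 28-, and 32-byte figures come out when the padding is the maximum of three bytes. It only reproduces the arithmetic and is not a model of how javac actually decides between the two instructions.

```java
// Hypothetical size calculator for the two switch instructions.
final class SwitchSizeDemo {

    // opcode + padding + default + low + high + one offset per value in range
    static int tableswitchSize(int padding, int low, int high) {
        int entries = high - low + 1;
        return 1 + padding + 4 + 4 + 4 + 4 * entries;
    }

    // opcode + padding + default + npairs + (case value, offset) per pair
    static int lookupswitchSize(int padding, int npairs) {
        return 1 + padding + 4 + 4 + 8 * npairs;
    }
}
```

With three bytes of padding, case values 0 and 1 give 24 bytes for tableswitch against 28 for lookupswitch; values 0 and 2 give 28 against 28; values 0 and 3 give 32 against 28. With no padding, as at address 3 in the listing, the 0-to-1 tableswitch shrinks to 21 bytes.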
The branch offsets for the case values cause the Java virtual machine to hop down to code that will change the value of the say local variable. The value of say will alternate between TOMAYTO and TOMAHTO indefinitely, until the user aborts the program, thereby calling the whole thing off.
Get in the driver's seat
To drive the simulation, just press the Step button. Each press of the Step button will cause the Java virtual machine to execute one bytecode instruction. To start the simulation over, press the Reset button. To cause the JVM to repeatedly execute bytecodes with no further coaxing on your part, press the Run button. The JVM will then execute the bytecodes until the Stop button is pressed. The text area at the bottom of the applet describes the next instruction to be executed. Happy clicking.
This article was first published under the name Under the Hood: Control flow in JavaWorld, a division of Web Publishing, Inc., March 1997.
Bill Venners has been writing software professionally for 12 years. Based in Silicon Valley, he provides software consulting and training services under the name Artima Software Company. Over the years he has developed software for the consumer electronics, education, semiconductor, and life insurance industries. He has programmed in many languages on many platforms: assembly language on various microprocessors, C on Unix, C++ on Windows, Java on the Web. He is author of the book: Inside the Java Virtual Machine, published by McGraw-Hill.