AGAL bytecode format

AGAL bytecode must use Endian.LITTLE_ENDIAN format.

Bytecode Header

AGAL bytecode must begin with a 7-byte header:

A0 01000000 A1 00 -- for a vertex program 
A0 01000000 A1 01 -- for a fragment program

Offset (bytes)	Size (bytes)	Name	Description
0	1	magic	must be 0xa0
1	4	version	must be 1
5	1	shader type ID	must be 0xa1
6	1	shader type	0 for a vertex program; 1 for a fragment program
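
As an illustration of the header layout above, the following is a minimal Python sketch that packs and checks the 7 header bytes with the standard struct module. The function names make_header and parse_header are hypothetical and not part of any Adobe API, and the version value is passed through rather than validated.

```python
import struct

VERTEX, FRAGMENT = 0, 1  # values of the "shader type" byte

def make_header(shader_type: int, version: int = 1) -> bytes:
    """Pack magic, version, shader type ID and shader type (7 bytes)."""
    # "<" = little-endian without padding; B = 1 byte, I = 4 bytes
    return struct.pack("<BIBB", 0xA0, version, 0xA1, shader_type)

def parse_header(data: bytes) -> tuple:
    """Return (version, shader_type); raise ValueError if malformed."""
    if len(data) < 7:
        raise ValueError("an AGAL header is 7 bytes long")
    magic, version, type_id, shader_type = struct.unpack_from("<BIBB", data, 0)
    if magic != 0xA0 or type_id != 0xA1 or shader_type not in (VERTEX, FRAGMENT):
        raise ValueError("not an AGAL header")
    return version, shader_type

print(make_header(VERTEX).hex(" "))    # a0 01 00 00 00 a1 00
print(make_header(FRAGMENT).hex(" "))  # a0 01 00 00 00 a1 01
```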

Tokens

The header is immediately followed by any number of tokens. Every token is 192 bits (24 bytes) in size and always has the format:

[opcode][destination][source1][source2 or sampler]

Not every opcode uses all of these fields. Unused fields must be set to 0.
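
The 192 bits break down as a 32-bit opcode, a 32-bit destination, and two 64-bit source fields. A minimal Python sketch of walking the token area could look like this; iter_tokens and the sample program are hypothetical names and data used only for illustration.

```python
import struct

HEADER_SIZE = 7
# opcode (32 bits), destination (32 bits), source1 (64 bits), source2 or sampler (64 bits)
TOKEN = struct.Struct("<IIQQ")

def iter_tokens(bytecode: bytes):
    """Yield (opcode, destination, source1, source2) for each 24-byte token."""
    body = bytecode[HEADER_SIZE:]
    if len(body) % TOKEN.size:
        raise ValueError("token area is not a multiple of 24 bytes")
    yield from TOKEN.iter_unpack(body)

# Hypothetical program: a vertex header plus one token with every field set to 0
program = bytes.fromhex("a0 01 00 00 00 a1 00") + TOKEN.pack(0x00, 0, 0, 0)
print(TOKEN.size, list(iter_tokens(program)))  # 24 [(0, 0, 0, 0)]
```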

Operation codes

The [opcode] field is 32 bits in size and can take one of these values:

Name	Opcode	Operation	Description
mov	0x00	move	move data from source1 to destination, component-wise
add	0x01	add	destination = source1 + source2, component-wise
sub	0x02	subtract	destination = source1 - source2, component-wise
mul	0x03	multiply	destination = source1 * source2, component-wise
div	0x04	divide	destination = source1 / source2, component-wise
rcp	0x05	reciprocal	destination = 1/source1, component-wise
min	0x06	minimum	destination = minimum(source1,source2), component-wise
max	0x07	maximum	destination = maximum(source1,source2), component-wise
frc	0x08	fractional	destination = source1 - (float)floor(source1), component-wise
sqt	0x09	square root	destination = sqrt(source1), component-wise
rsq	0x0a	reciprocal root	destination = 1/sqrt(source1), component-wise
pow	0x0b	power	destination = pow(source1,source2), component-wise
log	0x0c	logarithm	destination = log_2(source1), component-wise
exp	0x0d	exponential	destination = 2^source1, component-wise
nrm	0x0e	normalize	destination = normalize(source1), component-wise (produces only a 3 component result, destination must be masked to .xyz or less)
sin	0x0f	sine	destination = sin(source1), component-wise
cos	0x10	cosine	destination = cos(source1), component-wise
crs	0x11	cross product	destination.x = source1.y * source2.z - source1.z * source2.y; destination.y = source1.z * source2.x - source1.x * source2.z; destination.z = source1.x * source2.y - source1.y * source2.x (produces only a 3 component result, destination must be masked to .xyz or less)
dp3	0x12	dot product	destination = source1.x * source2.x + source1.y * source2.y + source1.z * source2.z
dp4	0x13	dot product	destination = source1.x * source2.x + source1.y * source2.y + source1.z * source2.z + source1.w * source2.w
abs	0x14	absolute	destination = abs(source1), component-wise
neg	0x15	negate	destination = -source1, component-wise
sat	0x16	saturate	destination = maximum(minimum(source1,1),0), component-wise
m33	0x17	multiply matrix 3x3	destination.x = (source1.x * source2[0].x) + (source1.y * source2[0].y) + (source1.z * source2[0].z); destination.y = (source1.x * source2[1].x) + (source1.y * source2[1].y) + (source1.z * source2[1].z); destination.z = (source1.x * source2[2].x) + (source1.y * source2[2].y) + (source1.z * source2[2].z) (produces only a 3 component result, destination must be masked to .xyz or less)
m44	0x18	multiply matrix 4x4	destination.x = (source1.x * source2[0].x) + (source1.y * source2[0].y) + (source1.z * source2[0].z) + (source1.w * source2[0].w); destination.y = (source1.x * source2[1].x) + (source1.y * source2[1].y) + (source1.z * source2[1].z) + (source1.w * source2[1].w); destination.z = (source1.x * source2[2].x) + (source1.y * source2[2].y) + (source1.z * source2[2].z) + (source1.w * source2[2].w); destination.w = (source1.x * source2[3].x) + (source1.y * source2[3].y) + (source1.z * source2[3].z) + (source1.w * source2[3].w)
m34	0x19	multiply matrix 3x4	destination.x = (source1.x * source2[0].x) + (source1.y * source2[0].y) + (source1.z * source2[0].z) + (source1.w * source2[0].w); destination.y = (source1.x * source2[1].x) + (source1.y * source2[1].y) + (source1.z * source2[1].z) + (source1.w * source2[1].w); destination.z = (source1.x * source2[2].x) + (source1.y * source2[2].y) + (source1.z * source2[2].z) + (source1.w * source2[2].w) (produces only a 3 component result, destination must be masked to .xyz or less)
kil	0x27	kill/discard (fragment shader only)	If the single scalar source component is less than zero, the fragment is discarded and not drawn to the frame buffer. (The destination register must be set to all 0.)
tex	0x28	texture sample (fragment shader only)	destination = value loaded from texture source2 at coordinates source1. In this case, source2 must be in sampler format.
sge	0x29	set-if-greater-equal	destination = source1 >= source2 ? 1 : 0, component-wise
slt	0x2a	set-if-less-than	destination = source1 < source2 ? 1 : 0, component-wise
seq	0x2c	set-if-equal	destination = source1 == source2 ? 1 : 0, component-wise
sne	0x2d	set-if-not-equal	destination = source1 != source2 ? 1 : 0, component-wise
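
The matrix opcodes in the table above amount to one dot product per destination component, with source2[n] being the n-th row. The Python sketch below restates the dp4, m44 and crs formulas on plain tuples to make that relationship concrete. It models only the arithmetic, not registers or write masks, and the sample values are made up.

```python
def dp4(a, b):
    """dp4: sum of the four component-wise products."""
    return sum(x * y for x, y in zip(a, b))

def m44(vec, rows):
    """m44: destination component n is dp4(source1, source2[n])."""
    return tuple(dp4(vec, row) for row in rows)

def crs(a, b):
    """crs: three-component cross product, as in the table."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

# Made-up example: four rows that add (10, 20, 30) to a point with w = 1
rows = [(1, 0, 0, 10), (0, 1, 0, 20), (0, 0, 1, 30), (0, 0, 0, 1)]
print(m44((1, 2, 3, 1), rows))      # (11, 22, 33, 1)
print(crs((1, 0, 0), (0, 1, 0)))    # (0, 0, 1)
```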

In AGAL2, the following opcodes have been introduced:

Name	Opcode	Operation	Description
ddx	0x1a	partial derivative in X	Load partial derivative in X of source1 into destination.
ddy	0x1b	partial derivative in Y	Load partial derivative in Y of source1 into destination.
ife	0x1c	if equal to	Jump if source1 is equal to source2.
ine	0x1d	if not equal to	Jump if source1 is not equal to source2.
ifg	0x1e	if greater than	Jump if source1 is greater than or equal to source2.
ifl	0x1f	if less than	Jump if source1 is less than source2.
els	0x20	else	Else block
eif	0x21	end if	Close an if or else block.

Destination field format

The [destination] field is 32 bits in size:

31.............................0 
----TTTT----MMMMNNNNNNNNNNNNNNNN

T = Register type (4 bits)

M = Write mask (4 bits)

N = Register number (16 bits)

- = undefined, must be 0
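
Reading the diagram from right to left, N occupies bits 0-15, M bits 16-19 and T bits 24-27. A minimal Python sketch of packing such a field follows. The register type values come from the "Value" column of the Program Registers table; the write mask bit order (x as the lowest bit) is an assumption that this section does not state, and the function names are hypothetical.

```python
# Assumed write mask bit order: x = lowest bit, w = highest bit
WRITE_MASK = {"x": 1, "y": 2, "z": 4, "w": 8}

def pack_destination(reg_type: int, reg_number: int, mask: str = "xyzw") -> int:
    """Build the 32-bit destination field from type, number and write mask."""
    bits = 0
    for component in mask:
        bits |= WRITE_MASK[component]
    # T in bits 24-27, M in bits 16-19, N in bits 0-15; other bits stay 0
    return ((reg_type & 0xF) << 24) | (bits << 16) | (reg_number & 0xFFFF)

def unpack_destination(field: int) -> tuple:
    """Return (reg_type, reg_number, mask_bits)."""
    return (field >> 24) & 0xF, field & 0xFFFF, (field >> 16) & 0xF

# Temporary register (type 2) number 1, masked to .xyz
field = pack_destination(2, 1, "xyz")
print(f"{field:#010x}", unpack_destination(field))  # 0x02070001 (2, 1, 7)
```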

Source field format

The [source] field is 64 bits in size:

63.............................................................0 
D-------------QQ----IIII----TTTTSSSSSSSSOOOOOOOONNNNNNNNNNNNNNNN

D = Direct (0) or indirect (1) addressing (1 bit); for direct addressing, Q and I are ignored

Q = Index register component select (2 bits)

I = Index register type (4 bits)

T = Register type (4 bits)

S = Swizzle (8 bits, 2 bits per component)

O = Indirect offset (8 bits)

N = Register number (16 bits)

- = undefined, must be 0
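
Read from right to left, the diagram places N in bits 0-15, O in bits 16-23, S in bits 24-31, T in bits 32-35, I in bits 40-43, Q in bits 48-49 and D in bit 63. The Python sketch below packs a source field on that basis. The component numbering (x=0, y=1, z=2, w=3, with the first swizzle component in the lowest two bits) is an assumption not stated in this section, and pack_source is a hypothetical helper.

```python
import struct

COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3}  # assumed 2-bit component codes

def pack_swizzle(swizzle: str) -> int:
    """Pack four components, 2 bits each, first component in the lowest bits (assumed)."""
    if len(swizzle) != 4:
        raise ValueError("expected four components, e.g. 'xyzw'")
    return sum(COMPONENT[c] << (2 * i) for i, c in enumerate(swizzle))

def pack_source(reg_type, reg_number, swizzle="xyzw", *, indirect=False,
                index_type=0, index_component="x", offset=0) -> int:
    field = reg_number & 0xFFFF                     # N, bits 0-15
    field |= (offset & 0xFF) << 16                  # O, bits 16-23
    field |= pack_swizzle(swizzle) << 24            # S, bits 24-31
    field |= (reg_type & 0xF) << 32                 # T, bits 32-35
    if indirect:                                    # Q and I are ignored when direct
        field |= (index_type & 0xF) << 40           # I, bits 40-43
        field |= COMPONENT[index_component] << 48   # Q, bits 48-49
        field |= 1 << 63                            # D, bit 63
    return field

# Constant register (type 1) number 4, unswizzled, direct addressing
field = pack_source(1, 4)
print(f"{field:#018x}")                   # 0x00000001e4000004
print(struct.pack("<Q", field).hex(" "))  # 04 00 00 e4 01 00 00 00
```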

Sampler field format

The second source field for the tex opcode must be in [sampler] format, which is 64 bits in size:

63.............................................................0 
FFFFMMMMWWWWSSSSDDDD--------TTTT--------BBBBBBBBNNNNNNNNNNNNNNNN

N = Sampler register number (16 bits)

B = Texture level-of-detail (LOD) bias, a signed integer scaled by 8; the floating-point value used is B/8.0 (8 bits)

T = Register type, must be 5, Sampler (4 bits)

F = Filter (0=nearest,1=linear) (4 bits)

M = Mipmap (0=disable, 1=nearest, 2=linear) (4 bits)

W = Wrapping (0=clamp, 1=repeat) (4 bits)

S = Special flag bits (must be 0) (4 bits)

D = Dimension (0=2D, 1=Cube) (4 bits)
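
Read from right to left, the diagram places N in bits 0-15, B in bits 16-23, T in bits 32-35, D in bits 44-47, S in bits 48-51, W in bits 52-55, M in bits 56-59 and F in bits 60-63. A minimal Python sketch of packing a sampler field on that basis is shown below. pack_sampler and its parameter names are hypothetical, and storing the negative LOD bias as an 8-bit two's complement value is an assumption.

```python
SAMPLER_TYPE = 5  # register type, must be 5 for a sampler

def pack_sampler(number, *, filtering=0, mipmap=0, wrap=0, dimension=0,
                 lod_bias=0.0) -> int:
    """Build the 64-bit sampler field used as source2 of the tex opcode."""
    bias = round(lod_bias * 8) & 0xFF      # B = bias * 8, assumed two's complement
    field = number & 0xFFFF                # N, bits 0-15
    field |= bias << 16                    # B, bits 16-23
    field |= SAMPLER_TYPE << 32            # T, bits 32-35
    field |= (dimension & 0xF) << 44       # D, bits 44-47 (0=2D, 1=Cube)
    # S, bits 48-51, must stay 0
    field |= (wrap & 0xF) << 52            # W, bits 52-55 (0=clamp, 1=repeat)
    field |= (mipmap & 0xF) << 56          # M, bits 56-59 (0=disable, 1=nearest, 2=linear)
    field |= (filtering & 0xF) << 60       # F, bits 60-63 (0=nearest, 1=linear)
    return field

# Sampler 0: 2D texture, linear filter, linear mipmapping, repeat wrapping
print(f"{pack_sampler(0, filtering=1, mipmap=2, wrap=1):#018x}")  # 0x1210000500000000
```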

Program Registers

The number of registers available depends on the Context3D profile used. The number of registers of each type, along with their usage, is given in the following table:

Name	Value	AGAL (number per fragment program)	AGAL (number per vertex program)	AGAL2 (number per fragment program)	AGAL2 (number per vertex program)	AGAL3 (number per fragment program)	AGAL3 (number per vertex program)	Usage
Context 3D Profiles Support		Below Standard		Standard		Standard Extended
SWF version		Below 25		25		28 and above
Attribute	0	NA	8	NA	8	NA	16	Vertex shader input; read from a vertex buffer specified using Context3D.setVertexBufferAt().
Constant	1	28	128	64	250	200	250	Shader input; set using the Context3D.setProgramConstants() family of functions.
Temporary	2	8	8	26	26	26	26	Temporary register for computation; not accessible outside program.
Output	3	1	1	1	1	1	1	Shader output: in a vertex program, the output is the clip space position; in a fragment program, the output is a color.
Varying	4	8	8	10	10	10	10	Transfer interpolated data between vertex and fragment shaders. The varying registers from the vertex program are applied as input to the fragment program. Values are interpolated according to the distance from the triangle vertices.
Sampler	5	8	NA	16	NA	16	NA	Fragment shader input; read from a texture specified using Context3D.setTextureAt().
Fragment register	6	NA	NA	1	NA	1	NA	Write-only; used to overwrite the z-value (depth value) written by the vertex shader.
Tokens		200		1024		2048
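
The limits in the table above lend themselves to a lookup when validating a program before upload. The Python sketch below transcribes the table and checks a register reference against it. LIMITS, MAX_TOKENS and register_ok are hypothetical names, and the check assumes registers are numbered from 0, so a limit of 8 allows numbers 0 to 7.

```python
# Per (AGAL version, program kind): limits indexed by register type value 0..6.
# None stands for "NA" in the table.
LIMITS = {
    (1, "fragment"): [None, 28, 8, 1, 8, 8, None],
    (1, "vertex"):   [8, 128, 8, 1, 8, None, None],
    (2, "fragment"): [None, 64, 26, 1, 10, 16, 1],
    (2, "vertex"):   [8, 250, 26, 1, 10, None, None],
    (3, "fragment"): [None, 200, 26, 1, 10, 16, 1],
    (3, "vertex"):   [16, 250, 26, 1, 10, None, None],
}
MAX_TOKENS = {1: 200, 2: 1024, 3: 2048}

def register_ok(agal_version: int, program: str, reg_type: int, reg_number: int) -> bool:
    """True if the register exists for this version and program kind."""
    limit = LIMITS[(agal_version, program)][reg_type]
    # Assumption: valid register numbers run from 0 to limit - 1
    return limit is not None and 0 <= reg_number < limit

print(register_ok(1, "vertex", 1, 127))    # True: constant 127 of 128
print(register_ok(1, "fragment", 1, 28))   # False: only 28 fragment constants in AGAL
```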

The latest AGAL Mini Assembler can be found here.