Interfacing Assembly Language Routines with C

This tutorial shows how to interface assembly language routines with C. Most serious C or C++ programmers eventually face the task of calling assembly language subroutines from their programs. Because C and C++ are compiled languages, assembly language subroutines can be assembled separately and linked in at the final stage of program development.

Before going deep into the topic, I would like to give a general description of:

How the subroutine is invoked.
How parameters are passed.
How values are returned.

Subroutine invocation:

I must make clear at this point that we are now entering the world of assembly language for the 8086 family of microprocessors, so the further discussion depends heavily upon it. Prior knowledge of 8086 assembly language is assumed; I shall give explanations as far as space constraints allow. The simplest form of subroutine invocation is the "CALL" mnemonic. Note that the stack frame is different for a "FAR CALL" and for a "NEAR CALL": a near call pushes only the 16-bit return offset, while a far call pushes both the segment and the offset.

Passing Parameters:

High level languages like C pass parameters on the stack. Because each high level language uses memory in a different way, a common area is needed for subroutine interfacing, and the accepted standard is the stack. The mnemonic used to store 16-bit data on the stack is "PUSH"; "POP" retrieves it. Normally, if the parameter is a short integer, its value is pushed on the stack. If the parameter is an alphanumeric string, a pointer to the string is passed. Floating point numbers (single or double precision) can be passed either by pointer or by value.

Accessing parameters on the stack is simplified by a secondary stack pointer, BP. The BP register is used for addressing information on the stack. The important thing is that our assembly subroutines, which are "extern" for the C program, must use BP in such a way that the working of the main C program is not affected. C programs depend heavily on BP.

Returning values:

Values are returned in the following ways:

Through the stack.
Through a register.
Through memory.

None of these methods is automatic. We must decide in advance which method we are going to use, and that must be stated clearly in the documentation of the assembly subroutine.

Turbo C uses the following table to return values from a subroutine. Even functions using asm (the inline assembler directive) must return their results as follows:

Type of result	Returned in
Ordinal	AL (8-bit values), AX (16-bit values), DX:AX (32-bit values)
8087	ST(0) on the 8087's register stack
Pointer	AX (near pointer), DX:AX (far pointer)

This table describes returning values in registers. Returning values on the stack or in fixed memory locations is also possible, but it is not widely practiced.
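To make the register table concrete, here is a minimal C sketch of how a caller would read a result out of AL, AX or DX:AX. The Regs16 type and all function names are hypothetical and exist only for illustration; they are not part of Turbo C.

```c
#include <stdint.h>

/* Hypothetical model of the two registers used for ordinal results. */
typedef struct {
    uint16_t ax;
    uint16_t dx;
} Regs16;

/* 8-bit result: the caller reads AL, the low byte of AX. */
uint8_t result_from_al(Regs16 r) { return (uint8_t)(r.ax & 0xFFu); }

/* 16-bit result: the caller reads all of AX. */
uint16_t result_from_ax(Regs16 r) { return r.ax; }

/* 32-bit result: DX holds the high word, AX the low word. */
uint32_t result_from_dx_ax(Regs16 r) {
    return ((uint32_t)r.dx << 16) | r.ax;
}

/* Callee side: split a 32-bit value into the DX:AX pair. */
Regs16 return_long(uint32_t value) {
    Regs16 r;
    r.ax = (uint16_t)(value & 0xFFFFu);
    r.dx = (uint16_t)(value >> 16);
    return r;
}
```

For example, returning the value 12345678h this way leaves 1234h in DX and 5678h in AX.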

In C, the general purpose registers AX, BX, CX and DX are considered volatile: a C program does not depend on the values left in them across a function call.

Other considerations:

When a subroutine is called, overhead data is pushed on the stack. This contains the return address into the calling program. Inadvertently changing this overhead data can, and usually does, have disastrous effects, so care must be taken that the return address is not modified.

Remember to leave the stack as it was when the routine was called. (The simplified language directives handle much of this for you in the latest assembler versions.)

Although the above discussion is not thorough, it gives a good idea of the basics of interfacing assembly routines with a high level language. On the basis of it, I would now like to start with the topic of our interest: interfacing with C.

Memory models are an important concept in C programs for the 8086 family. They have a significant impact on the guidelines given below.

Model	Code Segments	Data Segments
Tiny	One, also containing data	Shared with code
Small	One for all procedures	One for all data
Medium	Many	One
Compact	One	Many
Large	Many	Many (each data item up to 64K in size)
Huge	Many	Many (a data item may be larger than 64K)

Most C programs are written in either small or large memory model.
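As a rough illustration of what the table means for an assembly routine, the C sketch below records whether code and data pointers are near (2 bytes, offset only) or far (4 bytes, segment and offset) in each model. The MemoryModel type and find_model function are hypothetical names used only for this example.

```c
#include <stddef.h>
#include <string.h>

/* Pointer sizes implied by each memory model:
   2 bytes = near (offset only), 4 bytes = far (segment:offset). */
typedef struct {
    const char *name;
    int code_ptr_bytes;
    int data_ptr_bytes;
} MemoryModel;

static const MemoryModel MODELS[] = {
    {"tiny",    2, 2},
    {"small",   2, 2},
    {"medium",  4, 2},
    {"compact", 2, 4},
    {"large",   4, 4},
    {"huge",    4, 4},
};

/* Look a model up by its lowercase name; returns NULL if unknown. */
const MemoryModel *find_model(const char *name) {
    size_t i;
    for (i = 0; i < sizeof MODELS / sizeof MODELS[0]; ++i) {
        if (strcmp(MODELS[i].name, name) == 0) {
            return &MODELS[i];
        }
    }
    return NULL;
}
```

A routine written for the small model can therefore be a NEAR procedure that receives 16-bit data pointers, while the same routine in the large model must be a FAR procedure and receives 32-bit data pointers.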

General interfacing guidelines:

There are two ways to interface with C or C++. The easier method is to specify the language using assembler directives (covered in the next section); this feature is available with MASM and TASM. The older method is to do manually what the directives do automatically. Real knowledge lies in the older method. I have summarized below the points which are absolutely necessary for interfacing.

You must name the code segment _TEXT if you are using the Microsoft C or Turbo C compiler.
You must name the data segment _DATA if you are using the compilers mentioned above. These names vary from compiler to compiler.
You must understand how parameters are passed in C. In the function calling syntax of:

function_name ( arg1, arg2, arg3, ..., argn ) ;

the values of the arguments are pushed on the stack in reverse order. Thus argument argn is pushed first, then arg(n-1), and so on down to arg1. Parameters can be passed either by pointer or by value. Note that in the compact, large and huge memory models a data pointer is 32 bits wide, so it occupies two words on the stack.

The names of assembly language routines to be called from a C program must begin with an underscore character (_), e.g. _myfunct.
Remember to save any special purpose registers (such as CS, DS, SS, SI, DI) your assembly program may disturb. If you fail to save them, you may find undesired effects when control is returned to the C program.
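The push order described in the list above can be modeled in portable C. The sketch below is only a simulation under stated assumptions: a small-model near call, 16-bit arguments, and the usual push bp / mov bp, sp prologue. Stack16 and its helper functions are hypothetical names, not real Turbo C or TASM facilities.

```c
#include <stdint.h>

#define STACK_WORDS 32

/* Toy model of a 16-bit stack segment; sp and bp are byte offsets. */
typedef struct {
    uint16_t mem[STACK_WORDS];
    uint16_t sp;
    uint16_t bp;
} Stack16;

void stack_init(Stack16 *s) {
    int i;
    for (i = 0; i < STACK_WORDS; ++i) s->mem[i] = 0;
    s->sp = STACK_WORDS * 2;   /* empty stack: SP just past the top */
    s->bp = 0;
}

/* PUSH: decrement SP by 2, then store the word. */
void push16(Stack16 *s, uint16_t v) {
    s->sp -= 2;
    s->mem[s->sp / 2] = v;
}

/* POP: load the word, then increment SP by 2. */
uint16_t pop16(Stack16 *s) {
    uint16_t v = s->mem[s->sp / 2];
    s->sp += 2;
    return v;
}

/* Read the word at [BP + disp]. */
uint16_t read_bp(const Stack16 *s, int disp) {
    return s->mem[(s->bp + disp) / 2];
}

/* What the C caller and the callee prologue do for f(arg1, ..., argn). */
void simulate_call(Stack16 *s, const uint16_t *args, int n, uint16_t return_ip) {
    int i;
    for (i = n - 1; i >= 0; --i) push16(s, args[i]);  /* argn first, arg1 last */
    push16(s, return_ip);   /* near CALL pushes the return offset */
    push16(s, s->bp);       /* push bp */
    s->bp = s->sp;          /* mov bp, sp */
}
```

Because arg1 is pushed last, it ends up closest to the return address, so the leftmost argument is always found at the same place regardless of how many arguments follow it.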

Simpler interfacing:

Recent assemblers make interfacing with popular languages very easy by means of advanced assembler directives. The older style of interfacing is shown in the next block.

PUBLIC _MYFUNCT
_TEXT SEGMENT WORD PUBLIC 'CODE'
ASSUME CS:_TEXT
_MYFUNCT PROC NEAR ; For small memory model.

If you are using the newer advanced directives, the same information is coded as follows:

PUBLIC MYFUNCT
.MODEL small, C
.CODE
MYFUNCT PROC

This source code is much cleaner. .MODEL and .CODE simplify the programmer's task, and the assembler adds the leading underscore for you.

Interfacing subroutines without parameter passing:

These subroutines are very easy to program. The only care to be taken is that any special purpose registers (CS, ES, DS, SS, DI, SI) which are modified during execution of the subroutine must be restored before the procedure returns to the calling program. An example is given below.

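Assembly code with directives:

PUBLIC CUR_ON
.MODEL small, C
.CODE
CUR_ON PROC
    mov ah, 03h     ; read cursor position and shape
    mov bx, 00h     ; BH = display page 0
    int 10h
    and ch, 1fh     ; keep only the start scan line bits
    mov ah, 01h     ; set cursor shape
    int 10h
    ret
CUR_ON ENDP
END

Assembly code without directives:

PUBLIC _CUR_ON
_TEXT SEGMENT WORD PUBLIC 'CODE'
ASSUME CS: _TEXT
_CUR_ON PROC NEAR
    mov ah, 03h     ; read cursor position and shape
    mov bx, 00h     ; BH = display page 0
    int 10h
    and ch, 1fh     ; keep only the start scan line bits
    mov ah, 01h     ; set cursor shape
    int 10h
    ret
_CUR_ON ENDP
_TEXT ENDS
END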

The C program which uses the procedure CUR_ON is shown below. Note that the prototype of the function is declared "extern".

extern void CUR_ON ( void ) ;

void main ( void )
{
    CUR_ON( ) ;
}
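The key instruction in CUR_ON is and ch, 1fh. The small C sketch below shows its effect on the value returned in CH, assuming the common BIOS convention that bit 5 of CH marks the cursor as hidden. The function names are hypothetical and serve only to illustrate the bit manipulation.

```c
#include <stdint.h>

/* Assumed BIOS convention: bit 5 of CH set means "cursor hidden". */
#define CURSOR_HIDE_BIT 0x20u

/* Same effect as "and ch, 1fh": keep bits 0-4 (the start scan line)
   and clear bits 5-7, including the hide bit. */
uint8_t cursor_on_start_line(uint8_t ch) {
    return (uint8_t)(ch & 0x1Fu);
}

/* Non-zero if the given CH value describes a hidden cursor. */
int cursor_is_hidden(uint8_t ch) {
    return (ch & CURSOR_HIDE_BIT) != 0;
}
```

A cursor that is already visible is left unchanged, because its upper bits are already clear.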

Interfacing subroutines with parameter passing:

This is the most interesting part of the tutorial. The TASM and MASM assemblers provide advanced directives. If you use them, you do not have to work out for yourself how parameters are manipulated in a C compatible assembly routine. One of these directives is "USES". An example is given below; it also shows how to return a value from a subroutine.

Assembly code with directives:

PUBLIC TRIPLE
.MODEL small, C
.CODE
TRIPLE PROC USES ES DI, NOS: WORD
    mov ax, NOS     ; AX = the argument
    mov bx, 03h
    mul bx          ; DX:AX = AX * 3, the 16-bit result is left in AX
    ret
TRIPLE ENDP
END

Assembly code without directives:

PUBLIC _TRIPLE
_TEXT SEGMENT WORD PUBLIC 'CODE'
ASSUME CS: _TEXT
_TRIPLE PROC NEAR
    push bp
    mov bp, sp
    push es
    push di
    mov ax, [bp+4]  ; AX = the argument
    mov bx, 03h
    mul bx          ; DX:AX = AX * 3, the 16-bit result is left in AX
    pop di
    pop es
    pop bp
    ret
_TRIPLE ENDP
_TEXT ENDS
END

The extra lines we see in the second listing are in fact added by the assembler itself if we use the directives. BP must be saved because it is used by the calling program. The procedure given above requires one 16-bit parameter on the stack. Observe how it is accessed through BP. The stack frame of the program just before execution of the instruction mov ax, [bp+4] is shown below.

Stack Frame	Memory	Registers
MSB of NOS	2000Ah	[BP + 4]
LSB of NOS	20009h	[BP + 4]
MSB of return address	20008h	[BP + 2]
LSB of return address	20007h	[BP + 2]
MSB of saved BP	20006h	BP
LSB of saved BP	20005h	BP
MSB of saved ES	20004h
LSB of saved ES	20003h
MSB of saved DI	20002h	SP
LSB of saved DI	20001h	SP

LSB = Least Significant Byte.

MSB = Most Significant Byte.

Thus you can observe that the first parameter begins at address BP+4. Don't forget that this is the first parameter from the left of the function call in C. All the remaining parameters (if any) are pushed on the stack before the call, and they can be accessed using based addressing mode, e.g. [BP+6]. Another important point is that in the 8086 family of microprocessors the stack grows downwards, hence the memory locations are given in decreasing order. If you have keenly observed the location of the BP pointer, you must have noticed that it points at its own saved contents. But what about the pushed ES and DI? Those are pushed on the stack so that they can be restored later.
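The offsets above follow a simple rule, sketched here in C. It assumes the standard prologue (push bp followed by mov bp, sp) and parameters that are all one word wide; param_offset is a hypothetical helper written only for this illustration.

```c
/* Returns n such that [BP + n] addresses the parameter at position
   `index`, counted from the left of the C call starting at 0.
   Assumes every parameter is one 16-bit word. */
int param_offset(int index, int far_call) {
    int saved_bp_bytes = 2;                  /* pushed by "push bp"        */
    int return_bytes = far_call ? 4 : 2;     /* CS:IP for far, IP for near */
    return saved_bp_bytes + return_bytes + 2 * index;
}
```

For a near call this gives [BP+4] for the first parameter and [BP+6] for the second, matching the stack frame shown above. For a far call every parameter sits two bytes higher, so the first one is at [BP+6].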

For extra information, I would like to mention local variables here. Those are stored after ES and DI. Yes, on the stack! Every reference to one produces an instruction containing an operand such as [BP - 6] or [BP - 8]. Thus, in general, in a C program [BP + n] addresses parameters and [BP - n] addresses local variables.

A C program which uses the above assembly procedure is given below.

#include <stdio.h>

extern int TRIPLE ( int ) ;

void main ( void )
{
    int p ;
    p = TRIPLE ( 20 ) ;
    printf ( "%d", p ) ;
}

Because the procedure returns 16-bit data (an integer), the compiler produces code which reads AX after the function call and assigns it to p (refer to the table of return registers given above). As p is a local variable, it is stored on the stack. Hence the typical code produced is

mov [BP - n], ax

Note that BP has been restored by the procedure before returning. All library functions and user-defined functions follow the same technique. Just for fun, try the following code:

#include <stdio.h>

int twice ( int ) ;

void main ( void )
{
    int p ;
    p = twice ( 20 ) ;
    printf ( "%d", p ) ;
}

int twice ( int q )
{
    q *= 2 ;
    asm
    {
        mov ax, q
        add ax, 10
        pop bp
        ret
    }
}

The answer you get is 50, which shows you have fooled the compiler: the inline ret returns with AX = 50 before the compiler's own return code runs. Some of you must be wondering what "asm" is. It is a facility provided by the C compiler for writing inline assembly code. If writing assembly code is so simple in C, then why did we go all the way through this tutorial? The reasons are:

Existing assembly code subroutines can be called from C programs. A large number of graphics routines are available in assembly and can be interfaced directly with C.
Large inline assembly blocks make a C program look clumsy, and not everyone understands assembly.
The third is, of course, the knowledge.

Compiling C programs with "extern" function calls:

I shall explain it using the TASM assembler.

First, write an assembly routine which is compatible with C, e.g. Triple.asm.
Then assemble the file using the TASM assembler,

e.g. TASM Triple.asm

This creates an object file, Triple.OBJ.

Then write a C program which declares one or more functions with the "extern" keyword.
Compile and link this program using the command line:

TCC <progname.c> <Triple.obj>

This results in a working .EXE file.

To study the .EXE file in detail, I suggest using either TD (Turbo Debugger) or the TCC -S option. TCC -S <progname.c> produces the assembly output of the C program!

This was all about interfacing assembly routines in C. I hope my efforts will help someone to write and run mixed language programs.

Reference: "Using Assembly Language" by Allen L. Wyatt, Third Edition, PHI.