Interfacing Assembly Language Routines with C 



This tutorial shows how to interface assembly language routines with C. Most serious C or C++ programmers eventually face the task of calling assembly language subroutines from their programs. Because C and C++ are compiled languages, assembly language subroutines can be linked in at the final stage of program development.

Before going deep into the topic, I would like to give a general description of:

  • How the subroutine is invoked.

  • How parameters are passed.

  • How values are returned.

Subroutine invocation:

 

I must make clear at this point that we are now entering the world of 8086-family assembly language, and the rest of the discussion depends heavily on it. Prior knowledge of 8086 assembly language is assumed; I shall explain as much as space allows. The simplest form of subroutine invocation is the "CALL" mnemonic. Note that the stack frame differs between a "FAR CALL" and a "NEAR CALL": a near call pushes only the 16-bit return offset, while a far call pushes both the segment and the offset.
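The difference between the two call types can be sketched in C as a simple offset calculation. This is only an illustration, assuming the routine begins with the usual "push bp / mov bp, sp" sequence; the enum and function names are hypothetical and do not come from any compiler or assembler.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum call_kind { NEAR_CALL, FAR_CALL };

/* Bytes of return address that CALL pushes on the stack:
   a near call pushes IP only, a far call pushes CS and IP. */
size_t return_address_size(enum call_kind kind)
{
    return (kind == FAR_CALL) ? 4 : 2;
}

/* Offset from BP of the first parameter, assuming the routine starts
   with "push bp / mov bp, sp": 2 bytes of saved BP plus the return address. */
size_t first_param_offset(enum call_kind kind)
{
    return 2 + return_address_size(kind);
}
```

Under these assumptions the first parameter of a near routine sits at [BP+4], and that of a far routine at [BP+6].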

 

Passing Parameters:

 

High level languages like C pass parameters on the stack. Because each high level language uses memory in a different way, a common area is needed for subroutine interfacing, and the accepted standard is the stack. The mnemonic used to store 16-bit data on the stack is "PUSH", and "POP" retrieves it. Normally, if the parameter is a short integer, its value is pushed on the stack. If the parameter is an alphanumeric string, a pointer to the string is passed. Floating point numbers (single or double precision) can be passed either by pointer or by direct value.
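Seen from the C side, the difference between passing a value and passing a pointer looks roughly like this. It is a minimal sketch; both helper functions are hypothetical and exist only to show the two cases.

```c
#include <stddef.h>

/* Hypothetical helpers, for illustration only. */

/* An int is passed by value: the callee gets its own copy,
   so changing n here does not affect the caller's variable. */
int add_ten(int n)
{
    n += 10;
    return n;
}

/* A string is passed as a pointer to its first character:
   the callee reads (and here modifies) the caller's own buffer.
   Returns the length of the string. */
size_t upcase_first(char *s)
{
    size_t len = 0;
    if (s[0] >= 'a' && s[0] <= 'z')
        s[0] = (char)(s[0] - 'a' + 'A');
    while (s[len] != '\0')
        len++;
    return len;
}
```

Calling add_ten(x) leaves x unchanged in the caller, whereas upcase_first(buf) changes the first character of the caller's buffer, because only the address of buf was passed.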

 

Accessing parameters on the stack is simplified by a secondary stack pointer, BP. The important thing is that assembly subroutines which are "extern" to the C program must use BP in such a way that the working of the main C program is not affected, because C programs depend heavily on BP.

 

Returning values:

 

Values can be returned in the following ways:

  • Through the stack.

  • Through a register.

  • Through memory.

None of these methods is automatic: we must decide beforehand which method to use, and state it clearly in the documentation of the assembly subroutine.

Turbo C uses the following table to return values from a subroutine. Even functions using asm (the inline assembler directive) must return their results as follows:

 

    Type of result    Returned in
    Ordinal           AL (8-bit values)
                      AX (16-bit values)
                      DX:AX (32-bit values)
    8087              ST(0) on the 8087's register stack
    Pointer           DX:AX (far pointers; a near pointer fits in AX alone)

 

 

This table describes returning values in registers. Returning values on the stack or in fixed memory locations is also possible, but it is not widely practiced.
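How a 32-bit result is divided between DX and AX can be sketched in C as follows. The helper names are hypothetical; the only assumption is the convention stated in the table, with DX carrying the high word and AX the low word.

```c
#include <stdint.h>

/* Hypothetical helpers, for illustration only. */

/* The half of a 32-bit result that DX would carry. */
uint16_t high_word(uint32_t value)
{
    return (uint16_t)(value >> 16);
}

/* The half of a 32-bit result that AX would carry. */
uint16_t low_word(uint32_t value)
{
    return (uint16_t)(value & 0xFFFFu);
}

/* Rebuild the 32-bit value the caller sees from DX:AX. */
uint32_t from_dx_ax(uint16_t dx, uint16_t ax)
{
    return ((uint32_t)dx << 16) | ax;
}
```

For example, a routine returning the long value 12345678h would leave 1234h in DX and 5678h in AX.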

In C, the general purpose registers AX, BX, CX and DX are considered volatile: the C program does not depend on the values stored in them across a call.

Other considerations:

    • When a subroutine is called, overhead data is pushed on the stack. This contains the return address for the calling program. Inadvertently changing this overhead data can, and usually does, have disastrous effects, so care must be exercised that the return address is not modified.

    • Remember to leave the stack as it was when the routine was called. (Simplified language directives handle much of this for you in the latest assembler versions.)

Although the above discussion is not thorough, it gives a good idea of the basics of interfacing assembly routines with a high level language. On the basis of this discussion, I would now like to start with the topic of our interest: interfacing with C.

 

Memory models are important concepts in C programs. They have significant impact on the guidelines given below.

 

    Model      Code Segments               Data Segments
    Tiny       One, also containing data   Shared with code
    Small      One for all procedures      One for all data
    Medium     Many                        One
    Compact    One                         Many
    Large      Many                        Many (64K in size)
    Huge       Many                        Many (> 64K in size)

 

Most C programs are written in either the small or the large memory model.
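The practical consequence of the table for an assembly routine is the size of the pointers it has to deal with. The following C sketch encodes that rule, assuming a near pointer is a 16-bit offset and a far pointer is a 32-bit segment:offset pair; the enum and function names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum memory_model {
    MODEL_TINY, MODEL_SMALL, MODEL_MEDIUM,
    MODEL_COMPACT, MODEL_LARGE, MODEL_HUGE
};

/* Models with many code segments need far (segment:offset) code pointers,
   so their routines are reached with far calls. */
size_t code_pointer_size(enum memory_model m)
{
    return (m == MODEL_MEDIUM || m == MODEL_LARGE || m == MODEL_HUGE) ? 4 : 2;
}

/* Models with many data segments need far data pointers. */
size_t data_pointer_size(enum memory_model m)
{
    return (m == MODEL_COMPACT || m == MODEL_LARGE || m == MODEL_HUGE) ? 4 : 2;
}
```

In the small model both sizes are 2 bytes, which is why the examples in this tutorial use PROC NEAR and 16-bit parameters.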

General interfacing guidelines:

 

There are two ways to interface with C or C++. The easier method is to specify the language using assembler directives (described in the next section); this feature is available with MASM and TASM. The other, older method is to do manually what can be done automatically. Real knowledge lies in the older method. I have summarized in brief the points which are absolutely necessary for interfacing.

  • You must name the code segment _TEXT if you are using the Microsoft C or Turbo C compiler.

  • You must name the data segment _DATA if you are using the above-mentioned compilers. These names vary from compiler to compiler.

  • Third, you must understand how parameters are passed in C. In the function calling syntax

function_name ( arg1, arg2, arg3, ..., argn ) ;

the values of the arguments are pushed on the stack in reverse order: argn is pushed first, then arg(n-1), and so on down to arg1. Parameters can be passed either as a pointer or as a direct value. Note that in the compact, large and huge memory models a data pointer is 32 bits in size.

  • The name of an assembly language routine to be called from a C program must begin with an underscore character (_), e.g. _myfunct

  • Remember to save any special purpose registers (such as CS, DS, SS, SI, DI) your assembly program may disturb. If you fail to save them, you may find undesired effects when control returns to the C program.
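The reverse pushing order described in the third point can be modelled with a small C sketch. It assumes a downward-growing stack of 16-bit words; the structure and function names are hypothetical, and the model does no overflow checking, so the caller must not push more than STACK_WORDS words.

```c
#include <stdint.h>
#include <stddef.h>

#define STACK_WORDS 16

/* Hypothetical model of a downward-growing stack of 16-bit words. */
struct word_stack {
    uint16_t mem[STACK_WORDS];
    size_t sp;                  /* index of the current top of stack */
};

/* An empty stack starts just past the highest word. */
void stack_init(struct word_stack *s)
{
    s->sp = STACK_WORDS;
}

/* PUSH: move the stack pointer down, then store the word. */
void push_word(struct word_stack *s, uint16_t w)
{
    s->mem[--s->sp] = w;
}

/* Push argc arguments the way a C call does: last argument first. */
void push_args_c_order(struct word_stack *s, const uint16_t *args, size_t argc)
{
    size_t i;
    for (i = argc; i > 0; i--)
        push_word(s, args[i - 1]);
}
```

After pushing three arguments this way, arg1 is at the top of the stack (the lowest address), with arg2 and arg3 above it. That is why the called routine always finds the first argument at the same offset, however many arguments follow.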

Simpler interfacing:

 

The latest assemblers make interfacing with popular languages very easy by means of advanced assembler directives. The older style of interfacing is shown in the next block.

 

    PUBLIC _MYFUNCT
    _TEXT SEGMENT WORD PUBLIC 'CODE'
    ASSUME CS:_TEXT
    _MYFUNCT PROC NEAR ; For small memory model.

 

If you are using newer advanced directives then this same information is coded as follows

 

    PUBLIC MYFUNCT
    .MODEL small, C
    .CODE
    MYFUNCT PROC

 

This source code is much cleaner. .MODEL and .CODE simplify the programmer's task.

 

Interfacing subroutines without parameter passing:

 

These subroutines are very easy to program. The only care needed is that any special purpose registers (CS, ES, DS, SS, DI, SI) modified during the subroutine's execution are restored before the procedure returns to the calling program. An example is given below.

 

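Assembly code with directives:

    PUBLIC CUR_ON
    .MODEL small, C
    .CODE

    CUR_ON PROC
        mov ah, 03h     ; BIOS video service: read cursor position and shape
        mov bx, 00h     ; BH = display page 0
        int 10h
        and ch, 1fh     ; clear bits 5-7 of the cursor start line
        mov ah, 01h     ; BIOS video service: set cursor shape
        int 10h
        ret
    CUR_ON ENDP
    END

Assembly code without directives:

    PUBLIC _CUR_ON
    _TEXT SEGMENT WORD PUBLIC 'CODE'
    ASSUME CS: _TEXT

    _CUR_ON PROC NEAR
        mov ah, 03h     ; BIOS video service: read cursor position and shape
        mov bx, 00h     ; BH = display page 0
        int 10h
        and ch, 1fh     ; clear bits 5-7 of the cursor start line
        mov ah, 01h     ; BIOS video service: set cursor shape
        int 10h
        ret
    _CUR_ON ENDP
    _TEXT ENDS
    END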

 

The C program which uses this procedure CUR_ON is shown below. Note that the prototype of the function is declared "extern".

 

    extern void CUR_ON ( void ) ;

    void main ( void )
    {
        CUR_ON( ) ;
    }
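For readers wondering what the "and ch, 1fh" step in CUR_ON achieves, the masking can be sketched in C. This assumes the usual BIOS convention that CH holds the cursor start scan line in its low five bits and that a set bit 5 hides the cursor; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only: the same masking that
   "and ch, 1fh" performs on the cursor start line returned in CH. */
uint8_t cursor_start_visible(uint8_t ch)
{
    return (uint8_t)(ch & 0x1Fu);   /* keep bits 0-4, clear bits 5-7 */
}
```

A start line of 26h (cursor hidden, start scan line 6) becomes 06h, while a value that is already visible, such as 06h, passes through unchanged.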

Interfacing subroutines with parameter passing:

 

This is the most interesting part of the tutorial. The TASM and MASM assemblers provide advanced directives; if you use them, you do not have to work out yourself how parameters are accessed in a C-compatible assembly routine. One of these directives is "USES". An example is given below, which also shows how to return a value from a subroutine.

 

 

Assembly code with directives:

    PUBLIC TRIPLE
    .MODEL small, C
    .CODE

    TRIPLE PROC USES ES DI, NOS: WORD
        mov ax, NOS     ; AX = parameter
        mov bx, 03h
        mul bx          ; DX:AX = AX * 3; the low word in AX is the return value
        ret
    TRIPLE ENDP
    END

Assembly code without directives:

    PUBLIC _TRIPLE
    _TEXT SEGMENT WORD PUBLIC 'CODE'
    ASSUME CS: _TEXT

    _TRIPLE PROC NEAR
        push bp
        mov bp, sp
        push es
        push di
        mov ax, [bp+4]  ; AX = first parameter
        mov bx, 03h
        mul bx          ; DX:AX = AX * 3; the low word in AX is the return value
        pop di
        pop es
        pop bp
        ret
    _TRIPLE ENDP
    _TEXT ENDS
    END
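As a cross-check on the arithmetic a caller expects from TRIPLE, here is a small C model of the multiplication. It is only a sketch: the function name is hypothetical, and it assumes the caller reads a 16-bit result from AX and ignores DX.

```c
#include <stdint.h>

/* Hypothetical C model of the arithmetic in TRIPLE, for illustration only. */
uint16_t triple_model(uint16_t nos)
{
    /* mul bx with BX = 3 leaves the 32-bit product in DX:AX. */
    uint32_t product = (uint32_t)nos * 3u;

    /* The low word is what remains in AX and is returned to C. */
    uint16_t ax = (uint16_t)(product & 0xFFFFu);

    /* The high word (DX) is ignored by a caller expecting a 16-bit int. */
    return ax;
}
```

For small inputs such as 20 the result is simply 60. For inputs above 21845 the product no longer fits in 16 bits and only its low word comes back, exactly as it would in AX.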

 

The extra lines in the version without directives are in fact added by the assembler itself when the directives are used. BP must be saved because it is used by the calling program. The procedure above requires one 16-bit parameter on the stack; observe how it is accessed through the BP pointer. The stack frame of the program just before execution of the instruction mov ax, [bp+4] is shown below.

 

    Stack Frame

    Memory                    Address   Registers
    MSB of NOS                2000Ah
    LSB of NOS                20009h    [BP + 4]
    MSB of return address     20008h
    LSB of return address     20007h    [BP + 2]
    MSB of saved BP           20006h
    LSB of saved BP           20005h    BP
    MSB of saved ES           20004h
    LSB of saved ES           20003h
    MSB of saved DI           20002h
    LSB of saved DI           20001h    SP

 

LSB = Least Significant Byte.

MSB = Most Significant Byte.

 

Thus you can observe that the first parameter begins at address BP+4. Don't forget that this is the leftmost parameter of the function call in C. Any remaining parameters are pushed on the stack before it, prior to the call, and can be reached using based addressing, e.g. [BP+6] for the second parameter. Another important point is that on 8086-family microprocessors the stack grows downwards, hence the memory locations are listed in decreasing order. If you have observed the location of the BP pointer keenly, you will have noticed that it points at its own saved contents. But what about the pushed ES and DI? Those are pushed on the stack so that they can be restored later.
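The whole life of this stack frame can be followed in a small C simulation. It is a sketch under stated assumptions: the struct, the helper functions and call_triple are hypothetical, only the registers that matter here are modelled, and the stack is a plain byte array rather than real 8086 memory.

```c
#include <stdint.h>

#define STACK_BYTES 64

/* Hypothetical model of the 8086 state that matters here. */
struct cpu {
    uint8_t  stack[STACK_BYTES];    /* stack segment, byte addressed */
    uint16_t sp, bp, ax, es, di;
};

/* Store a 16-bit word with its least significant byte at the lower address. */
static void store_word(struct cpu *c, uint16_t addr, uint16_t w)
{
    c->stack[addr]     = (uint8_t)(w & 0xFFu);
    c->stack[addr + 1] = (uint8_t)(w >> 8);
}

/* Load a 16-bit word stored by store_word. */
static uint16_t load_word(const struct cpu *c, uint16_t addr)
{
    return (uint16_t)(c->stack[addr] | (c->stack[addr + 1] << 8));
}

/* PUSH: decrement SP by 2, then store the word at SP. */
static void push(struct cpu *c, uint16_t w)
{
    c->sp = (uint16_t)(c->sp - 2);
    store_word(c, c->sp, w);
}

/* POP: load the word at SP, then increment SP by 2. */
static uint16_t pop(struct cpu *c)
{
    uint16_t w = load_word(c, c->sp);
    c->sp = (uint16_t)(c->sp + 2);
    return w;
}

/* Caller side: push the argument, then a near CALL pushes the return address.
   Callee side: the same steps as the hand-written _TRIPLE routine. */
uint16_t call_triple(struct cpu *c, uint16_t nos, uint16_t return_ip)
{
    push(c, nos);                   /* caller: push argument          */
    push(c, return_ip);             /* near CALL: push return address */

    push(c, c->bp);                 /* push bp                        */
    c->bp = c->sp;                  /* mov bp, sp                     */
    push(c, c->es);                 /* push es                        */
    push(c, c->di);                 /* push di                        */

    c->ax = load_word(c, (uint16_t)(c->bp + 4));    /* mov ax, [bp+4] */
    c->ax = (uint16_t)(c->ax * 3u);                 /* low word of AX * 3 */

    c->di = pop(c);                 /* pop di                         */
    c->es = pop(c);                 /* pop es                         */
    c->bp = pop(c);                 /* pop bp                         */
    (void)pop(c);                   /* ret: pop return address        */
    c->sp = (uint16_t)(c->sp + 2);  /* caller: discard the argument   */
    return c->ax;
}
```

Starting with SP equal to STACK_BYTES and arbitrary values in BP, ES and DI, a call such as call_triple(&c, 20, 0x0100) returns 60 and leaves SP, BP, ES and DI exactly as they were, which is what "leave the stack as it was" means in practice.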

 

As extra information, I would like to mention local variables. They are stored after ES and DI, also on the stack, and every reference to one produces an instruction containing an operand such as [BP - 6] or [BP - 8]. Thus, in general, in a C program [BP + n] addresses parameters and [BP - n] addresses local variables.

 

A C program which uses the above assembly procedure is given below.

 

    #include <stdio.h>

    extern int TRIPLE ( int ) ;

    void main ( void )
    {
        int p ;

        p = TRIPLE ( 20 ) ;
        printf ( "%d", p ) ;
    }

 

Because the procedure returns 16-bit data (an integer), the compiler produces code which reads AX after the function call and assigns it to p (refer to the table of return registers given above). As p is a local variable, it is stored on the stack. Hence typical code produced is

    mov [BP - n], ax

 

Note that BP has been restored by the procedure before returning. All library functions and user-defined functions follow the same technique. Just for fun, try the following code:

 

    #include <stdio.h>

    int twice ( int ) ;

    void main ( void )
    {
        int p ;

        p = twice ( 20 ) ;
        printf ( "%d", p ) ;
    }

    int twice ( int q )
    {
        q*=2;
        asm {
            mov ax, q
            add ax, 10
            pop bp
            ret
        }
    }

 

The answer you get is 50, which shows you have fooled the compiler: the function has no C return statement, yet the value left in AX by the inline code is what main receives. Some of you must be wondering what "asm" is. It is a facility provided by the C compiler for writing inline assembly code. If writing assembly code in C is so simple, then why did we go the long way round in this tutorial? The reasons are:

  • Existing assembly code subroutines can be called from C programs. A large number of graphics routines are available in assembly which can be interfaced directly with C.

  • Large inline assembly blocks make a C program look clumsy, and not everyone understands assembly.

  • Third is, of course, the knowledge.

Compiling C programs with "extern" function calls:

 

I shall explain it using the TASM assembler.

  • First, write an assembly routine which is compatible with C, e.g. Triple.asm

  • Then assemble the file using the TASM assembler, e.g.

TASM Triple.asm

This creates an object file, Triple.OBJ

  • Then write a C program which declares one or more functions with the "extern" keyword.

  • Compile and link this program using the command line:

TCC <progname.c> <Triple.obj>

  • This results in a working .EXE file.

In order to study the .EXE file in detail, I suggest using either TD (Turbo Debugger) or the TCC -S option. TCC -S <progname.c> produces the assembly output of the C program.

 

This was all about interfacing assembly routines in C. I hope my efforts will help someone to write and run mixed language programs.

 

Reference: Allen L. Wyatt, "Using Assembly Language", Third Edition, PHI.



 Designed and Managed by
DCube Software Technologies, Nagpur (India) 
Last Revised: 5th July 2002 8:05:20