External Interfacing

A historical problem in Smalltalk environments has been interfacing with the outside world. In Dolphin we have attempted to address this problem by including external interfacing capabilities which are powerful, complete, and relatively simple to use:

External Libraries

In Smalltalk everything is (or should be) an object, and methods should be invoked by sending messages to objects. Consequently we represent external libraries with objects in Dolphin, and wrap each function we wish to call in a particular library in a method (an ExternalMethod) which describes the types of the parameters the external function call is expecting. So each separate external library has a class (a subclass of ExternalLibrary) whose methods represent the external functions. The External Library pattern details the steps necessary to create one of these beasties.

External Methods

The concept of "types" in Smalltalk is somewhat (and controversially) different from that of C/C++ and many other languages. Smalltalk objects carry their type along with them, and respond to messages in an appropriate manner. We typically group a set of messages (and the behaviour expected when those messages are acted upon) into a "protocol". Protocols are independent of any particular class, and it is protocols, not classes, that are arguably the nearest thing to types in Smalltalk. Furthermore, Smalltalk variables (including parameters) are typeless: when the only thing one can do to an object is to send it messages, one doesn't care about its internal representation, or its class, only whether it correctly responds to a given protocol. When calling external functions, however, we must map Smalltalk objects to a notion of types which does include representation, because in the external world "type" generally specifies representation. Dolphin achieves this mapping by including type information in a special form of primitive method format, and by performing appropriate type conversions.

The external call interface primitives provide automatic conversion of objects for parameters and return values of the following types:

Signed (two's complement) and unsigned integers of 8, 16, 32, and 64 bits
Single (32-bit) and double (64-bit) precision floating point numbers
Characters
Strings
Booleans
Handles
Pointers
Structures (pass and return by value, or reference)

The precise set of parameter types, and the automatic conversion and validation applied are documented in type conversions and validation (though it is recommended that you take a peek at InvalidExternalCall class>>validation to check the latest set and validation rules). For example, Dolphin Strings are null-terminated, so they can be safely passed to C/C++ functions as C strings (the null-terminator is not included in the size a String answers when one sends it the #size message) using the lpstr or lpvoid types.

Where an object cannot be automatically converted by the VM, the normal technique is to implement an #asParameter method for that object to answer a more fundamental type which can be passed directly to an appropriately declared external method. This is also more flexible, because polymorphic conversion can be performed as required.

Here is an example from UserLibrary, which is the class representing User32.DLL (one of the base Win32™ DLLs).

childWindowFromPointEx: hwnd pt: aPOINTL uFlags: flags
	"Answers the handle of the window that contains the specified point. 
		HWND ChildWindowFromPointEx(
			HWND hwndParent, 	// handle to parent window
			POINT pt, 	// structure with point coordinates
			UINT uFlags	// skipping flags
		);"

	<stdcall: handle ChildWindowFromPointEx handle POINTL dword>
	^self invalidCall

We might invoke this by evaluating an expression such as:

	UserLibrary default childWindowFromPointEx: View desktop asParameter pt: (300@400) asParameter uFlags: 0.

It is good practice to write helper methods which wrap external library calls into more flexible, object-oriented, and easily used methods. Wrapper methods should perform any useful #asParameter conversions, and should convert any return values to appropriate objects (e.g. window handles should normally be converted to an appropriate View subinstance). Where a wrapper method exists, this should be used in preference to the underlying external method, and should generally be the only sender of the external library selector.

Our example, UserLibrary>>childWindowFromPointEx:pt:uFlags:, is wrapped by View>>subViewFromPoint:flags: thusly:

subViewFromPoint: aPoint flags: cwpFlags
	"Answers the View in the receiver beneath aPoint (in the receiver's coordinates)."

	| handleFound viewFound |
	handleFound := UserLibrary default 
		childWindowFromPointEx: self asParameter pt: aPoint asParameter uFlags: cwpFlags.
	^(handleFound notNil
		and: [handleFound ~= self handle
			and: [(viewFound := self class withHandle: handleFound) notNil
				and: [viewFound isManaged]]])
		ifTrue: [viewFound]

In this example the coordinate argument, aPoint, is converted to a POINTL by sending it #asParameter. This allows it to be either a POINTL, or a Point, or some other object which implements #asParameter to return a POINTL. The routine also "improves" on the underlying Win32™ function by always answering a subview, regardless of the depth of nesting, and not answering the same window. When modifying the functionality in this way, you may want to consider providing a more basic wrapping function (perhaps prefixed with #basic...), as the "raw" form is sometimes needed by subclasses and other closely related classes.

If you pass a Smalltalk object to an external library function by reference (i.e. as an lpvoid or lpstr parameter) and the function is likely to capture that parameter (i.e. save it down and write back into it after the call has returned), then you must ensure that the object is allocated from fixed memory (ExternalStructures are automatically allocated in this way using the fixed allocation primitive). If you do not do this, the object may move in memory during garbage collection, and unpredictable behaviour will be the result! You must also ensure that externally captured objects continue to be referenced from Smalltalk in order that they are not garbage collected, or the result will be the same.

Returning Structures

Dolphin's external call primitives allow structures to be returned by reference or by value. Normally such structures will be subclasses of ExternalStructure (or if not, they must have the same form). For example:

makeRect: left top: top right: right bottom: bottom
	"Private - Answer a RECT instantiated from the four integer corner coordinate arguments.
	This method is primarily present as an example of an external function which returns
	a >8 byte structure (4, 8, >8 byte structures are all returned using different
	mechanisms). The code of the function could be paraphrased as:

		RECT MakeRect(LONG left, LONG top, LONG right, LONG bottom)
		{
			return RECT(left, top, right, bottom);
		}

	Calling this function and having the external function call primitive instantiate and
	return a RECT is also considerably faster than building it in Smalltalk!"

	<stdcall: RECT AnswerDQWORD sdword sdword sdword sdword>
	^self invalidCall

This method of VMLibrary makes a sneaky use of the external call primitive argument/return value conversion in order to instantiate a Win32™ RECT structure from four integers.
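
To make the size classes mentioned in the method comment concrete, the following is a small C++ sketch of an external function of the same shape. It is not the source of the VM's function: Rect is a stand-in for the Win32 RECT structure, and the names Rect and makeRect are chosen purely for illustration.

```cpp
#include <cstdint>

// Stand-in for the Win32 RECT: four 32-bit signed fields, 16 bytes in total.
struct Rect {
    std::int32_t left;
    std::int32_t top;
    std::int32_t right;
    std::int32_t bottom;
};

// A 16-byte structure falls into the ">8 byte" class described above.
static_assert(sizeof(Rect) == 16, "Rect is expected to occupy 16 bytes");

// Answers the structure by value, as the paraphrased MakeRect() does.
Rect makeRect(std::int32_t left, std::int32_t top,
              std::int32_t right, std::int32_t bottom) {
    return Rect{left, top, right, bottom};
}
```

A structure of one or two such fields would instead fall into the 4 or 8 byte classes, which is why the size of the declared ExternalStructure matters to the call primitive.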

This feature is very useful (particularly for implementing OLE interfaces) in that it means that no further manual conversion of the return value (or call-in argument) is needed. However, because the object is directly instantiated by the VM, it may not be properly initialized. If your class includes any specific initialization (e.g. an #initialize method), or has a custom implementation of #new, then you may need to send some appropriate messages to the VM-instantiated object. This is not normally an issue because ExternalStructures mostly just represent external structures without adding significant extra state and behavior. In some unusual cases, however, you may prefer to instantiate the correct object yourself, and this can be done by simply specifying the return value as an untyped pointer (lpvoid) or untyped structure (<N>, where N is the byte size of the structure). The resulting ExternalAddress or ByteArray can be used to construct the appropriate Smalltalk object. In either case, you will probably want to write a wrapper method to perform the necessary custom operations.

The above also applies to structure parameters to callback (i.e. call-in) methods or blocks.

External Method Format

Methods which perform external calls (library function and virtual calls) in Dolphin are instances of a specialised form of CompiledMethod called ExternalMethod.

The format of an external call method is similar to a primitive method. External call methods have the usual method header (keyword selector and interspersed argument names). In order to standardise the names of external library selectors (and thus make them both easier to find and less likely to be duplicated), we follow the pattern described in External Method Selectors. This pattern is based on the CORBA Smalltalk mapping name generation scheme.

Following the method header is the external call descriptor:

< <call type> <return type> <descriptor> <parm list> >

where:

<call type> = [virtual] <call convention>

<call convention> = stdcall: | cdecl:

<vfn number> = 1..n

<struct type> = ExternalStructure subclass name

The virtual prefix and virtual function number are only for use when constructing virtual calls.

The parameter list must contain as many parameters as are passed to the method, plus, optionally, one for the return type from the external function (the return of which will form the return from the primitive invocation method).

In the case of a call to the Win32 GetComputerName() function a suitable method (of KernelLibrary) might be:

getComputerName: buffer nSize: pBufSize
	"Retrieves the computer name of the current system into the argument, buffer (which must be large
	enough to contain MAX_COMPUTERNAME_LENGTH+1 characters). Answers whether the name
	was successfully retrieved. If successful then the value of pBufSize will be the number of characters
	in the name.

		BOOL GetComputerName(
			LPTSTR lpBuffer,	// address of name buffer 
			LPDWORD nSize 		// address of size of lpBuffer 
		);"

	<stdcall: bool GetComputerNameA lpstr DWORD* >
	^self invalidCall

This function is wrapped by SessionManager>>computerName, as follows:

computerName
	"Answer the name of the computer hosting the current session. 
	Signals a HostSystemError if the request fails."

	| name nameSize |
	name := String new: MAX_COMPUTERNAME_LENGTH.
	nameSize := DWORD fromInteger: name size+1.
	(KernelLibrary default getComputerName: name nSize: nameSize asParameter)
		ifFalse: [HostSystemError signal].
	^name leftString: nameSize asInteger
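
The wrapper above relies on the in/out size convention of the external function: the size argument carries the buffer capacity in, and the length of the name out. The following C++ sketch imitates that contract with a stand-in function so the convention can be seen in isolation. The function fakeGetComputerName and the host name it answers are invented for illustration; this is not the Win32 implementation.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in with the same contract as GetComputerName():
// on entry *size is the buffer capacity in characters, including room for
// the terminator; on success *size is the name length excluding it.
bool fakeGetComputerName(char* buffer, unsigned long* size) {
    static const char kName[] = "WORKSTATION";       // made-up host name
    const std::size_t length = sizeof(kName) - 1;    // excludes terminator
    if (*size < length + 1) {
        *size = static_cast<unsigned long>(length + 1);  // capacity needed
        return false;
    }
    std::memcpy(buffer, kName, length + 1);          // name plus terminator
    *size = static_cast<unsigned long>(length);
    return true;
}
```

This is why the Smalltalk wrapper passes the size in a DWORD object rather than as a plain integer: the function writes the resulting length back through the pointer, and the wrapper then trims the buffer to that length.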

Parameter Types, Validation and Conversion

The external method parameter and return types are:

bool: Boolean. As an argument type, accepts true (translated to 1) or false (translated to 0). Also accepts SmallInteger values, pushing their host machine representation. As a return type, if the result is 0 answers false, if the result is non-zero, answers true.
byte: Unsigned byte. Accepts SmallIntegers only. Passes a 32-bit value generated by zero extending the least significant byte. Fails if not in range 0..255. Zero extends into a positive SmallInteger when used as a return type.
char: Signed character. Accepts Characters only.
double: 64-bit floating point. Accepts instances of Float (which contain a host machine representation of a double precision floating point number). SmallIntegers may also be passed (they are promoted to the double precision floating point representation of their integral value). Doubles are passed on the machine stack (not the FP stack) as 64-bit values. As a return type, answers an instance of Float.
dword: Unsigned double word (32 bits). Accepts 32-bit Integers in the range -16r80000000..16rFFFFFFFF. Positive integers are passed as unsigned, and negative integers in their two's complement representation. The most negative LargeNegativeInteger which can be passed is -16r80000000 (or -2147483648), because this is the most negative number which can be represented in 32 bits in two's complement notation. Also accepts byte objects of length 4, assumed to be in an unsigned bit representation. nil is passed as 0. As a return type, answers a SmallInteger, or a LargePositiveInteger if the result cannot be represented as a positive SmallInteger (i.e. in 30 bits).
float: 32-bit floating point. Accepts instances of class Float, or SmallIntegers (as double). The conversion of Floats (64-bit double precision) to float (32-bit single precision) may result in silent loss of precision. Floats are passed on the machine stack (not the FP stack) as 32-bit values. As a return type answers an instance of class Float (i.e. promotes to double precision).
handle: 32-bit handle. Accepts 32-bit integers, nil, or a byte object of size 4. As a return type, answers an ExternalHandle, unless the returned handle is NULL, in which case answers nil. This is a useful shortcut for specifying ExternalHandle as a pass or return by value struct type.
hresult: 32-bit signed integer value. Validation as sdword. As a return type, if less than 0 (i.e. severity is error), causes the external call primitives to fail with a negative failure reason which is the HRESULT value. This is convenient (especially for OLE) because it means an exception is automatically generated when an external function returns an HRESULT error.
lpstr: Pointer to C (null-terminated) ASCII string type. Accepts null-terminated byte objects (e.g. Strings, Symbols) or nil (null pointer). When used as a return type, answers a String containing the characters of the C string up to the null terminator. Unlike lpvoid, does not accept integer values as pointers, or ExternalAddress (indirection) objects. If the validation is too tight for your requirements, then use lpvoid. Do not use this return type where an external function is called which expects the caller to assume ownership of the returned string and to delete it when it is no longer required, as a memory leak will result (use lpvoid instead).
lpwstr: Pointer to null-terminated wide (Unicode) string. Primarily present as a placeholder for a future Unicode version of Dolphin, and is currently synonymous with lpstr.
lppvoid: Pointer to pointer. Used for functions which take a parameter into which they write an address. The corresponding argument must be an ExternalAddress (or other indirection object), or an object whose first instance variable is such (e.g. an ExternalStructure). The address of the ExternalAddress itself is passed, so that on return it contains the address written back by the external function. nil is not a valid argument value. As a return type answers a pointer instance of LPVOID (i.e. void**) containing the address returned from the function.
lpvoid: General pointer type, accepts byte objects e.g. Strings (pointer to contents passed), nil (null pointer), SmallIntegers (passes as address), or ExternalAddresses (the contained address is passed, not a pointer to the ExternalAddress object). Where the pointer is captured by the external function, care should be taken to ensure that the object whose address was passed is not garbage collected. When used as a return type, the method answers an ExternalAddress with the returned value.
oop: Object identifier. Any object can be passed. This parameter type is intended for use with the forthcoming User Primitive Kit. The value should be treated as an opaque 32-bit value, and should not be stored for later use (it may change during a GC). As a return type, answers the object whose Oop is the result, though this may not be reliable because object identifiers are not guaranteed to be invariant over time. At present you should not use this parameter type.
sbyte: Signed byte. Accepts SmallIntegers only. Passes a 32-bit value generated by sign extending the least significant byte. Fails if not in range -128..127. Sign extends into a positive or negative SmallInteger when used as a return type.
sdword: Signed double word, accepts any Integer in the range -16r80000000..16r7FFFFFFF (i.e. Integers with a 32-bit two's complement representation - all SmallIntegers, some 4-byte LargePositiveIntegers, and some 4-byte LargeNegativeIntegers). May also be other byte objects of length 4, which are assumed to contain a 2's complement 32-bit number. As a return type answers a SmallInteger, or if more than 31 bits are required to represent the two's complement result, a LargePositiveInteger or LargeNegativeInteger depending on sign. Also accepts nil (passed as 0).
sqword: Signed quad word. Accepts any Integer in the range which can be represented as a two's complement number in 64 bits (i.e. -16r8000000000000000 to 16r7FFFFFFFFFFFFFFF). Also accepts 8 byte objects, which are assumed to contain 64-bit two's complement numbers. nil is passed as 0. As a return type answers the smallest Integer form which can contain the 64-bit two's complement integer.
<struct>: Where struct is an ExternalStructure class name. Structure passed by value. Accepts only the exact matching structure class. As with <N>, the ExternalStructure arguments may be reference/pointer instances. Note that it is very important to define the associated structure correctly, as if it has an incorrect size an unrecoverable stack fault is the likely result when passing by value. When used as a return value, an instance of the ExternalStructure class is answered, with the bytes of the returned structure as its contents (copied into a ByteArray). Such newly created instances are created directly by the VM, and thus subsidiary initialization may be necessary. We suggest performing any such initialization in a wrapper function. Note that the calling conventions for returning structures by value vary depending on whether the structure is 4, 8, or >8 bytes long, with only the latter being returned on the stack. 4 and 8 byte structures are returned quite efficiently in registers.
<struct>*: Where struct is an ExternalStructure class name. In beta-2 this is equivalent to lpvoid as an argument type - no validation is performed. A later version may perform additional validation. When used as a return type, a pointer instance of the specified ExternalStructure class is answered, containing an ExternalAddress pointing at the externally stored value as its first instance variable. Note that the ExternalStructure is instantiated directly by the VM, and will have the correct form, but may not be correctly initialized. Any subsidiary initialization required is best performed in a wrapper function.
sword: Signed word. As sbyte, but 16-bit, acceptable range -32768..32767. Also accepts a byte object of size 2, which is sign extended to 32 bits.
word: Unsigned word. As byte, but 16-bit, acceptable range 0..65535. Also accepts a byte object of size 2, which is zero extended to 32 bits.
void: Only valid as a return type - the method answers self.
<N>: Where N is the byte size of a pass-by-value structure of unspecified type (check carefully to make sure the specified size is correct). Accepts either byte objects (of the correct size) or ExternalStructure instances with the correct byteSize (or other classes with the same shape as ExternalStructures). ExternalStructures passed to such arguments can be reference instances (i.e. ones containing a pointer to the actual structure bytes, rather than the structure bytes themselves). As a return type, the result is a ByteArray of the specified size.

The reason for range limitations on integer types is that we consider it more within the spirit of Smalltalk to generate an error when something is out of range, than to silently truncate it. Indeed, for signed types in particular, truncation is unlikely to produce the correct result. In the case of return values, it is often the case, particularly with Win32™ routines (which are often coded in assembler in Windows95™), that a function which is specified as returning a WORD value, actually returns a DWORD value (as the return value is in EAX, and the function may not clear the high order 16-bits), so in this case silent truncation must occur.
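
The difference between the two directions can be sketched in C++ as follows. This is only an illustration of the policy described above, not the VM's code; the function names are hypothetical.

```cpp
#include <cstdint>

// Argument side: reject out-of-range values rather than truncating them,
// as described for the byte type (valid range 0..255).
bool checkedByteArgument(long value, std::uint32_t& out) {
    if (value < 0 || value > 255) {
        return false;                          // the call would fail
    }
    out = static_cast<std::uint32_t>(value);   // zero extended to 32 bits
    return true;
}

// Return side: a WORD result may arrive in a 32-bit register whose high
// 16 bits were never cleared, so only the low 16 bits are kept.
std::uint16_t wordReturnValue(std::uint32_t eax) {
    return static_cast<std::uint16_t>(eax & 0xFFFFu);
}
```

Passing 256 to the first function is reported as a failure, whereas silent truncation would have passed 0 to the external function without any indication that something was wrong.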

The treatment of the unsigned integer dword and qword types is slightly inconsistent with other unsigned integer types, as they do not insist that their arguments are positive. This is because no promotion is necessary (i.e. there is no need to zero/sign extend) so this may offer more flexibility without being a significant source of errors. This may be changed in a future release.
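
As a sketch of the dword rule under discussion (assuming nothing beyond the range and two's complement behaviour given in the type list), the acceptance test and conversion might be paraphrased in C++ like this. The function name dwordArgument is hypothetical.

```cpp
#include <cstdint>

// Accepts -16r80000000..16rFFFFFFFF. Non-negative values pass unchanged;
// negative values pass as their 32-bit two's complement bit pattern.
bool dwordArgument(std::int64_t value, std::uint32_t& out) {
    if (value < -0x80000000LL || value > 0xFFFFFFFFLL) {
        return false;                          // out of range: the call fails
    }
    out = static_cast<std::uint32_t>(value);   // conversion is modulo 2^32
    return true;
}
```

So -1 is passed as 16rFFFFFFFF, which is the same bit pattern that the positive integer 16rFFFFFFFF would produce.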

In general, the UndefinedObject, nil, is interchangeable with 0, or NULL, when interfacing with external library functions. 'Nullness' can be tested with the #isNull message, with the UndefinedObject and the SmallInteger zero answering true. The lpstr argument type is an exception to this, in that it only accepts nil as the null pointer.

If an external call fails a walkback will occur describing which argument was incorrect, what it was, and what the type expected was (external type, not Smalltalk class, e.g. dword). This error handling is performed by the exception class InvalidExternalCall which examines the primitive failure details for the process, and generates an appropriate exception.
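
The hresult return type ties into the same failure mechanism: a negative result causes the primitive to fail. The test involved is simply the sign of the 32-bit value, as this C++ sketch shows. The constant and function names are our own; the two values are the standard codes for success (S_OK) and unspecified failure (E_FAIL).

```cpp
#include <cstdint>

// An HRESULT is a 32-bit signed value whose top bit is the severity bit,
// so every failure code is negative when viewed as a signed integer.
constexpr bool hresultFailed(std::int32_t hr) {
    return hr < 0;
}

constexpr std::int32_t kSuccess = 0;                                     // S_OK
constexpr std::int32_t kFail = static_cast<std::int32_t>(0x80004005u);   // E_FAIL
```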

Should you wish to pass objects to the external function which the VM will not automatically convert, or which it converts inappropriately, then the normal practice is to implement an #asParameter method for the object, and to send #asParameter to such objects when making external calls.

Additional types may be added from time to time, if they are of sufficient utility.

External Call Limitations

Variable argument functions (those prototyped with ellipses in C, such as printf()) are not currently supported, primarily because there is no particularly neat syntax in Smalltalk for sending variable argument messages, or dynamically constructing non-literal Arrays. In the future these may be supported using an array of arguments.
When passing structures by value (using the struct parameter type), be very careful to declare the correct ExternalStructure, and to declare the ExternalStructure correctly. If you do not then an unrecoverable stack fault is almost certainly going to result and you will lose any unsaved changes.
Typed pointer arguments (i.e. <struct>*) are not validated to be of the correct type at present, primarily for efficiency reasons. Even so it is recommended that typed pointers be used to correctly maintain package prerequisites and for readability.
The fastcall and thiscall (C++ non-virtual member function) calling conventions are not currently supported.

Virtual Calls (C++/OLE COM Interface)

Dolphin can interface directly to objects which obey the C++ virtual calling convention. Obviously this includes C++ objects themselves, but can also include objects implemented in other languages which implement the Microsoft Component Object Model (the basis of OLE), since this is based on the C++ virtual function model. The implementation of such external objects would normally reside in a DLL, or DLLs (under Win32™ there are no special prologs and epilogs for exported functions, so it is always possible to generate a DLL from a class library supplied as a LIB). OLE COM objects can also reside in other executables, and, in the case of Distributed COM, can even reside on remote machines.

In order to interface to C++ objects there needs to be some way to instantiate such objects (or gain access to those which already exist), and then to invoke those objects' member functions. These are some of the fundamental features of COM, but we consider the simpler case of interfacing to C++ in DLLs here. We need a Smalltalk class (or classes) to represent those objects in Smalltalk (i.e. to act as a proxy).

For example we might have the simple C++ class:

#include <stdlib.h>     // free()
#include <string.h>     // strdup()

class CPPBlah
{
        char* m_szName;
public:
        CPPBlah(const char* szName) { m_szName = strdup(szName); }
        // strdup() allocates with malloc(), so release with free()
        virtual ~CPPBlah() { free(m_szName); }
        virtual const char* getName() { return m_szName; }
        void setName(const char* szName) 
                { free(m_szName); m_szName = strdup(szName); }
};

We might have an ExternalAddress object pointing at an instance of the C++ class CPPBlah, i.e. the object lives outside the Smalltalk object space, or we might want the C++ object to reside in Smalltalk memory. These are precisely the capabilities of ExternalStructure (which can represent a structure by value or reference), so we could add a subclass called, say, ExternalBlah.

Instantiating C++ objects is achieved by calling either a static member function, or an ordinary dynamic link library function. There is no difference, except that the former is likely to have a long mangled name. At present you must determine the mangled name of the function yourself (e.g. by using "dumpbin /exports" or the linker map file). For example we might have a simple factory function (which could be an exported static member of CPPBlah):

__declspec(dllexport) CPPBlah* __stdcall makeNewBlah(const char* szName) 
{
        return new CPPBlah(szName); 
}

We could write a method of an ExternalLibrary subclass, BlahLibrary, to invoke this factory function thus:

makeNewBlah: aString
        "Answer a new external C++ Blah object."

        <stdcall: ExternalBlah* '_makeNewBlah@4' lpstr>
        ^self invalidCall

If we then evaluate the following expression:

	BlahLibrary default makeNewBlah: 'I''m a new Blah created from Dolphin'

We would have an instance of ExternalBlah containing an ExternalAddress object pointing at an instance of the C++ class CPPBlah. Alternatively we could instantiate an ExternalBlah object, and pass that to a library function which constructs a C++ object in that memory (either by calling the constructor directly, or by using the placement syntax of operator new()).
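
The second alternative, constructing the C++ object in memory supplied by the caller, might look something like the following on the C++ side. This is a sketch under our own assumptions: the exported helpers constructBlahAt and blahByteSize are hypothetical, the class is a trimmed copy of CPPBlah, and the export and calling convention decorations are omitted so that the example is portable.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>

// Copies a C string into malloc'd memory (a portable stand-in for strdup).
static char* copyString(const char* s) {
    const std::size_t n = std::strlen(s) + 1;
    char* copy = static_cast<char*>(std::malloc(n));
    if (copy != nullptr) std::memcpy(copy, s, n);
    return copy;
}

// Trimmed copy of the example class.
class CPPBlah {
    char* m_szName;
public:
    explicit CPPBlah(const char* szName) : m_szName(copyString(szName)) {}
    virtual ~CPPBlah() { std::free(m_szName); }
    virtual const char* getName() { return m_szName; }
};

// Hypothetical helper: answers the number of bytes the caller must supply.
extern "C" std::size_t blahByteSize() { return sizeof(CPPBlah); }

// Hypothetical helper: constructs a CPPBlah in caller-supplied memory, which
// must be at least blahByteSize() bytes long and suitably aligned.
extern "C" CPPBlah* constructBlahAt(void* memory, const char* szName) {
    return new (memory) CPPBlah(szName);   // placement new: no allocation
}
```

Because the object is not allocated with operator new, it must be destroyed by calling its destructor explicitly rather than by using delete.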

Our ExternalStructure subclass, ExternalBlah, is the home for the virtual call methods. We can add additional Smalltalk instance variables to this class as required.

Once we have an object of the appropriate type, then we want to be able to invoke its member functions. These can be either statically bound, or dynamically bound (virtual). The latter are somewhat easier to implement and call, using the same format as an external library call, but with a virtual prefix and specifying a virtual function number (index in the vtbl) instead of a function name or ordinal. Knowledge of the mangled function name is not required, because the function is accessed by offset into the virtual table (this does mean it is very important to get the offset correct, hence this is calculated automatically in the COM add-in package, which generates all external function definitions automatically). We might define the CPPBlah::getName() virtual function as follows:

name
        "Answer the name of the external C++ Blah object."

        <virtual cdecl: lpstr 1>
        ^self invalidCall

The #name message can then be sent to instances of ExternalBlah in the normal way and will answer a string which is a copy of that stored in the referenced C++ object.

Normal (non-virtual) member functions may be supported in future by implementing them in the relevant ExternalLibrary subclass using the thiscall calling convention. For example:

setName: anExternalObject to: aString
        "Set the name of an external C++ Blah object."

        <thiscall: void '_mangle_mangle_setName_mangle@8' lpvoid lpstr> 
        ^self invalidCall

At present, however, thiscall is not supported (primarily because it is not needed for OLE, as all COM functions must be virtual). The workaround is to ensure that all C++ member functions you might wish to call from Dolphin are declared as virtual, or explicitly declared with the cdecl or stdcall calling conventions.

Ordinary member functions will have to be added to an ExternalLibrary rather than the C++ object's proxy Smalltalk class, because Dolphin needs to be able to locate the functions using GetProcAddress() (for this reason they must also be exported). Furthermore it is necessary to explicitly pass the implicit (in C++) this parameter (being the address from the relevant proxy object). This does make using ordinary member functions considerably less convenient. To mitigate this inconvenience it is suggested that forwarding methods are implemented in the proxy class itself to wrap the external calls.
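
One way to see what "explicitly passing the this parameter" amounts to is a flat forwarding function with C linkage on the C++ side. This is an approach of our own choosing for illustration, not something the library requires; the wrapper names are hypothetical, the class is a trimmed copy of CPPBlah, and export and calling convention decorations are again omitted for portability.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Copies a C string into malloc'd memory (a portable stand-in for strdup).
static char* copyString(const char* s) {
    const std::size_t n = std::strlen(s) + 1;
    char* copy = static_cast<char*>(std::malloc(n));
    if (copy != nullptr) std::memcpy(copy, s, n);
    return copy;
}

// Trimmed copy of the example class with ordinary (non-virtual) members.
class CPPBlah {
    char* m_szName;
public:
    explicit CPPBlah(const char* szName) : m_szName(copyString(szName)) {}
    ~CPPBlah() { std::free(m_szName); }
    const char* getName() const { return m_szName; }
    void setName(const char* szName) {
        char* copy = copyString(szName);   // copy first, in case of aliasing
        std::free(m_szName);
        m_szName = copy;
    }
};

// Hypothetical flat wrappers: `self` plays the role of the implicit `this`.
extern "C" void CPPBlah_setName(CPPBlah* self, const char* szName) {
    self->setName(szName);
}

extern "C" const char* CPPBlah_getName(CPPBlah* self) {
    return self->getName();
}
```

Functions of this shape have unmangled names and an ordinary calling convention, so they can be declared in an ExternalLibrary subclass with the object's address as the first argument.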

Static member functions are called in exactly the same way as any other exported dynamic link library functions.

It is suggested that Dolphin's finalization support (see Weak References and Finalization) be used to give the proxy object (e.g. the ExternalBlah instance) a chance to invoke the C++ destructor when the object has no further Smalltalk references. This makes it much easier to synchronise the lifetime of the heap based C++ object with the garbage collected Smalltalk object. We recommend that destructors are always declared as virtual, as this makes it possible to correctly delete a C++ object polymorphically, and, as mentioned, makes it easier to call in a DLL (e.g. from Dolphin).

There is some flexibility over the definition of the proxy class. The virtual call primitive is able to invoke C++ virtual functions against such a class of objects if:

(1) It is a byte object of at least 4 bytes. In this case the primitive assumes that it is the object itself - i.e. the C++ object lives in the Smalltalk memory space.
(2) It is an indirection byte object (i.e. a pointer). ExternalAddress is an example of such a class. In this case the primitive assumes that the object's state is a pointer to the C++ object, which probably lives outside the Smalltalk memory space (i.e. it contains the this pointer). It is not necessary to subclass ExternalAddress to define an indirection class, but see that class to see how to do it. lpvoid return types can be mutated to correctly defined indirection classes by using #becomeA:.
(3) It is a pointer object whose first instance variable is as described in (1) or (2), e.g. subclasses of ExternalStructure. This is generally the most powerful and flexible solution.

Normally you will find it most convenient to add the proxy class as a subclass of ExternalStructure.

OLE COM Functions

The virtual call interface is the basis of OLE COM call-out support in Dolphin. For example IUnknown (a subclass of ExternalStructure) contains the method:

QueryInterface: QueryInterface ppvObject: ppvObject 
	"Callout for the IUnknown::QueryInterface() interface function.
	N.B. This method has been automatically generated from 
	the vtable defined in IUnknown>>defineFunctions. DO NOT MODIFY!"

	<virtual stdcall: sdword 1 IID* lppvoid >
	^self invalidCall

Note that this method has been generated automatically from the class' function table, a facility of the OLE COM package, but is in other respects simply an external virtual function call.

We will not go into further details of COM interfacing here, as this is a large and separate topic.

External Structures

All native Smalltalk objects have one of three regular formats:

they contain bytes, accessible by index only, or
they contain references to other objects (object pointers), accessible by index and/or name, or
they have an "immediate" representation, whereby their value is encoded in their identity (their object pointer).

The outside world is not so simple - data structures are normally a mish-mash of different data types, packed together into C structs, or C++ classes, or whatever. There is no direct support in standard Smalltalk syntax for accessing the fields of structures which are different data types, and indeed this is somewhat contrary to the Smalltalk "everything is an object" concept. The purpose of the external structure support in Dolphin is to allow one to represent free format data structures, and provide means of getting (creating objects out of) and setting (putting objects into) the fields of such data structures.

ExternalStructure

All of the data types intended for interfacing with the outside world from Dolphin are subclasses of ExternalStructure. This applies to scalar (usually 32-bit) values as well as to "structures". The contents of an ExternalStructure are represented with an object which understands the external bytes protocol (e.g. ByteArray and ExternalAddress). The external bytes protocol includes a number of primitive operations for accessing the native machine representations of fundamental types, such as signed and unsigned 32-bit integers and double precision floats, at specified offsets. For example, if an 8 byte ByteArray is sent #dwordAtOffset: with the argument 4, it will answer the 32-bit unsigned integer value stored from offset 4 to offset 7 inclusive.
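
As a minimal workspace sketch of the offset-based access just described: this assumes that #dwordAtOffset:put: exists as the setter counterpart of #dwordAtOffset:, which is not stated explicitly here, and the stored value is arbitrary.

	| bytes |
	bytes := ByteArray new: 8.
	"Store an unsigned 32-bit value in the bytes at offsets 4 to 7 (offsets are zero-based)"
	bytes dwordAtOffset: 4 put: 16r12345678.
	"Read the same four bytes back as an unsigned 32-bit integer"
	bytes dwordAtOffset: 4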

The external bytes protocol has primitive accessors for 32-bit signed and unsigned integers, 16-bit signed and unsigned integers, unsigned bytes, 64-bit floating point numbers, 32-bit floating point numbers, strings, etc. Using these primitive accessors it is possible to hand code a structure class which provides higher level accessors for each field. However this is a very tedious and error prone process, because it involves a lot of typing, and you will have to work out the correct field offsets by hand. In addition you will have to code a method for each field you want to access, even if you'd rather not have all those methods clanking around in the image.

Structure Templates

In order to largely automate the process of defining external structures, every ExternalStructure subclass can have a template defined which specifies symbolic names and type descriptors for each field in the corresponding structure. The template definition is stored in the #defineFields class method. A simple example is POINTL (the Dolphin structure to represent the Windows™ POINTL structure):

defineFields
	"Define the fields of the Win32 POINTL structure.
		POINTL compileDefinition
	"
	self 
		defineField: #x type: SDWORDField new;	
		defineField: #y type: SDWORDField new

Here we've said that Windows™ POINTLs contain a pair of signed 32-bit integer coordinates, which they do!

Having defined the structure, we can start to use it immediately by using the field names defined in the template as message selectors (e.g. to get the x coordinate of a POINTL we would simply send it #x, and to set it, #x:). The ExternalStructure infrastructure will take care of lazily initializing the template so that it knows the layout and size of the structure. We can also embed structures inside other structures without worrying about the initialization order. However, unless we compile the structure definition explicitly, we will not get optimum performance, since the fields will be accessed dynamically.

Structures can be compiled for optimum performance by sending the class the #compileDefinition message, which will automatically generate a set of correctly defined accessor methods (assuming that you've managed to code the #defineFields method correctly; there are lots of examples in the base image to help you do this). Compiled structures do occupy more space in the image, so infrequently used structures are best left uncompiled.
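
A short workspace sketch of both modes, using only the selectors described above (the coordinate values are arbitrary):

	| pt |
	pt := POINTL new.
	"Field names from the template act as selectors, whether or not the structure is compiled"
	pt x: 10; y: 20.
	pt x.
	"Optionally generate compiled accessors for faster access"
	POINTL compileDefinition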

If we hadn't required read and/or write access to all the fields of POINTL, then we could have restricted the access available to specific fields to read only, write only, or no access (filler).

Dynamic Field Access

Having defined an external structure class in this way, we'd have an uncompiled structure with dynamic field access. Uncompiled structures dynamically access their fields by overriding the #doesNotUnderstand: message which the VM sends to an object when it determines that the object does not implement a method for the message being sent. For example, if we send one of our POINTLs the message #x then the VM will create a Message containing the selector and arguments (in this case there are none), and pass that Message to the #doesNotUnderstand: method of ExternalStructure, which will attempt to access the field named by the selector. In this case it will be successful (the template contains an entry for #x), but otherwise the usual run-time error walkback will result.

Compiled Field Access

This dynamic mechanism is a space efficient way to implement an external structure, but there is considerable overhead inherent in the #doesNotUnderstand: mechanism in comparison to a successful message send. If you need higher performance, and we certainly do for POINTLs, then you can compile the structure by sending the #compileDefinition message to the class. This will automatically generate a set of accessor methods for the instances in the special category **compiled accessors**. Here is one of the compiled accessors generated for POINTL:

y: anObject
	"Set the receiver's y field to the value of anObject.
	Automatically generated set method - do not modify"

	bytes sdwordAtOffset: 4 put: anObject

As you can see, the generated method is very similar to the one you would expect to hand code to access the second 32-bit signed integer field of the structure, but with the difference that you haven't had to work out the offset (trivial in this case), or type so much.
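
For comparison, the corresponding read accessor would presumably have the following shape. This is a hand-written sketch inferred from the setter above, not a copy of generated code, and it assumes that #sdwordAtOffset: is the getter counterpart of #sdwordAtOffset:put: in the external bytes protocol:

y
	"Answer the receiver's y field as a signed 32-bit Integer.
	Hand-written sketch - not the automatically generated method"

	^bytes sdwordAtOffset: 4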

External Field Types

Of course, there's more variety to structure fields than just 32-bit signed and unsigned integers, so Dolphin has a wide range of fields you can insert into the structure template (and you can add more as you need to by defining new classes). Currently the field description objects can all be found as subclasses of ExternalField (a subclass of AttributeDescriptor) and they break down into three broad groupings corresponding to the main immediate subclasses: EmbeddedFields, PointerFields and ScalarFields. There is also the FillerField type for padding.

Scalar Fields

Scalar fields are essentially the familiar base types of C and C++. We have chosen to use the Windows™ names for these types because they are independent of machine word size. So, for example, there are BOOLFields to represent boolean values, and WORDFields to represent unsigned 16-bit integers. These types are straightforward to use, and there are very many examples in the system (e.g. LOGBRUSH).

Address Fields

The most quirky scalar field type is LPVOIDField, which is used when you want to access the pointers stored in structures as pointers, rather than the object pointed at by the pointer! More often you will probably not want to worry about the indirection, and would rather have the structure automatically de-reference the pointer so that you can work with objects instead of addresses; for this purpose you will want to use pointer fields.

An example can be found in the EDITSTREAM structure used with the rich edit control, which stores a function pointer (into which we store an ExternalCallback):

defineFields
	"Define the fields of the Win32 EDITSTREAM structure.

		EDITSTREAM compileDefinition

	typedef struct _editstream { 
		DWORD dwCookie; 
		DWORD dwError; 
		EDITSTREAMCALLBACK pfnCallback; 
	} EDITSTREAM;"

	self 
		defineField: #dwCookie			type: DWORDField writeOnly;
		defineField: #dwError			type: DWORDField filler;
		defineField: #pfnCallback		type: LPVOIDField writeOnly

Pointer Fields

It is common to find pointers (addresses) embedded in structures where we are not particularly interested in the pointer itself, but in the object at which it is pointing. We represent these with PointerField instances. When instantiating pointer fields we generally need to specify the type of object that the pointer is expected to be pointing at, so that the structure can de-reference the pointer and return a Smalltalk object of the correct type (if you do not want the pointer de-referenced, use the scalar field LPVOIDField). A common example is pointers to C strings; DOCINFO has a few of these:

defineFields
	"Define the fields of the Win32 DOCINFO structure.

	typedef struct {		// di  
		int		cbSize;
		LPCTSTR	lpszDocName;
		LPCTSTR	lpszOutput;
		LPCTSTR	lpszDatatype;	// Windows 95 only; ignored on Windows NT
		DWORD	fwType;		// Windows 95 only; ignored on Windows NT
		} DOCINFO;"

	self 
		defineField: #cbSize			type: DWORDField writeOnly;
		defineField: #lpszDocName		type: (PointerField to: String) beWriteOnly;
		defineField: #lpszOutput		type: (PointerField to: String) beWriteOnly;
		defineField: #lpszDatatype		type: (PointerField to: String) beWriteOnly;
		defineField: #fwType			type: DWORDField writeOnly

Pointers to Structures

Less commonly we may need to define a pointer field to another structure type, and we can use PointerFields to do this too. An example can be found in CHOOSEFONT, where the lpLogFont field is described as a PointerField to LOGFONT:

defineFields
	"Define the fields of the Win32 CHOOSEFONT structure.
	This structure is used only for communication with the font dialog, so we don't compile it.
		CHOOSEFONT defineTemplate

	typedef struct	{
		DWORD		lStructSize;
		HWND			hwndOwner;
		HDC			hDC;
		LPLOGFONT		lpLogFont;
		INT			iPointSize;
		...
	} CHOOSEFONT;"

	self 
		defineField: #lStructSize		type: DWORDField writeOnly;
		defineField: #hwndOwner			type: DWORDField writeOnly;
		defineField: #hDC			type: DWORDField writeOnly;
		defineField: #lpLogFont			type: (PointerField to: LOGFONT);
		defineField: #iPointSize		type: DWORDField readOnly;
		...

If storing a pointer to a Smalltalk object into a structure you need to be careful to ensure that the lifetime of the object corresponds to the timespan over which the structure is used, and this may entail maintaining a reference to the Smalltalk object in an instance variable added to the ExternalStructure for that purpose. For example, the TV_ITEM class includes a field, pszText, which is a pointer to string data. When text is set into a TV_ITEM via the #text: message, the pszText pointer field is set to point to the bytes of the String, and the string is saved into the text instance variable of TV_ITEM.

Even more care over storing pointers to Smalltalk objects into structures passed to external functions is necessary if those functions capture the pointer for future use (e.g. the binding of columns to buffers in ODBC). The Dolphin object memory may move objects around during garbage collection, so one cannot normally rely on objects having a fixed address. This has no impact on Smalltalk, because it does not rely on direct memory addresses, but it may upset external subsystems.

To simplify interfacing with external systems which capture addresses, the Dolphin VM includes a special memory allocator for byte objects which allocates from a conventional heap. Objects allocated from the "fixed" heap are guaranteed to maintain the same address for their lifetime (or that of the session if shorter). ExternalStructures are, by default, allocated from the fixed heap, but you can allocate other byte objects, such as ByteArrays, from the heap using the #newFixed: message.
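
For instance, a buffer whose address is to be captured by an external function might be allocated as follows. The size of 256 bytes and the variable name are purely illustrative:

	| buffer |
	"Allocate a 256 byte ByteArray from the fixed heap, so that its address will not change"
	buffer := ByteArray newFixed: 256.
	"Keep a reference to the buffer for as long as the external system may use its address"
	buffer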

Embedded Fields

An alternative to embedding a pointer to an object in a structure is to embed the object itself. These types of fields are supported by EmbeddedField and its subclasses.

Embedded Structures

Occasionally you will come across structures which have other structures embedded inside them. For example, LV_FINDINFO contains an embedded POINTL as its pt field:

defineFields
	"Define the fields of the Win32 LV_FINDINFO structure.
		LV_FINDINFO compileDefinition
	"

	self
		defineField: #flags type: DWORDField new;
		defineField: #psz type: (PointerField to: String);
		defineField: #lParam type: DWORDField new;
		defineField: #pt type: (StructureField type: POINTL);
		defineField: #vkDirection type: DWORDField new

Note that because the field type is a subclass of ExternalStructure, the answer will reference the original embedded data, and modifications will write directly back into that data.
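
The reference semantics can be seen in a workspace sketch like this one, which relies only on the behaviour described above (the coordinate values are arbitrary):

	| info |
	info := LV_FINDINFO new.
	"The answer to #pt references the POINTL data embedded in info, so this writes into info itself"
	info pt x: 5; y: 7.
	"A fresh reference obtained by sending #pt again sees the same data"
	info pt x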

It is possible to nest structures to an arbitrary depth, and ExternalStructure will automatically calculate the correct size, though of course circular nesting is not possible!

Embedded Arrays

Some structures contain embedded arrays. Embedded array fields differ from scalar fields in that the size is not fixed, but is some multiple of the size of the array elements. A number of structures embed character arrays (sometimes called "strings"), and there is a specialised embedded field type for these, StringField. A familiar example is the face name (lfFaceName) in the LOGFONT structure:

defineFields
	"Define the Win32 LOGFONT structure.
		LOGFONT compileDefinition.
	"

	self
		defineField: #lfHeight type: SDWORDField new;
		defineField: #lfWidth type: SDWORDField new;
		...
		defineField: #lfPitchAndFamily type: BYTEField new;
		defineField: #lfFaceName type: (StringField length: LF_FACESIZE)

When the lfFaceName field is accessed by sending #lfFaceName, the answer will be a copy of the data in the structure, so modifications to it will not be reflected back into the structure. To update the lfFaceName field, the structure must be sent #lfFaceName: with an appropriate String as the parameter.
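
A workspace sketch of the copy semantics, using the accessor names from the definition above (the face name is arbitrary, and the exact contents of the answered String, such as any padding, are not specified here):

	| lf name |
	lf := LOGFONT new.
	lf lfFaceName: 'Arial'.
	"name is a copy of the characters held in the structure"
	name := lf lfFaceName.
	"Changing the copy leaves the structure untouched..."
	name at: 1 put: $X.
	"...so the modified String must be written back explicitly"
	lf lfFaceName: name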

Embedded Arrays of Structures

These are relatively uncommon (except perhaps in graphics and mathematical systems), but can be defined using the StructureArrayField field type. For example:

defineFields
	"Define the fields of the hypothetical POLYLINE structure.

	struct {
		int		nPoints;
		POINTL		aPoints[100];
	} POLYLINE;"

	self 
		defineField: #nPoints	type: DWORDField new;
		defineField: #aPoints	type: (StructureArrayField type: POINTL length: 100)

Individual elements of the embedded array can be accessed using the normal Smalltalk syntax, for example:

	| pl r |
	pl := POLYLINE new.
	pl nPoints: 100.
	r := Random new.
	1 to: 100 do: [:i | (pl aPoints at: i) x: (r next * 640) truncated; y: (r next * 480) truncated].
	pl

Note that the accessed elements update the original structure in place when modified.

Occasionally you may have to define structures which contain embedded arrays, the contents of which are of no interest to you. In this case it is easiest to define these as being ByteArrays. For example the PAINTSTRUCT structure contains a reserved area of 32 bytes, and this structure is defined in the base Dolphin image as follows:

defineFields
	"Define the Win32 PAINTSTRUCT structure.
		PAINTSTRUCT compileDefinition
	"

	self
		defineField: #hdc type: DWORDField readOnly;
		defineField: #fErase type: BOOLField readOnly;
		defineField: #rcPaint type: (StructureField type: RECT) beReadOnly;
		defineField: #fRestore type: BOOLField filler;
		defineField: #fIncUpdate type: BOOLField filler;
		defineField: #rgbReserved type: (ArrayField type: ByteArray length: 32) beFiller

Note that it is necessary to specify the size in bytes as the length.

Restricting Field Access

When defining a template, you may not be interested in certain fields of the structure at all, or you may only require read or write access to particular fields. The external field objects have a set of attributes which can be set to record this information. For example, we currently retrieve window placement information primarily so that we can reset it, so we define WINDOWPLACEMENT as follows:

defineFields
	"Define the layout of the Win32 WINDOWPLACEMENT structure. 
	Currently to avoid wasting space, the structure is defined as mostly filler 
	fields.

		WINDOWPLACEMENT compileDefinition

		typedef struct tagWINDOWPLACEMENT {
			UINT  length;
			UINT  flags;
			UINT  showCmd;
			POINT ptMinPosition;
			POINT ptMaxPosition;
			RECT  rcNormalPosition;
		} WINDOWPLACEMENT;"

	self 
		defineField: #length type: DWORDField writeOnly;
		defineField: #flags type: DWORDField filler;
		defineField: #showCmd type: DWORDField new;
		defineField: #ptMinPosition type: (StructureField type: POINTL) beFiller;
		defineField: #ptMaxPosition type: (StructureField type: POINTL) beFiller;
		defineField: #rcNormalPosition type: (StructureField type: RECT)

If we compile this structure, then we will not get compiled accessors for the filler fields. Whether the structure is compiled or uncompiled, a MessageNotUnderstood exception would be raised if we attempted to get or set those fields.

WINDOWPLACEMENT also includes a write only field, as it employs the common Win32™ practice of using the first structure field to hold the length of the structure for error checking purposes. We must set the length field to satisfy Windows™ that we've passed the appropriate structure, but we never need to read it. When compiled we would not get an automatically generated read accessor for the length field, and if uncompiled attempting to read it would generate a MessageNotUnderstood exception.

We can mark fields as read only in a similar way; examples can be found in DRAWITEMSTRUCT:

defineFields
	"Define the fields of the Win32 DRAWITEMSTRUCT structure.

		DRAWITEMSTRUCT compileDefinition

	typedef struct tagDRAWITEMSTRUCT {  // dis 
		UINT  CtlType; 
		UINT  CtlID; 
		UINT  itemID; 
		UINT  itemAction; 
		UINT  itemState; 
		HWND  hwndItem; 
		HDC   hDC; 
		RECT  rcItem; 
		DWORD itemData; 
	} DRAWITEMSTRUCT;"

	self 
		defineField: #ctlType type: DWORDField readOnly;
		defineField: #ctlID type: DWORDField readOnly;
		defineField: #itemID type: DWORDField readOnly;
		defineField: #itemAction type: DWORDField readOnly;
		defineField: #itemState type: DWORDField readOnly;
		defineField: #hwndItem type: DWORDField readOnly;
		defineField: #hDC type: DWORDField readOnly;
		defineField: #rcItem type: (StructureField type: RECT) beReadOnly;
		defineField: #itemData type: DWORDField readOnly

Note that this structure includes another embedded ExternalStructure. When #rcItem is sent to a DRAWITEMSTRUCT the answer will be a pointer instance of RECT which references the original data, i.e. modifications to the RECT will update the DRAWITEMSTRUCT in place. An identical but independent RECT could be obtained by sending the original answer from #rcItem the #copy message.

External Structure Limitations

It is not currently possible to define embedded multi-dimensional arrays directly. The workaround is to define classes to represent dimensions, embedding further dimensions in these. A two dimensional array, for example, would require one ExternalStructure class, call it Dimension2, containing a StructureArrayField of the required type and length, and another ExternalStructure class for the outer dimension, containing a StructureArrayField of the type Dimension2 and the required length.
Variable length embedded array fields are not directly supported at present. The workaround is to manually instantiate a StructureArray of the correct size at the base address of the variable length array in the structure.
Structure packing algorithms other than the most simplistic (i.e. no packing) are not currently supported. This is not normally a problem, since most structures are carefully defined to avoid packing, but the workaround is to add additional filler fields to pack out the real fields of the structure to the appropriate boundaries.
External structures must currently be completely compiled, or completely uncompiled. The filler mechanism allows one to define fields which are completely ignored, but it is not possible to mix compiled and uncompiled accessors in order to get the best balance of performance and space efficiency.
Accessors for bit fields will have to be hand coded (e.g. _FPIEEE_RECORD). These are rather uncommon, so automated support is considered a low priority.
At present there is no parser to automatically generate external buffer template definitions from C header files. Of course, you can always write one...
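
The multi-dimensional array workaround from the list above might be sketched as follows for a 4 by 3 array of POINTLs. The class names Dimension2 and Matrix, the field names #elements and #rows, and the dimensions are all hypothetical; only StructureArrayField type:length: is taken from the earlier examples. First, the class-side method of the inner dimension, Dimension2:

defineFields
	"Hypothetical inner dimension: one row of 3 POINTLs."

	self defineField: #elements type: (StructureArrayField type: POINTL length: 3)

Then the class-side method of the outer dimension, Matrix:

defineFields
	"Hypothetical outer dimension: 4 rows, each of which is a Dimension2."

	self defineField: #rows type: (StructureArrayField type: Dimension2 length: 4)

Given an instance aMatrix, an individual element would then be reached with an expression such as ((aMatrix rows at: 2) elements at: 3) x.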

External Memory Management

On occasion it is necessary to take responsibility for managing the lifetime of objects allocated externally. Such external resources are not, by default, automatically managed by the Dolphin garbage collector, and explicit freeing is required. However, management of the lifetime of such objects can be automatically initiated by the garbage collector if we make use of Weak References and Finalization.

Normally external memory blocks (e.g. containing externally allocated structures) are referenced via instances of ExternalAddress. However, external memory blocks which are known to have been allocated from the standard Win32 process heap, are best referenced via instances of the subclass ExternalMemory, which uses finalization to automatically free the memory block back to the heap when the Smalltalk object is garbage collected. An ExternalStructure can be created on a memory block referenced via an ExternalMemory instance by sending the appropriate subclass the #fromAddress: instantiator with the ExternalMemory as the argument.

You can construct an empty ExternalStructure subinstance ready to point at an externally allocated block (which will presumably be filled in by some external function call) by sending the message #newHeapPointer to the appropriate ExternalStructure subclass.

Certain add-in packages (such as OLE COM) add ExternalMemory subclasses for managing external memory blocks from heaps other than the default process heap (e.g. COMTaskMemory), and you can easily add your own subclasses if required.

External Callbacks

Dolphin's external callback facilities are intended to support situations where an external interface expects a function pointer which it then "calls back" (for example a window procedure). They should not be confused with VM callbacks, which are a form of interrupt from the VM to run Smalltalk code on its behalf.

The fundamental requirement is to be able to provide the address of a function which can be directly called by some external library using one of the standard calling conventions (stdcall or cdecl). We cannot directly provide the address of a Smalltalk CompiledMethod because:

Dolphin CompiledMethods do not contain raw machine code.
Smalltalk does not have the concept of 'static' methods (there must always be a receiver).
The memory address of a CompiledMethod is not fixed and may change during a garbage collect.
Synchronisation with the activities of the VM is required.
Conversion of arguments from raw data on the machine stack to Smalltalk objects on a Process stack is required.
Myriad other reasons.

Consequently we need to wrap each callback in an object which includes:

A receiver for the callback (and, optionally, any other context we might want to use in the callback)
An evaluable action to perform when the callback is received.
A description of the types of the arguments to the callback so that they may be converted to Smalltalk objects. For consistency this argument conversion is essentially the same as that performed by the external call interface primitives for converting return values from external functions to Smalltalk objects.
A machine code thunk (at a fixed, immovable, address) which calls a generic entry point in the VM passing it the address of the arguments in the stack and the id of the callback object. The VM can then perform one of its callbacks to pass these details into Smalltalk so that the external callback can be handled.

This class of objects is called, not surprisingly, ExternalCallback. You typically create an ExternalCallback by supplying a block to be evaluated, and the types of the arguments expected. For example here is a method from Font class:

fonts: aString do: operation
	"Enumerate the fonts in a specified font family that are available on the receiver's device.
	The triadic valuable argument, operation, is passed the LOGFONT, TEXTMETRIC and font type as its
	three arguments, and should answer true to continue the enumeration, false to terminate it (it must
	not contain a ^-return).

		int CALLBACK EnumFontsProc(
			CONST LOGFONT *lplf,	// pointer to logical-font data 
			CONST TEXTMETRIC *lptm,	// pointer to physical-font data 
			DWORD dwType,	// font type 
			LPARAM lpData 	// pointer to application-defined data  
		);"

	| callback answer |
	callback := ExternalCallback 
		block: [ :lplf :lptm :dwType :lpData | operation value: lplf value: lptm value: dwType ]
		argumentTypes: 'LOGFONT* lpvoid dword dword'.
	answer := GDILibrary default
		enumFonts: self asParameter
		lpFaceName: aString
		lpFontFunc: callback asParameter
		lParam: 0.
	callback free.
	^answer

You can test out this callback by evaluating the expression:

	View desktop canvas fonts: nil do: [:lf :tm :type | Transcript print: lf faceName; cr. true]

This will print the names of all available screen fonts to the Transcript.

You can use ^-returns in callback blocks, but the normal block far return semantics will be obeyed, so that the value will not be returned to the external invoker of the callback, but to the block's home's sender (which is probably not what you want).

Callbacks can be either synchronous (the callback is used immediately the relevant external library call is made, and there are no more uses of it after that call has returned - e.g. EnumFonts()), or asynchronous (the callback is registered for use whenever it is required, until the registration is revoked). Synchronous callbacks are easier to deal with since the lifetime of the callback object is known. Synchronous callbacks can be explicitly freed. In the case of asynchronous callbacks you will need to maintain a reference (generally in an instance variable) to the ExternalCallback while it is still registered, in order to prevent it being garbage collected. Asynchronous callbacks are normally implicitly freed by finalization when no longer required.
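
The lifetime rule for asynchronous callbacks might look like this in practice. Almost everything specific in this sketch is hypothetical: the instance variable callback, the #onTick: method, the library class TimerLibrary and its #registerTick: and #unregisterTick: methods, and the argument types all stand in for whatever registration interface is actually being wrapped. Only ExternalCallback class>>block:argumentTypes: and #asParameter are taken from the Font example above.

startTicking
	"Register an asynchronous callback, holding it in the (hypothetical) instance
	variable callback so that it is not garbage collected while still registered."

	callback := ExternalCallback 
		block: [:id :time | self onTick: time. 0]
		argumentTypes: 'dword dword'.
	TimerLibrary default registerTick: callback asParameter

stopTicking
	"Revoke the registration first, then drop the reference so that the callback
	can be freed by finalization."

	TimerLibrary default unregisterTick: callback asParameter.
	callback := nil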

Callback blocks can be debugged in the normal way by inserting a #halt message send into the block, or, if the callback is synchronous, by stepping into the external function which registers the callback.

The conversion of arguments supplied by callbacks into the Smalltalk objects passed to the callback methods is performed by a primitive (BlockClosure>>valueWithArgumentsAt:descriptor:) for performance reasons, and for consistency with the rest of the external interfacing support in Dolphin (the conversions applied are the same as those used to create return values from external calls), but the VM routes callbacks into Smalltalk so that this mechanism can be modified if required.

The callback interface which you are using may provide a means of passing back a user supplied parameter (usually a 32-bit value) in order to provide "closure". However, as callbacks are implemented with blocks, you can capture whatever closure is required in the block, and can normally ignore the "user data" argument.

The method based callbacks used in beta-1 are still available in beta-2, but may be removed in a future release. We recommend the use of the simpler and more powerful block callbacks.

External Callback guides you through the steps needed to implement your own callbacks, and provides some useful tips too.

It is also possible to implement virtual callbacks in Dolphin, and this is how OLE COM is implemented.