System identifiers

There are two kinds of system identifier: formal system identifiers and simple system identifiers. A system identifier that does not start with < will always be interpreted as a simple system identifier. A simple system identifier will always be interpreted either as a filename or as a URL.

Formal system identifiers

Formal system identifiers are based on the System Identifier facility defined in ISO/IEC 10744 (HyTime) Technical Corrigendum 1, Annex D. A system identifier that is a formal system identifier consists of a sequence of one or more storage object specifications. The objects specified by the storage object specifications are concatenated to form the entity.

A storage object specification consists of an SGML start-tag in the reference concrete syntax followed by character data content. The generic identifier of the start-tag is the name of a storage manager. The content is a storage object identifier which identifies the storage object in a manner dependent on the storage manager. The start-tag can also specify attributes giving additional information about the storage object. Numeric character references are recognized in storage object identifiers and attribute value literals in the start-tag. Record ends are ignored in the storage object identifier as with SGML.

A system identifier will be interpreted as a formal system identifier if it starts with a < followed by a storage manager name, followed by either > or white-space; otherwise it will be interpreted as a simple system identifier. A storage object identifier extends until the end of the system identifier or until the first occurrence of < followed by a storage manager name, followed by either > or white-space.
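The recognition and splitting rules above can be sketched roughly as follows. This is an illustrative sketch only, not the parser's actual implementation: the function names are hypothetical, storage manager names are assumed to match case-insensitively, attributes are returned as raw text rather than parsed, and numeric character references and record ends are not processed.

```python
import re

# Storage manager names listed in this document.
STORAGE_MANAGERS = ("osfile", "osfd", "url", "neutral", "literal")

# "<" + storage manager name, followed by ">" or white-space (lookahead).
# Case-insensitive matching is an assumption of this sketch.
_START = re.compile(r"<(%s)(?=[>\s])" % "|".join(STORAGE_MANAGERS), re.IGNORECASE)


def is_formal(sysid):
    """True if sysid would be treated as a formal system identifier."""
    return _START.match(sysid) is not None


def split_specs(sysid):
    """Split a formal system identifier into storage object specifications.

    Returns a list of (manager, raw_attribute_text, storage_object_identifier).
    """
    specs = []
    pos = 0
    while pos < len(sysid):
        m = _START.match(sysid, pos)
        if m is None:
            raise ValueError("not a formal system identifier")
        # Find the ">" that closes the start-tag, skipping quoted literals
        # so that a "<" or ">" inside an attribute value is not misread.
        i = m.end()
        quote = None
        while i < len(sysid):
            c = sysid[i]
            if quote:
                if c == quote:
                    quote = None
            elif c in "\"'":
                quote = c
            elif c == ">":
                break
            i += 1
        if i >= len(sysid):
            raise ValueError("unterminated start-tag")
        attrs = sysid[m.end():i].strip()
        # The identifier runs to the next storage manager start-tag or the end.
        nxt = _START.search(sysid, i + 1)
        end = nxt.start() if nxt else len(sysid)
        specs.append((m.group(1).lower(), attrs, sysid[i + 1:end]))
        pos = end
    return specs
```

For instance, split_specs("<osfile>a.sgm<osfd>0") yields two specifications, one for osfile with identifier a.sgm and one for osfd with identifier 0, whose storage objects would then be concatenated to form the entity.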

The following storage managers are available:

osfile
The storage object identifier is a filename. If the filename is relative it is resolved using a base filename. Normally the base filename is the name of the file in which the storage object identifier was specified, but this can be changed using the base attribute. The filename will be searched for first in the directory of the base filename. If it is not found there, then it will be searched for in directories specified with the -D option in the order in which they were specified on the command line, and then in the list of directories specified by the environment variable SGML_SEARCH_PATH. The list is separated by colons under Unix and by semi-colons under MSDOS.
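The search order for a relative filename might be sketched as below. The function and parameter names are hypothetical; d_option_dirs stands for the directories given with -D, in command-line order, and the sketch assumes that existence on disk is what counts as "found".

```python
import os


def resolve_osfile(filename, base_filename, d_option_dirs=(), environ=None):
    """Return the first existing path for filename, or None if not found."""
    environ = os.environ if environ is None else environ
    if os.path.isabs(filename):
        return filename
    # 1. The directory of the base filename.
    candidates = [os.path.dirname(base_filename)]
    # 2. Directories given with -D, in command-line order.
    candidates.extend(d_option_dirs)
    # 3. Directories in SGML_SEARCH_PATH; os.pathsep is ":" on Unix
    #    and ";" on DOS-style systems, matching the separators above.
    search_path = environ.get("SGML_SEARCH_PATH", "")
    candidates.extend(d for d in search_path.split(os.pathsep) if d)
    for directory in candidates:
        path = os.path.join(directory, filename)
        if os.path.exists(path):
            return path
    return None
```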
osfd
The storage object identifier is an integer specifying a file descriptor. Thus a system identifier of <osfd>0 will refer to the standard input.
url
The storage object identifier is a URL. Only the http scheme is currently supported, and not on all systems.
neutral
The storage manager is the storage manager of the storage object in which the system identifier was specified (the underlying storage manager). However, if the underlying storage manager does not support named storage objects (i.e. it is osfd), then the storage manager will be osfile. The storage object identifier is treated as a relative, hierarchical name separated by slashes (/) and will be transformed as appropriate for the underlying storage manager.
literal
The bit combinations of the storage object identifier are the contents of the storage object.

The following attributes are supported:

records
This describes how records are delimited in the storage object:
cr
Records are terminated by a carriage return.
lf
Records are terminated by a line feed.
crlf
Records are terminated by a carriage return followed by a line feed.
find
Records are terminated by whichever of cr, lf or crlf is first encountered in the storage object.
asis
No recognition of records is performed.

The default is find except for NDATA entities for which the default is asis. This attribute is not applicable to the literal storage manager.

When records are recognized in a storage object, a record start is inserted at the beginning of each record, and a record end at the end of each record. If there is a partial record (a record that doesn't end with the record terminator) at the end of the entity, then a record start will be inserted before it but no record end will be inserted after it.

The attribute name and = can be omitted for this attribute.
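As a rough illustration of the record handling described above, the following sketch works on a decoded string. The function names are hypothetical, and the record start and record end characters are assumed to be those of the reference concrete syntax (character 10 and character 13 respectively); a real document may declare others.

```python
RS, RE = "\n", "\r"  # record start / record end in the reference concrete syntax
TERMINATORS = {"cr": "\r", "lf": "\n", "crlf": "\r\n"}


def detect_terminator(data):
    """For records=find: whichever of cr, lf or crlf occurs first."""
    for i, c in enumerate(data):
        if c == "\n":
            return "\n"
        if c == "\r":
            return "\r\n" if data[i + 1:i + 2] == "\n" else "\r"
    return None


def apply_records(data, records="find"):
    """Wrap each record in RS ... RE; a trailing partial record gets RS only."""
    if records == "asis":
        return data  # no recognition of records
    term = detect_terminator(data) if records == "find" else TERMINATORS[records]
    if term is None:
        # No terminator anywhere: the whole content is one partial record.
        return RS + data if data else data
    parts = data.split(term)
    last = parts.pop()  # text after the final terminator (may be empty)
    out = "".join(RS + part + RE for part in parts)
    if last:
        out += RS + last  # partial record: record start but no record end
    return out
```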

zapeof
This specifies whether a Control-Z character that occurs as the final byte in the storage object should be stripped. The following values are allowed:
zapeof
A final Control-Z should be stripped.
nozapeof
A final Control-Z should not be stripped.

The default is zapeof except for NDATA entities, entities declared in storage objects with zapeof=nozapeof, and storage objects with records=asis, for which the default is nozapeof. This attribute is not applicable to the literal storage manager.

The attribute name and = can be omitted for this attribute.
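The effect of zapeof on the raw bytes of a storage object can be pictured with the short sketch below. The function name is hypothetical; Control-Z is taken to be byte 0x1A, and only a Control-Z in the final position is affected.

```python
CTRL_Z = b"\x1a"


def strip_final_ctrl_z(data, zapeof=True):
    """Drop a single trailing Control-Z byte when zapeof is in effect."""
    if zapeof and data.endswith(CTRL_Z):
        return data[:-1]
    return data
```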

encoding
The encoding attribute specifies the encoding of the storage object. This attribute is used when the encoding is independent of the document character set. The value must be the name of an encoding. This attribute is not applicable to the literal storage manager.
bctf
The bctf attribute specifies the encoding of the storage object. This attribute is used when the encoding depends on the document character set. The value must be the name of a BCTF. This attribute is not applicable to the literal storage manager.
tracking
This specifies whether line boundaries should be tracked for this object: a value of track specifies that they should; a value of notrack specifies that they should not. The default value is track. Keeping track of where line boundaries occur in a storage object requires approximately one byte of storage per line and it may be desirable to disable this for very large storage objects.

The attribute name and = can be omitted for this attribute.

base
When the storage object identifier specified in the content of the storage object specification is relative, this specifies the base storage object identifier relative to which that storage object identifier should be resolved. When not specified a storage object identifier is interpreted relative to the storage object in which it is specified, provided that this has the same storage manager. This applies both to system identifiers specified in SGML documents and to system identifiers specified in the catalog entry files.
smcrd
The value is a single character that will be recognized in storage object identifiers (both in the content of storage object specifications and in the value of base attributes) as a storage manager character reference delimiter when followed by a digit. A storage manager character reference is like an SGML numeric character reference except that the number is interpreted as a character number in the inherent character set of the storage manager rather than the document character set. The default is for no character to be recognized as a storage manager character reference delimiter. Numeric character references cannot be used to prevent recognition of storage manager character reference delimiters.
fold
This applies only to the neutral storage manager. It specifies whether the storage object identifier should be folded to the customary case of the underlying storage manager if storage object identifiers for the underlying storage manager are case sensitive. The following values are allowed:
fold
The storage object identifier will be folded.
nofold
The storage object identifier will not be folded.

The default value is fold. The attribute name and = can be omitted for this attribute.

For example, on Unix, filenames are case-sensitive and the customary case is lower-case. So if the underlying storage manager were osfile and the system were a Unix system, then <neutral>FOO.SGM would be equivalent to <osfile>foo.sgm.
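The Unix case of this example can be sketched as follows. The function name is hypothetical and the sketch covers only an underlying osfile storage manager on Unix, where slash is already the path separator and folding means conversion to lower-case.

```python
def neutral_to_unix_osfile(identifier, fold=True):
    """Map a neutral storage object identifier to a Unix osfile identifier."""
    # Unix filenames are case-sensitive and customarily lower-case,
    # so folding lowercases the name; slashes are left unchanged.
    return identifier.lower() if fold else identifier
```

With the default fold setting, neutral_to_unix_osfile("FOO.SGM") gives foo.sgm; with fold=False the name is left as written.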

Simple system identifiers

A simple system identifier is interpreted as a storage object identifier with a storage manager that depends on where the system identifier was specified: if it was specified in a storage object whose storage manager was url or if the system identifier looks like an absolute URL in a supported scheme, the storage manager will be url; otherwise the storage manager will be osfile. The storage manager attributes are defaulted as for a formal system identifier. Numeric character references are not recognized in simple system identifiers.
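The choice of storage manager for a simple system identifier could be sketched like this. The function name and the containing_manager parameter are hypothetical, and "looks like an absolute URL in a supported scheme" is approximated as a case-insensitive http:// prefix, since http is the only scheme this document lists.

```python
import re

# Approximation: an absolute URL in the only supported scheme (http).
_ABSOLUTE_HTTP = re.compile(r"http://", re.IGNORECASE)


def simple_sysid_manager(sysid, containing_manager="osfile"):
    """Pick the storage manager for a simple system identifier.

    containing_manager is the storage manager of the storage object
    in which the system identifier was specified.
    """
    if containing_manager == "url" or _ABSOLUTE_HTTP.match(sysid):
        return "url"
    return "osfile"
```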

James Clark
jjc@jclark.com