The size of the header record was set to occupy one or two physical sectors on a hard drive. A hard drive sector is 512 bytes. Conveniently, CharterHouse is able to use other media (such as floppy disks), but for such media, the header may occupy multiple sectors.
In a newly created or freshly reindexed CSAM file, data records immediately follow the header record and the index records follow that. As records are added, data and index records become interspersed; the data first, followed by the indexes. There may be empty space in between the data record(s) and the indexes, especially where record lengths are short.
An example may clarify. Let's say you want a listing of all sales invoices for one specific customer for a calendar year.
In the absence of indexes, the entire invoice file has to be read and the invoices for the specified customer found, then the invoice date must be examined to make sure the invoice is dated within the calendar year you specified. By adding an index that consists of the customer ID plus the invoice date, the database engine can skip records for other customers and also skip those invoice records for the specified customer where the date falls outside the specified range. Skipping all that unnecessary file access and going straight to the desired information hugely speeds up the process of obtaining the information you wanted.
A pointer is a portion of an index record that specifies an offset into the data file. This offset may be the location of a data record or the location of an index record. Offset means "go this far into the file and start reading there". Every index record contains keys wherein pointers are set. A key is the portion of an index record that specifies what data is being refered to, and the pointer value containing the offset to where that data is located.
When a disk drive is read (or written, but for brevity "read" will be used where practical), the minimum amount of data that can be accessed is one sector. Most operating systems access several sectors at once so that the disk drive's head (the movable part that reads data from the media) does not have to move or "seek" too frequently. As much information is transferred as possible while the disk spins under the head. Therefore, "an index record" actually contains a lot of discreete data, which are individual indexes.
The information in an index record includes where within the file the appropriate record is located. It also points to the offset of the previous index record and to the offset of the subsequent index record so that the indexes can be scanned rapidly. It is even possible for the engine to "guess" which index record might contain the desired information so that the entire index structure does not have to be read in order to find the correct index record. By skipping inappropriate indexes, the time spent finding the necessary one is reduced. And, if the guess misses, the backward and forward pointers in the record obtained can be used to improve the accuracy of the next guess.
Because each database file is different, it is not practical to describe the exact structure of an individual index record. In general, an index record contains portions of the data and pointers to the areas within a data record that the program needs in order to facilitate a search. A simple example is the REVENUE.AR file. This file begins with an 8 byte account number, followed by a 20 byte description and then daily and month-to-date amounts. There are two indexes for this file, the first of which is the account number and the second of which is a combination of the description plus the account number. By creating an index on the description, the REVENUE file may be read sequentially (aphabetically) by description using the second key, or it can be read sequentially (numerically) by account number using the first key. In addition, any particular record in the file can be quickly located by finding the correct account number or the correct description. "Quickly located" means that the record is accessed by reading the file at the correct offset rather than sequentially. The engine finds the correct index and then uses the pointer it contains to go straight to the offset in the file where that record is.
Each data record begins with a unique character whose decimal value is 170. The length of a data record is specified, but the unique character assists in recovery if problems occur in the database. Its offset is also one of the pointers contained in an index record. The engine can tell that corruption has occurred if the byte pointed to contains the wrong value. The special byte is not included in the specified data record length; it is transparent except to the engine.
This portion (Problems) is under construction, last
modified 19 July 2011.
Note that CharterHouse "crashes" when the engine finds a problem. This is done deliberately to prevent you from further corrupting the database. It is unwise to try again without first fixing the problem that caused the program to abort.
Just as your lawnmower may occasionally missfire, so can your computer occasionally cause a problem within a data file.
Because index records must be added in order for the database to work, you are cautioned not to interfere with the reindex process. Aborting a reindex could leave data records unmoved, or prevent the addition of the index records, resulting in a trashed database file.
A sparse file is a type of computer file that attempts to use file system space more efficiently when blocks allocated to the file are mostly empty.
Zwi is not any known language for "key", but that is what it means in error messages.
CharterHouse was sold to Tamlin Software Developers; now the programs mentioned below are no longer available for sale. At this time Charter is still being supported, so you may want to discuss your needs with a support technician.
Many data files can be listed, where the output can be to one of Printer, Display or File. If you specify File as the output, the program expects a name of up to 8 characters. The resulting file will have ".RPT" added as an extension and FILENAME.RPT will be created in the same directory as the Charter programs. These files will contain "headers", by which we mean the listing name, page number and so forth. If you edit FILENAME.RPT with a text editor and strip out the Page 1 header, you can import the remainder of the file using Microsoft Excel by selecting DATA and then GET EXTERNAL DATA. The text you are importing is "fixed width", and Excel allows you to define the width of each field. You can also eliminate columns if you wish.
File listings are difficult to import into Excel when the necessary information spans more than one line because each line becomes a line in Excel. In addition, the field widths of any second or subsequent lines normally differ from the widths you present as the first line to be imported into Excel. We are not aware of any method for correctly importing reports where the listing spans more than one line per record. If you can find a text editor that is capable of it, you might be able to concatenate or join two or more lines into a single line. We are not aware of an editor with such capability.
Excel has a limit on the number of lines, so large files must be split into parts where the number of records exceeds Excel's capacity. Most of the time you will recognize that the number of lines is excessive if you edit using Notepad but Notepad tells you the file is too large and suggests that you use Wordpad instead.Data Link CharterHouse sold a program named Data Link that extracts most of the database into one of several different formats. The supply of Data Link manuals has been exhausted and Data Link is no longer a supported product, but you might be able to obtain help with it on an hourly fee basis. If you own Data Link, you should learn how to run it and export your data into a DIF format file. "DIF" is an acronym for Data Interchange Format and DIF format files can be opened by Excel. See the Wikipedia entry for more information about DIF format files. Note that the spreadsheets that can open them have limits on the number of lines of data, so you may need to adjust your Data Link setup to extract portions of a large file into two or more output files.
One excellent hint for Data Link is to tell it to output to the display. When the displayed information is what you need, you can then edit the setup and change the output from display to DIF format.
Data Link is able to extract information from more than one database file at a time, but the files that it accesses when you select your source cannot be altered by us mere humans, so there are limitations you may not be able to overcome. For example, it was desired to extract the invoice number and the corresponding line items from the invoice history file, but Data Link cannot "see" the invoice number because the invoice number is in the invoice header history rather than the line item history file, and only one or the other of header or line item can be used at a time.
Aside: Most people say "sequel", but the proper pronunciation is S Q L. SQL is an acronym for Structured Query Language.