CSAM Explained

Engine

"Engine" or "database engine" is used herein to describe the programming code that reads or writes records in the CSAM database.

Name

CSAM ("SEE Sam") is an acronym for "C programming language Sequential Access Method".  It is fortunate that "C" also is the first character in CharterHouse, because CSAM is a proprietary database.

Database

Basic Structure

ISAM ("EYE Sam") is an acronym for "Indexed Sequential Access Method".  ISAM files, of which CSAM is one, contain both data and index records in the same physical file.  In a CSAM database file, there is a one or two sector record at the very beginning of the file that tells the program, among other things, the size of each data record, the number of indexes applicable to each record, where the first index record is and where the first data record is.

The size of the header record was set to occupy one or two physical sectors on a hard drive.  A hard drive sector is 512 bytes.  Conveniently, CharterHouse is able to use other media (such as floppy disks), but for such media, the header may occupy multiple sectors.

In a newly created or freshly reindexed CSAM file, data records immediately follow the header record and the index records follow that.  As records are added, data and index records become interspersed; the data first, followed by the indexes.  There may be empty space in between the data record(s) and the indexes, especially where record lengths are short.

Indexes

In order to be able to find information in a file quickly and efficiently, a CSAM file contains index records.  These point directly to where in the file the desired data exists.

An example may clarify.  Let's say you want a listing of all sales invoices for one specific customer for a calendar year.

In the absence of indexes, the entire invoice file has to be read and the invoices for the specified customer found, then the invoice date must be examined to make sure the invoice is dated within the calendar year you specified.  By adding an index that consists of the customer ID plus the invoice date, the database engine can skip records for other customers and also skip those invoice records for the specified customer where the date falls outside the specified range.  Skipping all that unnecessary file access and going straight to the desired information hugely speeds up the process of obtaining the information you wanted.

A pointer is a portion of an index record that specifies an offset into the data file.  This offset may be the location of a data record or the location of an index record.  Offset means "go this far into the file and start reading there".  Every index record contains keys wherein pointers are set.  A key is the portion of an index record that specifies what data is being refered to, and the pointer value containing the offset to where that data is located.

When a disk drive is read (or written, but for brevity "read" will be used where practical), the minimum amount of data that can be accessed is one sector.  Most operating systems access several sectors at once so that the disk drive's head (the movable part that reads data from the media) does not have to move or "seek" too frequently.  As much information is transferred as possible while the disk spins under the head.  Therefore, "an index record" actually contains a lot of discreete data, which are individual indexes.

The information in an index record includes where within the file the appropriate record is located.  It also points to the offset of the previous index record and to the offset of the subsequent index record so that the indexes can be scanned rapidly.  It is even possible for the engine to "guess" which index record might contain the desired information so that the entire index structure does not have to be read in order to find the correct index record.  By skipping inappropriate indexes, the time spent finding the necessary one is reduced.  And, if the guess misses, the backward and forward pointers in the record obtained can be used to improve the accuracy of the next guess.

Because each database file is different, it is not practical to describe the exact structure of an individual index record.  In general, an index record contains portions of the data and pointers to the areas within a data record that the program needs in order to facilitate a search.  A simple example is the REVENUE.AR file.  This file begins with an 8 byte account number, followed by a 20 byte description and then daily and month-to-date amounts.  There are two indexes for this file, the first of which is the account number and the second of which is a combination of the description plus the account number.  By creating an index on the description, the REVENUE file may be read sequentially (aphabetically) by description using the second key, or it can be read sequentially (numerically) by account number using the first key.  In addition, any particular record in the file can be quickly located by finding the correct account number or the correct description.  "Quickly located" means that the record is accessed by reading the file at the correct offset rather than sequentially.  The engine finds the correct index and then uses the pointer it contains to go straight to the offset in the file where that record is.

Primary key

A primary key is used to prevent duplication.  For example, there can be only one account number for each REVENUE record, and only one customer ID for each CUSTOMER record.  Were it not for this, it would be possible to have two different customers whose names (and all other information) differ, but that have the same ID.  That's not good, so such duplication is prevented.Since this is not documented, we say here that the AR invoice number is not a primary key.  CharterHouse allows you to use the same invoice number any number of times.

Data

Each data record begins with a unique character whose decimal value is 170.  The length of a data record is specified, but the unique character assists in recovery if problems occur in the database.  Its offset is also one of the pointers contained in an index record.  The engine can tell that corruption has occurred if the byte pointed to contains the wrong value.  The special byte is not included in the specified data record length; it is transparent except to the engine.

Problems

This portion (Problems) is under construction, last modified 19 July 2011.

Note that CharterHouse "crashes" when the engine finds a problem.  This is done deliberately to prevent you from further corrupting the database.  It is unwise to try again without first fixing the problem that caused the program to abort.

Just as your lawnmower may occasionally missfire, so can your computer occasionally cause a problem within a data file.

Reindex

Reindexing is done in two passes.  During "pass one", the data records are found by looking for the unique value (170) that specifies the beginning of each record, and this data is moved to the beginning of the file, just past the header record.  Subsequent data is appended after the previous data record.  As data records are moved, the areas of the file that previously contained index records are overwritten.  Thus, pass one restores the structure of the file so the data records are first and contiguous in the database.  When pass one completes, "pass two" appends the index records.  Note that between pass one and pass two, there is a noticeable delay, which is the result of flushing the buffers to disk so that all data has been written before any index records are added.

Because index records must be added in order for the database to work, you are cautioned not to interfere with the reindex process.  Aborting a reindex could leave data records unmoved, or prevent the addition of the index records, resulting in a trashed database file.

A sparse file is a type of computer file that attempts to use file system space more efficiently when blocks allocated to the file are mostly empty.

Zwi is not any known language for "key", but that is what it means in error messages.

Extraction of data

Because CSAM is proprietary, these are the only alternatives for extracting data from Charter.

CharterHouse was sold to Tamlin Software Developers; now the programs mentioned below are no longer available for sale.  At this time Charter is still being supported, so you may want to discuss your needs with a support technician.

File listings

This also applies to reports generated by Charter, but read the remainder of this section in order to understand the difficulties in dealing with reports where the desired information is presented on more than one line.

Many data files can be listed, where the output can be to one of Printer, Display or File.  If you specify File as the output, the program expects a name of up to 8 characters.  The resulting file will have ".RPT" added as an extension and FILENAME.RPT will be created in the same directory as the Charter programs.  These files will contain "headers", by which we mean the listing name, page number and so forth.  If you edit FILENAME.RPT with a text editor and strip out the Page 1 header, you can import the remainder of the file using Microsoft Excel by selecting DATA and then GET EXTERNAL DATA.  The text you are importing is "fixed width", and Excel allows you to define the width of each field.  You can also eliminate columns if you wish.

File listings are difficult to import into Excel when the necessary information spans more than one line because each line becomes a line in Excel.  In addition, the field widths of any second or subsequent lines normally differ from the widths you present as the first line to be imported into Excel.  We are not aware of any method for correctly importing reports where the listing spans more than one line per record.  If you can find a text editor that is capable of it, you might be able to concatenate or join two or more lines into a single line.  We are not aware of an editor with such capability.

Excel has a limit on the number of lines, so large files must be split into parts where the number of records exceeds Excel's capacity.  Most of the time you will recognize that the number of lines is excessive if you edit using Notepad but Notepad tells you the file is too large and suggests that you use Wordpad instead.Data Link CharterHouse sold a program named Data Link that extracts most of the database into one of several different formats.  The supply of Data Link manuals has been exhausted and Data Link is no longer a supported product, but you might be able to obtain help with it on an hourly fee basis.  If you own Data Link, you should learn how to run it and export your data into a DIF format file.  "DIF" is an acronym for Data Interchange Format and DIF format files can be opened by Excel.  See the Wikipedia entry for more information about DIF format files.  Note that the spreadsheets that can open them have limits on the number of lines of data, so you may need to adjust your Data Link setup to extract portions of a large file into two or more output files.

One excellent hint for Data Link is to tell it to output to the display.  When the displayed information is what you need, you can then edit the setup and change the output from display to DIF format.

Data Link is able to extract information from more than one database file at a time, but the files that it accesses when you select your source cannot be altered by us mere humans, so there are limitations you may not be able to overcome.  For example, it was desired to extract the invoice number and the corresponding line items from the invoice history file, but Data Link cannot "see" the invoice number because the invoice number is in the invoice header history rather than the line item history file, and only one or the other of header or line item can be used at a time.

CharterHouse Data Export

CharterHouse sold a program named CharterHouse Data Export that extracts all modules except Payroll into a Microsoft Access database (.MDB file).  If you can find a source for it, the program is expensive and there never was a manual for it.  Using the MDB file requires the ability to work with SQL.  However, in this author's opinion, it is the most useful of the extraction methods available, though rarely worth the price.

Aside: Most people say "sequel", but the proper pronunciation is S Q L.  SQL is an acronym for Structured Query Language.


Return to Help
Copyright © JK