TGN is a thesaurus, compliant with ISO and NISO standards for thesaurus construction; it contains hierarchical, equivalence, and associative relationships. Note that TGN is not a GIS (Geographic Information System). While many records in TGN include coordinates, these coordinates are approximate and are intended for reference only.
The focus of each TGN record is a place. There are around 912,000 places in the TGN. In the database, each place record (also called a subject) is identified by a unique numeric ID. Linked to the record for the place are names, the place's parent or position in the hierarchy, other relationships, geographic coordinates, notes, sources for the data, and place types, which are terms describing the role of the place (e.g., inhabited place and state capital). The temporal coverage of the TGN ranges from prehistory to the present and the scope is global.
The TGN is a hierarchical database; its trees branch from a root called Top of the TGN hierarchies (Subject_ID: 1000000). Currently most of the TGN data is located under the facet World. Under World, the places are generally arranged in hierarchies representing the current political and physical world, although some historical nations and empires are also included. A place may have multiple broader contexts (parents), which makes the TGN polyhierarchical.
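A minimal sketch of what polyhierarchy means in practice: a place may list more than one parent, so collecting its broader contexts is a graph walk rather than a single chain. Only the root ID 1000000 comes from the text above; the other IDs and the `parents` mapping are hypothetical and do not reflect the actual TGN data model.

```python
from typing import Dict, List, Set

ROOT_ID = 1000000  # "Top of the TGN hierarchies", as stated in the text

def ancestors(subject_id: int, parents: Dict[int, List[int]]) -> Set[int]:
    """Return every broader context reachable from subject_id."""
    seen: Set[int] = set()
    stack = list(parents.get(subject_id, []))
    while stack:
        parent_id = stack.pop()
        if parent_id in seen:
            continue  # already reached through another parent
        seen.add(parent_id)
        stack.extend(parents.get(parent_id, []))
    return seen

# Hypothetical IDs: place 5 has two parents (3 and 4), both under facet 2.
example_parents = {2: [ROOT_ID], 3: [2], 4: [2], 5: [3, 4]}
broader = ancestors(5, example_parents)
```

Tracking the set of visited IDs keeps a shared ancestor from being walked twice when two parents meet higher in the tree.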
Coordinates: Geographic coordinates indicating the position of the place, expressed in degrees/minutes and decimal fractions of degrees. Latitude (Lat.) is the angular distance north or south of the equator, measured along a meridian. Longitude (Long.) is the angular distance east or west of the Prime Meridian at Greenwich, England. Bounding coordinates and elevation may also be included (as in the example for Great Lakes Region below).
Geographic coordinates in TGN typically represent a single point, corresponding to a point in or near the center of the inhabited place, political entity, or physical feature. For linear features such as rivers, the point represents the source of the feature.
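The relationship between the degrees/minutes form and the decimal-degree form can be sketched as follows. The function name and the hemisphere-letter convention (N, S, E, W) are assumptions made for this illustration, not part of the TGN itself.

```python
def dm_to_decimal(degrees: int, minutes: float, hemisphere: str) -> float:
    """Convert degrees and minutes plus a hemisphere letter to signed decimal degrees."""
    if hemisphere not in ("N", "S", "E", "W"):
        raise ValueError("hemisphere must be one of N, S, E, W")
    if not 0 <= minutes < 60:
        raise ValueError("minutes must be in the range [0, 60)")
    value = abs(degrees) + minutes / 60.0
    # South latitudes and west longitudes are negative by convention.
    return -value if hemisphere in ("S", "W") else value

lat = dm_to_decimal(45, 30, "N")   # 45 degrees 30 minutes north
lon = dm_to_decimal(84, 15, "W")   # 84 degrees 15 minutes west
```

Because TGN coordinates are approximate reference points, a conversion like this is suitable for display or rough lookup and not for precise spatial analysis.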
Names: Names and appellations referring to the place, including a preferred name and variant names. All names in a record (i.e., all names linked by a single Subject ID) are considered equivalents (i.e., synonyms). A TGN record may contain the vernacular and English names of the place, variant names in other languages, and historical names. One name is flagged as the preferred name, which is the indexing form of the name most often found in scholarly or authoritative publications.
While any number of fields may be included in datasets submitted for entry into the BGSD, only a subset is uploaded. Extraneous fields not necessary for the purposes of the BGSD are deleted and, in most cases, additional fields are added during processing of the dataset. There are three main fields in the BGSD upload file: "name", which holds the feature name; "term", which holds the feature type; and "geometry", which holds the spatial information. In addition, "time_period_id" indicates whether the feature name is current, former, proposed, or historical, and "related_name" holds a concatenated list of alternative names for the feature. Two fields record the date of processing for a record set ("entryDate") and the date selected records were reprocessed ("modificationDate"). One further field relates each record to the dataset it came from ("g_coll_name"), and another holds the metadata about that dataset ("g_coll_note").
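As a rough illustration of the field handling described above, the sketch below keeps only the named BGSD fields from an incoming row and concatenates alternative names into "related_name". The field names come from the text. The helper name, the "; " separator, the sample row, and the use of plain dictionaries are assumptions, since the actual upload format and concatenation rule are not specified here.

```python
from typing import Dict, Iterable, Optional

# Field names as listed in the text.
BGSD_FIELDS = (
    "name", "term", "geometry", "time_period_id", "related_name",
    "entryDate", "modificationDate", "g_coll_name", "g_coll_note",
)

def to_upload_row(source_row: Dict[str, object],
                  alt_names: Iterable[str] = (),
                  separator: str = "; ") -> Dict[str, Optional[object]]:
    """Drop extraneous fields and add any BGSD fields the source lacks."""
    row = {field: source_row.get(field) for field in BGSD_FIELDS}
    names = list(alt_names)
    if names:
        # The separator is an assumption; the text only says "concatenated".
        row["related_name"] = separator.join(names)
    return row

# Hypothetical input row with one extraneous field ("county_fips").
raw = {"name": "Mono Lake", "term": "lakes",
       "geometry": "POINT (-119.0 38.0)", "county_fips": "051"}
upload_row = to_upload_row(raw, alt_names=["Lake Mono", "Mono"])
```

Fields missing from the source are set to None so that later processing steps can fill them in.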
Importing data into the Biogeomancer Spatial Database involves a number of checks, modifications, and transformations. If metadata is not provided, it must be created de novo using whatever information is available. When the input file is a shapefile, it must first be checked for projection. The BGSD does not use projected data; instead, data are stored as latitude and longitude coordinates, using the WGS84 (World Geodetic System 1984) spheroid for the horizontal datum. This provides accurate and uniform storage of feature coordinates for the whole world and is sometimes known as a Geographic Projection. If the input shapefile is in any other projection (e.g., UTM, Lambert Conformal Conic), it must be reprojected to geographic coordinates on WGS84. If the feature is a polygon or a line, rather than a point, its geometry is then checked and, if necessary, repaired. The problems that can arise in the geometry of a feature include short segments, null geometries, incorrect ring orderings, incorrect segment orientations, self-intersections, unclosed rings, and empty parts. These problems are repaired with a script in ArcGIS software ver. 9.1 (ESRI, Redlands, CA, USA). If the input file is text, it must first be converted to a GIS layer using the spatial information contained in the file (X, Y) and its associated metadata.
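To make a few of these geometry problems concrete, here is a small standalone sketch that detects null geometries, unclosed rings, zero-length segments, and out-of-range coordinates, and that reports ring orientation using the shoelace formula. It is not the ArcGIS repair script mentioned above. All function names are hypothetical, and rings are assumed to be lists of (longitude, latitude) pairs.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees

def close_ring(ring: List[Point]) -> List[Point]:
    """Return the ring with its first point repeated at the end if needed."""
    if ring and ring[0] != ring[-1]:
        return ring + [ring[0]]
    return ring

def signed_area(ring: List[Point]) -> float:
    """Shoelace formula: positive for counter-clockwise rings, negative for clockwise."""
    closed = close_ring(ring)
    total = 0.0
    for (x1, y1), (x2, y2) in zip(closed, closed[1:]):
        total += x1 * y2 - x2 * y1
    return total / 2.0

def find_problems(ring: List[Point]) -> List[str]:
    """List the basic problems found in a single polygon ring."""
    if not ring:
        return ["null geometry"]
    problems = []
    if ring[0] != ring[-1]:
        problems.append("unclosed ring")
    if any(a == b for a, b in zip(ring, ring[1:])):
        problems.append("zero-length segment")
    if any(not (-180 <= x <= 180 and -90 <= y <= 90) for x, y in ring):
        problems.append("coordinates outside geographic range")
    return problems

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]  # open, counter-clockwise
issues = find_problems(square)
area = signed_area(square)
```

The range check is a cheap way to catch a file that is still in projected units such as UTM metres, although it cannot confirm that the datum is actually WGS84.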
The input records are assigned feature types and all data are imported into an ArcGIS Personal Geodatabase. Incoming datasets may include their own typing system, but in almost all cases it will differ from the one used by the BGSD, which is itself based on that of the Alexandria Digital Library. During processing, a feature type field, "term", is added to the dataset. If an incoming dataset is heterogeneous and has a field for feature type, a cross-reference table is created to convert from the dataset's feature type lexicon to that of the BGSD. The use of feature types is integral to the proper functioning of the Biogeomancer Spatial Lookup Module, because it allows a default extent for the feature type to be used in cases where an extent for the specific feature is not available. The feature types are also useful for indexing and for assigning relative uncertainty measures where geographic feature extents are unknown.
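The cross-reference table and the default-extent fallback might look something like the sketch below. The source codes, the BGSD terms, and the extent values are all hypothetical placeholders chosen for illustration. The real lexicon and defaults are not given in this text.

```python
from typing import Dict, Optional

# Hypothetical cross-reference from a source dataset's type codes to BGSD terms.
CROSSWALK: Dict[str, str] = {
    "PPL": "populated places",
    "LK": "lakes",
    "STM": "streams",
}

# Hypothetical default extents per feature type, in metres.
DEFAULT_EXTENT_M: Dict[str, float] = {
    "populated places": 5000.0,
    "lakes": 2000.0,
}

def assign_term(source_type: str, crosswalk: Dict[str, str]) -> Optional[str]:
    """Translate a source feature type to a BGSD term, or None if unmapped."""
    return crosswalk.get(source_type)

def extent_for(term: Optional[str], known_extent_m: Optional[float],
               defaults: Dict[str, float]) -> Optional[float]:
    """Prefer the feature's own extent; otherwise fall back to the type default."""
    if known_extent_m is not None:
        return known_extent_m
    if term is None:
        return None
    return defaults.get(term)

term = assign_term("PPL", CROSSWALK)
fallback = extent_for(term, None, DEFAULT_EXTENT_M)   # no specific extent known
specific = extent_for(term, 750.0, DEFAULT_EXTENT_M)  # specific extent wins
```

Returning None for unmapped types or missing defaults makes gaps in the cross-reference table visible instead of silently assigning an arbitrary extent.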
Where possible, we include the original metadata information for the dataset.