The development of models centers around three main tasks: 1) classification of independent variables; 2) classification of dependent variables; and 3) expression of the relationship between them. For example, the dependent variable could be site presence/absence and the independent variables the environmental variables. In an archaeologically unknown region, selection of variables for a model predicting settlement locations is based on a consideration of which characteristics of the environment might have been important to the people using that environment. Such evaluations may be based on known relationships between site locations and environmental variables in ethnographic or archaeological contexts thought to be similar to the area of study (Parker 1985:184).
Since different cultural groups interact with each other and their environment in different ways, the critical independent and dependent variables and their relationship can vary widely from cultural system to cultural system. The goal of predictive modelling is to produce reasonably accurate representations of selected interrelationships for particular cultural systems. A successful model, or series of models, organizes information about archaeological sites - their function, location, and cultural affiliation - into a series of statements about human behaviour. Under controlled conditions, these statements can be applied to unknown areas to provide predictions concerning archaeological resources located in these areas.
The goal is to correctly identify important aspects of the natural or social environment that influenced the location of human activities, and to interpret the archaeological record as the result of a set of functional, temporal, spatial, and behavioral responses to a varied environment. "Although governed to some extent by the demonstrably regular and consistent rules that apply to all living systems, human behaviour is organized into cultural systems, which exert additional influences on that behaviour beyond those of natural forces. There is good reason to believe that site locations cannot, in general, be fully predicted from environmental variables alone" (Kincaid 1988:551).
The first step in developing a predictive model of archaeological site location for a specific region is to amass the available data. Four basic sources of data are commonly used: historical documents, ethnographic documents/research, archaeological data, and environmental data (Altschul 1988:78). The quality of data on previously recorded archaeological sites must be carefully reviewed for location accuracy and completeness (Kincaid 1988:556).
To formulate a good initial model, the ethnohistory of the study area needs to be summarized. This will provide information on important site settlement modelling criteria, such as average settlement size and population by season, average length of settlement stay by season and site settlement selection by season.
Once the available data have been gathered, they must be evaluated in terms of their applicability for predicting site location. One of the first tasks is to identify general trends of cultural change and stability and trends in the distribution of known sites. One result of this type of background research must be the identification of known sites or at least of the types of sites crucial to understanding regional settlement systems (Altschul 1988:80).
It is important at the data-evaluation stage to determine (or hypothesize) the types of sites expected to be found for each culture period and their probable locations. The definition of site types reflecting temporal, functional, and cultural differences is perhaps one of the most useful tasks that can be performed to prepare for model building. Environmental data are also needed for model building. To be useful, however, environmental data should be of consistent quality and scale throughout the study area (Altschul 1988:81).
All previously archaeologically surveyed areas should be mapped on base maps. The type and completeness of survey coverage must be carefully examined. Using information provided in project reports, the archaeologist should separate projects in which coverage appears to have been biased, incomplete, or otherwise suspect from those in which survey and recording practices conform to acceptable standards.
The plan should next address the second step of the project - the fieldwork phase. The formulation of a trial model for the study area must include detailed information about proposed field methods, rates of inventory, recording standards, and collection strategies as well as detail concerning the rationale for selection of sample inventory units. All this information must be collected and evaluated prior to initiation of fieldwork (Kincaid 1988:559).
Statistically representative data are not necessary to develop a model; if new data collection is planned for purposes of model development, however, these are certainly the most effective data to collect. Model testing, on the other hand, does depend on the availability of unbiased data that are representative of the study area, most often data that were collected using some form of random sampling. Until a representative sample of data is obtained through a carefully designed inventory project, any model developed for the area must remain essentially untested and should be used accordingly (Kincaid 1988:558).
In many cases, the value of the results depends on the detail of the environmental data recorded for each sample unit and the levels and types of measurement used in recording the data. The results of a model are only as good as the data on which the model is based (Kincaid 1988:561).
Previous archaeological analyses of the distribution of sites have often focused on the distinction between a settlement pattern and a settlement system; "the pattern being the empirical evidence of site distribution while the system is the behavioral abstraction of regularities in the processes which generated the pattern" (Parker 1985:174). This analysis proposes that the generating process is one of location assessment and choice, which in itself is sufficient to characterize the settlement system. The regularities defining a settlement pattern are seen as a reflection of underlying regularities in the settlement system.
A good location model does not focus on the archaeological site as the unit of analysis. It recognizes all possible site location choices: those without sites as well as those with. It predicts where sites are unlikely to occur as well as where they are likely to occur. If environmental information is known only for locations where sites are found, then it is impossible to use the proposed methodology, because locations where sites are not expected to be found cannot be predicted. An effective predictive model must focus on any potential site location, not simply the locations where sites are known to exist.
If site locations only are examined for some variable, such as distance to nearest water, interpretations could be in error if comparisons are made to the distribution of this variable for the total area with no consideration of the non-site locations in the sample (Parker 1985:175).
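The comparison described above can be illustrated with a minimal sketch. The distances are invented for illustration and the function name `summarize_by_class` is hypothetical; the sketch only shows the idea of comparing surveyed units with sites against surveyed units without sites, rather than against the study area as a whole.

```python
from statistics import mean

# Hypothetical surveyed units: (distance to nearest water in metres, site present?).
# All values are invented for illustration only.
surveyed_units = [
    (50, True), (120, True), (80, True), (300, True),
    (400, False), (150, False), (900, False), (650, False),
    (700, False), (200, False),
]

def summarize_by_class(units):
    """Return the mean distance to water for site and non-site units."""
    site = [d for d, present in units if present]
    non_site = [d for d, present in units if not present]
    return {"site": mean(site), "non_site": mean(non_site)}

summary = summarize_by_class(surveyed_units)
print(summary)
```

Only when the non-site units are included does the difference between the two groups become something that can be evaluated.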
Undoubtedly there were some differences in site location selection throughout the entire prehistoric period. Hunting and gathering, as well as agricultural subsistence patterns, are probably represented in any sample of prehistoric sites. It is assumed, however, "that there may be enough similarities in the environmental characteristics of the chosen site locations to allow the derivation of a significant predictive model for prehistoric site location. Such similarities would represent fundamental physical properties of locations necessary for survival. The set of site locations used in building the predictive model, therefore, is concordant with the list of variables chosen" (Parker 1985:187).
Trial models developed in early stages are not unusable, but their use is limited, and they should be used with caution. Trial models can, however, provide a check on the adequacy of field recording procedures (Kincaid 1988:557).
The models of interest here are simplified sets of testable hypotheses, based either on behavioral assumptions, or on empirical correlations, which at a minimum attempt to predict locations of past human activities resulting in the deposition of artifacts or alteration of the landscape. First, the fact that such models are simplifications of reality means that they are fallible; no model that is simple enough to be useful can possibly anticipate all the contingencies that might result in the deposition of cultural materials. Second, by virtue of making predictions, such models are, or should be, testable (Kohler 1985:13).
Model development is a repetitive process of inventory and analysis that is most effective as a long-term strategy. In general, the quality of the model depends on the quality of the data; better data yield more precise and accurate models (Kincaid 1988:556). Location modelling normally assumes that certain environmental variables strongly influence archaeological site location (Kohler 1988:28).
There is good reason to consider intuitive thought in a discussion of predictive modelling. Many models for site location or settlement behaviour are intuitive or not fully operationalized. If a model can be objectively replicated and mapped, it is operationalized; a model consisting of the statement that "sites are located near rivers on dry, level ground," for example, is not mappable until site, near, river, dry, level, and ground have been rigorously defined (Kohler 1988:35).
Much of the recorded archaeological data base in British Columbia is due to intuitive models. Archaeologists have only recently concerned themselves with formalizing their notions about site location into research designs. Many archaeologists have surveyed and continue to survey land based on their ideas about where they will find sites. Moreover, these intuitive models are often the basis for more intensive research projects (Altschul 1988:65).
The most important characteristic of intuitive models from a scientific standpoint is that the components are not fully conceptualized. While everyone may understand the statement "sites will be found on high ground and near water", there will not necessarily be agreement on what is high ground or what "near water" means. The relationship(s) among landform, distance to water and archaeological sites is only partially established. Until everyone can agree on what the terms mean they cannot be operationalized in a way that can be replicated (Altschul 1988:64).
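A minimal sketch of what operationalizing such a statement might involve is given below. The thresholds (200 m, 5 percent slope), the `Location` fields, and the function name are all hypothetical assumptions chosen for illustration; a real study would have to justify each definition from data or theory.

```python
from dataclasses import dataclass

@dataclass
class Location:
    distance_to_river_m: float  # horizontal distance to the nearest river
    slope_percent: float        # ground slope
    poorly_drained: bool        # True if the soil is mapped as poorly drained

# Hypothetical thresholds; each one is an explicit, replicable definition
# of a term that the intuitive statement leaves open.
NEAR_RIVER_M = 200.0
LEVEL_SLOPE_PERCENT = 5.0

def meets_intuitive_rule(loc: Location) -> bool:
    """Apply 'near rivers on dry, level ground' with explicit definitions."""
    near = loc.distance_to_river_m <= NEAR_RIVER_M
    level = loc.slope_percent <= LEVEL_SLOPE_PERCENT
    dry = not loc.poorly_drained
    return near and level and dry

print(meets_intuitive_rule(Location(150.0, 3.0, False)))
print(meets_intuitive_rule(Location(150.0, 12.0, False)))
```

Once the terms are fixed in this way, two researchers applying the rule to the same map should obtain the same result, whether or not they agree that the thresholds are the right ones.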
Many predictive models developed in Cultural Resource Management (CRM) studies take the form of relatively simple pattern-recognition or associational models. Associational models are among the most commonly used predictive models in cultural resource management (Altschul 1988:66). They are attractive primarily because of their simplicity; they are easy to construct and relatively straightforward to understand.
Associational models provide a means of operationalizing the environmental variables that may be related to site location. In this sense they are a tremendous improvement over intuitive models. Associational models can be used to provide a first guess about site location and as a basis for future research; they can, for instance, define environmental parameters that will be useful in stratifying a region for an archaeological survey.
Areal models are those that predict certain characteristics of sites or cultural resources, such as density or frequency, per a specified unit of land. For the most part, areal models are more attractive than associational models because the latter only produce relative statements about site location, such as "more sites will be found in this area than in that one" or "more sites found in this zone than would be expected by chance alone," and these statements are often inadequate for research or management needs. In many instances researchers and managers want to know more than the fact that one zone will contain more sites than another; they want to know how many sites each zone will contain and what site density in each zone will be (Altschul 1988:68).
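The difference between the two kinds of statement can be sketched as follows. The zone names, areas, and site counts are invented, and the two function names are hypothetical. The first function produces the relative, associational statement (observed sites against the number expected by chance in proportion to surveyed area); the second produces the areal statement (density and a projected count per zone).

```python
# Hypothetical survey results per environmental zone (invented numbers).
# surveyed_km2: area inventoried; sites: sites recorded; total_km2: whole zone.
zones = {
    "valley_floor": {"surveyed_km2": 10.0, "sites": 20, "total_km2": 50.0},
    "upland": {"surveyed_km2": 30.0, "sites": 10, "total_km2": 200.0},
}

def relative_statement(zones):
    """Associational view: (observed, expected) sites per zone, where the
    expected count assumes sites are spread in proportion to surveyed area."""
    total_sites = sum(z["sites"] for z in zones.values())
    total_area = sum(z["surveyed_km2"] for z in zones.values())
    return {
        name: (z["sites"], total_sites * z["surveyed_km2"] / total_area)
        for name, z in zones.items()
    }

def areal_estimate(zones):
    """Areal view: site density per km2 and projected site count per zone."""
    result = {}
    for name, z in zones.items():
        density = z["sites"] / z["surveyed_km2"]
        result[name] = {
            "density_per_km2": density,
            "projected_sites": density * z["total_km2"],
        }
    return result

print(relative_statement(zones))
print(areal_estimate(zones))
```

The projection simply multiplies observed density by zone area, which assumes that the surveyed portion of each zone is representative of the whole.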
One of the more popular types of predictive models used in CRM is an areal-based pattern-recognition model. Most of these models utilize sample data to compute a mathematical function, which is then used to predict some aspect of site location (i.e., presence/absence or site density) for unsurveyed units (Altschul 1988:69).
Some models are deductively derived and attempt to predict how particular patterns of human land use will be reflected in the archaeological record, while others are inductively derived and identify and quantify relationships between archaeological site locations and environmental variables. The latter models are termed correlative, and are by far the most common in current archaeological modelling practice.
If a research project requires information about the general nature of human use of a landscape, correlative models provide invaluable data. For example, it is clear from the ethnographic record that human groups employing different subsistence strategies make use of their environments in very different ways. The nature and strength of correlations between cultural remains and features of the environment will be strongly affected by differences in prehistoric resource selection (Sebastian/Judge 1988:4).
Correlative models work by comparing the locations of a sample of sites with environmental features and forecasting the locations of other, unknown sites in areas that are similar environmentally. Consequently, we move from a total mapping of the environment and a partial mapping of archaeological resources to a total predicted mapping of archaeological resources (Kohler/Parker 1986:401).
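As a minimal sketch of this step, the example below fits a function to a sample of surveyed units and applies it to unsurveyed locations. Logistic regression is used here only as one possible choice of function; it is an assumption of the sketch, not a method named in the sources cited. The sample values, the single variable (distance to water), and the names `fit_logistic` and `predict` are hypothetical.

```python
import math

# Hypothetical sample: (distance to water in km, site present = 1 / absent = 0).
sample_units = [(0.05, 1), (0.10, 1), (0.20, 1), (0.40, 1),
                (0.30, 0), (0.60, 0), (0.80, 0), (1.00, 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(data, learning_rate=0.5, iterations=5000):
    """Fit P(site) = sigmoid(b0 + b1 * x) by gradient ascent on the
    mean log-likelihood of the sample."""
    b0, b1 = 0.0, 0.0
    n = len(data)
    for _ in range(iterations):
        g0 = sum(y - sigmoid(b0 + b1 * x) for x, y in data) / n
        g1 = sum((y - sigmoid(b0 + b1 * x)) * x for x, y in data) / n
        b0 += learning_rate * g0
        b1 += learning_rate * g1
    return b0, b1

b0, b1 = fit_logistic(sample_units)

def predict(distance_km):
    """Predicted probability of site presence for an unsurveyed location."""
    return sigmoid(b0 + b1 * distance_km)

print(predict(0.1), predict(0.9))
```

The fitted function expresses only the correlation present in the sample; it says nothing about why sites in this invented sample lie closer to water.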
As stated previously, any predictive model constructed inductively can be only as accurate as the survey data on which it is based. If models must be built without the benefit of a probabilistic sample, they should not be used for serious planning purposes until they have been validated or revised according to rigorous sampling procedures. In correlative models, the bridge between the sample and the target population is built with formal statistical inference. For example, a common procedure for building predictive models begins with a small probabilistic sample of a region using relatively large independently selected survey units (quadrats) stratified by some environmental variable (Kohler/Parker 1986:404).
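The sampling procedure described above might be sketched as follows. The stratum names, quadrat labels, 10 percent sampling fraction, and function name `stratified_sample` are hypothetical; the fixed seed is used only so that the selection can be reproduced.

```python
import random

def stratified_sample(quadrats_by_stratum, fraction, seed=0):
    """Independently draw a simple random sample of quadrats (without
    replacement) from each environmental stratum."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    selected = {}
    for stratum, quadrats in quadrats_by_stratum.items():
        k = max(1, round(len(quadrats) * fraction))
        selected[stratum] = sorted(rng.sample(quadrats, k))
    return selected

# Hypothetical study area divided into quadrats, stratified by landform.
quadrats_by_stratum = {
    "valley_floor": [f"V{i:02d}" for i in range(20)],
    "upland": [f"U{i:02d}" for i in range(60)],
}

chosen = stratified_sample(quadrats_by_stratum, fraction=0.1)
print(chosen)
```

Because every quadrat within a stratum has the same chance of selection, the resulting sample supports the formal statistical inference from sample to target population on which correlative models depend.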
Inductive (or correlative) models have limited explanatory value because they do not account for observed correlations between independent and dependent variables. For example, empirical analysis may demonstrate that a certain type of site in a sample is always located within a limited distance of outcrops of a particular geologic formation. While this information may be very useful in certain contexts, it has not been demonstrated that the presence of outcrops actually influenced site locations (Kincaid 1988:567).
Correlative models are not immediately "transferable," that is, when developed for one geographical location, they do not necessarily work in another; there is no logical reason why they should. "The question then is whether it is more cost-effective to redevelop the correlative model for use in a new area or to develop the explanatory model in the first place, since the latter would be applicable in a variety of areas and would address other management needs (interpretation, evaluation) at the same time" (Judge/Martin 1988:578).
There is a second theoretical approach, the deductive approach, that is sometimes used in archaeological predictive modelling. Briefly, a deductive approach or explanatory model (i.e., one proceeding from theory to data) often explains why a model works. This is important, especially if the model is to be successfully applied to other settings. The major drawback of deductive models is the difficulty in making them operational. In contrast, inductive models proceed from data to theory; observed correlations in the data are used to formulate general hypotheses. If, for example, several major village sites in a particular area are located near or on one particular soil type, one might hypothesize that large habitation sites tend to be located close to this particular soil type.
From a research perspective, explanation is our ultimate goal. Deductively derived models are also superior from a management point of view. If we do not understand why patterns occur, we cannot be confident they will recur in the future. Pattern-recognition models often show that settlement distributions are highly patterned, but without some sort of explanatory framework, management decisions based on these patterns are unsupported (Altschul 1988:67).
Theory-based, deductive areal models have not received much attention in cultural resource management studies, for three reasons. First, theoretically based models require more time to create. The internal connections between variables must be explicitly stated, as well as the logical arguments supporting those relationships. Second, validation procedures are more complex. Deductive models must demonstrate that they are not only consistent with the data but also more sound than any alternative. In contrast, inductive models are judged primarily on the accuracy of their predictions. No statement is necessary about how the population was formed in these models, only that the dependent variable interacts with one or more independent variables. Third, the predictive statements derived from some types of deductive models are not always useful for management purposes (Altschul 1988:72).
In reality, in the long-term time frame of cultural resource management programs, the distinction between deductive and inductive approaches becomes blurred. The model building and refinement process is based on a continuous cycle of data collection, analysis, and model refinement. The results of one cycle of field testing and analysis are used to refine the model, which then guides the next phase of data collection (Kincaid 1988:555).
The foregoing discussion has reviewed several kinds of predictive models of site location. Some are largely or wholly operationalized, others are intuitive; some are based on deductive arguments, others are inductive.
In a cultural resource management context, predictive models and sampling have been used in the formulation of planning documents. One of the best such studies was performed in central Colorado by Nickens and Associates (Kvamme 1980). A random sample inventory survey was performed for use in generating a predictive model of site location. Previous work in the region guided researchers in deciding which data were to be collected for the model.
Once the data were collected and analyzed, a model was constructed which compared the environmental qualities of site and non-site areas. Environmental factors crucial to site location included vertical distance to water, view, shelter, low slope, and forested resources (Kvamme 1980:96-103). These were combined and weighted in a discriminant function analysis. Any given plot of ground could then be rated by this analysis as to its likelihood for containing a site. This model was tested against independent data gathered later from within the project area (Klesert 1987:230).
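To indicate how a discriminant function combines and weights variables, a minimal two-group, two-variable sketch follows. It is not a reconstruction of the analysis reported by Kvamme: the training values are invented, only two of the variables are used, equal prior probabilities are assumed, and the function names are hypothetical.

```python
# Hypothetical training data: (vertical distance to water in m, slope in %).
sites = [(10, 2), (20, 5), (15, 3), (25, 4)]
non_sites = [(60, 12), (80, 10), (70, 15), (90, 9)]

def mean_vector(rows):
    n = len(rows)
    return (sum(r[0] for r in rows) / n, sum(r[1] for r in rows) / n)

def scatter(rows, m):
    """Sums of squares and cross-products about the group mean."""
    sxx = sum((r[0] - m[0]) ** 2 for r in rows)
    syy = sum((r[1] - m[1]) ** 2 for r in rows)
    sxy = sum((r[0] - m[0]) * (r[1] - m[1]) for r in rows)
    return sxx, syy, sxy

def fit_discriminant(group_a, group_b):
    """Return variable weights and the cutoff score separating the groups."""
    ma, mb = mean_vector(group_a), mean_vector(group_b)
    sa, sb = scatter(group_a, ma), scatter(group_b, mb)
    dof = len(group_a) + len(group_b) - 2
    # Pooled within-group covariance matrix [[cxx, cxy], [cxy, cyy]].
    cxx, cyy, cxy = ((sa[i] + sb[i]) / dof for i in range(3))
    det = cxx * cyy - cxy * cxy
    dx, dy = ma[0] - mb[0], ma[1] - mb[1]
    # weights = inverse(pooled covariance) * (mean_a - mean_b)
    w0 = (cyy * dx - cxy * dy) / det
    w1 = (-cxy * dx + cxx * dy) / det
    # Cutoff is the score of the midpoint between the two group means.
    cutoff = w0 * (ma[0] + mb[0]) / 2 + w1 * (ma[1] + mb[1]) / 2
    return (w0, w1), cutoff

weights, cutoff = fit_discriminant(sites, non_sites)

def site_like(plot):
    """Rate a plot of ground: True if its score falls on the site side."""
    score = weights[0] * plot[0] + weights[1] * plot[1]
    return score > cutoff

print(site_like((12, 3)), site_like((85, 13)))
```

The weights show how much each variable contributes to separating site from non-site locations in the training data; as in the study described, such a function would still need to be tested against independent data.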
In at least two Pacific Northwest Forests, rather informal location models based on a small amount of data are being used to determine sampling proportion (but not intensity) in future survey. Areas similar to those where few resources were located in past surveys are sampled at very low rates, with survey effort concentrated in areas similar to those which, in the past, evidenced high densities of cultural resources (Kohler 1985:15).
In the Southwest, attention has focused upon "complete" surveys, not samples, in a preliminary effort to determine the potential for building predictive models. "The question asked of each project was: would it have been possible, given these data, to design a survey strategy that would have resulted in the location of most of the sites by examining only one part of the area? Where the answer was yes, one looked for environmental characteristics that could be used to define areas containing sites" (DeBloois 1985:8).
In Ontario and Saskatchewan archaeologists are working with foresters "to investigate the feasibility of using predictive models to anticipate the spatial distribution of heritage resources" in boreal forest environments (Dalla Bona/Larcombe 1992:36).
Location models are currently employed primarily to focus survey efforts on areas with probable dense distributions of archaeological resources. "This may be good for archaeology, in the limited sense that it builds up the site inventory relatively rapidly, but it is a strategy that may miss important segments of adaptations from some time periods, while missing other periods entirely". Moreover, this approach may be neither cost-effective nor good management (Kohler 1985:18).
The location selection decision-making process involves three basic components: 1) the biophysical properties of location; 2) the social subsystem; and 3) the "cultural" information. Ideally, all three of these components should be built into a location choice model. The most readily available information for an archaeological context is data regarding the biophysical properties of locations, the social and cultural components of the location choice being largely unknown except for well understood archaeological situations (Parker 1985:174).
If it is not understood why a model works in one study area, there can be no way of knowing whether it will work in a new study area. In order for a cultural resource manager to use information derived from models, even for the most general planning purposes, he or she must know that the model works within specified levels of confidence and precision (Sebastian/Judge 1988:6).
With explanatory models on the other hand, eventually it may be possible to offer general models that can be demonstrated to be applicable in any situation characterized by a specified set of cultural system and ecosystem variables (Sebastian/Judge 1988:7).