General Overview
SAMS is a computer software package for the stochastic analysis, modeling, and simulation of hydrologic time series. It is written in C, Fortran, and C++ and runs under modern Windows operating systems such as Windows XP. The package consists of many menu options that enable the user to choose among the options currently available. SAMS 2005 is a modified and expanded version of SAMS-96.1 and SAMS 2000. It consists of three primary application modules: 1) Data Analysis, 2) Fit a Model, and 3) Generate Series. Figure 2.1 shows SAMS’s main window. In the main menu bar, the “Model” menu appears next to “Fit Model”; from it the model parameters can be displayed and the model can be reset. In addition, the “Plot Properties” menu appears next to “Generate Series”; it provides useful plotting features such as grid and zoom.

Before running the applications, the user must import a file that contains the (historical) input data to be analyzed. This is done by clicking on the “File” menu and choosing the “Import Flow File” option, as shown in Figure 2.2.

“Data Analysis” is one of the main applications of SAMS. Its functions include plotting the data, checking the normality of the data, transforming the data, and computing and displaying the statistical (stochastic) characteristics of the data. Plotting the data may help detect trends, shifts, outliers, or errors in the data. Probability plots are included for verifying the normality of the data, and the data can be transformed to normal by using different transformation techniques; currently, logarithmic, power, gamma, and Box-Cox transformations are available. SAMS determines a number of statistical characteristics of the data. These include basic statistics such as the mean, standard deviation, and skewness, serial correlations (for annual data), the spectrum, season-to-season correlations (for seasonal data), annual and seasonal cross-correlations for multisite data, and drought, surplus, and storage related statistics. These statistics are important for investigating the stochastic characteristics of the data.

The second main application of SAMS, “Fit Model”, includes parameter estimation and model testing for alternative univariate and multivariate stochastic models. The following models are included: (1) univariate ARMA(p,q) model, where p and q can vary from 1 to 10, (2) univariate GAR(1) model, (3) univariate periodic PARMA(p,q) model, (4) univariate shifting-mean SM model, (5) univariate seasonal disaggregation model, (6) multivariate autoregressive MAR(p) model, (7) contemporaneous multivariate CARMA(p,q) model, where p and q can vary from 1 to 10, (8) multivariate periodic MPAR(p) model, (9) multivariate CSM-CARMA(p,q) model, (10) multivariate annual (spatial) disaggregation model, and (11) multivariate temporal disaggregation model.

Two estimation methods are available, namely the method of moments (MOM) and the least squares method (LS). MOM is available for most of the models, while LS is available only for the univariate ARMA and PARMA models and for the CARMA models. For CARMA models, the variance-covariance (G) matrix of the residuals can be estimated by either MOM or the method of maximum likelihood (MLE). For multivariate annual (spatial) disaggregation models, parameter estimation is based on the Valencia-Schaake or Mejia-Rousselle methods, while for annual to seasonal (temporal) disaggregation Lane's condensed method is applied.
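As a minimal illustration of the method of moments, the sketch below fits the simplest case, an AR(1) (i.e. ARMA(1,0)) model, to an annual series: the autoregressive coefficient is set equal to the lag-1 sample autocorrelation, and the noise variance follows from the sample variance. This is a textbook sketch, not SAMS's internal code; the function names `lag_autocorrelation` and `fit_ar1_mom` are hypothetical.

```python
def lag_autocorrelation(x, k):
    """Sample lag-k autocorrelation r_k = c_k / c_0 (divisor n in both)."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    c0 = sum(v * v for v in d) / n
    ck = sum(d[t] * d[t + k] for t in range(n - k)) / n
    return ck / c0


def fit_ar1_mom(x):
    """Method-of-moments fit of z_t = phi * z_{t-1} + eps_t, with z_t = x_t - mean.

    Returns (mean, phi, noise_variance), where phi = r_1 and
    noise_variance = s^2 * (1 - phi^2).
    """
    n = len(x)
    mean = sum(x) / n
    s2 = sum((v - mean) ** 2 for v in x) / (n - 1)
    phi = lag_autocorrelation(x, 1)
    return mean, phi, s2 * (1.0 - phi ** 2)
```

For example, `fit_ar1_mom([1, 2, 3, 4, 5])` gives a mean of 3, phi = 0.4, and a noise variance of 2.1.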

For stochastic simulation at several sites in a stream network system, a direct modeling approach based on multivariate autoregressive and CARMA processes is available for annual data, and a multivariate periodic autoregressive process is available for seasonal data. In addition, two schemes based on disaggregation principles are available. For this purpose, it is convenient to divide the stations into key stations, substations, subsequent stations, etc. Generally, the key stations are the farthest downstream stations, substations are the next upstream stations, subsequent stations are the stations further upstream, and so on.

In the first scheme, the annual flows at the key stations are added to create an annual flow series at an “artificial” or “index” station. A univariate ARMA(p,q) model is then fitted to the annual flows of the index station, and a spatial disaggregation model relating the annual flows of the index station to those of the key stations is fitted. Further, one or more spatial disaggregation models relating the annual flows of the key stations to those of the substations are fitted. This process can be repeated as long as unmodeled stations remain: each modeled station can be defined as a key station at the next disaggregation level, and each unmodeled station as a substation. In the second scheme, a multivariate model is fitted to the annual data of the key stations; the remaining models, relating the annual flows at the key stations, substations, and subsequent stations, are then fitted in the same manner as in the first scheme. Furthermore, if the objective of the modeling exercise is to generate seasonal data by disaggregation, an additional temporal disaggregation model is fitted that relates the annual flows of a group of stations to the corresponding seasonal flows.
The foregoing schemes, which model and generate flows at the annual time scale (with spatial disaggregation as needed) and then perform temporal disaggregation, can also be reversed, i.e., starting with temporal disaggregation of the key station annual flows into seasonal flows, followed by spatial disaggregation.

The third main application of SAMS is “Generate Series”, i.e., simulating synthetic data. Data generation is based on the models, approaches, and schemes described above, using the model parameters estimated by SAMS. The user also has the option of importing annual series at key stations (e.g., series generated by software other than SAMS). The statistical characteristics of the generated data are presented in graphical or tabular form along with the historical statistics of the data that were used to fit the generating model. The generated data, including the “generated” statistics, can be displayed graphically or in tables, printed, and/or written to specified output files. To clarify, the overall procedure for generating seasonal data based on Scheme 2 is summarized as follows:

(a) a multivariate model, such as MAR(p), is used to generate the annual flows at the key stations;
(b) a spatial disaggregation model is used to disaggregate the generated annual flows at the key stations into annual flows at the substations, followed by additional spatial disaggregations until all upstream stations are taken into account;
(c) a temporal disaggregation model is used to disaggregate the annual flows at one or more groups of stations into the corresponding seasonal flows at those stations.

Statistical Analysis of Data
Figure 2.3 shows the “Data Analysis” menu. From this menu the user can carry out statistical analysis of annual or seasonal data, either original or transformed. The user may choose among the following four operations:

1. Plot Time Series.
2. Transform.
3. Show Statistics.
4. Plot Statistics.

We will examine and illustrate each of these options below.

Plot Time Series
Plotting the data can help detect trends, shifts, outliers, and errors in the data. Figure 2.4 shows the menu after choosing the “Plot Time Series” function. Annual or seasonal time series may be plotted in the original or the transformed domain. Figure 2.5 illustrates a time series plot for annual data. The user may plot either the entire time series or only part of it; to do so, activate the “Plot Properties” menu (also shown in Figure 2.3) and choose “Range” or “Rectangle” under the “ZOOM” menu. Time series plots, and any other plots produced by SAMS, can easily be transferred into word processing, image processing, or spreadsheet applications such as MS Word, Excel, and Adobe Photoshop. To transfer a plot, use the “Copy to Clipboard” function, also available under the “Plot Properties” menu, and then paste the plot into the other application.

Transform Time Series
SAMS tests the normality of the data by plotting the data on normal probability paper and by applying the skewness and Filliben tests of normality. To examine the adequacy of a transformation, SAMS shows the theoretical distribution implied by the transformation together with the corresponding historical sample distribution, and it displays the critical values and the results of the tests in a table. Figure 2.6 is the display obtained after clicking on the “Transform” menu. The user can test the annual or seasonal data of any site by selecting the appropriate “Data Type” and “Station #” options in the left-hand panel. To plot the empirical frequency distribution, the user may select either the Cunnane or the Weibull plotting position formula.
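The plotting positions and the Filliben test can be sketched as follows. The Weibull (i/(n+1)) and Cunnane ((i-0.4)/(n+0.2)) formulas are the standard ones; the Filliben statistic is computed here as the correlation between the sorted sample and the standard normal quantiles of Filliben's (1975) approximate order-statistic medians. The critical values that SAMS tabulates are not reproduced, and the function names are hypothetical.

```python
from statistics import NormalDist


def plotting_positions(n, method="cunnane"):
    """Non-exceedance probabilities for ranks i = 1..n (data sorted ascending)."""
    if method == "weibull":
        return [i / (n + 1) for i in range(1, n + 1)]
    if method == "cunnane":
        return [(i - 0.4) / (n + 0.2) for i in range(1, n + 1)]
    raise ValueError("method must be 'weibull' or 'cunnane'")


def filliben_medians(n):
    """Filliben's approximation of the medians of uniform order statistics."""
    m = [(i - 0.3175) / (n + 0.365) for i in range(1, n + 1)]
    m[-1] = 0.5 ** (1.0 / n)
    m[0] = 1.0 - m[-1]
    return m


def correlation(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / (saa * sbb) ** 0.5


def filliben_statistic(x):
    """Probability-plot correlation between sorted data and normal quantiles.

    Values close to 1 support normality; small values suggest rejecting it.
    """
    z = [NormalDist().inv_cdf(p) for p in filliben_medians(len(x))]
    return correlation(sorted(x), z)
```

The statistic is compared against a critical value that depends on the sample size and significance level; a strongly skewed sample gives a noticeably smaller correlation than a normal-looking one.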

If the data at hand are not normal, one may try a transformation. The transformation methods available in SAMS include logarithmic, power, and Box-Cox transformations, as shown in the left panel of Figure 2.6. After selecting the transformation method, one must click on the “Accept Transformation” button. The results are displayed graphically: the frequency distributions of the original and the transformed data may be plotted on normal probability paper. The graphical results include the theoretical distribution as well as the numerical values of the tests of normality. Figure 2.7 displays the results after a logarithmic transformation for site 1 and season (month) 1 of the data.
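A minimal sketch of two of the transformations, with the sample skewness used to judge their effect, is shown below. The logarithmic form ln(x + a) with an optional shift a, and the standard Box-Cox form, are assumptions about the exact parameterization SAMS uses; the power transformation is omitted because its form is not given here. The function names are hypothetical.

```python
import math


def log_transform(x, a=0.0):
    """y = ln(x + a); the shift a (assumed parameter) keeps the argument positive."""
    return [math.log(v + a) for v in x]


def box_cox(x, lam):
    """Box-Cox: y = (x^lam - 1) / lam for lam != 0, and y = ln(x) for lam == 0 (x > 0)."""
    if lam == 0:
        return [math.log(v) for v in x]
    return [(v ** lam - 1.0) / lam for v in x]


def skewness(x):
    """Sample skewness g = n / ((n-1)(n-2)) * sum((x - mean)^3) / s^3."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum((v - mean) ** 3 for v in x) / s ** 3
```

A lognormal-looking series has a large positive skewness before the logarithmic transformation and a skewness near zero after it, which is what the normality tests then confirm.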

Show Statistics
A number of statistical characteristics can be calculated for annual and seasonal data, original or transformed. The results are displayed in tabular form and can be saved to a file. These calculations are done by choosing “Show Statistics” under the “Data Analysis” menu. The statistics include: (1) Basic Statistics, such as the mean, standard deviation, skewness coefficient, coefficient of variation, maximum and minimum values, autocorrelation coefficients, season-to-season correlations, spectrum, and cross-correlations. The equations used for these calculations are described in section 3.1, and Figure 2.8 shows an example of some of the calculated basic statistics. (2) Storage, Drought, and Surplus Related Statistics, such as the longest deficit period, maximum deficit volume, longest surplus period, maximum surplus volume, storage capacity, rescaled range, and the Hurst coefficient. The equations used for these calculations are shown in section 3.2.
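The basic statistics listed above can be sketched for a single (annual) series as follows. The estimators chosen here (divisor n-1 for the standard deviation, the usual bias-adjusted skewness factor, divisor n for autocovariances) are common textbook choices and are assumptions; the exact equations SAMS uses are those of section 3.1. The function name is hypothetical.

```python
import math


def basic_statistics(x, max_lag=3):
    """Mean, standard deviation, CV, skewness, extremes, and lag-1..max_lag autocorrelations."""
    n = len(x)
    mean = sum(x) / n
    d = [v - mean for v in x]
    sd = math.sqrt(sum(v * v for v in d) / (n - 1))
    skew = n / ((n - 1) * (n - 2)) * sum(v ** 3 for v in d) / sd ** 3
    c0 = sum(v * v for v in d) / n
    # r_k = c_k / c_0, with c_k = (1/n) * sum_{t} d_t * d_{t+k}
    acf = [sum(d[t] * d[t + k] for t in range(n - k)) / n / c0
           for k in range(1, max_lag + 1)]
    return {"mean": mean, "sd": sd, "cv": sd / mean, "skew": skew,
            "max": max(x), "min": min(x), "acf": acf}
```

For seasonal data the same formulas would be applied to the values of each season separately.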

To calculate the drought statistics, the user needs to specify a demand level. Figure 2.9 shows the menu in which the demand level has been specified as a fraction of the sample mean, together with the resulting storage, drought, and surplus related statistics.
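The following sketch shows how such statistics can be computed for a demand equal to a fraction of the sample mean. The run definitions, the sequent-peak storage capacity, the adjusted rescaled range, and Hurst's K = log(R/S) / log(n/2) are standard textbook forms assumed for illustration; SAMS's exact definitions are given in section 3.2. The function names are hypothetical.

```python
import math


def _runs(flags, amounts):
    """Longest run of consecutive True flags and the largest volume accumulated in one run."""
    longest = best = 0
    length, total = 0, 0.0
    for flag, amount in zip(flags, amounts):
        if flag:
            length += 1
            total += amount
            longest = max(longest, length)
            best = max(best, total)
        else:
            length, total = 0, 0.0
    return longest, best


def drought_statistics(x, demand_fraction=1.0):
    """Run, storage, and range statistics for a demand = demand_fraction * mean."""
    n = len(x)
    mean = sum(x) / n
    demand = demand_fraction * mean
    shortfall = [demand - v for v in x]  # positive in deficit years
    longest_deficit, max_deficit = _runs([v < demand for v in x], shortfall)
    longest_surplus, max_surplus = _runs([v > demand for v in x], [-s for s in shortfall])
    # Sequent-peak algorithm: storage needed to meet the demand in every year
    k = capacity = 0.0
    for s in shortfall:
        k = max(0.0, k + s)
        capacity = max(capacity, k)
    # Adjusted range of cumulative departures from the mean, rescaled by the std. dev.
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / (n - 1))
    cum = low = high = 0.0
    for v in x:
        cum += v - mean
        low, high = min(low, cum), max(high, cum)
    rescaled = (high - low) / sd
    return {
        "longest_deficit": longest_deficit, "max_deficit_volume": max_deficit,
        "longest_surplus": longest_surplus, "max_surplus_volume": max_surplus,
        "storage_capacity": capacity, "rescaled_range": rescaled,
        "hurst_k": math.log(rescaled) / math.log(n / 2),
    }
```

For the short series [10, 4, 6, 12, 13, 5, 3, 11] with the demand equal to the mean (8), the longest deficit period is 2 years, the maximum deficit volume is 8, and the sequent-peak storage capacity is 8.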

Any tabular display in SAMS can easily be saved to a text file: highlight the window containing the table, go to the “File” menu, and use the “Save Text” function. Some users may prefer to use MS Excel to further process the results of the calculations done by SAMS. This can be done with the “Export to Excel” function, also under the “File” menu.

Plot Statistics
Some of the statistical characteristics may be displayed in graphical form. These statistics include annual and seasonal serial correlation (autocorrelation) coefficients, season-to-season correlations, cross-correlation coefficients between different sites, the spectrum, and seasonal statistics including the mean, standard deviation, skewness coefficient, coefficient of variation, maximum, and minimum values.

Figures 2.10 and 2.11 show the menus for plotting the serial correlation coefficients and the cross-correlation coefficients, respectively, along with examples. The left-hand window in Figure 2.10 shows 15 as the maximum number of lags for calculating the autocorrelation function. It also shows whether the calculation will be done for the original or the transformed series, and the bottom part of the window has slots for selecting the station number to be analyzed and the type of data, i.e., annual or seasonal. The correlogram shown corresponds to the annual flows for station 1 (Colorado River near Glenwood Springs). Figure 2.11 shows the menu for calculating the cross-correlation function between sites 19 and 20. The plot of the spectrum (spectral density function) against frequency is displayed in Figure 2.12. The left-hand side of the figure has slots for selecting the smoothing function (window), the maximum number of lags (as a fraction of the sample size N), and the spacing. The right-hand side of the figure shows the spectrum for the annual flows of the Colorado River at site 20. In addition, the various seasonal statistics may be viewed graphically; Figure 2.13 shows the monthly means of the monthly streamflows of the Colorado River at site 20.
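The sketch below illustrates the lag-k cross-correlation and a smoothed spectrum built from the three inputs of Figure 2.12. It assumes a Blackman-Tukey estimate with a Tukey-Hanning lag window, reads “spacing” as the frequency increment, and normalizes the spectrum so that white noise gives a value of 1; the windows and normalization actually offered by SAMS may differ. The function names are hypothetical.

```python
import math


def cross_correlation(x, y, k):
    """Lag-k cross-correlation r_xy(k) = c_xy(k) / sqrt(c_xx(0) * c_yy(0))."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cxy = sum((x[t] - mx) * (y[t + k] - my) for t in range(n - k)) / n
    cxx = sum((v - mx) ** 2 for v in x) / n
    cyy = sum((v - my) ** 2 for v in y) / n
    return cxy / math.sqrt(cxx * cyy)


def tukey_spectrum(x, lag_fraction=0.25, spacing=0.01):
    """Blackman-Tukey spectrum s(f) = 1 + 2 * sum_k w_k r_k cos(2 pi f k), 0 <= f <= 0.5."""
    n = len(x)
    m = max(1, int(round(lag_fraction * n)))  # maximum lag as a fraction of N
    r = [cross_correlation(x, x, k) for k in range(m + 1)]
    w = [0.5 * (1.0 + math.cos(math.pi * k / m)) for k in range(m + 1)]  # Tukey-Hanning
    freqs = [i * spacing for i in range(int(round(0.5 / spacing)) + 1)]
    spec = [1.0 + 2.0 * sum(w[k] * r[k] * math.cos(2.0 * math.pi * f * k)
                            for k in range(1, m + 1))
            for f in freqs]
    return freqs, spec
```

A series that alternates between high and low years, for instance, produces a spectrum whose peak lies at the highest frequency, 0.5 cycles per time step.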

Any plot produced in SAMS can also be shown in tabular form (i.e., the values used to make the plot can be displayed). This can be done by using the “Show Plot Values” function under the “Plot Properties” menu. These values can be further saved to a text file or transferred into Excel. Figure 2.14 shows an example of the values used in the plot of the serial correlation coefficients.

Fitting a Stochastic Model
The LAST package included a number of programs addressing several aspects of the stochastic modeling of time series. The basic procedure involved modeling and generating the annual time series using a multivariate AR(1) or AR(2) model, and then using a disaggregation model to disaggregate the generated annual flows into their corresponding seasonal flows. In contrast, SAMS has two major modeling strategies, which may be categorized as direct and indirect modeling. Direct modeling means fitting a stationary model (e.g., univariate ARMA or multivariate AR, CARMA, or CSM-CARMA) directly to the annual data, or fitting a periodic (seasonal) model (e.g., univariate PARMA or multivariate PAR) directly to the seasonal data of the system at hand. Disaggregation modeling, on the other hand, is an indirect procedure, because the modeling of the annual data for a site can rely on the modeling of the annual data of another site (key station), and the modeling of seasonal data also involves modeling the corresponding annual data before the seasonal data are obtained by temporal disaggregation. SAMS categorizes the models into those for annual data and those for seasonal data. In each category, there are univariate, multivariate, and disaggregation models. The following specific models are currently available in SAMS under each category:

  1. For annual data:
    • Univariate ARMA(p,q) model.
    • Univariate GAR(1) model.
    • Univariate Shifting Mean (SM) model.
    • Multivariate AR(p) model (MAR).
    • Contemporaneous ARMA(p,q) model (CARMA).
    • CSM-CARMA(p,q) model.
    • Multivariate annual (spatial) disaggregation.
  2. For seasonal data:
    • Univariate PARMA(p,q) model.
    • Multivariate PAR(p) model (MPAR(p)).
    • Univariate seasonal disaggregation model.
    • Multivariate spatial-seasonal disaggregation model.
    • Multivariate seasonal-spatial disaggregation model.
The procedure for fitting any model other than a disaggregation model is basically the same. After clicking on the “Fit Model” menu and choosing the desired model, a menu for fitting the chosen model appears, where the site number, the model order, etc. can be specified. The user needs to specify the station (site) number(s). If standardization of the data is desired, one must click on the “Standardize Data” button. Generally, the modeling is performed on data from which the mean has been subtracted. Standardization means that, in addition to subtracting the mean, the data are divided by the standard deviation so that they have a standard deviation equal to one. For example, for monthly data the mean for month 5 is subtracted and the result is divided by the standard deviation for that month. As a result, the mean and the standard deviation of the standardized data for month 5 become zero and one, respectively. Next, the order of the model to be fitted is selected; for instance, for ARMA models one must enter p and q, while for MAR or MPAR models one must enter only the order p. Finally, the method of estimation of the model parameters must be selected.
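Season-by-season standardization, as described above for month 5, can be sketched as follows. The layout `data[year][season]` and the function name are assumptions for illustration; dropping the division by the standard deviation gives the mean-subtracted data used by default.

```python
import math


def standardize_seasonal(data):
    """Standardize seasonal data season by season.

    data[year][season] -> z[year][season] = (x - mean_season) / sd_season.
    Also returns the (mean, sd) of each season so the data can be restored.
    """
    n_years, n_seasons = len(data), len(data[0])
    z = [[0.0] * n_seasons for _ in range(n_years)]
    stats = []
    for s in range(n_seasons):
        column = [data[y][s] for y in range(n_years)]
        mean = sum(column) / n_years
        sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n_years - 1))
        stats.append((mean, sd))
        for y in range(n_years):
            z[y][s] = (data[y][s] - mean) / sd
    return z, stats
```

After this step each season has mean zero and standard deviation one, so a periodic model fitted to `z` describes only the correlation structure; generated values are converted back with x = mean_season + sd_season * z.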

Currently SAMS provides two methods of estimation, namely the method of moments (MOM) and the least squares (LS) method. MOM is available for the ARMA(p,q), GAR(1), SM, MAR(p), CSM part of the CSM-CARMA, PARMA(p,1), and MPAR(p) models, while LS is available for the ARMA(p,q), CARMA(p,q), and PARMA(p,q) models. The LS method is often iterative and may require initial parameter estimates (starting points). These starting points are obtained either by fitting a simpler, high-order model using LS or by using the MOM parameter estimates. Where MOM estimates are not available, such as for the PARMA(p,q) model with q>1, the MOM parameter estimates of the closest model are used instead. For fitting CARMA(p,q) models, the residual variance-covariance matrix G can be estimated using either the method of moments (MOM) or maximum likelihood estimation (MLE) (Stedinger et al., 1985). Figure 2.15 shows an example of fitting a CARMA(1,0) model.
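To illustrate an iterative LS fit started from the MOM estimates of a closely related model, the sketch below fits an ARMA(1,1) model by conditional least squares, starting from the AR(1) MOM estimate (phi = r1, theta = 0). It assumes the sign convention z_t = phi z_{t-1} + e_t - theta e_{t-1} and uses a deliberately simple pattern search as the optimizer; this is a swapped-in technique for illustration, not the LS algorithm implemented in SAMS. All function names are hypothetical.

```python
import random


def simulate_arma11(n, phi, theta, rng=random, warmup=200):
    """Simulate z_t = phi z_{t-1} + e_t - theta e_{t-1} with e_t ~ N(0, 1)."""
    z_prev = e_prev = 0.0
    out = []
    for t in range(n + warmup):
        e = rng.gauss(0.0, 1.0)
        z = phi * z_prev + e - theta * e_prev
        z_prev, e_prev = z, e
        if t >= warmup:
            out.append(z)
    return out


def arma11_sse(z, phi, theta):
    """Conditional sum of squared residuals, starting from e_0 = 0."""
    e_prev = sse = 0.0
    for t in range(1, len(z)):
        e = z[t] - phi * z[t - 1] + theta * e_prev
        sse += e * e
        e_prev = e
    return sse


def fit_arma11_ls(x, step=0.1, tol=1e-5):
    """LS fit of ARMA(1,1), starting from the AR(1) MOM estimate (phi = r1, theta = 0)."""
    n = len(x)
    mean = sum(x) / n
    z = [v - mean for v in x]
    r1 = sum(z[t] * z[t + 1] for t in range(n - 1)) / sum(v * v for v in z)
    params = [r1, 0.0]
    best = arma11_sse(z, *params)
    while step > tol:  # pattern search: try +/- step on each parameter
        improved = False
        for i in range(2):
            for delta in (step, -step):
                trial = params[:]
                trial[i] += delta
                if max(abs(trial[0]), abs(trial[1])) >= 1.0:
                    continue  # stay stationary (|phi| < 1) and invertible (|theta| < 1)
                sse = arma11_sse(z, *trial)
                if sse < best:
                    params, best, improved = trial, sse, True
        if not improved:
            step /= 2.0  # refine the search when no move helps
    return params[0], params[1], best / (n - 1)
```

With a few thousand simulated values, the returned phi and theta land close to the values used in the simulation, and the residual variance close to 1.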

When fitting the CSM-CARMA(p,q) model, a special dialog box appears, and the user needs to enter the appropriate information for the model setup (see Figure 2.16). This mixed model can also be used to fit a CSM model only or a CARMA model only, and it is recommended over the stand-alone CARMA model option.

Fitting disaggregation models requires additional operations. Before explaining these operations, it is necessary to briefly describe the concepts used in setting up disaggregation models in SAMS. In disaggregation modeling, the user sets up the model configuration step by step. The configuration depends on the order and relative positions of the stations in the system. Defining the system structure means specifying, for each main river system, the sequence of stations (sites) that make up the river network. SAMS uses the concept of key stations and substations. A key station is a downstream station along a main stream; it could be the farthest downstream station or any other station, depending on the particular problem at hand. For instance, referring to the Colorado River system shown in Figure 2.17, station 29 is a key station if one is interested in modeling the entire river system. On the other hand, if station 29 is not used in the analysis, station 28 becomes the key station. There could also be several key stations. Let us continue the explanation assuming that stations 8 and 16 are key stations for the Upper Colorado River Basin. Substations are the next upstream stations draining to a key station. For instance, stations 2, 6, and 7 are substations draining to key station 8. Likewise, stations 11, 12, 13, 14, and 15 are substations for key station 16. Subsequent stations are the next upstream stations draining into a substation. For instance, stations 1, 5, and 10 are subsequent stations relative to substations 2, 6, and 11, respectively.

In addition, to define a “disaggregation procedure”, SAMS uses the concept of groups. A group consists of one or more key stations and their corresponding substations, and groups must be defined in each disaggregation step. Each group contains a certain number of stations to be modeled in a multivariate fashion, i.e., jointly, in order to preserve their cross-correlations. For instance, if a group has two key stations and three substations, the disaggregation process will preserve the cross-correlations among all of these stations (key stations and substations). On the other hand, if two separate groups are selected, the cross-correlations between stations belonging to the same group will be preserved, but those between stations belonging to different groups will not.

The definition of a group is important in the disaggregation process. For instance, referring to Figure 2.17, key station 8 and substations 2, 6, and 7 may form one group in which the flows of all these stations are modeled jointly in a multivariate framework, while key station 16 and its substations 11, 12, 13, 14, and 15 may form another group. In this case, the cross-correlations between the stations within each group will be preserved but the cross-correlations among stations of the two different groups will not be preserved. For example, the cross-correlations between stations 8 and 16 will not be preserved but the cross-correlations between stations 8 and 2 will be preserved. On the other hand, if all the stations are defined in a single group, then the cross-correlations between all the stations will be preserved. After modeling and generating the annual flows at the desired stations, the annual flows can be disaggregated into seasonal flows. This is handled again by using the concept of groups as explained above. The user, for example, may choose stations 11, 12, 13, 14, 15, and 16 as one group. Then, the annual flows for these stations may be disaggregated into seasonal flows by a multivariate disaggregation model so as to preserve the seasonal cross-correlations between all the stations.
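The grouping rule can be made concrete with a small data structure. The dictionary layout and the function names below are hypothetical; they only encode the Figure 2.17 example and the rule that cross-correlations are preserved between stations of the same group.

```python
# Hypothetical encoding of the Figure 2.17 example: two groups, one per key station.
TWO_GROUPS = [
    {"key": [8], "sub": [2, 6, 7]},
    {"key": [16], "sub": [11, 12, 13, 14, 15]},
]
# Alternative: all stations in a single group.
ONE_GROUP = [{"key": [8, 16], "sub": [2, 6, 7, 11, 12, 13, 14, 15]}]


def group_of(station, groups):
    """Index of the group that contains the station (as key station or substation)."""
    for i, group in enumerate(groups):
        if station in group["key"] or station in group["sub"]:
            return i
    raise KeyError(f"station {station} is not in any group")


def cross_correlation_preserved(a, b, groups):
    """True if stations a and b are modeled jointly, i.e. belong to the same group."""
    return group_of(a, groups) == group_of(b, groups)
```

With `TWO_GROUPS`, the pair (8, 2) is modeled jointly while (8, 16) is not; with `ONE_GROUP`, every pair is.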

Figure 2.18 shows the menu available for “fitting the model”. The user must choose whether the model (and the generation based on it) is for annual or for seasonal data; Figure 2.18 shows the selection for seasonal data. The options to choose depend on whether the modeling (and generation) problem is for 1 site (1 series) or for several sites (more than 1 series), so the model may be either univariate or multivariate, respectively. Choosing a univariate or multivariate model implies fitting the model using a direct modeling approach, e.g., for 3 sites, a trivariate periodic (seasonal) model based on the seasonal data available for the three sites. On the other hand, one may generate seasonal flows indirectly using aggregation and disaggregation methods. When using disaggregation methods, two broad options are available (Figure 2.18), i.e., spatial-seasonal and seasonal-spatial. The first option defines a modeling approach whereby annual flows are first generated at key stations, spatial disaggregation is then applied to generate annual flows at upstream stations, and seasonal flows are finally obtained using temporal disaggregation. The second option defines a modeling approach where annual flows are generated at key stations and then disaggregated into seasonal flows using temporal disaggregation models. The final step is to disaggregate these seasonal flows spatially to obtain the seasonal flows at all stations in the system at hand.

SAMS has two schemes for modeling the key stations. In the first scheme, denoted Scheme 1, the annual flows of the key stations that belong to a given group are aggregated to form an “index station”, and a univariate ARMA(p,q) model is used to model the aggregated flows of the index station. The aggregated annual flows are then disaggregated (spatially) back to each key station using the Valencia and Schaake or the Mejia and Rousselle disaggregation methods. Next, the annual flows at the key stations are disaggregated spatially to obtain the annual flows at the substations, then at the subsequent stations, etc. The second scheme, denoted Scheme 2, uses a multivariate model to represent (generate) the annual flows of the key stations belonging to a given group and then disaggregates those flows spatially to obtain the annual flows for the substations, subsequent stations, etc. For either Scheme 1 or 2, temporal disaggregation may be carried out if seasonal flows are desired. The mathematical description of the disaggregation methods is presented in chapter 4, and examples of disaggregation modeling applied to real streamflow data are presented in chapter 5.
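The Valencia-Schaake step of Scheme 1 can be sketched for its simplest case: the index station X is the sum of the key stations Y, and the model Y = mean_Y + A (X - mean_X) + B V (with V independent standard normal noise) is fitted with A = S_YX S_XX^-1 and B B^T = S_YY - A S_XX A^T. Because X is exactly the sum of the Y's, the residual covariance matrix is singular, so the sketch uses a positive-semidefinite Cholesky factor; the generated key-station flows then add up to the index-station value. The Mejia-Rousselle variant, which adds a lagged term, is not shown, and the estimation details in chapter 4 may differ. Function names are hypothetical.

```python
import math
import random


def covariance(a, b):
    """Sample covariance with divisor n - 1."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / (n - 1)


def cholesky_psd(m, eps=1e-10):
    """Lower-triangular L with L L^T = m for a positive semidefinite m.

    Columns whose pivot is (numerically) zero are left as zeros.
    """
    k = len(m)
    L = [[0.0] * k for _ in range(k)]
    for j in range(k):
        d = m[j][j] - sum(L[j][p] ** 2 for p in range(j))
        if d <= eps * m[j][j]:
            continue
        L[j][j] = math.sqrt(d)
        for i in range(j + 1, k):
            L[i][j] = (m[i][j] - sum(L[i][p] * L[j][p] for p in range(j))) / L[j][j]
    return L


def fit_valencia_schaake(key_flows):
    """key_flows[i] is the annual series of key station i; X = sum over key stations."""
    index = [sum(year) for year in zip(*key_flows)]
    sxx = covariance(index, index)
    a = [covariance(y, index) / sxx for y in key_flows]  # A = S_YX S_XX^-1
    k = len(key_flows)
    resid = [[covariance(key_flows[i], key_flows[j]) - a[i] * a[j] * sxx
              for j in range(k)] for i in range(k)]      # B B^T = S_YY - A S_XX A^T
    b = cholesky_psd(resid)
    means = [sum(y) / len(y) for y in key_flows]
    return a, b, means, sum(index) / len(index)


def disaggregate(x, params, rng=random):
    """Split one generated index-station flow x into flows at the key stations."""
    a, b, means, mean_x = params
    k = len(a)
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]  # independent N(0, 1) noise
    return [means[i] + a[i] * (x - mean_x) + sum(b[i][p] * v[p] for p in range(k))
            for i in range(k)]
```

In use, an ARMA(p,q) model generates the index-station flows, and `disaggregate` is applied to each generated year; the coefficients in A sum to one, which is why additivity is preserved.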

In applying disaggregation methods, the user needs to choose the specific disaggregation models for both spatial and temporal disaggregation. For example, when modeling seasonal data the user may select either the “spatial-temporal” or the “temporal-spatial” option. In either case, one must specify the type of disaggregation models. Figure 2.19 shows the window options after choosing the “spatial-temporal” option. The modeling scheme (1 or 2, as noted above) must be chosen, as well as the type of spatial disaggregation (either the Valencia-Schaake or the Mejia-Rousselle model) and the type of temporal disaggregation (for this purpose only Lane’s model is available). The “Temporal-Spatial” option is slightly different in that the user can choose between two temporal disaggregation models, namely Lane’s model and the Grygier and Stedinger model.

As an illustration, some of the steps and options followed in using a disaggregation approach are shown in Figures 2.19 to 2.23. They are summarized as follows:
  • In Figure 2.19 Scheme 1 is selected along with the V-S model for spatial disaggregation and Lane’s model for temporal disaggregation.
  • In Figure 2.20, stations 8 and 16 (refer to Figure 2.17) are selected as key stations, and an index station is formed (the aggregation of the annual flows for sites 8 and 16). Then the ARMA(1,0) model is chosen to generate the annual flows of the index station.
  • The spatial disaggregation of the annual flows from key stations to substations must be carried out by groups. For example, this could be accomplished by placing key stations 8 and 16 and their corresponding substations (2, 6, and 7, and 11, 12, 13, 14, and 15, respectively) into a single group, or by forming two or more groups. Here, two groups were formed, one per key station, and Figures 2.21 and 2.22 show the procedure for selecting the group corresponding to key station 8.
  • The temporal disaggregation (from annual into seasonal flows) is also performed by groups (of stations) as shown in Figure 2.23. The specifications for the disaggregation modeling are completed by pressing the “Finish” button shown in Figure 2.23.
After fitting a stochastic model, one may view a summary of the model parameters by using the “Show Parameters” function under the “Model” menu. Figure 2.24 shows part of the model parameters regarding the simulation of seasonal flows using disaggregation methods as described above.

Generating Synthetic Series
Data generation is an important subject in stochastic hydrology and has received a lot of attention in the hydrologic literature. Hydrologists use data generation for many purposes, including, for example, reservoir sizing, planning and management of an existing reservoir, and assessing the reliability of a water resources system such as a water supply or irrigation system (Salas et al., 1980). Stochastic data generation can aid in making key management decisions, especially in critical situations such as extended drought periods (Frevert et al., 1989). The main philosophy behind synthetic data generation is to generate synthetic samples that preserve certain statistical properties of the natural hydrologic process (Lane and Frevert, 1990). As a result, each generated sample and the historic sample are equally likely to occur in the future; the historic sample is not more likely to occur than any of the generated samples (Lane and Frevert, 1990).

Generation of synthetic time series is based on the models, approaches, and schemes described above. Once the model has been defined and its parameters have been estimated, one can generate synthetic samples based on this model. SAMS allows the user to generate synthetic data and then compare important statistical characteristics of the historical and the generated data. Such a comparison is important for checking whether the model used in generation is adequate: if important historical and generated statistics are comparable, one can argue that the model is adequate. The generated data can be stored in files, which allows the user to further analyze them as needed. Furthermore, when data generation is based on spatial or temporal disaggregation, one may wish to adjust the generated data. This is often necessary to ensure that the disaggregated quantities add up to the original total. For example, spatial adjustments may be necessary if the annual flow at a key station must be exactly the sum of the annual flows at the corresponding substations. Likewise, in the case of temporal disaggregation, one may wish to ensure that the monthly values add up to the annual value. Various adjustment options are included in SAMS; spatial and temporal adjustments are described further in later sections of this manual.
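One simple way to enforce additivity is a proportional adjustment, sketched below under the assumption that all disaggregated values are positive. It is only one of several possible schemes and is not necessarily among the options implemented in SAMS; the function name is hypothetical.

```python
def proportional_adjustment(parts, total):
    """Rescale disaggregated values so that they add up exactly to the given total.

    parts: generated seasonal (or substation) values;
    total: the corresponding annual (or key-station) value.
    """
    s = sum(parts)
    if s <= 0:
        raise ValueError("proportional adjustment needs a positive sum of parts")
    return [p * total / s for p in parts]
```

For example, seasonal values [10, 20, 30, 40] with an annual value of 110 become [11, 22, 33, 44]. The same function applies spatially, with substation flows as `parts` and the key-station flow as `total`.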

Figure 2.25 shows the data generation menu. In this menu the user must specify the information needed for the generation process, for example, the length of the generated series, the number of samples to be generated, and whether the generated data or their statistics will be saved to files. Figure 2.26 shows the window for the adjustment, in which the user can choose a method for the spatial adjustment.

After generating data, the user can compare the generated data with the historical record by using the “Compare” function under the “Generate” menu. The comparison can be made for the basic statistics, drought statistics, autocorrelations, and time series plots. Figure 2.27 shows the menu for the comparison and the comparison of the basic statistics, and Figure 2.28 shows the comparison of the time series.
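The idea behind the comparison can be sketched as follows: generate many samples of the same length as the historical record and compare the historical statistics with the average of the generated ones. The AR(1) generating model, the three statistics chosen, and all function names are assumptions for illustration; SAMS compares a broader set of statistics.

```python
import math
import random


def generate_ar1(n, mean, phi, noise_var, rng, warmup=50):
    """Generate n values of x_t = mean + z_t, z_t = phi z_{t-1} + eps_t, eps_t ~ N(0, noise_var)."""
    sd_eps = math.sqrt(noise_var)
    z = rng.gauss(0.0, sd_eps / math.sqrt(1.0 - phi ** 2))  # stationary starting value
    out = []
    for t in range(n + warmup):
        z = phi * z + rng.gauss(0.0, sd_eps)
        if t >= warmup:
            out.append(mean + z)
    return out


def sample_stats(x):
    """Mean, standard deviation, and lag-1 serial correlation of a series."""
    n = len(x)
    m = sum(x) / n
    d = [v - m for v in x]
    ss = sum(v * v for v in d)
    r1 = sum(d[t] * d[t + 1] for t in range(n - 1)) / ss
    return m, math.sqrt(ss / (n - 1)), r1


def compare(historical, mean, phi, noise_var, n_samples=100, seed=1):
    """Historical statistics vs. the average statistics of generated samples of equal length."""
    rng = random.Random(seed)
    n = len(historical)
    gen = [sample_stats(generate_ar1(n, mean, phi, noise_var, rng))
           for _ in range(n_samples)]
    average = tuple(sum(s[i] for s in gen) / n_samples for i in range(3))
    return {"historical": sample_stats(historical), "generated_average": average}
```

Averaging over samples of the historical length matters: short samples have biased statistics (the lag-1 correlation, in particular, tends to be underestimated), so the fair comparison is between the historical value and the distribution of the generated values, not a single long synthetic record.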