Abrupt Temporary Impact. In Time Series, the abrupt temporary impact pattern implies an initial abrupt increase or decrease due to the intervention, which then slowly decays without permanently changing the mean of the series. This type of intervention can be summarized by the expressions:

Prior to intervention:      Impact(t) = 0
At time of intervention:    Impact(t) = ω
After intervention:         Impact(t) = δ*Impact(t-1)

Note that this impact pattern is again defined by the two parameters δ (delta) and ω (omega). As long as the δ parameter is greater than 0 and less than 1 (the bounds of system stability), the initial abrupt impact will gradually decay. If δ is near 0 (zero), then the decay will be very quick, and the impact will have entirely disappeared after only a few observations. If δ is close to 1, then the decay will be slow, and the intervention will affect the series over many observations. Note that, when evaluating a fitted model, it is again important that both parameters are statistically significant; otherwise one could reach paradoxical conclusions. For example, suppose the ω parameter is not statistically significantly different from 0 (zero) but the δ parameter is; this would mean that the intervention did not cause an initial abrupt change, which nevertheless then showed significant decay (a paradoxical conclusion).
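
The following minimal Python sketch (an illustration with assumed example values for ω and δ, not STATISTICA code) generates this impact pattern:

```python
# Abrupt temporary impact: jumps to omega at the intervention, then decays.
omega, delta = 5.0, 0.7        # assumed example values; 0 < delta < 1 for stability
onset = 10                     # hypothetical time of the intervention

impact = []
for t in range(30):
    if t < onset:
        impact.append(0.0)                  # prior to intervention
    elif t == onset:
        impact.append(omega)                # abrupt initial impact
    else:
        impact.append(delta * impact[-1])   # geometric decay toward 0
print(impact)
```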

Abrupt Permanent Impact. In Time Series, a permanent abrupt impact pattern simply implies that the overall mean of the time series shifted after the intervention; the overall shift is denoted by ω (omega).

Accept-Support (AS) Testing. In this type of statistical test, the statistical null hypothesis is the hypothesis which, if true, supports the experimenter's theoretical hypothesis. Consequently, in AS testing, the experimenter would prefer not to obtain "statistical significance."

In AS testing, accepting the null hypothesis supports the experimenter's theoretical hypothesis.

For more information see the chapter on Power Analysis.

Activation Function (in Neural Networks). A function used to transform the activation level of a unit (neuron) into an output signal. Typically, activation functions have a "squashing" effect. Together with the PSP function (which is applied first) this defines the unit type.

Neural Networks supports a wide range of activation functions. Only a few of these are used by default; the others are available for customization.

Identity. The activation level is passed on directly as the output. Used in a variety of network types, including linear networks, and the output layer of radial basis function networks.

Logistic. This is an S-shaped (sigmoid) curve, with output in the range (0,1).

Hyperbolic. The hyperbolic tangent function (tanh): a sigmoid curve, like the logistic function, except that output lies in the range (-1,+1). Often performs better than the logistic function because of its symmetry. Ideal for customization of multilayer perceptrons, particularly the hidden layers.

Exponential. The negative exponential function. Ideal for use with radial units. The combination of radial synaptic function and negative exponential activation function produces units that model a Gaussian (bell-shaped) function centered at the weight vector; the standard deviation of the Gaussian is determined by the "deviation" value d stored in the unit's threshold.

Softmax. Exponential function, with results normalized so that the sum of activations across the layer is 1.0. Can be used in the output layer of multilayer perceptrons for classification problems, so that the outputs can be interpreted as probabilities of class membership (Bishop, 1995; Bridle, 1990).

Unit sum. Normalizes the outputs to sum to 1.0. Used in PNNs to allow the outputs to be interpreted as probabilities.

Square root. Used to transform the squared distance activation in an SOFM network or Cluster network to the actual distance as an output.

Sine. Possibly useful for recognizing radially distributed data; not used by default.

Ramp. A piece-wise linear version of the sigmoid function. Relatively poor training performance, but fast execution.

Step. Outputs either 1.0 or 0.0, depending on whether the synaptic (PSP) value is positive or negative. Can be used to model simple networks such as perceptrons.

The mathematical definitions of the activation functions are given in the table below:

Activation Functions

Function       Definition                                         Range
Identity       x                                                  (-inf,+inf)
Logistic       1/(1+e^-x)                                         (0,+1)
Hyperbolic     (e^x - e^-x)/(e^x + e^-x)                          (-1,+1)
Exponential    e^-x                                               (0,+inf)
Softmax        e^x_i / sum_j(e^x_j)                               (0,+1)
Unit sum       x_i / sum_j(x_j)                                   (0,+1)
Square root    sqrt(x)                                            (0,+inf)
Sine           sin(x)                                             [0,+1]
Ramp           -1 if x <= -1;  x if -1 < x < +1;  +1 if x >= +1   [-1,+1]
Step           0 if x < 0;  +1 if x >= 0                          [0,+1]
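
For concreteness, the following minimal Python (NumPy) sketch implements several of these functions; it is an illustration, not the STATISTICA implementation:

```python
import numpy as np

def identity(x):
    return x                            # activation passed through unchanged

def logistic(x):
    return 1.0 / (1.0 + np.exp(-x))     # S-shaped, output in (0, +1)

def hyperbolic(x):
    return np.tanh(x)                   # sigmoid symmetric about 0, output in (-1, +1)

def neg_exponential(x):
    return np.exp(-x)                   # with radial units, yields a Gaussian response

def softmax(x):
    e = np.exp(x - np.max(x))           # shift by max for numerical stability
    return e / e.sum()                  # outputs sum to 1.0 across the layer

x = np.array([-2.0, 0.0, 1.0, 3.0])
print(logistic(x))
print(softmax(x), softmax(x).sum())     # softmax outputs sum to 1.0
```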

 

Additive Models. Additive models represent a generalization of Multiple Regression (which is a special case of general linear models). Specifically, in linear regression, a linear least-squares fit is computed for a set of predictor or X variables, to predict a dependent Y variable. The well-known linear regression equation with m predictors, to predict a dependent variable Y, can be stated as:

Y = b0 + b1*X1 + ... + bm*Xm

where Y stands for the (predicted values of the) dependent variable, X1 through Xm represent the m values for the predictor variables, and b0 and b1 through bm are the regression coefficients estimated by multiple regression. A generalization of the multiple regression model is to maintain the additive nature of the model, but to replace the simple terms of the linear equation bi*Xi with fi(Xi), where fi is a nonparametric function of the predictor Xi. In other words, instead of a single coefficient for each variable (additive term) in the model, in additive models an unspecified (non-parametric) function is estimated for each predictor, to achieve the best prediction of the dependent variable values.

For additional information, see Hastie and Tibshirani, 1990, or Schimek, 2000.
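
As a rough illustration of the idea (a minimal sketch in which an arbitrary nearest-neighbor mean smoother stands in for the nonparametric functions fi; not a production implementation of generalized additive models):

```python
import numpy as np

def smooth(x, r, frac=0.3):
    """Crude nearest-neighbor mean smoother standing in for an f_i (illustration only)."""
    k = max(1, int(frac * len(x)))
    fitted = np.empty_like(r)
    for i in range(len(x)):
        nearest = np.argsort(np.abs(x - x[i]))[:k]   # k nearest points along x
        fitted[i] = r[nearest].mean()
    return fitted

def backfit(X, y, n_iter=20):
    """Estimate Y = b0 + f1(X1) + ... + fm(Xm) by backfitting."""
    n, m = X.shape
    b0 = y.mean()
    f = np.zeros((n, m))
    for _ in range(n_iter):
        for j in range(m):
            others = [c for c in range(m) if c != j]
            partial = y - b0 - f[:, others].sum(axis=1)   # partial residual for X_j
            f[:, j] = smooth(X[:, j], partial)
            f[:, j] -= f[:, j].mean()                     # center f_j for identifiability
    return b0, f

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = np.sin(X[:, 0]) + X[:, 1] ** 2 + rng.normal(0, 0.1, 200)
b0, f = backfit(X, y)    # f[:, j] approximates the j-th additive component
```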

Additive Season, Damped Trend. In this Time Series model, the simple exponential smoothing forecasts are "enhanced" both by a damped trend component (independently smoothed with the single parameter φ (phi); this model is an extension of Brown's one-parameter linear model, see Gardner, 1985, p. 12-13) and an additive seasonal component (smoothed with parameter δ (delta)). For example, suppose we wanted to forecast from month to month the number of households that purchase a particular consumer electronics device (e.g., VCR). Every year, the number of households that purchase a VCR will increase; however, this trend will be damped (i.e., the upward trend will slowly disappear) over time as the market becomes saturated. In addition, there will be a seasonal component, reflecting the seasonal changes in consumer demand for VCRs from month to month (demand will likely be smaller in the summer and greater during the December holidays). This seasonal component may be additive; for example, a relatively stable number of additional households may purchase VCRs during the December holiday season. To compute the smoothed values for the first season, initial values for the seasonal components are necessary. Also, to compute the smoothed value (forecast) for the first observation in the series, both estimates of S0 and T0 (initial trend) are necessary. By default, these values are computed as:

T0 = (1/φ)*(Mk-M1)/[(k-1)*p]

where
φ      is the smoothing parameter
k      is the number of complete seasonal cycles
Mk     is the mean for the last seasonal cycle
M1     is the mean for the first seasonal cycle
p      is the length of the seasonal cycle
and S0 = M1 - p*T0/2
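
A minimal Python sketch of these default initial-value computations (the series and the value of φ are assumed examples; the analogous defaults for the exponential and linear trend models below differ only in the formulas):

```python
import numpy as np

def initial_values(y, p, phi):
    """Default T0 and S0 for the additive season, damped trend model (sketch)."""
    k = len(y) // p                        # number of complete seasonal cycles
    m1 = np.mean(y[:p])                    # mean of the first seasonal cycle
    mk = np.mean(y[(k - 1) * p : k * p])   # mean of the last complete cycle
    t0 = (1.0 / phi) * (mk - m1) / ((k - 1) * p)
    s0 = m1 - p * t0 / 2.0
    return t0, s0

# Hypothetical example: 4 years of monthly data, phi = 0.9 (assumed).
y = np.arange(48.0) + np.random.default_rng(0).normal(0.0, 1.0, 48)
print(initial_values(y, p=12, phi=0.9))
```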

Additive Season, Exponential Trend. In this Time Series model, the simple exponential smoothing forecasts are "enhanced" both by an exponential trend component (independently smoothed with parameter γ (gamma)) and an additive seasonal component (smoothed with parameter δ (delta)). For example, suppose we wanted to forecast the monthly revenue for a resort area. Every year, revenue may increase by a certain percentage or factor, resulting in an exponential trend in overall revenue. In addition, there could be an additive seasonal component, for example, a particular fixed (and slowly changing) amount of added revenue during the December holidays.

To compute the smoothed values for the first season, initial values for the seasonal components are necessary. Also, to compute the smoothed value (forecast) for the first observation in the series, both estimates of S0 and T0 (initial trend) are necessary. By default, these values are computed as:

T0 = exp((log(M2) - log(M1))/p)

where
M2    is the mean for the second seasonal cycle
M1    is the mean for the first seasonal cycle
p       is the length of the seasonal cycle
and S0 = exp(log(M1) - p*log(T0)/2)

Additive Season, Linear Trend. In this Time Series model, the simple exponential smoothing forecasts are "enhanced" both by a linear trend component (independently smoothed with parameter γ (gamma)) and an additive seasonal component (smoothed with parameter δ (delta)). For example, suppose we were to predict the monthly budget for snow removal in a community. There may be a trend component (as the community grows, there is a steady upward trend in the cost of snow removal from year to year). At the same time, there is obviously a seasonal component, reflecting the differential likelihood of snow during different months of the year. This seasonal component could be additive, meaning that a particular fixed additional amount of money is necessary during the winter months, or (see below) multiplicative, that is, the respective budget figure may increase by a factor of, for example, 1.4 during particular winter months.

To compute the smoothed values for the first season, initial values for the seasonal components are necessary. Also, to compute the smoothed value (forecast) for the first observation in the series, both estimates of S0 and T0 (initial trend) are necessary. By default, these values are computed as:

T0 = (Mk-M1)/[(k-1)*p]

where
k       is the number of complete seasonal cycles
Mk    is the mean for the last seasonal cycle
M1    is the mean for the first seasonal cycle
p       is the length of the seasonal cycle
and S0 = M1 - T0/2

Additive Season, No Trend. This Time Series model is partially equivalent to the simple exponential smoothing model; however, in addition, each forecast is "enhanced" by an additive seasonal component that is smoothed independently (see the seasonal smoothing parameter δ (delta)). This model would, for example, be adequate when computing forecasts for the monthly expected amount of rain. The amount of rain will be stable from year to year, or change only slowly. At the same time, there will be seasonal changes ("rainy seasons"), which again may change slowly from year to year.

To compute the smoothed values for the first season, initial values for the seasonal components are necessary. The initial smoothed value S0 will by default be computed as the mean for all values included in complete seasonal cycles.
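
For illustration, the recursions for this kind of model can be sketched in Python as follows; this is the standard additive-seasonal exponential smoothing recursion with assumed parameter and initial values, not necessarily the exact implementation in any particular package:

```python
def additive_season_no_trend(y, p, alpha, delta, s0, seasonal0):
    """One-step-ahead forecasts; alpha and delta are assumed smoothing parameters."""
    level = s0                     # initial smoothed value S0
    season = list(seasonal0)       # p initial seasonal components
    forecasts = []
    for t, obs in enumerate(y):
        i = t % p
        forecasts.append(level + season[i])                     # forecast for time t
        new_level = alpha * (obs - season[i]) + (1 - alpha) * level
        season[i] = delta * (obs - new_level) + (1 - delta) * season[i]
        level = new_level
    return forecasts

# Made-up quarterly series (p = 4) with a stable seasonal pattern.
y = [10.0, 14.0, 8.0, 12.0] * 3
print(additive_season_no_trend(y, p=4, alpha=0.3, delta=0.2,
                               s0=11.0, seasonal0=[-1.0, 3.0, -3.0, 1.0]))
```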

Adjusted means. These are the means that one would get after removing all differences that can be accounted for by the covariate in an analysis of variance design (see ANOVA).

The general formula (see Kerlinger & Pedhazur, 1973, p. 272) is

Y-barj(adj) = Y-barj - b(X-barj - X-bar)

where
Y-barj(adj)  the adjusted mean of group j;
Y-barj      the mean of group j before adjustment;
b         the common regression coefficient;
X-barj      the mean of the covariate for group j;
X-bar      the grand mean of the covariate.
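
A short worked example in Python (the group means, covariate means, and common regression coefficient b are made-up values):

```python
b = 0.8                         # assumed common regression coefficient
grand_x = 50.0                  # grand mean of the covariate
groups = {                      # group: (Y-bar_j, X-bar_j), made-up values
    "A": (20.0, 45.0),
    "B": (25.0, 50.0),
    "C": (30.0, 55.0),
}
for name, (y_bar, x_bar) in groups.items():
    adjusted = y_bar - b * (x_bar - grand_x)   # Y-bar_j(adj) = Y-bar_j - b*(X-bar_j - X-bar)
    print(name, adjusted)       # A: 24.0, B: 25.0, C: 26.0
```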

See also categorical predictor variable, covariates; see also General Linear Models or ANOVA/MANOVA.

AID. AID (Automatic Interaction Detection) is a classification program developed by Morgan & Sonquist (1963) that led to the development of the THAID (Morgan & Messenger, 1973) and CHAID (Kass, 1980) classification tree programs. These programs perform multi-level splits when computing classification trees. For a discussion of how AID differs from other classification tree programs, see A Brief Comparison of Classification Tree Programs.

Akaike Information Criterion (AIC). When a model involving q parameters is fitted to data, the criterion is defined as -2Lq + 2q, where Lq is the maximized log likelihood. Akaike suggested minimizing the criterion to choose between models with different numbers of parameters. It was originally proposed for time-series models, but is also used in regression. The Akaike Information Criterion (AIC) can be used in Generalized Linear/Nonlinear Models (GLZ) when comparing subsets of effects during best subset regression. Since the evaluation of the score statistic does not require iterative computations, best subset selection based on the score statistic is computationally faster, while selection based on the AIC statistic usually provides more accurate results.
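
A quick illustration in Python (the log-likelihood values are made up; smaller AIC is better):

```python
def aic(log_likelihood, q):
    """Akaike Information Criterion: -2*Lq + 2*q."""
    return -2.0 * log_likelihood + 2.0 * q

# Hypothetical candidate models: (maximized log likelihood, number of parameters).
candidates = {"model_1": (-120.4, 3), "model_2": (-118.9, 5), "model_3": (-118.7, 8)}
for name, (ll, q) in candidates.items():
    print(name, round(aic(ll, q), 1))
best = min(candidates, key=lambda m: aic(*candidates[m]))
print("preferred:", best)       # the model with the minimum AIC
```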

Algorithm. As opposed to heuristics (which contain general recommendations based on statistical evidence or theoretical reasoning), algorithms are completely defined, finite sets of steps, operations, or procedures that will produce a particular outcome. For example, with a few exceptions, all computer programs, mathematical formulas, and (ideally) medical and food recipes are algorithms.

See also, Data Mining, Neural Networks, heuristic.

Anderson-Darling Test. The Anderson-Darling procedure is a general test to compare the fit of an observed cumulative distribution function to an expected cumulative distribution function. This test is applicable to complete data sets (without censored observations). The critical values for the Anderson-Darling statistic have been tabulated (see, for example, Dodson, 1994, Table 4.4) for sample sizes between 10 and 40; this test is not computed for n less than 10 or greater than 40.

The Anderson-Darling test is used in Weibull and Reliability/Failure Time Analysis; see also, Mann-Scheuer-Fertig Test and Hollander-Proschan Test.
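
As an aside, SciPy provides a ready-made Anderson-Darling test against several reference distributions; a minimal example with random data and a normal reference distribution (a general-purpose routine, not the Weibull-analysis procedure described above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=30)   # complete sample, 10 <= n <= 40

result = stats.anderson(x, dist="norm")        # Anderson-Darling vs. the normal
print(result.statistic)                        # observed A-squared statistic
print(result.critical_values)                  # tabulated critical values
print(result.significance_level)               # corresponding significance levels (%)
```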

Append a network. A function to allow two neural networks (with compatible output and input layers) to be joined into a single network.

Append Cases and/or Variables. Functions that add new cases (i.e., rows of data) and/or variables (i.e., columns of data) to the end of the data set (the "bottom" or the right hand side, respectively). Cases and Variables can also be inserted in arbitrary locations of the data set.

Application Programming Interface (API). An Application Programming Interface is a set of functions, conforming to the conventions of a particular operating system (e.g., Windows), that allows the user to programmatically access the functionality of another program. For example, the kernel of STATISTICA Neural Networks can be accessed from other programs and programming environments (e.g., Visual Basic, STATISTICA BASIC, Delphi, C, C++) in a variety of ways.

Arrow. An element in a path diagram used to indicate causal flow from one variable to another or, in a narrower interpretation, to show which of two variables in a linear equation is the independent variable and which is the dependent variable.

Assignable Causes and Actions. In the context of monitoring quality characteristics, you have to distinguish between two different types of variability: Common cause variation describes random variability that is inherent in the process and affects all individual values. Ideally, when your process is in control, only common cause variation will be present. In a quality control chart, it will show up as random fluctuation of the individual samples around the center line, with all samples falling between the upper and lower control limits and no non-random patterns (runs) of adjacent samples. Special cause or assignable cause variation is due to specific circumstances that can be accounted for. It will usually show up in the QC chart as outlier samples (i.e., samples exceeding the lower or upper control limit) or as a systematic pattern (run) of adjacent samples. It will also affect the calculation of the chart specifications (center line and control limits).

With some software programs, if you investigate the out-of-control conditions and find an explanation for them, you can assign descriptive labels to those out-of-control samples that explain the causes (e.g., valve defect) and the actions that have been taken (e.g., valve fixed). Having causes and actions displayed in the chart will document that the center line and the control limits of the chart are affected by special cause variation in the process.

Association Rules. Data mining for association rules is often the first and most useful method for analyzing data that describe transactions, lists of items, unique phrases (in text mining), etc. In general, association rules take the form If Body then Head, where Body and Head stand for simple codes, text values, items, consumer choices, phrases, etc., or conjunctions of such codes and text values (e.g., if (Car=Porsche and Age<20 and ThrillSeeking=High) then (Risk=High and Insurance=High); here the logical conjunction before the then is the Body, and the logical conjunction following the then is the Head of the association rule). The A-priori algorithm (see Agrawal and Swami, 1993; Agrawal and Srikant, 1994; Han and Lakshmanan, 2001; see also Witten and Frank, 2000) is a popular and efficient algorithm for deriving such association rules from large data sets, based on user-defined "threshold" values for the rules (e.g., minimum support and confidence).
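
A minimal Python sketch of the support and confidence computations that underlie such rules (the transactions are made up, and this is not the full A-priori algorithm, which additionally prunes the search over candidate itemsets):

```python
transactions = [
    {"Car=Porsche", "Age<20", "Risk=High"},
    {"Car=Porsche", "Age<20", "Risk=High", "Insurance=High"},
    {"Car=Sedan", "Age>40", "Risk=Low"},
    {"Car=Porsche", "Age>40", "Risk=Low"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(body, head):
    """Estimated P(Head | Body): support of Body-and-Head over support of Body."""
    return support(body | head) / support(body)

body, head = {"Car=Porsche", "Age<20"}, {"Risk=High"}
print(support(body | head))     # rule support: 0.5
print(confidence(body, head))   # rule confidence: 1.0
```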

Asymmetrical Distribution. If you split the distribution in half at its mean (or median), then the distribution of values on the two sides of this central point would not be the same (i.e., not symmetrical) and the distribution would be considered "skewed."

See also, Descriptive Statistics Overview.

Attribute (attribute variable). An alternative name for a nominal variable.

Augmented Product Moment Matrix. For a set of p variables, this is a (p + 1) x (p + 1) square matrix. The first p rows and columns contain the matrix of moments about zero, while the last row and column contain the sample means for the p variables. The matrix is therefore of the form:

A = | M       X-bar |
    | X-bar'  1     |

where M is the p x p matrix of moments about zero, with element m(i,j) equal to the mean of the products Xi*Xj across the n cases, and X-bar is the vector with the means of the variables (see Structural Equation Modeling).
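
A minimal NumPy sketch of this construction (with a made-up data matrix):

```python
import numpy as np

X = np.array([[1.0, 2.0],      # n = 3 cases, p = 2 variables (made-up data)
              [2.0, 3.0],
              [3.0, 5.0]])
n, p = X.shape

M = X.T @ X / n                # p x p matrix of moments about zero
means = X.mean(axis=0)         # sample means of the p variables

A = np.empty((p + 1, p + 1))
A[:p, :p] = M                  # moments in the first p rows/columns
A[:p, p] = means               # last column: the means
A[p, :p] = means               # last row: the means
A[p, p] = 1.0
print(A)
```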

Autoassociative Network. A neural network (usually a multilayer perceptron) designed to reproduce its inputs at its outputs, while "squeezing" the data through a lower-dimensionality middle layer. Used for compression or dimensionality reduction purposes (see Fausett, 1994; Bishop, 1995).

Automatic Network Designer. A heuristic algorithm (implemented in STATISTICA Neural Networks) that experimentally determines an appropriate network architecture to fit a specified data set.



