Shown below is an Analytics Glossary with concepts commonly associated with Analytics along with related fields such as Statistics, Operations Research, and Economic Theory.


Additionally, all of these definitions are directly pertinent to the Certified Analytics Professional (CAP) exam. This webpage can serve as one of your primary study aids for this exam.


  1. 5 Whys – an iterative interrogative technique used to explore the cause-and-effect relationships underlying a particular problem
  2. 5S – workplace organization method promoting efficiency and effectiveness; five terms based on Japanese words for: sorting, set in order, systematic cleaning, standardizing, and sustaining
  3. 80/20 Rule (Pareto Principle) – states that, for many events, roughly 80% of the effects come from 20% of the causes
  4. Acceptance Sampling – uses statistical sampling to determine whether to accept or reject a production lot of material
  5. Activity-Based Costing (ABC) – a costing method that assigns overhead and indirect costs to related products and services
  6. Adjusted Coefficient of Determination (Adjusted R²) – the proportion of the variance in the dependent variable that is predictable from the independent variable(s), adjusted for the number of independent variables in the model
  7. Agent-based Modeling – a class of computational models for simulating the actions and interactions of autonomous agents (either individual or collective entities such as organizations or groups) with a view to assessing their effects on the system as a whole
  8. Akaike Information Criterion (AIC) – an estimator of out-of-sample prediction error and thereby relative quality of statistical models for a given set of data. Given a collection of models for the data, AIC estimates the quality of each model, relative to each of the other models. Thus, AIC provides a means for model selection.
  9. Amortization – an accounting technique used to periodically lower the book value of a loan or intangible asset over a set period of time
  10. Analysis of Variance (ANOVA) – a collection of statistical models and their associated estimation procedures (such as the “variation” among and between groups) used to analyze the differences among group means in a sample
  11. Analytics – scientific process of transforming data into insight for making better decisions (INFORMS)
  12. Anchoring Bias – occurs when individuals use an initial piece of information to make subsequent judgments. Once an anchor is set, other judgments are made by adjusting away from that anchor, and there is a bias toward interpreting other information around the anchor.
  13. Analysis of Covariance (ANCOVA) – a general linear model which blends ANOVA and regression. ANCOVA evaluates whether the means of a dependent variable (DV) are equal across levels of a categorical independent variable (IV) often called a treatment, while statistically controlling for the effects of other continuous variables that are not of primary interest, known as covariates (CV) or nuisance variables. Mathematically, ANCOVA decomposes the variance in the DV into variance explained by the CV(s), variance explained by the categorical IV, and residual variance. Intuitively, ANCOVA can be thought of as ‘adjusting’ the DV by the group means of the CV(s).
  14. Autoregressive Integrated Moving Average (ARIMA) – a generalization of an autoregressive moving average (ARMA) model. Both of these models are fitted to time series data either to better understand the data or to predict future points in the series (forecasting).
  15. Artificial Intelligence – intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans. Leading AI textbooks define the field as the study of “intelligent agents”: any device that perceives its environment and takes actions that maximize its chance of successfully achieving its goals.
  16. Artificial Neuron – a mathematical function conceived as a model of biological neurons. The artificial neuron receives one or more inputs (representing excitatory postsynaptic potentials and inhibitory postsynaptic potentials at neural dendrites) and sums them to produce an output (or activation, representing a neuron’s action potential which is transmitted along its axon). Usually each input is separately weighted, and the sum is passed through a non-linear function known as an activation function or transfer function.
  17. Artificial Neural Networks – computing systems vaguely inspired by the biological neural networks that constitute animal brains. Such systems “learn” to perform tasks by considering examples, generally without being programmed with task-specific rules. For example, in image recognition, they might learn to identify images that contain cats by analyzing example images that have been manually labeled as “cat” or “no cat” and using the results to identify cats in other images.
  18. Assemble to Order (ATO) – a business production strategy where products ordered by customers are produced quickly and are customizable to a certain extent.
  19. Assignment Problem – a fundamental combinatorial optimization problem. It consists of finding, in a weighted bipartite graph, a matching of a given size, in which the sum of weights of the edges is a minimum. In its most general form, the problem is as follows: The problem instance has a number of agents and a number of tasks. Any agent can be assigned to perform any task, incurring some cost that may vary depending on the agent-task assignment. It is required to perform as many tasks as possible by assigning at most one agent to each task and at most one task to each agent, in such a way that the total cost of the assignment is minimized.
  20. Association Rule Mining – consists of first finding frequent item-sets (sets of items, such as A and B, satisfying a minimum support threshold, or percentage of the task relevant tuples), from which strong association rules in the form of A=>B are generated. These rules also satisfy a minimum confidence threshold (a pre-specified probability of satisfying B under the condition that A is satisfied). Associations can be further analyzed to uncover correlation rules, which convey statistical correlations between item-sets A and B.
  21. Autocorrelation (Serial Correlation) – the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is the similarity between observations as a function of the time lag between them. The analysis of autocorrelation is a mathematical tool for finding repeating patterns, such as the presence of a periodic signal obscured by noise, or identifying the missing fundamental frequency in a signal implied by its harmonic frequencies. It is often used in signal processing for analyzing functions or series of values, such as time domain signals.
  22. Autoregressive Modeling – a technique used to forecast time series with autocorrelation. A first-order autocorrelation refers to the association between consecutive values in a time series. A second-order autocorrelation refers to the relationship between values that are two periods apart. A pth-order autocorrelation refers to the correlation between values in a time series that are p periods apart. An autoregressive model specifies that the output variable depends linearly on its own previous values and on a stochastic term (an imperfectly predictable term); thus the model is in the form of a stochastic difference equation.
  23. Backpropagation – an algorithm for supervised learning of artificial neural networks using gradient descent. Given an artificial neural network and an error function, the method calculates the gradient of the error function with respect to the neural network’s weights. It is a generalization of the delta rule for perceptrons to multilayer feedforward neural networks.
  24. Basic Variable – in linear programming, a variable that is not fixed at zero in a basic solution; the remaining (nonbasic) variables are set to zero
  25. Bayes’ Theorem – describes the probability of an event, based on prior knowledge of conditions that might be related to the event. For example, if cancer is related to age, then, using Bayes’ theorem, a person’s age can be used to more accurately assess the probability that they have cancer than can be done without knowledge of the person’s age.
  26. Benchmark Problems – comparison of different algorithms using a large test set; when an algorithm is evaluated, we must look for the kind of problems where its performance is good, in order to characterize the type of problems for which the algorithm is suitable
  27. Beta Distribution – a family of continuous probability distributions defined on the interval [0, 1] parametrized by two positive shape parameters, denoted by α and β, that appear as exponents of the random variable and control the shape of the distribution. It is a special case of the Dirichlet distribution.
  28. Bias – the tendency of a measurement process to over- or under-estimate the value of a population parameter. In survey sampling, for example, bias would be the tendency of a sample statistic to systematically over- or under-estimate a population parameter. More formally, it is a feature of a statistical technique or of its results whereby the expected value of the results differs from the true underlying quantitative parameter being estimated; equivalently, the mean amount of over-estimation or under-estimation.
  29. Big Data – a field that treats ways to analyze, systematically extract information from, or otherwise deal with data sets that are too large or complex to be dealt with by traditional data-processing application software. Data with many cases (rows) offer greater statistical power, while data with higher complexity (more attributes or columns) may lead to a higher false discovery rate.
  30. Bimodal Distribution – a continuous probability distribution with two different modes. These appear as distinct peaks (local maxima) in the probability density function
  31. Binary Attributes – a nominal attribute with only two categories or states: 0 or 1, where 0 typically means that the attribute is absent, and 1 means that it is present. Binary attributes are referred to as Boolean if the two states correspond to true and false.
  32. Binary Distribution (Bernoulli Distribution) – the discrete probability distribution of a random variable which takes the value 1 with probability p and the value 0 with probability q = 1 – p; that is, the probability distribution of any single experiment that asks a yes–no question. The question results in a boolean-valued outcome, a single bit whose value is success/yes/true/one with probability p and failure/no/false/zero with probability q.
  33. Binary Integer Programming – an integer programming problem in which all of the decision variables are restricted to the values 0 or 1; branch and bound can be customized for this situation
  34. Bootstrapping – any test or metric that relies on random sampling with replacement. Bootstrapping allows assigning measures of accuracy (defined in terms of bias, variance, confidence intervals, prediction error or some other such measure) to sample estimates.
  35. Bounded Solution – a function f defined on some set X with real or complex values is called bounded if the set of its values is bounded. In other words, there exists a real number M such that |f(x)| ≤ M for all x in X
  36. Box-Jenkins Method – refers to a systematic method of identifying, fitting, checking, and using autoregressive integrated moving average (ARIMA) time series models. The method is appropriate for time series of medium to long length (at least 50 observations).
  37. Branch and Bound – an algorithm design paradigm for discrete and combinatorial optimization problems, as well as mathematical optimization. A branch-and-bound algorithm consists of a systematic enumeration of candidate solutions by means of state space search: the set of candidate solutions is thought of as forming a rooted tree with the full set at the root. The algorithm explores branches of this tree, which represent subsets of the solution set. Before enumerating the candidate solutions of a branch, the branch is checked against upper and lower estimated bounds on the optimal solution, and is discarded if it cannot produce a better solution than the best one found so far by the algorithm.
  38. Business Analytics – the skills, technologies, and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning. Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods.
  39. Business Case – captures the reasoning for initiating a project or task
  40. Business Intelligence – comprises the strategies and technologies used by enterprises for the data analysis of business information. BI technologies provide historical, current and predictive views of business operations.
  41. C4.5 Algorithm – an algorithm used to generate a decision tree. The decision trees generated by C4.5 can be used for classification, and for this reason, C4.5 is often referred to as a statistical classifier.
  42. Canopy Clustering – an unsupervised pre-clustering algorithm that is often used as a preprocessing step for the K-means algorithm or the hierarchical clustering algorithm. It is intended to speed up clustering operations on large data sets, where using another algorithm directly may be impractical due to the size of the data set.
  43. Classification And Regression Tree (CART) – a term used to describe decision tree algorithms that are used for classification and regression learning tasks
  44. C-Chart – a type of control chart used to monitor “count”-type data, typically total number of nonconformities per unit. It is also occasionally used to monitor the total number of events occurring in a given unit of time.
  45. Central Limit Theorem – establishes that, in some situations, when independent random variables are added, their properly normalized sum tends toward a normal distribution (informally a “bell curve”) even if the original variables themselves are not normally distributed. The theorem is a key concept in probability theory because it implies that probabilistic and statistical methods that work for normal distributions can be applicable to many problems involving other types of distributions.
  46. Centroid–Based Methods – clusters are represented by a central vector, which may not necessarily be a member of the data set. When the number of clusters is fixed to k, k-means clustering gives a formal definition as an optimization problem: find the k cluster centers and assign the objects to the nearest cluster center, such that the squared distances from the cluster centers are minimized.
  47. Chief Analytics Officer (CAO) – a possible title for the person overseeing analytics for a company; responsibilities may include mobilizing data, people, and systems for successful deployment, working with others to inject analytics into company strategy and decisions, supervising activities of analytical people, consulting with internal business functions and units so they may take advantage of analytics, and contracting with external providers of analytics (Davenport, Enterprise Analytics, p. 173)
  48. Chi-Square Distribution – used in the common chi-square tests for goodness of fit of an observed distribution to a theoretical one, the independence of two criteria of classification of qualitative data, and in confidence interval estimation for a population standard deviation of a normal distribution from a sample standard deviation.
  49. Chi-Square Test (χ² test) – a statistical measure of difference used to compare the observed and estimated covariance matrices. It is the only measure that has a direct statistical test as to its significance, and it forms the basis for many other goodness-of-fit measures.
  50. Chi-squared Automated Interaction Detection (CHAID) – a decision tree technique, based on adjusted significance testing (Bonferroni testing); often used in the context of direct marketing to select groups of consumers and predict how their responses to some variables affect other variables
  51. Classification – a form of data analysis that extracts models describing data classes. A classifier, or classification model, predicts categorical labels (classes). Numeric prediction models continuous-valued functions. Classification and numeric prediction are the two major types of prediction problems.
  52. Cluster Sampling – a sampling plan used when mutually homogeneous yet internally heterogeneous groupings are evident in a statistical population. It is often used in marketing research. In this sampling plan, the total population is divided into these groups (known as clusters) and a simple random sample of the groups is selected. The elements in each cluster are then sampled. If all elements in each sampled cluster are sampled, then this is referred to as a “one-stage” cluster sampling plan. If a simple random subsample of elements is selected within each of these groups, this is referred to as a “two-stage” cluster sampling plan. A common motivation for cluster sampling is to reduce the total number of interviews and costs given the desired accuracy. For a fixed sample size, the expected random error is smaller when most of the variation in the population is present internally within the groups, and not between the groups.
  53. Clustering – grouping of a set of objects in such a way that objects in the same group (cluster) are more similar to each other than to those in other groups or clusters
  54. Coefficient of Variation – a standardized measure of dispersion of a probability distribution or frequency distribution. It is often expressed as a percentage, and is defined as the ratio of the standard deviation σ to the mean μ (or its absolute value)
  55. Coefficient of Determination (or R-squared (R²)) – a statistical measure that represents the proportion of the variance for a dependent variable that’s explained by an independent variable or variables in a regression model. Whereas correlation explains the strength of the relationship between an independent and dependent variable, R-squared explains to what extent the variance of one variable explains the variance of the second variable. So, if the R² of a model is 0.50, then approximately half of the observed variation can be explained by the model’s inputs. In investing, R-squared is generally interpreted as the percentage of a fund or security’s movements that can be explained by movements in a benchmark index. For example, an R-squared for a fixed-income security versus a bond index identifies the security’s proportion of price movement that is predictable based on a price movement of the index. The same can be applied to a stock versus the S&P 500 index, or any other relevant index.
  56. Collinearity (or Multicollinearity) – the expression of the relationship between two (collinearity) or more (multicollinearity) independent variables. Two independent variables are said to exhibit complete collinearity if their correlation coefficient is 1, and complete lack of collinearity if their correlation coefficient is 0. Multicollinearity occurs when any single independent variable is highly correlated with a set of other independent variables. An extreme case of collinearity/multicollinearity is singularity, in which an independent variable is perfectly predicted (i.e., correlation of 1.0) by another independent variable (or more than one).
  57. Combinatorial Optimization – a topic that consists of finding an optimal object from a finite set of objects. In many such problems, exhaustive search is not tractable. It operates on the domain of those optimization problems in which the set of feasible solutions is discrete or can be reduced to discrete, and in which the goal is to find the best solution. Some common problems involving combinatorial optimization are the travelling salesman problem (“TSP”), the minimum spanning tree problem (“MST”), and the knapsack problem.
  58. Concordance Correlation Coefficient – measures the agreement between two variables, e.g., to evaluate reproducibility or for inter-rater reliability.
  59. Confidence Interval – a type of estimate computed from the statistics of the observed data. This proposes a range of plausible values for an unknown parameter. The interval has an associated confidence level, chosen by the investigator. More precisely, the confidence level is the long-run proportion of intervals, constructed in the same way from repeated samples, that would contain the true parameter; it is not the probability that the parameter lies in one particular computed interval. In general terms, a confidence interval for an unknown parameter is based on the sampling distribution of a corresponding estimator.
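  60. Confidence Level – the proportion of confidence intervals, constructed over repeated sampling, that would contain the true value of a parameter; often loosely described as the probability that the value of a parameter falls within a specified range of values.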
  61. Confirmatory Factor Analysis – a special form of factor analysis, most commonly used in social research. It is used to test whether measures of a construct are consistent with a researcher’s understanding of the nature of that construct (or factor). As such, the objective of confirmatory factor analysis is to test whether the data fit a hypothesized measurement model. This hypothesized model is based on theory and/or previous analytic research.
  62. Confusion Matrix – used to evaluate a classifier’s quality. For a two-class problem, it shows the true positives, true negatives, false positives, and false negatives. Measures that assess a classifier’s predictive ability include accuracy, sensitivity (also known as recall), specificity, and precision. Reliance on the accuracy measure can be deceiving when the main class of interest is in the minority.
  63. Conjoint Analysis – a survey-based statistical technique used in market research that helps determine how people value different attributes (feature, function, benefits) that make up an individual product or service. The objective of conjoint analysis is to determine what combination of a limited number of attributes is most influential on respondent choice or decision making.
  64. Consistent Estimators – an estimator (a rule for computing estimates of a parameter θ₀) having the property that as the number of data points used increases indefinitely, the resulting sequence of estimates converges in probability to θ₀. This means that the distributions of the estimates become more and more concentrated near the true value of the parameter being estimated, so that the probability of the estimator being arbitrarily close to θ₀ converges to one.
  65. Constant Failure Rate – a concept stating that failure rates of electronic components remain constant during the useful life of the component
  66. Constraint – a condition of an optimization problem that the solution must satisfy. There are several types of constraints, primarily equality constraints, inequality constraints, and integer constraints. The set of candidate solutions that satisfy all constraints is called the feasible set.
  67. Control Chart – a graph used to study how a process changes over time. Data are plotted in time order. A control chart always has a central line for the average, an upper line for the upper control limit, and a lower line for the lower control limit. These lines are determined from historical data. By comparing current data to these lines, you can draw conclusions about whether the process variation is consistent (in control) or is unpredictable (out of control, affected by special causes of variation).
  68. Control Limit – horizontal lines drawn on a statistical process control chart, usually at a distance of ±3 standard deviations of the plotted statistic from the statistic’s mean.
  69. Convenience Sampling – a type of non-probability sampling method where the sample is taken from a group of people easy to contact or to reach. For example, standing at a mall or a grocery store and asking people to answer questions would produce a convenience sample.
  70. Coordinate Transforms – refers to the rotation of coordinate axes and the modification of components in a coordinate system
  71. Correlation – any statistical relationship, whether causal or not, between two random variables or bivariate data. In the broadest sense correlation is any statistical association, though it commonly refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the physical statures of parents and their offspring, and the correlation between the demand for a limited supply product and its price.
  72. Correlation Coefficient (r) – a coefficient that indicates the strength of the association between any two metric variables. The sign (+ or -) indicates the direction of the relationship. The value can range from +1 to -1, with +1 indicating a perfect positive relationship, 0 indicating no relationship, and -1 indicating a perfect negative or reverse relationship (as one variable grows larger, the other variable grows smaller).
  73. Cost of Capital – the cost of a company’s funds (both debt and equity), or, from an investor’s point of view “the required rate of return on a portfolio company’s existing securities”. It is used to evaluate new projects of a company. It is the minimum return that investors expect for providing capital to the company, thus setting a benchmark that a new project has to meet.
  74. Covariance – a measure of the joint variability of two random variables. If the greater values of one variable mainly correspond with the greater values of the other variable, and the same holds for the lesser values, (i.e., the variables tend to show similar behavior), the covariance is positive. In the opposite case, when the greater values of one variable mainly correspond to the lesser values of the other, (i.e., the variables tend to show opposite behavior), the covariance is negative. The sign of the covariance therefore shows the tendency in the linear relationship between the variables. The magnitude of the covariance is not easy to interpret because it is not normalized and hence depends on the magnitudes of the variables. The normalized version of the covariance, the correlation coefficient, however, shows by its magnitude the strength of the linear relation.
  75. Criterion of Realism – a decision rule that attempts to make a tradeoff between complete risk indifference (as in the Maximax rule) and total risk aversion (as in the Maximin rule). With this procedure, the decision maker decides how much emphasis to put on each extreme.
  76. Critical Path Method (CPM) – an algorithm for scheduling a set of project activities. It is commonly used in conjunction with the program evaluation and review technique (PERT). A critical path is determined by identifying the longest stretch of dependent activities and measuring the time required to complete them from start to finish.
  77. Cronbach’s Alpha – a measure of internal consistency, that is, how closely related a set of items are as a group. It is considered to be a measure of scale reliability.
  78. Cross-Correlation – refers to the correlations between the entries of two random vectors X and Y
  79. Cross Tabulation and Contingency Table – a type of table in a matrix format that displays the (multivariate) frequency distribution of the variables. They are heavily used in survey research, business intelligence, engineering and scientific research. They provide a basic picture of the interrelation between two variables and can help find interactions between them.
  80. Cumulative Distribution Function (sometimes called Cumulative Density Function) – the cumulative distribution function of a real-valued random variable X, or just distribution function of X, evaluated at x, is the probability that X will take a value less than or equal to x
  81. Cutting Stock Problem – the problem of cutting standard-sized pieces of stock material, such as paper rolls or sheet metal, into pieces of specified sizes while minimizing material wasted. It is an optimization problem in mathematics that arises from applications in industry. In terms of computational complexity, the problem is an NP-hard problem reducible to the knapsack problem. The problem can be formulated as an integer linear programming problem.
  82. Cyclic Component – the cyclical component of a time series refers to (regular or periodic) fluctuations around the trend, excluding the irregular component, revealing a succession of phases of expansion and contraction. It can be viewed as those fluctuations in a time series which are longer than a given threshold, e.g. 1½ years, but shorter than those attributed to the trend.
  83. Data Cube – a multi-dimensional (“n-D”) array of values. Typically, the term datacube is applied in contexts where these arrays are massively larger than the hosting computer’s main memory; examples include multi-terabyte/petabyte data warehouses and time series of image data.
  84. Data Cleansing – the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database; usually performed as an iterative two-step process consisting of discrepancy detection and data transformation; may also involve harmonization and standardization of data
  85. Data Envelopment Analysis – a nonparametric method in operations research and economics for the estimation of production frontiers. It is used to empirically measure productive efficiency of decision making units (DMUs).
  86. Data Integration – combines data from multiple sources to form a coherent data store. The resolution of semantic heterogeneity, metadata, correlation analysis, tuple duplication detection, and data conflict detection contribute to smooth data integration.
  87. Data Mining – the process of discovering interesting patterns from massive amounts of data. As a knowledge discovery process, it typically involves data cleaning, data integration, data selection, data transformation, pattern discovery, pattern evaluation, and knowledge presentation.
  88. Data Reduction – obtains a reduced representation of the data while minimizing the loss of information content. Techniques include dimensionality reduction, numerosity reduction, and data compression. Dimensionality reduction reduces the number of random variables or attributes under consideration. Methods include wavelet transforms, principal components analysis, attribute subset selection, and attribute creation. Numerosity reduction methods use parametric or non-parametric models to obtain smaller representations of the original data. Parametric models store only the model parameters instead of the actual data. Examples include regression and log-linear models. Non-parametric methods include histograms, clustering, sampling, and data cube aggregation. Data compression methods apply transformations to obtain a reduced or “compressed” representation of the original data. The data reduction is lossless if the original data can be reconstructed from the compressed data without any loss of information; otherwise, it is lossy.
  89. Data Transformation – converts the data into appropriate forms for mining. For example, in normalization, attribute data are scaled so as to fall within a small range such as 0.0 to 1.0. Other examples are data discretization and concept hierarchy generation. Data visualization techniques may be pixel-oriented, geometric-based, icon-based, or hierarchical. These methods apply to multidimensional relational data.
  90. Data Warehouse – central repositories of integrated data from one or more disparate sources. They store current and historical data in one single place that are used for creating analytical reports for workers throughout the enterprise. Data warehouse systems provide multidimensional data analysis capabilities, collectively referred to as online analytical processing.
  91. Database Normalization – the process of structuring a relational database[clarification needed] in accordance with a series of so-called normal forms in order to reduce data redundancy and improve data integrity (
  92. Data Integrity – the maintenance of, and the assurance of the accuracy and consistency of data over its entire life-cycle, and is a critical aspect to the design, implementation and usage of any system which stores, processes, or retrieves data. (
  93. Decision Tree – graphic illustration of how data leads to decision when branches of the tree are followed to their conclusion; different branches may lead to different decisions (
  94. Decision Tree Induction – a top-down recursive tree induction algorithm, which uses an attribute selection measure to select the attribute tested for each non-leaf node in the tree. ID3, C4.5, and CART are examples of such algorithms using different attribute selection measures. Tree pruning algorithms attempt to improve accuracy by removing tree branches reflecting noise in the data. Early decision tree algorithms typically assume that the data are memory resident. Several algorithms, such as RainForest, have been proposed for scalable tree induction. (
  95. Decision Variables – a decision variable represents a problem entity for which a choice must be made. For instance, a decision variable might represent the position of a queen on a chessboard, for which there are 100 different possibilities (choices) on a 10×10 chessboard or the start time of an activity in a scheduling problem. Each possible choice is represented by a value, hence the set of possible choices constitutes the domain that is associated with a variable (A. Holder, editor. Mathematical Programming Glossary. INFORMS Computing Society,, 2006-08. Originally authored by Harvey J. Greenberg, 1999-2006.)
  96. Delphi Method – (also known as Estimate-Talk-Estimate, or ETE) a structured communication technique, originally developed as a systematic, interactive forecasting method that relies on a panel of experts. The technique can also be adapted for use in face-to-face meetings, and is then called mini-Delphi. Delphi has been widely used for business forecasting and has certain advantages over another structured forecasting approach, prediction markets. (
  97. Dendrogram – a diagram that shows the hierarchical relationship between objects. It is most commonly created as an output from hierarchical clustering. The main use of a dendrogram is to work out the best way to allocate objects to clusters. (
  98. Density-based Cluster method (DBSCAN) – clusters objects based on the notion of density. It grows clusters either according to the density of neighborhood objects (e.g., in DBSCAN) or according to a density function (e.g., in DENCLUE). OPTICS is a density-based method that generates an augmented ordering of the data’s clustering structure. (
  99. Descriptive Analytics – the interpretation of historical data to better understand changes that have occurred in a business. Descriptive analytics describes the use of a range of historic data to draw comparisons. Most commonly reported financial metrics are a product of descriptive analytics—for example, year-over-year pricing changes, month-over-month sales growth, the number of users, or the total revenue per subscriber. These measures all describe what has occurred in a business during a set period. (
  100. Design of Experiments – a systematic method to determine the relationship between factors affecting a process and the output of that process. In other words, it is used to find cause-and-effect relationships. This information is needed to manage process inputs in order to optimize the output. (
  101. Deterministic Algorithm – an algorithm which, given a particular input, will always produce the same output, with the underlying machine always passing through the same sequence of states. Deterministic algorithms are by far the most studied and familiar kind of algorithm, as well as one of the most practical, since they can be run on real machines efficiently. (
  102. Discrete Event Simulation (DES) – models the operation of a system as a (discrete) sequence of events in time. Each event occurs at a particular instant in time and marks a change of state in the system. Between consecutive events, no change in the system is assumed to occur; thus the simulation time can directly jump to the occurrence time of the next event, which is called next-event time progression. In addition to next-event time progression, there is also an alternative approach, called fixed-increment time progression, where time is broken up into small time slices and the system state is updated according to the set of events/activities happening in the time slice. Because not every time slice has to be simulated, a next-event time simulation can typically run much faster than a corresponding fixed-increment time simulation. (
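Next-event time progression, as described above, can be illustrated with a small sketch. It assumes a single server with a first-come, first-served queue and a fixed service time; the function name `simulate_single_server` and the sample arrival times are hypothetical.

```python
import heapq
from collections import deque

ARRIVAL, DEPARTURE = 0, 1


def simulate_single_server(arrival_times, service_time):
    """Return (departure_time, customer) pairs for a one-server FIFO queue."""
    # The event list is a priority queue ordered by event time.
    events = [(t, ARRIVAL, i) for i, t in enumerate(arrival_times)]
    heapq.heapify(events)
    waiting = deque()
    busy = False
    departures = []
    while events:
        # Jump the clock straight to the next event; nothing changes in between.
        clock, kind, customer = heapq.heappop(events)
        if kind == ARRIVAL:
            waiting.append(customer)
        else:
            departures.append((clock, customer))
            busy = False
        if not busy and waiting:
            # Start the next service and schedule its departure event.
            busy = True
            heapq.heappush(events, (clock + service_time, DEPARTURE, waiting.popleft()))
    return departures


print(simulate_single_server([0, 1, 5], service_time=3))
# [(3, 0), (6, 1), (9, 2)]
```

The loop touches only six events here, whereas a fixed-increment simulation would have to step through every time slice between 0 and 9.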
  103. Discrete Wavelet Transforms – any wavelet transform for which the wavelets are discretely sampled. As with other wavelet transforms, a key advantage it has over Fourier transforms is temporal resolution: it captures both frequency and location information (location in time). (
  104. Double Exponential Smoothing – simple exponential smoothing does not do well when there is a trend in the data. For such situations, several methods were devised under the name “double exponential smoothing” or “second-order exponential smoothing”: the recursive application of an exponential filter twice. (
  105. Durbin Watson (DW) Statistic – a test for autocorrelation in the residuals from a statistical regression analysis. The Durbin-Watson statistic will always have a value between 0 and 4. A value of 2.0 means that there is no autocorrelation detected in the sample. Values from 0 to less than 2 indicate positive autocorrelation and values greater than 2 up to 4 indicate negative autocorrelation. (
  106. Dynamic Programming – a method for solving a complex problem by breaking it down into a collection of simpler subproblems, solving each of those subproblems just once, and storing their solutions using a memory-based data structure (array, map, etc.). Each of the subproblem solutions is indexed in some way, typically based on the values of its input parameters, so as to facilitate its lookup. So the next time the same subproblem occurs, instead of recomputing its solution, one simply looks up the previously computed solution, thereby saving computation time. This technique of storing solutions to subproblems instead of recomputing them is called memoization. (
  107. Eager Learning – a learning method in which the system tries to construct a general, input-independent target function during training of the system, as opposed to lazy learning, where generalization beyond the training data is delayed until a query is made to the system. Examples include decision tree classifiers, Bayesian classifiers, classification by backpropagation, support vector machines, and classification based on frequent patterns. The main disadvantage of eager learning is that it is generally unable to provide good local approximations in the target function. (
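As a rough illustration of the Durbin-Watson entry, the statistic is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. The sketch below assumes the residuals are already available as a list; the function name and the sample residuals are made up for the example.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Residuals that flip sign every step suggest negative autocorrelation.
print(durbin_watson([1, -1, 1, -1]))  # 3.0
# Residuals that stay on one side for a while suggest positive autocorrelation.
print(durbin_watson([1, 1, -1, -1]))  # 1.0
```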
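Memoization can be sketched with the Fibonacci numbers, a common teaching example that is not taken from the glossary source. The dictionary `memo` plays the role of the memory-based data structure, indexed by the input parameter `n`.

```python
def fib(n, memo=None):
    """Return the n-th Fibonacci number, solving each subproblem only once."""
    if memo is None:
        memo = {}
    if n in memo:
        # Subproblem already solved: look the answer up instead of recomputing.
        return memo[n]
    if n < 2:
        return n
    memo[n] = fib(n - 1, memo) + fib(n - 2, memo)
    return memo[n]


print(fib(50))  # 12586269025
```

Without the lookup table, the same recursion would recompute the same subproblems an exponential number of times.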
  108. Economic Analysis – involves assessing or examining topics or issues from an economist’s perspective. Economic analysis is the study of economic systems. It may also be a study of a production process or an industry. The analysis aims to determine how effectively the economy or something within it is operating. For example, an economic analysis of a company focuses mainly on how much profit it is making. Economists say that economic analysis is a systematic approach to find out what the optimum use of scarce resources is. (
  109. Economic Order Quantity – the ideal order quantity a company should purchase to minimize inventory costs such as holding costs, shortage costs, and order costs. The formula assumes that demand, ordering, and holding costs all remain constant. (
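The glossary entry does not state the formula itself. A minimal sketch, assuming the classic EOQ model (constant demand, no shortage costs), uses the square root of 2DS/H, where D is annual demand, S is the cost per order, and H is the annual holding cost per unit. The function name and the figures below are illustrative only.

```python
import math


def economic_order_quantity(annual_demand, order_cost, holding_cost):
    """Classic EOQ: sqrt(2 * D * S / H), assuming constant demand and costs."""
    return math.sqrt(2 * annual_demand * order_cost / holding_cost)


# Hypothetical inputs: 1,000 units per year, 10 per order, 2 per unit held per year.
print(economic_order_quantity(1000, 10, 2))  # 100.0
```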
  110. Effective Domain – the domain of a function for which its value is finite (A. Holder, editor. Mathematical Programming Glossary. INFORMS Computing Society, 2006-08. Originally authored by Harvey J. Greenberg, 1999-2006.)
  111. Efficiency – the comparison of what is actually produced or performed with what can be achieved with the same consumption of resources (money, time, labor, etc.). It is an important factor in determination of productivity (
  112. Efficient Estimators – an estimator that estimates the quantity of interest in some “best possible” manner. The notion of “best possible” relies upon the choice of a particular loss function — the function which quantifies the relative degree of undesirability of estimation errors of different magnitudes. The most common choice of the loss function is quadratic, resulting in the mean squared error criterion of optimality. (
  113. Efficient Frontier – the set of optimal portfolios that offer the highest expected return for a defined level of risk or the lowest risk for a given level of expected return. Portfolios that lie below the efficient frontier are sub-optimal because they do not provide enough return for the level of risk. Portfolios that cluster to the right of the efficient frontier are sub-optimal because they have a higher level of risk for the defined rate of return. (
  114. Engagement – an estimate of the depth of visitor interaction against a clearly defined set of goals; may be measured through analytical models (Davenport, Enterprise Analytics, p.73-74)
  115. Ensemble Methods – used to increase overall accuracy by learning and combining a series of individual (base) classifier models. Bagging, boosting, and random forests are popular ensemble methods. (
  116. Enterprise Resource Planning (ERP) – business process management software that allows an organization to use a system of integrated applications to manage the business and automate many back office functions related to technology, services and human resources. (
  117. ETL (Extract, Transform, Load) – refers to three separate functions combined into a single programming tool. First, the extract function reads data from a specified source database and extracts a desired subset of data. Next, the transform function works with the acquired data—using rules or lookup tables, or creating combinations with other data—to convert it to the desired state. Finally, the load function is used to write the resulting data (either all of the subset or just the changes) to a target database, which may or may not previously exist (http://searchdatamanagement.
  118. Evolutionary Algorithms – a subset of evolutionary computation, a generic population-based metaheuristic optimization algorithm. An EA uses mechanisms inspired by biological evolution, such as reproduction, mutation, recombination, and selection. Candidate solutions to the optimization problem play the role of individuals in a population, and the fitness function determines the quality of the solutions (see also loss function). Evolution of the population then takes place after the repeated application of the above operators. (
  119. Expected Value (EV) – an anticipated value for an investment at some point in the future. In statistics and probability analysis, the expected value is calculated by multiplying each of the possible outcomes by the likelihood each outcome will occur and then summing all of those values. By calculating expected values, investors can choose the scenario most likely to give the desired outcome. (
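The calculation described above (multiply each outcome by its likelihood, then sum) can be sketched as follows. The scenario probabilities and payoffs are invented for illustration.

```python
def expected_value(scenarios):
    """scenarios: iterable of (probability, outcome) pairs whose probabilities sum to 1."""
    return sum(probability * outcome for probability, outcome in scenarios)


# Hypothetical investment: 50% chance of +100, 30% chance of +50, 20% chance of -80.
scenarios = [(0.5, 100), (0.3, 50), (0.2, -80)]
print(expected_value(scenarios))  # about 49 (50 + 15 - 16)
```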
  120. Experimental Design – in quality management, a written plan that describes the specifics for conducting an experiment, such as which conditions, factors, responses, tools, and treatments are to be included or used; see also, Design of experiments (
  121. Expert Systems – a computer system that emulates the decision-making ability of a human expert. Expert systems are designed to solve complex problems by reasoning through bodies of knowledge, represented mainly as if–then rules rather than through conventional procedural code. (
  122. Exploratory Data Analysis – an approach to analyzing data sets to summarize their main characteristics, often with visual methods. A statistical model can be used or not, but primarily EDA is for seeing what the data can tell us beyond the formal modeling or hypothesis testing task. (
  123. Exponential Distribution – the probability distribution of the time between events in a Poisson point process, i.e., a process in which events occur continuously and independently at a constant average rate. It is a particular case of the gamma distribution. It is the continuous analogue of the geometric distribution, and it has the key property of being memoryless. In addition to being used for the analysis of Poisson point processes it is found in various other contexts. (
  124. Exponential Smoothing – a rule of thumb technique for smoothing time series data using the exponential window function. Whereas in the simple moving average the past observations are weighted equally, exponential functions are used to assign exponentially decreasing weights over time. It is an easily learned and easily applied procedure for making some determination based on prior assumptions by the user, such as seasonality. Exponential smoothing is often used for analysis of time-series data. (
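A minimal sketch of simple exponential smoothing, assuming the usual recursion in which each smoothed value is `alpha` times the new observation plus `1 - alpha` times the previous smoothed value. The function name, the choice of the first observation as the starting value, and the sample series are assumptions for illustration.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing with smoothing factor 0 < alpha <= 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = [series[0]]  # assumption: start from the first observation
    for x in series[1:]:
        # Older observations receive exponentially decreasing weight.
        smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(exponential_smoothing([10.0, 12.0, 14.0], alpha=0.5))  # [10.0, 11.0, 12.5]
```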
  125. F-Test – any statistical test in which the test statistic has an F-distribution under the null hypothesis. It is most often used when comparing statistical models that have been fitted to a data set, in order to identify the model that best fits the population from which the data were sampled. Exact “F-tests” mainly arise when the models have been fitted to the data using least squares. (
  126. F-Distribution – the probability distribution associated with the F-statistic. (
  127. F-Statistic – a value you get when you run an ANOVA test or a regression analysis to find out if the means between two populations are significantly different. It is similar to a t-statistic from a t-test: a t-test will tell you if a single variable is statistically significant, and an F-test will tell you if a group of variables is jointly significant. (
  129. Factor Analysis – a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. For example, it is possible that variations in six observed variables mainly reflect the variations in two unobserved (underlying) variables. Factor analysis searches for such joint variations in response to unobserved latent variables. The observed variables are modelled as linear combinations of the potential factors, plus “error” terms. Factor analysis aims to find independent latent variables. (
  130. Failure Mode and Effects Analysis (FMEA) – a systematic, proactive method for evaluating a process to identify where and how it might fail and to assess the relative impact of different failures, in order to identify the parts of the process that are most in need of change. (
  131. Fast Fourier Transforms – an algorithm that computes the discrete Fourier transform (DFT) of a sequence, or its inverse (IDFT). Fourier analysis converts a signal from its original domain (often time or space) to a representation in the frequency domain and vice versa. The DFT is obtained by decomposing a sequence of values into components of different frequencies. (
  132. Fixed Cost – a cost that does not change with an increase or decrease in the amount of goods or services produced or sold. Fixed costs are expenses that have to be paid by a company, independent of any specific business activities. (
  133. Future Value (FV) – the value of a current asset at a future date based on an assumed rate of growth. The future value (FV) is important to investors and financial planners as they use it to estimate how much an investment made today will be worth in the future. (
  134. Fuzzy C-Means Clustering – a form of clustering in which each data point can belong to more than one cluster. (
  135. Fuzzy Logic – a form of many-valued logic in which the truth values of variables may be any real number between 0 and 1 both inclusive. It is employed to handle the concept of partial truth, where the truth value may range between completely true and completely false. By contrast, in Boolean logic, the truth values of variables may only be the integer values 0 or 1. (
  136. Game Theory – in general, a (mathematical) game can be played by one player, such as a puzzle, but its main connection with mathematical programming is when there are at least two players, and they are in conflict. Each player chooses a strategy that maximizes his payoff. When there are exactly two players and one player’s loss is the other’s gain, the game is called zero sum. In this case, a payoff matrix A is given where Aij is the payoff to player 1, and the loss to player 2, when player 1 uses strategy i and player 2 uses strategy j. In this representation each row of A corresponds to a strategy of player 1, and each column corresponds to a strategy of player 2. If A is m × n, this means player 1 has m strategies, and player 2 has n strategies (A. Holder, editor. Mathematical Programming Glossary. INFORMS Computing Society,, 2006-08. Originally authored by Harvey J. Greenberg, 1999-2006.)
  137. Gamma Distribution – a two-parameter family of continuous probability distributions. The exponential distribution, Erlang distribution, and chi-squared distribution are special cases of the gamma distribution. (
  138. Gaussian Mixture Models – a probabilistic model for representing normally distributed subpopulations within an overall population. Mixture models in general don’t require knowing which subpopulation a data point belongs to, allowing the model to learn the subpopulations automatically. Since subpopulation assignment is not known, this constitutes a form of unsupervised learning. (
  139. Generalized Linear Regression Model (GLM) – a flexible generalization of ordinary linear regression that allows for dependent variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the dependent variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. (
  140. Genetic Algorithms – a metaheuristic inspired by the process of natural selection that belongs to the larger class of evolutionary algorithms (EA). Genetic algorithms are commonly used to generate high-quality solutions to optimization and search problems by relying on bio-inspired operators such as mutation, crossover and selection. (
  141. Geometric Data – comprises geometric aspects of image analysis, pattern analysis and shape analysis or the approach of multivariate statistics that treats arbitrary data sets as clouds of points in n-dimensional space. This includes topological data analysis, cluster analysis, inductive data analysis, correspondence analysis, multiple correspondence analysis, principal components analysis and iconography of correlations. (
  142. Global Optimal – refers to mathematical programming without convexity assumptions, which are NP-hard. In general, there could be a local optimum that is not a global optimum. Some authors use this term to imply the stronger condition there are multiple local optima. Some solution strategies are given as heuristic search methods (including those that guarantee global convergence, such as branch and bound). As a process associated with algorithm design, some regard this simply as attempts to assure convergence to a global optimum (unlike a purely local optimization procedure, like steepest ascent). (A. Holder, editor. Mathematical Programming Glossary. INFORMS Computing Society,, 2006-08. Originally authored by Harvey J. Greenberg, 1999-2006. See the supplement by J.D. Pintér.)
  143. Goodness of Fit – the goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov–Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson’s chi-squared test). (
  144. Greedy Heuristics – an algorithm that follows the problem-solving heuristic of making the locally-optimal choice at each stage with the hope of finding a global optimum (
  145. Grid-based Cluster Method – first quantizes the object space into a finite number of cells that form a grid structure, and then performs clustering on the grid structure. STING is a typical example of a grid-based method based on statistical information stored in grid cells. CLIQUE is a grid-based and subspace clustering algorithm. The grid-based approach differs from conventional clustering algorithms in that it is concerned not with the data points but with the value space that surrounds the data points (
  146. Hadoop – a collection of open-source software utilities that facilitate using a network of many computers to solve problems involving massive amounts of data and computation. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model. (
  147. Heteroscedasticity – refers to the circumstance in which the variability of a variable is unequal across the range of values of a second variable that predicts it. A scatterplot of these variables will often create a cone-like shape, as the scatter (or variability) of the dependent variable (DV) widens or narrows as the value of the independent variable (IV) increases. (
  148. Heuristic – a technique designed for solving a problem more quickly when classic methods are too slow, or for finding an approximate solution when classic methods fail to find any exact solution. This is achieved by trading optimality, completeness, accuracy, or precision for speed. In a way, it can be considered a shortcut. (
  149. Hidden Markov Models – a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobservable (i.e. hidden) states. The hidden Markov model can be represented as the simplest dynamic Bayesian network. (
  150. Hierarchical Clustering – creates a hierarchical decomposition of the given set of data objects. The method can be classified as being either agglomerative (bottom-up) or divisive (top-down), based on how the hierarchical decomposition is formed. To compensate for the rigidity of merge or split, the quality of hierarchical agglomeration can be improved by analyzing object linkages at each hierarchical partitioning (e.g., in Chameleon), or by first performing micro-clustering (that is, grouping objects into “micro-clusters”) and then operating on the micro-clusters with other clustering techniques such as iterative relocation (as in BIRCH) (
  151. Histogram – an accurate representation of the distribution of numerical data. It is an estimate of the probability distribution of a continuous variable (
  152. Homoscedasticity – a sequence (or a vector) of random variables is homoscedastic if all of its random variables have the same finite variance (
  153. Hotelling’s T-squared Distribution – a multivariate distribution that is proportional to the F-distribution and arises as the distribution of a set of statistics that are natural generalizations of the statistics underlying Student’s t-distribution. Hotelling’s T-squared statistic (T²) is a generalization of Student’s t-statistic that is used in multivariate hypothesis testing. (
  154. Hypothesis Testing – the theory, methods, and practice of testing a hypothesis by comparing it with the null hypothesis. The null hypothesis is rejected only if the p-value (the probability of observing data at least as extreme as the sample, assuming the null hypothesis is true) falls below a predetermined significance level, in which case the hypothesis being tested is said to have that level of significance (
  155. ID3 Algorithm – used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is typically used in the machine learning and natural language processing domains. (
  156. Imputation – the process of replacing missing data with substituted values. When substituting for a data point, it is known as “unit imputation”; when substituting for a component of a data point, it is known as “item imputation”. (
  157. Paired and Independent T-Tests – Paired-samples t tests compare scores on two different variables but for the same group of cases; independent-samples t tests compare scores on the same variable but for two different groups of cases. (
  158. Indifference Curve – connects points on a graph representing different quantities of two goods, points between which a consumer is indifferent. That is, any combinations of two products indicated by the curve will provide the consumer with equal levels of utility, and the consumer has no preference for one combination or bundle of goods over a different combination on the same curve. One can also refer to each point on the indifference curve as rendering the same level of utility (satisfaction) for the consumer. In other words, an indifference curve is the locus of various points showing different combinations of two goods providing equal utility to the consumer. Utility is then a device to represent preferences rather than something from which preferences come. The main use of indifference curves is in the representation of potentially observable demand patterns for individual consumers over commodity bundles. (
  159. Influence Diagram – a compact graphical and mathematical representation of a decision situation. It is a generalization of a Bayesian network, in which not only probabilistic inference problems but also decision making problems (following the maximum expected utility criterion) can be modeled and solved. (
  160. INFORMS – the largest professional society in the world for professionals in the field of operations research (OR), management science, and analytics (
  161. Innovative Applications in Analytics Award – award administered by the Analytics Section of INFORMS to recognize creative and unique developments, applications, or combinations of analytical techniques. The prize promotes the awareness of the value of analytics techniques in unusual applications, or in creative combination to provide unique insights and/or business value (
  162. Integer Programming – a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objective function and the constraints (other than the integer constraints) are linear. (
  163. Internal Rate of Return (IRR) – the rate of growth that a project or investment is expected to create, expressed as a percentage, over a specified term. IRR is, in essence, the theoretical interest rate earned by the project (
  164. Inverse Proportionality – the relationship between two variables when their product is equal to a constant value. When the value of one variable increases, the other decreases, so their product is unchanged. (
  165. Interval type of Data – a data type which is measured along a scale, in which each point is placed at equal distance from one another. Interval data always appears in the form of numbers or numerical values where the distance between the two points is standardized and equal. (
  166. Interval-Level of Measurement – not only classifies and orders the measurements, but it also specifies that the distances between each interval on the scale are equivalent along the scale from low interval to high interval. (
  167. Irregular and Sparse Data Models – if no value exists for a given combination of dimension values, no row exists in the fact table. For example, if not every product is sold in every market, Market and Product are sparse dimensions. (
  168. Judgement Sampling – a type of nonrandom sample that is selected based on the opinion of an expert. Results obtained from a judgement sample are subject to some degree of bias, due to the frame and population not being identical. The frame is a list of all the units, items, people, etc., that define the population to be studied. Judgement sampling can nonetheless yield detailed information that is otherwise difficult to obtain; a random sample would provide less bias, but potentially less raw information. (
  169. Knowledge Discovery in Databases (KDD) – refers to the broad process of finding knowledge in data, and emphasizes the “high-level” application of particular data mining methods. It is of interest to researchers in machine learning, pattern recognition, databases, statistics, artificial intelligence, knowledge acquisition for expert systems, and data visualization. (
  170. K-Means Clustering – a method of vector quantization, originally from signal processing, that is popular for cluster analysis in data mining. k-means clustering aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean, serving as a prototype of the cluster. This results in a partitioning of the data space into Voronoi cells. (
  171. K-Medoids – a clustering algorithm reminiscent of the k-means algorithm. In contrast to the k-means algorithm, k-medoids chooses data points as centers (medoids or exemplars) and can be used with arbitrary distances, while in k-means the centre of a cluster is not necessarily one of the input data points (it is the average between the points in the cluster). (
  172. Knapsack Problem – a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible. It derives its name from the problem faced by someone who is constrained by a fixed-size knapsack and must fill it with the most valuable items. (
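One common variant of the problem above is the 0/1 knapsack, in which each item is either taken once or left out. The sketch below assumes integer weights and solves that variant with a standard dynamic-programming table; the function name and item list are illustrative, and the general definition (any number of each item) would need a different recurrence.

```python
def knapsack_01(items, capacity):
    """items: list of (weight, value) pairs with integer weights.
    Returns the best total value with total weight <= capacity,
    using each item at most once."""
    best = [0] * (capacity + 1)  # best[c] = best value achievable with capacity c
    for weight, value in items:
        # Iterate capacities downward so each item is counted at most once.
        for cap in range(capacity, weight - 1, -1):
            best[cap] = max(best[cap], best[cap - weight] + value)
    return best[capacity]


items = [(12, 4), (2, 2), (1, 1), (1, 2), (4, 10)]  # (weight, value)
print(knapsack_01(items, capacity=15))  # 15: take every item except the heaviest
```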
  173. Kruskal-Wallis (One-Way ANOVA on Ranks) – a non-parametric method for testing whether samples originate from the same distribution. It is used for comparing two or more independent samples of equal or different sample sizes. It extends the Mann–Whitney U test, which is used for comparing only two groups. The parametric equivalent of the Kruskal–Wallis test is the one-way analysis of variance (ANOVA). (
  174. Kurtosis – the measure of the peakedness or flatness of a distribution when compared with a normal distribution. A positive value indicates a relatively peaked distribution, and a negative value indicates a relatively flat distribution. (
  175. Lambda – in probability theory, lambda represents the density of occurrences within a time interval, as modeled by the Poisson distribution. Lambda indicates an eigenvalue in the mathematics of linear algebra. In mathematical logic and computer science, lambda is used to introduce anonymous functions expressed with the concepts of lambda calculus. In evolutionary algorithms, λ indicates the number of offspring that would be generated from μ current population in each generation. (
  176. Lazy Learning – a learning method in which generalization of the training data is, in theory, delayed until a query is made to the system, as opposed to in eager learning, where the system tries to generalize the training data before receiving queries. Examples: K-nearest neighbors algorithms, recommendation systems, nearest-neighbor classifiers, etc.; lazy classifiers are most useful for large, continuously changing datasets with few attributes that are commonly queried. (
  177. Lead Time – time between the initial phase of a process and the emergence of results, as between the planning and completed manufacture of a product (
  178. Lean Production – a Japanese approach to management that focuses on cutting out waste while ensuring quality. This approach can be applied to all aspects of a business – from design through production to distribution (
  179. Lift – a measure of the performance of a targeting model (association rule) at predicting or classifying cases as having an enhanced response (with respect to the population as a whole), measured against a random choice targeting model. A targeting model is doing a good job if the response within the target is much better than the average for the population as a whole. Lift is simply the ratio of these values: target response divided by average response. (
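The ratio described in the Lift entry (target response divided by average response) can be sketched as follows. The function name `lift` and the campaign counts are hypothetical and used only for illustration.

```python
def lift(target_responders, target_size, total_responders, population_size):
    """Lift = response rate within the target / response rate in the population."""
    target_rate = target_responders / target_size
    average_rate = total_responders / population_size
    return target_rate / average_rate


# Hypothetical campaign: 25 of 100 targeted customers respond,
# against 500 responders in a population of 4,000.
print(lift(25, 100, 500, 4000))  # 2.0
```

A lift of 2.0 would mean the targeted group responds at twice the population's average rate; a lift near 1.0 would suggest the model does no better than random targeting.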
  180. Lift Curve – a measure of the effectiveness of a predictive model calculated as the ratio between the results obtained with and without the predictive model; lift charts, consisting of a lift curve and a baseline, are visual aids for measuring model performance (
  181. Linear Discriminant Analysis – a method used in statistics, pattern recognition, and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification. (
  182. Linear Estimators – A linear function of observable random variables, used (when the actual values of the observed variables are substituted into it) as an approximate value (estimate) of an unknown parameter of the stochastic model under analysis (see Statistical estimator). The special selection of the class of linear estimators is justified for the following reasons. Linear estimators lend themselves more easily to statistical analysis, in particular to the investigation of consistency, unbiasedness, efficiency, the construction of corresponding confidence intervals, etc. (
  183. Linear Programming – an optimization technique for a system of linear constraints and a linear objective function. An objective function defines the quantity to be optimized, and the goal of linear programming is to find the values of the variables that maximize or minimize the objective function. (
  184. Linear Regression – a linear approach to modeling the relationship between a scalar response (or dependent variable) and one or more explanatory variables (or independent variables). The case of one explanatory variable is called simple linear regression. For more than one explanatory variable, the process is called multiple linear regression. This term is distinct from multivariate linear regression, where multiple correlated dependent variables are predicted, rather than a single scalar variable. (
  185. Linearization – finds the linear approximation to a function at a given point. The linear approximation of a function is the first order Taylor expansion around the point of interest. In the study of dynamical systems, linearization is a method for assessing the local stability of an equilibrium point of a system of nonlinear differential equations or discrete dynamical systems. This method is used in fields such as engineering, physics, economics, and ecology. (
  186. Little’s Law – a theorem by John Little which states that the long-term average number L of customers in a stationary system is equal to the long-term average effective arrival rate λ multiplied by the average time W that a customer spends in the system. Expressed algebraically, the law is L = λW. (
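As a small worked illustration of Little’s Law, the sketch below computes L from λ and W and also rearranges the law to recover W. The function names and the arrival figures are hypothetical.

```python
def average_in_system(arrival_rate, avg_time_in_system):
    """Little's Law: L = lambda * W."""
    return arrival_rate * avg_time_in_system


def average_time_in_system(avg_in_system, arrival_rate):
    """Little's Law rearranged: W = L / lambda."""
    return avg_in_system / arrival_rate


# Hypothetical shop: 10 customers arrive per hour and each stays 0.5 hours.
L = average_in_system(arrival_rate=10, avg_time_in_system=0.5)
print(L)                              # 5.0 customers in the shop on average
print(average_time_in_system(L, 10))  # 0.5 hours
```

The arrival rate and the time in the system must use the same time unit (hours in this sketch).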
  187. Little’s MCAR Test (Missing Completely At Random) – tests the null hypothesis that the missing data are Missing Completely At Random (MCAR). A p-value of less than 0.05 is usually interpreted as evidence that the missing data are not MCAR (i.e., are either Missing At Random or non-ignorable). (
  188. Local Optimal – a solution that is optimal (either maximal or minimal) within a neighboring set of candidate solutions. This is in contrast to a global optimum, which is the optimal solution among all possible solutions, not just those in a particular neighborhood of values. (
  189. Logarithmic – the inverse function to exponentiation. That means the logarithm of a given number x is the exponent to which another fixed number, the base b, must be raised, to produce that number x. In the simplest case, the logarithm counts the number of occurrences of the same factor in repeated multiplication; e.g., since 1000 = 10 × 10 × 10 = 10³, the “logarithm base 10” of 1000 is 3. (
  190. Logistic Curve – an S-shaped curve formed by the logit transformation that represents the probability of an event. The S-shaped form is nonlinear, because the probability of an event must approach 0 and 1, but never fall outside these limits. (; an S-shaped (sigmoidal) curve that can be used to model functions that increase gradually at first, more rapidly in the middle growth period, and slowly at the end, leveling off at a maximum value after some period of time. The initial part of the curve is exponential; the rate of growth accelerates as it approaches the midpoint of the curve. At the midpoint (K/2), the growth rate begins to decelerate but the curve continues to grow until it reaches an asymptote, K, which is called the “Carrying Capacity” for the environment. (
  191. Logistic Regression – used to model the probability of a certain class or event existing such as pass/fail, win/lose, alive/dead or healthy/sick. This can be extended to model several classes of events such as determining whether an image contains a cat, dog, lion, etc. Each object being detected in the image would be assigned a probability between 0 and 1, with the probabilities summing to one. (
  192. Machine Learning – the scientific study of algorithms and statistical models that computer systems use to perform a specific task without using explicit instructions, relying on patterns and inference instead. It is seen as a subset of artificial intelligence. Machine learning algorithms build a mathematical model based on sample data, known as “training data”, in order to make predictions or decisions without being explicitly programmed to perform the task. (
  193. MANCOVA (Multivariate Analysis of Covariance) – an extension of analysis of covariance (ANCOVA) methods to cover cases where there is more than one dependent variable and where the control of concomitant continuous independent variables – covariates – is required. The most prominent benefit of the MANCOVA design over the simple MANOVA is the ‘factoring out’ of noise or error that has been introduced by the covariates. A commonly used multivariate version of the ANOVA F-statistic is Wilks’ Lambda (Λ), which represents the ratio between the error variance (or covariance) and the effect variance (or covariance). (
  194. Mann-Whitney U Test – a nonparametric test of the null hypothesis that it is equally likely that a randomly selected value from one population will be less than or greater than a randomly selected value from a second population. This test can be used to investigate whether two independent samples were selected from populations having the same distribution. A similar nonparametric test used on dependent samples is the Wilcoxon signed-rank test. (
  195. MANOVA (Multivariate Analysis of Variance) – a procedure for comparing multivariate sample means. As a multivariate procedure, it is used when there are two or more dependent variables, and is often followed by significance tests involving individual dependent variables separately.
  196. Markov Chain – a stochastic model describing a sequence of possible events in which the probability of each event depends only on the state attained in the previous event. In continuous-time, it is known as a Markov process. (
  197. Maximax Strategy – a strategy in game theory where a player, facing uncertainty, makes a decision that yields the ‘best of the best’ outcome. All decisions will have costs and benefits, and a maximax strategy is one that seeks out where the greatest benefit can be found. It is often referred to as an aggressive or optimistic strategy. (
  198. Maximin Strategy – a strategy in game theory where a player makes a decision that yields the ‘best of the worst’ outcome. All decisions will have costs and benefits, and a maximin strategy is one that seeks out the decision that yields the smallest loss. It is also referred to as a pessimistic or conservative strategy. (
  199. Mean – the arithmetic average of a set of values or distribution; however, for skewed distributions, the mean is not necessarily the same as the middle value (median), or the most likely (mode); see also, Average (
  200. Mean Absolute Deviation – the average of the absolute deviations (the positive differences) of the given data from a chosen central value. It is a summary statistic of statistical dispersion or variability. In the general form, the central point can be the mean, median, mode, or the result of any other measure of central tendency or any reference point related to the given data set. The absolute values of the differences between the data points and the central point are totaled and divided by the number of data points. (
  201. Mean Squared Error (MSE) – tells you how close a regression line is to a set of points. It does this by taking the distances from the points to the regression line (these distances are the “errors”) and squaring them. The squaring is necessary to remove any negative signs. It also gives more weight to larger differences. It’s called the mean squared error as you’re finding the average of a set of errors. (
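The calculation described in the Mean Squared Error entry can be sketched as below. The function name `mean_squared_error` and the sample values are hypothetical; the "predicted" values stand in for points on a regression line.

```python
def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical observations and fitted values: errors are 1, 0, and -2
actual = [3, 5, 7]
predicted = [2, 5, 9]
print(mean_squared_error(actual, predicted))  # (1 + 0 + 4) / 3
```

Squaring makes the error of -2 contribute four times as much as the error of 1, which is the extra weight on larger differences that the definition mentions.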
  202. Mean Time Between Failures (MTBF) – the predicted elapsed time between inherent failures of a mechanical or electronic system, during normal system operation. MTBF can be calculated as the arithmetic mean (average) time between failures of a system. (
  203. Median – the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as the “middle” value. (
  204. Memoization – an optimization technique used primarily to speed up computer programs by storing the results of expensive function calls and returning the cached result when the same inputs occur again. Memoization has also been used in other contexts (and for purposes other than speed gains), such as in simple mutually recursive descent parsing. (
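A minimal sketch of memoization, assuming Python's standard `functools.lru_cache` decorator as the cache; the Fibonacci function is only a convenient illustration of an expensive recursive call.

```python
from functools import lru_cache


@lru_cache(maxsize=None)  # cache every result, keyed by the argument n
def fib(n):
    """Return the n-th Fibonacci number (fib(0) = 0, fib(1) = 1)."""
    if n < 2:
        return n
    # Without the cache these two calls would recompute the same values repeatedly
    return fib(n - 1) + fib(n - 2)


print(fib(30))                    # 832040
print(fib.cache_info().hits > 0)  # True: some results came from the cache
```

With the cache, each value of `n` is computed once; without it, the number of calls grows exponentially with `n`.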
  205. Metaheuristics – a high-level problem-independent algorithmic framework that provides a set of guidelines or strategies to develop heuristic optimization algorithms (Sörensen and Glover, 2013). Notable examples of metaheuristics include genetic/evolutionary algorithms, tabu search, simulated annealing, variable neighborhood search, (adaptive) large neighborhood search, and ant colony optimization, although many more exist. A problem-specific implementation of a heuristic optimization algorithm according to the guidelines expressed in a metaheuristic framework is also referred to as a metaheuristic. (
  206. Minimax Regret – a strategy that minimises the maximum regret. It is useful for a risk-neutral decision maker. Essentially, this is the technique for a ‘sore loser’ who does not wish to make the wrong decision. ‘Regret’ in this context is defined as the opportunity loss through having made the wrong decision. (
  207. Minimin Strategy – A strategy or algorithm that seeks to minimize the minimum possible result. (
  208. Mixed Integer Programming – an optimization problem in which some of the decision variables are constrained to take integer values (i.e. whole numbers such as -1, 0, 1, 2, etc.) at the optimal solution. The use of integer variables greatly expands the scope of useful optimization problems that you can define and solve. (
  209. Mode – the value that appears most often in a set of data values (
  210. Model Evaluation – an integral part of the model development process. It helps to find the best model that represents our data and how well the chosen model will work in the future. Evaluating model performance with the data used for training is not acceptable in data science because it can easily generate overoptimistic and overfitted models. There are two methods of evaluating models in data science, Hold-Out and Cross-Validation. To avoid overfitting, both methods use a test set (not seen by the model) to evaluate model performance. (
  211. Monte Carlo Simulation – used to model the probability of different outcomes in a process that cannot easily be predicted due to the intervention of random variables. It is a technique used to understand the impact of risk and uncertainty in prediction and forecasting models. (
  212. Moving Average – a widely used indicator in technical analysis that helps smooth out price action by filtering out the “noise” from random short-term price fluctuations. It is a trend-following, or lagging, indicator because it is based on past prices. (
  213. Multicollinearity – a phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy. In this situation the coefficient estimates of the multiple regression may change erratically in response to small changes in the model or the data. Multicollinearity does not reduce the predictive power or reliability of the model as a whole, at least within the sample data set; it only affects calculations regarding individual predictors. That is, a multiple regression model with collinear predictors can indicate how well the entire bundle of predictors predicts the outcome variable, but it may not give valid results about any individual predictor, or about which predictors are redundant with respect to others. (
  214. Multidimensional Data Model – typically used for the design of corporate data warehouses and departmental data marts. Such a model can adopt a star schema, snowflake schema, or fact constellation schema. The core of the multidimensional model is the data cube, which consists of a large set of facts (or measures) and a number of dimensions. Dimensions are the entities or perspectives with respect to which an organization wants to keep records and are hierarchical in nature. (
  215. Multidimensional Scaling – a means of visualizing the level of similarity of individual cases of a dataset. MDS is used to translate “information about the pairwise ‘distances’ among a set of n objects or individuals” into a configuration of n points mapped into an abstract Cartesian space. More technically, MDS refers to a set of related ordination techniques used in information visualization, in particular to display the information contained in a distance matrix. It is a form of non-linear dimensionality reduction. (
  216. Naïve Bayes Classifier – a family of simple “probabilistic classifiers” based on applying Bayes’ theorem with strong (naïve) independence assumptions between the features. They are among the simplest Bayesian network models. (; based on Bayes’ theorem of posterior probability. It assumes class-conditional independence: that the effect of an attribute value on a given class is independent of the values of the other attributes. (
  217. Net Present Value – applies to a series of cash flows occurring at different times. The present value of a cash flow depends on the interval of time between now and the cash flow. It also depends on the discount rate. NPV accounts for the time value of money. It provides a method for evaluating and comparing capital projects or financial products with cash flows spread over time, as in loans, investments, payouts from insurance contracts plus many other applications. (
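The discounting described in the Net Present Value entry can be sketched as follows. The sketch assumes one cash flow per period, with the first cash flow occurring now (period 0) and a constant discount rate; the function name `npv` and the project figures are hypothetical.

```python
def npv(rate, cash_flows):
    """Sum of cash flows discounted to the present.

    rate       -- discount rate per period (0.10 means 10%)
    cash_flows -- cash flows at periods 0, 1, 2, ... (negative = outflow)
    """
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project: pay 1,000 now, receive 500 at the end of each of 3 years
value = npv(0.10, [-1000, 500, 500, 500])
print(round(value, 2))  # 243.43
```

A positive result suggests the discounted inflows exceed the initial outlay at the chosen rate; with a rate of 0, the function simply sums the cash flows.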
  218. Net Working Capital – the difference between a company’s current assets, such as cash, accounts receivable (customers’ unpaid bills) and inventories of raw materials and finished goods, and its current liabilities, such as accounts payable. Net operating working capital is a measure of a company’s liquidity and refers to the difference between operating current assets and operating current liabilities. In many cases these calculations are the same and are derived from company cash plus accounts receivable plus inventories, less accounts payable and less accrued expenses. (
  219. Network Optimization – a set of best practices used to improve network performance. A variety of tools and techniques can be used to monitor and improve network performance, such as global load balancing, latency minimization, packet loss monitoring, and bandwidth management. (
  220. Newman-Keuls Method – a stepwise multiple comparisons procedure used to identify sample means that are significantly different from each other. This procedure is often used as a post-hoc test whenever a significant difference between three or more sample means has been revealed by an analysis of variance (ANOVA). (
  221. Next Best Offer (NBO) – a targeted offer or proposed action for customers based on analyses of past history and behavior, other customer preferences, purchasing context, and attributes of the products or services from which they can choose (Davenport, Enterprise Analytics, p. 83)
  222. Nominal Group Technique (NGT) – defined as a structured method for group brainstorming that encourages contributions from everyone and facilitates quick agreement on the relative importance of issues, problems, or solutions. Team members begin by writing down their ideas, then selecting which idea they feel is best. Once team members are ready, everyone presents their favorite idea, and the suggestions are then discussed and prioritized by the entire group using a point system. NGT combines the importance ratings of individual group members into the final weighted priorities of the group. (
  223. Nominal Level – the numbers in the variable are used only to classify the data. In this level of measurement, words, letters, and alpha-numeric symbols can be used. Suppose there are data about people belonging to three different gender categories. In this case, a person belonging to the female gender could be classified as F, a person belonging to the male gender could be classified as M, and a transgender person could be classified as T. This type of classification is the nominal level of measurement. (
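  224. Non-Basic Variable – in a basic solution of a linear program (as in the simplex method), a variable that is set to zero. In contrast, basic variables are the variables whose values are obtained by solving the constraint equations once the non-basic variables have been fixed.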
  225. Non-Parametric Statistics – the branch of statistics that is not based solely on parametrized families of probability distributions (common examples of parameters are the mean and variance). Nonparametric statistics is based on either being distribution-free or having a specified distribution but with the distribution’s parameters unspecified. Nonparametric statistics includes both descriptive statistics and statistical inference. (
  226. Nonlinear Programming – the process of solving an optimization problem where some of the constraints or the objective function are nonlinear. An optimization problem is one of calculation of the extrema (maxima, minima or stationary points) of an objective function over a set of unknown real variables and conditional to the satisfaction of a system of equalities and inequalities, collectively termed constraints. It is the sub-field of mathematical optimization that deals with problems that are not linear. (
  227. Nondeterministic Algorithm – an algorithm that, even for the same input, can exhibit different behaviors on different runs, as opposed to a deterministic algorithm. There are several ways an algorithm may behave differently from run to run. A concurrent algorithm can perform differently on different runs due to a race condition. A probabilistic algorithm’s behavior depends on a random number generator. An algorithm that solves a problem in nondeterministic polynomial time can run in polynomial time or exponential time depending on the choices it makes during execution. Nondeterministic algorithms are often used to find an approximation to a solution when the exact solution would be too costly to obtain using a deterministic one. (
  228. Normal Distribution – a purely theoretical continuous probability distribution in which the horizontal axis represents all possible values of a variable and the vertical axis represents the probability density at those values. The scores on the variable are clustered around the mean in a symmetrical, unimodal pattern known as the bell-shaped, or normal, curve. (
  229. Normal Probability Plot – a graphical comparison of the form of the distribution to the normal distribution. In the normal probability plot, the normal distribution is represented by a straight line angled at 45 degrees. The actual distribution is plotted against this line so that any differences are shown as deviations from the straight line, making identification of differences quite apparent and interpretable. Normality is the degree to which the distribution of the sample data corresponds to a normal distribution. (
  230. Normality – the degree to which the distribution of the sample data corresponds to a normal distribution. (
  231. Normalization – splits up data to avoid redundancy (duplication) by moving commonly repeating groups of data into new tables. Normalization therefore tends to increase the number of tables that need to be joined to perform a given query, but reduces the space required to hold the data and the number of places where it needs to be updated if the data changes (
  232. NP (Non-Deterministic-Polynomial Time) – the decision problems whose solutions can be verified in polynomial time. That is, if someone claims that a particular instance of the problem has a solution, they can supply a proof (a certificate) that can be checked in polynomial time. Note that this says nothing about whether the problem itself can be solved in polynomial time; it concerns only verifying a proposed solution in polynomial time. (
  233. NP-Complete – the problems that are both in NP and NP-Hard. That means that if we can solve any one of these problems in polynomial time, we can solve every other NP problem in polynomial time, and the solutions to these problems can be verified in polynomial time. (
  234. NP-Hard – problems that are at least as hard as the hardest problems in NP. If we can solve one of these problems in polynomial time, we can solve any NP problem in polynomial time. Note that these problems are not necessarily in NP; that is, it may or may not be possible to verify a solution to them in polynomial time. (
  235. Null Hypothesis – proposes that no statistical significance exists in a set of given observations. The null hypothesis attempts to show that no variation exists between variables or that a single variable is no different than its mean. It is presumed to be true until statistical evidence nullifies it for an alternative hypothesis. (; Hypothesis with samples that come from populations with equal means (i.e., the group means are equal) for either a dependent variable (univariate test) or a set of dependent variables (multivariate test). The null hypothesis can be accepted or rejected depending on the results of a test of statistical significance. (
  236. Objective Function – the function that it is desired to maximize or minimize. (
  237. Ogive Graph – a type of frequency polygon that shows cumulative frequencies. In other words, the cumulative percents are added on the graph from left to right. An ogive graph plots cumulative frequency on the y-axis and class boundaries along the x-axis. It’s very similar to a histogram, only instead of rectangles, an ogive has a single point marking where the top right of the rectangle would be. It is usually easier to create this kind of graph from a frequency table. (
  238. OLAP (Online Analytical Processing) – the technology behind many Business Intelligence (BI) applications. OLAP is a powerful technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and predictive “what if” scenario (budget, forecast) planning. (
  239. OLAP (Online Analytical Processing) Cube – refers to a multi-dimensional dataset, which is also sometimes called a hypercube if the number of dimensions is greater than 3; each cell of the cube holds a number that represents some measure of the business, such as sales, profits, expenses, budget, and forecast (
  240. OLTP (Online Transactional Processing) – a category of data processing that is focused on transaction-oriented tasks. OLTP typically involves inserting, updating, and/or deleting small amounts of data in a database. (
  241. Operating Margin – measures how much profit a company makes on a dollar of sales, after paying for variable costs of production, such as wages and raw materials, but before paying interest or tax. It is calculated by dividing a company’s operating profit by its net sales. (
  242. Operations Management – deals with the design and management of products, processes, services, and supply chains. It considers the acquisition, development, and utilization of resources that firms need to deliver the goods and services their clients want (
  243. Operations Research – a discipline that deals with the application of advanced analytical methods to help make better decisions (
  244. Opportunity Cost – represents the benefits an individual, investor or business misses out on when choosing one alternative over another. While financial reports do not show opportunity cost, business owners can use it to make educated decisions when they have multiple options before them. Bottlenecks are often a cause of opportunity costs. (
  245. Optimization – the selection of a best element (with regard to some criterion) from some set of available alternatives; an optimization problem consists of maximizing or minimizing a real function by systematically choosing input values from within an allowed set and computing the value of the function. (
  246. Ordinal-Level of Measurement – depicts some ordered relationship among the variable’s observations. Suppose a student scores the highest grade of 100 in the class. In this case, he would be assigned the first rank. Then, another classmate scores the second highest grade of a 92; she would be assigned the second rank. A third student scores an 81 and he would be assigned the third rank, and so on. (
  247. P (Polynomial Time) – as the name suggests, these are the problems which can be solved in polynomial time. (
  248. Paired Sample T-Test – a statistical procedure used to determine whether the mean difference between two sets of observations is zero. In a paired sample t-test, each subject or entity is measured twice, resulting in pairs of observations. Common applications of the paired sample t-test include case-control studies or repeated-measures designs. Suppose you are interested in evaluating the effectiveness of a company training program. One approach you might consider would be to measure the performance of a sample of employees before and after completing the program, and analyze the differences using a paired sample t-test. (
  249. Panel Data Regression – a method for estimating models from data that have both time-series and cross-sectional dimensions; examples of cross-sectional units are industry, firm and country (
  250. Parameter Estimation – refers to the process of using sample data (in reliability engineering, usually times-to-failure or success data) to estimate the parameters of the selected distribution. Several parameter estimation methods are available in life data analysis, ranging from the relatively simple method of Probability Plotting to the more sophisticated methods of Rank Regression (or Least Squares), Maximum Likelihood Estimation, and Bayesian Estimation. (
  251. Parametric Statistics – a branch of statistics which assumes that sample data come from a population that can be adequately modeled by a probability distribution that has a fixed set of parameters. Conversely, a non-parametric model differs precisely in that the parameter set (or feature set in machine learning) is not fixed and can increase, or even decrease, if new relevant information is collected. (
  252. Partitioning Cluster Method – first creates an initial set of k partitions, where parameter k is the number of partitions to construct. It then uses an iterative relocation technique that attempts to improve the partitioning by moving objects from one group to another. Typical partitioning methods include k-means, k-medoids, and CLARANS. (; clustering methods used to classify observations, within a data set, into multiple groups based on their similarity. The algorithms require the analyst to specify the number of clusters to be generated. (
  253. Pattern Recognition – the process of recognizing patterns using a machine learning algorithm. Pattern recognition can be defined as the classification of data based on knowledge already gained or on statistical information extracted from patterns and/or their representation. (
  254. Payback Period – refers to the amount of time it takes to recover the cost of an investment. Simply put, the payback period is the length of time it takes an investment to reach its breakeven point. (
  255. P-Chart – used in statistical quality control to graph proportions of defective items. The chart is based on the binomial distribution; each item on the chart has only two possibilities: pass or fail. An “item” could be anything you’re interested in charting, including: gadgets from a production line, wait times, or delivery times. (
  256. Lift Chart – graphically represents the improvement that a mining model provides when compared against a random guess, and measures the change in terms of a lift score. By comparing the lift scores for different models, you can determine which model is best. You can also determine the point at which the model’s predictions become less useful. For example, by reviewing the lift chart, you might realize that a promotional campaign is likely to be effective against only 30% of your customers, and use that figure to limit the scope of the campaign. (
  257. Program Evaluation Review Technique (PERT) – a project management tool that provides a graphical representation of a project’s timeline. The Program Evaluation Review Technique (PERT) breaks down the individual tasks of a project for analysis. PERT charts are considered preferable to Gantt charts because they identify task dependencies, but they’re often more difficult to interpret. (
  258. Poisson Distribution – the discrete probability distribution of the number of events occurring in a given time period, given the average number of times the event occurs over that time period. For example, a certain fast-food restaurant gets an average of 3 visitors to the drive-through per minute. This is just an average, however; the actual number can vary. A Poisson distribution can be used to analyze the probability of various events regarding how many customers go through the drive-through. It can allow one to calculate the probability of a lull in activity (when there are 0 customers coming to the drive-through) as well as the probability of a flurry of activity (when there are 5 or more customers coming to the drive-through). This information can, in turn, help a manager plan for these events with staffing and scheduling. (
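The drive-through example in the Poisson Distribution entry can be worked through with a short sketch. It assumes arrivals in a given minute follow a Poisson distribution with a mean of 3; the function name `poisson_pmf` is hypothetical.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam = 3  # average number of drive-through visitors per minute

# Lull: exactly 0 customers in a minute
p_lull = poisson_pmf(0, lam)

# Flurry: 5 or more customers, i.e. 1 minus P(0) through P(4)
p_flurry = 1 - sum(poisson_pmf(k, lam) for k in range(5))

print(round(p_lull, 4), round(p_flurry, 4))  # roughly 0.05 and 0.18
```

Under these assumptions a completely empty minute is fairly rare, while a flurry of 5 or more customers would be expected in a little under one minute in five.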
  259. Post-Optimality Analysis – the optimal solution of a degenerate linear program using a pivoting algorithm (; Deals with making changes in the parameters of the model and finding the new optimum solution; these changes require periodic re-calculation of the optimum solution and the new computations are rooted in the use duality and the primal-dual relationships. (
  260. Polynomial – an expression consisting of variables (also called indeterminates) and coefficients, that involves only the operations of addition, subtraction, multiplication, and non-negative integer exponents of variables. (
  261. Precision – a description of random errors, a measure of statistical variability; in simpler terms, given a set of data points from repeated measurements of the same quantity, the set can be said to be accurate if their average is close to the true value of the quantity being measured, while the set can be said to be precise if the values are close to each other. (
  262. Predictive Analytics – encompasses a variety of statistical techniques from data mining, predictive modelling, and machine learning, that analyze current and historical facts to make predictions about future or otherwise unknown events. (
  263. Pricing – a tactic in the simplex method, by which each variable is evaluated for its potential to improve the value of the objective function. (A. Holder, editor. Mathematical Programming Glossary. INFORMS Computing Society, 2006-08. Originally authored by Harvey J. Greenberg, 1999-2006.)
  264. Principal Component Analysis (PCA) – a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables (entities each of which takes on various numerical values) into a set of values of linearly uncorrelated variables called principal components. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to the preceding components. The resulting vectors (each being a linear combination of the variables and containing n observations) are an uncorrelated orthogonal basis set. PCA is sensitive to the relative scaling of the original variables. (
  265. Probability Density Function – a statistical expression that defines a probability distribution (the likelihood of an outcome) for a continuous random variable, as opposed to a discrete random variable. The difference is that a discrete random variable takes values you can identify exactly, while a continuous random variable can take any of an infinite number of values within a range. For instance, a quoted stock price recorded to only two decimal places (e.g., 52.55) is discrete, while a continuous variable could have an infinite number of values (e.g., 52.5572389658…). (
  266. Probability Distributions – a statistical function that describes all the possible values and likelihoods that a random variable can take within a given range. This range will be bounded between the minimum and maximum possible values, but precisely where the possible value is likely to be plotted on the probability distribution depends on a number of factors. These factors include the distribution’s mean (average), standard deviation, skewness, and kurtosis. (
  267. Probability Distribution Fitting – the fitting of a probability distribution to a series of data concerning the repeated measurement of a variable phenomenon. The aim of distribution fitting is to predict the probability of the phenomenon in a certain interval. The distribution giving a close fit is supposed to lead to good predictions. (
  268. Problem Assessment/Framing – initial step in the analytics process; involves buy-in from all parties involved on what the problem is before a solution can be found (
  269. Process Capability Index – a statistical tool to measure the ability of a process to produce output within the customer’s specification limits. (
  270. Proprietary Data – data that no other organization possesses; produced by a company to enhance its competitive posture (Davenport, Enterprise Analytics, p. 37)
  271. Q-Q Plot (Quantile-Quantile Plot) – a scatterplot created by plotting two sets of quantiles against one another. If both sets of quantiles came from the same distribution, we should see the points forming a line that’s roughly straight. It is a graphical tool to help us assess if a set of data plausibly came from some theoretical distribution such as a Normal or exponential. For example, if we run a statistical analysis that assumes our dependent variable is Normally distributed, we can use a Normal Q-Q plot to check that assumption. It’s just a visual check, not an air-tight proof, so it is somewhat subjective. But it allows us to see at-a-glance if our assumption is plausible, and if not, how the assumption is violated and what data points contribute to the violation. (
  272. Queuing Theory – the mathematical study of waiting lines, or queues. A queueing model is constructed so that queue lengths and waiting time can be predicted. Queueing theory is generally considered a branch of operations research because the results are often used when making business decisions about the resources needed to provide a service. (
  273. Quota Sampling – means to take a very tailored sample that’s in proportion to some characteristic or trait of a population. For example, you could divide a population by the state they live in, income or education level, or sex. The population is divided into groups (also called strata) and samples are taken from each group to meet a quota. Care is taken to maintain the correct proportions representative of the population. For example, if your population consists of 45% female and 55% male, your sample should reflect those percentages. Quota sampling is based on the researcher’s judgment and is considered a non-probability sampling technique. (
  274. Radar Chart – a graphical method of displaying multivariate data in the form of a two-dimensional chart of three or more quantitative variables represented on axes starting from the same point. The relative position and angle of the axes is typically uninformative, but various heuristics, such as algorithms that plot data as the maximal total area, can be applied to sort the variables (axes) into relative positions that reveal distinct correlations, trade-offs, and a multitude of other comparative measures. (
  275. Random – of or characterizing a process of selection in which each item of a set has an equal probability of being chosen (
  276. Range – the difference between the maximum and minimum observations providing an estimate of the spread of the data (
  277. Ratio-Level of Measurement – the observations, in addition to having equal intervals, can have a value of zero as well. The zero in the scale makes this type of measurement unlike the other types of measurement, although the properties are similar to that of the interval level of measurement.  In the ratio level of measurement, the divisions between the points on the scale have an equivalent distance between them. (
  278. Recursive Function – a routine that calls itself directly or indirectly. (
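Both forms of recursion named in the Recursive Function entry can be shown in a few lines. This is a minimal sketch: `factorial` calls itself directly, while the hypothetical pair `is_even` / `is_odd` call each other, so each one calls itself indirectly.

```python
def factorial(n: int) -> int:
    """Direct recursion: the function calls itself on a smaller input."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n <= 1:  # base case stops the recursion
        return 1
    return n * factorial(n - 1)


def is_even(n: int) -> bool:
    """Indirect recursion: is_even calls is_odd, which calls is_even."""
    return True if n == 0 else is_odd(n - 1)


def is_odd(n: int) -> bool:
    return False if n == 0 else is_even(n - 1)


print(factorial(5))  # 120
print(is_even(10))   # True
```

Each routine needs a base case (here `n <= 1` and `n == 0`); without one the calls would never terminate.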
  279. Regression – a statistical measure that attempts to determine the strength of the relationship between one dependent variable (usually denoted by Y) and a series of other changing variables (known as independent variables) (
  280. Regression Analysis – statistical approach to forecasting change in a dependent variable (e.g., sales revenue) on the basis of change in one or more independent variables (e.g., population and income); AKA curve fitting or line fitting (
  281. Reliability – the overall consistency of a measure. A measure is said to have a high reliability if it produces similar results under consistent conditions. “It is the characteristic of a set of test scores that relates to the amount of random error from the measurement process that might be embedded in the scores. Scores that are highly reliable are accurate, reproducible, and consistent from one testing occasion to another.” (
  282. Residual – the residual of an observed value is the difference between the observed value and the estimated value of the quantity of interest (for example, a sample mean). (
  283. Response Surface Methodology (RSM) – explores the relationships between several explanatory variables and one or more response variables. The main idea of RSM is to use a sequence of designed experiments to obtain an optimal response, typically by fitting a second-degree polynomial model. (
  284. Return on Investment (ROI) – a performance measure used to evaluate the efficiency of an investment or compare the efficiency of a number of different investments. ROI tries to directly measure the amount of return on a particular investment, relative to the investment’s cost. To calculate ROI, the benefit (or return) of an investment is divided by the cost of the investment. The result is expressed as a percentage or a ratio. (
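The ROI calculation described in the entry can be sketched as below. The figures are made up for illustration, and the sketch assumes that "return" means the net gain (final value minus cost), which is one common convention rather than something the entry specifies.

```python
def roi(net_return: float, cost: float) -> float:
    """Return on investment as a ratio: net return divided by cost."""
    if cost <= 0:
        raise ValueError("cost must be positive")
    return net_return / cost


# Hypothetical investment: pay 1,000 and later sell for 1,200.
cost = 1_000.0
final_value = 1_200.0
ratio = roi(final_value - cost, cost)

print(f"ROI = {ratio:.2f} ({ratio:.0%})")  # ROI = 0.20 (20%)
```

The same ratio can be reported either as 0.20 or as 20%, matching the entry's note that the result is expressed as a percentage or a ratio.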
  285. Revenue Management (Yield Management) – the application of disciplined analytics that predict consumer behaviour at the micro-market levels and optimize product availability and price to maximize revenue growth. The primary aim of revenue management is selling the right product to the right customer at the right time for the right price and with the right pack. The essence of this discipline is in understanding customers’ perception of product value and accurately aligning product prices, placement and availability with each customer segment. (
  286. RFM (Recency, Frequency, Monetary) – allows marketers to target specific clusters of customers with communications that are much more relevant for their particular behavior – and thus generate much higher rates of response, plus increased loyalty and customer lifetime value. Like other segmentation methods, RFM segmentation is a powerful way to identify groups of customers for special treatment. RFM stands for recency, frequency, and monetary value. (
  287. Risk-Return Tradeoffs – states that the potential return rises with an increase in risk. Using this principle, individuals associate low levels of uncertainty with low potential returns, and high levels of uncertainty or risk with high potential returns. According to the risk-return tradeoff, invested money can render higher profits only if the investor will accept a higher possibility of losses. (
  288. Robust Optimization – a field of optimization theory that deals with optimization problems in which a certain measure of robustness is sought against uncertainty that can be represented as deterministic variability in the value of the parameters of the problem itself and/or its solution. (
  289. Receiver Operating Characteristic (ROC) Curve – a graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied. The ROC curve is created by plotting the true positive rate (TPR) against the false positive rate (FPR) at various threshold settings. The true-positive rate is also known as sensitivity, recall or probability of detection in machine learning. The false-positive rate is also known as probability of false alarm and can be calculated as (1 − specificity). It can also be thought of as a plot of the power as a function of the Type I Error of the decision rule (when the performance is calculated from just a sample of the population, it can be thought of as estimators of these quantities). The ROC curve is thus the sensitivity as a function of fall-out. (
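To make the construction of an ROC curve concrete, the sketch below sweeps the threshold over a small, invented set of classifier scores and records the (FPR, TPR) pair at each setting. The function name `roc_points` and the data are hypothetical, and the sketch assumes both classes are present in the labels.

```python
def roc_points(labels, scores):
    """Return (FPR, TPR) pairs from the strictest to the loosest threshold.

    labels: 1 for actual positives, 0 for actual negatives.
    scores: classifier scores; an item is predicted positive if score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Invented example data: three positives and three negatives.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Plotting these pairs with FPR on the horizontal axis and TPR on the vertical axis gives the ROC curve, which runs from (0, 0) to (1, 1).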
  290. Root Mean Square Error (RMSE) – a frequently used measure of the differences between values (sample or population values) predicted by a model or an estimator and the values observed; also called the root mean square deviation (RMSD). The RMSE represents the square root of the second sample moment of the differences between predicted values and observed values, or the quadratic mean of these differences. These deviations are called residuals when the calculations are performed over the data sample that was used for estimation and are called errors (or prediction errors) when computed out-of-sample. The RMSE serves to aggregate the magnitudes of the errors in predictions for various times into a single measure of predictive power. RMSE is a measure of accuracy used to compare forecasting errors of different models for a particular dataset, not between datasets, because it is scale-dependent. (
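The RMSE definition translates directly into a few lines of code. This is a minimal sketch with invented numbers; the function name `rmse` is a hypothetical helper.

```python
import math


def rmse(predicted, observed):
    """Square root of the mean squared difference between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Invented values: the errors are 1, 0 and -2, so the mean squared error is 5/3.
predicted = [2.0, 4.0, 6.0]
observed = [1.0, 4.0, 8.0]

error = rmse(predicted, observed)
print(f"RMSE = {error:.4f}")  # about 1.2910
```

Because the result is in the same units as the data, it is only meaningful to compare RMSE values computed on the same dataset.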
  291. Scalar-Level of Measurement – The term “scalar” comes from linear algebra, where it is used to differentiate a single number from a vector or matrix. The meaning in computing is similar. It distinguishes a single value like an integer or float from a data structure like an array. (
  292. Scenario Analysis – the process of estimating the expected value of a portfolio after a given period of time, assuming specific changes in the values of the portfolio’s securities or key factors take place, such as a change in the interest rate. Scenario analysis is commonly used to estimate changes to a portfolio’s value in response to an unfavorable event and may be used to examine a theoretical worst-case scenario. (
  293. Seasonal Component – that part of the variations in a time series representing intra-year fluctuations that are more or less stable year after year with respect to timing, direction and magnitude. (
  294. Sensitivity Analysis (Parametric Analysis) – determines how different values of an independent variable affect a particular dependent variable under a given set of assumptions. In other words, sensitivity analyses study how various sources of uncertainty in a mathematical model contribute to the model’s overall uncertainty. This technique is used within specific boundaries that depend on one or more input variables. (
  295. Service Mechanism – the way that customers receive service once they are selected from the front of a queue. A service mechanism is also called a server (in fact, this is the more common terminology). (
  296. Shadow Price – commonly referred to as a monetary value assigned to currently unknowable or difficult-to-calculate costs. It is based on the willingness to pay principle – in the absence of market prices, the most accurate measure of the value of a good or service is what people are willing to give up in order to get it. Shadow pricing is often calculated on certain assumptions and premises. As a result, it is subjective and somewhat imprecise and inaccurate. (; an estimated price for something that is not normally priced in the market or sold in the market. Economists will often assign a shadow price to estimate the cost of negative externalities such as the pollution emitted by a firm. (
  297. Shapiro Wilk Test – a way to tell if a random sample comes from a normal distribution. The test gives you a W value; small values indicate your sample is not normally distributed (you can reject the null hypothesis that your population is normally distributed if your values are under a certain threshold). (; a test of normality in frequentist statistics. (
  298. Sign Test – a statistical method to test for consistent differences between pairs of observations, such as the weight of subjects before and after treatment. Given pairs of observations (such as weight pre- and post-treatment) for each subject, the sign test determines if one member of the pair (such as pre-treatment) tends to be greater than (or less than) the other member of the pair (such as post-treatment). The paired observations may be designated x and y. For comparisons of paired observations (x,y), the sign test is most useful if comparisons can only be expressed as x > y, x = y, or x < y. If, instead, the observations can be expressed as numeric quantities (x = 7, y = 18), or as ranks (rank of x = 1st, rank of y = 8th), then the paired t-test or the Wilcoxon signed-rank test will usually have greater power than the sign test to detect consistent differences. (
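A two-sided sign test can be sketched with the binomial distribution. The example assumes the usual convention that tied pairs are dropped and that, under the null hypothesis, each remaining pair is equally likely to favor either member. The weights and the function name `sign_test_p_value` are invented for illustration.

```python
from math import comb


def sign_test_p_value(x, y):
    """Two-sided sign test p-value for paired observations (ties are dropped)."""
    plus = sum(1 for a, b in zip(x, y) if a > b)
    minus = sum(1 for a, b in zip(x, y) if a < b)
    n = plus + minus
    if n == 0:
        return 1.0  # every pair tied: no evidence either way
    k = min(plus, minus)
    # Probability of k or fewer successes in n fair coin flips.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)


# Invented pre- and post-treatment weights for eight subjects.
before = [85, 90, 78, 92, 88, 76, 95, 89]
after = [80, 88, 79, 85, 84, 70, 90, 89]

p_value = sign_test_p_value(before, after)
print(f"p-value = {p_value:.3f}")  # 0.125
```

Here six subjects lost weight, one gained, and one tied, so the test uses seven pairs. Only the direction of each difference matters, which is why the sign test usually has less power than the paired t-test or the Wilcoxon signed-rank test when magnitudes are available.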
  299. Significance Tests – can be used to assess whether the difference in accuracy between two classifiers is due to chance. (
  300. Smoothing – to smooth a data set is to create an approximating function that attempts to capture important patterns in the data, while leaving out noise or other fine-scale structures/rapid phenomena. In smoothing, the data points of a signal are modified so individual points higher than the adjacent points (presumably because of noise) are reduced, and points that are lower than the adjacent points are increased leading to a smoother signal. Smoothing may be used in two important ways that can aid in data analysis (1) by being able to extract more information from the data as long as the assumption of smoothing is reasonable and (2) by being able to provide analyses that are both flexible and robust. (
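One of the simplest smoothers is a centered moving average, which replaces each point with the mean of itself and its neighbors. The sketch below is one illustrative choice among many smoothing methods; the signal values and the function name `moving_average` are invented.

```python
def moving_average(values, window=3):
    """Centered moving average; windows are truncated at the ends of the series."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Invented signal with one noisy spike in the middle.
signal = [1.0, 2.0, 9.0, 2.0, 1.0]
smoothed_signal = moving_average(signal, window=3)

print([round(v, 2) for v in smoothed_signal])  # [1.5, 4.0, 4.33, 4.0, 1.5]
```

The spike at 9 is pulled down toward its neighbors while the neighbors are pulled up, which is the behavior the entry describes.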
  301. Simulated Annealing – a probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimization in a large search space for an optimization problem. It is often used when the search space is discrete (e.g., the traveling salesman problem). For problems where finding an approximate global optimum is more important than finding a precise local optimum in a fixed amount of time, simulated annealing may be preferable to alternatives such as gradient descent. (
  302. Simulation – an approximate imitation of the operation of a process or system that represents its operation over time. (
  303. Simulation Optimization – the process of finding the best input variable values from among all possibilities without explicitly evaluating each possibility. The objective of simulation optimization is to minimize the resources spent while maximizing the information obtained in a simulation experiment. (
  304. Sine (Sinusoidal) Wave – a mathematical curve that describes a smooth periodic oscillation. A sine wave is a continuous wave. It is named after the function sine, of which it is the graph. It occurs often in pure and applied mathematics, as well as physics, engineering, signal processing and many other fields. (
  305. Six Sigma – a disciplined, data-driven approach and methodology for eliminating defects (driving toward six standard deviations between the mean and the nearest specification limit) in any process – from manufacturing to transactional and from product to service. The statistical representation of Six Sigma describes quantitatively how a process is performing. To achieve Six Sigma — statistically — a process must not produce more than 3.4 defects per million opportunities. A Six Sigma defect is defined as anything outside of customer specifications. (
  306. Skewness – the measure of the symmetry of a distribution; in most instances the comparison is made to a normal distribution. A positively skewed distribution has relatively few large values and tails off to the right, and a negatively skewed distribution has relatively few small values and tails off to the left. Skewness values falling outside the range of -1 to +1 indicate a substantially skewed distribution. (
  307. Slack Variable – a variable that is added to an inequality constraint to transform it into an equality. Introducing a slack variable replaces an inequality constraint with an equality constraint and a non-negativity constraint on the slack variable. Slack variables are used in particular in linear programming. (
  308. SLIQ – a decision tree classifier that can handle both numeric and categorical attributes. It uses a novel pre-sorting technique in the tree-growth phase. This sorting procedure is integrated with a breadth-first tree growing strategy to enable classification of disk-resident datasets. SLIQ also uses a new tree-pruning algorithm that is inexpensive, and results in compact and accurate trees. The combination of these techniques enables SLIQ to scale for large data sets and classify data sets irrespective of the number of classes, attributes, and examples (records), thus making it an attractive tool for data mining. (
  309. Snowball Sampling – a nonprobability sampling technique where existing study subjects recruit future subjects from among their acquaintances. Thus the sample group is said to grow like a rolling snowball. As the sample builds up, enough data are gathered to be useful for research. This sampling technique is often used in hidden populations, such as drug users or sex workers, which are difficult for researchers to access. As sample members are not selected from a sampling frame, snowball samples are subject to numerous biases. For example, people who have many friends are more likely to be recruited into the sample. (
  310. Spatial Autocorrelation – in GIS, this helps you understand the degree to which one object is similar to other nearby objects. Moran’s I (Index) measures spatial autocorrelation. Positive spatial autocorrelation is when similar values cluster together in a map. Negative spatial autocorrelation is when dissimilar values cluster together in a map. One of the main reasons why spatial autocorrelation is important is that statistics relies on observations being independent from one another. If autocorrelation exists in a map, then this violates the assumption that observations are independent from one another. Another potential application is analyzing clusters and dispersion of ecology and disease. Is the disease an isolated case? Is it clustered or spreading with dispersion? These trends can be better understood using spatial autocorrelation analysis. (
  311. Specificity – (also called the true negative rate) measures the proportion of actual negatives that are correctly identified as such (e.g., the percentage of healthy people who are correctly identified as not having the condition). (
  312. Split-Half Method – In split-half reliability, a test for a single knowledge area is split into two parts and then both parts are given to one group of students at the same time. The scores from both parts of the test are correlated. A reliable test will have high correlation, indicating that a student would perform equally well (or as poorly) on both halves of the test. Split-half testing is a measure of internal consistency: how well the test components contribute to the construct that’s being measured. It is most commonly used for multiple choice tests, but you can theoretically use it for any type of test, even tests with essay questions. (
  313. SPRINT – a short, time-boxed period when a scrum team works to complete a set amount of work. Sprints are at the very heart of scrum and agile methodologies, and getting sprints right will help your agile team ship better software with fewer headaches. (
  314. SQL Injection – an attack that consists of insertion or “injection” of a SQL query via the input data from the client to the application. A successful SQL injection exploit can read sensitive data from the database, modify database data (Insert/Update/Delete), execute administration operations on the database (such as shutdown the DBMS), recover the content of a given file present on the DBMS file system and in some cases issue commands to the operating system. SQL injection attacks are a type of injection attack, in which SQL commands are injected into data-plane input in order to effect the execution of predefined SQL commands. (
  315. Standard Deviation – a statistic that measures the dispersion of a dataset relative to its mean and is calculated as the square root of the variance. The variance is found by determining the variation between each data point relative to the mean. If the data points are further from the mean, there is a higher deviation within the data set; thus, the more spread out the data, the higher the standard deviation. (; Customarily represented by the lower-case Greek letter sigma (σ), it is considered the most useful and important measure of dispersion that has all the essential properties of the variance plus the advantage of being determined in the same units as those of the original data. Also called root mean square (RMS) deviation (
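The steps in the Standard Deviation entry (mean, variance, square root) can be followed by hand on a small dataset. The numbers below are invented, and the sketch computes the population standard deviation (dividing by n); a sample standard deviation would divide by n - 1 instead.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # invented values

mean = sum(data) / len(data)                               # 5.0
variance = sum((x - mean) ** 2 for x in data) / len(data)  # 4.0
sigma = math.sqrt(variance)                                # 2.0

# Cross-check against the standard library's population standard deviation.
assert math.isclose(sigma, statistics.pstdev(data))

print(f"mean = {mean}, variance = {variance}, sigma = {sigma}")
```

The result, 2.0, is in the same units as the original data, which is the advantage over the variance that the entry points out.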
  316. Standard Deviation for Investment – a basic mathematical concept that measures volatility in the market, or the average amount by which individual data points differ from the mean. Simply put, standard deviation helps determine the spread of asset prices from their average price. (
  317. Standard Normal Distribution – a special case of the normal distribution. It is the distribution that occurs when a normal random variable has a mean of zero and a standard deviation of one. (
  318. Statistical Analysis – the science of collecting, exploring and presenting large amounts of data to discover underlying patterns and trends. Statistics are applied every day – in research, industry and government – to become more scientific about decisions that need to be made. (
  319. Statistical Inference – the process of using data analysis to deduce properties of an underlying probability distribution. Inferential statistical analysis infers properties of a population, for example by testing hypotheses and deriving estimates. It is assumed that the observed data set is sampled from a larger population. (
  320. Statistical Process Control – the use of statistical techniques to control a process or production method. SPC tools and procedures can help you monitor process behavior, discover issues in internal systems, and find solutions for production issues. Statistical process control is often used interchangeably with statistical quality control (SQC). A popular SPC tool is the control chart. (
  321. Statistical Significance – probability of obtaining a test result that occurs by chance and not by systematic manipulation of data (
  322. Statistics – a form of mathematical analysis that uses quantified models, representations and synopses for a given set of experimental data or real-life studies. Statistics studies methodologies to gather, review, analyze and draw conclusions from data. Some statistical measures include the following: Mean, Regression analysis, Skewness, Kurtosis, Variance, Analysis of variance, and Stem and Leaf Plot. (
  323. Stepwise Regression – a method of fitting regression models in which the choice of predictive variables is carried out by an automatic procedure. In each step, a variable is considered for addition to or subtraction from the set of explanatory variables based on some prespecified criterion. Usually, this takes the form of a sequence of F-tests or t-tests, but other techniques are possible, such as adjusted R2, Akaike information criterion, Bayesian information criterion, Mallows’s Cp, PRESS, or false discovery rate. (
  324. Stem and Leaf Plot – a device for presenting quantitative data in a graphical format, similar to a histogram, to assist in visualizing the shape of a distribution. (; a special table where each data value is split into a “stem” (the first digit or digits) and a “leaf” (usually the last digit). (
  325. Stochastic Analysis/Calculus – a branch of mathematics that operates on stochastic processes. It allows a consistent theory of integration to be defined for integrals of stochastic processes with respect to stochastic processes. It is used to model systems that behave randomly. (
  326. Stochastic Optimization – optimization methods that generate and use random variables. For stochastic problems, the random variables appear in the formulation of the optimization problem itself, which involves random objective functions or random constraints. Stochastic optimization methods also include methods with random iterates. Some stochastic optimization methods use random iterates to solve stochastic problems, combining both meanings of stochastic optimization. Stochastic optimization methods generalize deterministic methods for deterministic problems. (
  327. Stochastic Programming (Recourse Model) – a framework for modeling optimization problems that involve uncertainty. Whereas deterministic optimization problems are formulated with known parameters, real world problems almost invariably include some unknown parameters. When the parameters are known only within certain bounds, one approach to tackling such problems is called robust optimization. Here the goal is to find a solution which is feasible for all such data and optimal in some sense. (
  328. Stratified Random Sampling – a method of sampling that involves the division of a population into smaller groups known as strata. In stratified random sampling or stratification, the strata are formed based on members’ shared attributes or characteristics. Stratified random sampling is also called proportional random sampling or quota random sampling. (
  329. Structural Equation Modeling (SEM) – a multivariate technique combining aspects of factor analysis and multiple regression that enables the researcher to simultaneously examine a series of interrelated dependence relationships among the measured variables and latent constructs (variates) as well as between several latent constructs. (; a multivariate statistical framework that is used to model complex relationships between directly and indirectly observed (latent) variables. SEM is a general framework that involves simultaneously solving systems of linear equations and encompasses other techniques such as regression, factor analysis, path analysis, and latent growth curve modeling. Recently, SEM has gained popularity in the analysis of complex genetic traits because it can be used to better analyze the relationships between correlated variables (traits), to model genes as latent variables as a function of multiple observed genetic variants, and assess the association between multiple genetic variants and multiple correlated phenotypes of interest. (
  330. Structural Equation Modelling – a multivariate statistical analysis technique that is used to analyze structural relationships. This technique is the combination of factor analysis and multiple regression analysis, and it is used to analyze the structural relationship between measured variables and latent constructs. This method is preferred by researchers because it estimates the multiple and interrelated dependence in a single analysis. In this analysis, two types of variables are used: endogenous variables and exogenous variables. Endogenous variables are equivalent to dependent variables, and exogenous variables are equivalent to independent variables. (
  331. Support Vector Machine (SVM) – an algorithm for the classification of both linear and nonlinear data. It transforms the original data into a higher dimension, from where it can find a hyperplane for data separation using essential training tuples called support vectors. (
  332. Survey Reliability – concerned with consistency or the degree to which the questions used in a survey elicit the same kind of information each time they’re asked. This is particularly important when it comes to tracking and comparing results with past internal surveys and benchmarks from external sources. Changes to wording or structure may result in different responses. (
  333. Swim Lane Diagram – Like a flowchart, it diagrams a process from start to finish, but it also divides these steps into categories to help distinguish which departments or employees are responsible for each set of actions. These lanes are columns that keep actions visually separated from others. (
  334. System Dynamics – a computer-aided approach to policy analysis and design. It applies to dynamic problems arising in complex social, managerial, economic, or ecological systems (; an approach to understanding the nonlinear behavior of complex systems over time using stocks, flows, internal feedback loops, table functions and time delays. (
  335. Test-Retest (Repeatability) – measures test consistency — the reliability of a test measured over time. In other words, give the same test twice to the same people at different times to see if the scores are the same. For example, test on a Monday, then again the following Monday. The two scores are then correlated. (
  336. Time-Series Forecasting – assumes that the factors that have influenced activities in the past and present will continue to do so in approximately the same way in the future. A trend is an overall long-term upward or downward movement in a time series. Trend is not the only component factor that can influence data in a time series. The cyclical effect depicts the up-and-down swings or movements through the series. Cyclical movements vary in length, usually lasting from 2 to 10 years. They differ in intensity and are often correlated with a business cycle. In some time periods, the values are higher than would be predicted by a trend line (i.e., they are at or near the peak of a cycle). In other time periods, the values are lower than would be predicted by a trend line (i.e., they are at or near the bottom of a cycle). Any data that do not follow the trend modified by the cyclical component are considered part of the irregular effect, or random effect. When you have monthly or quarterly data, an additional component, the seasonal effect, is considered, along with the trend, cyclical, and irregular effects. (
  337. Tolerance – an approach to sensitivity analysis in linear programming that expresses the common range that parameters can change while preserving the character of the solution (A. Holder, editor. Mathematical Programming Glossary. INFORMS Computing Society,, 2006-08. Originally authored by Harvey J. Greenberg, 1999-2006.)
  338. Trade-off Analysis – Determining the effect of decreasing one or more key factors and simultaneously increasing one or more other key factors in a decision, design, or project. (
  339. Transition Probabilities – the probabilities associated with various state changes. The process is characterized by a state space, a transition matrix describing the probabilities of particular transitions, and an initial state (or initial distribution) across the state space. By convention, we assume all possible states and transitions have been included in the definition of the process, so there is always a next state, and the process does not terminate. (
  340. Transportation Optimization – the process of determining the most efficient means of moving product to the customer while maintaining a desired service level, given a static supply chain network. The customer can be an internal component of the company or the traditional, external consumer. (
  341. Transportation Problem – a special type of linear programming problem where the objective is to minimise the cost of distributing a product from a number of sources or origins to a number of destinations. Because of its special structure, the general simplex method is inefficient for transportation problems, and specialized solution methods are typically used instead. (
  342. Traveling Salesman Problem (TSP) – asks the following question: “Given a list of cities and the distances between each pair of cities, what is the shortest possible route that visits each city and returns to the origin city?” It is an NP-hard problem in combinatorial optimization, important in operations research and theoretical computer science. (
  343. Trend Component – detects either upward or downward changes in data values over a length of time. (
  344. Triangular Distribution – a continuous probability distribution with a probability density function shaped like a triangle. It is defined by three values: the minimum value a, the maximum value b, and the peak value c. This is really handy as in a real-life situation we can often estimate the maximum and minimum values, and the most likely outcome, even if we don’t know the mean and standard deviation. The triangular distribution has a definite upper and lower limit, so we avoid unwanted extreme values. In addition the triangular distribution is a good model for skewed distributions. The sum of two dice is often modelled as a discrete triangular distribution with a minimum of 2, a maximum of 12 and a peak at 7. (
  345. Triple Exponential Smoothing – applies exponential smoothing three times, which is commonly used when there are three high frequency signals to be removed from a time series under study. There are different types of seasonality: ‘multiplicative’ and ‘additive’ in nature, much like addition and multiplication are basic operations in mathematics. (
  346. Tukey HSD (Tukey’s Honest Significance Test or Tukey’s Range Test) – a single-step multiple comparison procedure and statistical test. It can be used to find means that are significantly different from each other. (
  347. Two-Sample F-Test – used to test if the variances of two populations are equal. This test can be a two-tailed test or a one-tailed test. The two-tailed version tests against the alternative that the variances are not equal. The one-tailed version only tests in one direction, that is the variance from the first population is either greater than or less than (but not both) the second population variance. The choice is determined by the problem. For example, if we are testing a new process, we may only be interested in knowing if the new process is less variable than the old process. (
  348. Two-Sample T-Test – used to determine if two population means are equal. A common application is to test if a new process or treatment is superior to a current process or treatment. (
  349. Type I Error – the rejection of a true null hypothesis (
  350. Type II Error – refers to the non-rejection of a false null hypothesis. It is used within the context of hypothesis testing. (
  351. Unbiased Estimators – an accurate statistic that’s used to approximate a population parameter. “Accurate” in this sense means that it’s neither an overestimate nor an underestimate. If an overestimate or underestimate does happen, the mean of the difference is called a “bias.” (
  352. Unbounded Solution – a situation in which no finite optimum exists because the objective function can be improved indefinitely over the feasible region; there are infinitely many feasible solutions, but none is optimal. A problem in which this situation occurs cannot be solved as stated. (
  353. Uncertainty – the estimated amount or percentage by which an observed or calculated value may differ from the true value (
  354. Uniform Distribution – a type of probability distribution in which all outcomes are equally likely; each outcome has the same probability of occurring. A deck of cards has within it uniform distributions because a heart, a club, a diamond, or a spade is equally likely to be drawn. A coin also has a uniform distribution because the probability of getting either heads or tails in a coin toss is the same. (
  355. Utility Function – a representation to define individual preferences for goods or services beyond the explicit monetary value of those goods or services. In other words, it is a calculation for how much someone desires something, and it is relative. For example, if someone prefers dark chocolate to milk chocolate, they are said to derive more utility from dark chocolate. (
  356. Validation (of a model) – determining how well the model depicts the real-world situation it is describing (
  357. Variability – describes how spread out or closely clustered a set of data is (
  358. Bias–Variance Tradeoff – the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa. The bias–variance dilemma or bias–variance problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set. (
  359. Variable Cost – a periodic cost that varies in step with the output or the sales revenue of a company. Variable costs include raw material, energy usage, labor, distribution costs, etc. (
  360. Variance – the expectation of the squared deviation of a random variable from its mean. Informally, it measures how far a set of (random) numbers are spread out from their average value. (
  361. Variance for Investment – a measurement of the degree of risk in an investment. Risk reflects the chance that an investment’s actual return, or its gain or loss over a specific period, is higher or lower than expected. There is a possibility some, or all, of the investment will be lost. High variance in a stock is associated with higher risk, along with a higher return. Low variance is associated with lower risk and a lower return. High variance stocks tend to be good for aggressive investors who are less risk-averse, while low variance stocks tend to be good for conservative investors who have less risk tolerance. (
  362. Variation Reduction – reference to process variation where reduction leads to stable and predictable process results (
  363. Vehicle Routing Problem (VRP) – the goal is to find optimal routes for multiple vehicles visiting a set of locations. (When there’s only one vehicle, it reduces to the Traveling Salesman Problem.) But what do we mean by “optimal routes” for a VRP? One answer is the routes with the least total distance. However, if there are no other constraints, the optimal solution is to assign just one vehicle to visit all locations, and find a shortest route for that vehicle. This is essentially the same problem as the TSP. (
  364. Venn Diagram – a diagram that shows all possible logical relations between a finite collection of different sets. These diagrams depict elements as points in the plane, and sets as regions inside closed curves. A Venn diagram consists of multiple overlapping closed curves, usually circles, each representing a set. (
  365. Verification (of a model) – the process of confirming that it is correctly implemented with respect to the conceptual model (it matches specifications and assumptions deemed acceptable for the given purpose of application). During verification the model is tested to find and fix errors in the implementation of the model. (
  366. Web Analytics – ability to use data generated through Internet-based activities; typically used to assess customer behaviors; see also, RFM (Davenport, Enterprise Analytics, p. 49-51)
  367. Weibull Distribution – widely used in reliability and life data analysis due to its versatility. Depending on the values of the parameters, the Weibull distribution can be used to model a variety of life behaviors. (
  368. Weighted Moving Averages – assign a heavier weighting to more current data points since they are more relevant than data points in the distant past. The sum of the weighting should add up to 1 (or 100 percent). In the case of the simple moving average, the weightings are equally distributed. (
  369. Wilcoxon Signed-Rank Test – a non-parametric statistical hypothesis test used to compare two related samples, matched samples, or repeated measurements on a single sample to assess whether their population mean ranks differ (i.e. it is a paired difference test). It can be used as an alternative to the paired Student’s t-test (also known as “t-test for matched pairs” or “t-test for dependent samples”) when the distribution of the differences between the two samples cannot be assumed to be normally distributed. A Wilcoxon signed-rank test is a nonparametric test that can be used to determine whether two dependent samples were selected from populations having the same distribution. (
  370. x̅ and R chart – a type of control chart used to monitor the mean and range of a normally distributed variable simultaneously, when samples are collected at regular intervals from a business or industrial process. It is often used to monitor variables data, but the performance of the x̅ and R chart may suffer when the normality assumption is not valid. (
  371. Yield – percentage of ‘good’ product in a batch; has three main components: functional (defect driven), parametric (performance driven), and production efficiency/equipment utilization (
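
Three of the entries above lend themselves to a small worked example: Transition Probabilities (339), Triangular Distribution (344), and Weighted Moving Averages (368). The following is a minimal Python sketch, not taken from the glossary's sources; the function names and sample numbers are illustrative assumptions.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def weighted_moving_average(values, weights):
    """Weighted average of the most recent len(weights) values.

    Weights are listed oldest-to-newest and must sum to 1,
    so the last weight applies to the most current data point.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    if len(values) < len(weights):
        raise ValueError("need at least as many values as weights")
    window = values[-len(weights):]
    return sum(w * v for w, v in zip(weights, window))


def two_dice_distribution():
    """Exact probability of each possible sum of two fair six-sided dice."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in sorted(counts.items())}


def next_state_distribution(current, transition):
    """One step of a Markov chain.

    current[i] is the probability of being in state i now;
    transition[i][j] is the probability of moving from state i to state j.
    """
    n = len(transition)
    return [sum(current[i] * transition[i][j] for i in range(n))
            for j in range(n)]


if __name__ == "__main__":
    # Heavier weight on the most recent observation (hypothetical data).
    print(weighted_moving_average([10, 12, 14], [0.2, 0.3, 0.5]))

    # Discrete triangular shape: minimum 2, maximum 12, peak at 7.
    for total, prob in two_dice_distribution().items():
        print(total, prob)

    # Start in state 0 with certainty and advance one step.
    print(next_state_distribution([1.0, 0.0], [[0.9, 0.1], [0.5, 0.5]]))
```

With the sample inputs, the weighted average works out to 0.2(10) + 0.3(12) + 0.5(14) = 12.6, the dice distribution peaks at a sum of 7 with probability 1/6, and each row of the transition matrix sums to 1 so the resulting state distribution does as well.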



