Terminology
This article defines general statistical modeling terminology, terms specific to Tealium products, and terms used in the Tealium Predict ML interface.
Audience
An audience is a group of visitor profiles that share a set of attribute conditions and is used to trigger vendor actions (connectors) in real time. In Tealium Predict, the output attributes from your models are used to create one or more audiences at which to target your marketing efforts.
Confusion Matrix
In Tealium Predict, the confusion matrix, also known as an error matrix, is a performance measurement reported for a trained model in the Model Explorer that compares actual and predicted values. In industry terms, a confusion matrix uses a set of test data for which the true values are known and displays the actual and predicted values in a table so that you can visualize the performance of a given algorithm.
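As an illustration of the industry definition above, the following minimal sketch counts the four cells of a confusion matrix for a boolean outcome. The function name `confusion_matrix` and the sample data are hypothetical and are not part of the Tealium Predict interface.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count the four cells of a binary confusion matrix.

    `actual` and `predicted` are equal-length lists of booleans
    (hypothetical test data for which the true values are known).
    """
    counts = Counter(zip(actual, predicted))
    return {
        "tp": counts[(True, True)],    # actual True, predicted True
        "fn": counts[(True, False)],   # actual True, predicted False
        "fp": counts[(False, True)],   # actual False, predicted True
        "tn": counts[(False, False)],  # actual False, predicted False
    }


# Made-up example data, for illustration only.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
matrix = confusion_matrix(actual, predicted)
print(matrix)
```

Each visitor falls into exactly one cell, so the four counts always sum to the number of test records.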
Data Scientist
A Data Scientist is an analytical expert who uses skills in technology and social science to look for trends and manage data, applying industry knowledge, contextual understanding, and skepticism of existing assumptions to reveal solutions to business challenges.
Deployed Model
A deployed model refers to a model that has been “trained” and then deployed to populate prediction values into your customer profiles in Tealium AudienceStream.
Machine Learning
Machine learning refers to a subfield of artificial intelligence that focuses on enabling computers to learn without human guidance by recognizing patterns. Machine learning uses a set of predetermined rules to remember the patterns, analyze output, and create a model that explains the patterns and guides future behavior. In cases where you know what data you want, machine learning accelerates the path to acquiring the desired data. In cases where you do not know exactly what you want or which pattern to identify, machine learning can find a pattern and reveal results that you can use to move forward with acquiring the data you need.
Model
In Tealium Predict, a model represents the behavior you are predicting within a specific timeframe, such as a purchase, conversion, or any customer behavior tracked in AudienceStream. Models are created using an algorithm, and the results are used to explain patterns and predict future outcomes.
Model Explorer
The Model Explorer refers to an interactive section of the product interface where you can view performance measurements for each model in each stage and fine-tune the model with actionable items from the interface, such as retrain or deploy.
Output Attribute
The numeric, Visit-scoped data layer attribute that stores the Prediction Value generated by a corresponding Deployed Model. The Output Attribute is created by default when a new model is created.
Prediction Timeframe
The timeframe, in days, weeks, or months, within which you want to predict that the action for your target attribute occurs. For example, a user’s “likelihood to return” in the next “x” days, weeks, or months.
Prediction Value
The score generated by a Deployed Model. This score represents the likelihood that the Target Attribute value is set to True during the visitor’s next visit (assuming the next visit occurs within the Prediction Timeframe). This value is stored in the corresponding Output Attribute for the model.
Prediction Threshold
The numeric threshold value against which a Prediction Value is compared to decide the predicted Target Attribute value. For example, if the Prediction Threshold chosen is 0.5 and a Prediction Value is 0.51, it is assumed that the Target Attribute value is set to True.
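The comparison described above can be sketched in a few lines. The function name `apply_threshold` is hypothetical, and treating a Prediction Value exactly equal to the threshold as True is an assumption made for this sketch; the definition above only covers values above the threshold.

```python
def apply_threshold(prediction_value, threshold=0.5):
    """Convert a Prediction Value into a True/False prediction.

    Assumption: a value equal to the threshold counts as True.
    """
    return prediction_value >= threshold


# The example from the definition: threshold 0.5, Prediction Value 0.51.
print(apply_threshold(0.51, threshold=0.5))
print(apply_threshold(0.49, threshold=0.5))
```

Raising the threshold generally produces fewer True predictions, and lowering it produces more.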
Probability Distribution (Returning Visitor)
The probability distribution refers to a performance graph reported for a trained model. This graph shows how well the model separates the cases in which a visitor did return and perform the action of interest from the cases in which the visitor did not return and perform the action. In industry terms, a probability distribution is a mathematical function that gives the probabilities of the different possible outcomes of an experiment, and thus the probability that a given event occurs.
Receiver Operating Characteristics (ROC)
The ROC/AUC (area under the curve) refers to a performance measurement reported for a trained model. In industry terms, the ROC curve plots the true positive rate against the false positive rate across a range of prediction thresholds, and the AUC summarizes that curve as a single number. The true positive rate is calculated as the number of true positives divided by the sum of the number of true positives and the number of false negatives. It describes how well a model predicts the positive class when the actual outcome is positive. The true positive rate is also referred to as sensitivity.
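To make the industry definition concrete, the sketch below computes ROC points (false positive rate, true positive rate) at each distinct score and estimates the AUC with the trapezoidal rule. The function names and the sample scores are hypothetical, and this is not necessarily how Tealium Predict computes the values shown in the Model Explorer.

```python
def roc_points(actual, scores):
    """Return (false positive rate, true positive rate) pairs.

    Each distinct score is used as a threshold. Assumes `actual`
    contains at least one True and at least one False value.
    """
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if a and s >= threshold)
        fp = sum(1 for a, s in zip(actual, scores) if not a and s >= threshold)
        # True positive rate = TP / (TP + FN); TP + FN is all positives.
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Estimate the area under the ROC curve with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up example data, for illustration only.
actual = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(actual, scores)
print(points)
print(auc(points))
```

An AUC of 1.0 means the scores rank every positive case above every negative case, while 0.5 corresponds to random ranking.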
Retrain
The prediction accuracy of a model degrades over time. When you retrain a Tealium Predict model with new data, the prediction accuracy increases and the model remains accurate over a longer period of time.
Strength Scores
Strength scores are ratings that grade the quality of each version of each model. The strength scores include the F1 Score, Recall, Precision, and Accuracy. For a detailed explanation of these scoring elements, see Model Scores and Ratings.
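For orientation, the sketch below computes the four scores named above from confusion matrix counts using their standard textbook definitions. The function name `strength_scores` and the counts are hypothetical, and how Tealium Predict turns these numbers into ratings is not covered here.

```python
def strength_scores(tp, fp, fn, tn):
    """Compute Precision, Recall, F1 Score, and Accuracy from counts.

    Uses the standard definitions; returns 0.0 for a score whose
    denominator would be zero (an assumption made for this sketch).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Made-up counts, for illustration only.
result = strength_scores(tp=40, fp=10, fn=20, tn=30)
print(result)
```

Precision and Recall pull in different directions as the Prediction Threshold changes, which is why the F1 Score, their harmonic mean, is often reported alongside them.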
Target Attribute
The target attribute is the AudienceStream attribute selected to define your model and represent the user action being predicted. These attributes are visit-level or visitor-level booleans that signal that an action has been performed. For example, a boolean visit attribute named “Has Purchased” signals that a purchase event has occurred during a visit.
Training
In Tealium Predict, Training refers to the stage in which a model consumes and analyzes data for a predetermined period of time to be used for predictions. The size and quality of the data used during this stage are important factors in the accuracy of results when you deploy the model.
Trained Version
In Tealium Predict, the trained version of a model refers to a single instance of training a model. Every machine learning model has a version, and each version is trained with data used to accurately make predictions.
Visitor’s Visit
The act of a Visitor visiting a website or the triggering of one or more data layer enriched events.
This page was last updated: August 25, 2023