Terminology | Getting Started with Predict ML

Audience

An audience is a group of visitor profiles that share a set of attribute conditions and is used to trigger vendor actions (connectors) in real time. In Tealium Predict, the output attributes from your models are used to create one or more audiences at which to target your marketing efforts.

Confusion Matrix

In Tealium Predict, the confusion matrix, also known as an error matrix, is a performance measurement reported for a trained model in the Model Explorer that compares actual and predicted values. In industry terms, a confusion matrix uses a set of test data for which true values are known and then displays actual and predicted values in table format to allow you to visualize the performance of a given algorithm.
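As an illustration of the industry definition, the following minimal sketch counts the four cells of a binary confusion matrix from test data whose true values are known. The function name, the dictionary keys, and the sample data are hypothetical and are not part of Tealium Predict.

```python
def confusion_matrix(actual, predicted):
    """Count the four outcomes of a binary classifier.

    `actual` and `predicted` are equal-length lists of booleans.
    Returns a dict with the keys tp, fp, fn, and tn.
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for a, p in zip(actual, predicted):
        if a and p:
            counts["tp"] += 1  # predicted True, actually True
        elif p:
            counts["fp"] += 1  # predicted True, actually False
        elif a:
            counts["fn"] += 1  # predicted False, actually True
        else:
            counts["tn"] += 1  # predicted False, actually False
    return counts


# Hypothetical test data: whether each visitor purchased, and what was predicted.
actual = [True, True, False, False, True, False]
predicted = [True, False, False, True, True, False]
matrix = confusion_matrix(actual, predicted)
print(matrix)
```

Laid out as a table, the true positives and true negatives sit on one diagonal and the two kinds of error on the other, which is what makes the errors easy to spot.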

Data Scientist

A Data Scientist is an analytical expert who uses skills in technology and social science to find trends and manage data, applying industry knowledge, contextual understanding, and skepticism of existing assumptions to reveal solutions to business challenges.

Deployed Model

A deployed model refers to a model that has been “trained” and then deployed to populate prediction values into your customer profiles in Tealium AudienceStream.

Machine Learning

Machine learning refers to a subfield of artificial intelligence that focuses on enabling computers to learn without human guidance by recognizing patterns. Machine learning uses a set of predetermined rules to remember the patterns, analyze output, and create a model that explains the patterns and guides future behavior. In cases where you know what data you want, machine learning accelerates the path to acquiring it. In cases where you do not know exactly what you want or which pattern to look for, machine learning can find a pattern and reveal results that you can use to move forward with acquiring the data you need.

Model

In Tealium Predict, a model represents the behavior you are predicting within a specific timeframe, such as a purchase, conversion, or any customer behavior tracked in AudienceStream. Models are created using an algorithm and the results are used to explain patterns and predict future outcomes.

Model Explorer

The Model Explorer refers to an interactive section of the product interface where you can view performance measurements for each model in each stage and fine-tune the model with actionable items from the interface, such as retrain or deploy.

Output Attribute

The numeric, visit-scoped data layer attribute that stores the Prediction Value generated by a corresponding Deployed Model. The Output Attribute is created by default when a new model is created.

Prediction Timeframe

The timeframe, in days, weeks, or months, over which you want to predict whether the action for your target attribute occurs. For example, a user’s “likelihood to return” in the next “x” days, weeks, or months.

Prediction Value

The score generated by a Deployed Model. This score represents the likelihood that the Target Attribute value is set to True during the visitor’s next visit (assuming the next visit occurs within the Prediction Timeframe). This value is stored in the corresponding Output Attribute for the model.

Prediction Threshold

The numeric threshold value against which a Prediction Value is compared to predict the Target Attribute value. For example, if the Prediction Threshold is 0.5 and the Prediction Value is 0.51, the Target Attribute value is assumed to be True.
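The comparison in the example above can be sketched in a few lines. The function name is hypothetical, and treating a score exactly equal to the threshold as True is an assumption, since the definition does not say how ties are handled.

```python
def predicted_outcome(prediction_value, prediction_threshold=0.5):
    """Turn a prediction score into a True/False prediction.

    Assumption: a score equal to the threshold counts as True.
    """
    return prediction_value >= prediction_threshold


above = predicted_outcome(0.51, 0.5)  # the example from the definition
below = predicted_outcome(0.49, 0.5)
print(above, below)
```

Raising the threshold makes the model predict True less often, which generally trades missed positives for fewer false alarms.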

Probability Distribution (Returning Visitor)

The probability distribution refers to a performance graph reported for a trained model. This graph shows how well the model separates the cases in which a visitor did return and perform the action of interest from the cases in which the visitor did not return and perform the action. In industry terms, a probability distribution is a mathematical function that gives the probabilities of the different possible outcomes of an experiment, and thus the probability that a given event occurs.

Receiver Operating Characteristics (ROC)

The ROC/AUC (area under the curve) refers to a performance measurement reported for a trained model. In industry terms, the ROC curve plots the true positive rate against the false positive rate across prediction thresholds, and the AUC summarizes that curve as a single number. The true positive rate is calculated as the number of true positives divided by the sum of the number of true positives and the number of false negatives. It describes how well a model predicts the positive class when the actual outcome is positive, and is also referred to as sensitivity.
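As a rough sketch of the industry definition, a ROC curve can be built by sweeping a threshold over the scores, computing the true positive rate and false positive rate at each step, and integrating the resulting curve to obtain the AUC. The function names and sample data below are hypothetical, and this is not a description of how Tealium Predict computes the measurement internally.

```python
def roc_points(actual, scores):
    """Return (false_positive_rate, true_positive_rate) points, one per threshold."""
    positives = sum(1 for a in actual if a)
    negatives = len(actual) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    points = [(0.0, 0.0)]
    # Sweep thresholds from the highest score down; tied scores share a threshold.
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if a and s >= threshold)
        fp = sum(1 for a, s in zip(actual, scores) if not a and s >= threshold)
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve, computed with the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical test data: actual outcomes and the scores a model assigned.
actual = [True, True, False, True, False, False]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
curve = roc_points(actual, scores)
area = auc(curve)
print(curve, area)
```

An AUC of 1.0 means every positive case scored higher than every negative case, while 0.5 is what random scoring would be expected to produce.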

Retrain

The prediction accuracy of a model degrades over time. Retraining a Tealium Predict model with new data typically improves its prediction accuracy and keeps the model accurate over a longer period of time.

Strength Scores

The model strength score provides ratings that grade the quality of each version of each model. The strength scores include the F1 Score, Recall, Precision, and Accuracy. For a detailed explanation of these scoring elements, see Model Scores and Ratings.
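For orientation, the four measurements have standard definitions in terms of confusion matrix counts, sketched below. The function name and the counts are hypothetical, and how Tealium Predict converts these values into ratings is not shown here.

```python
def strength_scores(tp, fp, fn, tn):
    """Compute precision, recall, accuracy, and F1 from confusion matrix counts.

    Each ratio falls back to 0.0 when its denominator is zero.
    """
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / total if total else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


# Hypothetical counts from a test set of 100 visits.
result = strength_scores(tp=40, fp=10, fn=20, tn=30)
print(result)
```

Precision answers "of the visits predicted True, how many were True", while recall answers "of the visits that were True, how many were predicted True". F1 balances the two, which matters when the positive class is rare and accuracy alone can look misleadingly high.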

Target Attribute

The target attribute is the AudienceStream attribute selected to define your model and represent the user action being predicted. These attributes are visit-level or visitor-level booleans that signal that an action has been performed. For example, a boolean visit attribute named “Has Purchased” signals that a purchase event has occurred during a visit.

Training

In Tealium Predict, training refers to the stage in which a model consumes and analyzes data for a predetermined period of time to be used for predictions. The size and quality of the data used during this stage are important factors in the accuracy of results when you deploy the model.

Trained Version

In Tealium Predict, the trained version of a model refers to a single instance of training that model. Every machine learning model has a version, and each version is trained with the data it uses to make accurate predictions.

Visitor’s Visit

The act of a Visitor visiting a website or the triggering of one or more data layer enriched events.