Model Schema

A model schema specifies the inputs and outputs of a model: its feature columns, prediction columns, and ground truth columns. A model schema has the following fields:

Field	Type	Description	Mandatory
`id`	int	Unique identifier of the model schema	False. If omitted, a new model schema is created; if specified, the model schema with that ID is updated
`model_id`	int	ID of the model that the schema belongs to	False. If omitted, the SDK uses the model that the user has set
`spec`	InferenceSchema	Detailed specification of the model schema	True

The detailed specification is defined with the InferenceSchema class, which has the following fields:

Field	Type	Description	Mandatory
`feature_types`	Dict[str, ValueType]	Mapping from feature name to feature type	True
`model_prediction_output`	PredictionOutput	Prediction specification, which differs between model types, e.g. BinaryClassificationOutput, RegressionOutput, RankingOutput	True
`session_id_column`	str	Name of the column that uniquely identifies a request	True
`row_id_column`	str	Name of the column that uniquely identifies a row in a request	True
`tag_columns`	Optional[List[str]]	List of column names that contain additional information about the prediction; you can treat them as metadata	False
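
To make the role of these fields concrete, the sketch below checks a single prediction row against a feature mapping and the two ID columns. It does not use the Merlin SDK: plain Python types stand in for ValueType, and the names FEATURE_TYPES and check_row are hypothetical helpers introduced only for illustration.

```python
# Hypothetical stand-in for feature_types: plain Python types replace ValueType.
FEATURE_TYPES = {"featureA": float, "featureB": int, "featureC": str}
SESSION_ID_COLUMN = "session_id"  # identifies a request
ROW_ID_COLUMN = "row_id"          # identifies a row within a request


def check_row(row):
    """Return a list of problems found in one prediction row (a dict)."""
    problems = []
    # Both ID columns are mandatory in the schema.
    for column in (SESSION_ID_COLUMN, ROW_ID_COLUMN):
        if column not in row:
            problems.append(f"missing id column: {column}")
    # Every declared feature must be present with the declared type.
    for name, expected in FEATURE_TYPES.items():
        if name not in row:
            problems.append(f"missing feature: {name}")
        elif not isinstance(row[name], expected):
            problems.append(f"{name}: expected {expected.__name__}")
    return problems


good = {"session_id": "s1", "row_id": "r1",
        "featureA": 0.5, "featureB": 3, "featureC": "x"}
bad = {"session_id": "s1", "featureA": "0.5", "featureB": 3}

print(check_row(good))
print(check_row(bad))
```

Columns that are not declared, such as tag columns, are simply ignored by this check, which mirrors their optional, metadata-only role.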

The model_prediction_output field has type PredictionOutput. It specifies the prediction generated by the model and depends on its model type. The schema currently supports 3 model types:

Binary Classification
Regression
Ranking

Each model type has its own model prediction output specification.

Binary Classification

The model prediction output specification for the Binary Classification type is BinaryClassificationOutput, which has the following fields:

Field	Type	Description	Mandatory
`prediction_score_column`	str	Name of the column containing the model's prediction score. The score must be between 0.0 and 1.0	True
`actual_label_column`	str	Name of the column containing the actual class	False, because not all models have ground truth
`positive_class_label`	str	Label for positive class	True
`negative_class_label`	str	Label for negative class	True
`score_threshold`	float	Score threshold for a prediction to be considered positive class	False, if not specified 0.5 is used as the default
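
The way the score, the threshold, and the two class labels fit together can be sketched in plain Python. This is an illustration only and not the SDK's implementation: the function name predicted_label is hypothetical, and treating a score equal to the threshold as positive is an assumption.

```python
def predicted_label(score, positive_class_label, negative_class_label,
                    score_threshold=0.5):
    """Map a prediction score to a class label (illustrative sketch)."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("prediction score must be between 0.0 and 1.0")
    # Assumption: a score equal to the threshold counts as positive.
    if score >= score_threshold:
        return positive_class_label
    return negative_class_label


labels = [
    predicted_label(s, "complete", "non_complete", score_threshold=0.75)
    for s in (0.9, 0.75, 0.4)
]
print(labels)  # ['complete', 'complete', 'non_complete']
```

When score_threshold is omitted, the sketch falls back to 0.5, matching the default described in the table.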

Regression

The model prediction output specification for the Regression type is RegressionOutput, which has the following fields:

Field	Type	Description	Mandatory
`prediction_score_column`	str	Name of the column containing the model's prediction score	True
`actual_score_column`	str	Name of the column containing the actual score	False, because not all models have ground truth

Ranking

The model prediction output specification for the Ranking type is RankingOutput, which has the following fields:

Field	Type	Description	Mandatory
`rank_score_column`	str	Name of the column containing the ranking score of the prediction	True
`prediction_group_id_column`	str	Name of the column containing the prediction group id	True
`relevance_score_column`	str	Name of the column containing the relevance score of the prediction	True
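
As a rough illustration of how these three columns relate, the sketch below groups rows by their prediction group ID and orders each group by rank score. It uses plain Python rather than the SDK; the column names, the sample rows, and the helper ranked_groups are hypothetical, and ordering by descending score is an assumption.

```python
from collections import defaultdict

# Hypothetical rows: each row is one candidate item in a ranking request.
rows = [
    {"group_id": "g1", "rank_score": 0.2, "relevance": 0.0},
    {"group_id": "g1", "rank_score": 0.9, "relevance": 1.0},
    {"group_id": "g2", "rank_score": 0.5, "relevance": 1.0},
    {"group_id": "g1", "rank_score": 0.6, "relevance": 0.0},
]


def ranked_groups(rows, group_col, score_col):
    """Group rows by group_col and sort each group by score_col."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_col]].append(row)
    # Assumption: a higher rank score means the item is ranked earlier.
    return {
        group: sorted(items, key=lambda r: r[score_col], reverse=True)
        for group, items in groups.items()
    }


ranking = ranked_groups(rows, "group_id", "rank_score")
g1_scores = [r["rank_score"] for r in ranking["g1"]]
print(g1_scores)  # [0.9, 0.6, 0.2]
```

The relevance column plays the role of ground truth here: it is what the ordering within each group would be compared against.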

Define model schema

Using the specification above, users can create the schema for their model. Suppose a user has a binary classification model with 4 features:

featureA that has float type
featureB that has int type
featureC that has string type
featureD that has boolean type

The positive class is complete, the negative class is non_complete, and the threshold for the positive class is 0.75. The actual label is stored in column target and the prediction score in column score. The session ID and row ID are stored in columns session_id and row_id. From this specification, users can define the model schema and pass it when creating a model version. Below is an example code snippet:

import merlin
from merlin.model_schema import ModelSchema
from merlin.observability.inference import InferenceSchema, ValueType, BinaryClassificationOutput

# Describe the model's features, ID columns, and prediction output
model_schema = ModelSchema(spec=InferenceSchema(
    feature_types={
        "featureA": ValueType.FLOAT64,
        "featureB": ValueType.INT64,
        "featureC": ValueType.STRING,
        "featureD": ValueType.BOOLEAN
    },
    session_id_column="session_id",
    row_id_column="row_id",
    model_prediction_output=BinaryClassificationOutput(
        prediction_score_column="score",
        actual_label_column="target",
        positive_class_label="complete",
        negative_class_label="non_complete",
        score_threshold=0.75
    )
))

# Attach the schema when creating a new model version
with merlin.new_model_version(model_schema=model_schema) as v:
    ...

The code snippet above defines a model schema and attaches it to a specific model version, because the schema can differ between versions.
