Model Schema

A model schema specifies the inputs and outputs of a model: which columns hold the features, which hold the predictions, and which hold the ground truth. A model schema has the following fields:

| Field | Type | Description | Mandatory |
|---|---|---|---|
| id | int | Unique identifier of the model schema | No. If not specified, a new model schema is created; otherwise the model schema with that ID is updated |
| model_id | int | ID of the model that the schema belongs to | No. If not specified, the SDK assigns the model that the user has set |
| spec | InferenceSchema | Detailed specification of the model schema | Yes |

The detailed specification is defined with the InferenceSchema class, which has the following fields:

| Field | Type | Description | Mandatory |
|---|---|---|---|
| feature_types | Dict[str, ValueType] | Mapping from each feature name to its type | Yes |
| model_prediction_output | PredictionOutput | Prediction specification, which differs between model types, e.g. BinaryClassificationOutput, RegressionOutput, RankingOutput | Yes |
| session_id_column | str | Name of the column that uniquely identifies a request | Yes |
| row_id_column | str | Name of the column that uniquely identifies a row within a request | Yes |
| tag_columns | Optional[List[str]] | Names of columns that contain additional information about the prediction; treat them as metadata | No |
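As an illustration of how these fields fit together, the sketch below collects every column name that a schema refers to. You could use such a list to check that a prediction log contains all the expected columns. The classes `InferenceSpec` and `BinaryClassificationSpec` and the function `expected_columns` are hypothetical stand-ins that mirror the documented fields. They are not the Merlin SDK classes, and feature types are plain strings here instead of ValueType members.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


# Hypothetical stand-ins mirroring the documented fields; NOT the Merlin SDK classes.
@dataclass
class BinaryClassificationSpec:
    prediction_score_column: str
    positive_class_label: str
    negative_class_label: str
    actual_label_column: Optional[str] = None  # ground truth is optional
    score_threshold: float = 0.5


@dataclass
class InferenceSpec:
    feature_types: Dict[str, str]  # feature name -> type name (simplified)
    model_prediction_output: BinaryClassificationSpec
    session_id_column: str
    row_id_column: str
    tag_columns: Optional[List[str]] = None  # optional metadata columns


def expected_columns(spec: InferenceSpec) -> List[str]:
    """Collect every column name that the schema refers to."""
    output = spec.model_prediction_output
    columns = list(spec.feature_types)  # feature columns
    columns += [spec.session_id_column, spec.row_id_column]
    columns += spec.tag_columns or []  # metadata columns, if any
    columns.append(output.prediction_score_column)
    if output.actual_label_column is not None:
        columns.append(output.actual_label_column)
    return columns


spec = InferenceSpec(
    feature_types={"featureA": "float64", "featureB": "int64"},
    model_prediction_output=BinaryClassificationSpec(
        prediction_score_column="score",
        positive_class_label="complete",
        negative_class_label="non_complete",
    ),
    session_id_column="session_id",
    row_id_column="row_id",
)
print(expected_columns(spec))  # ['featureA', 'featureB', 'session_id', 'row_id', 'score']
```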

The model_prediction_output field, of type PredictionOutput, specifies the prediction that the model generates, which depends on the model type. The schema currently supports three model types:

  • Binary Classification
  • Regression
  • Ranking

Each model type has its own prediction output specification.

Binary Classification

The prediction output specification for the Binary Classification type is BinaryClassificationOutput, which has the following fields:

| Field | Type | Description | Mandatory |
|---|---|---|---|
| prediction_score_column | str | Name of the column containing the model's prediction score. The score must be between 0.0 and 1.0 | Yes |
| actual_label_column | str | Name of the column containing the actual class | No, because not every model has ground truth |
| positive_class_label | str | Label of the positive class | Yes |
| negative_class_label | str | Label of the negative class | Yes |
| score_threshold | float | Score threshold for a prediction to be considered the positive class | No. Defaults to 0.5 if not specified |
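To show how score_threshold, positive_class_label, and negative_class_label relate, here is a minimal sketch that turns a prediction score into a class label. The helper `predicted_label` is hypothetical and is not part of the Merlin SDK. It assumes that a score equal to the threshold counts as the positive class; the table above does not say how ties are handled.

```python
def predicted_label(score: float, positive_class_label: str,
                    negative_class_label: str, score_threshold: float = 0.5) -> str:
    """Map a prediction score to a class label (hypothetical helper)."""
    # The documented constraint: prediction scores lie between 0.0 and 1.0.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"prediction score must be between 0.0 and 1.0, got {score}")
    # Assumption: a score equal to the threshold is treated as the positive class.
    if score >= score_threshold:
        return positive_class_label
    return negative_class_label


print(predicted_label(0.8, "complete", "non_complete", score_threshold=0.75))  # complete
print(predicted_label(0.6, "complete", "non_complete", score_threshold=0.75))  # non_complete
print(predicted_label(0.6, "complete", "non_complete"))  # complete (default threshold 0.5)
```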

Regression

The prediction output specification for the Regression type is RegressionOutput, which has the following fields:

| Field | Type | Description | Mandatory |
|---|---|---|---|
| prediction_score_column | str | Name of the column containing the model's prediction score | Yes |
| actual_score_column | str | Name of the column containing the actual score | No, because not every model has ground truth |

Ranking

The prediction output specification for the Ranking type is RankingOutput, which has the following fields:

| Field | Type | Description | Mandatory |
|---|---|---|---|
| rank_score_column | str | Name of the column containing the ranking score of the prediction | Yes |
| prediction_group_id_column | str | Name of the column containing the prediction group ID | Yes |
| relevance_score_column | str | Name of the column containing the relevance score of the prediction | Yes |
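To make the role of the prediction group concrete, the sketch below groups rows by their prediction group ID and orders each group by rank score. The column names (`group_id`, `rank_score`, `relevance_score`, `item`), the rows, and the function `rank_within_groups` are all hypothetical. The sketch also assumes that a higher rank score means a higher position in the ranking. It illustrates how the three columns relate to each other; it is not a description of what the SDK does with them.

```python
from collections import defaultdict
from typing import Dict, List


def rank_within_groups(rows: List[dict], group_column: str,
                       score_column: str) -> Dict[str, List[dict]]:
    """Group rows by prediction group ID, then sort each group by rank score."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[group_column]].append(row)
    # Assumption: a higher rank score means a higher position in the ranking.
    return {
        group_id: sorted(members, key=lambda r: r[score_column], reverse=True)
        for group_id, members in groups.items()
    }


# Hypothetical prediction log rows.
rows = [
    {"group_id": "g1", "item": "a", "rank_score": 0.2, "relevance_score": 0},
    {"group_id": "g1", "item": "b", "rank_score": 0.9, "relevance_score": 1},
    {"group_id": "g2", "item": "c", "rank_score": 0.5, "relevance_score": 1},
]
ranked = rank_within_groups(rows, "group_id", "rank_score")
print([r["item"] for r in ranked["g1"]])  # ['b', 'a']
```

Rows are only compared within their own group, so the ranking in g1 is independent of the ranking in g2.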

Define model schema

Using the specification above, users can create a schema for their model. Suppose a user has a binary classification model with four features:

  • featureA of type float
  • featureB of type int
  • featureC of type string
  • featureD of type boolean

The positive class label is `complete`, the negative class label is `non_complete`, and the score threshold for the positive class is 0.75. The actual label is stored in column `target`, the prediction score in column `score`, and the session and row identifiers in columns `session_id` and `row_id`. With this specification, users can define the model schema and pass it when creating a model version, as in the following code snippet:

```python
import merlin
from merlin.model_schema import ModelSchema
from merlin.observability.inference import InferenceSchema, ValueType, BinaryClassificationOutput

# Describe the model's features, identifiers, and prediction output.
model_schema = ModelSchema(spec=InferenceSchema(
    feature_types={
        "featureA": ValueType.FLOAT64,
        "featureB": ValueType.INT64,
        "featureC": ValueType.STRING,
        "featureD": ValueType.BOOLEAN,
    },
    session_id_column="session_id",
    row_id_column="row_id",
    model_prediction_output=BinaryClassificationOutput(
        prediction_score_column="score",
        actual_label_column="target",
        positive_class_label="complete",
        negative_class_label="non_complete",
        score_threshold=0.75,
    ),
))

# Assumes the Merlin URL, project, and model have already been set
# (e.g. with merlin.set_url, merlin.set_project, and merlin.set_model).
with merlin.new_model_version(model_schema=model_schema) as v:
    # Log the model artifacts for this version here.
    ...
```

The code above defines a model schema and attaches it to a specific model version. The schema is attached per version because different versions of the same model may have different schemas.
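Because each version carries its own schema, it may help to check that logged rows match the declared feature types. The sketch below is a minimal, self-contained example that does not use the Merlin SDK. `PYTHON_TYPES`, `FEATURE_TYPES`, and `type_errors` are hypothetical. The mapping from the ValueType names in the example (FLOAT64, INT64, STRING, BOOLEAN) to Python types is an assumption; the SDK may validate types differently.

```python
from typing import Dict, List

# Hypothetical mapping from the ValueType names used in the example to Python types.
PYTHON_TYPES = {
    "FLOAT64": float,
    "INT64": int,
    "STRING": str,
    "BOOLEAN": bool,
}

# Feature types from the example schema, as plain strings.
FEATURE_TYPES = {
    "featureA": "FLOAT64",
    "featureB": "INT64",
    "featureC": "STRING",
    "featureD": "BOOLEAN",
}


def type_errors(row: dict, feature_types: Dict[str, str]) -> List[str]:
    """Return a description of every missing or mistyped feature in a row."""
    errors = []
    for name, value_type in feature_types.items():
        if name not in row:
            errors.append(f"missing feature {name}")
            continue
        value = row[name]
        # bool is a subclass of int in Python, so reject it explicitly for INT64.
        is_bool_as_int = value_type == "INT64" and isinstance(value, bool)
        if not isinstance(value, PYTHON_TYPES[value_type]) or is_bool_as_int:
            errors.append(f"{name}: expected {value_type}, got {type(value).__name__}")
    return errors


row = {"featureA": 0.42, "featureB": 3, "featureC": "jakarta", "featureD": True}
print(type_errors(row, FEATURE_TYPES))  # []

bad_row = {"featureA": 0.42, "featureB": True, "featureC": "jakarta"}
print(type_errors(bad_row, FEATURE_TYPES))
# ['featureB: expected INT64, got bool', 'missing feature featureD']
```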