A model schema specifies the input and output of a model: which columns are features, which are predictions, and which are ground truth. A model schema has the following fields:
Field | Type | Description | Mandatory |
---|---|---|---|
`id` | int | Unique identifier for each model schema | False. If the ID is not specified, a new model schema is created; otherwise the model schema with the corresponding ID is updated |
`model_id` | int | ID of the model that the schema belongs to | False. If not specified, the SDK assigns the model that the user has set |
`spec` | InferenceSchema | Detailed specification of the model schema | True |
The detailed specification is defined with the `InferenceSchema` class, which has the following fields:
Field | Type | Description | Mandatory |
---|---|---|---|
`feature_types` | Dict[str, ValueType] | Mapping from feature name to the type of the feature | True |
`model_prediction_output` | PredictionOutput | Prediction specification, which differs between model types, e.g. `BinaryClassificationOutput`, `RegressionOutput`, `RankingOutput` | True |
`session_id_column` | str | Name of the column that uniquely identifies a request | True |
`row_id_column` | str | Name of the column that uniquely identifies a row within a request | True |
`tag_columns` | Optional[List[str]] | List of column names that contain additional information about the prediction; you can treat them as metadata | False |
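To make the role of each column field concrete, here is a minimal sketch in plain Python that checks whether a logged row contains every column the schema refers to. It does not use the Merlin SDK: the helper `missing_columns`, the string type names, and the `country` tag column are hypothetical and only mirror the field names in the table above.

```python
from typing import Dict, List, Optional


def missing_columns(
    row: Dict[str, object],
    feature_types: Dict[str, str],
    session_id_column: str,
    row_id_column: str,
    tag_columns: Optional[List[str]] = None,
) -> List[str]:
    """Return the schema columns that are absent from a logged row."""
    # Feature columns plus the two mandatory ID columns.
    expected = list(feature_types) + [session_id_column, row_id_column]
    # Tag columns are optional, so treat None as an empty list.
    expected += tag_columns or []
    return [name for name in expected if name not in row]


# Hypothetical logged row: it has no "country" column.
row = {"featureA": 0.3, "featureB": 7, "session_id": "s-1", "row_id": "r-1"}

missing = missing_columns(
    row,
    feature_types={"featureA": "float64", "featureB": "int64"},
    session_id_column="session_id",
    row_id_column="row_id",
    tag_columns=["country"],
)
print(missing)  # ['country']
```

Leaving out `tag_columns` makes the same row pass, which matches the table: only the feature columns and the two ID columns are mandatory.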
The `model_prediction_output` field has type `PredictionOutput`. This field specifies the prediction generated by the model, and its form depends on the model type. Currently we support 3 model types in the schema:
- Binary Classification
- Regression
- Ranking
Each model type has its own model prediction output specification.
The model prediction output specification for the Binary Classification type is `BinaryClassificationOutput`, which has the following fields:
Field | Type | Description | Mandatory |
---|---|---|---|
`prediction_score_column` | str | Name of the column containing the prediction score of the model. The prediction score must be between 0.0 and 1.0 | True |
`actual_label_column` | str | Name of the column containing the actual class | False, because not every model has ground truth |
`positive_class_label` | str | Label for the positive class | True |
`negative_class_label` | str | Label for the negative class | True |
`score_threshold` | float | Score threshold for a prediction to be considered the positive class | False. If not specified, 0.5 is used as the default |
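The interaction between the score, the threshold, and the two class labels can be sketched in plain Python. This is an illustration only, not the SDK's implementation: the helper `predicted_label` is hypothetical, and treating a score equal to the threshold as positive is an assumption that the table does not confirm.

```python
def predicted_label(
    score: float,
    positive_class_label: str,
    negative_class_label: str,
    score_threshold: float = 0.5,  # default stated in the table above
) -> str:
    """Map a prediction score to a class label using the threshold."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("prediction score must be between 0.0 and 1.0")
    # Assumption: a score equal to the threshold counts as positive.
    if score >= score_threshold:
        return positive_class_label
    return negative_class_label


# With a stricter threshold, a score of 0.6 is no longer positive.
print(predicted_label(0.6, "complete", "non_complete"))        # complete
print(predicted_label(0.6, "complete", "non_complete", 0.75))  # non_complete
```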
The model prediction output specification for the Regression type is `RegressionOutput`, which has the following fields:
Field | Type | Description | Mandatory |
---|---|---|---|
`prediction_score_column` | str | Name of the column containing the prediction score of the model | True |
`actual_score_column` | str | Name of the column containing the actual score | False, because not every model has ground truth |
The model prediction output specification for the Ranking type is `RankingOutput`, which has the following fields:
Field | Type | Description | Mandatory |
---|---|---|---|
`rank_score_column` | str | Name of the column containing the ranking score of the prediction | True |
`prediction_group_id_column` | str | Name of the column containing the prediction group id | True |
`relevance_score_column` | str | Name of the column containing the relevance score of the prediction | True |
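As an illustration of how the three ranking columns relate, the sketch below groups rows by the prediction group id and orders each group by its ranking score. It is plain Python, not the Merlin SDK: the helper `ranked_groups`, the sample rows, and the column names `group_id`, `rank_score`, and `relevance` are hypothetical, and "higher score ranks first" is an assumption.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical prediction log: two groups with two candidates each.
rows = [
    {"group_id": "q1", "item": "a", "rank_score": 0.2, "relevance": 0},
    {"group_id": "q1", "item": "b", "rank_score": 0.9, "relevance": 1},
    {"group_id": "q2", "item": "c", "rank_score": 0.5, "relevance": 1},
    {"group_id": "q2", "item": "d", "rank_score": 0.7, "relevance": 0},
]


def ranked_groups(
    rows: List[dict],
    rank_score_column: str,
    prediction_group_id_column: str,
) -> Dict[str, List[dict]]:
    """Group rows by prediction group and sort each group by ranking score."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for row in rows:
        groups[row[prediction_group_id_column]].append(row)
    # Assumption: a higher ranking score means the item is ranked earlier.
    return {
        group_id: sorted(members, key=lambda r: r[rank_score_column], reverse=True)
        for group_id, members in groups.items()
    }


ranking = ranked_groups(rows, "rank_score", "group_id")
print([r["item"] for r in ranking["q1"]])  # ['b', 'a']
print([r["item"] for r in ranking["q2"]])  # ['d', 'c']
```

The relevance column plays the role of ground truth here: in group `q2` the model ranks item `d` first even though item `c` is the relevant one.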
Using the specification above, users can create a schema for their model. Suppose a user has a binary classification model with 4 features:
- featureA that has float type
- featureB that has int type
- featureC that has string type
- featureD that has boolean type
The positive class is `complete`, the negative class is `non_complete`, and the threshold for the positive class is 0.75. The actual label is stored in the `target` column, the prediction score in the `score` column, the session ID in the `session_id` column, and the row ID in the `row_id` column. From that specification, users can define the model schema and pass it when creating a model version. Below is an example code snippet:
```python
import merlin
from merlin.model_schema import ModelSchema
from merlin.observability.inference import InferenceSchema, ValueType, BinaryClassificationOutput

# Describe the features, the ID columns, and the prediction output of the model.
model_schema = ModelSchema(spec=InferenceSchema(
    feature_types={
        "featureA": ValueType.FLOAT64,
        "featureB": ValueType.INT64,
        "featureC": ValueType.STRING,
        "featureD": ValueType.BOOLEAN
    },
    session_id_column="session_id",
    row_id_column="row_id",
    model_prediction_output=BinaryClassificationOutput(
        prediction_score_column="score",
        actual_label_column="target",
        positive_class_label="complete",
        negative_class_label="non_complete",
        score_threshold=0.75
    )
))

# Attach the schema when creating the model version.
with merlin.new_model_version(model_schema=model_schema) as v:
    ...
```
The snippet above defines a model schema and attaches it to a specific model version, because the schema can differ from one version to another.