You're creating a Machine Learning interface class that predicts monster rarity based on attributes like Level, Health, Energy, and Sanity. This class will train a model, make predictions, save/load the model, and integrate with your API.
Before you start, make sure you have:
- ✅ Completed Build Sprint 1 and 2
- ✅ Your local environment set up
- ✅ Monster data from earlier sprints
This sprint has 4 main deliverables:

1. Part A: Train and compare 3+ models in a notebook and select your best model (you've already done this).
2. Part B: Build the Machine class in `app/machine.py` with proper initialization, training, and prediction.
3. Part C: Add `save()` and `open()` methods so models persist between sessions.
4. Part D: Add an `info()` method that returns model details for your API.
This is the core of your sprint. We'll build the Machine class step by step.
Location: app/machine.py
Step 1: Add imports at the top of the file
from datetime import datetime
from pandas import DataFrame
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import joblib

What these do:
- `datetime`: tracks when your model was created
- `DataFrame`: works with your monster data tables
- `RandomForestClassifier`: the ML algorithm (you chose this in Part A!)
- `LabelEncoder`: converts text labels like "Common" to numbers
- `joblib`: saves and loads trained models
Add this below your imports:
class Machine:
    """
    Machine Learning interface for monster rarity prediction.
    Uses Random Forest Classifier to predict monster rarity based on attributes.
    """

What this does:
- Creates a blueprint called `Machine` that you'll use to train and make predictions
- The docstring describes what this class does
The __init__ method runs when you create a new Machine object. It trains your model with the monster data.
Add this method inside your class:
def __init__(self, df: DataFrame):
    """
    Initialize the machine learning model with training data.

    Args:
        df (DataFrame): DataFrame containing monster data with features and target
    """
    # Store model metadata
    self.name = "Random Forest Classifier"
    self.timestamp = datetime.now()

    # Separate target from features
    target = df["Rarity"]
    features = df[["Level", "Health", "Energy", "Sanity"]]

    # Encode the target (convert text to numbers)
    self.label_encoder = LabelEncoder()
    target_encoded = self.label_encoder.fit_transform(target)

    # Train the model
    self.model = RandomForestClassifier(n_estimators=100, random_state=42)
    self.model.fit(features, target_encoded)

    # Save feature names for later
    self.feature_names = features.columns.tolist()

Breaking it down line by line:
Lines 1-2: Store metadata
self.name = "Random Forest Classifier"
self.timestamp = datetime.now()

- Saves the model name and creation time
- You'll use these in Part D for the `info()` method
Lines 3-4: Separate data
target = df["Rarity"]
features = df[["Level", "Health", "Energy", "Sanity"]]

- `target`: what you want to predict (the "answer" column)
- `features`: the data you use to make predictions (the "input" columns)
- Important: use double brackets `[[...]]` when selecting multiple columns! (See the quick sketch below.)
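Here's a tiny standalone illustration of the single- vs. double-bracket difference (made-up values, not your real monster data):

```python
import pandas as pd

df = pd.DataFrame({"Level": [1, 2], "Health": [50, 100], "Rarity": ["Common", "Rare"]})
series = df["Rarity"]            # single brackets -> one column as a Series
frame = df[["Level", "Health"]]  # double brackets -> multiple columns as a DataFrame
print(type(series).__name__, type(frame).__name__)  # Series DataFrame
```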
Lines 5-6: Encode target
self.label_encoder = LabelEncoder()
target_encoded = self.label_encoder.fit_transform(target)

- ML models need numbers, not text
- `LabelEncoder` assigns codes in alphabetical order: "Common" → 0, "Epic" → 1, "Rare" → 2
- We save `label_encoder` to convert predictions back to text later
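If you want to see the encoder in action before wiring it into the class, here's a minimal round-trip with toy labels:

```python
from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["Common", "Rare", "Epic"])
print(codes)                             # [0 2 1] -- alphabetical: Common=0, Epic=1, Rare=2
print(encoder.inverse_transform(codes))  # ['Common' 'Rare' 'Epic'] -- back to text
```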
Lines 7-8: Train the model
self.model = RandomForestClassifier(n_estimators=100, random_state=42)
self.model.fit(features, target_encoded)

- Creates a Random Forest with 100 decision trees
- `random_state=42` makes results reproducible
- `.fit()` trains the model on your data
Line 9: Save feature names
self.feature_names = features.columns.tolist()

- Saves `['Level', 'Health', 'Energy', 'Sanity']`
- Ensures predictions use features in the same order as training
The __call__ method makes predictions on new monster data. It must return both the prediction AND the probability.
Add this method inside your class:
def __call__(self, pred_basis: DataFrame) -> tuple:
    """
    Make predictions on new data.

    Args:
        pred_basis (DataFrame): DataFrame containing feature data for prediction

    Returns:
        tuple: (prediction, confidence) where prediction is the predicted rarity
               and confidence is the probability of the prediction
    """
    # Get features in the correct order
    features = pred_basis[self.feature_names]

    # Make prediction
    prediction_encoded = self.model.predict(features)
    prediction = self.label_encoder.inverse_transform(prediction_encoded)[0]

    # Get prediction probability
    probabilities = self.model.predict_proba(features)
    max_prob_index = prediction_encoded[0]
    confidence = probabilities[0][max_prob_index]

    return prediction, confidence

Breaking it down:
Line 1: Extract features
features = pred_basis[self.feature_names]

- Selects only the columns needed: Level, Health, Energy, Sanity
- Puts them in the same order as training
Lines 2-3: Make prediction
prediction_encoded = self.model.predict(features)
prediction = self.label_encoder.inverse_transform(prediction_encoded)[0]

- Step 1: the model predicts an encoded number (e.g., `[2]`)
- Step 2: convert that number back to text (e.g., `"Rare"`)
- `[0]` extracts the single prediction from the list
Lines 4-6: Calculate confidence
probabilities = self.model.predict_proba(features)
max_prob_index = prediction_encoded[0]
confidence = probabilities[0][max_prob_index]

- `predict_proba()` gives probabilities for each class, in encoded-class order (Common, Epic, Rare); see the toy example below
- Example: `[[0.1, 0.1, 0.8]]` means 10% Common, 10% Epic, 80% Rare
- `max_prob_index` is the encoded prediction (2 = Rare)
- `confidence` is the probability at that index (0.8 = 80%)
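If `predict_proba` is new to you, this self-contained toy example (made-up numbers, three classes) shows the shape of the output:

```python
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=10, random_state=42)
clf.fit([[1], [2], [10], [11], [20], [21]], [0, 0, 1, 1, 2, 2])
print(clf.classes_)               # [0 1 2] -- the column order of predict_proba
print(clf.predict_proba([[10]]))  # one row per input row; each row sums to 1.0
```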
Line 7: Return both values
return prediction, confidence

- ⚠️ Must return BOTH the prediction and the probability
- Example return: `("Rare", 0.8)`
These methods let you save and load trained models so you don't have to retrain every time.
Add this method inside your class:
def save(self, filepath: str) -> None:
    """
    Save the trained model to a file using joblib.

    Args:
        filepath (str): Path where to save the model
    """
    model_data = {
        'model': self.model,
        'label_encoder': self.label_encoder,
        'feature_names': self.feature_names,
        'name': self.name,
        'timestamp': self.timestamp
    }
    joblib.dump(model_data, filepath)

What this does:
Lines 1-6: Package everything
model_data = {
    'model': self.model,
    'label_encoder': self.label_encoder,
    'feature_names': self.feature_names,
    'name': self.name,
    'timestamp': self.timestamp
}

- Creates a dictionary with everything needed to recreate the model
- Why save all this? The model alone isn't enough - you need the encoder and feature names too
Line 7: Save to disk
joblib.dump(model_data, filepath)

- Saves the dictionary to a file (the filename usually ends in `.pkl`)
- Example: `machine.save("monster_model.pkl")`
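joblib round-trips ordinary Python objects, not just models, so you can sanity-check it on its own (hypothetical file name):

```python
import joblib

data = {"name": "demo", "values": [1, 2, 3]}
joblib.dump(data, "demo.pkl")       # write the dictionary to disk
restored = joblib.load("demo.pkl")  # read it back
assert restored == data             # the round trip preserves the contents
```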
Add this method inside your class:
@staticmethod
def open(filepath: str) -> 'Machine':
    """
    Load a saved model from a file using joblib.

    Args:
        filepath (str): Path to the saved model file

    Returns:
        Machine: Loaded machine learning model instance
    """
    model_data = joblib.load(filepath)

    # Create a new instance without calling __init__
    instance = Machine.__new__(Machine)

    # Restore all attributes
    instance.model = model_data['model']
    instance.label_encoder = model_data['label_encoder']
    instance.feature_names = model_data['feature_names']
    instance.name = model_data['name']
    instance.timestamp = model_data['timestamp']

    return instance

What this does:
Line 1: @staticmethod decorator
@staticmethod

- Makes this method callable without creating an object first
- Call it like: `Machine.open("file.pkl")`
Line 2: Load the file
model_data = joblib.load(filepath)

- Loads the saved dictionary from disk
Line 3: Create empty object
instance = Machine.__new__(Machine)

- Creates an empty Machine object
- Why not use `Machine()`? That would call `__init__` and try to train again! (A tiny demonstration follows below.)
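Here's a minimal standalone demonstration of `__new__` skipping `__init__` (a hypothetical class, just to show the mechanics):

```python
class Greeter:
    def __init__(self):
        print("expensive training step...")  # work we want to skip when loading

g1 = Greeter()                 # prints the message: __init__ runs
g2 = Greeter.__new__(Greeter)  # allocates the object; __init__ never runs
print(type(g2) is Greeter)     # True -- same type, no side effects
```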
Lines 4-8: Fill in the attributes
instance.model = model_data['model']
instance.label_encoder = model_data['label_encoder']
instance.feature_names = model_data['feature_names']
instance.name = model_data['name']
instance.timestamp = model_data['timestamp']

- Takes each piece from the saved file and puts it back in the object
Line 9: Return the loaded model
return instance

- Returns a fully working Machine that's ready to make predictions
This part adds the info() method for your API endpoint.
Add this method inside your class:
def info(self) -> str:
    """
    Get information about the model.

    Returns:
        str: String containing model name and initialization timestamp
    """
    return f"{self.name} initialized at {self.timestamp.strftime('%Y-%m-%d %H:%M:%S')}"

What this does:
- Returns a formatted string with model info
- `self.name`: the model name ("Random Forest Classifier")
- `self.timestamp.strftime()`: formats the timestamp nicely
- Example output: `"Random Forest Classifier initialized at 2024-12-21 14:30:45"`
Why this matters:
- Your API's `/model` endpoint will call this to show model details (a rough sketch of the wiring follows below)
- Users can see which model is currently deployed and when it was trained
Here's your complete app/machine.py file with all parts:
from datetime import datetime
from pandas import DataFrame
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import joblib
class Machine:
    """
    Machine Learning interface for monster rarity prediction.
    Uses Random Forest Classifier to predict monster rarity based on attributes.
    """

    def __init__(self, df: DataFrame):
        """
        Initialize the machine learning model with training data.

        Args:
            df (DataFrame): DataFrame containing monster data with features and target
        """
        self.name = "Random Forest Classifier"
        self.timestamp = datetime.now()
        # Prepare target and features
        target = df["Rarity"]
        features = df[["Level", "Health", "Energy", "Sanity"]]
        # Encode target variable
        self.label_encoder = LabelEncoder()
        target_encoded = self.label_encoder.fit_transform(target)
        # Initialize and train the model
        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(features, target_encoded)
        # Store feature names for reference
        self.feature_names = features.columns.tolist()

    def __call__(self, pred_basis: DataFrame) -> tuple:
        """
        Make predictions on new data.

        Args:
            pred_basis (DataFrame): DataFrame containing feature data for prediction

        Returns:
            tuple: (prediction, confidence) where prediction is the predicted rarity
                   and confidence is the probability of the prediction
        """
        # Ensure we have the correct features
        features = pred_basis[self.feature_names]
        # Make prediction
        prediction_encoded = self.model.predict(features)
        prediction = self.label_encoder.inverse_transform(prediction_encoded)[0]
        # Get prediction probability
        probabilities = self.model.predict_proba(features)
        max_prob_index = prediction_encoded[0]
        confidence = probabilities[0][max_prob_index]
        return prediction, confidence

    def save(self, filepath: str) -> None:
        """
        Save the trained model to a file using joblib.

        Args:
            filepath (str): Path where to save the model
        """
        model_data = {
            'model': self.model,
            'label_encoder': self.label_encoder,
            'feature_names': self.feature_names,
            'name': self.name,
            'timestamp': self.timestamp
        }
        joblib.dump(model_data, filepath)

    @staticmethod
    def open(filepath: str) -> 'Machine':
        """
        Load a saved model from a file using joblib.

        Args:
            filepath (str): Path to the saved model file

        Returns:
            Machine: Loaded machine learning model instance
        """
        model_data = joblib.load(filepath)
        # Create a new instance
        instance = Machine.__new__(Machine)
        # Restore attributes
        instance.model = model_data['model']
        instance.label_encoder = model_data['label_encoder']
        instance.feature_names = model_data['feature_names']
        instance.name = model_data['name']
        instance.timestamp = model_data['timestamp']
        return instance

    def info(self) -> str:
        """
        Get information about the model.

        Returns:
            str: String containing model name and initialization timestamp
        """
        return f"{self.name} initialized at {self.timestamp.strftime('%Y-%m-%d %H:%M:%S')}"

Now test everything end to end:

import pandas as pd
from machine import Machine
# Create sample data (use your actual data from Sprint 2!)
training_data = pd.DataFrame({
'Level': [1, 5, 10, 15, 20, 3, 8, 12, 18, 25],
'Health': [50, 100, 200, 300, 400, 75, 150, 250, 350, 500],
'Energy': [20, 40, 60, 80, 100, 30, 50, 70, 90, 120],
'Sanity': [100, 90, 80, 70, 60, 95, 85, 75, 65, 50],
'Rarity': ['Common', 'Common', 'Rare', 'Rare', 'Epic',
'Common', 'Rare', 'Rare', 'Epic', 'Epic']
})
# Train the model
machine = Machine(training_data)
print(machine.info())
# Make a prediction
new_monster = pd.DataFrame({
'Level': [7],
'Health': [175],
'Energy': [55],
'Sanity': [88]
})
rarity, confidence = machine(new_monster)
print(f"Predicted: {rarity} with {confidence:.1%} confidence")Expected output:
Random Forest Classifier initialized at 2024-12-21 14:30:45
Predicted: Rare with 75.0% confidence
# Save the model
machine.save("monster_model.pkl")
print("β Model saved")
# Load the model
loaded = Machine.open("monster_model.pkl")
print("β Model loaded")
print(loaded.info())
# Test it works
rarity, confidence = loaded(new_monster)
print(f"β Prediction still works: {rarity}")Expected output:
✅ Model saved
✅ Model loaded
Random Forest Classifier initialized at 2024-12-21 14:30:45
✅ Prediction still works: Rare
Go through each requirement:

- `__init__` initializes the model and stores it as `self.model`
- `__init__` properly handles target and feature data
- `__call__` takes a DataFrame and returns a tuple: `(prediction, probability)`
- Predictions work correctly on new data
- `save()` saves the model using joblib to the specified filepath
- `open()` loads a saved model from the specified filepath
- Loaded models can still make predictions
- `info()` returns a string with model name and timestamp
- The format matches: `"Model Name initialized at YYYY-MM-DD HH:MM:SS"`

A quick automated pass over these checks is sketched below.
Problem: Your DataFrame doesn't have the expected columns
Fix: Make sure your data has these exact columns: Level, Health, Energy, Sanity, Rarity
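To catch this early, you could add a small guard before training (a hypothetical helper, not part of the required class):

```python
REQUIRED_COLUMNS = {"Level", "Health", "Energy", "Sanity", "Rarity"}

def check_columns(df):
    """Raise a clear error if the training DataFrame is missing any required column."""
    missing = REQUIRED_COLUMNS - set(df.columns)
    if missing:
        raise ValueError(f"DataFrame is missing columns: {sorted(missing)}")
```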
Problem: You're only returning the prediction
Fix: Make sure you return a tuple: `return prediction, confidence`
Problem: Import path is wrong
Fix: Make sure `machine.py` is in the `app/` folder and you import with `from app.machine import Machine`
Problem: Features are in the wrong order
Fix: Make sure you save and load `self.feature_names` correctly
For your video submission, walk through:
- Show your `machine.py` file and explain each method
- Run your test code showing training, prediction, save, and load
- Explain why you chose Random Forest (reference your Part A notebook)
- Show the model working on the `/model` endpoint (if integrated)
After completing this sprint:
- Commit your code to your forked repo
- Test thoroughly - all four parts must work
- Record your Loom video
- Submit in your course with repo link and video link
Need help? Post in team-labs-current or open a support ticket!
You've learned how to:
- ✅ Build a production-ready ML interface class
- ✅ Train and make predictions with scikit-learn
- ✅ Serialize models for persistence
- ✅ Integrate ML models with APIs
- ✅ Handle feature engineering (encoding, feature selection)
This pattern works for ANY ML model - you could swap Random Forest for another algorithm and the structure stays the same!
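For instance, here's a hedged sketch of such a swap (Gradient Boosting is just an example; any scikit-learn classifier with `predict` and `predict_proba` fits the same interface):

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.preprocessing import LabelEncoder

# Only the constructor (and the name string) would change inside __init__:
model = GradientBoostingClassifier(random_state=42)  # was RandomForestClassifier(...)
encoder = LabelEncoder()
y = encoder.fit_transform(["Common", "Rare", "Common", "Rare"])
model.fit([[1, 50], [10, 200], [2, 60], [12, 220]], y)
print(model.predict_proba([[11, 210]]))  # same API, so __call__ works unchanged
```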