Build Sprint 3: Machine Learning Model - Complete Implementation Guide

🎯 What You're Building

You're creating a Machine Learning interface class that predicts monster rarity based on attributes like Level, Health, Energy, and Sanity. This class will train a model, make predictions, save/load the model, and integrate with your API.

Before you start, make sure you have:

  • ✅ Completed Build Sprint 1 and 2
  • ✅ Your local environment set up
  • ✅ Monster data from earlier sprints

📋 Sprint Overview

This sprint has 4 main deliverables:

Part A: Notebook Model Training & Tuning (Already Complete)

You've already trained and compared 3+ models in a notebook and selected your best model.

Part B: Machine Learning Interface Class (Main Focus)

Build the Machine class in app/machine.py with proper initialization, training, and prediction.

Part C: Model Serialization (Save & Load)

Add save() and open() methods so models persist between sessions.

Part D: API Model Integration (Info Method)

Add an info() method that returns model details for your API.


🚀 Part B: Machine Learning Interface Class

This is the core of your sprint. We'll build the Machine class step by step.


Task B1: Set Up Your File

Location: app/machine.py

Step 1: Add imports at the top of the file

from datetime import datetime
from pandas import DataFrame
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import joblib

What these do:

  • datetime: Tracks when your model was created
  • DataFrame: Works with your monster data tables
  • RandomForestClassifier: The ML algorithm (you chose this in Part A!)
  • LabelEncoder: Converts text labels like "Common" to numbers
  • joblib: Saves and loads trained models

Task B2: Create the Class Structure

Add this below your imports:

class Machine:
    """
    Machine Learning interface for monster rarity prediction.
    Uses Random Forest Classifier to predict monster rarity based on attributes.
    """

What this does:

  • Creates a blueprint called Machine that you'll use to train and make predictions
  • The docstring describes what this class does

Task B3: Build the __init__ Method ⭐ Key Requirement

The __init__ method runs when you create a new Machine object. It trains your model with the monster data.

Add this method inside your class:

    def __init__(self, df: DataFrame):
        """
        Initialize the machine learning model with training data.
        
        Args:
            df (DataFrame): DataFrame containing monster data with features and target
        """
        # Store model metadata
        self.name = "Random Forest Classifier"
        self.timestamp = datetime.now()
        
        # Separate target from features
        target = df["Rarity"]
        features = df[["Level", "Health", "Energy", "Sanity"]]
        
        # Encode the target (convert text to numbers)
        self.label_encoder = LabelEncoder()
        target_encoded = self.label_encoder.fit_transform(target)
        
        # Train the model
        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(features, target_encoded)
        
        # Save feature names for later
        self.feature_names = features.columns.tolist()

Breaking it down line by line:

Lines 1-2: Store metadata

        self.name = "Random Forest Classifier"
        self.timestamp = datetime.now()
  • Saves the model name and creation time
  • You'll use these in Part D for the info() method

Lines 3-4: Separate data

        target = df["Rarity"]
        features = df[["Level", "Health", "Energy", "Sanity"]]
  • target: What you want to predict (the "answer" column)
  • features: The data you use to make predictions (the "input" columns)
  • Important: Double brackets [[...]] when selecting multiple columns!
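
The bracket difference is real: single brackets return a 1-D Series, while double brackets return a 2-D DataFrame, which is what scikit-learn expects for feature input. A standalone illustration with made-up values:

import pandas as pd

df = pd.DataFrame({"Level": [1, 5], "Health": [50, 100], "Rarity": ["Common", "Rare"]})
print(type(df["Rarity"]))             # 1-D Series (fine for the target)
print(type(df[["Level", "Health"]]))  # 2-D DataFrame (what .fit() expects for features)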

Lines 5-6: Encode target

        self.label_encoder = LabelEncoder()
        target_encoded = self.label_encoder.fit_transform(target)
  • ML models need numbers, not text
  • LabelEncoder assigns codes in sorted (alphabetical) order: "Common" → 0, "Epic" → 1, "Rare" → 2
  • We save label_encoder to convert predictions back to text later
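
If the sorted-order assignment is new to you, here's a tiny standalone round trip you can run:

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["Rare", "Common", "Epic", "Common"])
print(codes)                           # [2 0 1 0]
print(encoder.classes_)                # ['Common' 'Epic' 'Rare']
print(encoder.inverse_transform([2]))  # ['Rare']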

Lines 7-8: Train the model

        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(features, target_encoded)
  • Creates a Random Forest with 100 decision trees
  • random_state=42 makes results reproducible
  • .fit() trains the model on your data

Line 9: Save feature names

        self.feature_names = features.columns.tolist()
  • Saves ['Level', 'Health', 'Energy', 'Sanity']
  • Ensures predictions use features in the same order as training

Task B4: Build the __call__ Method ⭐ Key Requirement

The __call__ method makes predictions on new monster data. It must return both the prediction AND the probability.

Add this method inside your class:

    def __call__(self, pred_basis: DataFrame) -> tuple:
        """
        Make predictions on new data.
        
        Args:
            pred_basis (DataFrame): DataFrame containing feature data for prediction
            
        Returns:
            tuple: (prediction, confidence) where prediction is the predicted rarity
                   and confidence is the probability of the prediction
        """
        # Get features in the correct order
        features = pred_basis[self.feature_names]
        
        # Make prediction
        prediction_encoded = self.model.predict(features)
        prediction = self.label_encoder.inverse_transform(prediction_encoded)[0]
        
        # Get prediction probability
        probabilities = self.model.predict_proba(features)
        max_prob_index = prediction_encoded[0]
        confidence = probabilities[0][max_prob_index]
        
        return prediction, confidence

Breaking it down:

Line 1: Extract features

        features = pred_basis[self.feature_names]
  • Selects only the columns needed: Level, Health, Energy, Sanity
  • Puts them in the same order as training

Lines 2-3: Make prediction

        prediction_encoded = self.model.predict(features)
        prediction = self.label_encoder.inverse_transform(prediction_encoded)[0]
  • Step 1: Model predicts an encoded class (e.g., [2])
  • Step 2: Convert that number back to text (e.g., "Rare")
  • [0] extracts the single prediction from the list

Lines 4-6: Calculate confidence

        probabilities = self.model.predict_proba(features)
        max_prob_index = prediction_encoded[0]
        confidence = probabilities[0][max_prob_index]
  • predict_proba() gives one probability per class, with columns in encoded-label order
  • Example: [[0.1, 0.1, 0.8]] means 10% Common, 10% Epic, 80% Rare
  • max_prob_index is the predicted class code (2 = Rare)
  • confidence is the probability at that index (0.8 = 80%)
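
To see that alignment for yourself, here's a minimal sketch (it assumes the trained machine and one-row new_monster DataFrame from the test section below):

probabilities = machine.model.predict_proba(new_monster[machine.feature_names])
print(machine.label_encoder.classes_)  # ['Common' 'Epic' 'Rare'], one column per class
print(probabilities)                   # e.g. [[0.1 0.1 0.8]], columns in that same order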

Line 7: Return both values

        return prediction, confidence
  • ⚠️ Must return BOTH the prediction and probability
  • Example return: ("Rare", 0.8)

🔒 Part C: Model Serialization

These methods let you save and load trained models so you don't have to retrain every time.


Task C1: Build the save() Method ⭐ Key Requirement

Add this method inside your class:

    def save(self, filepath: str) -> None:
        """
        Save the trained model to a file using joblib.
        
        Args:
            filepath (str): Path where to save the model
        """
        model_data = {
            'model': self.model,
            'label_encoder': self.label_encoder,
            'feature_names': self.feature_names,
            'name': self.name,
            'timestamp': self.timestamp
        }
        joblib.dump(model_data, filepath)

What this does:

Lines 1-6: Package everything

        model_data = {
            'model': self.model,
            'label_encoder': self.label_encoder,
            'feature_names': self.feature_names,
            'name': self.name,
            'timestamp': self.timestamp
        }
  • Creates a dictionary with everything needed to recreate the model
  • Why save all this? The model alone isn't enough - you need the encoder and feature names too

Line 7: Save to disk

        joblib.dump(model_data, filepath)
  • Saves the dictionary to a file (usually ends in .pkl)
  • Example: machine.save("monster_model.pkl")

Task C2: Build the open() Method ⭐ Key Requirement

Add this method inside your class:

    @staticmethod
    def open(filepath: str) -> 'Machine':
        """
        Load a saved model from a file using joblib.
        
        Args:
            filepath (str): Path to the saved model file
            
        Returns:
            Machine: Loaded machine learning model instance
        """
        model_data = joblib.load(filepath)
        
        # Create a new instance without calling __init__
        instance = Machine.__new__(Machine)
        
        # Restore all attributes
        instance.model = model_data['model']
        instance.label_encoder = model_data['label_encoder']
        instance.feature_names = model_data['feature_names']
        instance.name = model_data['name']
        instance.timestamp = model_data['timestamp']
        
        return instance

What this does:

Line 1: @staticmethod decorator

    @staticmethod
  • Makes this method callable without creating an object first
  • Call it like: Machine.open("file.pkl")

Line 2: Load the file

        model_data = joblib.load(filepath)
  • Loads the saved dictionary from disk

Line 3: Create empty object

        instance = Machine.__new__(Machine)
  • Creates an empty Machine object
  • Why not use Machine()? That would call __init__ and try to train again!

Lines 4-8: Fill in the attributes

        instance.model = model_data['model']
        instance.label_encoder = model_data['label_encoder']
        instance.feature_names = model_data['feature_names']
        instance.name = model_data['name']
        instance.timestamp = model_data['timestamp']
  • Takes each piece from the saved file and puts it back in the object

Line 9: Return the loaded model

        return instance
  • Returns a fully working Machine that's ready to make predictions

📊 Part D: API Model Integration

This part adds the info() method for your API endpoint.


Task D1: Build the info() Method ⭐ Key Requirement

Add this method inside your class:

    def info(self) -> str:
        """
        Get information about the model.
        
        Returns:
            str: String containing model name and initialization timestamp
        """
        return f"{self.name} initialized at {self.timestamp.strftime('%Y-%m-%d %H:%M:%S')}"

What this does:

  • Returns a formatted string with model info
  • self.name: The model name ("Random Forest Classifier")
  • self.timestamp.strftime(): Formats the timestamp nicely
  • Example output: "Random Forest Classifier initialized at 2024-12-21 14:30:45"

Why this matters:

  • Your API's /model endpoint will call this to show model details
  • Users can see which model is currently deployed and when it was trained
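
For context, here's roughly what that endpoint could look like. This is a hedged sketch assuming a FastAPI app and a model loaded from a saved file; your actual framework and wiring from earlier sprints may differ:

from fastapi import FastAPI
from app.machine import Machine

app = FastAPI()
machine = Machine.open("monster_model.pkl")  # load the saved model once at startup

@app.get("/model")
def model_info():
    # Expose the model name and training timestamp to API consumers
    return {"info": machine.info()}

Run it with uvicorn and a GET request to /model returns the info string as JSON.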

✅ Complete Code

Here's your complete app/machine.py file with all parts:

from datetime import datetime
from pandas import DataFrame
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import joblib

class Machine:
    """
    Machine Learning interface for monster rarity prediction.
    Uses Random Forest Classifier to predict monster rarity based on attributes.
    """
    
    def __init__(self, df: DataFrame):
        """
        Initialize the machine learning model with training data.
        
        Args:
            df (DataFrame): DataFrame containing monster data with features and target
        """
        self.name = "Random Forest Classifier"
        self.timestamp = datetime.now()
        
        # Prepare target and features
        target = df["Rarity"]
        features = df[["Level", "Health", "Energy", "Sanity"]]
        
        # Encode target variable
        self.label_encoder = LabelEncoder()
        target_encoded = self.label_encoder.fit_transform(target)
        
        # Initialize and train the model
        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(features, target_encoded)
        
        # Store feature names for reference
        self.feature_names = features.columns.tolist()
    
    def __call__(self, pred_basis: DataFrame) -> tuple:
        """
        Make predictions on new data.
        
        Args:
            pred_basis (DataFrame): DataFrame containing feature data for prediction
            
        Returns:
            tuple: (prediction, confidence) where prediction is the predicted rarity
                   and confidence is the probability of the prediction
        """
        # Ensure we have the correct features
        features = pred_basis[self.feature_names]
        
        # Make prediction
        prediction_encoded = self.model.predict(features)
        prediction = self.label_encoder.inverse_transform(prediction_encoded)[0]
        
        # Get prediction probability
        probabilities = self.model.predict_proba(features)
        max_prob_index = prediction_encoded[0]
        confidence = probabilities[0][max_prob_index]
        
        return prediction, confidence
    
    def save(self, filepath: str) -> None:
        """
        Save the trained model to a file using joblib.
        
        Args:
            filepath (str): Path where to save the model
        """
        model_data = {
            'model': self.model,
            'label_encoder': self.label_encoder,
            'feature_names': self.feature_names,
            'name': self.name,
            'timestamp': self.timestamp
        }
        joblib.dump(model_data, filepath)
    
    @staticmethod
    def open(filepath: str) -> 'Machine':
        """
        Load a saved model from a file using joblib.
        
        Args:
            filepath (str): Path to the saved model file
            
        Returns:
            Machine: Loaded machine learning model instance
        """
        model_data = joblib.load(filepath)
        
        # Create a new instance
        instance = Machine.__new__(Machine)
        
        # Restore attributes
        instance.model = model_data['model']
        instance.label_encoder = model_data['label_encoder']
        instance.feature_names = model_data['feature_names']
        instance.name = model_data['name']
        instance.timestamp = model_data['timestamp']
        
        return instance
    
    def info(self) -> str:
        """
        Get information about the model.
        
        Returns:
            str: String containing model name and initialization timestamp
        """
        return f"{self.name} initialized at {self.timestamp.strftime('%Y-%m-%d %H:%M:%S')}"

🧪 Testing Your Code

Test 1: Train and Predict

import pandas as pd
from app.machine import Machine

# Create sample data (use your actual data from Sprint 2!)
training_data = pd.DataFrame({
    'Level': [1, 5, 10, 15, 20, 3, 8, 12, 18, 25],
    'Health': [50, 100, 200, 300, 400, 75, 150, 250, 350, 500],
    'Energy': [20, 40, 60, 80, 100, 30, 50, 70, 90, 120],
    'Sanity': [100, 90, 80, 70, 60, 95, 85, 75, 65, 50],
    'Rarity': ['Common', 'Common', 'Rare', 'Rare', 'Epic', 
               'Common', 'Rare', 'Rare', 'Epic', 'Epic']
})

# Train the model
machine = Machine(training_data)
print(machine.info())

# Make a prediction
new_monster = pd.DataFrame({
    'Level': [7],
    'Health': [175],
    'Energy': [55],
    'Sanity': [88]
})

rarity, confidence = machine(new_monster)
print(f"Predicted: {rarity} with {confidence:.1%} confidence")

Expected output (your timestamp and exact confidence may differ):

Random Forest Classifier initialized at 2024-12-21 14:30:45
Predicted: Rare with 75.0% confidence

Test 2: Save and Load

# Save the model
machine.save("monster_model.pkl")
print("βœ“ Model saved")

# Load the model
loaded = Machine.open("monster_model.pkl")
print("βœ“ Model loaded")
print(loaded.info())

# Test it works
rarity, confidence = loaded(new_monster)
print(f"βœ“ Prediction still works: {rarity}")

Expected output:

✓ Model saved
✓ Model loaded
Random Forest Classifier initialized at 2024-12-21 14:30:45
✓ Prediction still works: Rare

πŸ“ Checklist Before Submitting

Go through each requirement:

Part B: Machine Learning Interface Class

  • __init__ initializes the model and stores it as self.model
  • __init__ properly handles target and feature data
  • __call__ takes a DataFrame and returns a tuple: (prediction, probability)
  • Predictions work correctly on new data

Part C: Model Serialization

  • save() saves the model using joblib to the specified filepath
  • open() loads a saved model from the specified filepath
  • Loaded models can still make predictions

Part D: API Model Integration

  • info() returns a string with model name and timestamp
  • The format matches: "Model Name initialized at YYYY-MM-DD HH:MM:SS"
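
A quick way to self-check that last item (a minimal sketch using Python's re module; the pattern below is just one way to express the expected format, and it assumes the machine instance from the test section):

import re

pattern = r"Random Forest Classifier initialized at \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
assert re.fullmatch(pattern, machine.info()), "info() format doesn't match"
print("✓ info() format looks right")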

πŸ› Common Issues & Fixes

Issue: "KeyError: 'Rarity'"

Problem: Your DataFrame doesn't have the expected columns
Fix: Make sure your data has these exact columns: Level, Health, Energy, Sanity, Rarity

Issue: "call doesn't return probability"

Problem: You're only returning the prediction
Fix: Make sure you return a tuple: return prediction, confidence

Issue: "Can't find module 'machine'"

Problem: Import path is wrong
Fix: Make sure machine.py is in the app/ folder and you import with from app.machine import Machine

Issue: Loaded model gives different predictions

Problem: Features are in wrong order
Fix: Make sure you save and load self.feature_names correctly
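
One way to catch this early is a round-trip test that compares predictions before and after saving (a minimal sketch reusing machine and new_monster from the test section):

# Predict with the in-memory model, then with a freshly loaded copy
before = machine(new_monster)
machine.save("roundtrip_check.pkl")
after = Machine.open("roundtrip_check.pkl")(new_monster)
assert before == after, f"Mismatch: {before} vs {after}"
print("✓ Save/load round trip preserves predictions")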


🎥 Loom Video Tips

For your video submission, walk through:

  1. Show your machine.py file and explain each method
  2. Run your test code showing training, prediction, save, and load
  3. Explain why you chose Random Forest (reference your Part A notebook)
  4. Show the model working on the /model endpoint (if integrated)

🚀 Next Steps

After completing this sprint:

  1. Commit your code to your forked repo
  2. Test thoroughly - all four parts must work
  3. Record your Loom video
  4. Submit in your course with repo link and video link

Need help? Post in team-labs-current or open a support ticket!


💡 Key Takeaways

You've learned how to:

  • ✅ Build a production-ready ML interface class
  • ✅ Train and make predictions with scikit-learn
  • ✅ Serialize models for persistence
  • ✅ Integrate ML models with APIs
  • ✅ Handle feature engineering (encoding, feature selection)

This pattern works for ANY ML model - you could swap Random Forest for another algorithm and the structure stays the same!
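
For example, a hypothetical gradient-boosting variant only needs to change the model name and estimator; prediction, serialization, and info() are inherited unchanged:

from sklearn.ensemble import GradientBoostingClassifier

class BoostedMachine(Machine):
    """Hypothetical variant: same interface as Machine, different estimator."""
    def __init__(self, df):
        super().__init__(df)  # sets up the encoder and feature names (and trains a throwaway forest)
        self.name = "Gradient Boosting Classifier"
        self.model = GradientBoostingClassifier(n_estimators=100, random_state=42)
        self.model.fit(df[self.feature_names],
                       self.label_encoder.transform(df["Rarity"]))

(The throwaway training in super().__init__ keeps the sketch short; a cleaner refactor would pass the estimator into Machine as a parameter.)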
