Build Sprint 3: Machine Learning Model - Complete Implementation Guide

🎯 What You're Building

You're creating a Machine Learning interface class that predicts monster rarity based on attributes like Level, Health, Energy, and Sanity. This class will train a model, make predictions, save/load the model, and integrate with your API.

Before you start, make sure you have:

  • ✅ Completed Build Sprint 1 and 2
  • ✅ Your local environment set up
  • ✅ Monster data from earlier sprints

📋 Sprint Overview

This sprint has 4 main deliverables:

Part A: Notebook Model Training & Tuning (Already Complete)

You've already trained and compared 3+ models in a notebook and selected your best model.

Part B: Machine Learning Interface Class (Main Focus)

Build the Machine class in app/machine.py with proper initialization, training, and prediction.

Part C: Model Serialization (Save & Load)

Add save() and open() methods so models persist between sessions.

Part D: API Model Integration (Info Method)

Add an info() method that returns model details for your API.


🚀 Part B: Machine Learning Interface Class

This is the core of your sprint. We'll build the Machine class step by step.


Task B1: Set Up Your File

Location: app/machine.py

Step 1: Add imports at the top of the file

from datetime import datetime
from pandas import DataFrame
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import joblib

What these do:

  • datetime: Tracks when your model was created
  • DataFrame: Works with your monster data tables
  • RandomForestClassifier: The ML algorithm (you chose this in Part A!)
  • LabelEncoder: Converts text labels like "Common" to numbers
  • joblib: Saves and loads trained models

Task B2: Create the Class Structure

Add this below your imports:

class Machine:
    """
    Machine Learning interface for monster rarity prediction.
    Uses Random Forest Classifier to predict monster rarity based on attributes.
    """

What this does:

  • Creates a blueprint called Machine that you'll use to train and make predictions
  • The docstring describes what this class does

Task B3: Build the __init__ Method ⭐ Key Requirement

The __init__ method runs when you create a new Machine object. It trains your model with the monster data.

Add this method inside your class:

    def __init__(self, df: DataFrame):
        """
        Initialize the machine learning model with training data.
        
        Args:
            df (DataFrame): DataFrame containing monster data with features and target
        """
        # Store model metadata
        self.name = "Random Forest Classifier"
        self.timestamp = datetime.now()
        
        # Separate target from features
        target = df["Rarity"]
        features = df[["Level", "Health", "Energy", "Sanity"]]
        
        # Encode the target (convert text to numbers)
        self.label_encoder = LabelEncoder()
        target_encoded = self.label_encoder.fit_transform(target)
        
        # Train the model
        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(features, target_encoded)
        
        # Save feature names for later
        self.feature_names = features.columns.tolist()

Breaking it down line by line:

Lines 1-2: Store metadata

        self.name = "Random Forest Classifier"
        self.timestamp = datetime.now()
  • Saves the model name and creation time
  • You'll use these in Part D for the info() method

Lines 3-4: Separate data

        target = df["Rarity"]
        features = df[["Level", "Health", "Energy", "Sanity"]]
  • target: What you want to predict (the "answer" column)
  • features: The data you use to make predictions (the "input" columns)
  • Important: Double brackets [[...]] when selecting multiple columns!
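
The bracket difference is real: single brackets return a 1-D Series, while double brackets return a 2-D DataFrame, which is what scikit-learn expects for feature input. A standalone illustration with made-up values:

import pandas as pd

df = pd.DataFrame({"Level": [1, 5], "Health": [50, 100], "Rarity": ["Common", "Rare"]})
print(type(df["Rarity"]))             # 1-D Series (fine for the target)
print(type(df[["Level", "Health"]]))  # 2-D DataFrame (what .fit() expects for features)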

Lines 5-6: Encode target

        self.label_encoder = LabelEncoder()
        target_encoded = self.label_encoder.fit_transform(target)
  • ML models need numbers, not text
  • LabelEncoder assigns codes in sorted (alphabetical) order: "Common" → 0, "Epic" → 1, "Rare" → 2
  • We save label_encoder to convert predictions back to text later
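
If the sorted-order assignment is new to you, here's a tiny standalone round trip you can run:

from sklearn.preprocessing import LabelEncoder

encoder = LabelEncoder()
codes = encoder.fit_transform(["Rare", "Common", "Epic", "Common"])
print(codes)                           # [2 0 1 0]
print(encoder.classes_)                # ['Common' 'Epic' 'Rare']
print(encoder.inverse_transform([2]))  # ['Rare']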

Lines 7-8: Train the model

        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(features, target_encoded)
  • Creates a Random Forest with 100 decision trees
  • random_state=42 makes results reproducible
  • .fit() trains the model on your data

Line 9: Save feature names

        self.feature_names = features.columns.tolist()
  • Saves ['Level', 'Health', 'Energy', 'Sanity']
  • Ensures predictions use features in the same order as training

Task B4: Build the __call__ Method ⭐ Key Requirement

The __call__ method makes predictions on new monster data. It must return both the prediction AND the probability.

Add this method inside your class:

    def __call__(self, pred_basis: DataFrame) -> tuple:
        """
        Make predictions on new data.
        
        Args:
            pred_basis (DataFrame): DataFrame containing feature data for prediction
            
        Returns:
            tuple: (prediction, confidence) where prediction is the predicted rarity
                   and confidence is the probability of the prediction
        """
        # Get features in the correct order
        features = pred_basis[self.feature_names]
        
        # Make prediction
        prediction_encoded = self.model.predict(features)
        prediction = self.label_encoder.inverse_transform(prediction_encoded)[0]
        
        # Get prediction probability
        probabilities = self.model.predict_proba(features)
        max_prob_index = prediction_encoded[0]
        confidence = probabilities[0][max_prob_index]
        
        return prediction, confidence

Breaking it down:

Line 1: Extract features

        features = pred_basis[self.feature_names]
  • Selects only the columns needed: Level, Health, Energy, Sanity
  • Puts them in the same order as training

Lines 2-3: Make prediction

        prediction_encoded = self.model.predict(features)
        prediction = self.label_encoder.inverse_transform(prediction_encoded)[0]
  • Step 1: Model predicts an encoded class (e.g., [2])
  • Step 2: Convert that number back to text (e.g., "Rare")
  • [0] extracts the single prediction from the list

Lines 4-6: Calculate confidence

        probabilities = self.model.predict_proba(features)
        max_prob_index = prediction_encoded[0]
        confidence = probabilities[0][max_prob_index]
  • predict_proba() gives one probability per class, with columns in encoded-label order
  • Example: [[0.1, 0.1, 0.8]] means 10% Common, 10% Epic, 80% Rare
  • max_prob_index is the predicted class code (2 = Rare)
  • confidence is the probability at that index (0.8 = 80%)
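
To see that alignment for yourself, here's a minimal sketch (it assumes the trained machine and one-row new_monster DataFrame from the test section below):

probabilities = machine.model.predict_proba(new_monster[machine.feature_names])
print(machine.label_encoder.classes_)  # ['Common' 'Epic' 'Rare'], one column per class
print(probabilities)                   # e.g. [[0.1 0.1 0.8]], columns in that same order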

Line 7: Return both values

        return prediction, confidence
  • ⚠️ Must return BOTH the prediction and probability
  • Example return: ("Rare", 0.8)

🔒 Part C: Model Serialization

These methods let you save and load trained models so you don't have to retrain every time.


Task C1: Build the save() Method ⭐ Key Requirement

Add this method inside your class:

    def save(self, filepath: str) -> None:
        """
        Save the trained model to a file using joblib.
        
        Args:
            filepath (str): Path where to save the model
        """
        model_data = {
            'model': self.model,
            'label_encoder': self.label_encoder,
            'feature_names': self.feature_names,
            'name': self.name,
            'timestamp': self.timestamp
        }
        joblib.dump(model_data, filepath)

What this does:

Lines 1-6: Package everything

        model_data = {
            'model': self.model,
            'label_encoder': self.label_encoder,
            'feature_names': self.feature_names,
            'name': self.name,
            'timestamp': self.timestamp
        }
  • Creates a dictionary with everything needed to recreate the model
  • Why save all this? The model alone isn't enough - you need the encoder and feature names too

Line 7: Save to disk

        joblib.dump(model_data, filepath)
  • Saves the dictionary to a file (usually ends in .pkl)
  • Example: machine.save("monster_model.pkl")

Task C2: Build the open() Method ⭐ Key Requirement

Add this method inside your class:

    @staticmethod
    def open(filepath: str) -> 'Machine':
        """
        Load a saved model from a file using joblib.
        
        Args:
            filepath (str): Path to the saved model file
            
        Returns:
            Machine: Loaded machine learning model instance
        """
        model_data = joblib.load(filepath)
        
        # Create a new instance without calling __init__
        instance = Machine.__new__(Machine)
        
        # Restore all attributes
        instance.model = model_data['model']
        instance.label_encoder = model_data['label_encoder']
        instance.feature_names = model_data['feature_names']
        instance.name = model_data['name']
        instance.timestamp = model_data['timestamp']
        
        return instance

What this does:

Line 1: @staticmethod decorator

    @staticmethod
  • Makes this method callable without creating an object first
  • Call it like: Machine.open("file.pkl")

Line 2: Load the file

        model_data = joblib.load(filepath)
  • Loads the saved dictionary from disk

Line 3: Create empty object

        instance = Machine.__new__(Machine)
  • Creates an empty Machine object
  • Why not use Machine()? That would call __init__ and try to train again!

Lines 4-8: Fill in the attributes

        instance.model = model_data['model']
        instance.label_encoder = model_data['label_encoder']
        instance.feature_names = model_data['feature_names']
        instance.name = model_data['name']
        instance.timestamp = model_data['timestamp']
  • Takes each piece from the saved file and puts it back in the object

Line 9: Return the loaded model

        return instance
  • Returns a fully working Machine that's ready to make predictions

📊 Part D: API Model Integration

This part adds the info() method for your API endpoint.


Task D1: Build the info() Method ⭐ Key Requirement

Add this method inside your class:

    def info(self) -> str:
        """
        Get information about the model.
        
        Returns:
            str: String containing model name and initialization timestamp
        """
        return f"{self.name} initialized at {self.timestamp.strftime('%Y-%m-%d %H:%M:%S')}"

What this does:

  • Returns a formatted string with model info
  • self.name: The model name ("Random Forest Classifier")
  • self.timestamp.strftime(): Formats the timestamp nicely
  • Example output: "Random Forest Classifier initialized at 2024-12-21 14:30:45"

Why this matters:

  • Your API's /model endpoint will call this to show model details
  • Users can see which model is currently deployed and when it was trained
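
For context, here's roughly what that endpoint could look like. This is a hedged sketch assuming a FastAPI app and a model loaded from a saved file; your actual framework and wiring from earlier sprints may differ:

from fastapi import FastAPI
from app.machine import Machine

app = FastAPI()
machine = Machine.open("monster_model.pkl")  # load the saved model once at startup

@app.get("/model")
def model_info():
    # Expose the model name and training timestamp to API consumers
    return {"info": machine.info()}

Run it with uvicorn and a GET request to /model returns the info string as JSON.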

✅ Complete Code

Here's your complete app/machine.py file with all parts:

from datetime import datetime
from pandas import DataFrame
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import LabelEncoder
import joblib

class Machine:
    """
    Machine Learning interface for monster rarity prediction.
    Uses Random Forest Classifier to predict monster rarity based on attributes.
    """
    
    def __init__(self, df: DataFrame):
        """
        Initialize the machine learning model with training data.
        
        Args:
            df (DataFrame): DataFrame containing monster data with features and target
        """
        self.name = "Random Forest Classifier"
        self.timestamp = datetime.now()
        
        # Prepare target and features
        target = df["Rarity"]
        features = df[["Level", "Health", "Energy", "Sanity"]]
        
        # Encode target variable
        self.label_encoder = LabelEncoder()
        target_encoded = self.label_encoder.fit_transform(target)
        
        # Initialize and train the model
        self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        self.model.fit(features, target_encoded)
        
        # Store feature names for reference
        self.feature_names = features.columns.tolist()
    
    def __call__(self, pred_basis: DataFrame) -> tuple:
        """
        Make predictions on new data.
        
        Args:
            pred_basis (DataFrame): DataFrame containing feature data for prediction
            
        Returns:
            tuple: (prediction, confidence) where prediction is the predicted rarity
                   and confidence is the probability of the prediction
        """
        # Ensure we have the correct features
        features = pred_basis[self.feature_names]
        
        # Make prediction
        prediction_encoded = self.model.predict(features)
        prediction = self.label_encoder.inverse_transform(prediction_encoded)[0]
        
        # Get prediction probability
        probabilities = self.model.predict_proba(features)
        max_prob_index = prediction_encoded[0]
        confidence = probabilities[0][max_prob_index]
        
        return prediction, confidence
    
    def save(self, filepath: str) -> None:
        """
        Save the trained model to a file using joblib.
        
        Args:
            filepath (str): Path where to save the model
        """
        model_data = {
            'model': self.model,
            'label_encoder': self.label_encoder,
            'feature_names': self.feature_names,
            'name': self.name,
            'timestamp': self.timestamp
        }
        joblib.dump(model_data, filepath)
    
    @staticmethod
    def open(filepath: str) -> 'Machine':
        """
        Load a saved model from a file using joblib.
        
        Args:
            filepath (str): Path to the saved model file
            
        Returns:
            Machine: Loaded machine learning model instance
        """
        model_data = joblib.load(filepath)
        
        # Create a new instance
        instance = Machine.__new__(Machine)
        
        # Restore attributes
        instance.model = model_data['model']
        instance.label_encoder = model_data['label_encoder']
        instance.feature_names = model_data['feature_names']
        instance.name = model_data['name']
        instance.timestamp = model_data['timestamp']
        
        return instance
    
    def info(self) -> str:
        """
        Get information about the model.
        
        Returns:
            str: String containing model name and initialization timestamp
        """
        return f"{self.name} initialized at {self.timestamp.strftime('%Y-%m-%d %H:%M:%S')}"

🧪 Testing Your Code

Test 1: Train and Predict

import pandas as pd
from app.machine import Machine

# Create sample data (use your actual data from Sprint 2!)
training_data = pd.DataFrame({
    'Level': [1, 5, 10, 15, 20, 3, 8, 12, 18, 25],
    'Health': [50, 100, 200, 300, 400, 75, 150, 250, 350, 500],
    'Energy': [20, 40, 60, 80, 100, 30, 50, 70, 90, 120],
    'Sanity': [100, 90, 80, 70, 60, 95, 85, 75, 65, 50],
    'Rarity': ['Common', 'Common', 'Rare', 'Rare', 'Epic', 
               'Common', 'Rare', 'Rare', 'Epic', 'Epic']
})

# Train the model
machine = Machine(training_data)
print(machine.info())

# Make a prediction
new_monster = pd.DataFrame({
    'Level': [7],
    'Health': [175],
    'Energy': [55],
    'Sanity': [88]
})

rarity, confidence = machine(new_monster)
print(f"Predicted: {rarity} with {confidence:.1%} confidence")

Expected output (your timestamp and exact confidence may differ):

Random Forest Classifier initialized at 2024-12-21 14:30:45
Predicted: Rare with 75.0% confidence

Test 2: Save and Load

# Save the model
machine.save("monster_model.pkl")
print("βœ“ Model saved")

# Load the model
loaded = Machine.open("monster_model.pkl")
print("βœ“ Model loaded")
print(loaded.info())

# Test it works
rarity, confidence = loaded(new_monster)
print(f"βœ“ Prediction still works: {rarity}")

Expected output:

✓ Model saved
✓ Model loaded
Random Forest Classifier initialized at 2024-12-21 14:30:45
✓ Prediction still works: Rare

πŸ“ Checklist Before Submitting

Go through each requirement:

Part B: Machine Learning Interface Class

  • __init__ initializes the model and stores it as self.model
  • __init__ properly handles target and feature data
  • __call__ takes a DataFrame and returns a tuple: (prediction, probability)
  • Predictions work correctly on new data

Part C: Model Serialization

  • save() saves the model using joblib to the specified filepath
  • open() loads a saved model from the specified filepath
  • Loaded models can still make predictions

Part D: API Model Integration

  • info() returns a string with model name and timestamp
  • The format matches: "Model Name initialized at YYYY-MM-DD HH:MM:SS"
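
A quick way to self-check that last item (a minimal sketch using Python's re module; the pattern below is just one way to express the expected format, and it assumes the machine instance from the test section):

import re

pattern = r"Random Forest Classifier initialized at \d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}"
assert re.fullmatch(pattern, machine.info()), "info() format doesn't match"
print("✓ info() format looks right")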

πŸ› Common Issues & Fixes

Issue: "KeyError: 'Rarity'"

Problem: Your DataFrame doesn't have the expected columns
Fix: Make sure your data has these exact columns: Level, Health, Energy, Sanity, Rarity

Issue: "call doesn't return probability"

Problem: You're only returning the prediction
Fix: Make sure you return a tuple: return prediction, confidence

Issue: "Can't find module 'machine'"

Problem: Import path is wrong
Fix: Make sure machine.py is in the app/ folder and you import with from app.machine import Machine

Issue: Loaded model gives different predictions

Problem: Features are in wrong order
Fix: Make sure you save and load self.feature_names correctly
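
One way to catch this early is a round-trip test that compares predictions before and after saving (a minimal sketch reusing machine and new_monster from the test section):

# Predict with the in-memory model, then with a freshly loaded copy
before = machine(new_monster)
machine.save("roundtrip_check.pkl")
after = Machine.open("roundtrip_check.pkl")(new_monster)
assert before == after, f"Mismatch: {before} vs {after}"
print("✓ Save/load round trip preserves predictions")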


🎥 Loom Video Tips

For your video submission, walk through:

  1. Show your machine.py file and explain each method
  2. Run your test code showing training, prediction, save, and load
  3. Explain why you chose Random Forest (reference your Part A notebook)
  4. Show the model working on the /model endpoint (if integrated)

🚀 Next Steps

After completing this sprint:

  1. Commit your code to your forked repo
  2. Test thoroughly - all four parts must work
  3. Record your Loom video
  4. Submit in your course with repo link and video link

Need help? Post in team-labs-current or open a support ticket!


💡 Key Takeaways

You've learned how to:

  • ✅ Build a production-ready ML interface class
  • ✅ Train and make predictions with scikit-learn
  • ✅ Serialize models for persistence
  • ✅ Integrate ML models with APIs
  • ✅ Handle feature engineering (encoding, feature selection)

This pattern works for ANY ML model - you could swap Random Forest for another algorithm and the structure stays the same!
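
For example, a hypothetical gradient-boosting variant only needs to change the model name and estimator; prediction, serialization, and info() are inherited unchanged:

from sklearn.ensemble import GradientBoostingClassifier

class BoostedMachine(Machine):
    """Hypothetical variant: same interface as Machine, different estimator."""
    def __init__(self, df):
        super().__init__(df)  # sets up the encoder and feature names (and trains a throwaway forest)
        self.name = "Gradient Boosting Classifier"
        self.model = GradientBoostingClassifier(n_estimators=100, random_state=42)
        self.model.fit(df[self.feature_names],
                       self.label_encoder.transform(df["Rarity"]))

(The throwaway training in super().__init__ keeps the sketch short; a cleaner refactor would pass the estimator into Machine as a parameter.)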
