Skip to content

Instantly share code, notes, and snippets.

@mattmanning
Created December 4, 2025 15:53
Show Gist options
  • Select an option

  • Save mattmanning/789566298ca5bcc4b5ede80bd319fed2 to your computer and use it in GitHub Desktop.

Select an option

Save mattmanning/789566298ca5bcc4b5ede80bd319fed2 to your computer and use it in GitHub Desktop.
STIMI (IHG) IOAnalysisDataset Error Investigation - 2025-12-04

STIMI (IHG) - IOAnalysisDataset Error Investigation

Date: 2025-12-04
Pipeline: prod-single-weekly-production
Job: IOAnalysisDataset
Hotel: STIMI (hotel_id: 2213)
Customer: IHG

Error Message

AttributeError: Can only use .dt accessor with datetimelike values

Traceback:

File "hosp_econ_app/components/io_analysis_dataset.py", line 404, in bound_horizon_arr
    df_input['periodsPrior'] = (df_input['arr_dt'] - df_input['as_of_dt']).dt.days

Root Cause Analysis

Immediate Cause

The rt_bucket_fcst output file contains only headers with no data rows (94 bytes):

hotel_id|los|dow|rm_grp_nm|lra_ind|arr_dt|as_of_dt|rt_bucket|fcst_ps_rem_net_rms|fcst_net_adr

When pandas reads an empty CSV with parse_dates=['arr_dt', 'as_of_dt'], those columns become object dtype (not datetime) because there's no data to parse. The subsequent .dt.days accessor then fails.

Underlying Data Issues

1. OTB (On-the-Books) Data Problem

  • Location: s3://ra-hosp-econ-prod-us-east-1/ihg/inputs/opt_on_the_books/hotel_cd=STIMI/
  • Issue: Only 1 day of data (2026-05-14) with empty seg_nm and sub_seg_nm columns
  • File size: 305 bytes
hotel_id|stay_dt|rm_typ_nm|grp_otb_net_rms|trn_otb_net_rms|...|seg_nm|sub_seg_nm
2213|2026-05-14|KVUN|2.0|0|0|0|0|1||
2213|2026-05-14|TOTN|8.0|0|0|0|0|1||

2. Stale Run List

  • Location: s3://ra-hosp-econ-prod-us-east-1/ihg/outputs/current/run_list/net_rms/hotel_cd=STIMI/
  • Last Updated: September 30, 2025 (over 2 months old)
  • File size: 98 bytes

3. Hotel-Specific pbf_algorithm_set

  • Hotel has a custom pbf_algorithm_set.csv with only paramId=106 (LastSeason model)
  • Missing short_history_flag values

Data Flow & Cascade Effect

ra_forecast (has values: sum=12,546, max=85)
    ↓ [LTF values incorrectly set to 1.0 for many rows]
sys_fcst (ALL ZEROS - 371 rows)
    ↓
sys_fcst_adj (ALL ZEROS)
    ↓
seg_lvl_sys_fcst (ALL ZEROS - 2,597 rows)
    ↓
arr_los_fcst (ALL ZEROS - 17,885 rows)
    ↓
rt_bucket_fcst_debug (EMPTY - 0 rows)
    ↓
rt_bucket_fcst (EMPTY - headers only)
    ↓
IOAnalysisDataset FAILS

Key Findings

sys_fcst_debug Analysis

Column Value
ltf_ra_fcst_net_rms sum=13,070, 351 non-zero
otb_net_rms ALL ZEROS
pct_cur 184 NaN, 187 zeros
total_bkg_pct sum=43.43, 337 non-zero
fcst_rem_net_rms ALL ZEROS
fcst_net_rms ALL ZEROS

Manual Calculation Test

Using the formula fcst_net_rms = otb + (1 - total_bkg_pct) * ltf:

  • Expected: sum=12,762.96
  • Actual: 0

The LTF values in sys_fcst_debug show ltf_ra_fcst_net_rms = 1.0 for many rows where ra_forecast shows actual values (0-85), indicating a data transformation issue.

Historical Data Status

The hotel has adequate historical data:

  • ad_clipped: 1,316 data points
  • Date Range: 2022-04-28 to 2025-12-03 (187.9 weeks / ~3.6 years)
  • net_rms: sum=52,161, max=85.32

Recommendations

  1. Investigate OTB Data Feed

    • Why is OTB data missing segment names (seg_nm, sub_seg_nm)?
    • Why only 1 future date has OTB data?
  2. Update Run List

    • Run list is 2+ months stale
    • May need to re-run RollingPickBest for this hotel
  3. Check Hotel Status

    • Is this hotel newly onboarded, being deactivated, or having data feed issues?
    • Review hotel configuration in portfolio settings
  4. Code Improvement (Optional)

    • Add defensive datetime conversion in bound_horizon_arr() similar to bound_horizon_sn():
    # Current (line 404):
    df_input['periodsPrior'] = (df_input['arr_dt'] - df_input['as_of_dt']).dt.days
    
    # More defensive:
    df_input['periodsPrior'] = (
        (df_input['arr_dt'] - df_input['as_of_dt']).astype('timedelta64[s]').dt.days
    )

Files Investigated

File Location Status
rt_bucket_fcst outputs/current/rt_bucket_fcst/hotel_cd=STIMI/ Empty (94 bytes, headers only)
rt_bucket_fcst_debug outputs/current/rt_bucket_fcst_debug/hotel_cd=STIMI/ Empty (0 rows)
rt_bucket_input_debug outputs/current/rt_bucket_input_debug/hotel_cd=STIMI/ Has data (1.6M rows)
arr_los_fcst outputs/current/arr_los_fcst/hotel_cd=STIMI/ All zeros
seg_lvl_sys_fcst outputs/current/seg_lvl_sys_fcst/hotel_cd=STIMI/ All zeros
sys_fcst outputs/current/sys_fcst/hotel_cd=STIMI/ All zeros
ra_forecast outputs/current/ra_forecast/net_rms/hotel_cd=STIMI/ Has values
ad_clipped outputs/current/ad_clipped/net_rms/hotel_cd=STIMI/ Has values
run_list outputs/current/run_list/net_rms/hotel_cd=STIMI/ Stale (Sep 30)
opt_on_the_books inputs/opt_on_the_books/hotel_cd=STIMI/ Missing segments
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment