Skip to content

Instantly share code, notes, and snippets.

@dstreefkerk
Created December 11, 2025 04:56
Show Gist options
  • Select an option

  • Save dstreefkerk/8cbbc9b005975e94ea173487f4c9a76d to your computer and use it in GitHub Desktop.

Select an option

Save dstreefkerk/8cbbc9b005975e94ea173487f4c9a76d to your computer and use it in GitHub Desktop.

KQL Best Practices for LLM Query Development

Core Performance Principle

"Query performance depends directly on the amount of data processed" - Always minimize data processing through strategic filtering and early reduction.

1. Data Reduction First

Time Filtering (Critical)

// ✅ GOOD: Time filter first (leverages indexing)
DeviceProcessEvents
| where TimeGenerated >= ago(1h)
| where ProcessName == "powershell.exe"

// ❌ BAD: Other filters before time
DeviceProcessEvents
| where ProcessName == "powershell.exe"
| where TimeGenerated >= ago(1h)

Filter Ordering & Selectivity

// ✅ GOOD: Most selective filters first
SecurityEvent
| where TimeGenerated >= ago(24h)        // Time filter first
| where EventID == 4624                  // Specific EventID (high selectivity)
| where LogonType == 3                   // Further filtering
| where Account has "admin"              // String search last

// ❌ BAD: Generic filters first
SecurityEvent
| where Account has "admin"              // String search first (slow)
| where TimeGenerated >= ago(24h)

Selectivity Testing Strategy

⚠️ Important: Filter selectivity assumptions can be wrong! Always test different orderings.

Methodology for Determining Selectivity:

// Step 1: Test individual filter selectivity
DeviceProcessEvents | where TimeGenerated >= ago(30d) | where FolderPath has_any ("powershell.exe") | count
DeviceProcessEvents | where TimeGenerated >= ago(30d) | where ProcessCommandLine contains ".Insert(" | count

// Step 2: Compare execution times with different orderings
// Version A: Executable filter first
DeviceProcessEvents
| where TimeGenerated >= ago(30d)
| where FolderPath has_any ("powershell.exe", "pwsh.exe")  // High selectivity: fewer PowerShell processes
| where ProcessCommandLine contains ".Insert("             // Lower selectivity within PS subset

// Version B: Content filter first
DeviceProcessEvents
| where TimeGenerated >= ago(30d)
| where ProcessCommandLine contains ".Insert("             // Low selectivity: scan all command lines
| where FolderPath has_any ("powershell.exe", "pwsh.exe") // Applied to larger result set

Real-World Selectivity Findings:

Obfuscated PowerShell Detection Case Study:

  • Executable filtering first: 1,204ms ✅ (Winner)
  • Command content filtering first: 3,053ms ❌ (29% slower)
  • Lesson: PowerShell executable filtering was more selective than obfuscation patterns

2. String Operations

Use Efficient String Operators

// ✅ GOOD: Use 'has' for word searches
| where CommandLine has "powershell"

// ❌ BAD: Use 'contains' unnecessarily
| where CommandLine contains "powershell"

// ✅ GOOD: Case-sensitive when possible
| where FileName == "cmd.exe"

// ❌ BAD: Case-insensitive when not needed
| where FileName =~ "cmd.exe"

String Search Hierarchy (Fastest to Slowest)

  1. == (exact match)
  2. has / has_cs (word boundary, indexed for 3+ chars)
  3. has_any / in (multiple value matching)
  4. startswith / endswith
  5. contains
  6. matches regex

Advanced String Optimization Techniques

Use has_any() for Multiple File Searches

// ✅ EXCELLENT: Single has_any operation (49% faster)
| where FolderPath has_any ("powershell.exe", "pwsh.exe", "cmd.exe")

// ❌ INEFFICIENT: Multiple endswith operations
| where FolderPath endswith @"\powershell.exe"
    or FolderPath endswith @"\pwsh.exe"
    or FolderPath endswith @"\cmd.exe"

Eliminate Contains Operations Completely

// 🏆 BEST: Zero contains operations (Sentinel UI compliant)
| where ProcessCommandLine has_cs "UILevel"        // Fast indexed search
  and ProcessCommandLine has "Insert"              // Term match without punctuation
  and ProcessCommandLine has_cs "-ComObject"       // Fast parameter search
  and ProcessCommandLine has "InstallProduct"      // Term match for method name

// ✅ GOOD: Mix has_cs and contains strategically
| where ProcessCommandLine has_cs "UILevel"        // Fast indexed search
  and ProcessCommandLine contains ".Insert("       // Necessary for punctuation
  and ProcessCommandLine has_cs "-ComObject"       // Fast parameter search

// ❌ SLOWER: Use contains for everything
| where ProcessCommandLine contains "UILevel"
  and ProcessCommandLine contains ".Insert("
  and ProcessCommandLine contains "-ComObject"

Contains Elimination Strategy

Key Insight: Often you can match the core term without punctuation:

  • contains ".Insert("has "Insert"
  • contains "InstallProduct("has "InstallProduct"
  • contains "function("has "function"

When Contains is Still Necessary:

  • Complex patterns: contains "cmd /c"
  • Substring matching: contains "ttp"
  • Special character sequences: contains "://"

Case Sensitivity Performance Strategy

// ✅ FASTER: Case-sensitive when content is predictable
| where ProcessCommandLine has_cs "InstallProduct"  // Method names are consistent
| where ProcessCommandLine has_cs "-ComObject"      // Parameters are case-specific

// ✅ ROBUST: Case-insensitive for user content
| where ProcessCommandLine has_any ("-comobject", "-ComObject")  // Handle variations

Real-World Performance Example

Obfuscated PowerShell MSI Detection Optimization:

// 🏆 FINAL OPTIMIZED: 1,768ms execution (25% improvement, zero contains)
DeviceProcessEvents
| where TimeGenerated >= ago(30d)
| where FolderPath has_any ("powershell_ise.exe", "powershell.exe", "pwsh.exe")
    or ProcessVersionInfoOriginalFileName in~ ("PowerShell_ISE.EXE", "PowerShell.EXE", "pwsh.dll")
| where ProcessCommandLine has_cs "UILevel"
  and ProcessCommandLine has "Insert"
  and ProcessCommandLine has_cs "-ComObject"
  and ProcessCommandLine has "InstallProduct"

// ✅ INTERMEDIATE: 1,204ms execution (49% improvement, some contains)
DeviceProcessEvents
| where TimeGenerated >= ago(30d)
| where FolderPath has_any ("powershell_ise.exe", "powershell.exe", "pwsh.exe")
    or ProcessVersionInfoOriginalFileName in~ ("PowerShell_ISE.EXE", "PowerShell.EXE", "pwsh.dll")
| where ProcessCommandLine has_cs "UILevel"
  and ProcessCommandLine contains ".Insert("
  and ProcessCommandLine has_cs "-ComObject"
  and ProcessCommandLine contains "InstallProduct("

// ❌ BASELINE: 2,370ms execution (original approach)
DeviceProcessEvents
| where TimeGenerated >= ago(30d)
| where (
    FolderPath endswith @"\powershell_ise.exe" or
    FolderPath endswith @"\powershell.exe" or
    FolderPath endswith @"\pwsh.exe" or
    ProcessVersionInfoOriginalFileName in~ ("PowerShell_ISE.EXE", "PowerShell.EXE", "pwsh.dll")
  )
| where ProcessCommandLine contains "-ComObject"
  and ProcessCommandLine contains "InstallProduct("
  and ProcessCommandLine contains ".Insert("
  and ProcessCommandLine contains "UILevel"

Optimization Evolution Summary

Version Execution Time Key Insight
Baseline 2,370ms Multiple endswith + all contains
Intermediate 1,204ms has_any() + strategic has_cs
🏆 Final 1,768ms Zero contains operations

Key Learning: Sentinel UI warnings guide you to the most scalable solution!

3. Column Selection

Project Early and Specifically

// ✅ GOOD: Select only needed columns
DeviceProcessEvents
| where TimeGenerated >= ago(1h)
| where InitiatingProcessFileName == "explorer.exe"
| project TimeGenerated, DeviceName, FileName, CommandLine

// ❌ BAD: Select all columns
DeviceProcessEvents
| where TimeGenerated >= ago(1h)
| where InitiatingProcessFileName == "explorer.exe"
// No project statement = all columns returned

4. Join Operations

Table Ordering

// ✅ GOOD: Smaller table on left
SmallTable
| join kind=inner LargeTable on CommonKey

// ❌ BAD: Larger table on left
LargeTable
| join kind=inner SmallTable on CommonKey

Use 'in' for Simple Filtering

// ✅ GOOD: Use 'in' for single column filtering
let SuspiciousIPs = SecurityEvent | where EventID == 4625 | distinct IpAddress;
NetworkConnectionEvents
| where RemoteIP in (SuspiciousIPs)

// ❌ BAD: Use join for simple filtering
let SuspiciousIPs = SecurityEvent | where EventID == 4625 | project IpAddress;
NetworkConnectionEvents
| join kind=inner SuspiciousIPs on $left.RemoteIP == $right.IpAddress

5. Aggregation Optimization

Reduce Before Summarizing

// ✅ GOOD: Filter before summarize
DeviceProcessEvents
| where TimeGenerated >= ago(24h)
| where InitiatingProcessFileName != ""
| summarize ProcessCount = count() by DeviceName, InitiatingProcessFileName

// ❌ BAD: Summarize all data then filter
DeviceProcessEvents
| summarize ProcessCount = count() by DeviceName, InitiatingProcessFileName
| where ProcessCount > 10

Use Appropriate Aggregation Functions

// ✅ GOOD: Use dcount for unique counts
| summarize UniqueDevices = dcount(DeviceName)

// ❌ BAD: Use count(distinct())
| summarize UniqueDevices = count(distinct(DeviceName))

6. Detection Rule Specific Best Practices

Time Boundaries for Detection Rules

// ✅ GOOD: Explicit time range matching rule frequency
DeviceProcessEvents
| where TimeGenerated >= ago(5m)  // Matches 5-minute rule frequency
| where ActionType == "ProcessCreated"

Windows Path Handling

// ✅ GOOD: Use raw strings for Windows paths
| where FolderPath =~ @"C:\Windows\System32\cmd.exe"

// ✅ GOOD: Case-insensitive for Windows paths
| where FileName in~ ("cmd.exe", "powershell.exe", "pwsh.exe")

Null Handling

// ✅ GOOD: Explicit null checks
| where isnotempty(CommandLine)
| where ProcessName != ""

// ❌ BAD: Implicit null handling
| where CommandLine != ""  // May not handle nulls properly

7. Common Anti-Patterns to Avoid

Don't Use These Patterns

// ❌ BAD: Wildcard searches without filters
| where * has "suspicious"

// ❌ BAD: Complex regex without early filtering
| where CommandLine matches regex @".*[a-z]{20,}.*"

// ❌ BAD: Multiple table unions without time limits
union SecurityEvent, DeviceProcessEvents, DeviceNetworkEvents

// ❌ BAD: Nested queries without materialization
| where SomeField in (
    SomeOtherTable
    | where ComplexCondition
    | project SomeField
)

8. Advanced Optimizations

Use Materialization for Repeated Subqueries

// ✅ GOOD: Materialize complex subqueries
let SuspiciousProcesses = materialize(
    DeviceProcessEvents
    | where TimeGenerated >= ago(1h)
    | where FileName in~ ("cmd.exe", "powershell.exe")
    | project DeviceName, ProcessId, FileName
);
SuspiciousProcesses
| join kind=inner (
    DeviceNetworkEvents
    | where TimeGenerated >= ago(1h)
) on DeviceName

Performance Hints

// For large aggregations with high cardinality
| summarize count() by HighCardinalityField
| hint.shufflekey = HighCardinalityField

// For small table broadcasts in joins
SmallTable
| join hint.strategy=broadcast (LargeTable) on Key

9. Testing and Validation

Limit Results During Development

// ✅ GOOD: Use limit during testing
DeviceProcessEvents
| where TimeGenerated >= ago(1h)
| where ProcessName has "suspicious"
| limit 100  // Remove in production

Performance Benchmarking Methodology

// Step 1: Baseline measurement
// Run original query multiple times and record execution times

// Step 2: Test optimization variants
// Test each optimization strategy independently

// Step 3: Compare results
// Document execution times and percentage improvements

// Performance Targets:
// Detection rules: < 30 seconds (ideally < 5 seconds)
// Dashboards: < 5 seconds (ideally < 2 seconds)
// Investigation queries: < 60 seconds

Optimization Testing Framework

// Template for performance testing
let TestQuery = (strategy: string) {
    print strcat("Testing strategy: ", strategy);
    // Your query here
    // Measure execution_time_ms from results
};

// Test multiple strategies systematically
TestQuery("baseline");
TestQuery("string_optimization");
TestQuery("selectivity_reorder");
TestQuery("case_sensitivity");

✅ KQL Best Practices Checklist

Before Writing Any Query:

  • Time filter first: Always start with TimeGenerated >= ago(X)
  • Test filter selectivity: Don't assume - verify which filters are most selective
  • Optimize string operations: Use has_any(), has_cs, and strategic contains
  • Specific column selection: Use project to limit returned columns

For Detection Rules:

  • Time range matches frequency: Query period aligns with rule schedule
  • Explicit null handling: Use isnotempty() and != "" appropriately
  • Case-insensitive Windows paths: Use =~ and in~ for file paths
  • Raw strings for paths: Use @"C:\Path\file.exe" format
  • Early exclusions: Filter out known good before complex operations

For Performance:

  • String operator optimization: Use has_any() instead of multiple endswith
  • Case sensitivity strategy: Use has_cs when content is predictable
  • Benchmark different approaches: Test filter ordering and string operators
  • No wildcard searches: Avoid where * has "value"
  • Limited regex usage: Use regex only when necessary
  • Proper join ordering: Smaller table on left side
  • Use 'in' over join: For simple filtering operations
  • Materialize subqueries: For repeated complex expressions

For Large Datasets:

  • Limit during testing: Use limit for development/testing
  • Aggregation hints: Consider hint.shufflekey for high cardinality
  • Broadcast hints: Use for small/large table joins
  • Early data reduction: Filter before summarize operations

Quality Checks:

  • Syntax validation: Query passes validation
  • Performance target: Executes in < 30 seconds for detection rules
  • Result verification: Returns expected data format
  • Null handling: Properly handles missing/empty values

Security Considerations:

  • No sensitive data exposure: Query doesn't reveal credentials/secrets
  • Appropriate time windows: Balance detection vs. performance
  • False positive consideration: Logic accounts for legitimate scenarios
  • Entity extraction: Includes relevant fields for investigation

Key Remember: Start with time filtering, be selective early, and always test performance with realistic data volumes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment