Phase 10 — Chiller Plant Management System (CPMS)

Status: Planned · Owner: CPMS Lead (HVAC + Controls SME + Backend) · Duration: 4 weeks · Gate: G10

1. Overview

Phase 10 builds the Chiller Plant Management System (CPMS) as a standalone module within Atlas + SFMS. It targets the largest single energy consumer in most commercial buildings — the chilled-water plant — and gives operators real-time plant visibility, equipment-level performance analytics, AI-driven optimisation (sequencing, staging, setpoint reset), and Fault Detection & Diagnostics (FDD). CPMS is plant-agnostic (centrifugal, screw, scroll, heat-pump chillers; cooling towers; primary/secondary/tertiary loops) and aligns to ASHRAE Guideline 36 for control sequences and AHRI Standard 550/590 for performance reporting (IPLV / NPLV).

This module is a direct fit for the KTC fixture, where 121 CHH (heat-pump chiller) units, related primary/secondary loops (PH/PC, HHS/HCS), and condenser pumps (RP) are already mapped via BACnet.

2. Objectives

  • O10.1 — Live chiller-plant schematic with equipment-level telemetry, alarms, and control state.
  • O10.2 — Plant performance KPIs in real time: plant kW/RT, per-chiller kW/RT, IPLV / NPLV, COP, ΔT, approach, ΔP, flow, condenser/chilled-water reset state.
  • O10.3 — Chiller sequencing & staging optimisation (lead/lag/standby; rotation; demand-based stage up/down).
  • O10.4 — Setpoint reset advisor: chilled-water reset, condenser-water reset, differential-pressure reset.
  • O10.5 — Fault Detection & Diagnostics (FDD) for the standard fault library (short cycling, surge, low ΔT syndrome, refrigerant under/over-charge symptoms, condenser fouling, cooling-tower drift, sensor fault).
  • O10.6 — AI optimisation engine producing per-day operational recommendations and explaining the rationale.
  • O10.7 — What-if simulator: change weather / load / setpoints and compare plant kWh and kW/RT.
  • O10.8 — CPMS-specific dashboards, widgets, workflows, and approval flows for plant control actions (with safety gates).

3. Scope

3.1 In-scope

  • CPMS entity model layered on Atlas (Asset → System → Plant) with CPMS-specific equipment classes: Chiller, CoolingTower, Pump, Valve, HeatExchanger, ExpansionTank, Filter, ChilledWaterLoop, CondenserWaterLoop, Plant.
  • Plant schematic builder (drag-drop on a canvas) + auto-generation from asset relationships.
  • Per-equipment performance curves (factory + measured) and library of typical models.
  • Real-time KPI engine for kW/RT and derived metrics.
  • Sequencing & staging engine (rule-based + AI-augmented).
  • Setpoint reset recommender (ASHRAE Guideline 36 strategies).
  • FDD engine with a launch fault library (≥30 rules) plus tenant-customisable rules.
  • AI optimisation engine (Phase 6 LLM router + classical ML for forecasting).
  • What-if simulator (closed-form regression model + Monte Carlo bands).
  • Plant-control actions are advisory by default; tenants can enable closed-loop control with safety gates (workflow approval + circuit breaker).
  • M&V baseline storage (used by Phase 11 EMS for measurement & verification).
  • CPMS widgets (≥10) registered into Phase 5 dashboard.
  • CPMS workflows registered into Phase 7 studio (e.g., "if plant kW/RT exceeds 0.85 for 30 min then create a P2 work order and email FM lead").
  • Reports: Daily Plant Report, Weekly Performance, Monthly IPLV, Annual Comparison.

3.2 Out-of-scope (this phase)

  • Refrigerant leak detection hardware (sensor-data ingestion only).
  • AHU / VAV / Terminal-unit optimisation (separate "Air-Side Optimisation" module; post-launch).
  • Pumping system head-curve optimisation across multiple buildings (single-plant only at v1).
  • Chiller-plant commissioning tooling (CxA module; post-launch).

4. Dependencies

  • Phase 3 Data Platform (canonical model, telemetry, connectors, tagging).
  • Phase 4 SFMS (work orders, locations).
  • Phase 5 Dashboards (widget registration).
  • Phase 6 AI Second Brain (LLM + forecasting infra).
  • Phase 7 Workflow Studio (control actions, approvals).
  • Brick Schema extensions for chiller-plant taxonomy.

5. Architecture & Design

5.1 Plant model

Plant
 ├─ ChilledWaterLoop
 │   ├─ Chillers[ ]            (CHH-1, CHH-2, …)
 │   ├─ ChilledWaterPumps[ ]   (Primary, Secondary)
 │   ├─ Headers[ ]             (Common supply / return)
 │   └─ Sensors[ ]             (CHWS, CHWR, ΔP, Flow, BTU)
 ├─ CondenserWaterLoop
 │   ├─ Condenser pumps[ ]
 │   ├─ CoolingTowers[ ]       (fans, basin level, makeup, blowdown)
 │   └─ Sensors[ ]             (CWS, CWR, Approach, WBT)
 ├─ Heat-recovery (optional)
 ├─ Thermal storage (optional)
 └─ Plant-level controls
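
The hierarchy above could be represented with a few small data classes. This is only a sketch: the class names, field names, and the `PCHWP-1` tag are illustrative assumptions, not the CPMS schema itself.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Equipment:
    tag: str    # e.g. "CHH-1"
    kind: str   # e.g. "Chiller", "Pump", "CoolingTower"


@dataclass
class Loop:
    name: str                                           # "ChilledWaterLoop" or "CondenserWaterLoop"
    equipment: List[Equipment] = field(default_factory=list)
    sensors: List[str] = field(default_factory=list)    # e.g. ["CHWS", "CHWR"]

    def of_kind(self, kind: str) -> List[Equipment]:
        """Return the equipment items of one class, in insertion order."""
        return [e for e in self.equipment if e.kind == kind]


@dataclass
class Plant:
    name: str
    chilled_water: Loop
    condenser_water: Optional[Loop] = None   # assumption: may be absent (air-cooled plant)
    has_heat_recovery: bool = False          # optional branch in the tree
    has_thermal_storage: bool = False        # optional branch in the tree


# Hypothetical two-chiller plant used only for illustration.
plant = Plant(
    name="demo-plant",
    chilled_water=Loop(
        name="ChilledWaterLoop",
        equipment=[
            Equipment("CHH-1", "Chiller"),
            Equipment("CHH-2", "Chiller"),
            Equipment("PCHWP-1", "Pump"),
        ],
        sensors=["CHWS", "CHWR", "DP", "Flow", "BTU"],
    ),
)
```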

5.2 Performance model

For each equipment item, CPMS computes:

  • Chiller: capacity (RT), input kW, kW/RT, evap ΔT, condenser ΔT, approach, IPLV (rolling), runtime hours, starts.
  • Cooling tower: approach (CWR − WBT), range (CWS − CWR), fan power, water consumption (makeup − blowdown), kWh / RT-h removed.
  • Pump: head, flow, power, η (efficiency), VFD speed.
  • Loop: ΔT, ΔP, flow balance.
  • Plant: aggregate kW/RT including all parasitic loads (chiller + pumps + tower).
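
As a worked illustration of the plant-level figures, the sketch below derives cooling output from chilled-water flow and loop ΔT, then computes kW/RT and COP. The function names are hypothetical, water is taken as 1 kg per litre, and ΔT is taken here as return minus supply so that it is positive for a chilled-water loop.

```python
KW_PER_RT = 3.517   # 1 refrigeration ton is about 3.517 kW of cooling
CP_WATER = 4.186    # kJ/(kg·K); 1 L of water is treated as 1 kg (assumption)


def cooling_output_rt(flow_l_per_s: float, chwr_c: float, chws_c: float) -> float:
    """Cooling output in RT from chilled-water flow and loop ΔT (return minus supply)."""
    q_kw = flow_l_per_s * CP_WATER * (chwr_c - chws_c)
    return q_kw / KW_PER_RT


def kw_per_rt(input_kw: float, output_rt: float) -> float:
    """Electrical input per ton of cooling delivered."""
    if output_rt <= 0:
        raise ValueError("cooling output must be positive")
    return input_kw / output_rt


def cop(input_kw: float, output_rt: float) -> float:
    """Coefficient of performance: cooling output (kW) over electrical input (kW)."""
    if input_kw <= 0:
        raise ValueError("input power must be positive")
    return output_rt * KW_PER_RT / input_kw


# Illustrative numbers only: 100 L/s at 12 °C return / 7 °C supply,
# with chiller + pumps + tower drawing 330 + 45 + 20 kW.
output_rt = cooling_output_rt(100.0, 12.0, 7.0)
plant_kw = 330.0 + 45.0 + 20.0
plant_kw_per_rt = kw_per_rt(plant_kw, output_rt)
plant_cop = cop(plant_kw, output_rt)
```

Note that kW/RT and COP carry the same information: COP equals 3.517 divided by kW/RT.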

5.3 Sequencing & staging engine

Two layers:

  1. Rule-based (deterministic, auditable): standard sequences from ASHRAE Guideline 36 — chiller lead/lag/standby with equal-runtime rotation, demand-based stage up/down with hold timers, surge / hunting prevention.
  2. AI-augmented (Phase 6): given forecast load + weather + tariff + equipment performance, recommend optimal stage selection and setpoints. Recommendations show predicted kWh vs current baseline for transparency.

Outputs from staging engine can be:

  • Advisory (default) — surfaced in the dashboard with one-click "send to BMS via workflow".
  • Automated (opt-in, gated) — workflow with approval and a circuit breaker; reverts to manual on anomaly.
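
The rule-based layer's "demand-based stage up/down with hold timers" can be sketched as a small state machine. The thresholds, the hold time, and the assumption of identical chillers are all illustrative; they are not taken from ASHRAE Guideline 36 or from the CPMS design.

```python
from dataclasses import dataclass


@dataclass
class Stager:
    """Demand-based stage up/down with a hold timer (illustrative thresholds)."""
    n_chillers: int
    capacity_rt: float            # capacity per chiller; identical units assumed
    stage_up_plr: float = 0.90    # hypothetical: stage up above 90 % part-load ratio
    stage_down_plr: float = 0.40  # hypothetical: stage down below 40 %
    hold_s: float = 900.0         # condition must persist this long before acting
    stage: int = 1
    _pending: str = ""            # "up", "down" or "" (no pending change)
    _since: float = 0.0

    def update(self, now_s: float, load_rt: float) -> int:
        """Feed one load sample; return the number of chillers that should run."""
        plr = load_rt / (self.stage * self.capacity_rt)
        if plr > self.stage_up_plr and self.stage < self.n_chillers:
            want = "up"
        elif plr < self.stage_down_plr and self.stage > 1:
            want = "down"
        else:
            want = ""
        if want != self._pending:
            # Condition appeared, changed or cleared: restart the hold timer.
            self._pending, self._since = want, now_s
        elif want and now_s - self._since >= self.hold_s:
            self.stage += 1 if want == "up" else -1
            self._pending = ""
        return self.stage
```

Because the timer restarts whenever the condition clears, a brief load spike does not start a chiller, which is the hunting protection the hold timer is meant to provide.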

5.4 Setpoint reset recommender

Strategies implemented:

  • Chilled-water supply temperature reset (based on outside air temperature, load, or worst-case zone).
  • Condenser-water reset (based on wet-bulb temperature with approach minimum).
  • Differential-pressure reset (based on critical-zone valve position).

Each strategy publishes a target setpoint + confidence + expected savings to the dashboard and (optionally) executes via workflow.
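
Two of the strategies can be illustrated with simple closed-form rules: a linear chilled-water supply reset on outside air temperature, and a condenser-water target of wet-bulb plus a minimum approach. Every breakpoint and limit below is a placeholder assumption, and the function names are hypothetical.

```python
def chws_reset(oat_c: float, oat_lo: float = 15.0, oat_hi: float = 30.0,
               chws_max: float = 10.0, chws_min: float = 6.5) -> float:
    """Linear reset: warmer supply water in mild weather, colder at design conditions."""
    if oat_c <= oat_lo:
        return chws_max
    if oat_c >= oat_hi:
        return chws_min
    frac = (oat_c - oat_lo) / (oat_hi - oat_lo)
    return chws_max - frac * (chws_max - chws_min)


def condenser_water_target(wbt_c: float, approach_min_k: float = 3.0,
                           target_min_c: float = 18.0,
                           target_max_c: float = 32.0) -> float:
    """Target for water leaving the tower: wet-bulb plus minimum approach, clamped."""
    return min(max(wbt_c + approach_min_k, target_min_c), target_max_c)
```

The clamps matter in practice: chillers typically have a minimum condenser-water temperature, so the target should not follow wet-bulb all the way down.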

5.5 FDD engine

Launch fault library:

| Category | Faults |
|---|---|
| Chiller efficiency | kW/RT > threshold (per equipment baseline + age), surge symptoms, fouling indicator (condenser approach drift), low ΔT syndrome, refrigerant abnormality (suction/discharge pressure drift). |
| Cooling tower | Approach drift, fan VFD imbalance, basin level anomaly, drift / overflow, makeup-water runaway. |
| Pumps | Low η, runtime imbalance across redundant pumps, VFD hunting, cavitation indicator (NPSH margin). |
| Loops | Decoupler reverse flow, low ΔT, bypass flow excessive. |
| Sensors | Stuck values, drift, range violation, redundancy mismatch. |
| Operations | Short cycling, simultaneous heating + cooling, after-hours runtime, holiday-schedule violation. |

Each fault produces a Finding record with severity, evidence, recommended action; can auto-create a Work Order via workflow.
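
One way to picture a rule that yields a Finding is a predicate over a window of telemetry samples. The sketch below is an assumption about shape only: the `Rule` and `Finding` fields mirror the sentence above, while the rule ID, sample keys, and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Sample = Dict[str, float]
Predicate = Callable[[List[Sample]], bool]


@dataclass
class Finding:
    rule_id: str
    severity: str
    evidence: str
    recommended_action: str


@dataclass
class Rule:
    rule_id: str
    severity: str
    predicate: Predicate
    recommended_action: str

    def evaluate(self, window: List[Sample]) -> Optional[Finding]:
        """Return a Finding when the predicate holds over a non-empty window."""
        if window and self.predicate(window):
            evidence = f"{len(window)} consecutive samples matched"
            return Finding(self.rule_id, self.severity, evidence, self.recommended_action)
        return None


def all_samples(test: Callable[[Sample], bool]) -> Predicate:
    """Predicate that holds when every sample in the window passes the test."""
    return lambda window: all(test(s) for s in window)


def stuck(key: str) -> Predicate:
    """Predicate that holds when a signal never changes across the window."""
    return lambda window: len(window) > 1 and len({s[key] for s in window}) == 1


# Hypothetical low-ΔT rule: ΔT below 3 K while the plant carries real load.
low_delta_t = Rule(
    rule_id="loop.low_delta_t",
    severity="Sev-2",
    predicate=all_samples(lambda s: s["chwr_c"] - s["chws_c"] < 3.0 and s["load_rt"] > 100.0),
    recommended_action="Check for bypass flow and stuck-open control valves.",
)
```

Keeping predicates as plain callables makes them easy to combine and to unit-test against seeded fault data.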

5.6 AI optimisation engine

Two AI capabilities, both via Phase 6's LLM router + classical ML:

  • Load forecaster (next 24 h, next 7 days) — gradient-boosted regressor over weather + occupancy + historical; updated nightly.
  • Optimisation advisor — LLM agent with read tools (telemetry, weather, tariff, performance curves) + structured-output recommendation; no write tools by default (write requires explicit cpms.write.* permission).

5.7 What-if simulator

  • Closed-form regression model per equipment (validated against measured data).
  • Plant-level rollup with parasitic loads.
  • Inputs: weather (manual or forecast), load (manual or forecast), setpoints (per-loop), equipment selection.
  • Outputs: kWh, kW/RT, peak kW, alarm risk; with bounds.
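
A minimal sketch of the idea, assuming a quadratic part-load model for a single chiller and Gaussian uncertainty on load: the closed-form model gives kW per hour, and repeated sampling gives the bounds. The coefficients, the 10 % load uncertainty, and the function names are invented for illustration; a real model would be fitted to measured data per equipment item.

```python
import random
from statistics import mean
from typing import Dict, Sequence, Tuple


def chiller_kw(load_rt: float, rated_rt: float = 500.0,
               coeffs: Tuple[float, float, float] = (30.0, 0.35, 0.25)) -> float:
    """Quadratic part-load model: kW = c0 + c1*load + c2*load^2/rated (hypothetical fit)."""
    c0, c1, c2 = coeffs
    load = min(max(load_rt, 0.0), rated_rt)   # clamp to the chiller's operating range
    return c0 + c1 * load + c2 * load * load / rated_rt


def simulate_kwh(hourly_load_rt: Sequence[float], load_sigma: float = 0.10,
                 runs: int = 500, seed: int = 42) -> Dict[str, float]:
    """Monte Carlo over load uncertainty; returns mean kWh with 5th/95th percentile bounds."""
    rng = random.Random(seed)   # fixed seed so a scenario is reproducible
    totals = []
    for _ in range(runs):
        total = 0.0
        for load in hourly_load_rt:
            noisy = load * (1.0 + rng.gauss(0.0, load_sigma))
            total += chiller_kw(noisy)   # one-hour steps, so kW equals kWh
        totals.append(total)
    totals.sort()
    return {
        "p05": totals[int(0.05 * (runs - 1))],
        "mean": mean(totals),
        "p95": totals[int(0.95 * (runs - 1))],
    }
```

Comparing two scenarios then amounts to calling `simulate_kwh` with each load profile and checking whether the bands overlap.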

5.8 Safety: closed-loop control

When tenant enables closed-loop control:

  • Every control action runs through a Phase 7 workflow.
  • Circuit breaker — if any safety-critical telemetry deviates (chilled-water temp drifts > X, pressure exceeds limit), system reverts to last known good state and notifies on-call.
  • Two-person rule for first-time activation of any new action type.
  • Audit — every action logged with operator, rationale, before/after, override path.
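
The circuit-breaker behaviour can be sketched as a watchdog that remembers the last setpoints seen under healthy telemetry and falls back to them when a limit is breached. The class shape, the key names, and the choice to treat missing telemetry as a trip are assumptions made for this illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class CircuitBreaker:
    limits: Dict[str, Tuple[float, float]]   # telemetry key -> (low, high) safe band
    last_known_good: Dict[str, float]        # setpoints to restore on a trip
    tripped: bool = False
    audit: List[str] = field(default_factory=list)

    def check(self, telemetry: Dict[str, float],
              active_setpoints: Dict[str, float]) -> Dict[str, float]:
        """Return the setpoints that should be in force after this check."""
        for key, (lo, hi) in self.limits.items():
            value = telemetry.get(key)
            # Assumption: a missing reading is treated as unsafe (fail-safe).
            if value is None or not lo <= value <= hi:
                self.tripped = True
                self.audit.append(f"TRIP {key}={value} outside [{lo}, {hi}]; reverting")
                return dict(self.last_known_good)
        if self.tripped:
            # Stay reverted until an operator resets the breaker.
            return dict(self.last_known_good)
        self.last_known_good = dict(active_setpoints)
        return dict(active_setpoints)

    def reset(self, operator: str) -> None:
        """Manual reset; logged so the override path is auditable."""
        self.tripped = False
        self.audit.append(f"RESET by {operator}")
```

Latching the trip until a manual reset is a deliberate choice in this sketch: it prevents the system from oscillating between automated and reverted states while the underlying problem persists.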

6. Detailed Specifications

6.1 API surface (Phase 10 additions)

# Plant model
GET    /api/v1/cpms/plants
POST   /api/v1/cpms/plants
GET    /api/v1/cpms/plants/:id
PATCH  /api/v1/cpms/plants/:id
GET    /api/v1/cpms/plants/:id/schematic
POST   /api/v1/cpms/plants/:id/schematic

# Performance & KPIs
GET    /api/v1/cpms/plants/:id/kpis?range=...
GET    /api/v1/cpms/equipment/:id/curve
GET    /api/v1/cpms/equipment/:id/baseline
POST   /api/v1/cpms/equipment/:id/baseline           (set baseline period)

# Sequencing & staging
GET    /api/v1/cpms/plants/:id/staging/state
GET    /api/v1/cpms/plants/:id/staging/recommendation
POST   /api/v1/cpms/plants/:id/staging/apply          (workflow-gated)

# Setpoint reset
GET    /api/v1/cpms/plants/:id/reset/recommendations
POST   /api/v1/cpms/plants/:id/reset/apply

# FDD
GET    /api/v1/cpms/findings?status=open&plantId=...
PATCH  /api/v1/cpms/findings/:id                      (acknowledge / close)
POST   /api/v1/cpms/findings/:id/create-work-order

# AI optimisation
POST   /api/v1/cpms/ai/forecast                       (load forecast)
POST   /api/v1/cpms/ai/recommend                      (LLM agent)
GET    /api/v1/cpms/ai/runs/:id

# What-if simulator
POST   /api/v1/cpms/whatif/run
GET    /api/v1/cpms/whatif/runs/:id

6.2 Permissions added

cpms.read cpms.write
cpms.plant.read cpms.plant.update cpms.plant.schematic.update
cpms.staging.recommend cpms.staging.apply
cpms.reset.recommend cpms.reset.apply
cpms.fdd.read cpms.fdd.update
cpms.ai.run
cpms.whatif.run
cpms.control.enable                  (per tenant; gate for any write to plant)
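
A sketch of how the tenant-level gate might combine with the per-action permissions listed above. The function name, the action keys, and the split into user and tenant permission sets are assumptions; only the permission strings come from the list.

```python
from typing import Dict, Set

# Write actions mapped to the user permission each one needs (strings from section 6.2).
WRITE_ACTIONS: Dict[str, str] = {
    "staging.apply": "cpms.staging.apply",
    "reset.apply": "cpms.reset.apply",
}
TENANT_GATE = "cpms.control.enable"


def authorize_write(action: str, user_permissions: Set[str],
                    tenant_permissions: Set[str]) -> bool:
    """Allow a plant write only if the tenant gate is on AND the user holds the action permission."""
    if TENANT_GATE not in tenant_permissions:
        return False                      # tenant has not enabled closed-loop control
    needed = WRITE_ACTIONS.get(action)
    return needed is not None and needed in user_permissions
```

Unknown actions are denied by default, so adding a new write path requires an explicit entry in the map.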

6.3 KPI catalogue (CPMS)

| KPI | Definition | Unit |
|---|---|---|
| Plant kW/RT | (chiller + pumps + tower kW) / plant cooling output | kW/RT |
| Per-chiller kW/RT | input / output | kW/RT |
| IPLV / NPLV | weighted part-load efficiency | kW/RT |
| Plant COP | cooling output / total input | dimensionless |
| Approach (chiller) | LWT − refrigerant evap temp | K |
| Approach (CT) | CWR − wet-bulb | K |
| Range (CT) | CWS − CWR | K |
| Loop ΔT | supply − return | K |
| Specific pump power | kW / (L/s) | kW·s/L |
| % runtime above efficient band | rolling | % |
| FDD open count by severity | rolling | count |
| Predicted savings vs baseline | last 7 / 30 days | kWh / % |
| Carbon intensity of plant | kWh × grid emission factor | kgCO2e |
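
For the weighted part-load efficiency, AHRI Standard 550/590 weights four points at 100, 75, 50, and 25 % load by 0.01, 0.42, 0.45, and 0.12; when efficiency is expressed in kW/RT, the weights apply to the reciprocals. The sketch below shows that arithmetic plus the carbon row. A rolling IPLV computed from measured data, as this catalogue implies, would only approximate the rated figure, and the function names are hypothetical.

```python
def iplv_kw_per_rt(a: float, b: float, c: float, d: float) -> float:
    """IPLV in kW/RT from kW/RT at 100 % (a), 75 % (b), 50 % (c) and 25 % (d) load."""
    if min(a, b, c, d) <= 0:
        raise ValueError("all part-load efficiencies must be positive")
    return 1.0 / (0.01 / a + 0.42 / b + 0.45 / c + 0.12 / d)


def plant_carbon_kg(energy_kwh: float, grid_factor_kg_per_kwh: float) -> float:
    """Carbon for the plant: kWh multiplied by the grid emission factor."""
    return energy_kwh * grid_factor_kg_per_kwh


# Illustrative part-load points only, not data from any real chiller.
example_iplv = iplv_kw_per_rt(0.60, 0.50, 0.45, 0.55)
```

Because 87 % of the weight sits on the 75 % and 50 % points, part-load efficiency dominates the result far more than full-load efficiency does.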

6.4 Dashboard widgets (CPMS)

  1. Plant Schematic Live
  2. Plant kW/RT Live Tile
  3. Chiller Performance Card
  4. Cooling Tower Performance Card
  5. Staging State + Recommendation
  6. Setpoint Reset Recommendation
  7. FDD Open Findings
  8. What-if Simulator (mini)
  9. AI Daily Recommendation Card
  10. Energy + Carbon vs Baseline

6.5 Workflow templates (CPMS)

  • High kW/RT response: plant kW/RT > 0.85 for 30 min → create P2 work order, notify on-call, attach 24-h telemetry chart.
  • Daily optimisation digest: daily 06:00 → run AI recommend → email FM lead.
  • Approval-gated setpoint apply: recommend → human approves → workflow writes setpoint → verify → revert on anomaly.
  • FDD escalation: Sev-1 finding open > 4h → escalate to manager.
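
The trigger in the high kW/RT template ("> 0.85 for 30 min") is a sustained-threshold condition rather than a single-sample check. A minimal sketch of that detection follows; the function name and the sample format are assumptions.

```python
from typing import Iterable, Optional, Tuple


def sustained_breach(samples: Iterable[Tuple[float, float]],
                     threshold: float = 0.85,
                     duration_s: float = 1800.0) -> Optional[float]:
    """Return the first timestamp at which the value has stayed above the
    threshold for duration_s, or None. Samples are (timestamp_s, kw_per_rt)
    pairs in time order."""
    start = None
    for ts, value in samples:
        if value > threshold:
            if start is None:
                start = ts            # breach begins
            if ts - start >= duration_s:
                return ts             # breach has persisted long enough
        else:
            start = None              # any dip below threshold resets the clock
    return None
```

Resetting on every dip keeps a noisy signal hovering around 0.85 from raising work orders; a production rule might instead tolerate short gaps.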

6.6 Brick Schema extension (CPMS)

  • Chiller_Heat_Pump, Centrifugal_Chiller, Screw_Chiller, Scroll_Chiller
  • Open_Cooling_Tower, Closed_Cooling_Tower
  • Primary_CHW_Pump, Secondary_CHW_Pump, Tertiary_CHW_Pump, Condenser_Water_Pump
  • CHW_Supply_Temperature, CHW_Return_Temperature, CHW_Differential_Pressure, CHW_Flow, Chiller_Active_Power, Chiller_Status, Chiller_Capacity_Command, Cooling_Tower_Fan_Speed, Wet_Bulb_Temperature

7. Implementation Tasks

Epic 10.A — CPMS data model

  • 10.A.1 Equipment classes + plant model + schematic JSON schema.
  • 10.A.2 Brick extension seeding.
  • 10.A.3 Performance-curve store; library of vendor curves.
  • 10.A.4 Baseline period storage.

Epic 10.B — Performance & KPI engine

  • 10.B.1 kW/RT and derived KPI service.
  • 10.B.2 Rolling IPLV / NPLV computation.
  • 10.B.3 Approach / range / ΔT / ΔP / pump-η calculators.
  • 10.B.4 KPI persistence + time-series rollups.

Epic 10.C — Sequencing & staging

  • 10.C.1 Rule-based sequencer (ASHRAE GL36-aligned).
  • 10.C.2 AI-augmented advisor (Phase 6 hook).
  • 10.C.3 Advisory + opt-in automated paths.

Epic 10.D — Setpoint reset recommender

  • 10.D.1 Three reset strategies.
  • 10.D.2 Confidence + expected-savings computation.

Epic 10.E — FDD

  • 10.E.1 Rule engine (composable predicates).
  • 10.E.2 Launch fault library (≥30 rules).
  • 10.E.3 Finding lifecycle (open → acknowledge → close → work-order link).

Epic 10.F — AI optimisation

  • 10.F.1 Load forecaster (gradient-boosted regressor; nightly retrain).
  • 10.F.2 Optimisation advisor (LLM agent with read tools).
  • 10.F.3 Structured output + UI surfacing.

Epic 10.G — What-if simulator

  • 10.G.1 Regression models per equipment.
  • 10.G.2 Plant rollup.
  • 10.G.3 UI with sliders and comparison.

Epic 10.H — Control safety

  • 10.H.1 Workflow gates for write actions.
  • 10.H.2 Circuit breaker watchdog.
  • 10.H.3 Two-person rule for new action types.

Epic 10.I — Widgets, workflows, reports

  • 10.I.1 Register CPMS widgets in Phase 5.
  • 10.I.2 Register workflow templates in Phase 7.
  • 10.I.3 CPMS reports + scheduled delivery.

8. Acceptance Criteria

  • AC10.1 — A plant defined from the KTC chiller-plant fixture renders a live schematic with values updating from telemetry within 5 s of source change.
  • AC10.2 — Plant kW/RT card matches manual spreadsheet calculation to within ±2% on a representative day.
  • AC10.3 — Staging engine produces a recommendation under each of three test scenarios (low load, design load, partial load) consistent with ASHRAE GL36.
  • AC10.4 — Setpoint reset recommendation moves CHWS setpoint up at low load with documented expected savings ≥ 3%.
  • AC10.5 — FDD engine raises findings for ≥ 20 of the 30 launch rules against a seeded fault data set; false-positive rate < 10% over a 7-day baseline.
  • AC10.6 — AI day-ahead load forecast MAPE ≤ 12% on the seeded data set.
  • AC10.7 — What-if simulator returns plant kWh estimate within ±5% of measured baseline for a known week.
  • AC10.8 — Closed-loop write to BMS is denied without cpms.control.enable; with permission and the safety workflow, write succeeds and is auditable.
  • AC10.9 — All CPMS widgets render in three view modes (Standard / TV / War Room) without layout breakage.

9. Test Requirements

  • Unit: ≥80% on KPI engine, sequencer, reset, FDD predicates.
  • Integration: against a simulated plant (digital twin script) producing telemetry across operating modes.
  • e2e: KTC-fixture plant — schematic, KPIs, staging recommendation, FDD findings.
  • Safety: kill-switch and circuit-breaker tests for control actions.
  • AI: forecast MAPE on hold-out; FDD precision/recall.
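
The AI test metrics named above are standard and can be computed in a few lines. The sketch below skips zero actuals in MAPE (an assumption about how zero-load hours would be handled) and treats FDD results as sets of raised and seeded fault IDs; both function names are hypothetical.

```python
from typing import Sequence, Set, Tuple


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent, ignoring zero actuals."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score against")
    return 100.0 * sum(abs((a - f) / a) for a, f in pairs) / len(pairs)


def precision_recall(raised: Set[str], seeded: Set[str]) -> Tuple[float, float]:
    """Precision and recall of raised findings against seeded (known) faults."""
    true_positives = len(raised & seeded)
    precision = true_positives / len(raised) if raised else 0.0
    recall = true_positives / len(seeded) if seeded else 0.0
    return precision, recall
```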

10. Documentation Requirements

  • docs/cpms/overview.md
  • docs/cpms/plant_model.md
  • docs/cpms/kpis.md
  • docs/cpms/sequencing.md (ASHRAE GL36 alignment)
  • docs/cpms/fdd_library.md (each launch rule with rationale + test data)
  • docs/cpms/control_safety.md
  • docs/cpms/workflow_templates.md
  • ADR-025: CPMS data model
  • ADR-026: AI vs rule-based control split

11. Sign-off Criteria (Gate G10)

  • All Acceptance Criteria met.
  • Live demo on the KTC chiller-plant fixture (CHH family, primary/secondary loops, condenser pumps).
  • HVAC SME review of FDD library + sequencing engine.
  • CPMS Lead, AI Lead, Engineering Lead, Product Owner sign _gates/Gate_G10_signoff.md.
  • Tagged phase-10-v1.0.

12. Risks & Mitigations

| Risk | L | I | Mitigation |
|---|---|---|---|
| Customer plants don't match Brick taxonomy | 3 | 3 | Mapping templates per vendor; manual override per equipment. |
| Closed-loop control causes a safety event | 1 | 5 | Advisory-only default; workflow + circuit breaker; two-person rule for new actions. |
| FDD false positives erode trust | 3 | 4 | Per-equipment thresholds; learning baselines; user feedback loop. |
| Load-forecast MAPE drift after seasonal change | 3 | 3 | Auto-retrain monthly; alert on MAPE breach. |
| KTC data sparse for some chiller modes | 3 | 2 | Use synthetic augmentation; document confidence per recommendation. |