Phase 10 — Chiller Plant Management System (CPMS)
Status: Planned · Owner: CPMS Lead (HVAC + Controls SME + Backend) · Duration: 4 weeks · Gate: G10
1. Overview
Phase 10 builds the Chiller Plant Management System (CPMS) as a standalone module within Atlas + SFMS. It targets the largest single energy consumer in most commercial buildings — the chilled-water plant — and gives operators real-time plant visibility, equipment-level performance analytics, AI-driven optimisation (sequencing, staging, setpoint reset), and Fault Detection & Diagnostics (FDD). CPMS is plant-agnostic (centrifugal, screw, scroll, heat-pump chillers; cooling towers; primary/secondary/tertiary loops) and aligns to ASHRAE Guideline 36 for control sequences and AHRI Standard 550/590 for performance reporting (IPLV / NPLV).
This module is a direct fit for the KTC fixture, where 121 CHH (heat-pump chiller) units, related primary/secondary loops (PH/PC, HHS/HCS), and condenser pumps (RP) are already mapped via BACnet.
2. Objectives
- O10.1 — Live chiller-plant schematic with equipment-level telemetry, alarms, and control state.
- O10.2 — Plant performance KPIs in real time: plant kW/RT, per-chiller kW/RT, IPLV / NPLV, COP, delta-T, approach, ΔP, flow, condenser/chilled-water reset state.
- O10.3 — Chiller sequencing & staging optimisation (lead/lag/standby; rotation; demand-based stage up/down).
- O10.4 — Setpoint reset advisor: chilled-water reset, condenser-water reset, differential-pressure reset.
- O10.5 — Fault Detection & Diagnostics (FDD) for the standard fault library (short cycling, surge, low ΔT syndrome, refrigerant under/over-charge symptoms, condenser fouling, cooling-tower drift, sensor fault).
- O10.6 — AI optimisation engine producing per-day operational recommendations and explaining the rationale.
- O10.7 — What-if simulator: change weather / load / setpoints and compare plant kWh and kW/RT.
- O10.8 — CPMS-specific dashboards, widgets, workflows, and approval flows for plant control actions (with safety gates).
3. Scope
3.1 In-scope
- CPMS entity model layered on Atlas (Asset → System → Plant) with CPMS-specific equipment classes: Chiller, CoolingTower, Pump, Valve, HeatExchanger, ExpansionTank, Filter, ChilledWaterLoop, CondenserWaterLoop, Plant.
- Plant schematic builder (drag-drop on a canvas) + auto-generation from asset relationships.
- Per-equipment performance curves (factory + measured) and library of typical models.
- Real-time KPI engine for kW/RT and derived metrics.
- Sequencing & staging engine (rule-based + AI-augmented).
- Setpoint reset recommender (ASHRAE Guideline 36 strategies).
- FDD engine with a launch fault library (≥30 rules) plus tenant-customisable rules.
- AI optimisation engine (Phase 6 LLM router + classical ML for forecasting).
- What-if simulator (closed-form regression model + monte-carlo bands).
- Plant-control actions are advisory by default; tenants can enable closed-loop control with safety gates (workflow approval + circuit breaker).
- M&V baseline storage (used by Phase 11 EMS for measurement & verification).
- CPMS widgets (≥10) registered into Phase 5 dashboard.
- CPMS workflows registered into Phase 7 studio (e.g., "if plant kW/RT exceeds 0.85 for 30 min then create a P2 work order and email FM lead").
- Reports: Daily Plant Report, Weekly Performance, Monthly IPLV, Annual Comparison.
3.2 Out-of-scope (this phase)
- Refrigerant leak detection hardware (sensor-data ingestion only).
- AHU / VAV / Terminal-unit optimisation (separate "Air-Side Optimisation" module; post-launch).
- Pumping system head-curve optimisation across multiple buildings (single-plant only at v1).
- Chiller-plant commissioning tooling (CxA module; post-launch).
4. Dependencies
- Phase 3 Data Platform (canonical model, telemetry, connectors, tagging).
- Phase 4 SFMS (work orders, locations).
- Phase 5 Dashboards (widget registration).
- Phase 6 AI Second Brain (LLM + forecasting infra).
- Phase 7 Workflow Studio (control actions, approvals).
- Brick Schema extensions for chiller-plant taxonomy.
5. Architecture & Design
5.1 Plant model
Plant
├─ ChilledWaterLoop
│ ├─ Chillers[ ] (CHH-1, CHH-2, …)
│ ├─ ChilledWaterPumps[ ] (Primary, Secondary)
│ ├─ Headers[ ] (Common supply / return)
│ └─ Sensors[ ] (CHWS, CHWR, ΔP, Flow, BTU)
├─ CondenserWaterLoop
│ ├─ Condenser pumps[ ]
│ ├─ CoolingTowers[ ] (fans, basin level, makeup, blowdown)
│ └─ Sensors[ ] (CWS, CWR, Approach, WBT)
├─ Heat-recovery (optional)
├─ Thermal storage (optional)
└─ Plant-level controls
5.2 Performance model
For each equipment item, CPMS computes:
- Chiller: capacity (RT), input kW, kW/RT, evap ΔT, condenser ΔT, approach, IPLV (rolling), runtime hours, starts.
- Cooling tower: approach (CWS − WBT, i.e. tower leaving-water temperature above wet-bulb), range (CWR − CWS), fan power, water consumption (makeup − blowdown), kWh / RT-h removed.
- Pump: head, flow, power, η (efficiency), VFD speed.
- Loop: ΔT, ΔP, flow balance.
- Plant: aggregate kW/RT including all parasitic loads (chiller + pumps + tower).
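The plant-level figure above can be sketched as follows. This is a minimal illustration, not the KPI engine itself: the `PlantSnapshot` field names are hypothetical, water is assumed to be about 1 kg per litre, and cooling output is derived from chilled-water flow and loop ΔT (return minus supply).

```python
from dataclasses import dataclass
from typing import Optional

KW_PER_RT = 3.517              # 1 refrigeration ton = 3.517 kW thermal
CP_WATER_KJ_PER_KG_K = 4.186   # specific heat of water; assumes ~1 kg per litre


@dataclass
class PlantSnapshot:
    """One instant of plant telemetry (hypothetical field names)."""
    chw_flow_lps: float   # chilled-water flow, L/s
    chws_temp_c: float    # chilled-water supply temperature, deg C
    chwr_temp_c: float    # chilled-water return temperature, deg C
    chiller_kw: float     # sum of chiller input power
    pump_kw: float        # sum of chilled-water and condenser pump power
    tower_kw: float       # sum of cooling-tower fan power


def cooling_output_rt(s: PlantSnapshot) -> float:
    """Cooling output in RT from flow and loop delta-T (return minus supply)."""
    delta_t = s.chwr_temp_c - s.chws_temp_c
    thermal_kw = s.chw_flow_lps * CP_WATER_KJ_PER_KG_K * delta_t
    return thermal_kw / KW_PER_RT


def plant_kw_per_rt(s: PlantSnapshot) -> Optional[float]:
    """Plant kW/RT including parasitic loads; None when there is no cooling load."""
    rt = cooling_output_rt(s)
    if rt <= 0:
        return None  # avoid dividing by zero or reporting a negative load
    return (s.chiller_kw + s.pump_kw + s.tower_kw) / rt


snap = PlantSnapshot(chw_flow_lps=100.0, chws_temp_c=7.0, chwr_temp_c=12.0,
                     chiller_kw=330.0, pump_kw=40.0, tower_kw=20.0)
print(round(plant_kw_per_rt(snap), 3))
```

Returning `None` at zero load keeps idle periods out of rolling averages instead of polluting them with infinities.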
5.3 Sequencing & staging engine
Two layers:
- Rule-based (deterministic, auditable): standard sequences from ASHRAE Guideline 36 — chiller lead/lag/standby with equal-runtime rotation, demand-based stage up/down with hold timers, surge / hunting prevention.
- AI-augmented (Phase 6): given forecast load + weather + tariff + equipment performance, recommend optimal stage selection and setpoints. Recommendations show predicted kWh vs current baseline for transparency.
Outputs from the staging engine can be:
- Advisory (default) — surfaced in the dashboard with one-click "send to BMS via workflow".
- Automated (opt-in, gated) — executed through a workflow with approval and a circuit breaker; reverts to manual on anomaly.
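A minimal sketch of the rule-based layer's demand-based stage up/down with hold timers. The thresholds, the hold time, and the assumption of identical chillers are all illustrative; the actual values would come from the plant's sequence of operation.

```python
from dataclasses import dataclass


@dataclass
class StagingConfig:
    """Hypothetical tuning values, not taken from any standard."""
    stage_up_plr: float = 0.90    # stage up when running chillers exceed this part-load ratio
    stage_down_plr: float = 0.60  # stage down when the remaining chillers would stay below this
    hold_s: float = 900.0         # a condition must persist this long before acting
    max_stage: int = 4


class Stager:
    """Demand-based stage up/down with a hold timer to prevent hunting."""

    def __init__(self, cfg: StagingConfig, stage: int = 1):
        self.cfg = cfg
        self.stage = stage
        self._pending = 0          # +1 stage up, -1 stage down, 0 nothing pending
        self._pending_since = 0.0

    def _wanted(self, load_rt: float, chiller_capacity_rt: float) -> int:
        plr = load_rt / (self.stage * chiller_capacity_rt)
        if plr > self.cfg.stage_up_plr and self.stage < self.cfg.max_stage:
            return 1
        if self.stage > 1:
            plr_after = load_rt / ((self.stage - 1) * chiller_capacity_rt)
            if plr_after < self.cfg.stage_down_plr:
                return -1
        return 0

    def update(self, now_s: float, load_rt: float, chiller_capacity_rt: float) -> int:
        """Feed one load sample; returns the stage (number of running chillers)."""
        wanted = self._wanted(load_rt, chiller_capacity_rt)
        if wanted != self._pending:
            # Condition changed: restart the hold timer.
            self._pending, self._pending_since = wanted, now_s
        elif wanted != 0 and now_s - self._pending_since >= self.cfg.hold_s:
            self.stage += wanted
            self._pending = 0
        return self.stage
```

The gap between the stage-up and stage-down thresholds, together with the hold timer, is what keeps the sequencer from cycling a chiller on and off around a single load value.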
5.4 Setpoint reset recommender
Strategies implemented:
- Chilled-water supply temperature reset (based on outside air temperature, load, or worst-case zone).
- Condenser-water reset (based on wet-bulb temperature with approach minimum).
- Differential-pressure reset (based on critical-zone valve position).
Each strategy publishes a target setpoint + confidence + expected savings to the dashboard and (optionally) executes via workflow.
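Two of the strategies above can be sketched as simple clamped functions. The temperature ranges, the 3 K minimum approach, and the function names are assumptions for illustration only; confidence and expected-savings estimation are out of scope for this sketch.

```python
def clamp(x: float, lo: float, hi: float) -> float:
    """Limit x to the closed interval [lo, hi]."""
    return max(lo, min(hi, x))


def chws_reset_from_oat(oat_c: float,
                        oat_lo: float = 15.0, oat_hi: float = 30.0,
                        chws_max: float = 10.0, chws_min: float = 6.5) -> float:
    """Linear chilled-water supply reset: warmer setpoint at low outside air temperature."""
    frac = clamp((oat_c - oat_lo) / (oat_hi - oat_lo), 0.0, 1.0)
    return chws_max - frac * (chws_max - chws_min)


def cws_reset_from_wet_bulb(wbt_c: float,
                            approach_min_k: float = 3.0,
                            cws_min: float = 18.0, cws_max: float = 32.0) -> float:
    """Condenser-water supply reset: track wet-bulb plus a minimum approach, within limits."""
    return clamp(wbt_c + approach_min_k, cws_min, cws_max)


print(chws_reset_from_oat(22.5))      # midway between the two OAT limits
print(cws_reset_from_wet_bulb(24.0))  # wet-bulb plus minimum approach
```

Clamping both outputs keeps a recommendation inside equipment limits even when the driving input (outside air or wet-bulb temperature) is out of its expected range.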
5.5 FDD engine
Launch fault library:
| Category | Faults |
|---|---|
| Chiller efficiency | kW/RT > threshold (per equipment baseline + age), surge symptoms, fouling indicator (condenser approach drift), low ΔT syndrome, refrigerant abnormality (suction/discharge pressure drift). |
| Cooling tower | Approach drift, fan VFD imbalance, basin level anomaly, drift / overflow, makeup-water runaway. |
| Pumps | Low η, runtime imbalance across redundant pumps, VFD hunting, cavitation indicator (NPSH margin). |
| Loops | Decoupler reverse flow, low ΔT, bypass flow excessive. |
| Sensors | Stuck values, drift, range violation, redundancy mismatch. |
| Operations | Short cycling, simultaneous heating + cooling, after-hours runtime, holiday-schedule violation. |
Each fault produces a Finding record with severity, evidence, and a recommended action, and can auto-create a Work Order via workflow.
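One way to picture a rule that yields a Finding is as a predicate composed from smaller predicates over a window of telemetry samples. The sketch below is an assumption about shape, not the engine's design: the helper names (`every`, `both`, `stuck`), the point names, the thresholds, and the example action text are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Sample = Dict[str, float]                   # one telemetry row: point name -> value
Predicate = Callable[[List[Sample]], bool]  # evaluated over a window of rows


def every(cond: Callable[[Sample], bool]) -> Predicate:
    """True when the window is non-empty and cond holds for every sample."""
    return lambda window: bool(window) and all(cond(s) for s in window)


def both(a: Predicate, b: Predicate) -> Predicate:
    """Logical AND of two predicates."""
    return lambda window: a(window) and b(window)


def stuck(point: str, tol: float = 1e-6) -> Predicate:
    """True when a point has not moved by more than tol across the window."""
    def pred(window: List[Sample]) -> bool:
        values = [s[point] for s in window]
        return len(values) > 1 and max(values) - min(values) <= tol
    return pred


@dataclass
class Finding:
    rule_id: str
    severity: str
    evidence: str
    recommended_action: str


@dataclass
class Rule:
    rule_id: str
    severity: str
    predicate: Predicate
    recommended_action: str

    def evaluate(self, window: List[Sample]) -> Optional[Finding]:
        """Return a Finding when the predicate matches the window, else None."""
        if not self.predicate(window):
            return None
        evidence = f"{len(window)} consecutive samples matched {self.rule_id}"
        return Finding(self.rule_id, self.severity, evidence, self.recommended_action)


# Illustrative rule: low delta-T while the loop is carrying meaningful flow.
low_delta_t = Rule(
    rule_id="loop.low_delta_t",
    severity="Sev-3",
    predicate=both(every(lambda s: s["chwr_c"] - s["chws_c"] < 3.0),
                   every(lambda s: s["flow_lps"] > 20.0)),
    recommended_action="Inspect bypass flow and coil control valves.",
)
```

Keeping predicates as plain functions lets tenant-customisable rules reuse the same building blocks with different thresholds.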
5.6 AI optimisation engine
Two AI capabilities, both via Phase 6's LLM router + classical ML:
- Load forecaster (next 24 h, next 7 days) — gradient-boosted regressor over weather + occupancy + historical; updated nightly.
- Optimisation advisor — LLM agent with read tools (telemetry, weather, tariff, performance curves) + structured-output recommendation; no write tools by default (write requires explicit cpms.write.* permission).
5.7 What-if simulator
- Closed-form regression model per equipment (validated against measured data).
- Plant-level rollup with parasitic loads.
- Inputs: weather (manual or forecast), load (manual or forecast), setpoints (per-loop), equipment selection.
- Outputs: kWh, kW/RT, peak kW, alarm risk; each reported with uncertainty bounds.
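A rough sketch of how a closed-form equipment model and Monte Carlo bounds could fit together. The quadratic coefficients are made up for illustration (a real model would be fitted to measured data), load uncertainty is modelled as simple multiplicative Gaussian noise, and hourly steps are assumed so that kW and kWh are numerically equal per step.

```python
import random
import statistics
from typing import Dict, List


def chiller_kw_per_rt(plr: float, a: float = 0.90, b: float = -0.85, c: float = 0.55) -> float:
    """Hypothetical quadratic fit of kW/RT against part-load ratio (0..1)."""
    return a + b * plr + c * plr * plr


def simulate_kwh(load_rt_by_hour: List[float], capacity_rt: float, parasitic_kw: float,
                 load_sigma: float = 0.05, runs: int = 500, seed: int = 0) -> Dict[str, float]:
    """Monte Carlo over load uncertainty; returns p05 / p50 / p95 of total kWh."""
    rng = random.Random(seed)  # fixed seed so a what-if run is reproducible
    totals = []
    for _ in range(runs):
        kwh = 0.0
        for load in load_rt_by_hour:
            noisy = max(0.0, load * rng.gauss(1.0, load_sigma))
            served = min(noisy, capacity_rt)  # the plant cannot serve more than its capacity
            plr = served / capacity_rt
            kwh += served * chiller_kw_per_rt(plr) + parasitic_kw
        totals.append(kwh)
    totals.sort()
    return {
        "p05": totals[runs * 5 // 100],
        "p50": statistics.median(totals),
        "p95": totals[runs * 95 // 100 - 1],
    }


bands = simulate_kwh([300.0] * 24, capacity_rt=500.0, parasitic_kw=50.0)
print(bands)
```

Comparing two scenarios then amounts to running the function twice with different inputs and checking whether the resulting bands overlap.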
5.8 Safety: closed-loop control
When tenant enables closed-loop control:
- Every control action runs through a Phase 7 workflow.
- Circuit breaker — if any safety-critical telemetry deviates (chilled-water temp drifts > X, pressure exceeds limit), system reverts to last known good state and notifies on-call.
- Two-person rule for first-time activation of any new action type.
- Audit — every action logged with operator, rationale, before/after, override path.
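The circuit-breaker behaviour described above might look roughly like this. It is a sketch under assumptions: the limits, point names, and the `write_setpoints` / `notify` callbacks are hypothetical stand-ins for the BMS write path and the on-call notifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class CircuitBreaker:
    """Trips on a safety-limit violation and restores the last known good setpoints."""
    limits: Dict[str, Tuple[float, float]]             # point -> (low, high)
    write_setpoints: Callable[[Dict[str, float]], None]
    notify: Callable[[str], None]
    last_known_good: Dict[str, float] = field(default_factory=dict)
    tripped: bool = False

    def record_good_state(self, setpoints: Dict[str, float]) -> None:
        """Remember the setpoints to restore if the breaker trips."""
        self.last_known_good = dict(setpoints)

    def check(self, telemetry: Dict[str, float]) -> bool:
        """Evaluate one telemetry sample; returns True while the breaker is tripped."""
        if self.tripped:
            return True  # stays tripped until an operator resets it
        # Note: points missing from the sample are skipped here; a production
        # watchdog would likely treat missing safety-critical data as a fault.
        violations = [
            f"{point}={telemetry[point]} outside [{lo}, {hi}]"
            for point, (lo, hi) in self.limits.items()
            if point in telemetry and not lo <= telemetry[point] <= hi
        ]
        if violations:
            self.tripped = True
            self.write_setpoints(self.last_known_good)
            self.notify("; ".join(violations))
        return self.tripped
```

Latching the tripped state means automation does not silently resume once telemetry recovers; re-enabling becomes a deliberate operator action.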
6. Detailed Specifications
6.1 API surface (Phase 10 additions)
# Plant model
GET /api/v1/cpms/plants
POST /api/v1/cpms/plants
GET /api/v1/cpms/plants/:id
PATCH /api/v1/cpms/plants/:id
GET /api/v1/cpms/plants/:id/schematic
POST /api/v1/cpms/plants/:id/schematic
# Performance & KPIs
GET /api/v1/cpms/plants/:id/kpis?range=...
GET /api/v1/cpms/equipment/:id/curve
GET /api/v1/cpms/equipment/:id/baseline
POST /api/v1/cpms/equipment/:id/baseline (set baseline period)
# Sequencing & staging
GET /api/v1/cpms/plants/:id/staging/state
GET /api/v1/cpms/plants/:id/staging/recommendation
POST /api/v1/cpms/plants/:id/staging/apply (workflow-gated)
# Setpoint reset
GET /api/v1/cpms/plants/:id/reset/recommendations
POST /api/v1/cpms/plants/:id/reset/apply
# FDD
GET /api/v1/cpms/findings?status=open&plantId=...
PATCH /api/v1/cpms/findings/:id (acknowledge / close)
POST /api/v1/cpms/findings/:id/create-work-order
# AI optimisation
POST /api/v1/cpms/ai/forecast (load forecast)
POST /api/v1/cpms/ai/recommend (LLM agent)
GET /api/v1/cpms/ai/runs/:id
# What-if simulator
POST /api/v1/cpms/whatif/run
GET /api/v1/cpms/whatif/runs/:id
6.2 Permissions added
cpms.read cpms.write
cpms.plant.read cpms.plant.update cpms.plant.schematic.update
cpms.staging.recommend cpms.staging.apply
cpms.reset.recommend cpms.reset.apply
cpms.fdd.read cpms.fdd.update
cpms.ai.run
cpms.whatif.run
cpms.control.enable (per tenant; gate for any write to plant)
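As an illustration of how the tenant-level gate could combine with an action permission, here is a minimal check. Modelling the gate as an entry in a per-tenant grant set, and the function name itself, are assumptions of this sketch.

```python
from typing import Set

TENANT_GATE = "cpms.control.enable"


def may_write_to_plant(tenant_grants: Set[str], user_grants: Set[str], action: str) -> bool:
    """Allow a plant write only when the tenant gate and the action permission are both present."""
    if TENANT_GATE not in tenant_grants:
        return False  # closed-loop control is not enabled for this tenant
    return action in user_grants


print(may_write_to_plant({TENANT_GATE}, {"cpms.staging.apply"}, "cpms.staging.apply"))
```

Checking the tenant gate first means a user holding an apply permission still cannot write to a plant whose tenant has not opted in.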
6.3 KPI catalogue (CPMS)
| KPI | Definition | Unit |
|---|---|---|
| Plant kW/RT | (chiller + pumps + tower kW) / plant cooling output | kW/RT |
| Per-chiller kW/RT | input / output | kW/RT |
| IPLV / NPLV | weighted part-load efficiency | kW/RT |
| Plant COP | cooling output / total input | — |
| Approach (chiller) | LWT − refrigerant evap temp | K |
| Approach (CT) | CWS − wet-bulb | K |
| Range (CT) | CWR − CWS | K |
| Loop ΔT | return − supply | K |
| Specific pump power | kW / (L/s) | kW·s/L |
| % runtime above efficient band | rolling | % |
| FDD open count by severity | rolling | count |
| Predicted savings vs baseline | last 7 / 30 days | kWh / % |
| Plant carbon emissions | kWh × grid emission factor | kgCO2e |
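Two catalogue entries reduce to short formulas. The sketch below uses the AHRI 550/590 part-load weightings (1% / 42% / 45% / 12% at 100 / 75 / 50 / 25% load) in the form that applies to kW/RT values, and the 3.517 kW-per-RT conversion for COP. The function names are illustrative, and how CPMS bins measured data into the four load points for a rolling figure is not covered here.

```python
KW_PER_RT = 3.517  # 1 refrigeration ton = 3.517 kW thermal


def iplv_kw_per_rt(a_100: float, b_75: float, c_50: float, d_25: float) -> float:
    """IPLV for efficiencies expressed in kW/RT at 100/75/50/25% load."""
    return 1.0 / (0.01 / a_100 + 0.42 / b_75 + 0.45 / c_50 + 0.12 / d_25)


def cop_from_kw_per_rt(kw_per_rt: float) -> float:
    """Dimensionless COP equivalent of a kW/RT figure."""
    return KW_PER_RT / kw_per_rt


print(round(iplv_kw_per_rt(0.60, 0.52, 0.48, 0.55), 3))
print(round(cop_from_kw_per_rt(0.65), 2))
```

Because kW/RT is an inverse efficiency, the weighting is harmonic rather than a plain weighted average; the plain weighted sum applies only to COP- or EER-style figures.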
6.4 Dashboard widgets (CPMS)
- Plant Schematic Live
- Plant kW/RT Live Tile
- Chiller Performance Card
- Cooling Tower Performance Card
- Staging State + Recommendation
- Setpoint Reset Recommendation
- FDD Open Findings
- What-if Simulator (mini)
- AI Daily Recommendation Card
- Energy + Carbon vs Baseline
6.5 Workflow templates (CPMS)
- High kW/RT response — plant kW/RT > 0.85 for 30 min → create P2 work order, notify on-call, attach 24-h telemetry chart.
- Daily optimisation digest — daily 06:00 → run AI recommend → email FM lead.
- Approval-gated setpoint apply — recommend → human approves → workflow writes setpoint → verify → revert on anomaly.
- FDD escalation — Sev-1 finding open > 4h → escalate to manager.
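The "above a threshold for a sustained period" trigger used by the high kW/RT template can be sketched as a small scan over timestamped samples. The function name and the `(timestamp_seconds, value)` tuple format are assumptions; the defaults mirror the 0.85 kW/RT and 30 min figures in the template.

```python
from typing import Iterable, Optional, Tuple


def sustained_breach(samples: Iterable[Tuple[float, float]],
                     threshold: float = 0.85,
                     duration_s: float = 1800.0) -> Optional[float]:
    """Return the timestamp at which the value has stayed above threshold for duration_s.

    samples are (timestamp_seconds, value) pairs in time order. Any sample at or
    below the threshold resets the timer. Returns None if the trigger never fires.
    """
    breach_start = None
    for ts, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = ts
            if ts - breach_start >= duration_s:
                return ts
        else:
            breach_start = None
    return None


five_min = [(float(t), 0.90) for t in range(0, 1801, 300)]
print(sustained_breach(five_min))
```

Resetting on any dip is the strict reading of "for 30 min"; a more tolerant variant could allow a small number of below-threshold samples before resetting.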
6.6 Brick Schema extension (CPMS)
- Chiller_Heat_Pump, Centrifugal_Chiller, Screw_Chiller, Scroll_Chiller
- Open_Cooling_Tower, Closed_Cooling_Tower
- Primary_CHW_Pump, Secondary_CHW_Pump, Tertiary_CHW_Pump, Condenser_Water_Pump
- CHW_Supply_Temperature, CHW_Return_Temperature, CHW_Differential_Pressure, CHW_Flow, Chiller_Active_Power, Chiller_Status, Chiller_Capacity_Command, Cooling_Tower_Fan_Speed, Wet_Bulb_Temperature
7. Implementation Tasks
Epic 10.A — CPMS data model
- 10.A.1 Equipment classes + plant model + schematic JSON schema.
- 10.A.2 Brick extension seeding.
- 10.A.3 Performance-curve store; library of vendor curves.
- 10.A.4 Baseline period storage.
Epic 10.B — Performance & KPI engine
- 10.B.1 kW/RT and derived KPI service.
- 10.B.2 Rolling IPLV / NPLV computation.
- 10.B.3 Approach / range / ΔT / ΔP / pump-η calculators.
- 10.B.4 KPI persistence + time-series rollups.
Epic 10.C — Sequencing & staging
- 10.C.1 Rule-based sequencer (ASHRAE GL36-aligned).
- 10.C.2 AI-augmented advisor (Phase 6 hook).
- 10.C.3 Advisory + opt-in automated paths.
Epic 10.D — Setpoint reset recommender
- 10.D.1 Three reset strategies.
- 10.D.2 Confidence + expected-savings computation.
Epic 10.E — FDD
- 10.E.1 Rule engine (composable predicates).
- 10.E.2 Launch fault library (≥30 rules).
- 10.E.3 Finding lifecycle (open → acknowledge → close → work-order link).
Epic 10.F — AI optimisation
- 10.F.1 Load forecaster (gradient-boosted regressor; nightly retrain).
- 10.F.2 Optimisation advisor (LLM agent with read tools).
- 10.F.3 Structured output + UI surfacing.
Epic 10.G — What-if simulator
- 10.G.1 Regression models per equipment.
- 10.G.2 Plant rollup.
- 10.G.3 UI with sliders and comparison.
Epic 10.H — Control safety
- 10.H.1 Workflow gates for write actions.
- 10.H.2 Circuit breaker watchdog.
- 10.H.3 Two-person rule for new action types.
Epic 10.I — Widgets, workflows, reports
- 10.I.1 Register CPMS widgets in Phase 5.
- 10.I.2 Register workflow templates in Phase 7.
- 10.I.3 CPMS reports + scheduled delivery.
8. Acceptance Criteria
- AC10.1 — A plant defined from the KTC chiller-plant fixture renders a live schematic with values updating from telemetry within 5 s of source change.
- AC10.2 — Plant kW/RT card matches manual spreadsheet calculation to within ±2% on a representative day.
- AC10.3 — Staging engine produces a recommendation under each of three test scenarios (low load, design load, partial load) consistent with ASHRAE GL36.
- AC10.4 — Setpoint reset recommendation moves CHWS setpoint up at low load with documented expected savings ≥ 3%.
- AC10.5 — FDD engine raises ≥ 20 of 30 launch rules against a seeded fault data set; false-positive rate < 10% over a 7-day baseline.
- AC10.6 — AI day-ahead load forecast MAPE ≤ 12% on the seeded data set.
- AC10.7 — What-if simulator returns plant kWh estimate within ±5% of measured baseline for a known week.
- AC10.8 — Closed-loop write to BMS is denied without cpms.control.enable; with permission and the safety workflow, write succeeds and is auditable.
- AC10.9 — All CPMS widgets render in three view modes (Standard / TV / War Room) without layout breakage.
9. Test Requirements
- Unit: ≥80% on KPI engine, sequencer, reset, FDD predicates.
- Integration: against a simulated plant (digital twin script) producing telemetry across operating modes.
- e2e: KTC-fixture plant — schematic, KPIs, staging recommendation, FDD findings.
- Safety: kill-switch and circuit-breaker tests for control actions.
- AI: forecast MAPE on hold-out; FDD precision/recall.
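The AI test metrics named above are standard and can be computed as follows. This sketch assumes forecasts and actuals are aligned lists and that FDD results are compared as sets of fault identifiers; the identifiers in the example are placeholders.

```python
from typing import Sequence, Set, Tuple


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; skips points where actual is zero."""
    pairs = [(a, f) for a, f in zip(actual, forecast) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score against")
    return 100.0 * sum(abs(a - f) / abs(a) for a, f in pairs) / len(pairs)


def precision_recall(raised: Set[str], seeded: Set[str]) -> Tuple[float, float]:
    """Precision and recall of raised findings against the seeded (true) faults."""
    true_positives = len(raised & seeded)
    precision = true_positives / len(raised) if raised else 0.0
    recall = true_positives / len(seeded) if seeded else 0.0
    return precision, recall


print(mape([100.0, 200.0], [110.0, 180.0]))
print(precision_recall({"f1", "f2", "f3"}, {"f1", "f2", "f4", "f5"}))
```

Skipping zero-load hours avoids division by zero, but it also means MAPE says nothing about forecast quality when the plant is off, so those periods may need a separate absolute-error check.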
10. Documentation Requirements
- docs/cpms/overview.md
- docs/cpms/plant_model.md
- docs/cpms/kpis.md
- docs/cpms/sequencing.md (ASHRAE GL36 alignment)
- docs/cpms/fdd_library.md (each launch rule with rationale + test data)
- docs/cpms/control_safety.md
- docs/cpms/workflow_templates.md
- ADR-025: CPMS data model
- ADR-026: AI vs rule-based control split
11. Sign-off Criteria (Gate G10)
- All Acceptance Criteria met.
- Live demo on the KTC chiller-plant fixture (CHH family, primary/secondary loops, condenser pumps).
- HVAC SME review of FDD library + sequencing engine.
- CPMS Lead, AI Lead, Engineering Lead, Product Owner sign _gates/Gate_G10_signoff.md.
- Tagged phase-10-v1.0.
12. Risks & Mitigations
| Risk | L | I | Mitigation |
|---|---|---|---|
| Customer plants don't match Brick taxonomy | 3 | 3 | Mapping templates per vendor; manual override per equipment. |
| Closed-loop control causes a safety event | 1 | 5 | Advisory-only default; workflow + circuit breaker; two-person rule for new actions. |
| FDD false positives erode trust | 3 | 4 | Per-equipment thresholds; learning baselines; user feedback loop. |
| Load-forecast MAPE drift after seasonal change | 3 | 3 | Nightly retrain (per 5.6); alert on MAPE breach. |
| KTC data sparse for some chiller modes | 3 | 2 | Use synthetic augmentation; document confidence per recommendation. |