01 — Overview
Over five milestones in CMPUT 402 (Software Quality), our team hardened the open-source Tartan Smart Home system, a rules-based home automation platform, through unit testing, coverage analysis, live A/B experimentation, static analysis, and technical debt auditing. I was responsible for test authoring, coverage instrumentation, bug discovery and fix documentation, and the A/B experiment infrastructure.
Starting from untested code, our black-box and white-box test suite reached 79% line coverage and 66% branch coverage on the core rules evaluator.
White-box analysis via JaCoCo uncovered a dead `else if` branch in the lighting rule, a latent bug that would have silently misfired in production.
SonarQube tracked how our feature additions grew technical debt from 3d 6h (baseline) to 5d 1h, quantifying the cost of moving fast.
We built and ran a real A/B experiment in production Docker containers, showing that cost-framed reports reduced light usage more than time-framed ones.
02 — Testing
The system enforces five home-automation rules. My job was to break them, then prove they were fixed. Each rule got a black-box test suite first (what should happen), then white-box analysis via JaCoCo to find the branches our tests couldn't reach.
| Metric | Coverage |
|---|---|
| Instruction coverage | 77% |
| Branch coverage | 66% |
| Line coverage | 79% |
Each rule was tested with a named suite (R1A, R1B…). JaCoCo reports drove targeted additions: tests weren't added to inflate numbers but to hit real uncovered branches.
The system auto-forced lights on when occupied, preventing residents from switching them off at night. Fix: lights only auto-on when occupant first enters (door-open event), not any time the house is occupied.
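The fixed behaviour can be sketched as follows. This is a minimal illustration, not the actual Tartan API: the class and method names are hypothetical, and the real rule evaluator works over richer state.

```java
// Sketch of the corrected lighting rule (illustrative names, not the real
// Tartan classes): auto-on keys off the entry *event*, not the occupied *state*.
public class LightRuleSketch {

    /** Force lights on only when an occupant first enters via a door-open event. */
    static boolean shouldAutoOnLight(boolean doorOpenEvent, boolean houseOccupied) {
        // Before the fix the decision was effectively houseOccupied alone,
        // which kept re-forcing lights on all night; gating on the event stops that.
        return doorOpenEvent && houseOccupied;
    }

    public static void main(String[] args) {
        System.out.println(shouldAutoOnLight(true, true));   // entry event: lights on
        System.out.println(shouldAutoOnLight(false, true));  // already home at night: leave lights alone
    }
}
```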
JaCoCo flagged an uncovered branch where an alarm triggered while occupant was still preparing to leave. Added R2E to cover the "door open, alarm enabled, house still occupied" state.
Two branches flagged: the system attempted to close an already-closed door, and the away-timer deactivation path was unreachable. Added R3D and R3E to expose both states.
JaCoCo showed that `else if (lightState)` never fired: it was logically equivalent to the preceding `if`, making the second block unreachable. Replaced with a plain `else`. Without coverage tooling this would have shipped.
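The shape of the bug can be reconstructed in miniature. The condition and return values below are illustrative stand-ins, not the real lighting rule, but the control-flow defect is the same: a duplicated guard makes the second branch dead, so the intended handling silently falls through.

```java
// Minimal reconstruction of the dead-branch defect (illustrative values).
public class DeadBranchSketch {

    // Before: the second check repeats the first, so its block can never run.
    static String brokenRule(boolean lightState) {
        if (lightState) {
            return "keep-on";
        } else if (lightState) {  // same condition as above: dead code
            return "turn-off";    // intended off-handling, never executed
        }
        return "no-op";           // bug: the off case silently lands here
    }

    // After: a plain else makes the off path reachable.
    static String fixedRule(boolean lightState) {
        if (lightState) {
            return "keep-on";
        } else {
            return "turn-off";
        }
    }

    public static void main(String[] args) {
        System.out.println(brokenRule(false)); // no-op  (misfire)
        System.out.println(fixedRule(false));  // turn-off
    }
}
```

Line coverage alone would not catch this; only a branch-level report shows the `else if` arm at 0%.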
03 — In-Production Experiment
We instrumented the running system to track real usage, then split houses into groups receiving different feedback. A genuine A/B test, not a simulation. The question: does seeing your energy bill change how you use the smart home?
Houses in Group 1 received general energy usage summaries, total minutes the lights and HVAC ran, without cost framing or behavioural prompts.
Houses in Group 2 received the same data translated into estimated electricity costs. Same underlying numbers, different framing, designed to create personal stakes by quantifying usage in dollars.
The experiment was controlled via a `groupExperiment` flag set randomly per house in `config.docker.yml`. Every interaction was group-tagged. Switching a house's treatment required one config-line change and no code deploy, simplifying rollbacks.
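A sketch of what the per-house flag might look like. Only `groupExperiment` and the filename come from the project; the surrounding key names and layout are assumptions, and the real `config.docker.yml` may differ.

```yaml
# config.docker.yml — illustrative layout, not the actual schema
houses:
  - name: house-1
    groupExperiment: 1   # neutral usage summary
  - name: house-2
    groupExperiment: 2   # cost-framed report
```

Flipping a single `groupExperiment` value reassigns a house's treatment on the next container restart, with no code change.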
Results were queried directly from MySQL using time buckets, then cleaned in Excel for charting. The data showed cost framing reduced average light usage, though HVAC usage was less responsive to either report type, likely because HVAC behaviour is more habitual and less in-the-moment.
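The time-bucketed query looked something like the sketch below. The table and column names are assumptions for illustration; the actual schema is not shown in this write-up.

```sql
-- Light-on minutes per experiment group, bucketed by day (illustrative schema)
SELECT experiment_group,
       DATE(event_time)      AS day_bucket,
       SUM(light_on_minutes) AS total_light_minutes
FROM   usage_events
GROUP  BY experiment_group, DATE(event_time)
ORDER  BY day_bucket, experiment_group;
```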
Cost-framed feedback correlated with measurably lower light-on minutes compared to the neutral baseline group.
HVAC behaviour was similar across groups, suggesting habitual use is less responsive to feedback framing than discretionary actions like lighting.
04 — Static Analysis & Tech Debt
Running SpotBugs, PMD, and ErrorProne against the same codebase revealed how different tools catch different things and how their outputs often overlap, conflict, or mislead without a centralised view.
SonarQube gave us what the individual tools couldn't: a single number for the cost of our decisions. Adding features to the system grew technical debt by nearly two full working days.
The main takeaway: three separate tools with three separate outputs are harder to act on than one consolidated view. Our recommendation was to integrate SonarQube directly into GitHub Actions, retire PMD (too much overlap, harder to parse), keep SpotBugs and ErrorProne for build-time feedback, and set strict quality gate thresholds so debt accumulation is caught before merge, not after.
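A minimal sketch of the recommended CI hook, assuming a Gradle build and SonarQube's Gradle plugin; the workflow name, project key, and secret names are placeholders, not values from the project.

```yaml
# .github/workflows/sonar.yml — sketch; keys and secrets are placeholders
name: sonar
on: [pull_request]
jobs:
  analyze:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build and analyze
        run: ./gradlew build sonar -Dsonar.projectKey=tartan
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
```

With a quality gate configured server-side, the workflow fails the pull request when new debt crosses the threshold, surfacing the cost before merge.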
05 — Reflection
This wasn't a toy testing exercise: it was working on a real codebase with real bugs, real coverage gaps, and a real A/B experiment running in production containers. The gap between "tests pass" and "software is correct" became very concrete.
77% instruction coverage sounds good. But 66% branch coverage meant there were whole decision paths we couldn't verify. JaCoCo didn't just tell me what was covered; it told me where to look next.
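The gap between the two metrics fits in one line of code. This toy example (not from the Tartan codebase) shows how a single test can cover every instruction while exercising only half the branches:

```java
// Why instruction/line coverage overstates safety (illustrative example):
// one test executes every line below yet never takes the b == 0 branch.
public class BranchGapSketch {

    static int safeDiv(int a, int b) {
        return b == 0 ? 0 : a / b;   // one line, two branches
    }

    public static void main(String[] args) {
        System.out.println(safeDiv(10, 2)); // covers the line; only one branch taken
    }
}
```

A branch-coverage report flags the untested `b == 0` arm; a line-coverage report shows 100% and stays silent.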
PMD found 170 issues. SpotBugs found 42. They overlap on some things and diverge wildly on others. Running all three and cross-referencing taught me that no single tool has the full picture, and false positives are part of the work.
The A/B feature was only as useful as the config flag that controlled it. Designing the `groupExperiment` propagation so it touched the DB schema, the service layer, and the UI in one pass meant we could change treatments without redeploying.
Seeing "1 day 19 hours of new debt" attached to our pull requests made abstract quality concerns concrete. SonarQube's debt estimation isn't perfect, but it's a far better conversation starter than "this code is messy."