Case Study — Software Testing & Quality Engineering

Tartan
Smart Home

Java JUnit JaCoCo SpotBugs PMD ErrorProne SonarQube GitHub Actions MySQL Docker

A production-grade testing pipeline

Over five milestones in CMPUT 402 (Software Quality), our team hardened the open-source Tartan Smart Home system, a rules-based home automation platform, through unit testing, coverage analysis, live A/B experimentation, static analysis, and technical debt auditing. I was responsible for test authoring, coverage instrumentation, bug discovery and fix documentation, and the A/B experiment infrastructure.

Role
Test Engineer; unit tests, coverage analysis, bug fixes, CI integration
Team
4 developers
Deliverables
Testing Tartan Smart Home; Java/Dropwizard rules engine for IoT home automation
79%
Line Coverage achieved

Starting from untested code, our black-box and white-box test suite reached 79% line coverage and 66% branch coverage on the core rules evaluator.

1
Logic bug discovered & fixed

White-box analysis via JaCoCo uncovered a dead else if branch in the lighting rule, a latent bug that would have silently misfired in production.

3→5
Days of tech debt discovered

SonarQube tracked how our feature additions grew technical debt from 3d 6h (baseline) to 5d 1h, quantifying the cost of moving fast.

A/B
Live experiment shipped

Built and ran a real A/B experiment in production Docker containers showing that cost-framed reports reduced light usage more than time-framed ones.

From zero coverage to 79%

The system enforces five home-automation rules. My job was to break them, then prove they were fixed. Each rule got a black-box test suite first (what should happen), then white-box analysis via JaCoCo to find the branches our tests couldn't reach.

Coverage results
  • Instruction Coverage: 77%
  • Branch Coverage: 66%
  • Line Coverage: 79%

Rules tested & bugs found

Each rule was tested with a named suite (R1A, R1B…). JaCoCo reports drove targeted additions: tests weren't added to inflate numbers, but to hit real uncovered branches.

R1: Vacancy & Lights

Bug found: lights couldn't be turned off at home

The system auto-forced lights on when occupied, preventing residents from switching them off at night. Fix: lights only auto-on when occupant first enters (door-open event), not any time the house is occupied.
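The before/after behaviour can be sketched in miniature. This is an illustrative reconstruction, not the actual Tartan code: the names (HouseState, Event, LightRule) and the event model are assumptions made to show the shape of the fix.

```java
// Hypothetical sketch of the R1 fix — class and event names are illustrative.
enum Event { DOOR_OPEN, LIGHT_OFF_REQUEST, NONE }

class HouseState {
    boolean occupied;
    boolean lightOn;
}

class LightRule {
    // Before the fix: lights were forced on whenever the house was occupied,
    // so a manual light-off request was immediately overridden.
    static void applyBuggy(HouseState s, Event e) {
        if (e == Event.LIGHT_OFF_REQUEST) s.lightOn = false;
        if (s.occupied) s.lightOn = true; // clobbers the manual off
    }

    // After the fix: lights auto-on only on the door-open (entry) event,
    // so residents can switch them off while the house stays occupied.
    static void applyFixed(HouseState s, Event e) {
        if (e == Event.DOOR_OPEN) {
            s.occupied = true;
            s.lightOn = true;
        } else if (e == Event.LIGHT_OFF_REQUEST) {
            s.lightOn = false; // respected even while occupied
        }
    }
}
```

The key change is triggering on the entry event rather than the occupied state, which is exactly the distinction the black-box suite could then assert.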

R2: Alarm & Door

Edge case: alarm with occupant still present

JaCoCo flagged an uncovered branch where an alarm triggered while the occupant was still preparing to leave. Added R2E to cover the "door open, alarm enabled, house still occupied" state.

R3: Vacancy & Door

Dead code found in door-close logic

Two branches flagged: the system attempted to close an already-closed door, and the away-timer deactivation path was unreachable. Added R3D and R3E to expose both states.

R1: White Box

Dead else if

JaCoCo showed that else if (lightState) never fired: it was logically equivalent to the preceding if, making the second block unreachable. Replaced it with a plain else. Without coverage tooling, this would have shipped.
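The pattern looks like this. A minimal sketch of the dead-branch shape, with assumed method names, not the repository's actual code:

```java
// Illustrative reconstruction of the dead else-if pattern JaCoCo exposed.
class LightToggle {
    static String buggyToggle(boolean lightState) {
        if (lightState) {
            return "turn off";
        } else if (lightState) { // dead: only reached when lightState is false
            return "unreachable";
        }
        return "turn on";
    }

    // Fix: a plain else, which coverage then reports as fully exercised.
    static String fixedToggle(boolean lightState) {
        if (lightState) {
            return "turn off";
        } else {
            return "turn on";
        }
    }
}
```

The compiler accepts the buggy version without complaint; only branch coverage makes the unreachable block visible.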

Testing approach by milestone

G1

Unit Tests

  • Wrote named JUnit tests for each system rule using only the public interface.
  • Set up GitHub Actions CI so every branch ran the test suite automatically.
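The CI step can be sketched as a minimal GitHub Actions workflow. The file layout, Java version, and Gradle wrapper here are assumptions, not the project's actual configuration:

```yaml
# Illustrative workflow sketch — the real project's build tool may differ.
name: test
on: [push]
jobs:
  unit-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-java@v4
        with:
          distribution: temurin
          java-version: '17'
      - name: Run JUnit suite with coverage
        run: ./gradlew test jacocoTestReport
```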
G2 Coverage
  • Instrumented the build with JaCoCo to generate branch-level coverage reports.
  • Used white-box analysis to discover and fix the dead else if branch in the lighting rule.
G3 In Production
  • Implemented usage tracking (light minutes, HVAC uptime) and a group-based reporting system.
  • A/B experiment shipped in live Docker containers
G4 Static Analysis
  • SpotBugs + PMD + ErrorProne — 280+ issues surfaced
  • Ran three static analysers and cross-referenced their output to identify overlaps, unique catches, and false positives.
  • Identified the 10 highest-priority issues by category and actionability.
G5 Tech Debt
  • Ran SonarQube on both the baseline and our modified branch.
  • Quantified the codebase's technical debt growth
  • Proposed a concrete CI pipeline upgrade

Does framing change behaviour?

We instrumented the running system to track real usage, then split houses into groups receiving different feedback. A genuine A/B test, not a simulation. The question: does seeing your energy bill change how you use the smart home?

Control Group

Neutral baseline report

Houses in Group 1 received general energy usage summaries, total minutes the lights and HVAC ran, without cost framing or behavioural prompts.

Experimental Group

Cost-projected feedback

Houses in Group 2 received the same data translated into estimated electricity costs. Same underlying numbers, different framing, designed to create personal stakes by quantifying usage in dollars.
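The translation itself is simple arithmetic. A minimal sketch, where the bulb wattage and tariff are illustrative assumptions, not the experiment's actual constants:

```java
// Sketch of turning raw light-on minutes into a cost-framed figure.
class CostReport {
    static final double LIGHT_WATTS = 60.0;  // assumed bulb draw
    static final double RATE_PER_KWH = 0.15; // assumed $/kWh tariff

    static double estimatedCost(long lightMinutes) {
        double kwh = (LIGHT_WATTS / 1000.0) * (lightMinutes / 60.0);
        return kwh * RATE_PER_KWH;
    }
}
```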

Infrastructure

The experiment was controlled via a groupExperiment flag set randomly per house in config.docker.yml. Every interaction was group-tagged. Switching a house's treatment required a one-line config change and no code deploy, which simplified rollbacks.
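The flag's shape in config.docker.yml would look something like this. A hypothetical fragment, since the surrounding keys and the exact value encoding aren't shown here:

```yaml
# Hypothetical per-house entry in config.docker.yml.
groupExperiment: 2   # 1 = control (neutral report), 2 = cost-framed report
```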

Results were queried directly from MySQL using time buckets, then cleaned in Excel for charting. The data showed cost framing reduced average light usage, though HVAC usage was less responsive to either report type, likely because HVAC behaviour is more habitual and less in-the-moment.
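The time-bucket idea is just grouping event timestamps into fixed windows and summing usage per window. A sketch in Java with assumed names — the real aggregation ran as SQL against MySQL:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.Map;
import java.util.TreeMap;

// Illustrative time-bucketing: sum per-event minutes into hourly buckets.
class UsageBuckets {
    static Map<Instant, Long> bucketByHour(Map<Instant, Long> minutesByEvent) {
        Map<Instant, Long> buckets = new TreeMap<>();
        minutesByEvent.forEach((ts, minutes) ->
            buckets.merge(ts.truncatedTo(ChronoUnit.HOURS), minutes, Long::sum));
        return buckets;
    }
}
```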

Light usage - cost group

Cost-framed feedback correlated with measurably lower light-on minutes compared to the neutral baseline group.

HVAC usage - both groups

HVAC behaviour was similar across groups, suggesting habitual use is less responsive to feedback framing than discretionary actions like lighting.

Three tools, 280+ issues

Running SpotBugs, PMD, and ErrorProne against the same codebase revealed how different tools catch different things and how their outputs often overlap, conflict, or mislead without a centralised view.

SpotBugs
42
warnings, 0 errors
Bad Practice (15) Malicious Code (14) Dodgy Code (4) Performance (3) Correctness (2)
PMD
170
errors, 0 warnings
ControlStatementBraces (20) NamingConventions EmptyCatchBlock LooseCoupling UnusedLocalVariable
ErrorProne
69
1 error, 68 warnings
JdkObsolete SynchronizeOnNonFinalField UnusedVariable EmptyCatch EqualsHashCode
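The overlap between tools is easiest to see on a concrete pattern: an empty catch block is flagged by both PMD (EmptyCatchBlock) and ErrorProne (EmptyCatch). An illustrative instance, not code from the Tartan repo:

```java
// The pattern both PMD and ErrorProne flag, and one deliberate fix.
class SensorParser {
    // Flagged: the exception is swallowed, so bad input fails silently.
    static int readingBad(String raw) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
        }
        return -1;
    }

    // Fixed: the failure is handled explicitly and the fallback is deliberate.
    static int readingFixed(String raw, int fallback) {
        try {
            return Integer.parseInt(raw);
        } catch (NumberFormatException e) {
            System.err.println("unparseable sensor reading: " + raw);
            return fallback;
        }
    }
}
```

Cross-referencing hits like this one was how we separated genuine overlap from each tool's unique catches.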

Technical Debt - Baseline vs. Our Branch

SonarQube gave us what the individual tools couldn't: a single number for the cost of our decisions. Adding features to the system grew technical debt by nearly two full working days.

Baseline (original Tartan code)
3d 6h
185 maintainability issues · SonarQube quality gate: ✓ pass
Main branch (with our features)
5d 1h
313 issues (+128) · 1 blocker added · Gate: ✓ still passes

Proposed pipeline improvements

The main takeaway: three separate tools with separate outputs is harder to act on than one consolidated view. Our recommendation was to integrate SonarQube directly into GitHub Actions, retire PMD (too much overlap, harder to parse), keep SpotBugs and ErrorProne for build-time feedback, and set strict security gate thresholds so debt accumulation is caught before merge, not after.
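The proposed gate could look something like the job below added to the existing GitHub Actions setup. Action versions and secret names are assumptions, not the team's actual configuration:

```yaml
# Sketch of the proposed SonarQube gate in CI.
name: quality-gate
on: [pull_request]
jobs:
  sonarqube:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history for accurate debt attribution
      - uses: SonarSource/sonarqube-scan-action@v4
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
          SONAR_HOST_URL: ${{ secrets.SONAR_HOST_URL }}
      - name: Fail the build if the quality gate fails
        uses: SonarSource/sonarqube-quality-gate-action@master
        env:
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
```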

What I actually learned

This wasn't a toy testing exercise: it was work on a real codebase with real bugs, real coverage gaps, and a real A/B experiment running in production containers. The gap between "tests pass" and "software is correct" became very concrete.

Coverage numbers lie without interpretation

77% instruction coverage sounds good. But a 66% branch rate meant there were whole decision paths we couldn't verify. JaCoCo didn't just tell me what was covered; it told me where to look next.

Static analysis tools disagree on purpose

PMD found 170 issues. SpotBugs found 42. They overlap on some things and diverge wildly on others. Running all three and cross-referencing taught me that no single tool has the full picture, and false positives are part of the work.

Experiment infrastructure matters as much as the experiment

The A/B feature was only as useful as the config flag that controlled it. Designing the groupExperiment propagation so it touched the DB schema, the service layer, and the UI in one pass meant we could change treatments without redeploying.

Technical debt is a real unit of time

Seeing "1 day 19 hours of new debt" attached to our pull requests made abstract quality concerns concrete. SonarQube's debt estimation isn't perfect, but it's a far better conversation starter than "this code is messy."