There is never a dull moment when working in the multifamily industry. Many years ago, my first job right out of college was at a property management company that managed about 1,000 doors.
It was during my time there that I unknowingly built my first anomaly detection system.
It all started with a task that had been ping-ponging around the office for the better part of a year. It was a billing headache. But it became a lesson in data-driven thinking.
In the face of the 2008 recession, the company I worked for had laid off over 50 people and was struggling to survive. Under financial pressure, they did everything they could to save a dime. One of those measures was taking over the water billing, a task usually handled by a third-party biller.
My manager figured it would be easy enough for someone in the office to do the bills. What he didn’t know was that the task was excruciatingly boring.
Within six months, three different people had done the task, and all of them were begging to be released from their duties.
At the time, I had already established my creative problem-solving abilities. So when no one else wanted the task, it landed on my desk.
There were no procedures attached to the task. No one had done it long enough to be able to standardize it. The only training I had was a quick overview of how the utilities worked.
Multifamily Utilities
One of the most common ways landlords charge residents for water is a system called the Ratio Utility Billing System, commonly referred to as RUBS. It is used predominantly in older buildings.
At a basic level, the total water and sewer bill is divided among tenants using a formula. At a deeper level, the formula takes into account several key elements, including square footage and the number of people residing in the unit.
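To make the formula concrete, here is a minimal RUBS allocation sketch in Python. The 50/50 weighting between square footage and occupant count is an assumption made for illustration; actual formulas vary by company and local regulation.

```python
# Minimal RUBS allocation sketch. The 50/50 weighting between square
# footage and occupant count is an illustrative assumption; actual
# RUBS formulas vary by company and local regulation.
def rubs_allocation(total_bill, units, sqft_weight=0.5, occ_weight=0.5):
    """Split a total water/sewer bill across units by a weighted ratio."""
    total_sqft = sum(u["sqft"] for u in units)
    total_occ = sum(u["occupants"] for u in units)
    return {
        u["unit"]: round(
            total_bill * (sqft_weight * u["sqft"] / total_sqft
                          + occ_weight * u["occupants"] / total_occ),
            2,
        )
        for u in units
    }

units = [
    {"unit": "101", "sqft": 750, "occupants": 1},
    {"unit": "102", "sqft": 950, "occupants": 3},
    {"unit": "103", "sqft": 750, "occupants": 2},
]
print(rubs_allocation(4200.00, units))  # charges keyed by unit number
```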
The other method is submetering. Each individual apartment is equipped with a submeter, allowing the landlord to track and charge residents based on its readings.
In my case, the company had a little over 500 units with submeters. Those were the bills I tackled.
Unexpected Pattern
As things quieted down a bit, I was able to sit down and look through the problematic units.
When looking at all the readings, I noticed an unexpected pattern. Rather than consistently high usage—which would indicate lifestyle factors or occupancy differences—I observed sharp, isolated spikes. I was intrigued.
In every single complaint, the pattern was the same: anomalous events.
That led me to a deeper root cause analysis.
As I dug deeper, my investigation took an interesting turn. One of my high-usage observations was in a unit that showed no reading for several days. Then, two unusually high days followed, and usage returned to zero.
After looking a little closer, the system showed the unit as vacant. The manager confirmed it. It was a mystery.
A week later, I was at the property for an unrelated reason. I was walking with the maintenance supervisor, and he made a quick stop at that same vacant unit to use the bathroom.
After flushing, I noticed the tank didn’t stop running.
The flapper was stuck open, causing the water to run continuously. A new hypothesis formed: the usage spikes were being caused by failing toilets.
Building an Early Warning System
I faced a reactive detection problem. By the time residents received their bill, the damage was done. I had to figure out a way to avert the situation.
If my hypothesis was correct, the spikes were caused by faulty toilets. The obvious solution was to replace the guts in every toilet tank. My initial count was around 800 toilets. I was not getting buy-in from anyone on that.
The issue bugged me for days. There had to be a way to avoid all this chaos.
One morning, while drinking a cup of coffee on my way to work, I had this great idea. What if I were to do the water readings every Friday? I would be able to tell if anyone had astronomical usage during that week and warn management. This would be my “early warning system.”
When I first proposed it to my manager, he shut it down.
He pointed out the difficulty of determining what would count as an anomaly. First, how could I tell whether a spike was abnormal usage or just a get-together where multiple people were at the apartment?
Also, a 2-bedroom at complex A is very different from one at complex B. Furthermore, there was no way to determine how many people lived in the units or their lifestyles, which could substantially alter the bill. Not to mention the amount of time I would spend doing the readings.
I asked for a week to work on a solution. Despite his reservations, he obliged. And that is when I stumbled upon measures of central tendency and variability.
Exploratory Data Analysis
This was my first exposure to the power of statistical thinking.
The goal was to distinguish between legitimate consumption variability and anomalous behavior.
First, I needed to figure out what was legitimate consumption. I began by retrieving historical meter data from the previous 12 months. The daily .txt files contained only 4 pieces of usable information: the complex name, apartment number, date, and the reading for the day.
To enrich the data, I added the number of bedrooms, the number of bathrooms, and the square footage for each unit.
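To give a sense of the mechanics, here is a rough sketch of that loading and enrichment step with pandas. The comma-delimited layout, the column names, and the units.csv lookup file are all assumptions for illustration, not the exact files I worked with.

```python
import glob
import pandas as pd

# Load the daily .txt exports into one frame. Each file held exactly
# four fields: complex name, apartment number, date, and the reading.
files = glob.glob("readings/*.txt")  # hypothetical folder of daily files
readings = pd.concat(
    (pd.read_csv(f, names=["complex", "apartment", "date", "reading"],
                 parse_dates=["date"]) for f in files),
    ignore_index=True,
)

# Assuming the readings are cumulative meter values (typical for
# submeters), daily usage is the day-over-day difference per unit.
readings = readings.sort_values(["complex", "apartment", "date"])
readings["usage"] = readings.groupby(["complex", "apartment"])["reading"].diff()

# Enrich with unit attributes from a hypothetical lookup table.
units = pd.read_csv("units.csv")  # complex, apartment, bedrooms, bathrooms, sqft
data = readings.merge(units, on=["complex", "apartment"], how="left")
```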
For the purposes of finding what was considered “normal” consumption, I stratified by bedroom and bathroom count. Naturally, a three-bedroom unit consumed a lot more than a one-bedroom unit.
My first findings led me to remove some noise from the dataset. The building location, square footage, and bathroom count showed no statistically significant impact on consumption (p > 0.15, Kruskal-Wallis test). This allowed me to pool data across properties for larger sample sizes.
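That check amounts to running a Kruskal-Wallis test per candidate factor. A sketch, continuing from the enriched data frame above and testing the building-location factor:

```python
from scipy.stats import kruskal
import pandas as pd

# Monthly totals per unit (continuing from the `data` frame above).
monthly = (
    data.groupby(["complex", "apartment", "bedrooms",
                  pd.Grouper(key="date", freq="M")])["usage"]
    .sum()
    .reset_index()
)

# Within each bedroom group, compare usage distributions across
# complexes. A high p-value suggests location adds little signal,
# which justified pooling complexes for larger samples.
for beds, grp in monthly.groupby("bedrooms"):
    samples = [g["usage"].dropna().to_numpy()
               for _, g in grp.groupby("complex")]
    if len(samples) > 1:
        stat, p = kruskal(*samples)
        print(f"{beds}-bedroom: H = {stat:.2f}, p = {p:.3f}")
```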
The goal was to characterize usage, so I eliminated anything with no usage at all. Determining the reason for the lack of consumption was a task for another time.
The Tukey Fence Method
Given the right-skewed distribution and the presence of extreme outliers (the very thing I was trying to detect), I selected robust statistical measures rather than methods that assume normality.
For each apartment type, I calculated the median monthly consumption (Q2), the interquartile range (IQR = Q3 - Q1), and the sample size (n = 45).
Instead of assuming the data followed a perfect bell curve, I used a method robust to outliers — the Tukey fence method. It uses medians and quartiles rather than averages.
The reason behind this decision was simple. A two-bedroom with a family of four uses more water than a two-bedroom with a single resident. Thus, a range built around the median (Q2) was more appropriate than a single average.
Upper threshold = Q3 + (1.5 * IQR)
The 1.5 * IQR multiplier is the standard statistical definition of an “outlier” in boxplot analysis. For consumption data, this typically captures values above the 99th percentile while minimizing false positives. It is a little more aggressive than the 3σ rule, which would have missed some of these outliers, and it was far easier to explain to non-technical people.
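In code, the threshold is only a few lines. A sketch per bedroom group, continuing from the pooled monthly totals above:

```python
# Tukey upper fence per bedroom group, from the pooled monthly totals.
def tukey_upper_fence(series, k=1.5):
    """Q3 + k * IQR; k = 1.5 is the standard boxplot outlier fence."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    return q3 + k * (q3 - q1)

thresholds = monthly.groupby("bedrooms")["usage"].apply(tukey_upper_fence)
print(thresholds)  # one upper threshold per bedroom count
```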
Implementation and Validation
For the weekly analysis, I divided the monthly thresholds evenly to produce weekly figures. With that information in hand, I began pulling the 7-day consumption totals for all units and comparing them against the thresholds. My heart dropped. The results were less than promising.
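Mechanically, that first pass looked roughly like the sketch below; the even division of a monthly threshold into a weekly figure is crude, but that crudeness was not the real flaw.

```python
import pandas as pd

# First-pass weekly check (a sketch, continuing from `data` and
# `thresholds` above). Scale monthly fences down to weekly figures,
# then flag any unit whose trailing 7-day total exceeds its fence.
weekly_thresholds = thresholds * (7 / 30)

last_week = data[data["date"] >= data["date"].max() - pd.Timedelta(days=6)]
totals = last_week.groupby(["complex", "apartment", "bedrooms"])["usage"].sum()

limits = totals.index.get_level_values("bedrooms").map(weekly_thresholds)
flagged = totals[totals > limits.to_numpy()]
print(flagged)  # units to warn management about
```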
I continued digging and found a crucial mistake.
The IQR was based on the annual consumption. But when I examined them separately, month by month, I noticed something that I had completely missed at first: seasonality.
Residents used more water in the summer than in the winter, so I had to adjust my data. I was aware that this would drastically diminish my data pool: I was no longer working with 12 months of data, but with a single month at a time. I continued on.
Using a coefficient of variation threshold (< 0.08), I paired months whose consumption patterns were statistically stable. Again, this reduced my sample size per threshold calculation but improved accuracy.
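The exact pairing procedure was never written down, but it amounted to something like the greedy grouping below over sorted monthly medians. This is an illustrative reconstruction, not the original logic.

```python
# Group months whose consumption was statistically stable (a sketch,
# continuing from the `monthly` frame above). Months are merged into
# one pool while the coefficient of variation (std / mean) of their
# combined monthly medians stays under 0.08.
monthly["month"] = monthly["date"].dt.month
medians = monthly.groupby("month")["usage"].median().sort_values()

pools, current = [], [medians.index[0]]
for m in medians.index[1:]:
    candidate = medians.loc[current + [m]]
    if candidate.std() / candidate.mean() < 0.08:
        current.append(m)
    else:
        pools.append(current)
        current = [m]
pools.append(current)
print(pools)  # e.g., winter months pooled together, summer months together
```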
Working with a smaller sample was a bold move. But the trade-off was acceptable. The IQR method remained robust, and false positive rates dropped to nearly zero.
To validate my theory, I applied this methodology to the previous six months. I was able to capture 96% of eventual complaint cases before they were billed.
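The backtest itself reduces to a simple recall calculation: of the units that eventually generated complaints, how many had been flagged beforehand? A minimal sketch with hypothetical inputs:

```python
# Recall sketch with hypothetical data. `flagged` holds the
# (complex, apartment, month) tuples the rule would have raised;
# `complaints` holds the disputes that were actually filed.
def recall(flagged: set, complaints: set) -> float:
    """Share of eventual complaint cases caught before billing."""
    return len(flagged & complaints) / len(complaints) if complaints else 0.0

flagged = {("A", "101", 6), ("A", "204", 7), ("B", "3B", 8)}
complaints = {("A", "101", 6), ("B", "3B", 8)}
print(f"Captured {recall(flagged, complaints):.0%} of complaint cases")
```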
In the following months, billing disputes decreased from 15 to nearly zero. That was an incredible achievement.
The algorithm was very successful. Aside from the drop in complaints, my time commitment to the task fell from eight hours per week to two. And most of that time was spent putting in maintenance tickets for the problematic units.
I was aware that my algorithm was not perfect; it missed about 4% of cases. But it was not until much later that I found out what those were: intermittent leaks, such as dripping faucets and showers. Those increased consumption by only 10 to 20% and did not exceed the 1.5 * IQR threshold.
The precise business impact metrics were not formally tracked at the time, something I regret today. The outcomes described are observational improvements rather than measured KPIs.
Conclusion
This project fundamentally shaped my approach to operational analytics. What began as a billing dispute problem became an exercise in statistical process control—identifying normal variation, detecting true anomalies, and implementing automated monitoring systems.
Key takeaways:
Domain knowledge is critical: Understanding toilet flapper failure modes informed the entire detection strategy.
Exploratory analysis drives methodology: Discovering seasonality required mid-stream adaptation.
Simplicity scales: The Tukey fence method was both statistically sound and explainable to non-technical stakeholders.
Validation builds trust: Retrospective testing provided the evidence needed for management buy-in.
The anomaly detection framework I developed is transferable across domains: predictive maintenance, fraud detection, quality control, or any operational context requiring early identification of deviations from expected behavior.