Running the Numbers

I love digging into data. While I am a certified data analyst, my day job focuses more on how data is structured and how useful it is. I’m just someone who appreciates the power of numbers and the insights they can reveal. I also rely heavily on other smart people who share their data science knowledge online, which means there’s always a chance I’m misunderstanding, misapplying, or just missing a better way to do things. But, in the spirit of transparency, here’s how I analyzed my home’s furnace runtime using R.

Step 1: Loading and Cleaning the Data

First, I loaded my dataset into R. My data came from my Nest thermostat (tracking furnace runtime) and Weather Underground (tracking temperature data). Here’s the basic cleaning process:

library(tidyverse)
library(lubridate)
library(janitor)

df <- read_csv("~/Desktop/Gas_and_Electric_Use.csv") %>%
  clean_names() %>%
  mutate(date = as.Date(date, format = "%m/%d/%y")) %>%
  filter(furnace_total_time > 0)  # Remove days where the furnace didn't run (this also keeps log() defined in Step 4)

glimpse(df)  # Quick check of the data structure

This ensures that my column names are cleaned, dates are properly formatted, and I’m only analyzing days where the furnace actually ran.
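
One thing worth checking after this step: as.Date() with an explicit format quietly returns NA for any value that doesn't match, so a stray row in a different date format would slip through without an error. Here is a minimal sketch of that check. The toy data frame and its values are made up for illustration; only the column names mirror the ones used above.

```r
library(dplyr)

# Toy stand-in for the raw data (values are invented for illustration)
toy <- tibble(
  date = c("01/15/24", "01/16/24", "2024-01-17"),  # last row uses a different format
  furnace_total_time = c(5.2, 0, 6.1)
)

checked <- toy %>%
  mutate(parsed_date = as.Date(date, format = "%m/%d/%y"))

# Rows whose date did not match the expected format come back as NA
bad_dates <- checked %>% filter(is.na(parsed_date))
bad_dates

# How many rows the "furnace ran" filter would drop
dropped <- sum(checked$furnace_total_time <= 0)
dropped
```

If bad_dates has any rows, the format string probably needs another look before going further.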

Step 2: Checking Correlation Between Temperature and Runtime

I wanted to see how well temperature predicts furnace runtime, so I calculated correlation coefficients:

cor_daily  <- cor(df$average_temp_for_the_day, df$furnace_total_time, use = "complete.obs")
cor_2day   <- cor(df$x2_day_average_rounded, df$furnace_total_time, use = "complete.obs")

print(paste("Correlation (Daily Avg Temp vs Furnace Runtime):", round(cor_daily, 2)))
print(paste("Correlation (2-day Rolling Avg Temp vs Furnace Runtime):", round(cor_2day, 2)))

The result? The 2-day rolling average temperature had a stronger correlation with furnace runtime than the daily average temperature, which is consistent with my home’s heat retention playing a role. A correlation alone can’t prove the cause, but it points in that direction.
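
The 2-day average column above came straight from the spreadsheet. For anyone who wants to build something similar in R, here is a rough sketch that assumes the value is simply the mean of today's and yesterday's average temperature, rounded to a whole degree. That definition is an assumption, the two_day_avg name is hypothetical, and the temperatures are made up.

```r
library(dplyr)

# Made-up daily temperatures, one row per consecutive day
temps <- tibble(
  date = as.Date("2024-01-01") + 0:4,
  average_temp_for_the_day = c(30, 20, 26, 40, 34)
)

temps <- temps %>%
  arrange(date) %>%
  mutate(
    # Mean of today and the previous day; the first row has no previous day, so it stays NA
    two_day_avg = round((average_temp_for_the_day + lag(average_temp_for_the_day)) / 2)
  )

temps
```

A calculation like this should run before any rows are filtered out. Otherwise lag() would pair a day with whatever row happens to sit above it, not with the actual previous day.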

Step 3: Visualizing the Relationship

Now, time to plot the relationship between temperature and runtime. First, with the daily average temperature:

p1 <- ggplot(df, aes(x = average_temp_for_the_day, y = furnace_total_time)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = paste("Daily Avg Temp vs Furnace Runtime\nCorrelation:", round(cor_daily, 2)),
       x = "Daily Avg Temp (°F)",
       y = "Furnace Runtime (hours)") +
  theme_minimal()
p1

Then, with the 2-day rolling average temperature:

p2 <- ggplot(df, aes(x = x2_day_average_rounded, y = furnace_total_time)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = paste("2-day Rolling Avg Temp vs Furnace Runtime\nCorrelation:", round(cor_2day, 2)),
       x = "2-day Rolling Avg Temp (°F)",
       y = "Furnace Runtime (hours)") +
  theme_minimal()
p2

Step 4: Testing for Exponential Energy Use

The next question: does furnace runtime increase exponentially as temperatures drop? I fit both a linear model and an exponential model (a linear regression on the log of runtime):

fit_exp <- lm(log(furnace_total_time) ~ x2_day_average_rounded, data = df)
fit_lin <- lm(furnace_total_time ~ x2_day_average_rounded, data = df)
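
A note on reading the exponential fit: because the response is log(runtime), the slope describes a percentage change per degree instead of a fixed number of hours. The sketch below shows that conversion on simulated data. The numbers are invented (runtime falling roughly 5% per degree) and are not my furnace data.

```r
# Simulated data, for illustration only: runtime falls about 5% per degree F
set.seed(1)
temp    <- seq(0, 50, by = 2)
runtime <- 12 * exp(-0.05 * temp) * exp(rnorm(length(temp), sd = 0.05))

fit <- lm(log(runtime) ~ temp)

slope <- coef(fit)[["temp"]]

# Percent change in runtime for each 1-degree rise in temperature
pct_change_per_degree <- (exp(slope) - 1) * 100
pct_change_per_degree
```

Applied to fit_exp, the same calculation would use coef(fit_exp)[["x2_day_average_rounded"]] as the slope.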

I then plotted both models to see how they compare:

df <- df %>%
  mutate(pred_exp = exp(predict(fit_exp, .)),
         pred_lin = predict(fit_lin, .))

p3 <- ggplot(df, aes(x = x2_day_average_rounded, y = furnace_total_time)) +
  geom_point(alpha = 0.6) +
  # linewidth replaces size for lines as of ggplot2 3.4.0 (size still works but warns)
  geom_line(aes(y = pred_lin), color = "blue", linewidth = 1, linetype = "dashed") +
  geom_line(aes(y = pred_exp), color = "red", linewidth = 1) +
  labs(title = "Furnace Runtime vs 2-day Rolling Avg Temp with Trendlines",
       subtitle = "Blue Dashed: Linear Fit | Red: Exponential Fit",
       x = "2-day Rolling Avg Temp (°F)",
       y = "Furnace Runtime (hours)") +
  theme_minimal()
p3

Step 5: Reviewing the Results

Finally, I gathered the correlations and both model summaries into a single printout:

exp_summary <- capture.output(summary(fit_exp))
lin_summary <- capture.output(summary(fit_lin))

analysis_summary <- c(
  "Analysis Summary:",
  "-----------------",
  paste("Correlation (Daily Avg Temp vs Furnace Runtime):", round(cor_daily, 2)),
  paste("Correlation (2-day Rolling Avg Temp vs Furnace Runtime):", round(cor_2day, 2)),
  "",
  "Exponential Model Summary:",
  exp_summary,
  "",
  "Linear Model Summary:",
  lin_summary
)

cat(paste(analysis_summary, collapse = "\n"))
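
One caveat when reading those two summaries side by side: the R-squared values are on different scales (log hours for the exponential model, hours for the linear one), so they aren't directly comparable. One possible way to put both models on equal footing is to back-transform the exponential predictions and compare the error in hours. The sketch below does that with a small rmse() helper, which is a hypothetical function written for this example, on simulated data that is not my furnace data.

```r
# Simulated stand-in data (not real measurements): a linear trend plus noise
set.seed(42)
sim <- data.frame(temp = seq(5, 55, by = 1))
sim$runtime <- 14 - 0.2 * sim$temp + rnorm(nrow(sim), sd = 0.5)

fit_lin_sim <- lm(runtime ~ temp, data = sim)
fit_exp_sim <- lm(log(runtime) ~ temp, data = sim)

# Root mean squared error, measured in the original units (hours)
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

rmse_lin <- rmse(sim$runtime, predict(fit_lin_sim, newdata = sim))
# Back-transform the log-scale predictions before comparing
rmse_exp <- rmse(sim$runtime, exp(predict(fit_exp_sim, newdata = sim)))

c(linear = rmse_lin, exponential = rmse_exp)
```

The model with the lower RMSE fits the observed runtimes more closely on this measure. With the real data, the same comparison could use the pred_lin and pred_exp columns created in Step 4.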

What I Learned (and What I’m Probably Getting Wrong)

  1. The 2-day rolling average temperature correlates more strongly with furnace runtime than the daily average does. This makes sense: my house holds heat, so one day’s temperature isn’t the full story.
  2. Furnace runtime doesn’t appear to increase exponentially as temperatures drop. That was unexpected! I expected a curve, but the data suggests a pretty steady linear relationship.
  3. I lean heavily on other people’s insights. There’s a good chance there are better ways to approach this analysis, and I’m always open to feedback.

This was a fun exercise, and it gave me real, data-backed insights into how my home uses energy. But the next big question remains: how would a heat pump perform under the same conditions?