Running the Numbers
I love digging into data, and while I am a certified data analyst, my day job deals more with data structure and usefulness. I’m just someone who appreciates the power of numbers and the insights they can reveal. I also rely heavily on other smart people who share their data science knowledge online, so there’s always a chance I’m misunderstanding, misapplying, or just missing a better way to do things. But in the spirit of transparency, here’s how I analyzed my home’s furnace runtime using R.
Step 1: Loading and Cleaning the Data
First, I loaded my dataset into R. My data came from my Nest thermostat (tracking furnace runtime) and Weather Underground (tracking temperature data). Here’s the basic cleaning process:
library(tidyverse)
library(lubridate)
library(janitor)
df <- read_csv("~/Desktop/Gas_and_Electric_Use.csv") %>%
  clean_names() %>%
  mutate(date = as.Date(date, format = "%m/%d/%y")) %>%
  filter(furnace_total_time > 0) # Remove days where furnace didn't run

glimpse(df) # Quick check of the data structure
This ensures that my column names are cleaned, dates are properly formatted, and I’m only analyzing days where the furnace actually ran.
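One thing worth checking after this step: `as.Date()` with an explicit `format` quietly returns `NA` for any string that doesn't match the format, rather than raising an error. Here is a minimal sketch of that check. The `raw_dates` values are made up for illustration and are not taken from my actual file.

```r
# Hypothetical raw date strings, invented for illustration
raw_dates <- c("01/15/24", "1/16/24", "2024-01-17")

# Same format string as the cleaning step above
parsed <- as.Date(raw_dates, format = "%m/%d/%y")

# Strings that were present but failed to parse
failed <- raw_dates[is.na(parsed)]
print(failed)
```

Any strings that show up in `failed` would otherwise turn into `NA` dates without any warning.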
Step 2: Checking Correlation Between Temperature and Runtime
I wanted to see how well temperature predicts furnace runtime, so I calculated correlation coefficients:
cor_daily <- cor(df$average_temp_for_the_day, df$furnace_total_time, use = "complete.obs")
cor_2day <- cor(df$x2_day_average_rounded, df$furnace_total_time, use = "complete.obs")
print(paste("Correlation (Daily Avg Temp vs Furnace Runtime):", round(cor_daily, 2)))
print(paste("Correlation (2-day Rolling Avg Temp vs Furnace Runtime):", round(cor_2day, 2)))
The result? The 2-day rolling average temperature correlated more strongly with furnace runtime than the daily average temperature did, which is consistent with my home’s heat retention playing a role.
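The `x2_day_average_rounded` column is read straight from the CSV, so the code above doesn't show how it was built. For anyone who wants to build a similar column in R, here is a rough sketch. It assumes the 2-day value is simply the mean of today's and yesterday's average temperature, before any rounding. That definition and the sample temperatures are assumptions, not something taken from my file.

```r
# Made-up daily average temperatures (°F), in date order
daily_temp <- c(30, 20, 26, 40, 36)

# Yesterday's temperature; the first day has no previous day, so it is NA
prev_temp <- c(NA, head(daily_temp, -1))

# Assumed definition: mean of today's and yesterday's average temperature
two_day_avg <- (daily_temp + prev_temp) / 2
print(two_day_avg)
```

The first value is `NA` because there is no earlier day to average with, which is one reason `use = "complete.obs"` matters in the `cor()` calls.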
Step 3: Visualizing the Relationship
Now, time to plot the relationship between temperature and runtime. First, with the daily average temperature:
p1 <- ggplot(df, aes(x = average_temp_for_the_day, y = furnace_total_time)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = paste("Daily Avg Temp vs Furnace Runtime\nCorrelation:", round(cor_daily, 2)),
       x = "Daily Avg Temp (°F)",
       y = "Furnace Runtime (hours)") +
  theme_minimal()
p1
Then, with the 2-day rolling average temperature:
p2 <- ggplot(df, aes(x = x2_day_average_rounded, y = furnace_total_time)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = FALSE, color = "blue") +
  labs(title = paste("2-day Rolling Avg Temp vs Furnace Runtime\nCorrelation:", round(cor_2day, 2)),
       x = "2-day Rolling Avg Temp (°F)",
       y = "Furnace Runtime (hours)") +
  theme_minimal()
p2
Step 4: Testing for Exponential Energy Use
The next question: does furnace runtime increase exponentially as temperatures drop? I fit both a linear model and an exponential model (a linear regression on the log of runtime):
fit_exp <- lm(log(furnace_total_time) ~ x2_day_average_rounded, data = df)
fit_lin <- lm(furnace_total_time ~ x2_day_average_rounded, data = df)
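Fitting `log(runtime) ~ temp` with `lm()` is a common way to fit an exponential curve: if log(y) = a + b·x, then y = exp(a) · exp(b·x), so each extra degree multiplies runtime by exp(b). The sketch below checks that reading on synthetic data generated from a known exponential. The `toy` data frame and the numbers 10 and -0.05 are invented for illustration and are not estimates from my furnace data.

```r
# Synthetic data from a known exponential curve: y = 10 * exp(-0.05 * x)
toy <- data.frame(temp = seq(10, 50, by = 5))
toy$runtime <- 10 * exp(-0.05 * toy$temp)

# Same model form as fit_exp above
fit_toy <- lm(log(runtime) ~ temp, data = toy)

# Back-transform the coefficients to the original scale
baseline   <- exp(coef(fit_toy)[["(Intercept)"]]) # runtime at 0 °F
per_degree <- exp(coef(fit_toy)[["temp"]])        # multiplier per +1 °F
print(c(baseline = baseline, per_degree = per_degree))
```

Here `per_degree` comes out at about 0.951, meaning roughly 4.9% less runtime for each degree warmer in this toy example.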
I then plotted both models to see how they compare:
df <- df %>%
  mutate(pred_exp = exp(predict(fit_exp, .)),
         pred_lin = predict(fit_lin, .))
p3 <- ggplot(df, aes(x = x2_day_average_rounded, y = furnace_total_time)) +
  geom_point(alpha = 0.6) +
  # linewidth replaces size for lines as of ggplot2 3.4.0
  geom_line(aes(y = pred_lin), color = "blue", linewidth = 1, linetype = "dashed") +
  geom_line(aes(y = pred_exp), color = "red", linewidth = 1) +
  labs(title = "Furnace Runtime vs 2-day Rolling Avg Temp with Trendlines",
       subtitle = "Blue Dashed: Linear Fit | Red: Exponential Fit",
       x = "2-day Rolling Avg Temp (°F)",
       y = "Furnace Runtime (hours)") +
  theme_minimal()
p3
Step 5: Reviewing the Results
The final output summarized everything:
exp_summary <- capture.output(summary(fit_exp))
lin_summary <- capture.output(summary(fit_lin))

analysis_summary <- c(
  "Analysis Summary:",
  "-----------------",
  paste("Correlation (Daily Avg Temp vs Furnace Runtime):", round(cor_daily, 2)),
  paste("Correlation (2-day Rolling Avg Temp vs Furnace Runtime):", round(cor_2day, 2)),
  "",
  "Exponential Model Summary:",
  exp_summary,
  "",
  "Linear Model Summary:",
  lin_summary
)

cat(paste(analysis_summary, collapse = "\n"))
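One caveat when reading the two summaries side by side: the R-squared for the exponential model is computed on log(runtime), while the linear model's is computed on runtime itself, so the two numbers aren't directly comparable. One way to put them on equal footing is to compare prediction error in the original units. Below is a rough sketch on synthetic data. The `toy` data frame and its alternating noise pattern are made up, and the same comparison could in principle be run on the real fits.

```r
# Synthetic data: a straight line plus small alternating noise (made up)
toy <- data.frame(temp = 10:50)
toy$runtime <- 20 - 0.3 * toy$temp + rep(c(0.3, -0.3), length.out = nrow(toy))

fit_lin_toy <- lm(runtime ~ temp, data = toy)
fit_exp_toy <- lm(log(runtime) ~ temp, data = toy)

# Root mean squared error, both measured in the original runtime units
rmse_lin <- sqrt(mean((toy$runtime - predict(fit_lin_toy))^2))
rmse_exp <- sqrt(mean((toy$runtime - exp(predict(fit_exp_toy)))^2))
print(c(linear = rmse_lin, exponential = rmse_exp))
```

Because the toy data is linear by construction, the linear model has the lower error here. The sketch is only meant to show the method of comparison, not a result about my house.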
What I Learned (and What I’m Probably Getting Wrong)
- The 2-day rolling average temperature is a better predictor of furnace runtime than the daily average. This makes sense—my house holds heat, so one day’s temperature isn’t the full story.
- Furnace runtime doesn’t appear to increase exponentially as temperatures drop. That surprised me: I expected a curve, but over the temperature range in my data the relationship looks fairly linear.
- I lean heavily on other people’s insights. There’s a good chance there are better ways to approach this analysis, and I’m always open to feedback.
This was a fun exercise, and it gave me real, data-backed insights into how my home uses energy. But the next big question remains: how would a heat pump perform under the same conditions?