Predicting Stock Prices with a Custom Linear Regression Model (Version 5)

As a data enthusiast, I’ve been experimenting with stock price prediction using Python and machine learning. My latest project combines technical indicators into a LinearRegression model to forecast the next 14 business days of stock prices, averaging results over 100 simulations for stability. I tested it on Tesla (TSLA), and here’s how it works, what indicators I used, and what to expect from its accuracy.

The Model at a Glance

This model fetches historical stock data from Yahoo Finance (via yfinance), calculates a set of technical indicators, trains a linear regression model, and predicts future prices with a bit of randomness to mimic market noise. It’s built to run non-interactively, saving a plot to a file for easy review. I’ve hardcoded it for TSLA, but you can swap in any ticker.

The Indicators Powering the Prediction

I carefully selected five indicators, plus the stock’s closing price, to capture trend, momentum, volatility, and volume dynamics. Here’s what each one does:

  1. Closing Price (Close_{ticker})
    • What it is: The stock’s daily closing price.
    • Why it’s here: It’s the foundation—everything else builds off this raw price data.
  2. 20-Day Moving Average (MA20)
    • What it is: The average closing price over the past 20 days.
    • Why it’s useful: Smooths out short-term noise to reveal the underlying trend. If the price is above MA20, it might signal a bullish trend, and below could hint at bearishness.
  3. Relative Strength Index (RSI)
    • What it is: A momentum oscillator (0-100) based on the average gains vs. losses over 14 days.
    • Why it’s useful: Flags overbought (>70) or oversold (<30) conditions. It helps the model sense when a reversal might be near.
  4. Moving Average Convergence Divergence (MACD)
    • What it is: The difference between the 12-day and 26-day exponential moving averages (EMAs).
    • Why it’s useful: Tracks momentum shifts. A rising MACD suggests strengthening bullish momentum, while a drop might warn of a downturn.
  5. Average True Range (ATR)
    • What it is: A 14-day average of the stock’s daily price range (accounting for gaps).
    • Why it’s useful: Measures volatility. Higher ATR means bigger swings. In this script ATR is only an input feature; the random noise added during forecasting is scaled by the standard deviation of daily returns, not by ATR.
  6. On-Balance Volume (OBV)
    • What it is: A running total of volume, adding volume on up days and subtracting on down days.
    • Why it’s useful: Gauges buying or selling pressure. Rising OBV with a flat price might predict an upcoming breakout.
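
To make a couple of these definitions concrete, here is a small sketch that computes a simple-average RSI and OBV on made-up numbers. The helper names (simple_rsi, on_balance_volume) and the toy prices are illustrations of mine, not part of the script further down, and the 4-day period is only there so the arithmetic can be checked by hand.

```python
def simple_rsi(prices, period):
    """RSI from the last `period` price changes, using plain averages.

    Assumes len(prices) > period so the window is full.
    """
    deltas = [b - a for a, b in zip(prices[:-1], prices[1:])][-period:]
    avg_gain = sum(d for d in deltas if d > 0) / period
    avg_loss = sum(-d for d in deltas if d < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window: RSI saturates at 100
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)


def on_balance_volume(closes, volumes):
    """Running volume total: add on up days, subtract on down days."""
    obv = [0]
    for i in range(1, len(closes)):
        if closes[i] > closes[i - 1]:
            obv.append(obv[-1] + volumes[i])
        elif closes[i] < closes[i - 1]:
            obv.append(obv[-1] - volumes[i])
        else:
            obv.append(obv[-1])
    return obv


# Toy data (hypothetical, chosen so the numbers are easy to verify)
toy_rsi = simple_rsi([10, 11, 12, 11, 13], period=4)
toy_obv = on_balance_volume([10, 11, 11, 10, 12], [100, 200, 150, 300, 250])
print(toy_rsi)  # 80.0
print(toy_obv)  # [0, 200, 200, -100, 150]
```

In the RSI example the four changes are +1, +1, -1, +2, so the average gain is 1.0, the average loss is 0.25, and RSI = 100 - 100 / (1 + 4) = 80.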

How the Model Works

  1. Data Fetch: Pulls daily Close, High, Low, and Volume for TSLA from 2024 onward.
  2. Indicator Calculation: Computes MA20, RSI, MACD, ATR, and OBV from the raw data.
  3. Training: Fits a LinearRegression model to predict the next day’s closing price based on these six features, using historical data up to April 9, 2025 (today’s date).
  4. Forecasting: Steps forward 14 business days, predicting each day’s price and updating indicators iteratively. I add noise (based on historical volatility) to simulate real-world randomness.
  5. Simulation: Runs this 100 times and averages the results for a smoother forecast.
  6. Output: Prints the predictions and saves a plot comparing historical and predicted prices.
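
Steps 4 and 5 can be illustrated with a toy, assuming a stand-in one-feature linear rule rather than the fitted model. The function noisy_path, its coefficients, and the seed are all hypothetical. What the sketch suggests is that averaging many runs with zero-mean noise pulls the forecast back toward the noise-free path, which is why the averaged line looks much smoother than any single run.

```python
import random
import statistics


def noisy_path(start, steps, volatility, rng, a=0.999, b=0.5):
    """Iterate a toy linear rule p -> a*p + b, adding price-scaled Gaussian noise."""
    path, price = [], start
    for _ in range(steps):
        # Noise scale mirrors the script's 0.5 * volatility * last price
        noise = rng.gauss(0, 0.5 * volatility * price) if volatility > 0 else 0.0
        price = max(a * price + b + noise, 0)
        path.append(price)
    return path


rng = random.Random(42)  # fixed seed so the sketch is repeatable
runs = [noisy_path(250.0, 14, 0.04, rng) for _ in range(100)]
avg_path = [statistics.mean(day) for day in zip(*runs)]
clean_path = noisy_path(250.0, 14, 0.0, rng)  # same rule with noise switched off

final_spread = statistics.stdev([run[-1] for run in runs])
final_gap = abs(avg_path[-1] - clean_path[-1])
print(f"day-14 spread of single runs: {final_spread:.2f}")
print(f"day-14 gap, average vs noise-free: {final_gap:.2f}")
```

In the actual script the recursion is not exactly linear (RSI is clipped to 0-100 and the OBV step depends on the sign of the move), so the averaged forecast will only roughly match a noise-free run.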

Python:

import yfinance as yf
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import holidays

us_holidays = holidays.US()

# Fetch stock data
def fetch_data(ticker):
    try:
        stock = yf.download(ticker, start="2024-01-01", end=datetime.today().strftime('%Y-%m-%d'))
        if stock.empty:
            raise ValueError(f"Failed to fetch data for {ticker}")
        df = pd.DataFrame(index=stock.index)
        df[f'Close_{ticker}'] = stock['Close']
        df['High'] = stock['High']
        df['Low'] = stock['Low']
        df['Volume'] = stock['Volume']
        df = df.dropna()
        return df
    except Exception as e:
        print(f"Error fetching data: {e}")
        return None

# Calculate RSI
def calculate_rsi(data, periods=14):
    delta = data.diff()
    gain = (delta.where(delta > 0, 0)).rolling(window=periods).mean()
    loss = (-delta.where(delta < 0, 0)).rolling(window=periods).mean()
    rs = gain / loss
    rsi = 100 - (100 / (1 + rs))
    return rsi

# Calculate MACD
def calculate_macd(df, ticker, short_period=12, long_period=26):
    short_ema = df[f'Close_{ticker}'].ewm(span=short_period, adjust=False).mean()
    long_ema = df[f'Close_{ticker}'].ewm(span=long_period, adjust=False).mean()
    macd = short_ema - long_ema
    return macd

# Calculate ATR
def calculate_atr(df, ticker, period=14):
    high_low = df['High'] - df['Low']
    high_prev_close = abs(df['High'] - df[f'Close_{ticker}'].shift(1))
    low_prev_close = abs(df['Low'] - df[f'Close_{ticker}'].shift(1))
    true_range = pd.concat([high_low, high_prev_close, low_prev_close], axis=1).max(axis=1)
    atr = true_range.rolling(window=period).mean()
    return atr

# Calculate OBV
def calculate_obv(df, ticker):
    obv = [0]
    for i in range(1, len(df)):
        if df[f'Close_{ticker}'].iloc[i] > df[f'Close_{ticker}'].iloc[i-1]:
            obv.append(obv[-1] + df['Volume'].iloc[i])
        elif df[f'Close_{ticker}'].iloc[i] < df[f'Close_{ticker}'].iloc[i-1]:
            obv.append(obv[-1] - df['Volume'].iloc[i])
        else:
            obv.append(obv[-1])
    return pd.Series(obv, index=df.index)

# Prepare data with combined indicators
def prepare_data(df, ticker):
    df = df.copy()
    stock_col = f'Close_{ticker}'
    df['MA20'] = df[stock_col].rolling(window=20).mean()
    df['RSI'] = calculate_rsi(df[stock_col])
    df['MACD'] = calculate_macd(df, ticker)
    df['ATR'] = calculate_atr(df, ticker)
    df['OBV'] = calculate_obv(df, ticker)
    df = df.dropna()
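    # Target is the next day's close; the final row has no target, so drop it
    df['Target'] = df[stock_col].shift(-1)
    df = df.iloc[:-1].copy()  # copy so the column assignment below does not hit a slice
    df['Returns'] = df[stock_col].pct_change()
    volatility = df['Returns'].std()  # daily return volatility, used to scale the noise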
    return df, volatility

# Train the model
def train_model(df, ticker):
    X = df[[f'Close_{ticker}', 'MA20', 'RSI', 'MACD', 'ATR', 'OBV']]
    y = df['Target']
    model = LinearRegression()
    model.fit(X, y)
    return model

# Check if a date is a business day
def is_business_day(date):
    return date.weekday() < 5 and date not in us_holidays

# Predict next 14 business days (single run)
def predict_next_14_business_days(model, df, volatility, ticker):
    feature_names = [f'Close_{ticker}', 'MA20', 'RSI', 'MACD', 'ATR', 'OBV']
    last_data = df.tail(1)[feature_names].values[0]
    predictions = []
    future_dates = []
    days_ahead = 0
    business_days_count = 0

    while business_days_count < 14:
        next_date = datetime.today() + timedelta(days=days_ahead + 1)
        if is_business_day(next_date):
            input_data = pd.DataFrame([last_data], columns=feature_names)
            next_price = model.predict(input_data)[0]
            noise = np.random.normal(0, 0.5 * volatility * last_data[0])
            # Clip before feeding back: a negative close would make the next noise scale negative
            next_price = max(next_price + noise, 0)
            predictions.append(next_price)
            future_dates.append(next_date)

            # Approximate new indicator values
            new_close = next_price
            new_ma20 = (last_data[1] * 19 + new_close) / 20
            delta = new_close - last_data[0]
            new_rsi = min(max(last_data[2] + (delta * 2), 0), 100)
            new_macd = last_data[3] + (delta * 0.1)  # Rough approximation
            new_atr = last_data[4]  # Assume stable
            new_obv = last_data[5] + (100000 if new_close > last_data[0] else -100000 if new_close < last_data[0] else 0)

            last_data = [new_close, new_ma20, new_rsi, new_macd, new_atr, new_obv]
            business_days_count += 1
        days_ahead += 1

    return future_dates, predictions

# Run simulation 100 times and average predictions
def simulate_and_average_predictions(model, df, volatility, ticker, num_runs=100):
    all_predictions = []
    future_dates = None

    for _ in range(num_runs):
        dates, preds = predict_next_14_business_days(model, df, volatility, ticker)
        all_predictions.append(preds)
        if future_dates is None:
            future_dates = dates

    avg_predictions = np.mean(all_predictions, axis=0)
    return future_dates, avg_predictions

# Plot results
def plot_results(historical_df, future_dates, predictions, ticker):
    plt.figure(figsize=(12, 6))
    plt.plot(historical_df.index, historical_df[f'Close_{ticker}'], label=f'Historical {ticker} Close', color='blue')
    plt.plot(future_dates, predictions, label=f'Predicted {ticker} Close (Avg of 100 Runs)', color='red', linestyle='--')
    plt.title(f'{ticker} Stock Price Prediction (Next 14 Business Days)')
    plt.xlabel('Date')
    plt.ylabel('Price (USD)')
    plt.legend()
    plt.grid()
    plt.savefig(f'{ticker}_prediction.png')
    plt.close()

# Main execution
if __name__ == "__main__":
    try:
        ticker = "TSLA"
        print(f"Starting prediction for {ticker}...")
        print(f"Step 1: Fetching data for {ticker}...")
        data = fetch_data(ticker)
        if data is None:
            print("Failed to fetch data. Exiting.")
            exit(1)
        
        print("Step 2: Preparing data...")
        prepared_data, volatility = prepare_data(data, ticker)
        if prepared_data.empty:
            print("Prepared data is empty. Exiting.")
            exit(1)
        
        print("Step 3: Training model...")
        model = train_model(prepared_data, ticker)
        
        print("Step 4: Simulating predictions (100 runs)...")
        future_dates, avg_predictions = simulate_and_average_predictions(model, prepared_data, volatility, ticker, num_runs=100)
        
        print(f"\nPredicted {ticker} Stock Prices for the Next 14 Business Days (Average of 100 Runs):")
        for date, price in zip(future_dates, avg_predictions):
            print(f"{date.strftime('%Y-%m-%d')}: ${price:.2f}")
        
        print("Step 5: Generating plot...")
        plot_results(prepared_data, future_dates, avg_predictions, ticker)
        print(f"Plot saved as '{ticker}_prediction.png' in the current directory.")
        
    except Exception as e:
        print(f"Script failed with error: {e}")
        exit(1)
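
One detail in the forecasting loop is worth a closer look: new_ma20 = (last_data[1] * 19 + new_close) / 20 assumes the value leaving the 20-day window equals the current average. A minimal sketch of an exact update, assuming you keep the last 20 closes around, could look like the following. The RollingMean class is a hypothetical helper, not something the script defines.

```python
from collections import deque


class RollingMean:
    """Exact fixed-window mean that drops the oldest value on each update."""

    def __init__(self, values, window):
        if len(values) < window:
            raise ValueError("need at least `window` values to start")
        self.values = deque(values[-window:], maxlen=window)
        self.total = sum(self.values)

    def update(self, new_value):
        # The deque is full, so appending evicts values[0]; adjust the sum first
        self.total += new_value - self.values[0]
        self.values.append(new_value)
        return self.total / len(self.values)


# Toy comparison on a 3-value window (made-up numbers)
window = [1.0, 2.0, 3.0]
exact = RollingMean(window, window=3).update(6.0)  # mean of [2, 3, 6]
approx = (sum(window) / 3 * 2 + 6.0) / 3           # script-style shortcut
print(round(exact, 3), round(approx, 3))  # 3.667 3.333
```

The two numbers differ because the shortcut never removes the oldest close; over 14 forecast steps that gap can accumulate.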

Sample Output for TSLA

Here’s what it predicted starting April 9, 2025 (numbers are illustrative—run the code for current results):

Predicted TSLA Stock Prices for the Next 14 Business Days (Average of 100 Runs):
2025-04-09: $235.64
2025-04-10: $234.12
2025-04-11: $232.89
2025-04-14: $231.45
...

The plot (TSLA_prediction.png) shows historical prices in blue and the forecast in a red dashed line.

How Accurate Is It?

Let’s be real—predicting stock prices is tough, even with fancy indicators. Here’s my take on its accuracy:

  • Short-Term (1-5 Days): It seems decently reliable for spotting trends, especially if TSLA’s recent behavior aligns with MA20 or MACD signals. Linear regression assumes relationships stay consistent, so it might catch a continuation or mild reversal. I’d guess it’s within 5-10% of actual prices about half the time, but that’s a gut estimate; I haven’t formally backtested it yet.
  • Long-Term (14 Days): Accuracy drops as predictions stretch out. The iterative updates for indicators like MACD and OBV are rough approximations (no future volume data!), and market shocks (earnings, news) can throw it off. Expect bigger errors here—maybe 15-20% off by day 14.
  • Limitations: It’s a linear model, so it misses complex patterns (e.g., sudden crashes). The noise I add helps, but it’s not a crystal ball. Plus, it’s trained on 2024-2025 data—older trends might not apply.
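
Those percentages are guesses, and a couple of small helpers would turn them into measurements once predictions and actual closes sit side by side. This is a sketch with made-up numbers; pct_errors and share_within are hypothetical names, and the prices below are not real TSLA data.

```python
def pct_errors(predicted, actual):
    """Absolute percentage error for each forecast day."""
    return [abs(p - a) * 100 / a for p, a in zip(predicted, actual)]


def share_within(predicted, actual, tolerance_pct):
    """Fraction of days whose forecast lands within tolerance_pct of the actual."""
    errors = pct_errors(predicted, actual)
    return sum(e <= tolerance_pct for e in errors) / len(errors)


# Made-up example values, only to show the calls
predicted = [100.0, 110.0, 120.0, 130.0]
actual = [100.0, 100.0, 100.0, 100.0]

errors = pct_errors(predicted, actual)
mape = sum(errors) / len(errors)            # mean absolute percentage error
hit_rate = share_within(predicted, actual, 10)
print(errors)    # [0.0, 10.0, 20.0, 30.0]
print(mape)      # 15.0
print(hit_rate)  # 0.5
```

Looking at the per-day errors rather than only the mean also shows whether accuracy really degrades toward day 14.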

Using It Wisely

This isn’t financial advice—think of it as a fun experiment! Here’s how to use it:

  • Run It: Grab the code above, install the dependencies (yfinance, pandas, numpy, scikit-learn, matplotlib, holidays), and change the ticker in the script.
  • Interpret: If MA20 and MACD align with the prediction (e.g., both rising), it might have more weight. Cross-check with news or X posts about TSLA.
  • Test: Backtest it by shifting the “today” date back a month and comparing predictions to actuals. Adjust the noise (0.5 * volatility) if it’s too wild.
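
Shifting the “today” date is easier if the date logic takes an explicit starting point instead of calling datetime.today(). The sketch below is one way to do that, assuming a hypothetical helper next_business_days and a hand-written set of market closures. It also shows why the holiday list matters: Good Friday (April 18, 2025) is not a US federal holiday, but US stock exchanges were closed that day.

```python
from datetime import date, timedelta


def next_business_days(as_of, count, market_holidays=frozenset()):
    """Return the next `count` weekdays after `as_of` that are not in market_holidays."""
    days, current = [], as_of
    while len(days) < count:
        current += timedelta(days=1)
        if current.weekday() < 5 and current not in market_holidays:
            days.append(current)
    return days


as_of = date(2025, 4, 9)        # pretend "today" for a backtest
closed = {date(2025, 4, 18)}    # Good Friday 2025, entered by hand
plain = next_business_days(as_of, 14)
with_closure = next_business_days(as_of, 14, closed)
print(plain[-1], with_closure[-1])  # 2025-04-29 2025-04-30
```

For a backtest you would train only on rows up to as_of and then compare the forecast with the actual closes on the returned dates.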

Final Thoughts

This model blends trend (MA20), momentum (RSI, MACD), volatility (ATR), and volume (OBV) into a simple yet insightful tool. It won’t make you rich, but it’s a cool way to explore stock data. I might try a Random Forest next for non-linear patterns—what do you think? Drop a comment if you test it out!
