Stock price prediction is a fascinating way to learn machine learning, combining data science with real-world applications. In this post, we’ll build a simple Python program to predict stock prices using historical data and a Linear Regression model. You’ll learn how to fetch data, engineer features, train a model, and visualize predictions. The code is beginner-friendly, complete, and ready to run!
Libraries Used: A Detailed Overview
Before diving into the code, let’s explore the Python libraries we’ll use. Each serves a specific purpose in our stock prediction pipeline.
1. NumPy
- Purpose: Numerical computing.
- Description: NumPy (`numpy`) is the backbone of scientific computing in Python. It provides support for large, multi-dimensional arrays and matrices, along with a vast collection of mathematical functions. In this project, we use NumPy to handle arrays for feature scaling and data manipulation, ensuring fast computations. For example, we reshape arrays for compatibility with our machine learning model.
- Key Features:
  - Efficient array operations (e.g., `np.array`, `np.reshape`).
  - Mathematical functions for scaling and metrics.
  - Memory-efficient and fast compared to Python lists.
- Why We Use It: Stock data is numerical, and NumPy’s array operations make it easy to preprocess data for our model.
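As a small illustration of the reshaping mentioned above, here is a minimal sketch. The price values are made up for the example, and the variable names `closes` and `X` are our own, not part of any library.

```python
import numpy as np

# Hypothetical closing prices, for illustration only (not real market data).
closes = np.array([101.2, 102.5, 101.9, 103.4, 104.0])

# scikit-learn estimators expect a 2-D feature matrix of shape
# (n_samples, n_features), so a 1-D series becomes a single column.
X = closes.reshape(-1, 1)

print(closes.shape)  # (5,)
print(X.shape)       # (5, 1)
```

The `-1` tells NumPy to infer the number of rows from the array length, so the same line works for any amount of history.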
2. Pandas
- Purpose: Data manipulation and analysis.
- Description: Pandas (`pandas`) is a powerful library for handling structured data, like tables or time series. It provides the `DataFrame` object, which is like a spreadsheet in Python, perfect for stock price data with dates and prices. We use Pandas to load, clean, and create features (e.g., moving averages) from stock data.
- Key Features:
  - `DataFrame` for tabular data manipulation.
  - Time series support (e.g., `df['Close'].rolling()` for moving averages).
  - Easy handling of missing data with `dropna()`.
- Why We Use It: Stock data from Yahoo Finance is time-series data, and Pandas simplifies tasks like filtering, grouping, and feature engineering.
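To make the moving-average idea concrete, here is a minimal sketch on a tiny hand-made DataFrame. The prices, the 3-day window, and the column name `MA_3` are assumptions chosen for the example, not values from the full script.

```python
import pandas as pd

# Hypothetical closing prices indexed by business day (illustrative values only).
dates = pd.date_range("2024-01-01", periods=6, freq="B")
df = pd.DataFrame(
    {"Close": [100.0, 102.0, 101.0, 103.0, 105.0, 104.0]},
    index=dates,
)

# 3-day simple moving average. The first two rows lack a full window,
# so pandas fills them with NaN.
df["MA_3"] = df["Close"].rolling(window=3).mean()

# Drop the rows that contain NaN so the model only sees complete rows.
features = df.dropna()

print(features)
```

Of the six rows, two are dropped because they have no complete 3-day window, leaving four usable rows.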
3. Matplotlib
- Purpose: Data visualization.
- Description: Matplotlib (`matplotlib.pyplot`) is a plotting library for creating static, interactive, and animated visualizations. We use it to plot actual vs. predicted stock prices, helping us visually assess our model’s performance.
- Key Features:
  - Line plots, scatter plots, and more (e.g., `plt.plot`).
  - Customizable figures with titles, labels, and legends.
  - Support for time series visualization.
- Why We Use It: Visualizing stock price trends and predictions is crucial to understand how well our model performs.
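Here is a minimal sketch of an actual vs. predicted plot. The two series are invented for illustration; in a real run they would come from the held-out data and the model's output.

```python
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical values for illustration only.
days = np.arange(10)
actual = np.array([100, 101, 103, 102, 104, 106, 105, 107, 108, 110], dtype=float)
predicted = actual + np.array([0.5, -0.8, 0.3, 1.0, -0.6, 0.2, -0.4, 0.9, -0.7, 0.1])

# One figure, two lines: solid for actual prices, dashed for predictions.
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(days, actual, label="Actual")
ax.plot(days, predicted, linestyle="--", label="Predicted")
ax.set_title("Actual vs. Predicted Closing Price")
ax.set_xlabel("Day")
ax.set_ylabel("Price")
ax.legend()
fig.tight_layout()
```

Call `plt.show()` to open the figure in a window, or `fig.savefig("plot.png")` to write it to a file (the filename is just an example).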
4. yfinance
- Purpose: Fetching financial data.
- Description: The `yfinance` library allows us to download historical stock market data from Yahoo Finance. It’s simple to use and provides access to stock prices, volumes, and other metrics. We use it to fetch daily closing prices for our chosen stock.
- Key Features:
  - Easy API for stock data (e.g., `yf.download`).
  - Supports multiple tickers and date ranges.
  - Free and open-source.
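- Why We Use It: It gives us free access to historical stock data without a paid API. Keep in mind that it is an unofficial, community-maintained wrapper around Yahoo Finance, so quotes can be delayed and the interface occasionally changes.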
5. Scikit-Learn (sklearn)
- Purpose: Machine learning and data preprocessing.
- Description: Scikit-Learn (`sklearn`) is a robust library for machine learning, offering tools for data preprocessing, model training, and evaluation. We use its `LinearRegression` for prediction, `MinMaxScaler` for scaling data, and `mean_squared_error` for evaluating model performance.
- Key Features:
  - Simple Linear Regression model (`LinearRegression`).
  - Scaling tools like `MinMaxScaler` to normalize data.
  - Metrics like `mean_squared_error`, which returns the MSE; taking its square root gives the RMSE we use for evaluation.
- Why We Use It: It provides an easy-to-use Linear Regression model and preprocessing tools, ideal for beginners.
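The three scikit-learn pieces fit together roughly as in the sketch below. It uses synthetic, perfectly linear data rather than stock prices, and the chronological 8/2 split is an assumption made for the example, so treat it as an illustration of the API, not as the method of the full script.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler

# Synthetic data for illustration: y = 2x + 1 exactly (not stock prices).
X = np.arange(10, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel() + 1.0

# Chronological split (an assumption for this sketch): the first 8 rows
# train the model and the last 2 rows test it, with no shuffling.
X_train, X_test = X[:8], X[8:]
y_train, y_test = y[:8], y[8:]

# Fit the scaler on the training rows only, then reuse it on the test rows
# so no information from the test set leaks into preprocessing.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

model = LinearRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)

# mean_squared_error returns the MSE; its square root is the RMSE.
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
print(f"RMSE: {rmse:.4f}")
```

Because the synthetic data lies exactly on a line, the RMSE here is essentially zero. Real price data will never be this clean, so expect a noticeably larger error.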
Prerequisites
- Install Python 3.9 or later.
- Install required libraries:
pip install yfinance pandas numpy scikit-learn matplotlib
- Basic Python knowledge (e.g., variables, functions).
- A code editor like PyCharm, VS Code, or Jupyter Notebook.
The Complete Code
Below is the full Python script to predict stock prices. It prompts you to enter a stock ticker (e.g., AAPL for Apple), fetches historical data, trains a Linear Regression model, and predicts the next day’s price.