
Alternative Data Regressor: V1
A Python Program to attain a linear regression of some alternative data against financial asset prices . A CSV file is the input. The output is the regression results.
The provided Python program is designed to process time series data from a CSV file and execute a series of analytical steps based on a predefined decision tree. Key functionalities include:
Reading a CSV File: The user inputs the path to a CSV file, which the program reads into a DataFrame.
Stationarity Testing: It tests the time series data for stationarity using the Augmented Dickey-Fuller test.
Adjusting for Non-Stationarity: If the data is non-stationary, it applies a log transformation to stabilize the time series.
Re-testing for Stationarity: After transformation, it retests the data for stationarity.
Significance Testing: Conducts an Ordinary Least Squares (OLS) regression to test the significance of the relationship between the time series and a dependent variable.
Model Development and Evaluation: If a significant relationship is found, the program proceeds to develop a baseline regression model, which is then refined and evaluated based on its R-squared value.
Output: The program outputs the results of the stationarity tests, significance tests, and the R-squared value of the regression model.
import pandas as pd
import numpy as np
from statsmodels.tsa.stattools import adfuller
from statsmodels.regression.linear_model import OLS
import statsmodels.api as sm
from scipy import stats
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score
def test_stationarity(timeseries):
# Perform Dickey-Fuller test:
dftest = adfuller(timeseries, autolag='AIC')
return dftest[1] # p-value
def adjust_non_stationarity(data):
# Adjusting for non-stationarity (example: log transformation)
return np.log(data)
def significance_testing(X, y):
# Perform significance testing (example: OLS regression)
X = sm.add_constant(X) # adding a constant
model = OLS(y, X).fit()
return model.pvalues
def main():
# Load data
file_path = input("Enter the path to your CSV file: ")
df = pd.read_csv(file_path)
# Assuming the time series column is named 'timeseries'
timeseries = df['timeseries']
# Step 1: Test for Stationarity
if test_stationarity(timeseries) > 0.05:
# Step 2: Adjust Data for Non-Stationarity
timeseries = adjust_non_stationarity(timeseries)
# Step 3: Re-test for Stationarity
if test_stationarity(timeseries) > 0.05:
print("Data is still non-stationary after transformation. Ending process.")
return
else:
print("Data is stationary after transformation. Proceeding with analysis.")
else:
print("Data is stationary. Proceeding with analysis.")
# Step 4: Significance Testing
# Assuming another column 'dependent_var' as the dependent variable
pvalues = significance_testing(df[['timeseries']], df['dependent_var'])
if any(pval < 0.05 for pval in pvalues[1:]): # Ignoring the constant's p-value
print("Significant correlation found. Proceeding to model development.")
else:
print("No significant correlation found. Ending process.")
return
# Steps 5, 6, 7: Develop, Refine, and Evaluate Regression Model
# This is a simplified example using OLS regression
X_train, X_test, y_train, y_test = train_test_split(df[['timeseries']], df['dependent_var'], test_size=0.2, random_state=0)
model = OLS(y_train, sm.add_constant(X_train)).fit()
predictions = model.predict(sm.add_constant(X_test))
print("Model R-squared:", r2_score(y_test, predictions))
# Step 8: Interpret the Regression Line
# This step is more analytical and depends on the specific model and data
# Step 9: Comparative Analysis
if __name__ == "__main__":
main()
