Long time lurker first time poster.

I have data that roughly follows a y=sin(time) distribution, but also depends on other variables than time. In terms of correlations, since the target y-variable oscillates there is almost zero statistical correlation with time, but y obviously depends very strongly on time.

The goal is to predict the future values of the target variable. I want to avoid using an explicit assumption of the model, and instead rely on data driven models and machine learning, so I have tried using regression methods from sklearn.

I have tried the following methods (the parameters were blindly copied from examples and other threads):

```
LogisticRegression()
QDA()
GridSearchCV(SVR(kernel='rbf', gamma=0.1), cv=5,
param_grid={"C": [1e0, 1e1, 1e2, 1e3],
"gamma": np.logspace(-2, 2, 5)})
GridSearchCV(KernelRidge(kernel='rbf', gamma=0.1), cv=5,
param_grid={"alpha": [1e0, 0.1, 1e-2, 1e-3],
"gamma": np.logspace(-2, 2, 5)})
GradientBoostingRegressor(loss='quantile', alpha=0.95,
n_estimators=250, max_depth=3,
learning_rate=.1, min_samples_leaf=9,
min_samples_split=9)
DecisionTreeRegressor(max_depth=4)
AdaBoostRegressor(DecisionTreeRegressor(max_depth=4),
n_estimators=300, random_state=rng)
RandomForestRegressor(n_estimators=10, min_samples_split=2, n_jobs=-1)
```

The results fall into two different categories of failure:

- The time field is having no effect, probably due to the absence of correlation from the oscillatory behaviour of the target variable. However, secondary effects from other variables allow a modest predictive capability for future time ranges (these other variables have a simple correlation with the target variable)
- The when applying predict() to the training time range the prediction is near perfect with respect to the observations, but when given the future time range (for which data was not used in training) the predicted value stays constant.

Below is how I performed the training and testing:

```
weather_df.index = pd.to_datetime(weather_df.index,unit='D')
weather_df['Days'] = (weather_df.index-datetime.datetime(2005,1,1)).days
ts = pd.DataFrame({'Temperature':weather_df['Mean TemperatureC'].ix[:'2015-1-1'],
'Humidity':weather_df[' Mean Humidity'].ix[:'2015-1-1'],
'Visibility':weather_df[' Mean VisibilityKm'].ix[:'2015-1-1'],
'Wind':weather_df[' Mean Wind SpeedKm/h'].ix[:'2015-1-1'],
'Time':weather_df['Days'].ix[:'2015-1-1']
})
start_test = datetime.datetime(2012,1,1)
ts_train = ts[ts.index < start_test]
ts_test = ts
data_train = np.array(ts_train.Humidity, ts_test.Time)[np.newaxis]
data_target = np.array(ts_train.Temperature)[np.newaxis].ravel()
model.fit(data_train.T, data_target.T)
data_test = np.array(ts_test.Humidity, ts_test.Time)[np.newaxis]
pred = model.predict(data_test.T)
ts_test['Pred'] = pred
```

Is there a regression model I could/should use for this problem, and if so what would be appropriate options and parameters?

(Also, my treatment of the time objects in sklearn is far from elegant, so I am gladly taking advice there.)