[1]:
# Add ../src to the import path so the local ontime package can be imported
import sys
sys.path.insert(0, '../src')
[2]:
import pandas as pd
import numpy as np
import ontime as on
Time Series - Data Loading Helpers
TimeSeries can load data from different sources. Below are some examples of how to proceed.
From Darts to onTime
You can convert a Darts TimeSeries object to onTime’s TimeSeries type. For example, let’s load one of Darts’ datasets.
[3]:
from darts import datasets as dd
dartsAirPassengers = dd.AirPassengersDataset().load()
print(type(dartsAirPassengers))
dartsAirPassengers.head(3)
<class 'darts.timeseries.TimeSeries'>
[3]:
<TimeSeries (DataArray) (Month: 3, component: 1, sample: 1)>
array([[[112.]],

       [[118.]],

       [[132.]]])
Coordinates:
  * Month      (Month) datetime64[ns] 1949-01-01 1949-02-01 1949-03-01
  * component  (component) object '#Passengers'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
Then convert it with the .from_darts() function.
[4]:
ontimeAirPassengers = on.TimeSeries.from_darts(dartsAirPassengers)
print(type(ontimeAirPassengers))
ontimeAirPassengers.head(3)
<class 'ontime.core.time_series.time_series.TimeSeries'>
[4]:
<TimeSeries (DataArray) (Month: 3, component: 1, sample: 1)>
array([[[112.]],

       [[118.]],

       [[132.]]])
Coordinates:
  * Month      (Month) datetime64[ns] 1949-01-01 1949-02-01 1949-03-01
  * component  (component) object '#Passengers'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
From Pandas to onTime
You can also convert a Pandas DataFrame, for instance one fetched from OpenML, to onTime’s TimeSeries format, provided the DataFrame’s index is time-based. Let’s load such a dataset as an example.
[5]:
from sklearn.datasets import fetch_openml
pandasAmd = fetch_openml("AMD-Stock-Prices-Historical-Data", version=1, as_frame=True, parser="pandas").frame
print(type(pandasAmd))
# use the 'Date' column as a datetime index, since from_pandas expects a time-based index
pandasAmd.index = pandasAmd['Date'].astype('datetime64[ns]')
del pandasAmd['Date']
pandasAmd.head()
<class 'pandas.core.frame.DataFrame'>
[5]:
Date | Open | High | Low | Close | Adj_Close | Volume |
---|---|---|---|---|---|---|
1980-03-17 | 0.0 | 3.302083 | 3.125000 | 3.145833 | 3.145833 | 219600 |
1980-03-18 | 0.0 | 3.125000 | 2.937500 | 3.031250 | 3.031250 | 727200 |
1980-03-19 | 0.0 | 3.083333 | 3.020833 | 3.041667 | 3.041667 | 295200 |
1980-03-20 | 0.0 | 3.062500 | 3.010417 | 3.010417 | 3.010417 | 159600 |
1980-03-21 | 0.0 | 3.020833 | 2.906250 | 2.916667 | 2.916667 | 130800 |
Then convert it with the .from_pandas() function.
[6]:
ontimeAmd = on.TimeSeries.from_pandas(pandasAmd, freq='D')
ontimeAmd.head(3)
[6]:
<TimeSeries (DataArray) (Date: 3, component: 6, sample: 1)>
array([[[0.00000000e+00],
        [3.30208302e+00],
        [3.12500000e+00],
        [3.14583302e+00],
        [3.14583302e+00],
        [2.19600000e+05]],

       [[0.00000000e+00],
        [3.12500000e+00],
        [2.93750000e+00],
        [3.03125000e+00],
        [3.03125000e+00],
        [7.27200000e+05]],

       [[0.00000000e+00],
        [3.08333302e+00],
        [3.02083302e+00],
        [3.04166698e+00],
        [3.04166698e+00],
        [2.95200000e+05]]])
Coordinates:
  * Date       (Date) datetime64[ns] 1980-03-17 1980-03-18 1980-03-19
  * component  (component) object 'Open' 'High' 'Low' ... 'Adj_Close' 'Volume'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
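Since the conversion relies on a time-based index, it can help to check the DataFrame before converting it. The following is a minimal sketch using only pandas; the helper name check_time_index and the small sample frame are hypothetical and not part of onTime.

```python
import pandas as pd


def check_time_index(df: pd.DataFrame) -> list:
    """Return a list of problems found with the DataFrame's index (empty if none)."""
    problems = []
    if not isinstance(df.index, pd.DatetimeIndex):
        # Nothing else can be checked without a datetime index
        problems.append("index is not a DatetimeIndex")
        return problems
    if not df.index.is_monotonic_increasing:
        problems.append("index is not sorted in increasing order")
    if df.index.has_duplicates:
        problems.append("index contains duplicate timestamps")
    return problems


# Hypothetical frame with a string 'Date' column, similar in shape to the AMD data
raw = pd.DataFrame(
    {
        "Date": ["1980-03-17", "1980-03-18", "1980-03-19"],
        "Close": [3.145833, 3.031250, 3.041667],
    }
)
print(check_time_index(raw))  # the default RangeIndex is reported as a problem

# Parse the dates and move them into the index
indexed = raw.assign(Date=pd.to_datetime(raw["Date"])).set_index("Date")
print(check_time_index(indexed))  # no problems expected
```

A frame that passes these checks has a sorted, duplicate-free datetime index, which is the usual starting point for a time series conversion.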
From .csv files to onTime
You can convert data from a .csv file to onTime’s TimeSeries format, provided the index is time-based and correctly formatted. As an example, let’s quickly generate a time series and save it as a .csv file.
[7]:
# let's generate some sample data
ts = on.generators.random_walk().generate(start=pd.Timestamp('2022-01-01'), end=pd.Timestamp('2022-12-31'))
ts.pd_dataframe().to_csv('sample_series.csv')
The data now exists on the file system, and we can load it as shown below.
[8]:
# load that sample data
ts = on.TimeSeries.from_csv('sample_series.csv', index_col='time')
ts.head(3)
[8]:
<TimeSeries (DataArray) (time: 3, component: 1, sample: 1)>
array([[[0.44888864]],

       [[1.71377365]],

       [[2.81188623]]])
Coordinates:
  * time       (time) datetime64[ns] 2022-01-01 2022-01-02 2022-01-03
  * component  (component) object 'random_walk'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
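To see what "time-based and correctly formatted" can look like in practice, the sketch below inspects a CSV with plain pandas before it is handed to onTime. The CSV content is hypothetical (rounded, made-up values) and is read from an in-memory string; the 'time' header mirrors the index_col used above.

```python
import io

import pandas as pd

# Hypothetical CSV content: a 'time' column holding ISO-formatted dates,
# followed by one value column
csv_text = (
    "time,random_walk\n"
    "2022-01-01,0.45\n"
    "2022-01-02,1.71\n"
    "2022-01-03,2.81\n"
)

# parse_dates=True asks pandas to parse the index column as datetimes
df = pd.read_csv(io.StringIO(csv_text), index_col="time", parse_dates=True)

# A correctly formatted file yields a DatetimeIndex with a regular frequency
print(type(df.index).__name__)
print(pd.infer_freq(df.index))
```

If the index comes back as plain strings, or no frequency can be inferred, the date column probably needs cleaning before the file is loaded as a time series.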
From regular data (e.g. NumPy) to onTime
If your data lives in NumPy arrays or plain Python structures, you can assemble a TimeSeries from a table of values, a time-based and correctly formatted index, and a list of column names. Let’s go through an example with generated data.
[22]:
# generate some data
from datetime import datetime
data = np.random.rand(200,3)
index = pd.date_range(datetime.today(), periods=200).tolist()
columns = ['col1', 'col2', 'col3']
Now, we can import this time series in onTime.
[23]:
# assemble into a timeseries
ts = on.TimeSeries.from_data(data, index = index, columns = columns)
ts.head(3)
[23]:
<TimeSeries (DataArray) (time: 3, component: 3, sample: 1)>
array([[[0.00966967],
        [0.27866962],
        [0.56073953]],

       [[0.0105793 ],
        [0.80519664],
        [0.74771133]],

       [[0.40498576],
        [0.76542589],
        [0.82304112]]])
Coordinates:
  * time       (time) datetime64[ns] 2024-02-24T22:00:53.645121 ... 2024-02-2...
  * component  (component) object 'col1' 'col2' 'col3'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
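The three pieces used here (values, index, column names) have to line up: one timestamp per row and one name per column. That requirement is inferred from the example rather than taken from onTime’s documentation, so treat the sketch below as an illustration only. It uses pandas and NumPy, with a fixed start date and seed chosen so that the result is reproducible.

```python
import numpy as np
import pandas as pd

# Reproducible sample data: 200 time steps, 3 variables
rng = np.random.default_rng(0)
data = rng.random((200, 3))

# A fixed, midnight-aligned start date keeps the index regular and repeatable
index = pd.date_range("2024-01-01", periods=200, freq="D")
columns = ["col1", "col2", "col3"]

# One timestamp per row and one name per column
assert data.shape == (len(index), len(columns))

# Assembling a DataFrame is a quick way to confirm that the pieces fit together
df = pd.DataFrame(data, index=index, columns=columns)
print(df.shape)
print(df.index.freqstr)
```

Using a fixed start date instead of datetime.today() also avoids timestamps with a time-of-day component, such as the ones visible in the output above.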