[1]:
# Add ../src to the module search path so the local package can be imported
import sys
sys.path.insert(0, '../src')
[2]:
import pandas as pd
import numpy as np
import ontime as on

Time Series - Data Loading Helpers

onTime's TimeSeries can be built from several data sources. The examples below cover Darts, Pandas, .csv files, and plain Numpy or Python data.

From Darts to onTime

You can convert a Darts TimeSeries object to onTime’s TimeSeries type. For example, let’s load one of Darts’ datasets.

[3]:
from darts import datasets as dd

dartsAirPassengers = dd.AirPassengersDataset().load()
print(type(dartsAirPassengers))
dartsAirPassengers.head(3)
<class 'darts.timeseries.TimeSeries'>
[3]:
<TimeSeries (DataArray) (Month: 3, component: 1, sample: 1)> Size: 24B
array([[[112.]],

       [[118.]],

       [[132.]]])
Coordinates:
  * Month      (Month) datetime64[ns] 24B 1949-01-01 1949-02-01 1949-03-01
  * component  (component) object 8B '#Passengers'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None

Then convert it with the .from_darts() method.

[4]:
ontimeAirPassengers = on.TimeSeries.from_darts(dartsAirPassengers)
print(type(ontimeAirPassengers))
ontimeAirPassengers.head(3)
<class 'ontime.core.time_series.time_series.TimeSeries'>
[4]:
<TimeSeries (DataArray) (Month: 3, component: 1, sample: 1)> Size: 24B
array([[[112.]],

       [[118.]],

       [[132.]]])
Coordinates:
  * Month      (Month) datetime64[ns] 24B 1949-01-01 1949-02-01 1949-03-01
  * component  (component) object 8B '#Passengers'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None

From Pandas to onTime

You can also convert a Pandas DataFrame, for instance one fetched from OpenML, to onTime’s TimeSeries format, provided the DataFrame’s index is time-based. Let’s load such a dataset as an example.

[5]:
from sklearn.datasets import fetch_openml

pandasAmd = fetch_openml("AMD-Stock-Prices-Historical-Data", version=1, as_frame=True, parser="pandas").frame
print(type(pandasAmd))
# use the 'Date' column as a datetime index, since the conversion needs a time-based index
pandasAmd.index = pandasAmd['Date'].astype('datetime64[ns]')
del pandasAmd['Date']
pandasAmd.head()
<class 'pandas.core.frame.DataFrame'>
[5]:
Open High Low Close Adj_Close Volume
Date
1980-03-17 0.0 3.302083 3.125000 3.145833 3.145833 219600
1980-03-18 0.0 3.125000 2.937500 3.031250 3.031250 727200
1980-03-19 0.0 3.083333 3.020833 3.041667 3.041667 295200
1980-03-20 0.0 3.062500 3.010417 3.010417 3.010417 159600
1980-03-21 0.0 3.020833 2.906250 2.916667 2.916667 130800

Then convert it with the .from_pandas() method.

[6]:
ontimeAmd = on.TimeSeries.from_pandas(pandasAmd, freq='D')
ontimeAmd.head(3)
[6]:
<TimeSeries (DataArray) (Date: 3, component: 6, sample: 1)> Size: 144B
array([[[0.00000000e+00],
        [3.30208302e+00],
        [3.12500000e+00],
        [3.14583302e+00],
        [3.14583302e+00],
        [2.19600000e+05]],

       [[0.00000000e+00],
        [3.12500000e+00],
        [2.93750000e+00],
        [3.03125000e+00],
        [3.03125000e+00],
        [7.27200000e+05]],

       [[0.00000000e+00],
        [3.08333302e+00],
        [3.02083302e+00],
        [3.04166698e+00],
        [3.04166698e+00],
        [2.95200000e+05]]])
Coordinates:
  * Date       (Date) datetime64[ns] 24B 1980-03-17 1980-03-18 1980-03-19
  * component  (component) object 48B 'Open' 'High' ... 'Adj_Close' 'Volume'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
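
Because the conversion relies on a time-based index, it can help to inspect the DataFrame before calling .from_pandas(). The sketch below uses only Pandas on a small made-up frame (the name `frame` and its values are illustrative, not taken from the AMD dataset). It checks that the index is a DatetimeIndex and shows what business-day data, such as stock prices, looks like once placed on a daily grid. How onTime itself handles the missing days is not covered here.

```python
import pandas as pd

# Hypothetical prices on business days only (Friday, Monday, Tuesday); values are made up.
frame = pd.DataFrame(
    {"Close": [3.10, 3.05, 3.00]},
    index=pd.to_datetime(["1980-03-21", "1980-03-24", "1980-03-25"]),
)
frame.index.name = "Date"

# 1. The index must be time-based.
is_time_indexed = isinstance(frame.index, pd.DatetimeIndex)

# 2. Placing the data on a daily grid exposes the weekend as rows of NaN.
daily = frame.asfreq("D")
missing_days = list(daily.index[daily["Close"].isna()].strftime("%Y-%m-%d"))

print(is_time_indexed)  # True
print(missing_days)     # ['1980-03-22', '1980-03-23']
```

If many days turn out to be missing, a business-day frequency or an explicit fill strategy may be a better fit than a plain daily frequency.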

From .csv files to onTime

You can convert data from a .csv file to onTime’s TimeSeries format, provided the index is time-based and correctly formatted. As an example, let’s quickly generate a time series and save it as a .csv file.

[7]:
# let's generate some sample data
ts = on.generators.random_walk().generate(start=pd.Timestamp('2022-01-01'), end=pd.Timestamp('2022-12-31'))
ts.pd_dataframe().to_csv('sample_series.csv')

The data now exists on the file system, and we can load it as shown below.

[8]:
# load that sample data
ts = on.TimeSeries.from_csv('sample_series.csv', index_col='time')
ts.head(3)
[8]:
<TimeSeries (DataArray) (time: 3, component: 1, sample: 1)> Size: 24B
array([[[ 0.87897332]],

       [[ 1.656084  ]],

       [[-0.3796217 ]]])
Coordinates:
  * time       (time) datetime64[ns] 24B 2022-01-01 2022-01-02 2022-01-03
  * component  (component) object 8B 'random_walk'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
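
To illustrate what a time-based, correctly formatted index means for a .csv file, here is a minimal Pandas-only sketch. The CSV text is made up for illustration and is read from memory rather than from `sample_series.csv`. It assumes the time column holds ISO-formatted dates, which Pandas can parse into a DatetimeIndex.

```python
import io

import pandas as pd

# Hypothetical CSV content: a 'time' column with ISO dates and one value column.
csv_text = (
    "time,random_walk\n"
    "2022-01-01,0.5\n"
    "2022-01-02,1.25\n"
    "2022-01-03,-0.75\n"
)

# parse_dates=True asks pandas to parse the index column as datetimes.
df = pd.read_csv(io.StringIO(csv_text), index_col="time", parse_dates=True)

print(type(df.index).__name__)  # DatetimeIndex
print(pd.infer_freq(df.index))  # D
```

If the index comes back as a plain object index instead, the date strings are probably in a format Pandas could not parse.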

From regular data (e.g. Numpy) to onTime

If your data lives in Numpy arrays or plain Python structures, you can assemble a TimeSeries from a data table, a time-based index, and a list of column names. Let’s go through an example with generated data.

[9]:
# generate some data
from datetime import datetime

data = np.random.rand(200,3)
index = pd.date_range(datetime.today(), periods=200).tolist()
columns = ['col1', 'col2', 'col3']

Now we can import this data into onTime as a time series.

[10]:
# assemble into a TimeSeries
ts = on.TimeSeries.from_data(data, index=index, columns=columns)
ts.head(3)
[10]:
<TimeSeries (DataArray) (time: 3, component: 3, sample: 1)> Size: 72B
array([[[0.41514682],
        [0.92232062],
        [0.64871718]],

       [[0.06535843],
        [0.6104363 ],
        [0.41561002]],

       [[0.04148376],
        [0.59704985],
        [0.23086672]]])
Coordinates:
  * time       (time) datetime64[ns] 24B 2025-01-06T16:19:02.138076 ... 2025-...
  * component  (component) object 24B 'col1' 'col2' 'col3'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
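
Before assembling a series from raw arrays, a quick consistency check can catch shape mismatches early. The sketch below uses only Numpy and Pandas. The seeded generator and the fixed start date are assumptions made for reproducibility and are not part of the example above. Note that `datetime.today()` carries a time-of-day component, as visible in the output above, whereas a fixed date gives midnight timestamps.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=0)  # seeded so the values are reproducible
data = rng.random((200, 3))

# A fixed start date yields midnight timestamps at daily frequency.
index = pd.date_range("2025-01-01", periods=200, freq="D")
columns = ["col1", "col2", "col3"]

# One timestamp per row and one name per column.
assert data.shape == (len(index), len(columns))

frame = pd.DataFrame(data, index=index, columns=columns)
print(frame.shape)      # (200, 3)
print(frame.index[-1])  # 2025-07-19 00:00:00
```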