[1]:

# Add ../src to the module search path so the local Python package can be imported
import sys
sys.path.insert(0, '../src')

[2]:

import pandas as pd
import numpy as np
import ontime as on

Time Series - Data Loading Helpers

onTime's TimeSeries can be built from several data sources. The examples below show how to proceed for each one.

From Darts to onTime

You can convert a Darts TimeSeries object to onTime’s TimeSeries type. For example, let’s load one of Darts’ datasets.

[3]:

from darts import datasets as dd

dartsAirPassengers = dd.AirPassengersDataset().load()
print(type(dartsAirPassengers))
dartsAirPassengers.head(3)

<class 'darts.timeseries.TimeSeries'>

[3]:

<TimeSeries (DataArray) (Month: 3, component: 1, sample: 1)> Size: 24B
array([[[112.]],

       [[118.]],

       [[132.]]])
Coordinates:
  * Month      (Month) datetime64[ns] 24B 1949-01-01 1949-02-01 1949-03-01
  * component  (component) object 8B '#Passengers'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None

Then convert it with the .from_darts() function.

[4]:

ontimeAirPassengers = on.TimeSeries.from_darts(dartsAirPassengers)
print(type(ontimeAirPassengers))
ontimeAirPassengers.head(3)

<class 'ontime.core.time_series.time_series.TimeSeries'>

[4]:

<TimeSeries (DataArray) (Month: 3, component: 1, sample: 1)> Size: 24B
array([[[112.]],

       [[118.]],

       [[132.]]])
Coordinates:
  * Month      (Month) datetime64[ns] 24B 1949-01-01 1949-02-01 1949-03-01
  * component  (component) object 8B '#Passengers'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None

From Pandas to onTime

You can also convert a pandas DataFrame, for instance one fetched from OpenML, to onTime’s TimeSeries format, provided the DataFrame’s index is time-based. Let’s load such a dataset as an example.

[5]:

from sklearn.datasets import fetch_openml

pandasAmd = fetch_openml("AMD-Stock-Prices-Historical-Data", version=1, as_frame=True, parser="pandas").frame
print(type(pandasAmd))
# use the 'Date' column as a datetime index (the conversion needs a time-based index)
pandasAmd.index = pandasAmd['Date'].astype('datetime64[ns]')
del pandasAmd['Date']
pandasAmd.head()

<class 'pandas.core.frame.DataFrame'>

[5]:

	Open	High	Low	Close	Adj_Close	Volume
Date
1980-03-17	0.0	3.302083	3.125000	3.145833	3.145833	219600
1980-03-18	0.0	3.125000	2.937500	3.031250	3.031250	727200
1980-03-19	0.0	3.083333	3.020833	3.041667	3.041667	295200
1980-03-20	0.0	3.062500	3.010417	3.010417	3.010417	159600
1980-03-21	0.0	3.020833	2.906250	2.916667	2.916667	130800
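
Since the conversion relies on a time-based index, it can be worth inspecting the index first. The sketch below uses only pandas and a small made-up frame (not the AMD data); `describe_time_index` is a hypothetical helper and not part of onTime. It reports whether the index is a DatetimeIndex, whether it is sorted and free of duplicates, and which dates are absent at a given frequency. The last point matters for stock prices, which usually have no rows for weekends.

```python
import pandas as pd

# Hypothetical toy frame: five weekdays followed by the next Monday
frame = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]},
    index=pd.to_datetime(
        ["1980-03-17", "1980-03-18", "1980-03-19",
         "1980-03-20", "1980-03-21", "1980-03-24"]
    ),
)


def describe_time_index(df: pd.DataFrame, freq: str = "D") -> dict:
    """Summarise whether df's index can serve as a regular time axis."""
    idx = df.index
    if not isinstance(idx, pd.DatetimeIndex):
        return {"is_datetime": False}
    # Every timestamp a gap-free index would contain at this frequency
    expected = pd.date_range(idx.min(), idx.max(), freq=freq)
    return {
        "is_datetime": True,
        "is_sorted": bool(idx.is_monotonic_increasing),
        "has_duplicates": bool(idx.has_duplicates),
        "missing": expected.difference(idx),
    }


report = describe_time_index(frame)
print(report["is_datetime"], report["is_sorted"], report["has_duplicates"])
print(list(report["missing"].strftime("%Y-%m-%d")))
```

How onTime treats such gaps when a daily frequency is requested is not covered here; the check only makes them visible before converting.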

Then convert it with the .from_pandas() function.

[6]:

ontimeAmd = on.TimeSeries.from_pandas(pandasAmd, freq='D')
ontimeAmd.head(3)

[6]:

<TimeSeries (DataArray) (Date: 3, component: 6, sample: 1)> Size: 144B
array([[[0.00000000e+00],
        [3.30208302e+00],
        [3.12500000e+00],
        [3.14583302e+00],
        [3.14583302e+00],
        [2.19600000e+05]],

       [[0.00000000e+00],
        [3.12500000e+00],
        [2.93750000e+00],
        [3.03125000e+00],
        [3.03125000e+00],
        [7.27200000e+05]],

       [[0.00000000e+00],
        [3.08333302e+00],
        [3.02083302e+00],
        [3.04166698e+00],
        [3.04166698e+00],
        [2.95200000e+05]]])
Coordinates:
  * Date       (Date) datetime64[ns] 24B 1980-03-17 1980-03-18 1980-03-19
  * component  (component) object 48B 'Open' 'High' ... 'Adj_Close' 'Volume'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
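
The repr above shows a three-dimensional array with dimensions (Date, component, sample): each DataFrame row becomes a time step, each column a component, and there is a single sample. As a rough illustration of that layout, here is a NumPy-only sketch with a made-up table; it does not call onTime, and the variable names are chosen for this example.

```python
import numpy as np

# Hypothetical table: 3 time steps (rows) x 2 components (columns)
table = np.array([[1.0, 10.0],
                  [2.0, 20.0],
                  [3.0, 30.0]])

# Add a trailing axis of length 1 -> shape (time, component, sample)
cube = table[:, :, np.newaxis]
print(cube.shape)

# All components at the first time step
first_step = cube[0, :, 0]
# The second component across all time steps
second_component = cube[:, 1, 0]
print(first_step, second_component)
```

Read this way, the 3 x 6 x 1 array printed above is the first three rows of the DataFrame with an extra sample axis appended.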

From .csv files to onTime

You can convert data from a .csv file to onTime’s TimeSeries format, provided the index is time-based and correctly formatted. As an example, let’s quickly generate a time series and save it to a .csv file.

[7]:

# let's generate some sample data
ts = on.generators.random_walk().generate(start=pd.Timestamp('2022-01-01'), end=pd.Timestamp('2022-12-31'))
ts.pd_dataframe().to_csv('sample_series.csv')

The data now exists on the file system, and we can load it as follows.

[8]:

# load that sample data
ts = on.TimeSeries.from_csv('sample_series.csv', index_col='time')
ts.head(3)

[8]:

<TimeSeries (DataArray) (time: 3, component: 1, sample: 1)> Size: 24B
array([[[ 0.87897332]],

       [[ 1.656084  ]],

       [[-0.3796217 ]]])
Coordinates:
  * time       (time) datetime64[ns] 24B 2022-01-01 2022-01-02 2022-01-03
  * component  (component) object 8B 'random_walk'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
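
To make "time-based and correctly formatted" more concrete, the sketch below does a similar round trip with pandas alone: it writes a small daily series whose index is named `time`, then reads it back with that column parsed as dates. The frame and its `value` column are made up for the example, and whether `on.TimeSeries.from_csv` parses the file the same way internally is an assumption, not something confirmed here.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical daily series with an index named "time"
index = pd.date_range("2022-01-01", periods=5, freq="D", name="time")
frame = pd.DataFrame({"value": np.arange(5, dtype=float)}, index=index)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample_series.csv")
    frame.to_csv(path)  # the index is written as the first column, "time"

    # Use the "time" column as the index and parse it as dates
    loaded = pd.read_csv(path, index_col="time", parse_dates=True)

print(type(loaded.index).__name__)
print(pd.infer_freq(loaded.index))
```

If the column is not parsed as dates, the index stays a plain string index, and a time-based conversion would have nothing regular to work with.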

From regular data (e.g. NumPy) to onTime

If your data lives in NumPy arrays or plain Python structures, you can assemble a TimeSeries from a data table, an index and column names, provided the index is time-based and correctly formatted. Let’s go through an example with generated data.

[9]:

# generate some data
from datetime import datetime

data = np.random.rand(200,3)
index = pd.date_range(datetime.today(), periods=200).tolist()
columns = ['col1', 'col2', 'col3']

Now we can import this time series into onTime.

[10]:

# assemble into a TimeSeries
ts = on.TimeSeries.from_data(data, index=index, columns=columns)
ts.head(3)

[10]:

<TimeSeries (DataArray) (time: 3, component: 3, sample: 1)> Size: 72B
array([[[0.41514682],
        [0.92232062],
        [0.64871718]],

       [[0.06535843],
        [0.6104363 ],
        [0.41561002]],

       [[0.04148376],
        [0.59704985],
        [0.23086672]]])
Coordinates:
  * time       (time) datetime64[ns] 24B 2025-01-06T16:19:02.138076 ... 2025-...
  * component  (component) object 24B 'col1' 'col2' 'col3'
Dimensions without coordinates: sample
Attributes:
    static_covariates:  None
    hierarchy:          None
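
The three inputs have to agree: one row of data per timestamp and one column per name. The pandas-only sketch below checks this before building a table. It also uses a fixed start date and a seeded generator instead of `datetime.today()` and `np.random.rand`, so the result is reproducible and the timestamps fall on midnight rather than the time the cell was run. The start date and seed are arbitrary choices made for this example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded so the values are reproducible
data = rng.random((200, 3))     # 200 time steps x 3 columns
index = pd.date_range("2025-01-01", periods=200, freq="D")  # fixed start date
columns = ["col1", "col2", "col3"]

# One row per timestamp and one column per name
assert data.shape == (len(index), len(columns))

frame = pd.DataFrame(data, index=index, columns=columns)
print(frame.shape)
print(frame.index[0], frame.index.freqstr)
```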
