{ "cells": [ { "cell_type": "code", "execution_count": 1, "id": "52af59bb-083c-46c6-989a-bd4c65137a1a", "metadata": { "papermill": { "duration": null, "end_time": null, "exception": null, "start_time": null, "status": "completed" }, "tags": [] }, "outputs": [], "source": [ "# Import to be able to import python package from src\n", "import sys\n", "sys.path.insert(0, '../src')" ] }, { "cell_type": "code", "execution_count": 2, "id": "d6fc731f-3f50-4e9a-a24c-b2ab01d4fa31", "metadata": { "papermill": { "duration": null, "end_time": null, "exception": null, "start_time": null, "status": "completed" }, "tags": [] }, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "The `LightGBM` module could not be imported. To enable LightGBM support in Darts, follow the detailed instructions in the installation guide: https://github.com/unit8co/darts/blob/master/INSTALL.md\n", "The `Prophet` module could not be imported. To enable Prophet support in Darts, follow the detailed instructions in the installation guide: https://github.com/unit8co/darts/blob/master/INSTALL.md\n" ] } ], "source": [ "import pandas as pd\n", "import numpy as np\n", "import ontime as on" ] }, { "cell_type": "markdown", "id": "670316b8-460c-4009-a5da-94278f4ac9a9", "metadata": { "papermill": { "duration": null, "end_time": null, "exception": null, "start_time": null, "status": "completed" }, "tags": [] }, "source": [ "# Time Series - Data Loading Helpers\n", "\n", "TimeSeries can load data from different sources. Below are some examples of how to proceed.\n", "\n", "## From Darts to onTime\n", "\n", "You can convert a TimeSeries from Darts' TimeSeries object type to onTime's TimeSeries type. For example, let's load one of Darts' dataset." ] }, { "cell_type": "code", "execution_count": 3, "id": "fe6b4821-9365-4356-b53d-cd5f1fb222ef", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<TimeSeries (DataArray) (Month: 3, component: 1, sample: 1)>\n",
       "array([[[112.]],\n",
       "\n",
       "       [[118.]],\n",
       "\n",
       "       [[132.]]])\n",
       "Coordinates:\n",
       "  * Month      (Month) datetime64[ns] 1949-01-01 1949-02-01 1949-03-01\n",
       "  * component  (component) object '#Passengers'\n",
       "Dimensions without coordinates: sample\n",
       "Attributes:\n",
       "    static_covariates:  None\n",
       "    hierarchy:          None
" ], "text/plain": [ "\n", "array([[[112.]],\n", "\n", " [[118.]],\n", "\n", " [[132.]]])\n", "Coordinates:\n", " * Month (Month) datetime64[ns] 1949-01-01 1949-02-01 1949-03-01\n", " * component (component) object '#Passengers'\n", "Dimensions without coordinates: sample\n", "Attributes:\n", " static_covariates: None\n", " hierarchy: None" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from darts import datasets as dd\n", "\n", "dartsAirPassengers = dd.AirPassengersDataset().load()\n", "print(type(dartsAirPassengers))\n", "dartsAirPassengers.head(3)" ] }, { "cell_type": "markdown", "id": "da53e060-a779-4300-bca3-67cd7e1efa95", "metadata": {}, "source": [ "And use the `.from_darts()` function" ] }, { "cell_type": "code", "execution_count": 4, "id": "dd88e23a", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<TimeSeries (DataArray) (Month: 3, component: 1, sample: 1)>\n",
       "array([[[112.]],\n",
       "\n",
       "       [[118.]],\n",
       "\n",
       "       [[132.]]])\n",
       "Coordinates:\n",
       "  * Month      (Month) datetime64[ns] 1949-01-01 1949-02-01 1949-03-01\n",
       "  * component  (component) object '#Passengers'\n",
       "Dimensions without coordinates: sample\n",
       "Attributes:\n",
       "    static_covariates:  None\n",
       "    hierarchy:          None
" ], "text/plain": [ "\n", "array([[[112.]],\n", "\n", " [[118.]],\n", "\n", " [[132.]]])\n", "Coordinates:\n", " * Month (Month) datetime64[ns] 1949-01-01 1949-02-01 1949-03-01\n", " * component (component) object '#Passengers'\n", "Dimensions without coordinates: sample\n", "Attributes:\n", " static_covariates: None\n", " hierarchy: None" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ontimeAirPassengers = on.TimeSeries.from_darts(dartsAirPassengers)\n", "print(type(ontimeAirPassengers))\n", "ontimeAirPassengers.head(3)" ] }, { "cell_type": "markdown", "id": "15ac77fc-681f-4087-a816-aa17841b3273", "metadata": {}, "source": [ "## From Pandas to onTime\n", "\n", "You can also convert a Pandas DataFrame, for instance coming from OpenMl, to onTime's TimeSeries format provided the DataFrame's index is time-based. Let's load such a dataset as an example." ] }, { "cell_type": "code", "execution_count": 5, "id": "3de25cf0-c333-48e1-b34b-c82724fc5ec9", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n" ] }, { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
            Open      High       Low     Close  Adj_Close  Volume
Date
1980-03-17   0.0  3.302083  3.125000  3.145833   3.145833  219600
1980-03-18   0.0  3.125000  2.937500  3.031250   3.031250  727200
1980-03-19   0.0  3.083333  3.020833  3.041667   3.041667  295200
1980-03-20   0.0  3.062500  3.010417  3.010417   3.010417  159600
1980-03-21   0.0  3.020833  2.906250  2.916667   2.916667  130800
\n", "
" ], "text/plain": [ " Open High Low Close Adj_Close Volume\n", "Date \n", "1980-03-17 0.0 3.302083 3.125000 3.145833 3.145833 219600\n", "1980-03-18 0.0 3.125000 2.937500 3.031250 3.031250 727200\n", "1980-03-19 0.0 3.083333 3.020833 3.041667 3.041667 295200\n", "1980-03-20 0.0 3.062500 3.010417 3.010417 3.010417 159600\n", "1980-03-21 0.0 3.020833 2.906250 2.916667 2.916667 130800" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "from sklearn.datasets import fetch_openml\n", "\n", "pandasAmd = fetch_openml(\"AMD-Stock-Prices-Historical-Data\", version=1, as_frame=True, parser=\"pandas\").frame\n", "print(type(pandasAmd))\n", "# set index to be compliant with TimeEval's canonical format\n", "pandasAmd.index= pandasAmd['Date'].astype('datetime64[ns]')\n", "del pandasAmd['Date']\n", "pandasAmd.head()" ] }, { "cell_type": "markdown", "id": "f56c1f8e-b5f7-4cb3-9856-545b8311ad4a", "metadata": {}, "source": [ "And use the `.from_pandas()` function" ] }, { "cell_type": "code", "execution_count": 6, "id": "ba52b0d5-804e-4839-a2a6-c4060e8c5f72", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<TimeSeries (DataArray) (Date: 3, component: 6, sample: 1)>\n",
       "array([[[0.00000000e+00],\n",
       "        [3.30208302e+00],\n",
       "        [3.12500000e+00],\n",
       "        [3.14583302e+00],\n",
       "        [3.14583302e+00],\n",
       "        [2.19600000e+05]],\n",
       "\n",
       "       [[0.00000000e+00],\n",
       "        [3.12500000e+00],\n",
       "        [2.93750000e+00],\n",
       "        [3.03125000e+00],\n",
       "        [3.03125000e+00],\n",
       "        [7.27200000e+05]],\n",
       "\n",
       "       [[0.00000000e+00],\n",
       "        [3.08333302e+00],\n",
       "        [3.02083302e+00],\n",
       "        [3.04166698e+00],\n",
       "        [3.04166698e+00],\n",
       "        [2.95200000e+05]]])\n",
       "Coordinates:\n",
       "  * Date       (Date) datetime64[ns] 1980-03-17 1980-03-18 1980-03-19\n",
       "  * component  (component) object 'Open' 'High' 'Low' ... 'Adj_Close' 'Volume'\n",
       "Dimensions without coordinates: sample\n",
       "Attributes:\n",
       "    static_covariates:  None\n",
       "    hierarchy:          None
" ], "text/plain": [ "\n", "array([[[0.00000000e+00],\n", " [3.30208302e+00],\n", " [3.12500000e+00],\n", " [3.14583302e+00],\n", " [3.14583302e+00],\n", " [2.19600000e+05]],\n", "\n", " [[0.00000000e+00],\n", " [3.12500000e+00],\n", " [2.93750000e+00],\n", " [3.03125000e+00],\n", " [3.03125000e+00],\n", " [7.27200000e+05]],\n", "\n", " [[0.00000000e+00],\n", " [3.08333302e+00],\n", " [3.02083302e+00],\n", " [3.04166698e+00],\n", " [3.04166698e+00],\n", " [2.95200000e+05]]])\n", "Coordinates:\n", " * Date (Date) datetime64[ns] 1980-03-17 1980-03-18 1980-03-19\n", " * component (component) object 'Open' 'High' 'Low' ... 'Adj_Close' 'Volume'\n", "Dimensions without coordinates: sample\n", "Attributes:\n", " static_covariates: None\n", " hierarchy: None" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "ontimeAmd = on.TimeSeries.from_pandas(pandasAmd, freq='D')\n", "ontimeAmd.head(3)" ] }, { "cell_type": "markdown", "id": "0253d7eb-1227-41ed-9a03-f5f331a5dfed", "metadata": {}, "source": [ "## From .csv files to onTime\n", "\n", "You can convert data from a .csv file to onTime's TimeSeries format, provided the index is time-based and correctly formatted. As an example, let's quickly generate and save a time series in .csv." ] }, { "cell_type": "code", "execution_count": 7, "id": "1217800d-503b-4939-adc0-3f7cd96c7a49", "metadata": {}, "outputs": [], "source": [ "# let's generate some sample data\n", "ts = on.generators.random_walk().generate(start=pd.Timestamp('2022-01-01'), end=pd.Timestamp('2022-12-31'))\n", "ts.pd_dataframe().to_csv('sample_series.csv')" ] }, { "cell_type": "markdown", "id": "31ec5ffa-dcdf-4f33-830b-7d6a571eeea1", "metadata": {}, "source": [ "The data now exist on our file system and we can load it as done below." ] }, { "cell_type": "code", "execution_count": 8, "id": "dd891e3e-3403-436f-81fa-92fc6bc33fee", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<TimeSeries (DataArray) (time: 3, component: 1, sample: 1)>\n",
       "array([[[0.44888864]],\n",
       "\n",
       "       [[1.71377365]],\n",
       "\n",
       "       [[2.81188623]]])\n",
       "Coordinates:\n",
       "  * time       (time) datetime64[ns] 2022-01-01 2022-01-02 2022-01-03\n",
       "  * component  (component) object 'random_walk'\n",
       "Dimensions without coordinates: sample\n",
       "Attributes:\n",
       "    static_covariates:  None\n",
       "    hierarchy:          None
" ], "text/plain": [ "\n", "array([[[0.44888864]],\n", "\n", " [[1.71377365]],\n", "\n", " [[2.81188623]]])\n", "Coordinates:\n", " * time (time) datetime64[ns] 2022-01-01 2022-01-02 2022-01-03\n", " * component (component) object 'random_walk'\n", "Dimensions without coordinates: sample\n", "Attributes:\n", " static_covariates: None\n", " hierarchy: None" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# load that sample data\n", "ts = on.TimeSeries.from_csv('sample_series.csv', index_col='time')\n", "ts.head(3)" ] }, { "cell_type": "markdown", "id": "f27bf901-d1b6-460e-b9cb-c7900144510d", "metadata": {}, "source": [ "## From regular data (e.g. Numpy) to onTime" ] }, { "cell_type": "markdown", "id": "c7b1d6b2-a146-4052-9613-ba9e6bca1645", "metadata": {}, "source": [ "Also, if your data is in Numpy or Python structure, you can assemble a TimeSeries if you have a data table and column names, provided the index is time-based and correctly formatted. Let's go through an example with generated data." ] }, { "cell_type": "code", "execution_count": 22, "id": "c9d57d37-6c6a-4d0f-9fc7-f34a95f0cd8f", "metadata": {}, "outputs": [], "source": [ "# generate some data\n", "from datetime import datetime\n", "\n", "data = np.random.rand(200,3)\n", "index = pd.date_range(datetime.today(), periods=200).tolist()\n", "columns = ['col1', 'col2', 'col3']" ] }, { "cell_type": "markdown", "id": "938044db-9c7e-4bf7-9e2e-19ae84ee5566", "metadata": {}, "source": [ "Now, we can import this time series in onTime." ] }, { "cell_type": "code", "execution_count": 23, "id": "5798d402-abd5-4eb1-b8ee-fdf3523703fc", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "\n", "
<TimeSeries (DataArray) (time: 3, component: 3, sample: 1)>\n",
       "array([[[0.00966967],\n",
       "        [0.27866962],\n",
       "        [0.56073953]],\n",
       "\n",
       "       [[0.0105793 ],\n",
       "        [0.80519664],\n",
       "        [0.74771133]],\n",
       "\n",
       "       [[0.40498576],\n",
       "        [0.76542589],\n",
       "        [0.82304112]]])\n",
       "Coordinates:\n",
       "  * time       (time) datetime64[ns] 2024-02-24T22:00:53.645121 ... 2024-02-2...\n",
       "  * component  (component) object 'col1' 'col2' 'col3'\n",
       "Dimensions without coordinates: sample\n",
       "Attributes:\n",
       "    static_covariates:  None\n",
       "    hierarchy:          None
" ], "text/plain": [ "\n", "array([[[0.00966967],\n", " [0.27866962],\n", " [0.56073953]],\n", "\n", " [[0.0105793 ],\n", " [0.80519664],\n", " [0.74771133]],\n", "\n", " [[0.40498576],\n", " [0.76542589],\n", " [0.82304112]]])\n", "Coordinates:\n", " * time (time) datetime64[ns] 2024-02-24T22:00:53.645121 ... 2024-02-2...\n", " * component (component) object 'col1' 'col2' 'col3'\n", "Dimensions without coordinates: sample\n", "Attributes:\n", " static_covariates: None\n", " hierarchy: None" ] }, "execution_count": 23, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# assemble into a timeseries\n", "ts = on.TimeSeries.from_data(data, index = index, columns = columns)\n", "ts.head(3)" ] }, { "cell_type": "code", "execution_count": null, "id": "56e89083-7642-4ef6-ac06-1cbe4e1de7d0", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.10" }, "papermill": { "default_parameters": {}, "duration": 60.248854, "end_time": "2024-01-31T17:51:31.161244", "environment_variables": {}, "exception": null, "input_path": "docs/user_guide/0_core/0.1-time-series-custom-class.ipynb", "output_path": "docs/user_guide/0_core/0.1-time-series-custom-class.ipynb", "parameters": {}, "start_time": "2024-01-31T17:50:30.912390", "version": "2.5.0" } }, "nbformat": 4, "nbformat_minor": 5 }