---
tags: Time Series Forecasting, SageMaker, AWS, Workshop
---

# Time series forecasting with AWS services

This is our list of hints and tips for this workshop.

**JOIN EVENT:** https://catalog.workshops.aws/join

1. Complete the one-time passcode (OTP) verification with an email address (any address you can access over the public internet).
2. **The event code will be on the TV screen** for you to type.
3. Read the "Terms and Conditions", select "**I agree with the Terms and Conditions**" and click "**Join event**".
4. On the next page, click the "**Open AWS Console**" link.
![](https://hackmd.io/_uploads/SycdQF8P3.png)
5. **Skip the entire sections** "Introduction" & "Setup" (*this is already done!*).
6. Go straight to the first lab, [**Overview of the Environment**](https://catalog.us-east-1.prod.workshops.aws/workshops/caef4710-3721-4957-a2ce-33799920ef72/en-US/20-environmentsetup/23-overviewoftheenvironment).

:::info
:warning: **For this workshop, resources are deployed in us-west-2**
:::

## Forecast Messaging

For those seeking a UI-centric forecasting experience, the SageMaker Canvas UI, which is integrated with the Canvas (Autopilot) APIs, is highly recommended. It streamlines data integration, model training, and prediction, making it an essential tool for data-driven decision-making.

Building forecasting models with the Canvas (Autopilot) APIs offers the following key advantages over the Amazon Forecast service APIs:

1. **Better Performance**:
    * On average, building ML models with the Autopilot API is up to 50% faster than with the Amazon Forecast API.
    * Inference on these models is up to 45% faster.
1. **Lower Cost**:
    * Deploying the Autopilot-generated model on a SageMaker inference instance for predictions is more cost-effective. In fact, using SageMaker Inference can be over 90% cheaper than using Amazon Forecast.
1. **Enhanced Control**. When building a model with the Autopilot APIs:
    * **a/** You gain visibility into the algorithms selected in the ensemble.
    * **b/** You have the option to deploy either the ensemble model or a model created by an individual algorithm.
    * **c/** You can choose any inference instance configuration (recommended instances are detailed in the documentation) to suit your use case.
1. **Wider Availability**:
    * Compared to Amazon Forecast, the Autopilot APIs are available in 8 additional regions: Asia Pacific (Hong Kong), Canada (Central), Europe (London), Europe (Paris), Europe (Stockholm), Middle East (Bahrain), South America (Sao Paulo), and US West (N. California). The Autopilot API will be accessible in all regions where Autopilot currently operates or will operate in the future.

# Key Concepts

There are three key approaches to time series forecasting covered in this workshop:

1. **No-code approach with SageMaker Canvas**
1. **Programmatic approach with SageMaker Autopilot**
1. **Build a custom model with SageMaker**

## Workshop briefing

In this workshop, the hands-on labs cover different ways of forecasting with AWS services such as Amazon Forecast, Amazon SageMaker Canvas, and Amazon SageMaker. The session provides prescriptive guidance on choosing the right AWS service for your use case.

* There are 3 labs, one for each of the 3 approaches discussed.
* Each lab goes through the ML lifecycle of processing, training, and inference.
* Model training is time-consuming, so switch to the next lab after starting the training process, as described in the workshop instructions (a status-check sketch follows this list).
* The SageMaker Studio Jupyter notebook kernel may take a couple of minutes to initialize.
* It is suggested to execute the SageMaker Studio notebooks cell by cell to understand the flow.
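Because the training jobs run for a while, it can help to check a job's status from a notebook cell before coming back to a lab. Below is a minimal sketch, assuming a boto3 SageMaker client in us-west-2 and the `auto_ml_job_name` created in the Autopilot lab:

```python=
import boto3

# Resources for this workshop are deployed in us-west-2
sm = boto3.client("sagemaker", region_name="us-west-2")

# auto_ml_job_name is the job name used when the Autopilot experiment was created.
# AutoMLJobStatus is one of: InProgress, Completed, Failed, Stopping, Stopped.
response = sm.describe_auto_ml_job_v2(AutoMLJobName=auto_ml_job_name)
print(response["AutoMLJobStatus"], response["AutoMLJobSecondaryStatus"])
```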
# Time-Series Forecasting with Amazon SageMaker Autopilot

Import the boto3 and SageMaker Python libraries and create the SageMaker client.

```python=
import boto3
import sagemaker

# Region of the current SageMaker session
region = sagemaker.Session().boto_region_name

# This is the client we will use to interact with SageMaker Autopilot
sm = boto3.Session().client(service_name="sagemaker", region_name=region)
```

[**automl_problem_type_config**](https://docs.aws.amazon.com/sagemaker/latest/APIReference/API_TimeSeriesForecastingJobConfig.html)

The collection of settings used by an AutoML job V2 for the time-series forecasting problem type.

```python=
automl_problem_type_config = {
    'TimeSeriesForecastingJobConfig': {
        'ForecastFrequency': 'M',
        'ForecastHorizon': 2,
        'ForecastQuantiles': ['p50', 'p60', 'p70', 'p80', 'p90'],
        'Transformations': {
            'Filling': {
                'demand': {
                    'middlefill': 'zero',
                    'backfill': 'zero'
                },
                'price': {
                    'middlefill': 'zero',
                    'backfill': 'zero',
                    'futurefill': 'zero'
                }
            }
        },
        'TimeSeriesConfig': {
            'TargetAttributeName': 'demand',
            'TimestampAttributeName': 'ts',
            'ItemIdentifierAttributeName': 'item_id',
            'GroupingAttributeNames': ['store_id']
        }
    }
}
```

[**Creates an Autopilot job, also referred to as an Autopilot experiment or AutoML job V2**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_auto_ml_job_v2.html#)

You can find the best-performing model after you run an AutoML job V2 by calling `DescribeAutoMLJobV2`.

```python=
# input_data_config, output_data_config and optimization_metric_config
# are defined earlier in the notebook
sm.create_auto_ml_job_v2(
    AutoMLJobName=auto_ml_job_name,
    AutoMLJobInputDataConfig=input_data_config,
    OutputDataConfig=output_data_config,
    AutoMLProblemTypeConfig=automl_problem_type_config,
    AutoMLJobObjective=optimization_metric_config,
    RoleArn=role
)
```

[**create_model**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_model.html#create-model)

Creates a model in SageMaker. In the request, you name the model and describe a primary container. For the primary container, you specify the Docker image that contains inference code, artifacts (from prior training), and a custom environment map that the inference code uses when you deploy the model for predictions. Use this API to create a model if you want to use SageMaker hosting services or run a batch transform job.

```python=
best_candidate = sm.describe_auto_ml_job_v2(AutoMLJobName=auto_ml_job_name)['BestCandidate']
best_candidate_containers = best_candidate['InferenceContainers']
best_candidate_name = best_candidate['CandidateName']

response = sm.create_model(
    ModelName=best_candidate_name,
    ExecutionRoleArn=role,
    Containers=best_candidate_containers
)

print('BestCandidateName:', best_candidate_name)
print('BestCandidateContainers:', best_candidate_containers)
```

[**create_transform_job**](https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/sagemaker/client/create_transform_job.html#create-transform-job)

Starts a transform job. A transform job uses a trained model to get inferences on a dataset and saves these results to an Amazon S3 location that you specify. To perform batch transformations, you create a transform job and use the data that you have readily available.
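The transform job below reads its input from S3, so the inference request file must be staged there first. A minimal sketch of that step, assuming a local CSV of inference requests (the file name `batch_input.csv` is a placeholder) and the same `bucket` and `prefix` variables used elsewhere in the notebook:

```python=
import boto3

s3 = boto3.client("s3")

# Placeholder local file holding the rows to forecast; replace with your own data
local_input = "batch_input.csv"

# Upload to the prefix that the transform job below reads from
s3.upload_file(local_input, bucket, "{}/batch_transform/input/batch_input.csv".format(prefix))
```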
```python=
response = sm.create_transform_job(
    TransformJobName=transform_job_name,
    ModelName=best_candidate_name,
    # 0 removes the payload size limit (data is streamed with HTTP chunked encoding)
    MaxPayloadInMB=0,
    ModelClientConfig={
        'InvocationsTimeoutInSeconds': 3600
    },
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'S3Prefix',
                'S3Uri': 's3://{}/{}/batch_transform/input/'.format(bucket, prefix)
            }
        },
        'ContentType': 'text/csv',
        'SplitType': 'None'
    },
    TransformOutput={
        'S3OutputPath': 's3://{}/{}/batch_transform/output/'.format(bucket, prefix),
        'AssembleWith': 'Line',
    },
    TransformResources={
        'InstanceType': 'ml.m5.4xlarge',
        'InstanceCount': 1
    }
)
```

# Custom SageMaker/DeepAR

This notebook complements the [DeepAR introduction notebook](https://github.com/awslabs/amazon-sagemaker-examples/blob/master/introduction_to_amazon_algorithms/deepar_synthetic/deepar_synthetic.ipynb). Here, we consider a real use case and show how to use DeepAR on SageMaker to predict the energy consumption of 370 customers over time, based on a [dataset](https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014) that was used in the academic papers [[1](https://media.nips.cc/nipsbooks/nipspapers/paper_files/nips29/reviews/526.html)] and [[2](https://arxiv.org/abs/1704.04110)].

In particular, we will see how to:

* Prepare the dataset
* Use the SageMaker Python SDK to train a DeepAR model and deploy it
* Make requests to the deployed model to obtain forecasts interactively
* Illustrate advanced features of DeepAR: missing values, additional time features, non-regular frequencies, and category information

### Import the electricity dataset and upload it to S3 to make it available for SageMaker

We load and parse the dataset and convert it to a collection of pandas time series, which makes common time series operations such as indexing by time periods or resampling much easier. The data is originally recorded at 15-minute intervals, which we could use directly. Because we want to forecast longer periods (one week), we resample the data to a granularity of 2 hours.

### Train and test splits

Oftentimes one is interested in evaluating the model or tuning its hyperparameters by looking at error metrics on a hold-out test set. Here we split the available data into train and test sets for evaluating the trained model. For standard machine learning tasks such as classification and regression, one typically obtains this split by randomly separating examples into train and test sets. However, in forecasting it is important to do this train/test split based on time rather than by series. In this example, we reserve the last section of each time series for evaluation purposes and use only the first part as training data.

```python=
start_dataset = pd.Timestamp("2014-01-01 00:00:00", freq=freq)
end_training = pd.Timestamp("2014-09-01 00:00:00", freq=freq)
```

With recent pandas versions this raises an error:
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[15], line 1
----> 1 start_dataset = pd.Timestamp("2014-01-01 00:00:00", freq=freq)
      2 end_training = pd.Timestamp("2014-09-01 00:00:00", freq=freq)

File timestamps.pyx:1755, in pandas._libs.tslibs.timestamps.Timestamp.__new__()

TypeError: __new__() got an unexpected keyword argument 'freq'
```

Updated code: the `freq` keyword has been removed from `pd.Timestamp`, so specify the frequency when building a date range instead.

```python=
start_dataset = pd.Timestamp("2014-01-01 00:00:00")
end_training = pd.Timestamp("2014-09-01 00:00:00")

# Now, when you create a date range, you can specify the frequency
# (freq, e.g. "2H", is defined earlier in the notebook; 12 two-hour periods per day)
date_range = pd.date_range(start=start_dataset, periods=(end_training - start_dataset).days * 12, freq=freq)
print(date_range)
```

### Train a model

Here we define and then launch the training job (estimated duration: 21 minutes).

### Create endpoint and predictor

Now that we have a trained model, we can deploy it to an endpoint and use it to perform predictions.

## Links

[Speed up your time series forecasting by up to 50 percent with Amazon SageMaker Canvas UI and AutoML APIs](https://aws.amazon.com/blogs/machine-learning/speed-up-your-time-series-forecasting-by-up-to-50-percent-with-amazon-sagemaker-canvas-ui-and-automl-apis/)

[Time-Series Forecasting with Amazon SageMaker Autopilot](https://github.com/aws/amazon-sagemaker-examples/blob/main/autopilot/autopilot_time_series.ipynb)

![image](https://hackmd.io/_uploads/SkH11RBpT.png)

## Follow up

### Working with Quantiles

[Time-series quantile selection example](https://github.com/aws/amazon-sagemaker-examples/tree/main/sagemaker-datawrangler/timeseries-quantile-selection-dataflow)

[Beyond forecasting: The delicate balance of serving customers and growing your business](https://aws.amazon.com/blogs/machine-learning/beyond-forecasting-the-delicate-balance-of-serving-customers-and-growing-your-business/)
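As a small complement to the quantile links above, here is a hedged sketch of picking one quantile from the Autopilot batch transform output. The file name and column layout are assumptions: it presumes the output CSV has been downloaded locally and contains the item identifier plus one column per quantile configured in `ForecastQuantiles` (p50 ... p90).

```python=
import pandas as pd

# Assumption: batch transform output downloaded locally as forecast.csv with
# an item_id column and one column per configured quantile (p50 ... p90)
forecast = pd.read_csv("forecast.csv")

# Choose a higher quantile (e.g. p90) when under-forecasting is costly (stock-outs),
# or a lower quantile when over-forecasting is the bigger risk (excess inventory)
print(forecast[["item_id", "p90"]].head())
```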