Purpose: Provides functionality for working with algorithmic estimators in SageMaker. Enables users to use built-in algorithms for common ML tasks. Allows packaging custom algorithms for reuse. Supports algorithm validation and registration in the marketplace. Facilitates algorithm selection and tuning.
Key Submodules:
algorithm_estimator - Classes for training with algorithm resources
algorithm_spec - Algorithm specification utilities
metadata_properties - Metadata processing components
metric_definitions - ML metric definition helpers

Purpose: Offers tools for analyzing training jobs, models, and model performance. Provides metrics collection and visualization capabilities. Enables tracking of model experiments and their results. Helps with monitoring training progress and resource utilization. Supports historical analysis of previous training jobs.
Key Submodules:
analyze_config - Configuration for analysis jobs
metrics_fetcher - Tools for retrieving metrics
training_job_analytics - Training job performance analysis
experiment_analytics - Experiment tracking and analysis

Purpose: Implements bias detection and explainability tools for ML models. Helps identify bias in data and model predictions. Provides feature attribution methods to understand model decisions. Supports fairness metrics and reporting. Enables compliance with responsible AI practices.
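To make the fairness metrics mentioned above concrete, here is a minimal plain-Python sketch of one common pre-training measure, the difference in proportions of labels (DPL) between two groups. It is an illustration only, not this module's API: the function name, the 0/1 facet encoding, and the sample data are all assumptions.

```python
def difference_in_proportions_of_labels(labels, facet):
    """Return q_a - q_d, where q_a is the positive-label rate of the
    facet == 0 group and q_d is the rate of the facet == 1 group.

    Hypothetical helper: labels and facet are equal-length lists of 0/1.
    """
    if len(labels) != len(facet):
        raise ValueError("labels and facet must have the same length")
    group_a = [y for y, f in zip(labels, facet) if f == 0]
    group_d = [y for y, f in zip(labels, facet) if f == 1]
    if not group_a or not group_d:
        raise ValueError("both facet groups must be non-empty")
    q_a = sum(group_a) / len(group_a)
    q_d = sum(group_d) / len(group_d)
    return q_a - q_d


# Made-up sample: first four rows belong to group a, last four to group d.
labels = [1, 1, 0, 1, 0, 0, 1, 0]
facet = [0, 0, 0, 0, 1, 1, 1, 1]
dpl = difference_in_proportions_of_labels(labels, facet)
```

A value near zero suggests the two groups receive positive labels at similar rates; a value far from zero suggests an imbalance worth investigating.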
Key Submodules:
bias - Bias detection utilities
explainability - Model explanation methods
feature_attribution - Feature importance analysis
shap - SHAP value calculation implementations
report - Report generation tools

Purpose: Enables real-time debugging of training jobs. Captures internal model states during training. Allows setting up rules to detect training issues like vanishing gradients. Supports custom debugging hooks for major ML frameworks. Provides visualization tools for understanding model behavior.
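The idea of a rule that detects vanishing gradients can be sketched without any framework. The snippet below assumes gradients have already been captured as plain lists of floats keyed by training step; the function names and the threshold are hypothetical and are not the built-in rule's actual implementation.

```python
import math


def l2_norm(values):
    """Euclidean norm of a flat list of gradient values."""
    return math.sqrt(sum(v * v for v in values))


def vanishing_gradient_steps(gradients_by_step, threshold=1e-7):
    """Return the steps whose gradient norm falls below `threshold`.

    gradients_by_step maps a step number to a flat list of gradient values.
    """
    return [
        step
        for step, grads in sorted(gradients_by_step.items())
        if l2_norm(grads) < threshold
    ]


# Made-up captured gradients: the norm collapses by step 200.
captured = {0: [0.5, -0.2], 100: [1e-4, 2e-4], 200: [1e-9, -2e-9]}
flagged = vanishing_gradient_steps(captured)
```

A real rule would typically evaluate tensors per layer and trigger an action (such as stopping the job) rather than return a list.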
Key Submodules:
debugger - Core debugging functionality
hook - Framework hooks for data capture
rule - Built-in and custom rules for training issues
rule_configs - Rule configuration helpers
tensorboard - TensorBoard integration

Purpose: Provides core functionality for training models in SageMaker. Handles the entire training process from data preparation to model creation. Supports distributed training across multiple instances. Enables hyperparameter management and configuration. Provides framework-specific estimators for popular ML libraries.
Key Submodules:
estimator - Base estimator classes
framework - Framework-specific estimators
training_config - Training configuration utilities
utils - Utility functions for estimators

Purpose: Manages feature data for ML training and inference. Provides a central repository for features across models. Supports online and offline feature stores for different access patterns. Enables feature sharing and reuse across teams. Helps with feature versioning and lineage tracking.
Key Submodules:
feature_group - Feature grouping functionality
feature_definition - Feature definition utilities
feature_processor - Feature processing components
feature_store_session - Session management for feature store

Purpose: Provides utilities for retrieving container image URIs. Simplifies the process of finding the right container for algorithms. Handles regional differences in container registries. Supports version management for container images. Enables custom container integration.
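To show what "regional differences in container registries" means in practice, the sketch below assembles an Amazon ECR image URI from its parts. The helper name is hypothetical, and the account ID, repository, and tag are placeholder values; in the SDK the correct registry account for each region and framework comes from its own lookup tables, which this sketch does not reproduce.

```python
def build_ecr_image_uri(account_id, region, repository, tag):
    """Assemble an ECR image URI from caller-supplied parts.

    Hypothetical helper for illustration; it performs no lookup of which
    account or repository actually hosts a given framework image.
    """
    # Regions in the China partition use a different DNS suffix.
    domain = "amazonaws.com.cn" if region.startswith("cn-") else "amazonaws.com"
    return f"{account_id}.dkr.ecr.{region}.{domain}/{repository}:{tag}"


# Placeholder values, not real registry coordinates.
uri = build_ecr_image_uri("123456789012", "us-west-2", "example-training", "1.0-cpu")
```

The same image version can therefore resolve to a different URI in every region, which is why resolving it through a helper is preferable to hard-coding the string.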
Key Submodules:
retrieve - Core retrieval functionality
utils - Helper functions for URI manipulation
deep_learning_containers - DLC-specific URI helpers
regions - Region-specific URI mappings

Purpose: Manages input data channels for training jobs. Provides utilities for data location and format specification. Supports different input modes like File, Pipe, and FastFile. Enables data partitioning for distributed training. Handles input data validation and preprocessing.
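A small sketch can illustrate channel validation and data partitioning. The `Channel` class and `shard_keys` function below are hypothetical stand-ins, not the SDK's classes, and the round-robin assignment is an assumption about how objects might be split across instances; the service's actual sharding may differ.

```python
from dataclasses import dataclass

# Input modes named in the overview above.
VALID_INPUT_MODES = {"File", "Pipe", "FastFile"}


@dataclass(frozen=True)
class Channel:
    """Hypothetical description of one named input channel."""

    name: str
    s3_uri: str
    input_mode: str = "File"

    def __post_init__(self):
        if self.input_mode not in VALID_INPUT_MODES:
            raise ValueError(f"unsupported input mode: {self.input_mode}")
        if not self.s3_uri.startswith("s3://"):
            raise ValueError(f"not an S3 URI: {self.s3_uri}")


def shard_keys(keys, instance_count):
    """Assign sorted object keys to instances in round-robin order."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    shards = [[] for _ in range(instance_count)]
    for index, key in enumerate(sorted(keys)):
        shards[index % instance_count].append(key)
    return shards


# The bucket name is a placeholder.
train = Channel("train", "s3://example-bucket/train/", input_mode="FastFile")
shards = shard_keys(["part-2.csv", "part-0.csv", "part-1.csv"], 2)
```

Validating the mode and location when the channel is constructed surfaces configuration mistakes before a training job is launched.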
Key Submodules:
channels - Channel definition utilities
fileio - File I/O helpers
pipe_mode - Support for pipe mode input
transformer_input - Input utilities for batch transforms

Purpose: Provides access to pre-trained models and solution templates. Simplifies using foundation models from model hubs. Enables quick experimentation with state-of-the-art models. Supports fine-tuning of pre-trained models. Offers industry-specific solution templates.
Key Submodules:
models - Pre-trained model access
artifacts - Model artifact utilities
templates - Solution templates
fine_tuning - Fine-tuning utilities
hub_models - Integration with model hubs

Purpose: Handles model deployment and management in SageMaker. Provides functionality for creating and hosting model endpoints. Supports batch transform jobs for offline inference. Enables multi-model endpoints for cost efficiency. Handles model artifact management and versioning.
Key Submodules:
model - Core model functionality
container_def - Container definition utilities
model_package - Model package functionality
inference_recommender - Endpoint configuration recommendations

Purpose: Implements monitoring capabilities for models in production. Detects data drift and model quality issues. Provides automated alerting for model performance degradation. Enables custom monitoring schedules and thresholds. Supports visualizing monitoring results.
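Data drift detection against a threshold can be illustrated with a deliberately simple statistic: how far the mean of recent values has moved from a baseline, measured in baseline standard deviations. This is a stand-in chosen for brevity, not the statistic the monitoring module computes; the function names and the default threshold are assumptions.

```python
from statistics import mean, stdev


def drift_score(baseline, current):
    """Shift of the current mean from the baseline mean, in units of the
    baseline's sample standard deviation."""
    spread = stdev(baseline)
    if spread == 0:
        raise ValueError("baseline has zero variance")
    return abs(mean(current) - mean(baseline)) / spread


def has_drifted(baseline, current, threshold=2.0):
    """Flag drift when the score exceeds a user-chosen threshold."""
    return drift_score(baseline, current) > threshold


# Made-up feature values captured at training time and in production.
baseline = [10, 12, 11, 9, 13]
shifted = [15, 16, 14, 17]
stable = [11, 10, 12]
```

Here `has_drifted(baseline, shifted)` reports drift while `has_drifted(baseline, stable)` does not; a scheduled monitor would run a comparison like this on each batch of captured data.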
Key Submodules:
data_capture - Data capture configuration
model_monitoring - Core monitoring functionality
dataset_format - Dataset format utilities
monitoring_files - Monitoring file management
visualize - Visualization tools for monitoring results

Purpose: Provides automated machine learning capabilities. Automatically finds the best algorithm and hyperparameters for a dataset. Handles feature engineering and model selection. Supports explainability for AutoML models. Enables ensembling of top-performing models.
Key Submodules:
automl - Core AutoML functionality
candidate_estimator - Candidate model management
automl_config - Configuration for AutoML jobs
candidate_selector - Model selection utilities

Purpose: Handles data processing jobs separately from training. Provides data preprocessing, feature engineering, and model evaluation capabilities. Supports custom processing scripts and containers. Enables distributed processing for large datasets. Integrates with SageMaker pipelines.
Key Submodules:
processor - Core processing functionality
framework_processor - Framework-specific processors
processing_input - Input configuration for processing
processing_output - Output configuration for processing

Purpose: Enables running Python functions on SageMaker infrastructure. Simplifies moving compute-intensive workloads to the cloud. Provides seamless local-to-remote execution. Handles dependencies and environment setup automatically. Supports asynchronous execution and job management.
Key Submodules:
remote - Core remote execution functionality
decorators - Function decorators for remote execution
context - Execution context management
config - Configuration for remote execution

Purpose: Manages SageMaker sessions and interactions with AWS services. Provides the primary interface for SageMaker API calls. Handles authentication and credentials. Enables resource tracking and management. Supports logging and error handling.
Key Submodules:
session - Core session functionality
boto_session - Boto3 session management
custom_session - Custom session configurations
s3_utils - S3 integration utilities

Purpose: Provides TensorFlow integration with SageMaker. Enables training and deploying TensorFlow models. Supports distributed TensorFlow training. Handles TensorFlow-specific optimizations. Provides pre-built TensorFlow containers.
Key Submodules:
estimator - TensorFlow estimators
model - TensorFlow model deployment
predictor - TensorFlow prediction utilities
serving - TensorFlow Serving integration

Purpose: Provides PyTorch integration with SageMaker. Enables training and deploying PyTorch models. Supports distributed PyTorch training. Handles PyTorch-specific optimizations. Provides pre-built PyTorch containers.
Key Submodules:
estimator - PyTorch estimators
model - PyTorch model deployment
predictor - PyTorch prediction utilities
distributed - Distributed training utilities for PyTorch

Purpose: Integrates Hugging Face models and libraries with SageMaker. Enables training and fine-tuning transformer models. Provides optimized containers for NLP tasks. Supports deployment of Hugging Face models. Handles Hugging Face-specific configurations.
Key Submodules:
estimator - Hugging Face estimators
model - Hugging Face model deployment
predictor - Hugging Face prediction utilities
transformers - Transformer model utilities

Purpose: Enables building and managing ML workflows and pipelines. Provides DAG-based pipeline construction for ML workflows. Supports pipeline versioning and execution tracking. Enables conditional execution and branching in workflows. Integrates with SageMaker projects for MLOps.
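DAG-based construction means each step declares the steps it depends on, and a valid execution order is derived from those dependencies. The sketch below uses the standard library's `graphlib` (Python 3.9+) to show the idea; the step names and the `execution_order` helper are hypothetical and unrelated to the module's own classes.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(dependencies):
    """Return step names in an order that respects every dependency.

    dependencies maps a step name to the set of steps it depends on.
    Raises graphlib.CycleError if the graph is not a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


# A made-up four-step workflow where each step depends on the previous one.
steps = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}
order = execution_order(steps)
```

Because the order is computed from declared dependencies, independent branches can be scheduled in parallel and a circular dependency is rejected before anything runs.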
Key Submodules:
pipeline - Core pipeline functionality
steps - Pipeline step definitions
parameters - Pipeline parameter utilities
conditions - Conditional execution components
properties - Property reference utilities

Purpose: Handles data serialization for model inference. Provides serializers for different data formats like JSON, CSV, and NPY. Supports custom serialization logic. Enables framework-specific serialization. Handles batch serialization for efficiency.
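Serialization here means turning an in-memory request into the bytes sent to an endpoint. The following sketch shows the JSON and CSV cases with the standard library only; the function names are hypothetical and are not the module's serializer classes.

```python
import csv
import io
import json


def serialize_json(data) -> bytes:
    """Encode a JSON-compatible object as UTF-8 bytes."""
    return json.dumps(data).encode("utf-8")


def serialize_csv(rows) -> bytes:
    """Encode an iterable of rows as UTF-8 CSV bytes, one row per line."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().encode("utf-8")


# Made-up request payloads.
json_body = serialize_json({"instances": [[1, 2]]})
csv_body = serialize_csv([[1, 2.5, "a"], [3, 4, "b"]])
```

A serializer is normally paired with a content type (for example `application/json` or `text/csv`) so the serving container knows how to parse the bytes.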
Key Submodules:
base - Base serializer functionality
json - JSON serialization
numpy - NumPy array serialization
pandas - DataFrame serialization
text - Text serialization

Purpose: Handles data deserialization after model inference. Provides deserializers for different data formats like JSON, CSV, and NPY. Supports custom deserialization logic. Enables framework-specific deserialization. Handles batch deserialization for efficiency.
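Deserialization is the reverse step: the response bytes are decoded according to the content type the endpoint returned. The sketch below dispatches on content type using only the standard library; `deserialize` is a hypothetical helper, not one of the module's classes.

```python
import csv
import io
import json


def deserialize(body: bytes, content_type: str):
    """Decode response bytes according to their content type."""
    text = body.decode("utf-8")
    if content_type == "application/json":
        return json.loads(text)
    if content_type == "text/csv":
        # csv.reader yields every field as a string; callers convert as needed.
        return list(csv.reader(io.StringIO(text)))
    raise ValueError(f"unsupported content type: {content_type}")


# Made-up endpoint responses.
scores = deserialize(b'{"predictions": [0.1, 0.9]}', "application/json")
table = deserialize(b"0.1,0.9\n0.8,0.2\n", "text/csv")
```

Note that the CSV branch returns strings, so numeric conversion is left to the caller; a custom deserializer could fold that conversion in.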
Key Submodules:
base - Base deserializer functionality
json - JSON deserialization
numpy - NumPy array deserialization
csv - CSV deserialization
stream - Stream deserialization

Purpose: Enables making predictions with deployed models. Handles request and response formatting. Provides batch prediction capabilities. Supports different prediction protocols. Enables integrating custom pre/post-processing logic.
Key Submodules:
predictor - Core prediction functionality
prediction_error - Error handling for predictions
async_predictor - Asynchronous prediction support
batch - Batch prediction utilities

Purpose: Implements tracking of model and data lineage. Enables understanding relationships between datasets, jobs, and models. Supports auditability and compliance requirements. Provides visualization of lineage graphs. Enables tracking of model provenance.
Key Submodules:
artifact - Artifact tracking
action - Action tracking
context - Context management
association - Relationship tracking
query - Lineage query utilities

Purpose: Enables tracking and managing machine learning experiments. Provides experiment and trial components for organization. Supports metrics tracking and artifact logging. Enables experiment comparison and analysis. Integrates with SageMaker Studio for visualization.
Key Submodules:
experiment - Core experiment functionality
trial - Trial management
trial_component - Trial component tracking
run - Run management
metrics - Metrics tracking utilities

Purpose: Provides hyperparameter tuning capabilities. Enables automatic optimization of model hyperparameters. Supports various search strategies like Bayesian, grid, and random search. Handles distributed tuning jobs. Provides visualization of tuning results.
Key Submodules:
tuner - Core tuning functionality
hyperparameter_ranges - Hyperparameter range definitions
objective_metric - Objective metric specifications
job_definition - Tuning job configuration
analysis - Tuning result analysis
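
Two of the search strategies named in the tuning overview above, grid and random search, are simple enough to sketch in plain Python. The helpers below are hypothetical illustrations, not the tuner's API, and the objective is a made-up function standing in for a real training job's metric; Bayesian search is not shown.

```python
import itertools
import random


def grid_search(space):
    """Enumerate every combination of the listed hyperparameter values."""
    names = sorted(space)
    combos = itertools.product(*(space[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


def random_search(ranges, n_trials, seed=0):
    """Sample each hyperparameter uniformly from its (low, high) range."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(n_trials)
    ]


def best_trial(candidates, objective):
    """Pick the candidate that minimizes the objective."""
    return min(candidates, key=objective)


def toy_objective(params):
    """Made-up loss with its minimum at lr=0.1, depth=4."""
    return (params["lr"] - 0.1) ** 2 + (params["depth"] - 4) ** 2


grid = grid_search({"lr": [0.01, 0.1], "depth": [2, 4, 6]})
best = best_trial(grid, toy_objective)
samples = random_search({"lr": (0.001, 0.1), "depth": (2, 6)}, n_trials=5)
```

Grid search cost grows multiplicatively with each added hyperparameter, which is why random or Bayesian strategies are usually preferred once the search space has more than a few dimensions.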