Amazon SageMaker Python SDK Subpackages

1. sagemaker.algorithm

Purpose: Provides functionality for working with algorithm estimators in SageMaker. Lets users train with built-in algorithms for common ML tasks. Allows packaging custom algorithms for reuse. Supports algorithm validation and registration in AWS Marketplace. Facilitates algorithm selection and tuning.

Key Submodules:

  • algorithm_estimator - Classes for training with algorithm resources
  • algorithm_spec - Algorithm specification utilities
  • metadata_properties - Metadata processing components
  • metric_definitions - ML metric definition helpers
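
A metric definition in SageMaker is commonly expressed as a name paired with a regular expression that is applied to the training logs. The sketch below illustrates that idea with the standard library only; the metric names, regexes, and the `extract_metrics` helper are hypothetical and are not part of the SDK.

```python
import re

# Hypothetical metric definitions: a name plus a regex with one capture group.
METRIC_DEFINITIONS = [
    {"Name": "train:loss", "Regex": r"train_loss=([0-9]+\.?[0-9]*)"},
    {"Name": "validation:accuracy", "Regex": r"val_acc=([0-9]+\.?[0-9]*)"},
]


def extract_metrics(log_lines, definitions):
    """Return {metric name: [float values]} for every regex match in the logs."""
    compiled = [(d["Name"], re.compile(d["Regex"])) for d in definitions]
    results = {name: [] for name, _ in compiled}
    for line in log_lines:
        for name, pattern in compiled:
            match = pattern.search(line)
            if match:
                # The first capture group holds the numeric value.
                results[name].append(float(match.group(1)))
    return results


if __name__ == "__main__":
    logs = [
        "epoch 1 train_loss=0.90 val_acc=0.61",
        "epoch 2 train_loss=0.45 val_acc=0.80",
    ]
    print(extract_metrics(logs, METRIC_DEFINITIONS))
```

A list of name and regex pairs in this shape is what an estimator typically receives through its `metric_definitions` argument.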

2. sagemaker.analytics

Purpose: Offers tools for analyzing training jobs, models, and model performance. Provides metrics collection and visualization capabilities. Enables tracking of model experiments and their results. Helps with monitoring training progress and resource utilization. Supports historical analysis of previous training jobs.

Key Submodules:

  • analyze_config - Configuration for analysis jobs
  • metrics_fetcher - Tools for retrieving metrics
  • training_job_analytics - Training job performance analysis
  • experiment_analytics - Experiment tracking and analysis

3. sagemaker.clarify

Purpose: Implements bias detection and explainability tools for ML models. Helps identify bias in data and model predictions. Provides feature attribution methods to understand model decisions. Supports fairness metrics and reporting. Enables compliance with responsible AI practices.

Key Submodules:

  • bias - Bias detection utilities
  • explainability - Model explanation methods
  • feature_attribution - Feature importance analysis
  • shap - SHAP value calculation implementations
  • report - Report generation tools
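
To make the idea of a fairness metric concrete, here is a minimal sketch of two pre-training bias measures as they are commonly defined: class imbalance, CI = (n_a - n_d) / (n_a + n_d), and difference in proportions of labels, DPL = q_a - q_d. The function names and the toy data are assumptions for illustration; this is not the Clarify implementation.

```python
def class_imbalance(n_a, n_d):
    """CI = (n_a - n_d) / (n_a + n_d) for facet sizes n_a and n_d."""
    if n_a + n_d == 0:
        raise ValueError("both facets are empty")
    return (n_a - n_d) / (n_a + n_d)


def difference_in_proportions_of_labels(labels, facets, disadvantaged):
    """DPL = q_a - q_d, where q is the share of positive (1) labels per facet.

    labels: iterable of 0/1 outcomes.
    facets: iterable of facet values, one per label.
    disadvantaged: the facet value treated as facet d; all others form facet a.
    """
    pos_a = n_a = pos_d = n_d = 0
    for label, facet in zip(labels, facets):
        if facet == disadvantaged:
            n_d += 1
            pos_d += label
        else:
            n_a += 1
            pos_a += label
    if n_a == 0 or n_d == 0:
        raise ValueError("each facet needs at least one row")
    return pos_a / n_a - pos_d / n_d


if __name__ == "__main__":
    labels = [1, 1, 1, 0, 1, 0, 0, 0]
    facets = ["a", "a", "a", "a", "d", "d", "d", "d"]
    print(class_imbalance(6, 2))
    print(difference_in_proportions_of_labels(labels, facets, "d"))
```

Values near zero suggest balanced facets; values near 1 or -1 suggest a strong skew toward one facet.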

4. sagemaker.debugger

Purpose: Enables real-time debugging of training jobs. Captures internal model states during training. Allows setting up rules to detect training issues like vanishing gradients. Supports custom debugging hooks for major ML frameworks. Provides visualization tools for understanding model behavior.

Key Submodules:

  • debugger - Core debugging functionality
  • hook - Framework hooks for data capture
  • rule - Built-in and custom rules for training issues
  • rule_configs - Rule configuration helpers
  • tensorboard - TensorBoard integration
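
A rule for a training issue such as vanishing gradients can be thought of as a check that runs over captured tensors at each step. The sketch below is a simplified stand-in, assuming gradients are available as plain lists of floats; the function name and the default threshold are hypothetical and do not reflect the built-in rule's actual logic.

```python
def first_vanishing_step(gradients_by_step, threshold=1e-7):
    """Return the index of the first step whose mean absolute gradient
    falls below `threshold`, or None if no step triggers the check."""
    for step, grads in enumerate(gradients_by_step):
        if not grads:
            continue  # nothing captured at this step
        mean_abs = sum(abs(g) for g in grads) / len(grads)
        if mean_abs < threshold:
            return step
    return None


if __name__ == "__main__":
    captured = [
        [0.1, -0.2, 0.05],
        [1e-3, -2e-3, 1e-3],
        [1e-9, -1e-9, 2e-9],
    ]
    print(first_vanishing_step(captured))
```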

5. sagemaker.estimator

Purpose: Provides core functionality for training models in SageMaker. Handles the entire training process from data preparation to model creation. Supports distributed training across multiple instances. Enables hyperparameter management and configuration. Provides framework-specific estimators for popular ML libraries.

Key Submodules:

  • estimator - Base estimator classes
  • framework - Framework-specific estimators
  • training_config - Training configuration utilities
  • utils - Utility functions for estimators

6. sagemaker.feature_store

Purpose: Manages feature data for ML training and inference. Provides a central repository for features across models. Supports online and offline feature stores for different access patterns. Enables feature sharing and reuse across teams. Helps with feature versioning and lineage tracking.

Key Submodules:

  • feature_group - Feature grouping functionality
  • feature_definition - Feature definition utilities
  • feature_processor - Feature processing components
  • feature_store_session - Session management for feature store

7. sagemaker.image_uris

Purpose: Provides utilities for retrieving container image URIs. Simplifies the process of finding the right container for algorithms. Handles regional differences in container registries. Supports version management for container images. Enables custom container integration.

Key Submodules:

  • retrieve - Core retrieval functionality
  • utils - Helper functions for URI manipulation
  • deep_learning_containers - DLC-specific URI helpers
  • regions - Region-specific URI mappings
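
Regional differences matter because a container image URI in Amazon ECR embeds both an account ID and a region. The sketch below shows that general URI shape; the account IDs, repository name, and `build_image_uri` helper are placeholders invented for illustration, and the SDK's own lookup tables and partition handling are not reproduced here.

```python
# Placeholder account IDs per region (hypothetical, not real registry accounts).
HYPOTHETICAL_ACCOUNTS = {
    "us-east-1": "111111111111",
    "eu-west-1": "222222222222",
}


def build_image_uri(repository, tag, region, accounts=HYPOTHETICAL_ACCOUNTS):
    """Assemble an ECR-style URI: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>."""
    try:
        account = accounts[region]
    except KeyError:
        raise ValueError(f"unsupported region: {region}") from None
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


if __name__ == "__main__":
    print(build_image_uri("my-framework-training", "2.1-cpu-py310", "us-east-1"))
```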

8. sagemaker.inputs

Purpose: Manages input data channels for training jobs. Provides utilities for data location and format specification. Supports different input modes like File, Pipe, and FastFile. Enables data partitioning for distributed training. Handles input data validation and preprocessing.

Key Submodules:

  • channels - Channel definition utilities
  • fileio - File I/O helpers
  • pipe_mode - Support for pipe mode input
  • transformer_input - Input utilities for batch transforms
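
Data partitioning for distributed training usually means choosing between giving every instance the full dataset (`FullyReplicated`) and giving each instance a subset of the objects (`ShardedByS3Key`). The sketch below illustrates the sharded case with a simple round-robin split; the assignment rule and the `shard_keys` helper are assumptions, since the exact distribution logic is not described here.

```python
def shard_keys(keys, instance_count):
    """Split object keys across instances round-robin, in sorted key order."""
    if instance_count < 1:
        raise ValueError("instance_count must be at least 1")
    shards = [[] for _ in range(instance_count)]
    for index, key in enumerate(sorted(keys)):
        shards[index % instance_count].append(key)
    return shards


if __name__ == "__main__":
    keys = ["part-3", "part-1", "part-0", "part-2", "part-4"]
    for instance, shard in enumerate(shard_keys(keys, 2)):
        print(instance, shard)
```

With sharding, each instance reads less data, but the training script has to tolerate instances seeing different examples.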

9. sagemaker.jumpstart

Purpose: Provides access to pre-trained models and solution templates. Simplifies using foundation models from model hubs. Enables quick experimentation with state-of-the-art models. Supports fine-tuning of pre-trained models. Offers industry-specific solution templates.

Key Submodules:

  • models - Pre-trained model access
  • artifacts - Model artifact utilities
  • templates - Solution templates
  • fine_tuning - Fine-tuning utilities
  • hub_models - Integration with model hubs

10. sagemaker.model

Purpose: Handles model deployment and management in SageMaker. Provides functionality for creating and hosting model endpoints. Supports batch transform jobs for offline inference. Enables multi-model endpoints for cost efficiency. Handles model artifact management and versioning.

Key Submodules:

  • model - Core model functionality
  • container_def - Container definition utilities
  • model_package - Model package functionality
  • inference_recommender - Endpoint configuration recommendations

11. sagemaker.model_monitor

Purpose: Implements monitoring capabilities for models in production. Detects data drift and model quality issues. Provides automated alerting for model performance degradation. Enables custom monitoring schedules and thresholds. Supports visualizing monitoring results.

Key Submodules:

  • data_capture - Data capture configuration
  • model_monitoring - Core monitoring functionality
  • dataset_format - Dataset format utilities
  • monitoring_files - Monitoring file management
  • visualize - Visualization tools for monitoring results
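
Data drift detection generally compares statistics of captured production data against a baseline computed from training data. The sketch below uses a deliberately simple test (shift of the mean, measured in baseline standard deviations); the `detect_drift` helper and its threshold are hypothetical and much cruder than what a monitoring service would apply.

```python
import statistics


def detect_drift(baseline, current, max_std_devs=3.0):
    """Flag drift when the current mean moves more than `max_std_devs`
    baseline standard deviations away from the baseline mean."""
    base_mean = statistics.fmean(baseline)
    base_std = statistics.pstdev(baseline)
    shift = abs(statistics.fmean(current) - base_mean)
    if base_std == 0:
        # A constant baseline: any change in the mean counts as drift.
        return shift > 0
    return shift / base_std > max_std_devs


if __name__ == "__main__":
    baseline = [10, 11, 9, 10, 10]
    print(detect_drift(baseline, [10, 10.5, 9.5]))
    print(detect_drift(baseline, [15, 16, 14]))
```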

12. sagemaker.automl

Purpose: Provides automated machine learning capabilities. Automatically finds the best algorithm and hyperparameters for a dataset. Handles feature engineering and model selection. Supports explainability for AutoML models. Enables ensembling of top-performing models.

Key Submodules:

  • automl - Core AutoML functionality
  • candidate_estimator - Candidate model management
  • automl_config - Configuration for AutoML jobs
  • candidate_selector - Model selection utilities

13. sagemaker.processing

Purpose: Handles data processing jobs separately from training. Provides data preprocessing, feature engineering, and model evaluation capabilities. Supports custom processing scripts and containers. Enables distributed processing for large datasets. Integrates with SageMaker Pipelines.

Key Submodules:

  • processor - Core processing functionality
  • framework_processor - Framework-specific processors
  • processing_input - Input configuration for processing
  • processing_output - Output configuration for processing

14. sagemaker.remote_function

Purpose: Enables running Python functions on SageMaker infrastructure. Simplifies moving compute-intensive workloads to the cloud. Provides seamless local-to-remote execution. Handles dependencies and environment setup automatically. Supports asynchronous execution and job management.

Key Submodules:

  • remote - Core remote execution functionality
  • decorators - Function decorators for remote execution
  • context - Execution context management
  • config - Configuration for remote execution
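
The decorator pattern behind remote execution can be sketched without any cloud infrastructure. In the example below a local thread pool stands in for the remote compute, and the `run_elsewhere` decorator is a hypothetical name; it only illustrates how a decorator can turn a plain function call into a submitted job with a future-like handle, not how the SDK packages dependencies or launches jobs.

```python
import functools
from concurrent.futures import ThreadPoolExecutor

# Local stand-in for remote infrastructure (assumption for illustration).
_executor = ThreadPoolExecutor(max_workers=2)


def run_elsewhere(func):
    """Hypothetical decorator: calling the function submits it as a job
    and returns a Future instead of the result."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return _executor.submit(func, *args, **kwargs)

    return wrapper


@run_elsewhere
def divide(x, y):
    return x / y


if __name__ == "__main__":
    job = divide(10, 4)   # returns immediately with a Future
    print(job.result())   # blocks until the job has finished
```

Returning a future keeps the call site asynchronous; a blocking variant would simply call `.result()` inside the wrapper.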

15. sagemaker.session

Purpose: Manages SageMaker sessions and interactions with AWS services. Provides the primary interface for SageMaker API calls. Handles authentication and credentials. Enables resource tracking and management. Supports logging and error handling.

Key Submodules:

  • session - Core session functionality
  • boto_session - Boto3 session management
  • custom_session - Custom session configurations
  • s3_utils - S3 integration utilities

16. sagemaker.tensorflow

Purpose: Provides TensorFlow integration with SageMaker. Enables training and deploying TensorFlow models. Supports distributed TensorFlow training. Handles TensorFlow-specific optimizations. Provides pre-built TensorFlow containers.

Key Submodules:

  • estimator - TensorFlow estimators
  • model - TensorFlow model deployment
  • predictor - TensorFlow prediction utilities
  • serving - TensorFlow Serving integration

17. sagemaker.pytorch

Purpose: Provides PyTorch integration with SageMaker. Enables training and deploying PyTorch models. Supports distributed PyTorch training. Handles PyTorch-specific optimizations. Provides pre-built PyTorch containers.

Key Submodules:

  • estimator - PyTorch estimators
  • model - PyTorch model deployment
  • predictor - PyTorch prediction utilities
  • distributed - Distributed training utilities for PyTorch

18. sagemaker.huggingface

Purpose: Integrates Hugging Face models and libraries with SageMaker. Enables training and fine-tuning transformer models. Provides optimized containers for NLP tasks. Supports deployment of Hugging Face models. Handles Hugging Face-specific configurations.

Key Submodules:

  • estimator - Hugging Face estimators
  • model - Hugging Face model deployment
  • predictor - Hugging Face prediction utilities
  • transformers - Transformer model utilities

19. sagemaker.workflow

Purpose: Enables building and managing ML workflows and pipelines. Provides DAG-based pipeline construction. Supports pipeline versioning and execution tracking. Enables conditional execution and branching in workflows. Integrates with SageMaker Projects for MLOps.

Key Submodules:

  • pipeline - Core pipeline functionality
  • steps - Pipeline step definitions
  • parameters - Pipeline parameter utilities
  • conditions - Conditional execution components
  • properties - Property reference utilities
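
DAG-based construction means each step declares what it depends on, and a valid execution order is derived from those dependencies. The sketch below shows that ordering with the standard library's `graphlib` (Python 3.9+); the step names and the `execution_order` helper are hypothetical, and no SDK pipeline classes are used.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE_STEPS = {
    "preprocess": set(),
    "train": {"preprocess"},
    "evaluate": {"train"},
    "register": {"evaluate"},
}


def execution_order(dependencies):
    """Return the steps in an order that respects every dependency.

    Raises graphlib.CycleError if the dependencies do not form a DAG.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    print(execution_order(PIPELINE_STEPS))
    try:
        execution_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("cycle detected: not a valid pipeline")
```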

20. sagemaker.serializers

Purpose: Handles data serialization for model inference. Provides serializers for different data formats like JSON, CSV, and NPY. Supports custom serialization logic. Enables framework-specific serialization. Handles batch serialization for efficiency.

Key Submodules:

  • base - Base serializer functionality
  • json - JSON serialization
  • numpy - NumPy array serialization
  • pandas - DataFrame serialization
  • text - Text serialization
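
Serialization here means turning in-memory Python data into the request body an endpoint expects. The sketch below shows the idea for CSV and JSON using only the standard library; `serialize_csv` and `serialize_json` are hypothetical helpers, not the SDK's serializer classes.

```python
import csv
import io
import json


def serialize_csv(rows):
    """Encode a list of rows as CSV text, one row per line, no trailing newline."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue().rstrip("\n")


def serialize_json(data):
    """Encode JSON-compatible data as a JSON string."""
    return json.dumps(data)


if __name__ == "__main__":
    print(serialize_csv([[1, 2.5, "a"], [3, 4, "b,c"]]))
    print(serialize_json({"instances": [[1, 2]]}))
```

Fields that contain the delimiter are quoted by the `csv` module, which is why it is preferable to joining values with commas by hand.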

21. sagemaker.deserializers

Purpose: Handles data deserialization after model inference. Provides deserializers for different data formats like JSON, CSV, and NPY. Supports custom deserialization logic. Enables framework-specific deserialization. Handles batch deserialization for efficiency.

Key Submodules:

  • base - Base deserializer functionality
  • json - JSON deserialization
  • numpy - NumPy array deserialization
  • csv - CSV deserialization
  • stream - Stream deserialization
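
Deserialization is the reverse step: the raw response bytes are decoded according to their content type. The sketch below dispatches on two common content types with the standard library; the `deserialize` helper is hypothetical and is not one of the SDK's deserializer classes.

```python
import csv
import io
import json


def deserialize(body, content_type):
    """Decode response bytes into Python data based on the content type.

    JSON becomes dicts/lists; CSV becomes a list of rows of strings.
    """
    text = body.decode("utf-8")
    if content_type == "application/json":
        return json.loads(text)
    if content_type == "text/csv":
        return list(csv.reader(io.StringIO(text)))
    raise ValueError(f"unsupported content type: {content_type}")


if __name__ == "__main__":
    print(deserialize(b'{"predictions": [0.1, 0.9]}', "application/json"))
    print(deserialize(b"1,2\n3,4", "text/csv"))
```

Note that CSV values come back as strings, so numeric conversion is left to the caller in this sketch.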

22. sagemaker.predictor

Purpose: Enables making predictions with deployed models. Handles request and response formatting. Provides batch prediction capabilities. Supports different prediction protocols. Enables integrating custom pre- and post-processing logic.

Key Submodules:

  • predictor - Core prediction functionality
  • prediction_error - Error handling for predictions
  • async_predictor - Asynchronous prediction support
  • batch - Batch prediction utilities

23. sagemaker.lineage

Purpose: Implements tracking of model and data lineage. Enables understanding relationships between datasets, jobs, and models. Supports auditability and compliance requirements. Provides visualization of lineage graphs. Enables tracking of model provenance.

Key Submodules:

  • artifact - Artifact tracking
  • action - Action tracking
  • context - Context management
  • association - Relationship tracking
  • query - Lineage query utilities
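
Lineage tracking can be pictured as a directed graph in which associations link artifacts and actions, and a lineage query walks that graph. The sketch below finds everything a model was derived from using a breadth-first traversal; the entity names, the edge list, and the `upstream` helper are hypothetical and do not use the SDK's lineage classes.

```python
from collections import deque

# Hypothetical associations: (source, destination) means the destination
# was derived from, or produced using, the source.
ASSOCIATIONS = [
    ("raw-dataset", "processing-job"),
    ("processing-job", "train-dataset"),
    ("train-dataset", "training-job"),
    ("base-image", "training-job"),
    ("training-job", "model"),
]


def upstream(entity, associations):
    """Return the set of all entities that `entity` transitively depends on."""
    parents = {}
    for source, destination in associations:
        parents.setdefault(destination, []).append(source)

    seen = set()
    queue = deque([entity])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


if __name__ == "__main__":
    print(sorted(upstream("model", ASSOCIATIONS)))
```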

24. sagemaker.experiments

Purpose: Enables tracking and managing machine learning experiments. Provides experiment and trial components for organization. Supports metrics tracking and artifact logging. Enables experiment comparison and analysis. Integrates with SageMaker Studio for visualization.

Key Submodules:

  • experiment - Core experiment functionality
  • trial - Trial management
  • trial_component - Trial component tracking
  • run - Run management
  • metrics - Metrics tracking utilities

25. sagemaker.tuner

Purpose: Provides hyperparameter tuning capabilities. Enables automatic optimization of model hyperparameters. Supports search strategies such as Bayesian optimization, grid search, and random search. Handles distributed tuning jobs. Provides visualization of tuning results.

Key Submodules:

  • tuner - Core tuning functionality
  • hyperparameter_ranges - Hyperparameter range definitions
  • objective_metric - Objective metric specifications
  • job_definition - Tuning job configuration
  • analysis - Tuning result analysis
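
Of the search strategies mentioned above, random search is the simplest to sketch: sample each hyperparameter from its range, evaluate the objective, and keep the best result. The example below runs locally with a toy objective; the range format, the `random_search` helper, and the objective are assumptions for illustration, and a real tuning job would launch a training job per trial instead.

```python
import random


def random_search(objective, ranges, trials, seed=0):
    """Minimize `objective` by sampling hyperparameters at random.

    ranges: {name: (kind, low, high)} with kind "int" or "float".
    Returns (best_params, best_score).
    """
    rng = random.Random(seed)  # seeded for reproducible sampling
    best_params, best_score = None, float("inf")
    for _ in range(trials):
        params = {}
        for name, (kind, low, high) in ranges.items():
            if kind == "int":
                params[name] = rng.randint(low, high)
            else:
                params[name] = rng.uniform(low, high)
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    """Hypothetical stand-in for a validation loss; lowest near lr=0.05, depth=5."""
    return (params["learning_rate"] - 0.05) ** 2 + (params["depth"] - 5) ** 2


if __name__ == "__main__":
    ranges = {"learning_rate": ("float", 0.001, 0.1), "depth": ("int", 2, 8)}
    print(random_search(toy_objective, ranges, trials=20))
```

Bayesian optimization differs in that it uses the scores of earlier trials to choose the next sample rather than drawing independently each time.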