Feature Store in Machine Learning

---
title: Feature Store in Machine Learning
tags: ML, VTB
description: View the slide with "Slide Mode".
---

:bookmark_tabs: Links

- https://github.com/logicalclocks/hopsworks
- https://github.com/linkedin/feathr
- https://docs.featurestore.org/feature-stores-faq
- https://eng.uber.com/michelangelo-machine-learning-platform/

# Feature Store Overview

slide: https://hackmd.io/p/qKKFmdE5SVWBM-N6brM_tg

:point_right: TL;DR: A feature store provides a centralized repository for organizing, storing, and serving ML features.

[EXISTING FEATURE STORES](https://www.featurestore.org/)

Uber built Palette. Airbnb built Zipline. Netflix built Time Travel. Google Cloud + GoJek built Feast...

---

![](https://i.imgur.com/T9cEuGT.png)

---

:question: Why talk about feature stores?

:point_right: ***To use, serve, and share features, to track data lineage, and to avoid the training-serving skew problem***

---

### Traditional solutions

- Putting the preprocessing code within the model

  Ex: Keras/PyTorch preprocessing layers

  ![](https://i.imgur.com/SWXJm8m.png)

  > Pros:
  > - Simplicity: no extra infrastructure is required.
  > - Easy to deploy, with nothing special to do. The SavedModel format contains all the necessary information.
  >
  > Cons:
  > - Preprocessing steps are wastefully repeated on every pass through the training dataset.
  > - The preprocessing code has to be implemented in the same framework as the ML model.

- Using a transform function

  Ex: [TensorFlow TFX](https://www.tensorflow.org/tfx)

  ![](https://i.imgur.com/AKwsctV.png)

  > Pros:
  > - The raw data is not needed during each training iteration.
  >
  > Cons:
  > - Adds complexity
  > - No sharing...
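
The transform-function idea above can be sketched in plain Python: fit the statistics once on the raw training data, materialize the transformed features before training, and reuse the same fitted transform at serving time. This is only an illustrative sketch, not the TFX API; `Standardizer`, `fit_standardizer`, and the sample values are hypothetical names and data made up for this example.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Sequence


@dataclass(frozen=True)
class Standardizer:
    """Statistics fitted once on the training data (hypothetical helper)."""
    mean: float
    std: float

    def transform(self, value: float) -> float:
        # The same code path is used for training and for serving,
        # so the two cannot drift apart.
        return (value - self.mean) / self.std


def fit_standardizer(values: Sequence[float]) -> Standardizer:
    # One full pass over the raw data; fall back to 1.0 if the std is zero.
    std = pstdev(values)
    return Standardizer(mean=mean(values), std=std if std > 0 else 1.0)


# Made-up raw training column.
raw_training_ages = [20.0, 30.0, 40.0]
scaler = fit_standardizer(raw_training_ages)

# Materialize the transformed features once, before training starts,
# so the raw data is not reprocessed on every training iteration.
training_features = [scaler.transform(v) for v in raw_training_ages]

# At serving time, reuse the fitted transform on the incoming raw value.
serving_feature = scaler.transform(30.0)
```

Because the fitted statistics live with one model's pipeline, another model that wants the same feature would have to refit or copy them, which is the sharing gap that the feature store approach targets.
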

![Traditional Store](https://i.imgur.com/GnyWhWt.png)

- **Using a feature store**

  ![](https://i.imgur.com/A6rEki3.png)

  > When:
  > - A prediction request needs additional features that have to be computed
  > - You want to prevent unnecessary copies of the data (sharing features among models)
  > - Models need historical and contextual data. Ex: embedding models, features that change dynamically while streaming

![](https://i.imgur.com/RHcpJzk.png)

## Feature Store in Detail

### Existing solutions

![](https://i.imgur.com/OBnF0Jg.png)

### Components

![](https://i.imgur.com/gIlcVXB.png)

- Feature Registry
  > Provides search and discovery of features.
- Operational Monitoring
  > Monitors the correctness and quality of the data in the feature store.
- Transform
  > Processes raw data variables into features. There are 3 types of transformations:
  > 1. Batch: data at rest, typically archived in a data warehouse, such as user transaction history
  > 2. Stream: data in motion, typically in a pub/sub engine, such as the number of clicks in the current session
  > 3. On-demand: data available only at request time, which cannot be precomputed and comes from the frontend application, such as the user's IP address
- Storage
  > Feature stores provide offline storage (Redshift, Snowflake, S3, BigQuery, or HDFS) and online storage (Redis, Cassandra, MongoDB, DynamoDB, Elasticsearch, Solr, etc.).
- Serving

### Architecture (by Hopsworks)

![](https://i.imgur.com/5dJDIOi.png)

![](https://i.imgur.com/kkIqijX.png)

### Feature Store vs Data Warehouse

https://www.hopsworks.ai/post/feature-store-vs-data-warehouse

### Example

- https://www.hopsworks.ai/post/show-me-the-code-how-we-linked-notebooks-to-features
- https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/feature_store/gapic-feature-store.ipynb#scrollTo=foNB0D2aw37c

---

## When to use a Feature Store?

![](https://i.imgur.com/Opo9xsP.png)

### Wrap up

---

### Thank you! :sheep:
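
---

Backup sketch: the components listed earlier (registry, offline storage, online storage, serving) can be mimicked with a toy in-memory class. This is only an assumption-laden illustration and not how Hopsworks, Feast, or any real feature store is implemented; `TinyFeatureStore`, its methods, and the sample values are hypothetical.

```python
class TinyFeatureStore:
    """In-memory toy with a registry, an offline store, and an online store."""

    def __init__(self):
        self.registry = {}   # feature name -> description (search and discovery)
        self.offline = {}    # (feature, entity) -> [(ts, value), ...] full history
        self.online = {}     # (feature, entity) -> (ts, latest value)

    def register(self, name, description):
        self.registry[name] = description

    def ingest(self, name, entity_id, ts, value):
        if name not in self.registry:
            raise KeyError(f"unregistered feature: {name}")
        key = (name, entity_id)
        # Offline store keeps every row so training sets can be rebuilt.
        self.offline.setdefault(key, []).append((ts, value))
        # Online store keeps only the most recent value for serving.
        if key not in self.online or ts >= self.online[key][0]:
            self.online[key] = (ts, value)

    def get_online(self, names, entity_id):
        # Lookup used at prediction time; raises KeyError if a value is missing.
        return {n: self.online[(n, entity_id)][1] for n in names}

    def get_historical(self, names, entity_id, as_of):
        # Point-in-time lookup: latest value with ts <= as_of, else None.
        result = {}
        for n in names:
            past = [row for row in self.offline.get((n, entity_id), []) if row[0] <= as_of]
            result[n] = max(past, key=lambda row: row[0])[1] if past else None
        return result


store = TinyFeatureStore()
store.register("clicks_in_session", "number of clicks in the current session")
store.ingest("clicks_in_session", "user_1", ts=1, value=3)
store.ingest("clicks_in_session", "user_1", ts=5, value=7)

online = store.get_online(["clicks_in_session"], "user_1")
historical = store.get_historical(["clicks_in_session"], "user_1", as_of=2)
```

Training reads through `get_historical` and serving reads through `get_online`, but both are fed by the same `ingest` call, so several models can share one definition of a feature.
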