---
title: Feature Store in Machine Learning
tags: ML, VTB
description: View the slide with "Slide Mode".
---
:bookmark_tabs: Links
https://github.com/logicalclocks/hopsworks
https://github.com/linkedin/feathr
https://docs.featurestore.org/feature-stores-faq
https://eng.uber.com/michelangelo-machine-learning-platform/
# Feature Store Overview
slide: https://hackmd.io/p/qKKFmdE5SVWBM-N6brM_tg
:point_right: TL;DR: A feature store provides a centralized repository for organizing, storing, and serving ML features.
[EXISTING FEATURE STORES](https://www.featurestore.org/)
Uber built Palette. Airbnb built Zipline. Netflix built Time Travel. Google Cloud + GoJek built Feast...
---

:question: Why talk about a Feature Store?
:point_right: ***To reuse, serve, and share features, track data lineage, and avoid the training-serving skew problem***
---
### Traditional solutions
- Putting the preprocessing code within the model
Ex: Keras / PyTorch preprocessing layers

> Pros:
- Simplicity
- No extra infrastructure is required.
- Easy to deploy with no special steps: the exported model (for example, a TensorFlow SavedModel) contains all the necessary information.
> Cons:
- Preprocessing steps are wastefully repeated on every pass through the training dataset
- The preprocessing code has to be implemented in the same framework as the ML model
- Using a transform function
Ex: [TensorFlow TFX](https://www.tensorflow.org/tfx)

> Pros:
- No need to reprocess the raw data on each iteration
> Cons:
- Adds complexity
- No sharing...
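
The transform-function idea above can be sketched in plain Python. This is a minimal illustration, not TFX code: `fit_scaler` and `transform` are hypothetical helpers, and the sample values are made up. The statistics are computed once, the training features are materialized once, and the same function is reused at serving time.

```python
from statistics import mean, pstdev

def fit_scaler(values):
    """Single pass over the training data: compute the statistics once."""
    mu = mean(values)
    sigma = pstdev(values) or 1.0  # fall back to 1.0 for constant columns
    return {"mean": mu, "std": sigma}

def transform(value, params):
    """Same scaling logic for training and serving (hypothetical helper)."""
    return (value - params["mean"]) / params["std"]

# Made-up training column.
train_amounts = [10.0, 20.0, 30.0, 40.0]
params = fit_scaler(train_amounts)

# Materialized once, then reused on every epoch instead of being recomputed.
train_features = [transform(v, params) for v in train_amounts]

# Serving applies the identical function with the stored parameters.
serving_feature = transform(25.0, params)
```

Because both paths call the same `transform` with the same stored `params`, training and serving cannot drift apart, but nothing here helps other models discover or reuse the feature.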

- **Using a feature store**

> When
- A prediction request needs additional features that have to be computed beforehand
- You want to avoid unnecessary copies of the data (sharing features among models)
- Models need history and context data
Ex: embedding models, features that change dynamically while streaming
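
As a rough sketch of the first case, a prediction request that carries only an entity ID can be enriched with precomputed features before it reaches the model. The `online_store` dictionary, the feature names, and `build_feature_vector` are all assumptions made for illustration; they are not the API of any particular feature store.

```python
# Hypothetical online store: entity key -> precomputed features.
online_store = {
    "user_42": {"txn_count_30d": 17, "avg_basket_value": 35.5},
}

def build_feature_vector(request, store, feature_names):
    """Merge request-time fields with precomputed features for the same entity."""
    stored = store.get(request["user_id"], {})
    vector = dict(request["context"])  # copy so the request is not mutated
    for name in feature_names:
        vector[name] = stored.get(name)  # None marks a missing feature
    return vector

request = {"user_id": "user_42", "context": {"cart_size": 3}}
vector = build_feature_vector(
    request, online_store, ["txn_count_30d", "avg_basket_value"]
)
```

Several models can read the same `online_store` entries, which is what removes the per-model copies of the data.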

## Detail on Feature Store
### Existing solution

### Components

- Feature Registry
> It provides search & discovery of features
- Operational Monitoring
> The feature store monitors data correctness and data quality.
- Transform
> Processes raw data variables into features. There are 3 types of transformations:
- 1. Batch: data at rest, archived data typically held in a data warehouse, such as user transaction history
- 2. Stream: data in motion, typically in a pub/sub engine, such as the number of clicks in the current session
- 3. On-demand: data available only at request time, which cannot be precomputed and comes from the frontend application, such as the user's IP address
- Storage
> Feature stores provide both offline storage (Redshift, Snowflake, S3, BigQuery, or HDFS) and online storage (Redis, Cassandra, MongoDB, DynamoDB, Elasticsearch, Solr, etc.).
- Serving
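
To make the components concrete, here is a toy in-memory sketch, assuming a single process and integer timestamps. `TinyFeatureStore` and its methods are hypothetical names; real systems back the offline and online stores with the databases listed above.

```python
from bisect import bisect_right
from collections import defaultdict

class TinyFeatureStore:
    """Toy model of registry, offline storage, online storage, and serving."""

    def __init__(self):
        self.registry = {}                # feature name -> description
        self.offline = defaultdict(list)  # (entity, feature) -> [(ts, value)], sorted by ts
        self.online = {}                  # (entity, feature) -> latest value

    def register(self, name, description):
        self.registry[name] = description

    def search(self, keyword):
        """Feature registry: discovery by name or description."""
        return sorted(
            n for n, d in self.registry.items() if keyword in n or keyword in d
        )

    def write(self, entity, name, ts, value):
        """Store a transformed value in both the offline and online stores."""
        if name not in self.registry:
            raise KeyError(f"unregistered feature: {name}")
        history = self.offline[(entity, name)]
        history.append((ts, value))
        history.sort(key=lambda item: item[0])
        self.online[(entity, name)] = history[-1][1]

    def get_online(self, entity, name):
        """Serving path: latest value for low-latency inference."""
        return self.online.get((entity, name))

    def get_as_of(self, entity, name, ts):
        """Training path: value as known at time ts (point-in-time lookup)."""
        history = self.offline.get((entity, name), [])
        idx = bisect_right([t for t, _ in history], ts)
        return history[idx - 1][1] if idx else None

store = TinyFeatureStore()
store.register("txn_count_30d", "batch: transactions in the last 30 days")
store.register("clicks_in_session", "stream: clicks in the current session")
store.write("user_42", "txn_count_30d", ts=1, value=5)
store.write("user_42", "txn_count_30d", ts=3, value=8)
```

Reading the same feature through `get_online` for inference and `get_as_of` for building training sets is one way to keep the two paths consistent.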
### Architecture
(by Hopsworks)


### Feature Store vs Data Warehouse
https://www.hopsworks.ai/post/feature-store-vs-data-warehouse
### Example
https://www.hopsworks.ai/post/show-me-the-code-how-we-linked-notebooks-to-features
https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/official/feature_store/gapic-feature-store.ipynb#scrollTo=foNB0D2aw37c
---
## When to use a Feature Store?

### Wrap up
---
### Thank you! :sheep: