Tools: Feast, Hopsworks

Feature Stores in Modern Machine Learning

As machine learning (ML) systems mature, the gap between model development and production deployment has widened. Data scientists often find that features computed offline for training differ from those computed online for serving, a problem commonly called training-serving skew. Feature stores have emerged to close this gap as a dedicated layer in the MLOps ecosystem. Two of the most prominent options are Feast, which is open source, and Hopsworks, which is available in open-source and enterprise editions. These platforms standardize feature management, ensuring consistency, reproducibility, and scalability across ML pipelines.

1. What Are Feature Stores?

A feature store is a centralized system for managing and serving features—the data attributes used by machine learning models. It provides a consistent interface to store, retrieve, and share features across teams and environments. Instead of manually recreating feature logic for each use case, teams define features once and reuse them seamlessly in training and inference.

Feature Store Capabilities

  • Feature Engineering Standardization – Centralized transformation logic shared between offline (batch) and online (real-time) use cases.
  • Feature Serving – Low-latency online store for real-time inference.
  • Versioning and Lineage – Track feature versions, data sources, and transformations.
  • Integration – Connect with data warehouses, stream processors, and model-serving tools.
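
The first two capabilities can be illustrated with a minimal, library-free sketch. It assumes a hypothetical `avg_transaction_value` transformation and an in-memory dictionary standing in for the online store; neither is part of Feast or Hopsworks. The only point is that one function produces the feature for both the batch path and the real-time path.

```python
from statistics import mean

def avg_transaction_value(amounts):
    """Single feature definition reused by both paths (hypothetical example)."""
    return round(mean(amounts), 2) if amounts else 0.0

# Raw transaction amounts per customer (illustrative data).
raw = {"c1": [10.0, 30.0], "c2": [5.0]}

# Offline (batch) path: build one training row per customer.
training_rows = [
    {"customer_id": cid, "avg_transaction_value": avg_transaction_value(amounts)}
    for cid, amounts in raw.items()
]

# Online path: precompute with the same function into a key-value store.
online_store = {
    cid: {"avg_transaction_value": avg_transaction_value(amounts)}
    for cid, amounts in raw.items()
}

def get_online_features(customer_id):
    """Look up precomputed features by entity key at inference time."""
    return online_store[customer_id]
```

Because both paths call the same function, a change to the feature logic reaches training and serving together, which is the consistency property a feature store is meant to provide.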

2. Introducing Feast

Feast (Feature Store) is an open-source feature store originally developed by Gojek in collaboration with Google Cloud. It focuses on simplicity, scalability, and integration flexibility. Feast provides a unified interface for feature retrieval, enabling data scientists and engineers to define, materialize, and serve features efficiently.

Key Concepts in Feast

Concept         Description
--------------  ---------------------------------------------------------------------------------
Feature View    A logical definition of features computed from a source dataset.
Entity          A unique identifier for grouping features (e.g., user_id, product_id).
Feature Store   The core component managing metadata, offline/online storage, and retrieval APIs.

Example: Defining a Feature View

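from datetime import timedelta

from feast import Entity, FeatureStore, FeatureView, Field, FileSource
from feast.types import Float32, Int64

# Entity: the key that features are grouped and looked up by.
customer = Entity(name="customer_id")

# Batch source for the view; the path and timestamp column are placeholders.
transactions_source = FileSource(
    path="data/customer_transactions.parquet",
    timestamp_field="event_timestamp",
)

transactions_view = FeatureView(
    name="customer_transactions",
    entities=[customer],
    ttl=timedelta(days=7),
    schema=[
        Field(name="transaction_count", dtype=Int64),
        Field(name="avg_transaction_value", dtype=Float32),
    ],
    source=transactions_source,
)

# Register the definitions in the feature repository at the current path.
store = FeatureStore(repo_path=".")
store.apply([customer, transactions_view])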
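
The `ttl` argument bounds how far back a feature value may be joined when building training data. The following library-free sketch illustrates the idea of a point-in-time lookup with a TTL. It is not Feast's implementation; the `point_in_time_lookup` helper and the sample rows are hypothetical.

```python
from datetime import datetime, timedelta

def point_in_time_lookup(rows, entity_ts, ttl):
    """Return the newest value observed at or before entity_ts and within ttl.

    rows: list of (event_timestamp, value) pairs for a single entity.
    Returns None when no row qualifies, so future values never leak into
    training data and stale values are not joined.
    """
    candidates = [
        (ts, value) for ts, value in rows
        if entity_ts - ttl <= ts <= entity_ts
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda pair: pair[0])[1]

# transaction_count observations for one customer (illustrative data).
rows = [
    (datetime(2024, 1, 1), 3),
    (datetime(2024, 1, 5), 7),
    (datetime(2024, 1, 20), 9),
]
ttl = timedelta(days=7)

on_jan_6 = point_in_time_lookup(rows, datetime(2024, 1, 6), ttl)    # 7
on_jan_15 = point_in_time_lookup(rows, datetime(2024, 1, 15), ttl)  # None
```

The January 15 lookup returns nothing because the most recent earlier observation (January 5) is older than the seven-day TTL, and the January 20 observation lies in the future relative to the lookup time.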

Architecture Overview

               +-----------------------------+
               |       Feast CLI / SDK       |
               +-------------+---------------+
                             |
                +------------v-----------+
                |    Feature Registry    |
                +------------+-----------+
                             |
         +-------------------+-------------------+
         |                                       |
+--------v--------+                     +--------v--------+
|  Offline Store  |                     |  Online Store   |
| (e.g. BigQuery) |                     |  (e.g. Redis)   |
+-----------------+                     +-----------------+
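
In this architecture the offline store holds the full feature history, while the online store holds only what serving needs. A minimal sketch of moving data between the two, a step usually called materialization, might look like the following. The `materialize` function and the in-memory stores here are hypothetical stand-ins, not the Feast API.

```python
from datetime import datetime

# Offline store: full history, one row per (entity, timestamp).
offline_store = [
    {"customer_id": "c1", "event_timestamp": datetime(2024, 1, 1), "transaction_count": 3},
    {"customer_id": "c1", "event_timestamp": datetime(2024, 1, 5), "transaction_count": 7},
    {"customer_id": "c2", "event_timestamp": datetime(2024, 1, 3), "transaction_count": 1},
]

def materialize(rows, end):
    """Copy the latest row per entity (up to `end`) into a key-value online store."""
    online = {}
    for row in sorted(rows, key=lambda r: r["event_timestamp"]):
        if row["event_timestamp"] <= end:
            # Later rows overwrite earlier ones, so only the newest value survives.
            online[row["customer_id"]] = {"transaction_count": row["transaction_count"]}
    return online

online_store = materialize(offline_store, end=datetime(2024, 1, 4))
```

In Feast itself, this step corresponds to the `feast materialize` command and the `FeatureStore.materialize` method, which load feature values from the offline store into the configured online store.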

Why Engineers Choose Feast

  • Open-source and cloud-agnostic.
  • Supports hybrid batch + streaming ingestion.
  • Integrates with Redis, BigQuery, Snowflake, and Kafka.
  • Lightweight and easy to embed into existing MLOps workflows (e.g., Kubeflow, MLflow).

3. Introducing Hopsworks

Hopsworks is a full-featured feature store and ML platform built by Logical Clocks. It is available in open-source and enterprise editions and is deeply integrated with the HopsFS distributed file system and Apache Hudi for versioned feature storage. Hopsworks extends beyond feature management to include lineage tracking, model serving, and observability.

Hopsworks Highlights

  • Online and Offline Stores – The online store is backed by MySQL Cluster and the offline store by Apache Hudi on HopsFS.
  • Feature Pipelines – Integration with Spark, Kafka, and Airflow for end-to-end automation.
  • Feature Governance – Includes data validation, access control, and feature versioning.
  • Seamless MLOps – Built-in integration with TensorFlow Extended (TFX), SageMaker, and Databricks.

Example: Writing Features to Hopsworks

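import hopsworks
import pandas as pd

# Connect to a Hopsworks project (requires an API key or existing credentials).
project = hopsworks.login()
fs = project.get_feature_store()

# Example data; in practice this comes from a feature pipeline.
transactions_df = pd.DataFrame({
    "customer_id": [1, 2],
    "transaction_count": [12, 3],
    "avg_transaction_value": [41.5, 18.0],
})

# Fetch the feature group if it exists, otherwise define a new one.
transactions_fg = fs.get_or_create_feature_group(
    name="customer_transactions",
    version=1,
    primary_key=["customer_id"],
    description="Customer transaction features",
)

transactions_fg.insert(transactions_df)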
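
The `primary_key` and `version` arguments shape how writes behave: rows are identified by their primary key within one specific version of the feature group. The sketch below is a toy in-memory model of that idea, not the Hopsworks client. The `ToyFeatureGroup` class and `toy_get_or_create` function are hypothetical, and upsert-on-primary-key is assumed as the write mode.

```python
class ToyFeatureGroup:
    """Toy in-memory feature group keyed by primary key (hypothetical)."""

    def __init__(self, name, version, primary_key):
        self.name = name
        self.version = version
        self.primary_key = primary_key
        self.rows = {}

    def insert(self, records):
        """Upsert: new keys are added, existing keys are overwritten."""
        for record in records:
            key = tuple(record[column] for column in self.primary_key)
            self.rows[key] = record

registry = {}

def toy_get_or_create(name, version, primary_key):
    """Return the existing (name, version) group, or create and register it."""
    if (name, version) not in registry:
        registry[(name, version)] = ToyFeatureGroup(name, version, primary_key)
    return registry[(name, version)]

fg = toy_get_or_create("customer_transactions", 1, ["customer_id"])
fg.insert([{"customer_id": "c1", "transaction_count": 3}])
fg.insert([
    {"customer_id": "c1", "transaction_count": 7},  # overwrites the earlier c1 row
    {"customer_id": "c2", "transaction_count": 1},  # new key, appended
])
```

Requesting the same name with a different version yields a separate group, which is what allows a schema change to ship as version 2 without disturbing consumers of version 1.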

Hopsworks Architecture Diagram

         +-------------------------------------------+
         |            Hopsworks UI / API             |
         +-------------------------------------------+
                               |
        +----------------------+----------------------+
        |                      |                      |
+-------v--------+   +---------v---------+   +--------v--------+
| Feature Engine |   | Online Feature DB |   | Offline Storage |
| (Spark/Kafka)  |   |  (MySQL Cluster)  |   |  (Hudi/HopsFS)  |
+----------------+   +-------------------+   +-----------------+

Why Enterprises Use Hopsworks

  • Feature lineage and metadata tracking out of the box.
  • Enterprise-grade governance and access control.
  • Visual interface for feature management and exploration.
  • Cloud integrations: AWS, Azure, and GCP with managed deployments.

4. Comparing Feast and Hopsworks

Capability      Feast                                   Hopsworks
--------------  --------------------------------------  ------------------------------------------------
Primary Focus   Lightweight feature serving             Full-featured enterprise ML platform
Architecture    Decoupled (plug-in storage)             Tightly integrated (HopsFS, Hudi, MySQL)
Ease of Setup   Simple (Docker or local repo)           Complex (requires platform installation)
Use Cases       Fast feature serving, experimentation   End-to-end governance, compliance, large-scale ML
Integrations    MLflow, Kubeflow, Redis, BigQuery       SageMaker, Databricks, Airflow, TFX
Best For        Agile ML teams and startups             Enterprises needing governance and compliance

5. Industry Adoption and Ecosystem

  • Feast is widely adopted by companies like Robinhood, Netflix, and Grab for managing real-time ML features.
  • Hopsworks powers ML pipelines in enterprises like Ericsson, Volvo, and Scania where regulatory and lineage requirements are critical.

Feature Store Growth Trend (2022-2025)

[Figure: bar chart titled "Feature Store Market Growth (Estimates)"; x-axis 2022 to 2025 (projected), y-axis 0 to 100; bar values not recoverable.]

6. Integrating Feature Stores in MLOps Pipelines

Feature stores bridge the gap between data engineering and machine learning by ensuring that the same features used for model training are available in production. They integrate into CI/CD pipelines, automate version control, and provide unified monitoring.

Pipeline Example

Data Ingestion → Feature Computation → Feature Store (Feast/Hopsworks) → Model Training → Deployment → Online Inference

With orchestration frameworks like Kubeflow Pipelines, Airflow, and Dagster, teams can automate feature ingestion, validation, and model retraining. Tools like MLflow and Weights & Biases integrate seamlessly with Feast or Hopsworks to track experiments and lineage.

7. Best Practices for Using Feature Stores

  • Define features once – Reuse definitions for both training and serving.
  • Automate data validation using Great Expectations or TensorFlow Data Validation.
  • Implement observability to monitor drift and data freshness.
  • Leverage versioning for reproducibility and rollback.
  • Secure access through IAM roles and fine-grained permissions.
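
Two of these practices, validation and freshness monitoring, can be sketched without any particular library. The checks below are deliberately simple stand-ins for what Great Expectations or TensorFlow Data Validation would provide. The function names, the schema format, and the one-hour threshold are assumptions made for illustration.

```python
from datetime import datetime, timedelta

# Expected columns and Python types for one feature row (hypothetical schema).
SCHEMA = {"customer_id": str, "transaction_count": int, "avg_transaction_value": float}

def validate_row(row):
    """Return a list of problems; an empty list means the row passes."""
    problems = []
    for name, expected_type in SCHEMA.items():
        if name not in row:
            problems.append(f"missing {name}")
        elif not isinstance(row[name], expected_type):
            problems.append(f"{name} is not {expected_type.__name__}")
    count = row.get("transaction_count")
    if isinstance(count, int) and count < 0:
        problems.append("transaction_count is negative")
    return problems

def is_fresh(last_updated, now, max_age=timedelta(hours=1)):
    """True when the feature was updated within max_age of now."""
    return now - last_updated <= max_age

good = {"customer_id": "c1", "transaction_count": 3, "avg_transaction_value": 20.0}
bad = {"customer_id": "c2", "transaction_count": -1}
```

Running checks like these before writing to the feature store keeps malformed rows out of both training sets and the online store, and a freshness check on the serving path can flag features whose upstream pipeline has stalled.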

8. Emerging Trends (2025 and Beyond)

As AI and data infrastructure continue to evolve, feature stores are becoming foundational components of the MLOps stack. The next generation of feature stores focuses on:

  • Real-time Feature Streaming with Kafka and Flink integration.
  • Vector Stores for embedding management in LLM applications (e.g., Pinecone, Weaviate).
  • Federated Feature Stores for cross-cloud data sharing.
  • AI-assisted Feature Discovery leveraging metadata and data profiling.

9. Conclusion

Feature stores have moved from niche infrastructure to core data science tooling. Both Feast and Hopsworks provide powerful capabilities to operationalize ML, but they cater to different organizational needs. Feast excels in lightweight, flexible deployments, while Hopsworks shines in enterprise-scale governance and observability. The future lies in combining these principles to create self-service, intelligent data platforms that let ML teams innovate faster and more safely.
