The Federated Query Landscape: AWS Athena Federation, Trino, and Starburst in 2025
In today’s data ecosystem, information lives everywhere—S3 buckets, data warehouses, APIs, transactional databases, and even spreadsheets. The modern data engineer faces a fundamental challenge: how to query all of these sources efficiently without building fragile, expensive ETL pipelines. This post explores how AWS Athena Federation, Trino, and Starburst solve this problem by enabling federated querying—an approach that lets you access and join data across multiple systems as if they were one.
1. The Evolution of Federated Query Engines
Before 2020, querying distributed data usually meant moving it: data had to be copied, transformed, and centralized before analysis. The growth of cloud-native architectures and stricter data governance requirements made full centralization impractical for many teams. By 2025, federated query engines have become a pillar of data mesh architectures, in which domains own their data but still participate in global analytics.
At the heart of this movement are three systems that shaped the current landscape:
- Trino – The open-source distributed SQL query engine designed for low-latency analytics on large datasets.
- Starburst – The enterprise-grade platform built on Trino, offering governance, performance, and enterprise integrations.
- AWS Athena Federation – A fully managed, serverless query service that originally ran on Presto and, since Athena engine version 3, runs on a Trino-based engine, extended with federated connectors to access external data sources.
2. What Is Federated Querying?
Federated querying allows engineers and analysts to query multiple, disparate data sources using standard SQL without physically moving the data. Instead of building ETL pipelines to consolidate everything into a data warehouse, a federated query engine connects to each source and executes queries where the data resides.
        ┌───────────────────────────────┐
        │    Federated Query Engine     │
        │ (Trino / Starburst / Athena)  │
        └───────────────┬───────────────┘
                        │
          ┌─────────────┼─────────────┐
          ▼             ▼             ▼
        MySQL      S3 Parquet     Snowflake
       (RDBMS)     (Data Lake)   (Warehouse)
Federation saves on storage costs, avoids data duplication, and simplifies access management. However, it also introduces challenges around performance, query optimization, and cross-system consistency—areas where Athena, Trino, and Starburst differ significantly.
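To make the idea concrete, here is a minimal sketch of what a federated engine does, using two independent in-memory SQLite databases as stand-ins for separate source systems. The table names, sample rows, and the `federated_join` helper are all illustrative assumptions; real engines such as Trino or Athena add planning, parallelism, and connectors on top of this basic pattern.

```python
import sqlite3

def make_source(ddl, insert_sql, rows):
    """Create an independent in-memory database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn

# Two separate "systems": neither database can see the other's tables.
users_db = make_source(
    "CREATE TABLE users (user_id INTEGER, name TEXT)",
    "INSERT INTO users VALUES (?, ?)",
    [(1, "Ada"), (2, "Grace"), (3, "Linus")],
)
orders_db = make_source(
    "CREATE TABLE orders (user_id INTEGER, order_total REAL, order_date TEXT)",
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, 40.0, "2025-02-01"), (1, 15.0, "2024-12-30"), (3, 99.5, "2025-03-10")],
)

def federated_join(since):
    """Query each source in place, then join the partial results in the engine."""
    # The date filter runs inside the orders source, so only matching rows travel.
    orders = orders_db.execute(
        "SELECT user_id, order_total FROM orders WHERE order_date > ?", (since,)
    ).fetchall()
    users = dict(users_db.execute("SELECT user_id, name FROM users").fetchall())
    # Engine-side hash join on user_id.
    return sorted((users[uid], total) for uid, total in orders if uid in users)

result = federated_join("2025-01-01")
print(result)
```

Nothing is copied into a central store: each source answers its own part of the question, and only the rows needed for the join cross the wire.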
3. AWS Athena Federation
AWS Athena Federation builds upon the serverless Athena service, originally designed to query Amazon S3 using SQL. With federation, Athena extends its reach across AWS and external sources through connectors that execute queries in place via AWS Lambda functions.
Architecture Overview
┌──────────────────────────────────────────┐
│            AWS Athena Client             │
│     (Console / JDBC / BI Tool / API)     │
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│      Athena Federated Query Engine       │
│     (Managed Presto / Trino Runtime)     │
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│         Lambda-Based Connectors          │
│ (RDS, Redshift, DynamoDB, Snowflake...)  │
└──────────────────────────────────────────┘
                     │
                     ▼
      Source Systems Queried In-Place
Because connectors run as Lambda functions, Athena Federation is fully serverless. When a query runs, Athena invokes each connector with the filters it can push down; the connector reads only the matching data from its source and returns intermediate results, which Athena then joins and aggregates in its distributed runtime.
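Filter pushdown is easier to see in code. The sketch below is a simplified model, not the Athena Query Federation SDK: the `Predicate` and `Connector` classes and the `engine_scan` function are hypothetical names invented for illustration. The connector applies the predicates its source supports, and the engine applies whatever is left over.

```python
from dataclasses import dataclass

@dataclass
class Predicate:
    column: str
    op: str          # one of "=", ">", "<"
    value: object

    def matches(self, row):
        actual = row[self.column]
        if self.op == "=":
            return actual == self.value
        if self.op == ">":
            return actual > self.value
        if self.op == "<":
            return actual < self.value
        raise ValueError(f"unsupported operator: {self.op}")

class Connector:
    """Hypothetical connector that filters at the source for operators it supports."""

    def __init__(self, rows, supported_ops):
        self.rows = rows
        self.supported_ops = supported_ops
        self.rows_returned = 0

    def read(self, predicates):
        pushed = [p for p in predicates if p.op in self.supported_ops]
        residual = [p for p in predicates if p.op not in self.supported_ops]
        out = [r for r in self.rows if all(p.matches(r) for p in pushed)]
        self.rows_returned = len(out)
        # Predicates the source could not evaluate go back to the engine.
        return out, residual

def engine_scan(connector, predicates):
    """Engine side: apply any predicates the connector could not push down."""
    rows, residual = connector.read(predicates)
    return [r for r in rows if all(p.matches(r) for p in residual)]

sales = [
    {"id": 1, "region": "EMEA", "amount": 10},
    {"id": 2, "region": "APAC", "amount": 50},
    {"id": 3, "region": "EMEA", "amount": 70},
]
connector = Connector(sales, supported_ops={"="})  # source only supports equality
filtered = engine_scan(
    connector, [Predicate("region", "=", "EMEA"), Predicate("amount", ">", 20)]
)
print(filtered, connector.rows_returned)
```

Here the equality filter runs at the source, so only two of the three rows leave it; the range filter is applied afterwards in the engine. The more predicates a connector can push down, the less data crosses the network.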
Advantages
- No infrastructure to manage—completely serverless.
- Tight integration with AWS Glue Data Catalog and IAM.
- Good fit for AWS-native workloads (S3, RDS, Redshift).
Drawbacks
- Higher latency for external connectors due to Lambda cold starts.
- Limited control over optimization and cluster tuning.
- Best suited for medium-scale analytics, not massive cross-source joins.
Typical Athena Federation users include data analysts and engineering teams operating within AWS environments that want quick, ad-hoc insights without provisioning Trino clusters.
4. Trino: The Open-Source Powerhouse
Trino, formerly known as PrestoSQL, is a high-performance distributed SQL query engine designed to query data where it lives. Unlike traditional warehouses, Trino doesn’t store data; it executes queries across many systems concurrently using a massively parallel processing (MPP) architecture.
Core Architecture
┌─────────────────────┐
│       Client        │
│ (CLI / JDBC / REST) │
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│     Coordinator     │
│ - Parses SQL        │
│ - Plans execution   │
│ - Schedules tasks   │
└──────────┬──────────┘
           ▼
┌─────────────────────┐
│    Worker Nodes     │
│ - Execute splits    │
│ - Exchange data     │
└─────────────────────┘
Trino’s modular connector framework allows it to communicate with over 60 systems—from Hadoop and S3 to MySQL, PostgreSQL, Cassandra, and Kafka. It pushes computations as close to the data source as possible, minimizing network traffic and improving performance.
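The coordinator and worker roles in the diagram above can be sketched in a few lines. This is a toy model under stated assumptions, not Trino's scheduler: `plan_splits`, `execute_split`, and `run_query` are hypothetical names, and a thread pool stands in for a cluster of worker nodes.

```python
from concurrent.futures import ThreadPoolExecutor

def plan_splits(rows, split_size):
    """Coordinator step: divide the input into independent splits."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]

def execute_split(split):
    """Worker step: compute a partial aggregate (sum, count) over one split."""
    return sum(split), len(split)

def run_query(rows, split_size=4, workers=3):
    """Compute AVG(rows) by scheduling splits on workers and merging partials."""
    splits = plan_splits(rows, split_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(execute_split, splits))
    # Final stage: merge partial aggregates into the query result.
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count, len(splits)

order_totals = list(range(1, 11))  # the values 1 through 10
average, n_splits = run_query(order_totals)
print(average, n_splits)
```

Each worker only ever sees its own split, which is why the work parallelizes well: the average is computed from partial sums and counts rather than from the full dataset in one place.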
Sample Federated Query
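-- Trino addresses tables as catalog.schema.table.
-- The schema names (shop, sales) are illustrative placeholders.
SELECT u.user_id, o.order_total
FROM mysql.shop.users AS u
JOIN s3.sales.orders AS o
  ON u.user_id = o.user_id
WHERE o.order_date > DATE '2025-01-01';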
This simple example shows Trino’s ability to join relational and lake-based data seamlessly.
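One detail worth knowing when writing such queries: Trino identifies a table by catalog, schema, and table name, and fills in missing parts from the session's default catalog and schema. The sketch below models that resolution rule in simplified form. The `resolve` function is a hypothetical helper, and it ignores quoted identifiers that contain dots.

```python
from typing import NamedTuple, Optional

class QualifiedName(NamedTuple):
    catalog: str
    schema: str
    table: str

def resolve(name: str, session_catalog: Optional[str] = None,
            session_schema: Optional[str] = None) -> QualifiedName:
    """Complete a 1-, 2-, or 3-part table name from session defaults."""
    parts = name.split(".")
    if len(parts) == 3:
        return QualifiedName(*parts)
    if len(parts) == 2:
        # Two parts are read as schema.table, not catalog.table.
        if session_catalog is None:
            raise ValueError(f"no session catalog set for {name!r}")
        return QualifiedName(session_catalog, parts[0], parts[1])
    if len(parts) == 1:
        if session_catalog is None or session_schema is None:
            raise ValueError(f"no session catalog/schema set for {name!r}")
        return QualifiedName(session_catalog, session_schema, parts[0])
    raise ValueError(f"too many name parts in {name!r}")

full = resolve("mysql.shop.users")
short = resolve("users", session_catalog="mysql", session_schema="shop")
pitfall = resolve("mysql.users", session_catalog="hive")
print(full, short, pitfall)
```

The last call shows a common pitfall: with a session catalog of `hive`, the name `mysql.users` resolves to a schema called `mysql` inside `hive`, not to the MySQL catalog. Spelling out all three parts avoids the ambiguity in cross-catalog joins.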
Adoption and Ecosystem
Trino’s adoption has surged among large-scale data-driven organizations like Netflix, Shopify, and DoorDash. Its open-source nature makes it a backbone for hybrid and multi-cloud analytics architectures. Popular integrations include:
- dbt-trino for transformation pipelines.
- Trino Gateway for multi-cluster routing.
- Iceberg connector for transactional lakehouse support.
Trino has also become a key player in data mesh implementations—each domain can deploy its own Trino cluster while maintaining query federation across others.
5. Starburst: Trino’s Enterprise Twin
Starburst extends Trino for the enterprise, adding management, governance, and performance capabilities critical to production environments. Starburst, whose team includes the original creators of Presto, offers both a self-managed enterprise version and a SaaS platform called Starburst Galaxy.
Enterprise Features
| Feature | Trino (OSS) | Starburst |
|---|---|---|
| Cluster Management | Manual | Automated provisioning and scaling |
| Caching Layer | Opt-in file system cache for object storage | Smart caching (Data acceleration layer) |
| Security | Authentication (LDAP, OAuth 2.0), file-based access rules | SSO, row-level policies, audit logs |
| Query Governance | External monitoring | Built-in query cost and lineage tracking |
| Deployment Options | Self-hosted | Self-hosted or fully managed (Galaxy) |
Starburst focuses on simplifying Trino’s operational complexity. Features like smart query routing, data caching, and fine-grained access controls make it suitable for regulated industries such as finance, healthcare, and telecom.
Example: Smart Caching in Action
Query: SELECT * FROM sales WHERE region = 'EMEA';
Starburst Accelerator Cache
├── Checks cache metadata
├── Cache hit: serves data instantly
└── Cache miss: queries source, stores in cache for reuse
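For repeated analytical workloads such as dashboards and interactive queries, this caching approach can reduce cross-system query latency sharply, in favorable cases from seconds to milliseconds, because a cache hit avoids the round trip to the remote source entirely.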
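The hit-or-miss flow above can be sketched as a small time-to-live cache. This is a generic illustration, not Starburst's implementation: the `QueryResultCache` class, its TTL policy, and the whitespace-based query normalization are all assumptions made for the example.

```python
import time

class QueryResultCache:
    """Hypothetical result cache keyed by normalized SQL text, with a TTL."""

    def __init__(self, fetch_from_source, ttl_seconds=300.0, clock=time.monotonic):
        self._fetch = fetch_from_source
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # normalized SQL -> (expiry time, rows)
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _normalize(sql):
        # Simplification: collapse whitespace and drop a trailing semicolon.
        # Case is preserved so that string literals keep their meaning.
        return " ".join(sql.strip().rstrip(";").split())

    def query(self, sql):
        key = self._normalize(sql)
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1            # cache hit: no source round trip
            return entry[1]
        self.misses += 1              # cache miss or expired entry
        rows = self._fetch(sql)
        self._entries[key] = (now + self._ttl, rows)
        return rows

source_calls = []

def slow_source(sql):
    """Stand-in for a remote source; records every call it receives."""
    source_calls.append(sql)
    return [("EMEA", 1200)]

fake_now = [0.0]  # controllable clock so the example is deterministic
cache = QueryResultCache(slow_source, ttl_seconds=60, clock=lambda: fake_now[0])
cache.query("SELECT * FROM sales WHERE region = 'EMEA';")  # miss
cache.query("SELECT *  FROM sales WHERE region = 'EMEA'")  # hit
fake_now[0] = 61.0
cache.query("SELECT * FROM sales WHERE region = 'EMEA';")  # expired, miss
print(cache.hits, cache.misses, len(source_calls))
```

The TTL is the key design trade-off: a longer TTL means more hits but staler data, which is why production caches usually pair expiry with some form of invalidation or refresh.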
Industry Adoption
Companies like Comcast, Goldman Sachs, and Roche have adopted Starburst to unify analytics across Snowflake, S3, and on-prem databases while maintaining strict compliance controls.
6. Comparing Athena, Trino, and Starburst
| Category | AWS Athena Federation | Trino | Starburst |
|---|---|---|---|
| Deployment Model | Serverless (AWS managed) | Self-managed (open source) | Self-managed (Enterprise) or SaaS (Galaxy) |
| Connectors | AWS sources plus prebuilt and custom Lambda connectors | 60+ (OSS) | Extended & Optimized |
| Performance | Moderate | High | Very High (with caching) |
| Security & Governance | AWS IAM | Basic | Advanced (Ranger, SSO, Lineage) |
| Cost | Per TB scanned, plus Lambda charges for connectors | Compute cost only | License + Infra |
| Best For | AWS-native quick analytics | Flexible hybrid data lakehouse | Enterprise data federation |
7. When to Use Each Tool
- Use Athena Federation if your data and workloads live mostly within AWS. It’s perfect for on-demand queries, cost-conscious analytics, or prototyping data access layers.
- Use Trino when you need open-source flexibility across multi-cloud or hybrid environments. It’s ideal for organizations that value control and extensibility.
- Use Starburst for enterprise-scale federation where performance, compliance, and manageability are critical. It provides the governance and optimization layer Trino lacks out of the box.
8. Federation in Practice: Example Use Case
Consider a retail company with customer data in PostgreSQL, orders in Snowflake, and clickstream data in S3. The goal: analyze customer lifetime value (CLV) without moving data.
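-- Schema names (public, sales) are illustrative; Trino expects catalog.schema.table.
WITH customer_spend AS (
    -- Aggregate orders per customer first so the clickstream join cannot inflate totals.
    SELECT c.id AS customer_id, SUM(o.total) AS total_spent
    FROM postgres.public.customers AS c
    JOIN snowflake.sales.orders AS o ON c.id = o.customer_id
    GROUP BY c.id
)
SELECT s.customer_id, s.total_spent, COUNT(e.session_id) AS sessions
FROM customer_spend AS s
JOIN s3.clickstream.events AS e ON s.customer_id = e.user_id
GROUP BY s.customer_id, s.total_spent;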
This federated query demonstrates how Trino or Starburst can seamlessly blend structured and semi-structured data, something previously achievable only via complex ETL.
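To experiment with the shape of this query without a cluster, the sketch below emulates three catalogs by attaching separate in-memory SQLite databases to one connection. This is an analogy only: SQLite's `ATTACH` gives two-part `schema.table` names rather than Trino's three-part names, the sample rows are made up, and `lake` stands in for the S3 catalog.

```python
import sqlite3

# One "engine" connection with three attached databases acting as catalogs.
engine = sqlite3.connect(":memory:")
for catalog in ("postgres", "snowflake", "lake"):
    engine.execute(f"ATTACH DATABASE ':memory:' AS {catalog}")

engine.execute("CREATE TABLE postgres.customers (id INTEGER PRIMARY KEY, name TEXT)")
engine.execute("CREATE TABLE snowflake.orders (customer_id INTEGER, total REAL)")
engine.execute("CREATE TABLE lake.events (user_id INTEGER, session_id TEXT)")

engine.executemany("INSERT INTO postgres.customers VALUES (?, ?)",
                   [(1, "Ada"), (2, "Grace")])
engine.executemany("INSERT INTO snowflake.orders VALUES (?, ?)",
                   [(1, 20.0), (1, 30.0), (2, 5.0)])
engine.executemany("INSERT INTO lake.events VALUES (?, ?)",
                   [(1, "s1"), (1, "s2"), (1, "s3"), (2, "s4")])

CLV_SQL = """
WITH spend AS (
    -- Aggregate spend first so the events join cannot multiply order rows.
    SELECT c.id AS customer_id, SUM(o.total) AS total_spent
    FROM postgres.customers AS c
    JOIN snowflake.orders AS o ON c.id = o.customer_id
    GROUP BY c.id
)
SELECT s.customer_id, s.total_spent, COUNT(e.session_id) AS sessions
FROM spend AS s
JOIN lake.events AS e ON s.customer_id = e.user_id
GROUP BY s.customer_id, s.total_spent
ORDER BY s.customer_id
"""
clv = engine.execute(CLV_SQL).fetchall()
print(clv)
```

Note the order of operations: spend is aggregated per customer before the join to clickstream events. Joining orders and events directly would multiply each order by the number of sessions and inflate the totals.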
9. The Future of Federated Querying
As of 2025, federated querying continues to mature toward unified data access. Emerging trends include:
- Cost-based federation optimizers that automatically balance performance and cloud spend.
- AI-assisted query planning—adaptive optimizers that learn workloads over time.
- Integration with metadata catalogs and governance tools such as DataHub and Amundsen.
- Transactional lakehouse support (Trino with the Iceberg and Delta Lake connectors), closing the gap between data lakes and warehouses.
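As a rough illustration of the first trend, a cost-based optimizer that weighs both latency and spend can be modeled as picking the cheapest candidate plan under a combined cost function. Everything in this sketch is hypothetical: the `Plan` fields, the candidate plans, and the per-GB and per-second prices are made-up numbers, not figures from any engine or cloud provider.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Plan:
    name: str
    bytes_transferred: int   # data moved between source and engine
    est_seconds: float       # estimated compute time

def plan_cost(plan, dollars_per_gb, dollars_per_second):
    """Combined cost: data transfer spend plus compute-time spend."""
    gb = plan.bytes_transferred / 1e9
    return gb * dollars_per_gb + plan.est_seconds * dollars_per_second

def choose_plan(plans, dollars_per_gb=0.09, dollars_per_second=0.002):
    """Pick the candidate plan with the lowest combined cost."""
    return min(plans, key=lambda p: plan_cost(p, dollars_per_gb, dollars_per_second))

plans = [
    Plan("pull both tables, join in engine", 50_000_000_000, 120.0),
    Plan("push filter to source, join in engine", 2_000_000_000, 45.0),
    Plan("push whole join to source", 100_000_000, 300.0),
]
best = choose_plan(plans)
best_cheap_compute = choose_plan(plans, dollars_per_second=0.0001)
print(best.name)
print(best_cheap_compute.name)
```

With the default weights the filter-pushdown plan wins; when compute time is nearly free, pushing the whole join to the source becomes the cheapest option because it moves the least data. The point is that the "best" plan depends on the prices, not only on the query.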
Federation is no longer a stopgap—it’s becoming the standard layer in the modern data stack, bridging domains without losing autonomy or control.
Conclusion
Federated query tools have shifted the way engineers think about analytics. AWS Athena Federation delivers simplicity and serverless accessibility, Trino provides the open foundation for complex, hybrid analytics, and Starburst brings enterprise-grade performance and governance to the mix. Together, they form the backbone of the 2025 federated data ecosystem—empowering teams to query everything, everywhere, using one language: SQL.
