Tools: AWS Athena Federation, Starburst, Trino

Exploring the Modern Query Federation Stack: AWS Athena Federation, Starburst, and Trino

As data infrastructures grow increasingly decentralized, the ability to query data across diverse sources has become a cornerstone of modern analytics. Tools like AWS Athena Federation, Starburst, and Trino enable teams to perform federated queries—analyzing data without moving it. This post explores how these systems compare, integrate, and evolve within the broader data engineering landscape of 2025.

1. The Rise of Federated Query Engines

In a world where data resides across lakes, warehouses, SaaS platforms, and operational databases, traditional ETL approaches struggle to keep pace. Federated query engines bridge that gap by providing a single SQL interface across multiple backends. Instead of centralizing data, they bring computation to where the data lives.

Key benefits of federation include:

Reduced data movement: Minimize data duplication and the cost of transfers.
Unified access: Analysts can query across S3, PostgreSQL, Snowflake, and even APIs from one endpoint.
Governance-friendly: Data stays in its source domain, aligning with modern data mesh principles.

In 2025, the federation stack has consolidated around a few powerful engines: Trino (and Starburst, its commercial distribution) and cloud-native integrations like AWS Athena Federation.

2. AWS Athena Federation Overview

AWS Athena started as a serverless SQL query service for Amazon S3, originally built on Presto; its current engine (Athena engine version 3) is based on Trino. Athena Federated Query extends that scope beyond S3. With Athena data source connectors, engineers can query other AWS and external systems without ETL, including:

Amazon RDS and Aurora (MySQL, PostgreSQL)
Amazon Redshift
DynamoDB
Google BigQuery and Snowflake (via AWS-provided connectors)
On-premises JDBC-compatible databases

Architecture Diagram (Textual)

┌──────────────────────────────────────────┐
│            AWS Athena Client             │
│   (SQL query via console, API, or SDK)   │
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│      Athena Federated Query Engine       │
│     (Trino-based; reads S3 natively)     │
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ AWS Lambda Connectors (Federation Layer) │
│    • RDS/Aurora • Redshift • DynamoDB    │
│  • Snowflake • BigQuery • API endpoints  │
└──────────────────────────────────────────┘
                     │
                     ▼
       Data Sources Queried in Place

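The Lambda-based connector model is particularly elegant: each data source has a small, stateless connector deployed as a Lambda function. When a query runs, Athena calls the connector to retrieve table metadata and enumerate splits. It then invokes the connector in parallel to read those splits and stream the rows back to the engine. Data in S3 itself is read natively by Athena and needs no connector. This model scales with Lambda concurrency and keeps operational overhead low.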
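
The fan-out can be illustrated with a small in-process simulation. This is a minimal Python sketch, not the Athena Query Federation SDK. A thread pool stands in for parallel Lambda invocations, and `SPLITS`, `read_split`, and `federated_scan` are hypothetical names. Each call reads one split and applies the query's filter at the source before returning rows, which is the part of the work a real connector can push down.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: each "split" is a slice of a source table that one
# connector invocation would read.
SPLITS = {
    "split-0": [{"id": 1, "region": "us"}, {"id": 2, "region": "eu"}],
    "split-1": [{"id": 3, "region": "us"}],
    "split-2": [{"id": 4, "region": "ap"}, {"id": 5, "region": "us"}],
}


def read_split(split_id, constraint):
    """Simulates one stateless connector call: read a single split and apply
    the pushed-down constraint before returning rows to the engine."""
    return [row for row in SPLITS[split_id] if constraint(row)]


def federated_scan(split_ids, constraint, max_parallel=8):
    """Fan out one call per split in parallel, then merge results in split order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        batches = pool.map(lambda s: read_split(s, constraint), split_ids)
        return [row for batch in batches for row in batch]


if __name__ == "__main__":
    rows = federated_scan(list(SPLITS), lambda r: r["region"] == "us")
    print([r["id"] for r in rows])  # [1, 3, 5]
```

Because `ThreadPoolExecutor.map` yields results in input order, the merged rows are deterministic even though the reads run concurrently.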

Strengths

Completely serverless (no cluster management).
Tight integration with AWS IAM and Glue Data Catalog.
Supports custom connectors for third-party data sources.

Limitations

Limited optimization for cross-source joins compared to native Trino clusters.
Connector cold-start latency (due to AWS Lambda).
Less control over execution tuning (since AWS manages runtime).

3. Trino: The Open Engine Behind It All

Trino is the open-source distributed SQL query engine formerly known as PrestoSQL, a fork of Facebook's Presto led by Presto's original creators. It queries many systems through connectors and executes each query in parallel across a cluster of worker nodes using massively parallel processing (MPP). Its architecture is designed for high-performance analytics over federated and large-scale datasets.

Core Architecture

┌────────────────────────┐
│         Client         │
│   (CLI / BI / JDBC)    │
└────────────────────────┘
            │
            ▼
┌────────────────────────┐
│      Coordinator       │
│   Parses & optimizes   │
└────────────────────────┘
            │
            ▼
┌────────────────────────┐
│        Workers         │
│  Execute split tasks   │
└────────────────────────┘
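
The division of labor in the diagram can be sketched in a few lines of Python. This toy model makes simplifying assumptions: `assign_splits`, `worker_partial_count`, and `run_query` are hypothetical, splits are plain lists, and scheduling is round-robin. It shows only the two-phase aggregation idea. Each worker computes a partial `COUNT(*) ... GROUP BY` over its own splits, and the partial results are then merged into the final answer. Real Trino pipelines query stages and exchanges data between workers rather than merging everything in one place.

```python
from collections import Counter

# Hypothetical input: a table's "region" column divided into four splits.
SPLITS = [
    ["us", "eu", "us"],
    ["ap", "us"],
    ["eu", "eu", "ap"],
    ["us"],
]


def assign_splits(num_splits, num_workers):
    """Coordinator step: assign split indexes to workers round-robin."""
    assignment = {w: [] for w in range(num_workers)}
    for i in range(num_splits):
        assignment[i % num_workers].append(i)
    return assignment


def worker_partial_count(split_indexes):
    """Worker step: partial COUNT(*) GROUP BY region over the assigned splits."""
    partial = Counter()
    for i in split_indexes:
        partial.update(SPLITS[i])
    return partial


def run_query(num_workers=2):
    """Merge the workers' partial counts into the final aggregate."""
    assignment = assign_splits(len(SPLITS), num_workers)
    final = Counter()
    for split_indexes in assignment.values():
        final += worker_partial_count(split_indexes)
    return dict(final)
```

The final result does not depend on how many workers share the splits, which is what makes partial aggregation safe to parallelize.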

Trino excels at pushing computation down to source systems, where connectors support it, and at parallelizing work. It ships with dozens of connectors, including Hive, Iceberg, and Delta Lake (for data in object storage such as S3), Cassandra, Kafka, MySQL, PostgreSQL, and Elasticsearch.
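
The benefit of pushdown is easiest to see by counting the rows that cross the network. In this minimal sketch, `RemoteSource` is a hypothetical toy connector, not a class from Trino's connector SPI. When given a filter, it evaluates the filter at the source, and it tallies every row it returns.

```python
# Hypothetical table held by a remote database: 100 orders, 25 of them in "us".
REMOTE_ORDERS = [
    {"order_id": i, "region": "us" if i % 4 == 0 else "eu", "total": i * 10}
    for i in range(1, 101)
]


class RemoteSource:
    """Toy connector that counts how many rows cross the 'network'."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_transferred = 0

    def scan(self, predicate=None):
        # With pushdown, the source evaluates the predicate before sending rows.
        result = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_transferred += len(result)
        return result


def query_without_pushdown(source):
    rows = source.scan()  # fetch everything, then filter inside the engine
    return [r for r in rows if r["region"] == "us"]


def query_with_pushdown(source):
    return source.scan(predicate=lambda r: r["region"] == "us")
```

Both functions return the same 25 rows. The pushed-down version transfers 25 rows instead of 100.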

Example: Querying Across Sources

SELECT c.customer_id, o.order_total
FROM mysql.sales.customers c
JOIN s3.orders_data.orders o
 ON c.customer_id = o.customer_id
WHERE o.order_date > DATE '2025-01-01';

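This query demonstrates Trino's federated capability. It joins customer data in MySQL with order data stored in S3 in a single SQL statement, without first copying either dataset. Here the S3 data is exposed through a catalog named s3, for example one backed by the Hive or Iceberg connector.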
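
Conceptually, an engine can evaluate a query like this as a hash join: it builds a hash table on one input and streams the other input through it. The Python sketch below models that approach with hypothetical in-memory rows standing in for the two catalogs. `scan_orders` plays the role of the order-source scan, with the `order_date` filter pushed down. It illustrates the technique and is not a description of Trino's internal implementation.

```python
from datetime import date

# Hypothetical rows standing in for mysql.sales.customers and s3.orders_data.orders.
CUSTOMERS = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Lin"},
    {"customer_id": 3, "name": "Sam"},
]
ORDERS = [
    {"customer_id": 1, "order_total": 120.0, "order_date": date(2024, 12, 30)},
    {"customer_id": 1, "order_total": 75.5, "order_date": date(2025, 2, 3)},
    {"customer_id": 3, "order_total": 42.0, "order_date": date(2025, 3, 9)},
    {"customer_id": 4, "order_total": 10.0, "order_date": date(2025, 4, 1)},
]


def scan_orders(min_date):
    """Order-source scan with the WHERE filter pushed down (strictly after min_date)."""
    return [o for o in ORDERS if o["order_date"] > min_date]


def federated_join(min_date):
    # Build phase: hash the customer side on the join key.
    by_id = {c["customer_id"]: c for c in CUSTOMERS}
    # Probe phase: stream the filtered orders and keep matches (inner join).
    return [
        (o["customer_id"], o["order_total"])
        for o in scan_orders(min_date)
        if o["customer_id"] in by_id
    ]
```

With `date(2025, 1, 1)`, the December 2024 order is filtered out at the source and the order for customer 4 finds no match. The result is `[(1, 75.5), (3, 42.0)]`.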

Adoption and Ecosystem

Trino is now one of the most popular open data query engines, widely adopted by Netflix, LinkedIn, Shopify, and DoorDash. Its performance and flexibility make it ideal for organizations pursuing lakehouse or data mesh architectures.

Popular integrations include:

dbt-trino: dbt adapter for data transformations.
Trino Gateway: Load balancer and routing proxy for multiple Trino clusters.
Starburst Galaxy: Managed cloud offering built on Trino.
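
A routing proxy in front of several Trino clusters needs, at minimum, to pick a healthy cluster for each new query. It must also keep follow-up requests for a running query on the cluster that owns that query. The sketch below is a hypothetical simplification of those two jobs, not Trino Gateway's code or configuration. The backend URLs, group names, and round-robin policy are all assumptions.

```python
import itertools


class ToyGateway:
    """Hypothetical routing sketch; not Trino Gateway's implementation."""

    def __init__(self, backends_by_group):
        self.backends = backends_by_group  # routing group -> list of backend URLs
        self.healthy = {b for bs in backends_by_group.values() for b in bs}
        self.cursors = {g: itertools.cycle(bs) for g, bs in backends_by_group.items()}
        self.query_owner = {}  # query id -> backend URL

    def mark_unhealthy(self, backend):
        self.healthy.discard(backend)

    def route_new_query(self, query_id, group="adhoc"):
        # Round-robin over the group's backends, skipping unhealthy ones.
        for _ in range(len(self.backends[group])):
            candidate = next(self.cursors[group])
            if candidate in self.healthy:
                self.query_owner[query_id] = candidate
                return candidate
        raise RuntimeError(f"no healthy backend in group {group!r}")

    def route_followup(self, query_id):
        # Follow-up requests for a running query must reach the same cluster.
        return self.query_owner[query_id]
```

Keeping the query-to-backend mapping is what lets a cluster be drained or marked unhealthy for new work without breaking queries already running on it.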

4. Starburst: Enterprise Trino on Steroids

Starburst, whose team includes several of Trino's original creators, offers a commercial, enterprise-grade distribution of Trino. It adds performance optimizations, data governance, and enterprise security on top of the open-source base.

In 2025, Starburst’s products include:

Starburst Galaxy: Fully managed Trino clusters in AWS, Azure, and GCP.
Starburst Enterprise: Self-hosted Trino with enhanced caching and cost governance.
Gravity: Galaxy's discovery, governance, and sharing layer for unified metadata management.

Enterprise Features

Feature	Starburst	Trino OSS
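Cluster Management	Automatic scaling and provisioning (Galaxy)	Self-managed deployment (e.g., Helm chart on Kubernetes)
Data Caching	Smart caching layer with local spill	File system caching for object-storage connectors (recent releases)
Security	Fine-grained access control, SSO, and audit logs	Pluggable authentication (e.g., LDAP, OAuth 2.0, Kerberos) and file-based access control
Cost Governance	Query monitoring and budgeting	Limited; typically requires external tools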
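
To show what a caching layer saves in a federated engine, here is a minimal least-recently-used (LRU) cache for remote reads. `SplitCache` and its `(path, etag)` key are hypothetical. Keying on an object version is one simple way to avoid serving stale data after a file changes. This is not how Starburst or Trino implement caching; it only illustrates the hit/miss and eviction trade-off.

```python
from collections import OrderedDict


class SplitCache:
    """Toy LRU cache for remote reads, keyed by (path, etag) so a changed object misses."""

    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch  # hypothetical callable (path, etag) -> bytes, e.g. an object-store GET
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, path, etag):
        key = (path, etag)
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        self.misses += 1
        data = self.fetch(path, etag)
        self.entries[key] = data
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry
        return data
```

A repeated read of a hot file is served locally. A new version of the same object (a different etag) always goes back to the source.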

Starburst integrates with Apache Ranger, AWS Lake Formation, and identity providers such as Okta for enterprise-grade access management, which makes it attractive to regulated industries like finance and healthcare.
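
Centralized access policies often combine row filters with column masks, applied inside the engine before results reach the client. The sketch below uses a hypothetical policy dictionary, not the Ranger, Lake Formation, or Starburst API. It denies access to any role that has no policy.

```python
# Hypothetical policy model; not the Ranger, Lake Formation, or Starburst API.
POLICIES = {
    "analyst": {
        "row_filter": lambda row: row["region"] == "eu",
        "masks": {"email": lambda v: v[0] + "***@" + v.split("@", 1)[1]},
    },
    "admin": {"row_filter": lambda row: True, "masks": {}},
}


def apply_policies(rows, role):
    """Apply the role's row filter, then mask columns, before results leave the engine."""
    policy = POLICIES.get(role)
    if policy is None:
        raise PermissionError(f"no policy for role {role!r}")  # deny by default
    out = []
    for row in rows:
        if not policy["row_filter"](row):
            continue
        masked = dict(row)  # copy so the source rows are never modified
        for col, mask in policy["masks"].items():
            masked[col] = mask(masked[col])
        out.append(masked)
    return out
```

Because policies are enforced in the engine rather than in each source, the same rules apply no matter which connector the rows came from.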

Performance Enhancements

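Starburst adds smart query routing, caching, and data locality optimization on top of Trino's cost-based optimizer (CBO). The CBO uses table statistics to compare candidate execution plans. For example, it chooses the join order and decides whether to broadcast or partition a join. This reduces intermediate data and improves join efficiency when federating across heterogeneous sources.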
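
A cost-based optimizer compares candidate plans using estimated row counts. This Python sketch enumerates left-deep join orders for three tables and keeps the one with the smallest total intermediate result size. The statistics, selectivities, and cost formula are hypothetical simplifications. A production CBO such as Trino's also weighs CPU, memory, and network costs, and it chooses join distribution types.

```python
from itertools import permutations

# Hypothetical statistics; a real optimizer derives these from table and column stats.
ROWS = {"customers": 10_000, "orders": 5_000_000, "regions": 10}
# Join selectivity for each connected table pair (1 / distinct values of the join key).
SELECTIVITY = {
    frozenset({"customers", "orders"}): 1 / 10_000,
    frozenset({"customers", "regions"}): 1 / 10,
}


def join_cardinality(left_tables, left_rows, right):
    """Estimate rows after joining `right` onto the current left-deep result."""
    sel = 1.0
    connected = False
    for t in left_tables:
        s = SELECTIVITY.get(frozenset({t, right}))
        if s is not None:
            sel *= s
            connected = True
    if not connected:
        return None  # would be a cross join; reject this order
    return left_rows * ROWS[right] * sel


def best_left_deep_order(tables):
    """Return (order, cost) with the smallest sum of intermediate sizes, or None."""
    best = None
    for order in permutations(tables):
        rows, cost, joined = ROWS[order[0]], 0.0, [order[0]]
        for t in order[1:]:
            rows = join_cardinality(joined, rows, t)
            if rows is None:
                break
            cost += rows
            joined.append(t)
        else:  # runs only if no cross join was hit
            if best is None or cost < best[1]:
                best = (order, cost)
    return best
```

With these numbers the optimizer joins `customers` and `regions` first, producing about 10,000 intermediate rows, and joins the large `orders` table last. That costs roughly 5.01 million intermediate rows in total, versus about 10 million when `orders` is joined first.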

5. Comparative Overview

Aspect	AWS Athena Federation	Trino	Starburst
Deployment	Serverless (AWS-managed)	Self-managed (open source)	Managed or enterprise-deployed
Connectors	Prebuilt Lambda connectors plus a custom connector SDK	Extensive (dozens of built-in connectors)	Extended (optimized enterprise connectors)
Performance	Moderate (Lambda-based)	High (MPP architecture)	Very high (optimized CBO + caching)
Security & Governance	IAM and Lake Formation integration	Pluggable authentication and file-based access control	Advanced (Ranger, SSO, audit)
Use Case Fit	Quick AWS analytics	Cross-platform, open analytics	Enterprise data federation

6. When to Use Each

Use Athena Federation when you want to query multiple AWS-native sources with minimal setup. It’s perfect for ad-hoc analytics, cloud cost optimization, or quick joins across S3 and RDS.
Use Trino when you need flexibility, control, and high throughput across diverse data ecosystems—ideal for data lakehouse implementations.
Use Starburst when enterprise governance, compliance, and performance tuning are mission-critical. Large-scale organizations like Comcast and Goldman Sachs use Starburst to power federated BI and self-service analytics.

7. Future Trends: Beyond Federation

Federation is evolving into a broader vision of unified data access. In 2025, leading vendors are integrating AI-driven query planning and cost-based optimizers that dynamically adjust query paths based on source latency and cost metrics. Expect closer integration with data catalogs (like DataHub and Amundsen) and governance layers to provide end-to-end lineage.

Meanwhile, the open-source community continues pushing Trino forward. Its Iceberg and Delta Lake connectors work with transactional table formats and support row-level writes (UPDATE, DELETE, and MERGE), blurring the lines between federated and transactional queries.

8. Example: End-to-End Federation Architecture

┌───────────────────────────────┐
│           BI Tools            │
│  (Tableau, Looker, Superset)  │
└──────────────┬────────────────┘
               │ JDBC/ODBC
               ▼
┌───────────────────────────────┐
│    Trino / Starburst Layer    │
│    Federated Query Engine     │
└──────────────┬────────────────┘
               │ Connectors
               ▼
┌───────────────────────────────┐
│ Data Sources:                 │
│  • S3 / Lakehouse             │
│  • Snowflake / BigQuery       │
│  • MySQL / PostgreSQL         │
│  • Kafka Streams              │
└───────────────────────────────┘

This architecture exemplifies how modern analytics teams can unify access to hybrid data environments while keeping governance centralized.

Conclusion

Federated query engines represent the future of cloud-scale analytics. AWS Athena Federation offers simplicity, Trino provides flexibility and speed, and Starburst delivers enterprise-grade control and optimization. Together, they define a mature ecosystem where engineers can query anything, anywhere, using the language of SQL—without sacrificing performance or compliance.

Whether you are designing a lakehouse, data mesh, or unified analytics layer, these tools provide the foundation for a federated future that prioritizes accessibility, governance, and cost efficiency.