Exploring the Modern Query Federation Stack: AWS Athena Federation, Starburst, and Trino
As data infrastructures grow increasingly decentralized, the ability to query data across diverse sources has become a cornerstone of modern analytics. Tools like AWS Athena Federation, Starburst, and Trino enable teams to run federated queries, analyzing data where it lives without moving it first. This post explores how these systems compare, integrate, and evolve within the broader data engineering landscape of 2025.
1. The Rise of Federated Query Engines
In a world where data resides across lakes, warehouses, SaaS platforms, and operational databases, traditional ETL approaches struggle to keep pace. Federated query engines bridge that gap by providing a single SQL interface across multiple backends. Instead of copying data into a central store before it can be queried, they read it in place and push filters and other operations down to the source systems wherever the connector allows.
Key benefits of federation include:
- Reduced data movement: Minimize data duplication and the cost of transfers.
- Unified access: Analysts can query across S3, PostgreSQL, Snowflake, and even APIs from one endpoint.
- Governance-friendly: Data stays in its source domain, aligning with modern data mesh principles.
In 2025, the federation stack has consolidated around a few powerful engines: Trino (and Starburst, its commercial distribution) and cloud-native integrations like AWS Athena Federation.
2. AWS Athena Federation Overview
AWS Athena started as a serverless SQL query service for Amazon S3 built on Presto; its current engine (Athena engine version 3) is based on Trino. Federation expanded Athena’s scope to query data across multiple AWS and external systems without ETL. Using Athena data source connectors, engineers can query data from sources like:
- Amazon RDS and Aurora (MySQL, PostgreSQL)
- Amazon Redshift
- DynamoDB
- Google BigQuery and Snowflake (via AWS-provided connectors)
- On-premises JDBC-compatible databases
Architecture Diagram (Textual)
```text
┌──────────────────────────────────────────┐
│ AWS Athena Client                        │
│ (SQL query via console, API, or SDK)     │
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ Athena Federated Query Engine            │
│ (Presto/Trino runtime on AWS)            │
│ (reads S3 natively via Glue Catalog)     │
└──────────────────────────────────────────┘
                     │
                     ▼
┌──────────────────────────────────────────┐
│ AWS Lambda Connectors (Federation Layer) │
│ • RDS • Redshift • DynamoDB              │
│ • Snowflake • API Endpoints              │
└──────────────────────────────────────────┘
                     │
                     ▼
       Data Sources Queried in Place
```
The Lambda-based connector model is particularly elegant: each data source has a small, stateless connector deployed as a Lambda function. When a query runs, Athena calls the connector to fetch table metadata and a list of splits, then invokes it for many splits in parallel, and the returned rows flow back to the engine. This model scales with Lambda concurrency and keeps operational overhead low.
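To make the split-based flow concrete, here is a minimal Python sketch loosely modeled on the metadata-then-records pattern that AWS's Java-based Athena Query Federation SDK follows. Everything in it (SOURCE_TABLE, Split, get_splits, read_split, federated_scan) is hypothetical, and a thread pool stands in for parallel Lambda invocations.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass

# Hypothetical in-memory table standing in for a remote source behind a connector.
SOURCE_TABLE = [{"id": i, "region": "eu" if i % 2 else "us"} for i in range(10)]


@dataclass(frozen=True)
class Split:
    start: int  # inclusive row offset
    end: int    # exclusive row offset


def get_splits(num_rows: int, split_size: int) -> list[Split]:
    """Metadata step: divide the table into independent units of work."""
    return [Split(s, min(s + split_size, num_rows)) for s in range(0, num_rows, split_size)]


def read_split(split: Split, region: str) -> list[dict]:
    """Record step: a stateless call that reads one split and applies the filter it was given."""
    return [row for row in SOURCE_TABLE[split.start:split.end] if row["region"] == region]


def federated_scan(region: str, split_size: int = 3) -> list[dict]:
    splits = get_splits(len(SOURCE_TABLE), split_size)
    # Each split is read independently, similar to fanning out connector invocations.
    with ThreadPoolExecutor(max_workers=max(1, len(splits))) as pool:
        chunks = pool.map(lambda s: read_split(s, region), splits)
        # pool.map preserves split order, so results come back in table order.
        return [row for chunk in chunks for row in chunk]


if __name__ == "__main__":
    print([row["id"] for row in federated_scan("eu")])  # [1, 3, 5, 7, 9]
```

Because each split call is stateless, adding more splits (or more Lambda concurrency) scales the scan without any shared coordination between readers.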
Strengths
- Completely serverless (no cluster management).
- Tight integration with AWS IAM and Glue Data Catalog.
- Supports custom connectors for third-party data sources.
Limitations
- Limited optimization for cross-source joins compared to native Trino clusters.
- Connector cold-start latency (due to AWS Lambda).
- Less control over execution tuning (since AWS manages runtime).
3. Trino: The Open Engine Behind It All
Trino is an open-source distributed SQL query engine, originally released as PrestoSQL (a fork of Presto) and renamed Trino in 2020. It queries data in many systems through connectors and executes each query in parallel across a cluster of worker nodes using massively parallel processing (MPP). Its architecture is designed for high-performance analytics over federated and large-scale datasets.
Core Architecture
```text
┌────────────────────────┐
│ Client                 │
│ (CLI / BI / JDBC)      │
└────────────────────────┘
            │
            ▼
┌────────────────────────┐
│ Coordinator            │
│ Parses & optimizes     │
└────────────────────────┘
            │
            ▼
┌────────────────────────┐
│ Workers                │
│ Execute split tasks    │
└────────────────────────┘
```
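The coordinator/worker split can be illustrated with a two-phase aggregation: the coordinator hands out splits, each worker computes a partial result over only its own splits, and a final stage merges the partials. This is a simplified Python sketch; the rows, worker names, and round-robin assignment are hypothetical, and a real scheduler is considerably more sophisticated.

```python
from collections import Counter
from itertools import cycle

# Hypothetical input rows: (customer_id, order_total).
ROWS = [(1, 10.0), (2, 5.0), (1, 7.5), (3, 2.0), (2, 1.0), (3, 4.0)]


def make_splits(rows, split_size):
    """Coordinator: cut the input into splits."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def assign_splits(splits, workers):
    """Coordinator: hand out splits round-robin (a deliberately naive policy)."""
    plan = {w: [] for w in workers}
    for worker, split in zip(cycle(workers), splits):
        plan[worker].append(split)
    return plan


def partial_sum(splits):
    """Worker: aggregate only the splits assigned to this worker."""
    partial = Counter()
    for split in splits:
        for customer_id, total in split:
            partial[customer_id] += total
    return partial


def final_sum(partials):
    """Final stage: merge the workers' partial aggregates."""
    merged = Counter()
    for p in partials:
        merged.update(p)  # Counter.update adds values rather than replacing them
    return dict(merged)


if __name__ == "__main__":
    plan = assign_splits(make_splits(ROWS, 2), ["worker-1", "worker-2"])
    print(final_sum(partial_sum(s) for s in plan.values()))  # {1: 17.5, 2: 6.0, 3: 6.0}
```

The result does not depend on how splits are assigned, which is what lets the coordinator spread work freely across workers.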
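Trino excels at parallelizing work and, where a connector supports it, pushing filters, projections, limits, and some aggregations down to the source system. It supports dozens of connectors out of the box, including Hive, Iceberg, and Delta Lake (for data in S3 and other object stores), Cassandra, Kafka, MySQL, PostgreSQL, and Elasticsearch.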
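Why pushdown matters can be shown by counting the rows a source ships to the engine with and without it. In this minimal sketch, RemoteTable and the two query functions are hypothetical stand-ins for a connector, not Trino's connector SPI.

```python
from dataclasses import dataclass
from typing import Callable, Optional

Row = dict
Predicate = Callable[[Row], bool]


@dataclass
class RemoteTable:
    """Hypothetical source that counts how many rows it ships to the engine."""
    rows: list
    rows_shipped: int = 0

    def scan(self, predicate: Optional[Predicate] = None) -> list:
        # With pushdown, the source evaluates the filter before sending anything.
        out = [r for r in self.rows if predicate is None or predicate(r)]
        self.rows_shipped += len(out)
        return out


def query_without_pushdown(table: RemoteTable, predicate: Predicate) -> list:
    # The engine pulls every row, then filters locally.
    return [r for r in table.scan() if predicate(r)]


def query_with_pushdown(table: RemoteTable, predicate: Predicate) -> list:
    # The engine hands the filter to the connector; only matching rows are transferred.
    return table.scan(predicate)


def is_recent(row: Row) -> bool:
    return row["year"] >= 2025


if __name__ == "__main__":
    data = [{"id": i, "year": 2023 + i % 3} for i in range(9)]
    a, b = RemoteTable(data), RemoteTable(data)
    assert query_without_pushdown(a, is_recent) == query_with_pushdown(b, is_recent)
    print(a.rows_shipped, b.rows_shipped)  # 9 3
```

Both strategies return the same answer; the difference is only in how much data crosses the network, which is often the dominant cost in federated queries.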
Example: Querying Across Sources
```sql
SELECT c.customer_id, o.order_total
FROM mysql.sales.customers c
JOIN s3.orders_data.orders o
  ON c.customer_id = o.customer_id
WHERE o.order_date > DATE '2025-01-01';
```
This query demonstrates Trino’s federated capability: it joins customer data in MySQL with order data in S3 (exposed here through a catalog named s3, typically configured with the Hive or Iceberg connector) in a single SQL statement.
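Once both sides are read, an engine typically evaluates an equi-join like this one as a hash join: build a hash table on one input (here, the smaller customers side) and probe it with the other. This is a single-process Python sketch with hypothetical in-memory rows standing in for the two catalogs; Trino performs the same idea distributed across workers.

```python
from datetime import date

# Hypothetical rows standing in for mysql.sales.customers and s3.orders_data.orders.
CUSTOMERS = [{"customer_id": 1}, {"customer_id": 2}, {"customer_id": 3}]
ORDERS = [
    {"customer_id": 1, "order_total": 120.0, "order_date": date(2025, 2, 3)},
    {"customer_id": 2, "order_total": 80.0, "order_date": date(2024, 12, 30)},
    {"customer_id": 3, "order_total": 45.5, "order_date": date(2025, 3, 14)},
    {"customer_id": 4, "order_total": 10.0, "order_date": date(2025, 4, 1)},
]


def federated_hash_join(customers, orders, cutoff):
    """Inner join on customer_id with the WHERE filter applied to orders."""
    # Build phase: hash the customers side on the join key.
    build_keys = {c["customer_id"] for c in customers}
    # Probe phase: stream orders, apply the date filter, keep rows whose key was built.
    return [
        (o["customer_id"], o["order_total"])
        for o in orders
        if o["order_date"] > cutoff and o["customer_id"] in build_keys
    ]


if __name__ == "__main__":
    print(federated_hash_join(CUSTOMERS, ORDERS, date(2025, 1, 1)))  # [(1, 120.0), (3, 45.5)]
```

Customer 2's order falls before the cutoff, and customer 4 has no matching customer row, so both drop out of the inner join.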
Adoption and Ecosystem
Trino is one of the most widely used open-source SQL query engines, adopted by companies such as Netflix, LinkedIn, Shopify, and DoorDash. Its performance and flexibility make it well suited to organizations pursuing lakehouse or data mesh architectures.
Popular integrations include:
- dbt-trino: dbt adapter for data transformations.
- Trino Gateway: Load-balancing multiple Trino clusters.
- Starburst Galaxy: Managed cloud offering built on Trino.
4. Starburst: Enterprise Trino on Steroids
Starburst, which employs several of Trino’s original creators, offers a commercial, enterprise-grade distribution of Trino. It adds performance optimization, data governance, and enterprise security on top of the open-source base.
In 2025, Starburst’s products include:
- Starburst Galaxy: Fully managed Trino clusters in AWS, Azure, and GCP.
- Starburst Enterprise: Self-hosted Trino with enhanced caching and cost governance.
- Gravity: A universal catalog and governance layer in Starburst Galaxy for unified metadata management.
Enterprise Features
| Feature | Starburst | Trino OSS |
|---|---|---|
| Cluster Management | Automatic scaling and provisioning | Manual deployment |
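| Data Caching | Smart caching layer with local spill | File-system caching for object storage (recent releases) |
| Security | Fine-grained access control, SSO, and audit logs | Pluggable authentication (e.g., LDAP, OAuth 2.0, Kerberos) and rule-based access control |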
| Cost Governance | Query monitoring and budgeting | Limited via external tools |
Starburst integrates natively with Apache Ranger, AWS Lake Formation, and Okta for enterprise-grade access management, which makes it a common choice in regulated industries like finance and healthcare.
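Policy systems such as Apache Ranger express fine-grained access control largely as row-level filters and column masks. The sketch below applies such a policy to result rows in plain Python; Policy, apply_policy, and the sample data are hypothetical and do not reflect Starburst's or Ranger's actual APIs. A real engine also enforces these rules inside the query plan rather than on already-returned rows.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy: which rows a role may see and which columns are masked."""
    row_filter: Callable[[Row], bool]
    masked_columns: frozenset = frozenset()


def apply_policy(policy: Policy, rows: list[Row]) -> list[Row]:
    # Row-level security: drop rows the policy does not allow.
    visible = [r for r in rows if policy.row_filter(r)]
    # Column masking: replace protected values instead of removing the column.
    return [
        {col: ("****" if col in policy.masked_columns else val) for col, val in r.items()}
        for r in visible
    ]


if __name__ == "__main__":
    rows = [
        {"id": 1, "region": "eu", "email": "a@example.com"},
        {"id": 2, "region": "us", "email": "b@example.com"},
    ]
    eu_analyst = Policy(row_filter=lambda r: r["region"] == "eu", masked_columns=frozenset({"email"}))
    print(apply_policy(eu_analyst, rows))  # [{'id': 1, 'region': 'eu', 'email': '****'}]
```

Masking rather than dropping the column keeps query shapes stable, so the same dashboard works for users with different entitlements.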
Performance Enhancements
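Starburst’s query routing and data-locality optimizations are designed to reduce latency when federating across heterogeneous sources. It builds on Trino’s cost-based optimizer (CBO), which uses table and column statistics to compare candidate plans, for example choosing the join order and whether to broadcast the smaller side of a join or partition both sides, reducing scan time and improving join efficiency.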
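One decision such an optimizer makes is the join distribution strategy (Trino exposes it through the join_distribution_type session property, with values such as BROADCAST and PARTITIONED). The sketch below uses a deliberately toy network-cost model to pick between the two; TableStats, the cost formulas, and the sizes are hypothetical simplifications, not Trino's actual cost calculation, which also weighs CPU and memory.

```python
from dataclasses import dataclass

GB = 1024 ** 3
MB = 1024 ** 2


@dataclass(frozen=True)
class TableStats:
    name: str
    size_bytes: int  # estimated from table statistics


def choose_join_distribution(probe: TableStats, build: TableStats, workers: int) -> str:
    """Toy cost model: estimate bytes moved over the network for each strategy."""
    # Broadcast: replicate the whole build side to every worker; the probe side stays local.
    broadcast_cost = build.size_bytes * workers
    # Partitioned: hash-redistribute both sides across workers once.
    partitioned_cost = probe.size_bytes + build.size_bytes
    return "BROADCAST" if broadcast_cost <= partitioned_cost else "PARTITIONED"


if __name__ == "__main__":
    orders = TableStats("orders", 50 * GB)
    customers = TableStats("customers", 100 * MB)
    returns = TableStats("returns", 40 * GB)
    print(choose_join_distribution(orders, customers, workers=10))  # BROADCAST
    print(choose_join_distribution(orders, returns, workers=10))    # PARTITIONED
```

The crossover depends on cluster size: broadcasting a small dimension table is cheap, but its cost grows linearly with the number of workers, which is why accurate statistics matter to the CBO.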
5. Comparative Overview
| Aspect | AWS Athena Federation | Trino | Starburst |
|---|---|---|---|
| Deployment | Serverless (AWS-managed) | Self-managed (open source) | Managed or enterprise-deployed |
| Connectors | Limited, AWS-focused | Extensive (50+) | Extended (optimized enterprise connectors) |
| Performance | Moderate (Lambda-based) | High (MPP architecture) | Very high (Trino MPP plus caching and enterprise optimizations) |
| Security & Governance | IAM integration | Pluggable authentication and access control | Advanced (Ranger, SSO, audit) |
| Use Case Fit | Quick AWS analytics | Cross-platform, open analytics | Enterprise data federation |
6. When to Use Each
- Use Athena Federation when you want to query multiple AWS-native sources with minimal setup. It’s perfect for ad-hoc analytics, cloud cost optimization, or quick joins across S3 and RDS.
- Use Trino when you need flexibility, control, and high throughput across diverse data ecosystems; it is ideal for data lakehouse implementations.
- Use Starburst when enterprise governance, compliance, and performance tuning are mission-critical. Large-scale organizations like Comcast and Goldman Sachs use Starburst to power federated BI and self-service analytics.
7. Future Trends: Beyond Federation
Federation is evolving into a broader vision of unified data access. In 2025, leading vendors are integrating AI-driven query planning and cost-based optimizers that dynamically adjust query paths based on source latency and cost metrics. Expect closer integration with data catalogs (like DataHub and Amundsen) and governance layers to provide end-to-end lineage.
Meanwhile, the open-source community continues pushing Trino forward. The Iceberg and Delta Lake connectors, which support INSERT, UPDATE, DELETE, and MERGE on transactional table formats, have blurred the lines between federated and transactional queries.
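The transactional behavior of table formats like Iceberg rests on snapshot-based commits: each write produces a new immutable snapshot, and a commit succeeds only if the table has not changed since the writer read it (optimistic concurrency). This toy sketch illustrates that idea together with a MERGE-style upsert; SnapshotTable, merge, and the row layout are hypothetical and far simpler than Iceberg's metadata and manifest files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    rows: tuple  # immutable (id, value) pairs for this table version


class SnapshotTable:
    """Toy table format: every successful commit appends a new immutable snapshot."""

    def __init__(self) -> None:
        self.snapshots = [Snapshot(0, ())]

    def current(self) -> Snapshot:
        return self.snapshots[-1]

    def commit(self, base: Snapshot, new_rows: tuple) -> Snapshot:
        # Optimistic concurrency: fail if another writer committed after `base` was read.
        if base.snapshot_id != self.current().snapshot_id:
            raise RuntimeError("commit conflict: table changed since it was read")
        snapshot = Snapshot(base.snapshot_id + 1, new_rows)
        self.snapshots.append(snapshot)  # the single step that makes the write visible
        return snapshot


def merge(table: SnapshotTable, updates: dict) -> Snapshot:
    """MERGE-style upsert: update matching ids and insert new ones in one atomic commit."""
    base = table.current()
    by_id = dict(base.rows)
    by_id.update(updates)
    return table.commit(base, tuple(sorted(by_id.items())))


if __name__ == "__main__":
    table = SnapshotTable()
    first = merge(table, {1: "a", 2: "b"})
    merge(table, {2: "B", 3: "c"})
    # A reader still holding the first snapshot sees a consistent, unchanged view.
    print(first.rows)            # ((1, 'a'), (2, 'b'))
    print(table.current().rows)  # ((1, 'a'), (2, 'B'), (3, 'c'))
```

Because readers pin a snapshot and writers only ever append new ones, a federated query reading the table never observes a half-applied MERGE.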
8. Example: End-to-End Federation Architecture
```text
┌───────────────────────────────┐
│ BI Tools                      │
│ (Tableau, Looker, Superset)   │
└───────────────┬───────────────┘
                │ JDBC/ODBC
                ▼
┌───────────────────────────────┐
│ Trino / Starburst Layer       │
│ Federated Query Engine        │
└───────────────┬───────────────┘
                │ Connectors
                ▼
┌───────────────────────────────┐
│ Data Sources:                 │
│ • S3 / Lakehouse              │
│ • Snowflake / BigQuery        │
│ • MySQL / PostgreSQL          │
│ • Kafka Streams               │
└───────────────────────────────┘
```
This architecture exemplifies how modern analytics teams can unify access to hybrid data environments while keeping governance centralized.
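In Trino, each catalog is configured by a properties file (for example, etc/catalog/mysql.properties containing connector.name=mysql), and every table is addressed as catalog.schema.table. The sketch below shows how a single engine can route a fully qualified name to the connector that serves it; the CATALOGS mapping and the resolve function are hypothetical stand-ins for the engine's catalog management.

```python
# Hypothetical catalog registry: catalog name -> connector name.
# In a Trino deployment this mapping would come from the per-catalog properties files.
CATALOGS = {
    "lake": "iceberg",
    "mysql": "mysql",
    "pg": "postgresql",
    "events": "kafka",
}


def resolve(qualified_name: str) -> tuple[str, str, str, str]:
    """Split catalog.schema.table and report which connector serves that catalog.

    Simplification: quoted identifiers containing dots are not handled.
    """
    parts = qualified_name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected catalog.schema.table, got {qualified_name!r}")
    catalog, schema, table = parts
    if catalog not in CATALOGS:
        raise KeyError(f"unknown catalog {catalog!r}")
    return catalog, schema, table, CATALOGS[catalog]


if __name__ == "__main__":
    for name in ["mysql.sales.customers", "lake.orders_data.orders", "events.clicks.raw"]:
        print(resolve(name))
```

Because BI tools only ever see fully qualified names over JDBC/ODBC, new sources can be added behind the engine by registering a catalog, without changing the clients.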
Conclusion
Federated query engines represent the future of cloud-scale analytics. AWS Athena Federation offers simplicity, Trino provides flexibility and speed, and Starburst delivers enterprise-grade control and optimization. Together, they define a mature ecosystem where engineers can query anything, anywhere, using the language of SQL, without sacrificing performance or compliance.
Whether you are designing a lakehouse, data mesh, or unified analytics layer, these tools provide the foundation for a federated future that prioritizes accessibility, governance, and cost efficiency.
