
Starburst Announces 100GB/second Streaming Ingest from Apache Kafka to Apache Iceberg Tables

BOSTON —

Businesses that need data available for analytics in their cloud data lake with minimal delay have traditionally built complex ingestion systems, cobbling together multiple tools and writing custom software to stream data into the lake. Alternatively, these organizations may rely on incomplete solutions that handle only the ingestion step. Both approaches tend to be fragile, difficult to scale, and costly to maintain, and they solve only part of the problem: after the data lands in the lake, it still needs to be transformed and optimized for efficient querying, requiring even more code, pipelines, and tools. In addition, pressure for cost optimization across analytics functions is increasing. CIOs are looking for ways to reduce operational overhead relative to traditional lakehouses and legacy data warehouses while maintaining control of their data and analytics stack.

"As businesses strive to perform analytics on real-time data, they seek frictionless solutions for continuous data ingestion. They also prioritize open standards like Apache Iceberg to future-proof their environments amid rapidly evolving technologies. Furthermore, reducing complexity and simplifying architectures is critical, helping organizations optimize IT investments and avoid unnecessary costs associated with integrating disparate systems," said Sanjeev Mohan , Principal and Founder of SanjMo. "Starburst's latest announcements are significant because they address these exact needs—delivering improved price performance, simplicity, and efficient elastic scaling for modern data workloads."

Starburst now enables the easy creation of fully managed ingestion pipelines for Kafka topics at a verified scale of up to 100GB/second, at half the cost of alternative solutions. Configuration takes minutes and simply entails selecting the Kafka topic, reviewing the auto-generated table schema, and choosing the location of the resulting Iceberg table.
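For readers who want a concrete picture, the three configuration choices described above (Kafka topic, auto-generated schema, Iceberg table location) can be sketched in plain Python. All names below are hypothetical illustrations of the concept, not Starburst Galaxy's actual API:

```python
from dataclasses import dataclass, field

@dataclass
class IngestPipelineConfig:
    """Hypothetical stand-in for the three settings a user selects;
    invented for illustration, not Starburst's real configuration object."""
    kafka_topic: str
    iceberg_table_location: str
    # Auto-generated schema: field name -> inferred type, sampled from records.
    table_schema: dict = field(default_factory=dict)

def infer_schema(sample_record: dict) -> dict:
    """Naive schema inference from one sampled Kafka record."""
    return {key: type(value).__name__ for key, value in sample_record.items()}

sample = {"user_id": 42, "event": "click", "ts": 1730000000.5}
config = IngestPipelineConfig(
    kafka_topic="events",
    iceberg_table_location="s3://lake/warehouse/events",
    table_schema=infer_schema(sample),
)
print(config.table_schema)  # {'user_id': 'int', 'event': 'str', 'ts': 'float'}
```

A real managed pipeline would infer the schema from many sampled records and map types to Iceberg's type system; the single-record inference here only conveys the shape of the idea.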

Additionally, Starburst is expanding its ingestion capabilities by introducing file loading, offering customers a powerful, automated alternative to DIY or off-the-shelf solutions. This feature reads, parses, and writes records from files directly into Iceberg tables, and the resulting tables are automatically optimized for read performance through compaction, snapshot retention, orphaned-file removal, and statistics collection. The public preview of file loading will be available in November 2024.
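Conceptually, file loading parses raw files into records of the shape that a pipeline would then append to an Iceberg table. The sketch below mimics only the parse step, using Python's standard csv module; it is an illustration of the idea, not Starburst's implementation:

```python
import csv
import io

def load_file_records(file_text: str) -> list:
    """Parse a delimited file into row dicts, the shape a managed
    file-loading pipeline would append to an Iceberg table.
    (Illustrative only; not Starburst's actual code path.)"""
    return list(csv.DictReader(io.StringIO(file_text)))

rows = load_file_records("user_id,event\n42,click\n7,view\n")
print(rows)  # [{'user_id': '42', 'event': 'click'}, {'user_id': '7', 'event': 'view'}]
```

A production loader would additionally coerce string values to typed columns and commit the rows as Iceberg data files, after which the maintenance tasks named above (compaction, snapshot retention, and so on) keep the table fast to read.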

Starburst is also making auto scaling smarter in Starburst Galaxy. In environments with many concurrent users, demand for compute resources fluctuates dynamically. The enhanced Auto Scaling intelligently monitors both active and pending queries to determine how much compute each query needs, allocating resources up to 50% faster. Beyond provisioning additional compute more quickly, it can also automatically reactivate draining worker nodes, improving the efficiency of resource utilization.
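The scaling behavior described above can be illustrated with a toy sizing rule that accounts for both active and pending queries and prefers reactivating draining nodes over cold-provisioning new ones. The threshold (queries per worker) is invented for illustration and is not a Starburst parameter:

```python
def workers_needed(active_queries: int, pending_queries: int,
                   queries_per_worker: int = 4) -> int:
    """Toy sizing rule: provision for running and queued work together."""
    total = active_queries + pending_queries
    return -(-total // queries_per_worker)  # ceiling division

def scale_decision(current_workers: int, draining_workers: int,
                   active: int, pending: int):
    """Return (nodes to reactivate, nodes to newly provision).
    Reactivating a draining node is cheaper than starting a cold one,
    mirroring the behavior described above."""
    target = workers_needed(active, pending)
    reactivate = min(draining_workers, max(0, target - current_workers))
    provision = max(0, target - current_workers - reactivate)
    return reactivate, provision

print(scale_decision(3, 2, active=10, pending=6))  # (1, 0)
```

With 16 total queries and 4 per worker, the target is 4 workers; one draining node is reactivated and no new nodes are provisioned.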

Data engineers undertake various labor-intensive data preparation tasks, and Starburst Warp Speed helps automate some of them. Still, as business needs evolve and teams adopt a semantic-layer approach with tools like dbt, data engineers struggle to deliver fast query performance, scalability, and stability for BI and dashboarding without significant overhead. The next-generation caching in Starburst Galaxy extends Warp Speed's smart indexing and caching to intermediate workload results. Warp Speed can now identify patterns of similar subqueries across different workloads, improving performance by up to 62% compared to non-accelerated queries.
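The idea of reusing intermediate results across similar subqueries can be illustrated with a toy cache keyed on a normalized query fingerprint. This is a drastic simplification of what a query engine actually does (real matching works on query plans, not text), and every name here is hypothetical:

```python
import hashlib
import re

_result_cache = {}

def fingerprint(sql: str) -> str:
    """Normalize whitespace and case so textually similar subqueries
    map to the same cache key (toy stand-in for plan-level matching)."""
    canonical = re.sub(r"\s+", " ", sql.strip().lower())
    return hashlib.sha256(canonical.encode()).hexdigest()

def run_subquery(sql: str, execute):
    """Return a cached intermediate result when an equivalent subquery
    has already run; otherwise execute and cache it."""
    key = fingerprint(sql)
    if key not in _result_cache:
        _result_cache[key] = execute(sql)
    return _result_cache[key]
```

The second time a dashboard issues an equivalent subquery, `execute` is never called: the cached intermediate result is returned directly, which is the effect the caching feature above aims for at engine scale.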

Previously, users would spend too much effort determining which queries were appropriate for different cluster types. Also, administrators weren't able to assign groups of users to a cluster via roles and privileges. With User Role Based Routing, Starburst now supports the easy allocation of resources by cluster type. Customers can programmatically route queries to the appropriate Galaxy cluster based on a predefined set of rules. Users can send all queries to a single URL, which will route the queries based on the user's role, minimizing human intervention while improving what is already industry-leading price-performance against other leading cloud data warehouses and lakehouses.
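The routing behavior described above (all queries sent to one URL, with rules mapping user roles to clusters) can be sketched minimally; the role names, cluster names, and rule format are invented for illustration and are not Starburst's configuration syntax:

```python
# Hypothetical role -> cluster mapping; in Galaxy, administrators would
# define such rules via roles and privileges, not a Python dict.
ROUTING_RULES = {
    "data_engineer": "etl-cluster",
    "analyst": "bi-cluster",
}
DEFAULT_CLUSTER = "adhoc-cluster"

def route_query(user_role: str) -> str:
    """All users submit to a single URL; the router picks the target
    cluster from the user's role, with a fallback for unmatched roles."""
    return ROUTING_RULES.get(user_role, DEFAULT_CLUSTER)

print(route_query("analyst"))  # bi-cluster
print(route_query("intern"))   # adhoc-cluster
```

Because the routing decision lives in one place, users never need to know which cluster URL suits their workload, which is the "minimal human intervention" property the feature targets.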

"With our new ingestion capabilities to Iceberg, customers don't have to worry about how fast or how much data they need to land in their data lake. At 100GB/second, Galaxy's ingestion can handle the scale of the most demanding use cases. Because it is so easy to configure and cost-effective to operate, customers don't have to artificially limit the number of up-to-date, fresh tables in their lake, enabling them to make the most informed business decisions," said Tobias Ternstrom , Starburst's Chief Product Officer.

For more information, read Starburst's Icehouse launch blog.
Download an image of the Starburst Open Data Lakehouse here.

Starburst, the Open Hybrid Lakehouse, is the leading end-to-end data platform to securely access, analyze, and share data for analytics and AI across hybrid, on-premises, and multi-cloud environments. As the leaders in Trino, a modern open-source SQL engine, Starburst empowers the most data-intensive and security-conscious organizations like Comcast, Halliburton, Vectra, EMIS Health, and 7 of the top 10 global banks to democratize data access, enhance analytics performance, and improve architecture optionality. With the Open Hybrid Lakehouse from Starburst, enterprises globally can easily discover and use all their relevant business data to power new applications and analytics across risk mitigation, supply chain, customer experiences, product optimization, streaming, and more.   

For additional information, please visit https://www.starburst.io/

View original content: https://www.prnewswire.co.uk/news-releases/starburst-announces-100gbsecond-streaming-ingest-from-apache-kafka-to-apache-iceberg-tables-302285181.html

Press Office
PR Newswire
209 - 215 Blackfriars Road
London, United Kingdom