AI-powered search & chat for Data / Computer Science Students

Apache Flink Series 1 — What is Apache Flink

 Analytics Vidhya

In this post, I will try to explain what Apache Flink is, what it is used for, and what its features are. Before moving on to the “use cases for Apache Flink”, let me point out what the stateful…

Read more at Analytics Vidhya

The Foundations for Building an Apache Flink Application

 Analytics Vidhya

Our monolith solution does not cope with the increased load of incoming data, and thus it has to evolve. This is the time for the next generation of our product. Stream processing is the new data…

Read more at Analytics Vidhya

Apache Flink Series 4 — DataStream API

 Analytics Vidhya

Viewed as software, Flink is built as a layered system. One of the layers is the DataStream API, which sits on top of the Runtime Layer. close() is a finalization method. It is called…

Read more at Analytics Vidhya

Apache Flink Series 6 — Reading the Log files

 Analytics Vidhya

In this post, we will look at the log files (both for TaskManager and JobManager) and try to understand what is going on in the Flink cluster. This post is actually about step 3 of creating a sample…

Read more at Analytics Vidhya

An Introduction to Stream Processing with Apache Flink

 Towards Data Science

Read more at Towards Data Science

Flink Checkpointing and Recovery

 Towards Data Science

Apache Flink is a popular real-time data processing framework. It’s gaining more and more popularity thanks to its low-latency processing at extremely high throughput in a fault-tolerant manner…

Read more at Towards Data Science

Apache BEAM + Flink Cluster + Kubernetes + Python

 Python in Plain English

Without going on about all the benefits of BEAM, such as being open source and having APIs that alleviate some pain with an added level of abstraction, we’ll get right down to implementation. If you have been…

Read more at Python in Plain English

Building a realtime dashboard with Flink: The Backend

 Towards Data Science

With the demand for “realtime” low-latency data growing, more data scientists will likely have to become familiar with streams. One good place to start is Apache Flink. Flink is a distributed…

Read more at Towards Data Science

Running Apache Flink with RocksDB on Azure Kubernetes Service

 Towards Data Science

Recently I was looking into how to deploy an Apache Flink cluster that uses RocksDB as the state backend and found a lack of detailed documentation on the subject. I was able to piece together how to…

Read more at Towards Data Science

Learn Flink SQL — The Easy Way

 Analytics Vidhya

Flink is almost the de facto standard streaming engine today. Flink SQL is the recommended way to use Flink. But streaming SQL is not the same as traditional batch SQL; you have to learn…

Read more at Analytics Vidhya

Apache Flume

 Towards Data Science

Trickle-feed unstructured data into HDFS using Apache Flume

Read more at Towards Data Science

Combine and Preprocess Your Heterogeneous Data for Analytics with Apache Flink

 Towards Data Science

Data-driven decisions and applications are the core of future businesses. Getting insights from your data means cost reduction, efficiency increase, and strategic advantages. More and more companies…

Read more at Towards Data Science

A Guide to Apache Airflow (and Docker)

 Level Up Coding

By Thomas Reid. Part 2, Using Airflow. This is the second of a two-part seri...

Read more at Level Up Coding

PyFlink - How To Create a Table From A CSV Source

 Towards Data Science

In this first tutorial on Apache Flink, learn how to import data into a table from a CSV source, using the Python Table API.

Read more at Towards Data Science

Here’s how Flink stores your State

 Towards Data Science

If you have ever wondered what happens once you update a value in your Flink state, here’s the answer. A low-level view of Flink’s high-level state APIs.

Read more at Towards Data Science

Integrating Flask and Streamlit

 Python in Plain English

A Guide to Creating Interactive Web Pages and Embedding Them Into Existing Websites

Read more at Python in Plain English

Apache Airflow

 Towards Data Science

Airflow was born out of Airbnb’s problem of dealing with large amounts of data that was being used in a variety of jobs. To speed up the end-to-end process, Airflow was created to quickly author…

Read more at Towards Data Science

Apache Thrift

 Software Architecture with C plus plus

Apache Thrift is an interface description language and binary communication protocol. It is used as an RPC method that allows creating distributed and scalable services built in a variety of languages...

Read more at Software Architecture with C plus plus

How to Install Apache Airflow With Docker

 Level Up Coding

The 8-Step Guide, tested on Windows, Ubuntu, and Mac OS X

Read more at Level Up Coding

Getting started with Apache Airflow

 Towards Data Science

In this post, I am going to discuss Apache Airflow, a workflow management system developed by Airbnb. Earlier I had discussed writing basic ETL pipelines in Bonobo. Bonobo is cool for writing ETL…

Read more at Towards Data Science

Setting Up Apache Airflow with Docker-Compose in 5 Minutes

 Towards Data Science

Create a development environment and start building DAGs

Read more at Towards Data Science

How to connect Snowflake with Airflow on Docker in order to build a data extraction pipeline for…

 Analytics Vidhya

Apache Airflow is a great tool for orchestrating workflows and data processing pipelines that can be used on several cloud providers such as GCP, AWS, and Azure, among others, but at this moment we…

Read more at Analytics Vidhya

Introduction to Apache Iceberg

 Towards Data Science

Over the years, Apache Iceberg was open-sourced by Netflix, and many other companies such as Snowflake and Dremio have decided to invest in the project. Each Apache Iceberg table follows a 3...

Read more at Towards Data Science

Apache Airflow — Part 1

 Analytics Vidhya

Every programmer loves automating their stuff. Learning and using any automation tool is fun for us. A few months ago, I came across a wonderful open source project called apache-airflow. I have…

Read more at Analytics Vidhya