hive

Apache Hive is data warehouse software built on top of Apache Hadoop, designed for data summarization, querying, and analysis. It provides a SQL-like interface, known as HiveQL, that lets users run ad-hoc queries on large datasets stored in distributed storage such as HDFS. Originally developed at Facebook to manage vast amounts of data, Hive hides the complexity of MapReduce by automatically compiling queries into MapReduce jobs. Its features include support for different storage types, data modeling, built-in functions, and user-defined functions (UDFs), making it a powerful tool for data analysis and processing in big data environments.
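As a rough illustration of what that conversion means, the sketch below mimics in plain Python how an aggregate query such as `SELECT page, COUNT(*) FROM visits GROUP BY page` breaks down into map, shuffle, and reduce phases. The `visits` table, its columns, and the helper function names are hypothetical, and this is not how Hive itself is implemented; it only shows the shape of the work that Hive spares you from writing by hand.

```python
from collections import defaultdict

# Hypothetical rows of a "visits" table; in Hive these would live in files on HDFS.
visits = [
    {"user": "a", "page": "/home"},
    {"user": "b", "page": "/docs"},
    {"user": "a", "page": "/docs"},
    {"user": "c", "page": "/home"},
    {"user": "b", "page": "/home"},
]


def map_phase(rows):
    """Emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["page"], 1


def shuffle_phase(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which corresponds to COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_page(rows):
    """Run the three phases in order, like a single MapReduce job."""
    return reduce_phase(shuffle_phase(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_page(visits))
```

With HiveQL, the same result comes from the one-line query, and the engine decides how to split the work across the cluster.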

Getting Started With Hive

Towards Data Science

The aim of this blog post is to help you get started with Hive using Cloudera Manager. Apache Hive is a data warehouse software project built on top of Apache Hadoop for providing data summarization…

Introduction to Hive

Towards Data Science

This article focuses on Hive, its features, use cases, and Hive queries. Since a lot of DML and DDL queries are very similar to SQL, it can act as a foundation or building block for anyone new to…

Apache Hive Hacks for a Data Scientist: Part II

Towards AI

Apache Hive is a SQL engine for manipulating big data via SQL commands. If you want to know more about Hive and why Hive is for Data…

Limitation of Hive Data Validation

Analytics Vidhya

In the big data world, Hive is one of the most popular data warehouse tools. Though it comes with some convenient and flexible features, including a SQL-like data manipulation language or easily data…

Is it still good to learn Apache Hive?

Analytics Vidhya

As the big data world moves towards Apache Spark, Databricks, or cloud-based data warehouses like Amazon Redshift and Snowflake, the general perception is that Hive is an obsolete technology to learn.

Working with Hive using AWS S3 and Python

Towards Data Science

In this article, I’m going to share my experience of maintaining a Hive schema. This will be useful to newcomers who want to step into Big Data technologies. Mainly this will describe how…

Ultimate Hive Tutorial: Essential Guide to Big Data Management and Querying

Towards Data Science

Navigating the labyrinth of big data can be a daunting endeavor, especially when the paths are paved with complex terminology and intricate processes. This is particularly true for Apache...

Build a Simple To-Do App Using Hive Database

Better Programming

Is Hive the best local storage database? Let’s find out.

Must-Know Techniques for Handling Big Data in Hive

Towards Data Science

In most tech companies, data teams must possess strong capabilities to manage and process big data. As a result, familiarity with the Hadoop ecosystem is essential for these teams. Hive Query Language...

How To Create Your Own Hive SerDe — Hive Custom Data Serialize-Deserialize Mechanism

Analytics Vidhya

As mentioned in my earlier blog post, SerDe is an interface that Hive uses to deserialize data (read data from the table’s HDFS location and convert it to a Java object) and serialize data (convert a Java…

Shared External Hive Metastore with Azure Databricks and Synapse Spark Pools

Towards Data Science

To help structure your data in a data lake you can register and share your data as tables in a Hive metastore. A Hive metastore is a database that holds metadata about our data, such as the paths to…

I Made a Version of Honey That Doesn’t SCREW Creators

ArjanCodes

Honey might save you money, but it’s not so sweet for creators. In this video, I build Kale, an ethical, o...