Big Data File Formats Explained

 Towards Data Science

In the Hadoop ecosystem, data lakes are stored on the HDFS file system. However, most cloud providers have replaced it with their own deep storage systems, such as S3 or GCS. When using deep storage…

Read more at Towards Data Science

Big Data File Formats Explained Using Spark Part 1

 Analytics Vidhya

When dealing with large datasets, using traditional CSV or JSON formats to store data is extremely inefficient in terms of query speed and storage costs.

Read more at Analytics Vidhya

Which Data Format to Use For Your Big Data Project?

 Towards Data Science

Choosing the right data format is crucial in Data Science projects, impacting everything from data read/write speeds to memory consumption and interoperability. This article explores seven popular ser...

Read more at Towards Data Science

Comparing Performance of Big Data File Formats: A Practical Guide

 Towards Data Science

Parquet vs ORC vs Avro vs Delta Lake. The big data world is full of various storage systems, heavily influenced by different file formats. These are key in nearly ...

Read more at Towards Data Science

A Comprehensive Guide to File Formats in Data Engineering

 Python in Plain English

Understanding the pros and cons of using the CSV, JSON, Parquet, Avro, and ORC file formats in data engineering. In big data and data engineering, choosing...

Read more at Python in Plain English

Types of Data

 Analytics Vidhya

In his 1962 paper “The Future of Data Analysis”, John Tukey proposed a new scientific discipline called ‘Data Analysis’. This was one of the important works in the foundation of Data Science…

Read more at Analytics Vidhya

Data Lake -Comparing Performance of Known Big Data Formats

 Towards Data Science

For the past several years, I have been using all kinds of data formats in Big Data projects. During this time I have strongly favored one format over the others; my failures have taught me a few…

Read more at Towards Data Science

Big Data Vs Small Data/Traditional Data

 Analytics Vidhya

The term Big Data is catching a lot of attention these days. So, folks, I am trying to give you a brief summary of it. Big Data is simply data that makes our Excel crash. It’s actually when we have to…

Read more at Analytics Vidhya

Big Data 3 V’s and 5 V’s

 Analytics Vidhya

Big data stands mainly on five pillars: Volume, Velocity, Variety, Veracity, and Value. These pillars are briefly described in the 3 V's and 5 V's architectures.

Read more at Analytics Vidhya

File Formats

 Codecademy

File formats determine how information or data, such as text and images, are created, stored, and read. Each file format has a unique extension. For example, a CSV (Comma Separated Values) file has th...

Read more at Codecademy

Data Loading, Storage, and File Formats

 Python for Data Analysis Book

Reading data and making it accessible (often called data loading) is a necessary first step for using most of the tools in this book. The term parsing is also sometimes used to describe loading text ...

Read more at Python for Data Analysis Book

How Big is BIGDATA?

 Becoming Human: Artificial Intelligence Magazine

How big is BIGDATA? No fuss, straight talk. One of the most popular catchphrases emphasizing the importance of data is: data is the new oil. During the COVID-19 pandemic of the last two years, the im...

Read more at Becoming Human: Artificial Intelligence Magazine

File Formats

 The Python Standard Library

The modules described in this chapter parse various miscellaneous file formats that aren’t markup languages and are not related to e-mail. csv: CSV File Reading and Writing ...

Read more at The Python Standard Library
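
As a rough illustration of the `csv` module mentioned in the entry above, here is a minimal sketch that writes a few rows to CSV text and parses them back. The helper name `roundtrip_rows` and the sample rows are hypothetical, chosen only for this example; the sketch uses an in-memory buffer instead of a real file.

```python
import csv
import io


def roundtrip_rows(rows, fieldnames):
    """Write dict rows to CSV text in memory, then parse that text back."""
    # newline="" leaves line endings to the csv module, as its docs recommend.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    csv_text = buffer.getvalue()

    # DictReader takes the first row as the header and yields one dict per row.
    reader = csv.DictReader(io.StringIO(csv_text, newline=""))
    return csv_text, list(reader)


# Hypothetical sample data; one value contains a comma, so the writer quotes it.
sample_rows = [
    {"name": "Ada", "note": "likes commas, and quotes"},
    {"name": "Linus", "note": "plain"},
]
csv_text, parsed_rows = roundtrip_rows(sample_rows, ["name", "note"])
```

Every value comes back as a string, because CSV carries no type information. This is one of the reasons the articles listed here compare it with typed formats such as Parquet, Avro, and ORC.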

Big Data

 Codecademy

Big data involves working with and developing insights from large datasets. The key distinctions between regular data and big data are volume, velocity, and variety. Generally, big data is more exten...

Read more at Codecademy

Visual Guide to Big Data: 10 Differences vs. Data and Technologies

 Towards AI

Learn about the 10 differences between data and big data, 4 high level steps taken in the big data implementation process, challenges in…

Read more at Towards AI

Data Types for Data Sciences

 Towards Data Science

Big Data and Data Science are now on everyone’s mind. But not everyone clearly understands that not all data is the same, or has a clear vision of the types of applications and technologies available…

Read more at Towards Data Science

Data Types in Data Science

 Towards Data Science

There are a lot of engineers who have never been involved in the field of statistics or data science. But in order to build data science pipelines or rewrite code produced by data scientists to an…

Read more at Towards Data Science

Data Types in Data Science

 Python in Plain English

There are a lot of engineers who have never been involved in the field of statistics or data science. But to build data science pipelines or rewrite code produced by data scientists to an adequate…

Read more at Python in Plain English

Parquet file format — everything you need to know!

 Towards Data Science

New data flavors require new ways of storing them! Learn all you need to know about the Parquet file format.

Read more at Towards Data Science

New in Hadoop: You should know the Various File Format in Hadoop.

 Towards Data Science

A few weeks ago, I wrote an article about Hadoop, covering its different parts and how it plays an essential role in data engineering. In this article, I’m going to give a summary of…

Read more at Towards Data Science

Big Data Architecture Concepts

 Analytics Vidhya

With the advancement of technology, the volumes of data organisations collect have increased exponentially. A big data architecture is used to ingest, process and analyse data that is too…

Read more at Analytics Vidhya

Beginner guideline to big data with apache spark

 Analytics Vidhya

As the name says, big data refers to a massive amount of data that cannot be stored and processed with a traditional computer system. But how do we define a dataset as big data? It depends…

Read more at Analytics Vidhya

Data types

 NumPy user guide

NumPy supports a much greater variety of numerical types than Python does. This section shows which are available, and how to modif...

Read more at NumPy user guide