Database definition

What is SQL-on-Hadoop?

SQL-on-Hadoop is a class of analytical application tools that combine established SQL-style querying with newer Hadoop data framework elements.

By supporting familiar SQL queries, SQL-on-Hadoop lets a wider group of enterprise developers and business analysts work with Hadoop on commodity computing clusters. Because SQL was originally developed for relational databases, it has to be modified for the Hadoop 1 model, which uses the Hadoop Distributed File System (HDFS) and MapReduce, or for the Hadoop 2 model, which can work without either HDFS or MapReduce.

The various means of running SQL in Hadoop environments can be divided into (1) connectors that translate SQL into a MapReduce format; (2) “push down” systems that forgo batch-oriented MapReduce and run SQL directly within Hadoop clusters; and (3) systems that apportion SQL work between MapReduce-HDFS clusters and raw HDFS clusters, depending on the workload.
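
To make the first category more concrete, the sketch below shows roughly what "translating SQL into a MapReduce format" means for a simple aggregate such as SELECT dept, COUNT(*) FROM employees GROUP BY dept. It is a toy, single-process illustration in plain Python and does not use any Hadoop API; the employees rows, the dept column, and all function names are hypothetical and chosen only for this example. Real connectors generate distributed jobs and handle far more of the SQL language.

```python
from collections import defaultdict

# Hypothetical rows standing in for an "employees" table stored as a file.
ROWS = [
    {"name": "ana", "dept": "sales"},
    {"name": "bo", "dept": "eng"},
    {"name": "cy", "dept": "sales"},
    {"name": "di", "dept": "eng"},
    {"name": "ed", "dept": "ops"},
]


def map_phase(rows):
    """Map step: emit one (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["dept"], 1


def shuffle(pairs):
    """Shuffle step: collect all values that share a key (done by the framework in a real cluster)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce step: COUNT(*) becomes a sum over the 1s emitted for each key."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    """Toy equivalent of: SELECT dept, COUNT(*) FROM employees GROUP BY dept."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    for dept, count in sorted(run_query(ROWS).items()):
        print(dept, count)
```

The point of the sketch is the mapping of SQL clauses to phases: the GROUP BY column becomes the map output key, and the aggregate function becomes the reduce logic. Because each such job runs as a batch, systems in the second category avoid this path for interactive queries.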

One of the earliest efforts to combine SQL and Hadoop resulted in the Hive data warehouse, which featured HiveQL, a SQL-like query language whose queries are translated into MapReduce jobs. Other tools that help support SQL-on-Hadoop include BigSQL, Drill, Hadapt, Hawq, H-SQL, Impala, JethroData, Polybase, Presto, Shark (Hive on Spark), Spark, Splice Machine, Stinger, and Tez (Hive on Tez).