Big Data Hadoop Online

  • What is Big Data
  • Hadoop Overview
  • Introduction to HDFS
  • HDFS Architecture
  • MapReduce v1
  • MapReduce v2/YARN
  • HBase
  • Hive
  • Pig
  • Flume
  • Sqoop


This course brings together several key information technologies used in manipulating, storing, and analyzing big data. We look at the basic tools for statistical analysis, such as R, and key methods used in machine learning. We review MapReduce techniques for parallel processing and Hadoop, an open source framework that allows us to cheaply and efficiently implement MapReduce on Internet-scale problems. We touch on related tools that provide SQL-like access to unstructured data: Pig and Hive. We analyze so-called NoSQL storage solutions, exemplified by HBase, for their critical features: speed of reads and writes, data consistency, and ability to scale to extreme volumes. We examine memory-resident databases and streaming technologies, which allow analysis of data in real time. We work with the public cloud as an effectively unlimited resource for big data analytics. Students gain the ability to design highly scalable systems that can accept, store, and analyze large volumes of unstructured data in batch mode and/or real time.



Self-Paced Training