Apache Hadoop
The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. A wide variety of companies and organizations use Hadoop for both research and production. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
The core of Apache Hadoop consists of a storage part, known as the Hadoop Distributed File System (HDFS), and a processing part, which is the MapReduce programming model. This approach takes advantage of data locality, where nodes manipulate the data they have access to. This allows the dataset to be processed faster and more efficiently than it would be in a more conventional supercomputer architecture that relies on a parallel file system where computation and data are distributed via high-speed networking.
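To make the MapReduce model concrete, here is a minimal word-count sketch written in Python for Hadoop Streaming. It is only an illustration of the map and reduce phases described above; the script name (wordcount.py), the map/reduce mode switch, and any input/output paths are assumptions, not part of the official Hadoop distribution or of this course material.

    #!/usr/bin/env python3
    # Minimal word-count sketch for Hadoop Streaming (illustrative only).
    # Run as "wordcount.py map" for the map phase, "wordcount.py reduce" for the reduce phase.
    import sys

    def mapper():
        # Map phase: emit a (word, 1) pair for every word read from stdin.
        for line in sys.stdin:
            for word in line.strip().split():
                print(f"{word}\t1")

    def reducer():
        # Reduce phase: the shuffle delivers lines sorted by key, so counts for
        # the same word arrive together and can be summed in a single pass.
        current_word, count = None, 0
        for line in sys.stdin:
            word, value = line.rstrip("\n").split("\t", 1)
            if word == current_word:
                count += int(value)
            else:
                if current_word is not None:
                    print(f"{current_word}\t{count}")
                current_word, count = word, int(value)
        if current_word is not None:
            print(f"{current_word}\t{count}")

    if __name__ == "__main__":
        mapper() if sys.argv[1:] == ["map"] else reducer()

A job like this would typically be submitted through the hadoop-streaming JAR, pointing -mapper and -reducer at the script and -input/-output at HDFS paths, so that map tasks run on the nodes that already hold the input blocks (the data locality described above).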
Module 2:
Other Components of the Hadoop Ecosystem
Flume for Realtime Data Collection
Kafka for Realtime Log Analysis: Log Filtering
Spark for Realtime In-Memory Analytics
Advanced Spark Concepts, Spark Programming APIs, Spark RDDs (see the PySpark sketch after this list)
- Spark: Controlling Parallelism, Partitions & Persistence
- Spark SQL
- Spark Streaming
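As a taste of the parallelism, partitioning, and persistence topics listed above, here is a minimal PySpark sketch. The application name, the partition counts, and the toy data are illustrative assumptions, not the course exercises.

    # Minimal PySpark sketch: controlling parallelism, partitions and persistence.
    from pyspark.sql import SparkSession
    from pyspark import StorageLevel

    spark = SparkSession.builder.appName("parallelism-demo").getOrCreate()
    sc = spark.sparkContext

    # parallelize() with an explicit slice count controls the initial parallelism.
    nums = sc.parallelize(range(100_000), numSlices=4)
    print("partitions:", nums.getNumPartitions())

    # persist() keeps the RDD in memory (spilling to disk if needed) so that
    # later actions reuse it instead of recomputing the lineage.
    squares = nums.map(lambda x: x * x).persist(StorageLevel.MEMORY_AND_DISK)
    print("sum of squares:", squares.sum())
    print("count:", squares.count())  # served from the persisted data

    # coalesce()/repartition() change the partition count of an existing RDD.
    print("after coalesce:", squares.coalesce(2).getNumPartitions())

    spark.stop()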
Scala Programming: Basics to Advanced
Python Introduction & Spark Programming in Python using PySpark (see the sketch below)
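The PySpark topic above can be previewed with a short DataFrame and Spark SQL example; the column names and sample rows below are made up for illustration.

    # Minimal PySpark / Spark SQL sketch (illustrative data only).
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("pyspark-intro").getOrCreate()

    # Build a small DataFrame in memory and query it with Spark SQL.
    people = spark.createDataFrame(
        [("alice", 34), ("bob", 29), ("carol", 41)],
        ["name", "age"],
    )
    people.createOrReplaceTempView("people")

    spark.sql("SELECT name, age FROM people WHERE age > 30 ORDER BY age").show()

    spark.stop()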
Spark for Realtime Log Analysis: Analytics
- Creating and Deploying an End-to-End Web Log Analysis Solution
- Realtime Log Collection using Flume
- Filtering the Logs in Kafka
- Realtime Threat Detection in Spark using Logs from the Kafka Stream (a streaming sketch follows this list)
- Clickstream Analysis using Spark
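The following is a minimal sketch, not the course's end-to-end solution, of reading web-server logs from a Kafka topic with Spark Structured Streaming and flagging suspicious entries. The broker address, the topic name "weblogs", the space-separated log layout, and the 401/403 filter are all assumptions; running it also requires the spark-sql-kafka connector package on the Spark classpath.

    # Minimal Structured Streaming sketch: logs from Kafka -> simple threat signal.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col, split

    spark = SparkSession.builder.appName("log-threat-detection").getOrCreate()

    # Subscribe to the Kafka topic fed with raw log lines.
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "weblogs")
           .load())

    # Kafka delivers bytes; cast the value to a string and split an assumed
    # space-separated line into columns (ip, method, path, status).
    logs = raw.selectExpr("CAST(value AS STRING) AS line")
    fields = split(col("line"), " ")
    parsed = logs.select(
        fields.getItem(0).alias("ip"),
        fields.getItem(1).alias("method"),
        fields.getItem(2).alias("path"),
        fields.getItem(3).alias("status"),
    )

    # "Threat detection" here is simply counting repeated auth failures per IP.
    suspicious = (parsed.filter(col("status").isin("401", "403"))
                  .groupBy("ip").count())

    query = (suspicious.writeStream
             .outputMode("complete")
             .format("console")
             .start())
    query.awaitTermination()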
Hadoop MR2 (YARN) Deployment and Integration with Spark
Spark Machine Learning Concepts and the Lambda Architecture
Machine Learning using MLlib
Customer Churn Modeling using Spark MLlib (see the sketch below)
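A minimal sketch of churn modeling with Spark MLlib (the pyspark.ml API) is shown below. The file name churn.csv, the feature columns, and the 0/1 churn label are placeholder assumptions, not the course dataset.

    # Minimal churn-modeling sketch with pyspark.ml (placeholder columns/data).
    from pyspark.sql import SparkSession
    from pyspark.ml import Pipeline
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.evaluation import BinaryClassificationEvaluator

    spark = SparkSession.builder.appName("churn-demo").getOrCreate()

    # Assumed CSV with numeric features and a 0/1 "churn" label column.
    data = spark.read.csv("churn.csv", header=True, inferSchema=True)
    train, test = data.randomSplit([0.8, 0.2], seed=42)

    # Assemble raw columns into a feature vector, then fit a logistic regression.
    assembler = VectorAssembler(
        inputCols=["tenure", "monthly_charges", "support_calls"],
        outputCol="features",
    )
    lr = LogisticRegression(featuresCol="features", labelCol="churn")
    model = Pipeline(stages=[assembler, lr]).fit(train)

    # Evaluate on the held-out split with area under the ROC curve.
    predictions = model.transform(test)
    auc = BinaryClassificationEvaluator(labelCol="churn").evaluate(predictions)
    print(f"Test AUC: {auc:.3f}")

    spark.stop()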
Zeppelin for Data Visualization; Spark Programming in Zeppelin using IPython Notebooks
Case Studies & POC – Run Hadoop on a medium-sized dataset (~5 GB of data); the POC can be based on a realtime project from your company or on Duratech’s live project
Course Conclusion
Final Steps:
- Project Evaluation and Exit Test
- Profile Building to realign your profile for the Big Data industry
- Placement Assistance & Interview Handling Support