Big Data

(307 reviews)

Big data refers to data sets that are so large and complex that traditional data-processing application software is inadequate to deal with them. Big data challenges include capturing data, data storage, data analysis, search, sharing, transfer, visualization, querying, updating, information privacy and data source. A number of concepts are associated with big data: originally there were three, namely volume, variety and velocity. Concepts later attributed to big data are veracity (i.e., how much noise is in the data) and value.

Data sets grow rapidly, in part because they are increasingly gathered by cheap and numerous information-sensing Internet of Things devices such as mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency identification (RFID) readers and wireless sensor networks. The world's technological per-capita capacity to store information has roughly doubled every 40 months since the 1980s; as of 2012, 2.5 exabytes (2.5×10^18 bytes) of data are generated every day. According to an IDC report, the global data volume was predicted to grow exponentially from 4.4 zettabytes to 44 zettabytes between 2013 and 2020. By 2025, IDC predicts there will be 163 zettabytes of data. One question for large enterprises is determining who should own big-data initiatives that affect the entire organization.
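The growth figures quoted above imply annual rates that are easy to check. The Python sketch below derives them from the numbers in the text. The function names are illustrative, and the calculation assumes smooth compound growth, which is a simplification rather than something the cited reports state.

```python
def annual_growth_from_doubling(months_to_double: float) -> float:
    """Annual growth factor implied by a doubling period given in months."""
    return 2 ** (12 / months_to_double)


def compound_annual_growth(start: float, end: float, years: float) -> float:
    """Annual growth factor that turns `start` into `end` over `years` years."""
    return (end / start) ** (1 / years)


# Storage capacity doubling every 40 months (figure quoted in the text).
storage_factor = annual_growth_from_doubling(40)

# IDC prediction quoted in the text: 4.4 ZB in 2013 to 44 ZB in 2020 (7 years).
volume_factor = compound_annual_growth(4.4, 44, 2020 - 2013)

print(f"Storage capacity: about {100 * (storage_factor - 1):.0f}% per year")
print(f"Data volume:      about {100 * (volume_factor - 1):.0f}% per year")
```

Under these assumptions, a 40-month doubling period works out to roughly 23% growth per year, and a tenfold increase over seven years to roughly 39% per year.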

Course Features

  • Students: 307
  • Duration: 4/6 weeks
  • Skill level: All
  • Language: English
    • DAY

      DESCRIPTION

      1

      INTRODUCTION TO BIG DATA

      • Introduction and relevance

      • Uses of Big Data analytics in industries such as Telecom, E-commerce, Finance and Insurance

      • Problems with Traditional Large-Scale Systems

       

      2 and 3

      HADOOP (BIG DATA) ECOSYSTEM

      • Motivation for Hadoop

      • Different types of projects by Apache

      • Role of projects in the Hadoop Ecosystem

      • Key technology foundations required for Big Data

      • Limitations and Solutions of existing Data Analytics Architecture

      • Comparison of traditional data management systems with Big Data management systems

      • Evaluate key framework requirements for Big Data analytics

      • Hadoop Ecosystem & Hadoop 2.x core components

      • Explain the relevance of real-time data

      • Explain how to use big and real-time data as a Business planning tool

       

      4 and 5

      HADOOP CLUSTER - ARCHITECTURE - CONFIGURATION FILES

      • Hadoop Master-Slave Architecture

      • The Hadoop Distributed File System - Concept of data storage

      • Explain different types of cluster setups (fully distributed, pseudo-distributed, etc.)

      • Hadoop cluster set up - Installation

      • Hadoop 2.x Cluster Architecture

      • A Typical enterprise cluster – Hadoop Cluster Modes

      • Understanding cluster management tools like Cloudera Manager/Apache Ambari

       

      6 and 7

      HADOOP CORE COMPONENTS - HDFS & MAPREDUCE (YARN)

      • HDFS Overview & Data storage in HDFS

      • Moving data between the local machine and Hadoop, in both directions (Data Loading Techniques)

      • MapReduce Overview (Traditional way vs. MapReduce way)

      • Concept of Mapper & Reducer

      • Understanding MapReduce program Framework

      • Develop MapReduce Program using Java (Basic)

      • Develop MapReduce program with the streaming API (Basic)
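The Mapper and Reducer concepts listed above can be illustrated with a small word count. The sketch below is not Hadoop code: it is a single-process Python simulation of the map, shuffle and reduce steps, with function names chosen for illustration. In an actual Hadoop Streaming job, the mapper and reducer would be separate scripts reading from standard input, and the framework would handle the grouping.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle step: group emitted values by key (done by the framework in Hadoop)."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle and reduce in-process on a small collection of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


# Made-up sample input, used only to exercise the functions.
sample = ["big data is big", "data is everywhere"]
print(word_count(sample))
```

Keeping the three steps as separate functions mirrors the division of labour in MapReduce: user code supplies only the mapper and reducer, while grouping by key happens between them.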

       

      8, 9 and 10

      DATA INTEGRATION USING SQOOP & FLUME

      • Integrating Hadoop into an Existing Enterprise

      • Loading Data from an RDBMS into HDFS by Using Sqoop

      • Managing Real-Time Data Using Flume

      • Accessing HDFS from Legacy Systems

       

      11, 12 and 13

      DATA ANALYSIS USING PIG

      • Introduction to Data Analysis Tools

      • Apache PIG - MapReduce Vs Pig, Pig Use Cases

      • PIG’s Data Model

      • PIG Streaming

      • Pig Latin Program & Execution

      • Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators

      • Pig UDF

      • Writing Java UDFs

      • Embedded PIG in JAVA

      • PIG Macros

      • Parameter Substitution

      • Use Pig to automate the design and implementation of MapReduce applications

      • Use Pig to apply structure to unstructured Big Data

       

      14, 15 and 16

      DATA ANALYSIS USING HIVE

      • Apache Hive - Hive Vs. PIG - Hive Use Cases

      • Discuss the Hive data storage principle

      • Explain the file formats and record formats supported by the Hive environment

      • Perform operations with data in Hive

      • HiveQL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts

      • Hive Script, Hive UDF

      • Hive Persistence formats

      • Loading data in Hive - Methods

      • Serialization & Deserialization

      • Handling Text data using Hive

      • Integrating external BI tools with Hadoop Hive

      17 and 18

      DATA ANALYSIS USING IMPALA

      • Introduction to Impala & Architecture

      • How Impala executes Queries and its importance

      • Hive vs. PIG vs. Impala

      • Extending Impala with User Defined functions

       

      19

      INTRODUCTION TO OTHER ECOSYSTEM TOOLS

      • NoSQL database - HBase

      • Introduction to Oozie

       

      20 and 21

      SPARK: INTRODUCTION

      • Introduction to Apache Spark

      • Streaming Data Vs. In Memory Data

      • MapReduce vs. Spark

      • Modes of Spark

      • Spark Installation Demo

      • Overview of Spark on a cluster

      • Spark Standalone Cluster

       

      22 and 23

      SPARK: SPARK IN PRACTICE

      • Invoking Spark Shell

      • Creating the Spark Context

      • Loading a File in Shell

      • Performing Some Basic Operations on Files in Spark Shell

      • Caching Overview

      • Distributed Persistence

      • Spark Streaming Overview (Example: Streaming Word Count)

       

      24 and 25

      SPARK: SPARK MEETS HIVE

      • Analyze Hive and Spark SQL Architecture

      • Analyze Spark SQL

      • Context in Spark SQL

      • Implement a sample example for Spark SQL

      • Integrating Hive and Spark SQL

      • Support for JSON and Parquet File Formats

      • Implement Data Visualization in Spark

      • Loading of Data

      • Hive Queries through Spark

      • Performance Tuning Tips in Spark

      • Shared Variables: Broadcast Variables & Accumulators

      26

      SPARK STREAMING

      • Extract and analyze data from Twitter using Spark Streaming

      • Comparison of Spark and Storm – Overview

       

      27

      SPARK GRAPHX

      • Overview of the GraphX module in Spark

      • Creating graphs with GraphX

       

       

      28, 29 and 30

      IMPLEMENT MACHINE LEARNING USING SPARK

      • Brief introduction to Machine learning framework

      • Introduction to Machine Learning & Predictive Modeling

      • Types of Business problems - Mapping of Techniques - Regression vs. classification vs. segmentation vs. Forecasting

      • Major Classes of Learning Algorithms - Supervised vs. Unsupervised Learning

      • Different Phases of Predictive Modeling (Data Pre-processing, Sampling, Model Building, Validation)

      • Overfitting (Bias-Variance Trade-off) & Performance Metrics

      • Feature engineering & dimension reduction

      • Concept of optimization & cost function

      • Concept of gradient descent algorithm

      • Concept of Cross-validation (Bootstrapping, K-Fold validation, etc.)

      • Model performance metrics (R-square, RMSE, MAPE, AUC, ROC curve, recall, precision, sensitivity, specificity, confusion matrix)

      • Implement some of the ML algorithms using Spark MLlib (ML is not covered in detail in this course)
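Several of the classification metrics listed above come directly from the confusion matrix. The sketch below shows one way to compute them in plain Python for binary labels. It does not use Spark MLlib, the function names are illustrative, and the sample labels are made up.

```python
def confusion_counts(actual: list[int], predicted: list[int]) -> tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, where 1 is the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(actual: list[int], predicted: list[int]) -> dict[str, float]:
    """Compute common metrics; assumes each denominator is non-zero."""
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),        # also called sensitivity
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
    }


# Made-up labels: 4 actual positives and 6 actual negatives.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(actual, predicted)
print(metrics)
```

For these labels the confusion matrix holds 3 true positives, 1 false positive, 1 false negative and 5 true negatives, so precision and recall are both 0.75 and accuracy is 0.8.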

       

      During the last 15 days, participants will be guided through the projects.

      CASE STUDIES

      1. Data storage using HDFS

      This case study aims to give practical experience in storing and managing different types of data (structured, semi-structured and unstructured), both compressed and uncompressed.

      2. Processing data using MapReduce

      This case study aims to give practical experience in understanding and developing MapReduce programs in Java and R, and in running streaming jobs from the terminal and Eclipse.

      3. Data integration using Sqoop & Flume

      This case study aims to give practical experience in extracting data from Oracle and loading it into HDFS (and vice versa), and in extracting data from Twitter and storing it in HDFS.

      4. Data Analysis using Pig

      This case study aims to give practical experience in complete data analysis using Pig, and in creating and using user-defined functions (UDFs).

      5. Data Analysis using Hive

      This case study aims to give practical experience in complete data analysis using Hive, and in creating and using user-defined functions (UDFs).

      6. HBase - NoSQL database creation

      This case study aims to give practical experience in data table/cluster creation using HBase.

       

       

       

       

       

       

      Final Projects:

       

      Project #1:

      Industry: Stock Market

      Problem Statement:

      TickStocks, a small stock trading organization, wants to build a Stock Performance System. You have been tasked with creating a solution that predicts good and bad stocks based on their history. You also have to build a customized product to handle complex queries, such as calculating the covariance between stocks for each month.
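      As an illustration of the monthly covariance query mentioned in this problem statement, the Python sketch below groups daily observations by month and computes the sample covariance for each group. The row layout, the function names and the sample values are assumptions made for this example. It also works on raw prices for brevity, whereas a real analysis might use returns instead.

```python
from collections import defaultdict


def covariance(xs: list[float], ys: list[float]) -> float:
    """Sample covariance of two equally long series (divides by n - 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (len(xs) - 1)


def monthly_covariance(rows: list[tuple[str, float, float]]) -> dict[str, float]:
    """rows hold (date as 'YYYY-MM-DD', price_a, price_b); returns {'YYYY-MM': cov}."""
    by_month: dict[str, tuple[list[float], list[float]]] = defaultdict(lambda: ([], []))
    for date, price_a, price_b in rows:
        month = date[:7]  # 'YYYY-MM' prefix of the ISO date
        by_month[month][0].append(price_a)
        by_month[month][1].append(price_b)
    return {month: covariance(a, b) for month, (a, b) in by_month.items()}


# Hypothetical prices for two stocks; not real market data.
rows = [
    ("2020-01-02", 10.0, 20.0),
    ("2020-01-03", 12.0, 24.0),
    ("2020-01-06", 14.0, 28.0),
    ("2020-02-03", 10.0, 30.0),
    ("2020-02-04", 12.0, 28.0),
    ("2020-02-05", 14.0, 26.0),
]
print(monthly_covariance(rows))
```

      In the sample data the two stocks move together in January (positive covariance) and in opposite directions in February (negative covariance).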

       

      Project #2:

      Industry: Health-Care

      Problem Statement:

      MobiHeal is a mobile health organization that captures patients' physical activities by attaching various sensors to different body parts. These sensors measure the motion of those body parts, including acceleration, rate of turn and magnetic field orientation. You have to build a system that effectively derives information about the motion of different body parts, such as the chest and ankle.

       

      Project #3:

      Industry: Social Media

      Problem Statement:

      Socio-Impact is a social media marketing company that wants to expand its business. It wants to find the websites that have low-ranked web pages. You have been tasked with finding the low-rated links based on user comments, likes, etc.

       

      Project #4:

      Industry: Retail

      Problem Statement:

      A retail company wants to enhance its customer experience by analysing customer reviews for different products, so that it can inform the corresponding vendors and manufacturers about product defects and shortcomings. You have been tasked with analysing the complaints filed under each product and the total number of complaints filed by geography, type of product, etc. You also have to identify the complaints that received no timely response.
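      The counting tasks in this problem statement can be sketched in a few lines of Python. The record layout below (the `product`, `region` and `timely_response` fields) is hypothetical and exists only for this example, as do the function names. On the course's tooling the same grouping would more likely be written as a Hive or Pig query.

```python
from collections import Counter

# Hypothetical complaint records; every field name here is an assumption.
complaints = [
    {"product": "phone", "region": "north", "timely_response": True},
    {"product": "phone", "region": "south", "timely_response": False},
    {"product": "laptop", "region": "north", "timely_response": True},
    {"product": "phone", "region": "north", "timely_response": False},
]


def count_by(records: list[dict], field: str) -> Counter:
    """Count complaints grouped by one field, e.g. 'product' or 'region'."""
    return Counter(record[field] for record in records)


def without_timely_response(records: list[dict]) -> list[dict]:
    """Return the complaints that did not receive a timely response."""
    return [record for record in records if not record["timely_response"]]


print(count_by(complaints, "product"))
print(count_by(complaints, "region"))
print(len(without_timely_response(complaints)))
```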

       

      Project #5:

      Industry: Tourism

      Problem Statement:

      A new company in the travel domain wants to start its business efficiently, i.e. high profit for low TCO. It wants to analyse and find the most frequent and popular tourism destinations for its business. You have been tasked with analysing the top tourism destinations that people frequently travel to and the top locations from which most tourism trips start. The company also wants you to find the destinations with costly tourism packages.

       

       

       

Course Name: Big Data


Student         8,500    6,000    14,500
Professional    9,500    6,500    16,000


To enroll in a course:

1. Click Registration Form.
2. Fill in every detail in the form and submit it.
3. After successful registration, you will receive a confirmation email from Teach Tech Services.

To deposit your course fee:

1. Click on Pay Now.
2. After successful payment, a member of our team will contact you within 3 hours.

Certification:

All participants will receive an ISO-certified course certificate from Teach Tech Services, in association with iSmriti, IIT Kanpur.

This certificate is globally accepted.

JALANDHAR (LPU CAMPUS)

Address: Jalandhar - Delhi G.T. Road, Phagwara, Punjab 144411

Phone: +91-9023647226

94.75 average based on 307 ratings

5 Star: 266 reviews
4 Star: 21 reviews
3 Star: 11 reviews
2 Star: 3 reviews
1 Star: 6 reviews
