Data Science

Course Details

Data science is the practice of extracting insight from data, which at scale relies on software that distributes and processes large data sets across a cluster of computers. This course is designed to help you master data science techniques and upgrade your skill set so you can sustain your career in the ever-changing software industry. It covers everything from the basics of data science to Big Data with Hadoop, Python, Apache Spark and more.

  • Objective of the Course: Job Secured Course
  • Course Duration: 120 days, 1 hour daily

Course Content

Data Science Training Overview
  1. Objectives of the Course
  2. Prerequisites of the Course
  3. Course Duration
Data Science Course Content
  1. Introduction to Data Science
  2. Data
  3. Big Data
  4. Data Science Deep Dive
  5. Intro to R Programming
  6. R Programming Concepts
  7. Data Manipulation in R
  8. Data Import Techniques in R
  9. Exploratory Data Analysis (EDA) using R
  10. Data Visualization in R
  11. HADOOP
    1. Big Data and Hadoop Introduction
    2. Understand Hadoop Cluster Architecture
    3. Map Reduce Concepts
    4. Advanced Map Reduce Concepts
  12. Hadoop 2.0 and YARN
  13. PIG
  14. HIVE
    1. Module-9
  15. HBASE
    1. Module-11
  16. SQOOP
  17. Flume and Oozie
  18. Projects
  19. Project in Healthcare Domain
  20. Project in Finance/Banking Domain
  21. Spark
    1. Apache Spark
    2. Introduction to Scala
    3. Spark Core Architecture
    4. Spark Internals
    5. Spark Streaming
  22. Statistics + Machine Learning
    1. Statistics
      1. What is Statistics?
  23. Machine Learning
    1. Machine Learning Introduction
  24. Python
    1. Getting Started with Python
    2. Sequences and File Operations
  25. Deep Dive – Functions, Sorting, Errors and Exception Handling
  26. Regular Expressions, Packages and Object-Oriented Programming in Python
  27. Debugging, Databases and Project Skeletons
  28. Machine Learning Using Python
  29. Supervised and Unsupervised learning
  30. Scikit-Learn and Introduction to Hadoop
  31. Hadoop and Python
  32. Python Project Work
Introduction to Data Science
  1. Need for Data Scientists
  2. Foundation of Data Science
  3. What is Business Intelligence
  4. What is Data Analysis, Data Mining, and Machine Learning
  5. Analytics vs Data Science
  6. Value Chain
  7. Types of Analytics
  8. Lifecycle Probability
  9. Analytics Project Lifecycle
Data
  1. Basis of Data Categorization
  2. Types of Data
  3. Data Collection Types
  4. Forms of Data and Sources
  5. Data Quality, Changes and Data Quality Issues, Quality Story
  6. What is Data Architecture
  7. Components of Data Architecture
  8. OLTP vs OLAP
  9. How is Data Stored?
Big Data
  1. What is Big Data?
  2. 5 Vs of Big Data
  3. Big Data Architecture, Technologies, Challenge and Big Data Requirements
  4. Big Data Distributed Computing and Complexity
  5. Hadoop
  6. Map Reduce Framework
  7. Hadoop Ecosystem
Data Science Deep Dive
  1. What is Data Science?
  2. Why are Data Scientists in demand?
  3. What is a Data Product
  4. The growing need for Data Science
  5. Large-Scale Analysis Cost vs Storage
  6. Data Science Skills
  7. Data Science Use Cases and Data Science Project Life Cycle & Stages
  8. Map-Reduce Framework
  9. Hadoop Ecosystem
  10. Data Acquisition
  11. Where to source data
  12. Techniques
  13. Evaluating input data
  14. Data formats, Quantity and Data Quality
  15. Resolution Techniques
  16. Data Transformation
  17. File Format Conversions
  18. Anonymization
Intro to R Programming
  1. Introduction to R
  2. Business Analytics
  3. Analytics concepts
  4. The importance of R in analytics
  5. R Language community and eco-system
  6. Usage of R in industry
  7. Installing R and other packages
  8. Perform basic R operations using command line
  9. Usage of the RStudio IDE and various GUIs
R Programming Concepts
  1. The data types in R and their uses
  2. Built-in functions in R
  3. Subsetting methods
  4. Summarize data using functions
  5. Use of functions like head() and tail() for inspecting data
  6. Use-cases for problem solving using R
Data Manipulation in R
  1. Various phases of Data Cleaning
  2. Functions used in Inspection
  3. Data Cleaning Techniques
  4. Uses of functions involved
  5. Use-cases for Data Cleaning using R
Data Import Techniques in R
  1. Import data from spreadsheets and text files into R
  2. Importing data from statistical formats
  3. Package installation for database import
  4. Connecting to RDBMS from R using ODBC and basic SQL queries in R
  5. Web Scraping
  6. Other concepts on Data Import Techniques
Exploratory Data Analysis (EDA) using R
  1. What is EDA?
  2. Why do we need EDA?
  3. Goals of EDA
  4. Types of EDA
  5. Implementing EDA
  6. Boxplots, cor() in R
  7. EDA functions
  8. Multiple packages in R for data analysis
  9. Some fancy plots
  10. Use-cases for EDA using R
Data Visualization in R
  1. Storytelling with Data
  2. Principal tenets
  3. Elements of Data Visualization
  4. Infographics vs Data Visualization
  5. Data Visualization & Graphical functions in R
  6. Plotting Graphs
  7. Customizing Graphical Parameters to improve the plots
  8. Various GUIs
  9. Spatial Analysis
  10. Other Visualization concepts
HADOOP
Big Data and Hadoop Introduction
  • What is Big Data and Hadoop?
  • Challenges of Big Data
  • Traditional approach Vs Hadoop
  • Hadoop Architecture
  • Distributed Model
  • Block structure File System
  • Technologies supporting Big Data
  • Replication
  • Fault Tolerance
  • Why Hadoop?
  • Hadoop Eco-System
  • Use cases of Hadoop
  • Fundamental Design Principles of Hadoop
  • Comparison of Hadoop Vs RDBMS
    Understand Hadoop Cluster Architecture
    1. Hadoop Cluster and Architecture
    2. 5 Daemons
    3. Hands-On Exercise
    4. Typical Workflow
    5. Hands-On Exercise
    6. Writing Files to HDFS
    7. Hands-On Exercise
    8. Reading Files from HDFS
    9. Hands-On Exercise
    10. Rack Awareness
    11. Before Map Reduce
    Map Reduce Concepts
    1. What is Map Reduce?
    2. Why Map Reduce?
    3. Map Reduce in real world and Map Reduce Flow
    4. What is Mapper, Reducer, and Shuffling?
    5. Word Count Problem
    6. Hands-On Exercise
    7. Distributed Word Count Flow and Solution
    8. Log Processing and Map Reduce
    9. Hands-On Exercise
    Advanced Map Reduce Concepts
    1. What is Combiner?
    2. Hands-On Exercise
    3. What is Partitioner?
    4. Hands-On Exercise
    5. What is Counter?
    6. Hands-On Exercise
    7. Input Formats/Output Formats
    8. Hands-On Exercise
    9. Map Join using MR
    10. Hands-On Exercise
    11. Reduce Join using MR
    12. MR Distributed Cache
    13. Hands-On Exercise
    14. Using sequence files & images with MR
    15. Hands-On Exercise
    16. Planning for Cluster & Hadoop 2.0 Yarn
    17. Configuration of Hadoop
    18. Choosing Right Hadoop Hardware and Software
    19. Hadoop Log Files
    Hadoop 2.0 and YARN
    1. Hadoop 1.0 Challenges
    2. NN Scalability, SPOF, and HA
    3. Job Tracker Challenges
    4. Hadoop 2.0 New Features
    5. Hadoop 2.0 Cluster Architecture & Federation
    6. Hadoop 2.0 HA
    7. Yarn & Hadoop Ecosystem
    8. Yarn MR Application Flow
    PIG
    1. Introduction to Pig
    2. What Is Pig?
    3. Pig’s Features & Pig Use Cases
    4. Interacting with Pig
    5. Basic Data Analysis with Pig
    6. Hands-On Exercise
    7. Pig Latin Syntax
    8. Loading Data
    9. Hands-On Exercise
    10. Simple Data Types
    11. Field Definitions
    12. Data Output
    13. Viewing the Schema
    14. Hands-On Exercise
    15. Filtering and Sorting Data
    16. Hands-On Exercise
    17. Commonly-Used Functions
    18. Hands-On Exercise: Pig for ETL Processing
    19. Processing Complex Data with Pig
    20. Hands-On Exercise
    21. Storage Formats
    22. Complex/Nested Data Types
    23. Hands-On Exercise
    24. Grouping
    25. Hands-On Exercise
    26. Built-in Functions for Complex Data
    27. Hands-On Exercise
    28. Iterating Grouped Data
    29. Hands-On Exercises
    30. Multi-Dataset Operations with Pig
    31. Hands-On Exercise
    32. Techniques for Combining Data Sets
    33. Joining Data Sets in Pig
    34. Hands-On Exercise
    35. Splitting Data Sets
    36. Hands-On Exercise
    HIVE
    1. Hive Fundamentals and Architecture
    2. Loading and Querying Data in Hive
    3. Hands-On Exercise
    4. Hive Architecture and Installation
    5. Comparison with Traditional Database
    6. HiveQL: Data Types, Operators and Functions
    7. Hands-On Exercise
    8. Hive Tables, Managed Tables and External Tables
    9. Hands-On Exercise
    10. Partitions and Buckets
    11. Hands-On Exercise
    12. Storage Formats, Importing Data, Altering Tables, Dropping Tables
    13. Hands-On Exercise
    14. Querying Data, Sorting and Aggregating, Map Reduce Scripts
    15. Hands-On Exercise
    Module-9
    1. Joins & Subqueries, Views
    2. Hands-On Exercise
    3. Integration, Data manipulation with Hive
    4. Hands-On Exercise
    5. User Defined Functions
    6. Hands-On Exercise
    7. Appending Data into existing Hive Table
    8. Hands-On Exercise
    9. Static partitioning vs dynamic partitioning
    10. Hands-On Exercise
    HBASE
    1. CAP Theorem
    2. HBase Architecture and concepts
    3. Introduction to HBase
    4. Client APIs and their features
    5. HBase Tables, the ZooKeeper Service
    6. Data Model, Operations
    Module-11
    1. Programming and Hands-On Exercises
    SQOOP
    1. Introduction to Sqoop
    2. MySQL Client & server
    3. Connecting to a relational database using Sqoop
    4. Importing data using Sqoop from MySQL
    5. Exporting data using Sqoop to MySQL
    6. Incremental append
    7. Importing data using Sqoop from MySQL to Hive
    8. Exporting data using Sqoop to MySQL from Hive
    9. Importing data using Sqoop from MySQL to HBase
    10. Using queries with Sqoop
    Flume and Oozie
    1. What is Flume?
    2. Why use Flume, Architecture, configurations
    3. Master, collector, Agent
    4. Twitter Data Sentiment Analysis project
    5. Oozie
    6. What is Oozie? Architecture, configurations
    7. Oozie Job Submission
    8. Oozie properties
    9. Hands-on exercises
    Projects
    1. Social Media Final Project
    2. Hadoop Project
    3. Project in Healthcare Domain
    4. Hadoop Project in Healthcare
    5. Project in Finance/Banking Domain
    6. Hadoop Project in Banking Domain
    7. Discuss datasets and specifications of the project
    Spark
    Apache Spark
    1. Introduction to Apache Spark
    2. Why Spark
    3. Batch Vs. Real-Time Big Data Analytics
    4. Batch Analytics – Hadoop Ecosystem Overview
    5. Real-Time Analytics Options
    6. Streaming Data – Storm
    7. In Memory Data – Spark, What is Spark?
    8. Spark benefits to Professionals
    9. Limitations of MR in Hadoop
    10. Components of Spark
    11. Spark Execution Architecture
    12. Benefits of Apache Spark
    13. Hadoop vs Spark
    Introduction to Scala
    1. Features of Scala
    2. Basic Data Types of Scala
    3. Val vs Var
    4. Type Inference
    5. REPL
    6. Objects & Classes in Scala
    7. Functions as Objects in Scala
    8. Anonymous Functions in Scala
    9. Higher Order Functions
    10. Lists in Scala
    11. Maps
    12. Pattern Matching
    13. Traits in Scala
    14. Collections in Scala
    Spark Core Architecture
    1. Spark & Distributed Systems
    2. Spark for Scalable Systems
    3. Spark Execution Context
    4. What is RDD
    5. RDD Deep Dive and Dependencies
    6. RDD Lineage
    7. Spark Application In Depth and Spark Deployment
    8. Parallelism in Spark
    9. Caching in Spark
    Spark Internals
    1. Spark Transformations, Actions, Cluster and SQL Introduction
    2. Spark DataFrames
    3. Spark SQL with CSV, JSON, and Database
    Spark Streaming
    1. Features of Spark Streaming
    2. Micro Batch
    3. DStreams
    4. Transformations on DStreams
    5. Spark Streaming Use Case
    Statistics + Machine Learning
    Statistics
    1. What is Statistics?
    2. Descriptive Statistics
    3. Central Tendency Measures
    4. The Story of Average
    5. Dispersion Measures
    6. Data Distributions
    7. Central Limit Theorem
    8. What is Sampling
    9. Why Sampling
    10. Sampling Methods
    11. Inferential Statistics
    12. What is Hypothesis testing
    13. Confidence Level
    14. Degrees of freedom
    15. What is a p-value
    16. Chi-Square test
    17. What is ANOVA
    18. Correlation vs Regression
    19. Uses of Correlation and Regression
    Machine Learning
    1. Machine Learning Introduction
    2. ML Fundamentals
    3. ML Common Use Cases
    4. Understanding Supervised and Unsupervised Learning Techniques
    5. Clustering
    6. Similarity Metrics
    7. Distance Measure Types: Euclidean, Cosine Measures
    8. Creating predictive models
    9. Understanding K-Means Clustering
    10. Understanding TF-IDF, Cosine Similarity and their application to Vector Space Model
    11. Case study
    12. Implementing Association rule mining
    13. Case study
    14. Understanding Process flow of Supervised Learning Techniques
    15. Decision Tree Classifier
    16. How to build Decision trees
    17. Case study
    18. Random Forest Classifier
    19. What is a Random Forest
    20. Features of Random Forest
    21. Out-of-Bag Error Estimate and Variable Importance
    22. Case study
    23. Naive Bayes Classifier
    24. Case study
    25. Project Discussion
    26. Problem Statement and Analysis
    27. Various approaches to solving a Data Science Problem
    28. Pros and Cons of different approaches and algorithms
    29. Linear Regression
    30. Case study
    31. Logistic Regression
    32. Case study
    33. Text Mining
    34. Case study
    35. Sentiment Analysis
    36. Case study
    Python
    Getting Started with Python
    1. Python Overview
    2. About Interpreted Languages
    3. Advantages/Disadvantages of Python, pydoc
    4. Starting Python
    5. Interpreter PATH
    6. Using the Interpreter
    7. Running a Python Script
    8. Python Scripts on UNIX/Windows, Editors and IDEs
    9. Using Variables
    10. Keywords
    11. Built-in Functions
    12. Strings, Different Literals
    13. Math Operators and Expressions
    14. Writing to the Screen
    15. String Formatting
    16. Command Line Parameters and Flow Control
    Sequences and File Operations
    1. Lists
    2. Tuples
    3. Indexing and Slicing
    4. Iterating through a Sequence
    5. Functions for all Sequences
    6. Using enumerate()
    7. Operators and Keywords for Sequences
    8. The xrange() function (range() in Python 3)
    9. List Comprehensions
    10. Generator Expressions
    11. Dictionaries and Sets
    Deep Dive – Functions, Sorting, Errors and Exception Handling
    Functions
    1. Function Parameters
    2. Global Variables
    3. Variable Scope and Returning Values
    Sorting
    1. Alternate Keys
    2. Lambda Functions
    3. Sorting Collections of Collections, Dictionaries and Lists in Place
    Errors and Exception Handling
    1. Handling Multiple Exceptions
    2. The Standard Exception Hierarchy
    Using Modules
    1. The Import Statement
    2. Module Search Path
    3. Package Installation Ways
    Regular Expressions, Packages and Object-Oriented Programming in Python
    1. The Sys Module
    2. Interpreter Information
    3. STDIO
    4. Launching External Programs
    5. Paths, Directories and Filenames
    6. Walking Directory Trees
    7. Math Functions
    8. Random Numbers
    9. Dates and Times
    10. Zipped Archives
    11. Introduction to Python Classes
    12. Defining Classes
    13. Initializers
    14. Instance Methods
    15. Properties
    16. Class Methods and Data, Static Methods
    17. Private Methods and Inheritance
    18. Module Aliases and Regular Expressions
    Debugging, Databases and Project Skeletons
    1. Debugging
    2. Dealing with Errors
    3. Using Unit Tests
    4. Project Skeleton
    5. Required Packages
    6. Creating the Skeleton
    7. Project Directory
    8. Final Directory Structure
    9. Testing your Setup
    10. Using the Skeleton
    11. Creating a Database with SQLite 3
    12. CRUD Operations
    13. Creating a Database Object
    Machine Learning Using Python
    1. Introduction to Machine Learning
    2. Areas of Implementation of Machine Learning
    3. Why Python
    4. Major Classes of Learning Algorithms
    5. Supervised vs Unsupervised Learning
    6. Learning NumPy
    7. Learning SciPy
    8. Basic plotting using Matplotlib
    9. Machine Learning application
    Supervised and Unsupervised learning
    1. Classification Problem
    2. Classifying with k-Nearest Neighbours (kNN) Algorithm
    3. General Approach to kNN
    4. Building the Classifier from Scratch
    5. Testing the Classifier
    6. Measuring the Performance of the Classifier
    7. Clustering Problem
    8. What is K-Means Clustering
    9. Clustering with k-Means in Python and an Application Example
    10. Introduction to Pandas
    11. Creating Data Frames
    12. Grouping, Sorting
    13. Plotting Data
    14. Creating Functions
    15. Converting Different Formats
    16. Combining Data from Various Formats
    17. Slicing/Dicing Operations
    Scikit-Learn and Introduction to Hadoop
    1. Introduction to Scikit-Learn
    2. Inbuilt Algorithms for Use
    3. What is Hadoop and why it is popular
    4. Distributed Computation and Functional Programming
    5. Understanding MapReduce Framework, Sample MapReduce Job Run
    Hadoop and Python
    1. PIG and HIVE Basics
    2. Streaming Feature in Hadoop
    3. Map Reduce Job Run using Python
    4. Writing a PIG UDF in Python
    5. Writing a HIVE UDF in Python
    6. Pydoop and MRjob Basics
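As a taste of the Map Reduce topics listed above, the word count problem can be sketched in plain Python. This is only an illustrative, single-machine sketch and not the course's own material: the function names (mapper, shuffle, reducer, word_count) are hypothetical, and a real Hadoop job would run the map and reduce steps on separate nodes while the framework handles the shuffle.

```python
from itertools import groupby


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: sort the pairs by word and group the counts per word.

    In Hadoop the framework does this between the map and reduce phases;
    here it is simulated in memory.
    """
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, [count for _, count in group]


def reducer(word, counts):
    """Reduce step: sum all the counts collected for one word."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce over an iterable of text lines."""
    pairs = [pair for line in lines for pair in mapper(line)]
    return dict(reducer(word, counts) for word, counts in shuffle(pairs))


if __name__ == "__main__":
    sample = ["big data big cluster", "data data"]
    print(word_count(sample))
```

The same three-step shape (map, shuffle, reduce) is what Hadoop Streaming, Pydoop and mrjob distribute across a cluster; only the plumbing around the mapper and reducer changes.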
    Python Project Work
    1. Real-world project