Data Science
Data science is the study of data to extract meaningful insights for business. It is a multidisciplinary approach that combines principles and practices from the fields of mathematics, statistics, artificial intelligence, and computer engineering to analyze large amounts of data.
Course Contents
Module 1
Module 2
Module 3
Module 4
Module 5
Module 1
DESCRIPTIVE AND INFERENTIAL STATISTICS
- Turning Data into Information
- Data Visualization
- Measures of Central Tendency
- Measures of Variability
- Covariance, Correlation
- Real time Problems
- Probability Distributions
- Discrete Random Variables
- Binomial, Poisson Distributions
- Mean, Expected Values
- Continuous Random Variables
- Normal Distributions
- Real time Problems
- Sampling Distributions
- Sampling Distributions for Sample Mean and Proportions
- Central Limit Theorem
- Real time Problems
- Confidence Intervals
- Statistical Inference
- Constructing Confidence Intervals to Estimate a Population Mean, Variance, and Proportion
- Real time Problems
- Hypothesis Testing
- Hypothesis Testing
- Type 1 and Type 2 Errors
- Decision Making in Hypothesis Testing
- Hypothesis Testing for a Mean, Variance and Proportion
- Real time Problems
- Analysis of Variance (ANOVA)
- ANOVA Assumptions
- One-way and Two-way ANOVA
- Multiple Comparisons (Tukey, Dunnett)
- Real time Problems
Module 2
PREDICTIVE ANALYTICS/REGRESSION ANALYSIS
- Simple Linear Regression
- Simple Linear Regression Model
- Least Squares Estimation of Parameters
- Hypothesis Testing on Slope and Intercepts
- Coefficient of Determination
- Real time Problems
- Multiple Linear Regression
- Multiple Regression Models
- Estimation of Model Parameters
- Hypothesis Testing in MLR
- Multicollinearity
- Real time Problems
- Model Adequacy Checking
- Residual Analysis
- The PRESS Statistic
- Detection and Treatment of Outliers
- Real time Problems
- Transformations
- Variance Stabilizing Transformations
- Transformations to Linearize the Model
- Box-Cox and Box-Tidwell Transformations
- Generalized and Weighted Least Squares
- Real time Problems
- Diagnostics for Leverage and Influential Points
- Leverage/Cook's D/DFFITS/DFBETAS
- Treatment of Influential Observations
- Real time Problems
- Polynomial Regression
- Polynomial Models in One, Two, or More Variables
- Real time Problems
- Variable Selection and Model Building
- Forward Selection
- Backward Elimination
- Stepwise Regression
- Real time Problems
Module 3
Applied Multivariate Analysis
- Measures of Central Tendency, Dispersion, and Association
- Measure of Central Tendency
- Measure of Dispersion
- Real time Problems
- Multivariate Normal Distribution
- Multivariate Normality and Outliers
- Exponent of Multivariate Normal Distribution
- Eigenvalues and Eigenvectors
- Spectral Decomposition
- Singular Value Decomposition
- Sample Mean Vector and Sample Correlation
- Distribution of Sample Mean Vector
- Interval Estimate of Population Mean
- Inferences for Correlations
- Real time Problems
- Principal Component Analysis (PCA)
- Principal Component Analysis Procedure
- Real time Problems
- Discriminant Analysis
- Discriminant Analysis (Linear/Quadratic)
- Estimating Misclassification Probabilities
- Real time Problems
Module 4
Machine Learning
- Introduction
- Application Examples
- Supervised Learning
- Unsupervised Learning
- Cluster Analysis
- Agglomerative Hierarchical Clustering
- K-Means Procedure
- Medoid Cluster Analysis
- Dimensionality Reduction
- Principal Component Analysis
- Real time Problems
- Association Rules
- Market Basket Analysis
- Apriori/Support/Confidence/Lift
- Real time Problems
- Classification
- Bayes' Law
- Naive Bayes
- Bias-Variance Tradeoff
- Gradient Descent/Ascent Procedure
- Maximum Likelihood Method
- Logistic Regression
- Nearest-Neighbor Methods (K-NN Classifier)
- Using Software: Real time Problems
- Tree-Based Methods
- Basics of Decision Trees
- Regression Trees
- Classification Trees
- Ensemble Methods
- Bagging, Bootstrap, Random Forests, Boosting
- Using Software: Real time Problems
- Support Vector Machines
- Maximal Margin Classifier
- Support Vector Classifier
- Kernel Trick
- Support Vector Machine
- SVMs with More than Two Classes
- Using Software: Real time Problems
- Regression Shrinkage Methods
- Ridge Regression
- Lasso Regression
- Using Software: Real time Problems
Module 5
R Programming
- R Basics
- Numbers, Attributes
- Creating Vector
- Mixing Objects
- Explicit Coercion
- Formatting Data Values
- Matrices, Lists, Factors, Data Frames, Missing Values
- Names, Reading and Writing Data
- Using dput/dump
- Interfaces to the Outside World
- Subsetting R Objects
- Vectorized Operations
- Dates and Times
- Managing Data Frames with the dplyr Package
- Control Structures
- Functions
- Lexical/Dynamic Scoping
- Loop Functions
- Data Analytics Using R
- Modules 1-4 Demonstrated