Data Science

Course Syllabus

Data Science course

(With SAS, R, WEKA, SPSS, and Excel)*

Module 1: Introduction to Data Science

Part-1: Reference Details for Data Science and Business Analytics

1. Scope and Facts of Data Science and Business Analytics

2. SWOT Analysis of Data Science and Business Analytics

3. Introduction to Advanced Data Analytics

4. The Journey from Mathematics to Statistics to Econometrics

5. Flowchart for Data Science and Business Analytics

6. Data Warehouse Concepts

7. Hadoop for Data Science

8. OLTP and OLAP for Data and Information

9. Web Application Reports

Module 2: Data Visualization and Summarization

Part-2: Descriptive Statistics:

  1. Descriptive Statistics
  2. Inferential Statistics
  3. Types of Variables
  4. Measures of Central Tendency
  5. Data Variability (Dispersion)
  6. Five-Number Summary Analysis
  7. Data Distribution Techniques
  8. Exploration Techniques for Numerical Data
  9. Exploration Techniques for Character Data
  10. Visualization Exploration
  11. Summary Exploration
  12. Chebyshev's Inequality
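As a rough illustration of items 6 and 12, the sketch below computes a five-number summary and checks Chebyshev's inequality (for k > 1, at least 1 - 1/k² of the values lie within k standard deviations of the mean) on a small made-up dataset. It is written in Python for illustration only, since the course itself uses SAS, R, WEKA, SPSS, and Excel. The function names are hypothetical, and quartile conventions differ between tools, so Q1 and Q3 may not match every package exactly.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Quartiles use statistics.quantiles(method="inclusive"); other tools
    may use a different quartile convention.
    """
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)

def chebyshev_check(data, k):
    """Return (observed share within k SDs of the mean, Chebyshev bound 1 - 1/k**2)."""
    if k <= 1:
        raise ValueError("Chebyshev's bound is only informative for k > 1")
    mu = statistics.mean(data)
    sd = statistics.pstdev(data)  # population SD of the data set
    within = sum(abs(x - mu) <= k * sd for x in data) / len(data)
    return within, 1 - 1 / k**2

data = [2, 4, 4, 4, 5, 5, 7, 9]  # illustrative values only
print(five_number_summary(data))  # (2, 4.0, 4.5, 5.5, 9)
print(chebyshev_check(data, 2))   # (1.0, 0.75)
```

The observed share (100%) is at least the guaranteed 75%. Chebyshev's bound holds for any distribution, which is why it is much looser than the normal-curve figure of about 95% within two standard deviations.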

Part-3: Basic Probability for Business Issues:

  1. Simple Probability
  2. Marginal Probability
  3. Joint Probability
  4. Conditional Probability (linked with Decision Tree Algorithms)
  5. Bayes' Theorem (linked with Naïve Bayes Algorithms)
  6. Discrete Distributions
  7. Binomial Distribution
  8. Hypergeometric Distribution
  9. Poisson Distribution
  10. Continuous Distributions
  11. Normal Distribution and Properties
  12. Standardized Distributions
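The probability topics above reduce to a few standard formulas. Below is a minimal Python sketch of Bayes' theorem, the binomial, hypergeometric, and Poisson probability mass functions, and standardization to a z-score. The function names and the screening numbers (1% prevalence, 90% detection rate, 5% false-positive rate) are hypothetical and chosen only for illustration.

```python
import math

def bayes_posterior(prior, p_b_given_a, p_b_given_not_a):
    """P(A|B) via Bayes' theorem, with P(B) from the law of total probability."""
    p_b = p_b_given_a * prior + p_b_given_not_a * (1 - prior)
    return p_b_given_a * prior / p_b

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k successes in n draws without replacement from N items, K of them successes."""
    return math.comb(K, k) * math.comb(N - K, n - k) / math.comb(N, n)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def z_score(x, mu, sigma):
    """Standardize x so that a Normal(mu, sigma) value maps onto N(0, 1)."""
    return (x - mu) / sigma

# Hypothetical screening example: 1% prevalence, 90% detection, 5% false positives.
print(round(bayes_posterior(0.01, 0.90, 0.05), 4))  # 0.1538
```

Even with a fairly accurate test, a positive result here implies only about a 15% chance of the condition, because true cases are rare. Naïve Bayes classifiers apply the same rule with one likelihood term per feature.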

Part-4: Sampling Techniques for Big Data

Sampling Distributions

  1. Simple Random Sample
  2. Systematic Sample
  3. Stratified Sample
  4. Cluster Sample
  5. Standard Error of the Mean
  6. Standard Error of Skewness
  7. Standard Error of Kurtosis
  8. Central Limit Theorem
  9. Sampling from an Infinite Population
  10. Sampling Distribution of the Mean
  11. Sampling Distribution of Proportions
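To make the standard error and the central limit theorem concrete, the Python sketch below draws many samples of size 30 from a strongly skewed population (exponential with rate 1, so mean 1 and SD 1) and looks at the spread of the sample means. The population, sample size, number of repetitions, and seed are arbitrary assumptions chosen for illustration.

```python
import math
import random
import statistics

def standard_error(sample):
    """Estimated standard error of the mean: s / sqrt(n)."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

def sampling_distribution_of_mean(draw, n, reps, seed=42):
    """Simulate `reps` sample means, each computed from `n` draws of draw(rng)."""
    rng = random.Random(seed)
    return [statistics.mean([draw(rng) for _ in range(n)]) for _ in range(reps)]

# Population: exponential with rate 1 (mean 1, SD 1), i.e. strongly right-skewed.
means = sampling_distribution_of_mean(lambda rng: rng.expovariate(1.0), n=30, reps=2000)
print(statistics.mean(means))   # close to the population mean, 1
print(statistics.stdev(means))  # close to sigma / sqrt(n) = 1 / sqrt(30), about 0.183
```

Although the population is skewed, the simulated means cluster around 1 with a spread close to σ/√n, and a histogram of `means` looks approximately normal. That behaviour is what the central limit theorem predicts.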

Module 3: Data Preparation and Quality Check

Part-5: Data Validation and Normality

  1. Univariate normality techniques
  2. Bivariate techniques
  3. Multivariate techniques
  4. Q-Q probability plots
  5. Cumulative frequency
  6. Exploratory analysis
  7. Stem-and-leaf analysis
  8. Histogram
  9. Box plot
  10. Scores for Normality Check
  11. Kolmogorov-Smirnov test
  12. Shapiro-Wilk test
  13. Anderson-Darling test

Part-6: Data Cleaning Process and Quality Check

  1. PCA for Big Data Analysis or Unsupervised Data
  2. PCA Regression Scores for Supervised Data
  3. Detecting Noisy Data
  4. Data Cleaning with Regression Residuals
  5. Data Scrubbing with Statistical Sense

Part-7: Data Imputation and outlier treatment

  1. Outlier treatment with robust measurements
  2. Outlier treatment with the mean (central tendency)
  3. Outlier treatment with min-max and likelihood methods
  4. Density-based outlier detection
  5. Visual outlier treatment
  6. Summarized outlier treatment
  7. Multivariate outlier detection with Mahalanobis distance
  8. Multivariate chi-square statistics
  9. Outlier detection with residual analysis
  10. Outlier detection with PCA
  11. Data imputation with series central tendency
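One common robust approach to items 1 and 11 uses quartile-based fences and the median. The Python sketch below flags values outside Tukey's fences [Q1 - 1.5·IQR, Q3 + 1.5·IQR], caps values at those fences as one possible treatment, and fills missing values (`None`) with the median of the observed values. The 1.5 multiplier is the conventional default, and the helper names and data are hypothetical.

```python
import statistics

def iqr_fences(data, k=1.5):
    """Tukey fences: values outside [Q1 - k*IQR, Q3 + k*IQR] are treated as outliers."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def flag_outliers(data, k=1.5):
    """Return the values that fall outside the fences."""
    low, high = iqr_fences(data, k)
    return [x for x in data if x < low or x > high]

def cap_at_fences(data, k=1.5):
    """Treatment option: cap extreme values at the fences instead of deleting them."""
    low, high = iqr_fences(data, k)
    return [min(max(x, low), high) for x in data]

def impute_median(values):
    """Replace missing values (None) with the median of the observed values."""
    med = statistics.median(v for v in values if v is not None)
    return [med if v is None else v for v in values]

data = [10, 12, 11, 13, 12, 11, 95]  # made-up data with one extreme value
print(flag_outliers(data))  # [95]
```

The median and IQR are preferred here over the mean and SD because a single extreme value such as 95 barely moves them, whereas it would inflate the mean and SD that mean-based rules rely on.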

Part-8: Test of Hypothesis

  1. Null Hypothesis Formulation
  2. Alternative Hypothesis
  3. Type I and Type II Errors
  4. Statistical Power
  5. One-Tailed and Two-Tailed Tests
  6. One-Sample t-Test
  7. Paired t-Test
  8. Independent-Samples t-Test
  9. Analysis of Variance (ANOVA)
  10. MANOVA
  11. Chi-Square Test
  12. Kendall Chi-Square
  13. Kruskal-Wallis Rank Test (Chi-Square)
  14. Mann-Whitney Test
  15. Wilcoxon Test
  16. McNemar Test (Chi-Square)
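The three t-tests in items 6 to 8 share one structure: a difference in means divided by its standard error. The Python sketch below computes the t statistic and degrees of freedom for each. The independent-samples version assumes equal variances (pooled SD), which is only one variant (Welch's test drops that assumption). The function names and data are hypothetical.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """t = (mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(sample)
    se = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / se, n - 1

def paired_t(before, after):
    """Paired t-test: a one-sample t-test of the differences (after - before) against 0."""
    diffs = [a - b for b, a in zip(before, after)]
    return one_sample_t(diffs, 0)

def pooled_t(x, y):
    """Independent-samples t-test assuming equal variances (pooled variance)."""
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * statistics.variance(x) + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    se = math.sqrt(sp2 * (1 / nx + 1 / ny))
    return (statistics.mean(x) - statistics.mean(y)) / se, nx + ny - 2

t, df = one_sample_t([5, 6, 7, 8, 9], mu0=5)
print(round(t, 3), df)  # 2.828 4
```

For this example t ≈ 2.83 with 4 degrees of freedom, which exceeds the two-tailed 5% critical value of about 2.776, so H0: μ = 5 would be rejected at α = 0.05. Exact p-values require the t distribution's CDF, which SAS, R, and SPSS report directly.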

Part-9: Data Transformation

  1. Log transformation
  2. Arcsine transformation
  3. Box-Cox transformation
  4. Square root transformation
  5. Inverse transformation
  6. Min-max data normalization
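Each transformation in Part-9 is a one-line formula. The Python sketch below writes them out. The optional `offset` in the log transform is an illustrative assumption for data containing zeros, and the function names are hypothetical. In practice the Box-Cox λ is usually chosen by maximum likelihood (for example, with R's `MASS::boxcox`), a step not shown here.

```python
import math

def min_max(values):
    """Rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for constant data")
    return [(v - lo) / (hi - lo) for v in values]

def log_transform(y, offset=0.0):
    """Natural log; a positive offset (an assumption, e.g. 1) allows zeros."""
    return math.log(y + offset)

def sqrt_transform(y):
    """Square root transform, often used for count data."""
    return math.sqrt(y)

def inverse_transform(y):
    """Reciprocal transform for strongly right-skewed positive data."""
    return 1 / y

def arcsine_transform(p):
    """Arcsine square-root transform for a proportion p in [0, 1]."""
    return math.asin(math.sqrt(p))

def box_cox(y, lam):
    """Box-Cox transform for y > 0; lam = 0 reduces to log(y)."""
    if y <= 0:
        raise ValueError("Box-Cox requires positive values")
    return math.log(y) if lam == 0 else (y**lam - 1) / lam

print(min_max([10, 20, 30]))  # [0.0, 0.5, 1.0]
```

Box-Cox contains several of the others as special cases up to a linear rescaling: λ = 0 gives the log, λ = 0.5 a square root, and λ = -1 an inverse.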

Module 4: Predictive & Estimation Models (Supervised Learning)

Part-10: Predictive modeling & Diagnostics

  1. Correlation - Pearson, Kendall, Wilcox
  2. Simple Linear Regression (SLR)
  3. Multiple Linear Regression (MLR)
  4. Residual Analysis
  5. Autocorrelation
  6. ANOVA Significance Test
  7. VIF Analysis
  8. t-Test of Coefficient Significance
  9. CP Indexing
  10. Eigenvalues for PCA
  11. Homoscedasticity
  12. Heteroskedasticity
  13. Stepwise Regression
  14. Forward Regression
  15. Backward Regression
  16. Multicollinearity
  17. Cross-Validation
  18. MAPE
  19. Checking Prediction Accuracy
  20. Standardized Regression
  21. Quadratic Regression
  22. Transformed Regression
  23. Dummy Variable Regression
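As a small worked example of simple linear regression, residuals, and MAPE, the Python sketch below fits ordinary least squares with the closed-form estimates b1 = Sxy / Sxx and b0 = ȳ - b1·x̄, then computes R² and MAPE. The data are made up and the function names are hypothetical. Multiple regression, VIF, and stepwise selection need matrix algebra and are better left to SAS, R, or SPSS.

```python
import statistics

def fit_slr(x, y):
    """OLS estimates (b0, b1) for y = b0 + b1*x + error."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b1 = sxy / sxx
    return my - b1 * mx, b1

def r_squared(y, fitted):
    """Share of the variance in y explained by the fitted values."""
    my = statistics.mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

def mape(actual, predicted):
    """Mean absolute percentage error, in percent (actual values must be non-zero)."""
    return 100 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)

x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up data
b0, b1 = fit_slr(x, y)
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]  # inputs to residual analysis
print(b0, b1, r_squared(y, fitted), mape(y, fitted))
```

With an intercept in the model, the OLS residuals always sum to zero. Residual analysis therefore looks at their pattern (trend, funnel shape, autocorrelation) rather than their mean.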

Part-11: Logistic Regression Analysis

  1. Logistic Regression
  2. Discriminant Analysis
  3. Multiple Discriminant Analysis
  4. Stepwise Discriminant Analysis
  5. Logit function
  6. Test of Associations
  7. Chi-square strength of association
  8. Binary Regression Analysis
  9. Probit and Logit Models
  10. Estimation of probability using logistic regression
  11. Wald test statistics for the model
  12. Hosmer-Lemeshow test
  13. Nagelkerke R-square
  14. Pseudo R-square
  15. Maximum likelihood estimation
  16. Model Fit
  17. Model cross-validation
  18. Discriminant functions
  19. AIC (Akaike information criterion)
  20. BIC (Bayesian information criterion)
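To connect the logit function, maximum likelihood estimation, AIC, and BIC, the Python sketch below fits a one-predictor logistic regression by plain gradient ascent on the log-likelihood. This is a deliberate simplification: statistical packages typically use Newton-Raphson or iteratively reweighted least squares. The step size, iteration count, data, and function names are assumptions chosen for this toy example.

```python
import math

def sigmoid(z):
    """Inverse of the logit: maps log-odds to a probability."""
    return 1 / (1 + math.exp(-z))

def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1 - p))

def log_likelihood(b0, b1, x, y):
    """Bernoulli log-likelihood of the model P(y = 1) = sigmoid(b0 + b1*x)."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(b0 + b1 * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

def gradient(b0, b1, x, y):
    """Score vector: partial derivatives of the log-likelihood."""
    g0 = g1 = 0.0
    for xi, yi in zip(x, y):
        err = yi - sigmoid(b0 + b1 * xi)
        g0 += err
        g1 += err * xi
    return g0, g1

def fit_logistic(x, y, lr=0.04, steps=20000):
    """Maximum likelihood by gradient ascent (step size chosen for this toy data)."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0, g1 = gradient(b0, b1, x, y)
        b0 += lr * g0
        b1 += lr * g1
    return b0, b1

def aic(ll, k):
    return 2 * k - 2 * ll

def bic(ll, k, n):
    return k * math.log(n) - 2 * ll

x = [1, 2, 3, 4, 5, 6]
y = [0, 0, 1, 0, 1, 1]  # made-up data, not perfectly separable
b0, b1 = fit_logistic(x, y)
ll = log_likelihood(b0, b1, x, y)
print(b0, b1, aic(ll, 2), bic(ll, 2, len(x)))
```

The data must not be perfectly separable, otherwise the maximum likelihood estimate does not exist and the coefficients grow without bound. Lower AIC or BIC indicates a better trade-off between fit and the number of parameters k. BIC penalizes extra parameters more heavily once n ≥ 8.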

Module 5: Advanced Big Data Analytics

Part-12: Dimension Reduction Analysis

  1. Introduction to Factor Analysis
  2. Principal component analysis
  3. Reliability Test
  4. KMO and MSA tests, eigenvalue interpretation
  5. Rotation and Extraction steps
  6. Varimax Rotation
  7. Confirmatory Factor Analysis
  8. Exploratory Factor Analysis
  9. Factor Score for Regression

Part-13: Cluster Analysis

Introduction to Cluster Techniques

  1. Hierarchical clustering
  2. K-Means clustering
  3. Ward's Method
  4. Agglomerative Clustering
  5. Variation Methods
  6. Maximum-distance (complete) linkage method
  7. Centroid distance method
  8. Minimum-distance (single) linkage method
  9. Cluster dendrogram
  10. Euclidean distance method
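As an illustration of K-Means with Euclidean distance, the Python sketch below implements Lloyd's algorithm. It alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its members, stopping when the centroids no longer change. The random initialization, seed, toy points, and function name are assumptions. Production tools usually add smarter seeding (e.g. k-means++) and several restarts.

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means with Euclidean distance; returns (centroids, clusters)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initial centroids drawn from the data
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = [
            tuple(sum(coord) / len(members) for coord in zip(*members)) if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters

points = [(1, 1), (1.5, 2), (1, 1.5), (8, 8), (8.5, 9), (9, 8)]  # toy data
centroids, clusters = kmeans(points, k=2)
print(sorted(centroids))
```

Unlike the hierarchical methods above, k-means needs k fixed in advance and produces no dendrogram. A common workflow is to use a hierarchical (e.g. Ward's) solution to suggest k and then refine it with k-means.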

Module 6: Data Mining (Machine Learning)

Part-14: Data Mining, Machine Learning, and Artificial Intelligence

Functional Models

  1. Prediction
  2. Support Vector Machines (SVM)
  3. Gaussian Models
  4. Neural Network
  5. Classification Models
  6. Binary Regression/Logit Model
  7. Probit Model
  8. Naïve Bayes
  9. Multinomial Naïve Bayes
  10. Ordinal Regression
  11. Multinomial Regression
  12. Discriminant Analysis

Clustering Models

  1. DBSCAN
  2. EM (Expectation Maximization)
  3. K-Means Clustering
  4. Simple Cluster
  5. Hierarchical Cluster
  6. k-Nearest Neighbor Classification

Tree Models

  1. Random Forests: Bagging and Boosting
  2. Decision Stump
  3. CHAID Analysis
  4. C4.5 / C5.0
  5. J48 Pruned and Unpruned Trees
  6. Decision trees

Survival Analysis

  1. Mantel-Haenszel Test
  2. Kaplan-Meier (Product-Limit) Estimator
  3. Cox's Proportional Hazards Model
  4. Cox-Snell Residuals
  5. Hazard Functions
  6. Proportional Hazards Assumption
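The Kaplan-Meier (product-limit) estimator multiplies, at each distinct event time t, the conditional survival probability 1 - d/n. Here d is the number of events at t and n is the number still at risk just before t; censored subjects leave the risk set without contributing an event. The Python sketch below implements that rule on made-up follow-up data. The function name is hypothetical, and the sketch follows the usual convention that subjects censored at time t are still at risk at t.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times: follow-up times; events: 1 = event observed, 0 = censored.
    Returns [(t, S(t))] at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (events and censorings) sharing time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # everyone observed at t leaves the risk set afterwards
    return curve

print(kaplan_meier([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1]))
```

For these six subjects the estimate steps down to 5/6, 2/3, 4/9, and finally 0 at t = 5. The censored subjects at t = 2 and t = 4 shrink the risk set without producing a step.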

Part-15: Time Series

  1. Autoregressive (AR) Models
  2. Moving Average (MA) Model
  3. Multiplicative model
  4. ARIMA Model
  5. Additive Model
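For the simplest autoregressive model, AR(1): x_t = c + φ·x_{t-1} + e_t (an ARIMA(1,0,0)), φ can be estimated by regressing each value on the previous one. The Python sketch below simulates such a series and recovers φ by conditional least squares. The true φ = 0.6, series length, and seed are assumptions for illustration. Moving-average and full ARIMA models need iterative likelihood estimation, as provided by SAS PROC ARIMA or R's `arima()`.

```python
import random

def simulate_ar1(phi, n, c=0.0, sigma=1.0, seed=1):
    """Simulate x_t = c + phi * x_{t-1} + e_t with e_t ~ N(0, sigma^2), starting at x_0 = 0."""
    rng = random.Random(seed)
    x = [0.0]
    for _ in range(n - 1):
        x.append(c + phi * x[-1] + rng.gauss(0.0, sigma))
    return x

def fit_ar1(x):
    """Conditional least squares: regress x_t on x_{t-1}; returns (c_hat, phi_hat)."""
    prev, curr = x[:-1], x[1:]
    mp, mc = sum(prev) / len(prev), sum(curr) / len(curr)
    sxy = sum((a - mp) * (b - mc) for a, b in zip(prev, curr))
    sxx = sum((a - mp) ** 2 for a in prev)
    phi = sxy / sxx
    return mc - phi * mp, phi

series = simulate_ar1(phi=0.6, n=5000)
c_hat, phi_hat = fit_ar1(series)
print(c_hat, phi_hat)  # both estimates should be close to the true values (0 and 0.6)
```

With |φ| < 1 the series is stationary and fluctuates around c / (1 - φ). As φ approaches 1 the series behaves like a random walk, which is where the differencing ("I") step of ARIMA comes in.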

Part-16: Model Validation and Testing

  1. Kappa Statistics
  2. AIC
  3. BIC
  4. Error/Confusion Matrices
  5. ROC Curve
  6. APE
  7. MAPE
  8. Lift Curve
  9. Sensitivity
  10. Misclassification Rate
  11. Specificity
  12. Maximum Absolute Error
  13. Root Final Prediction Error
  14. Gini Coefficient
  15. Schwarz's Bayesian Criterion
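Several of the validation measures above can be derived from a binary confusion matrix and a set of predicted scores. The Python sketch below computes sensitivity, specificity, misclassification rate, and Cohen's kappa from 0/1 labels. It also computes ROC AUC as the probability that a random positive outscores a random negative, and the Gini coefficient as 2·AUC - 1, a convention common in credit scoring. Function names and data are hypothetical, and labels are assumed to be coded 1 = positive, 0 = negative.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for binary labels coded 1 = positive, 0 = negative."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == 1 and p == 1 for a, p in pairs)
    fp = sum(a == 0 and p == 1 for a, p in pairs)
    fn = sum(a == 1 and p == 0 for a, p in pairs)
    tn = sum(a == 0 and p == 0 for a, p in pairs)
    return tp, fp, fn, tn

def classification_metrics(actual, predicted):
    tp, fp, fn, tn = confusion_counts(actual, predicted)
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Chance agreement for Cohen's kappa, from the row and column margins.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "misclassification_rate": (fp + fn) / n,
        "kappa": (accuracy - p_e) / (1 - p_e),
    }

def auc(actual, scores):
    """ROC AUC: P(random positive scores above random negative), ties counted as 1/2."""
    pos = [s for a, s in zip(actual, scores) if a == 1]
    neg = [s for a, s in zip(actual, scores) if a == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def gini(actual, scores):
    """Gini coefficient as 2 * AUC - 1."""
    return 2 * auc(actual, scores) - 1

actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]  # made-up predictions
print(classification_metrics(actual, predicted))
```

Here accuracy is 70%, but half of that agreement would be expected by chance from the margins alone, so kappa is only 0.4. This is why kappa is reported alongside the raw misclassification rate.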
