Data Science course
(With SAS, R, WEKA, and SPSS & Excel)*
Module 1: Introduction to Data Science
Part-1: Referential Details for Data Science and Business Analytics
1. Scope & Facts of Data Science and Business Analytics
2. SWOT Analysis of Data Science and Business Analytics
3. Introduction to Advanced Data Analytics
4. Journey: Mathematics, Statistics, Econometrics
5. Flow Chart for Data Science and Business Analytics
6. Data Warehouse: Conceptual Discussions
7. Hadoop for Data Science
8. OLTP and OLAP for Data Information
9. Web Application Report
Module 2: Data Visualization and Summarization
Part-2: Descriptive Statistics:
- Descriptive Statistics
- Inferential Statistics
- Types of Variables
- Measures of Central Tendency
- Data Variability and Dispersion
- Five-Number Summary Analysis
- Data Distribution Techniques
- Exploration Techniques for Numerical Data
- Exploration Techniques for Character Data
- Visualization Exploration
- Summary Exploration
- Chebyshev's Inequality
Part-3: Basic Probability for Business Issues:
- Simple Probability
- Marginal Probability
- Joint Probability
- Conditional Probability (linked with Decision Tree Algorithms)
- Bayes' Theorem Probability (linked with Naïve Bayes Algorithms)
- Discrete Distributions
- Binomial Distribution
- Hypergeometric Distribution
- Poisson Distribution
- Continuous Distributions
- Normal Distribution and Properties
- Standardized Distributions
Part-4: Sampling Techniques for Big Data
Sampling Distributions
- Simple Random Sample
- Systematic Sample
- Stratified Sample
- Cluster Sample
- Standard Error of the Mean
- Skewness Std. Error
- Kurtosis Std. Error
- Central Limit Theorem
- Sampling from Infinite Populations
- Sampling Distributions for Mean
- Sampling Distributions for Proportions
Module 3: Data Preparation and Quality Check
Part-5: Data Validation and Data Normality
- Univariate normality techniques
- Bivariate techniques
- Multivariate techniques
- Q-Q probability plots
- Cumulative frequency
- Exploratory analysis
- Stem-and-leaf analysis
- Histogram
- Box plot
- Scores for Normality Check
- Kolmogorov-Smirnov test
- Shapiro-Wilk test
- Anderson-Darling test
Part-6: Data Cleaning Process and Quality Check
- PCA for Big Data Analysis or Unsupervised Data
- PCA Regression Scores for Supervised Data
- Noisy Data Detection
- Data Cleaning with Regression Residuals
- Data Scrubbing with Statistical Sense
Part-7: Data Imputation and Outlier Treatment
- Outlier Treatment with Robust Measurements
- Outlier Treatment with Central Tendency (Mean)
- Outlier with Min-Max Likelihood Methods
- Outlier Detection with Density-Based Methods
- Visualized Outlier Treatment
- Summarized Outlier Treatment
- Multivariate Outlier Detection: Mahalanobis Distance
- Multivariate Chi-square Statistics
- Outlier with Residual Analysis
- Outlier Detection with PCA Analysis
- Data Imputation with Series Central Tendency
Part-8: Test of Hypothesis
- Null Hypothesis formulation
- Alternative Hypothesis
- Type I and Type II errors
- Power Value
- One tail and Two tail
- One Sample T-TEST
- Paired T-TEST
- Independent Sample T-TEST
- Analysis of Variance (ANOVA)
- MANOVA
- Chi Square Test
- Kendall Chi Square
- Kruskal-Wallis Rank Test Chi Square
- Mann-Whitney Chi Square
- Wilcoxon Chi Square
- McNemar test Chi Square
Part-9: Data Transformation
- Log transformation
- Arcsine transformation
- Box-Cox transformation
- Square root transformation
- Inverse transformation
- Min-Max Data normalization
Module 4: Predictive & Estimation Models (Supervised Learning)
Part-10: Predictive Modeling & Diagnostics
- Correlation
- Pearson, Kendall, Wilcox
- SLR Regression
- MLR Regression
- Examination of Residual Analysis
- Autocorrelation
- Test of ANOVA Significance
- VIF Analysis
- Test of T-test Significance
- CP Indexing
- Eigen Value for PCA Analysis
- Homoscedasticity
- Heteroskedasticity
- Stepwise Regression
- Forward Regression
- Backward Regression
- Multicollinearity
- Cross Validation
- MAPE
- Check Prediction Accuracy
- Standardized Regression
- Quadratic Regression
- Transformed Regression
- Dummy Variables Regression
Part-11: Logistic Regression Analysis
- Logistic Regression
- Discriminant Regression Analysis
- Multiple Discriminant Analysis
- Stepwise Discriminant Analysis
- Logit Function
- Test of Associations
- Chi-square Strength of Association
- Binary Regression Analysis
- Probit and Logit Models
- Estimation of Probability Using Logistic Regression
- Wald Test Statistics for Model
- Hosmer-Lemeshow
- Nagelkerke R Square
- Pseudo R Square
- Maximum Likelihood Estimation
- Model Fit
- Model Cross Validation
- Discriminant Functions
- AIC
- BIC (Bayesian Information Criterion)
Module 5: Advanced Big Data Analytics
Part-12: Dimension Reduction Analysis
- Introduction to Factor Analysis
- Principal Component Analysis
- Reliability Test
- KMO MSA Tests, Eigen Value Interpretation
- Rotation and Extraction Steps
- Varimax Models
- Confirmatory Factor Analysis
- Exploratory Factor Analysis
- Factor Score for Regression
Part-13: Cluster Analysis
- Introduction to Cluster Techniques
- Hierarchical Clustering
- K-Means Clustering
- Ward's Method
- Agglomerative Clustering
- Variation Methods
- Maximum Distance Linkage Methods
- Centroid Distance Methods
- Minimum Distance Linkage Method
- Cluster Dendrogram
- Euclidean Distance Methods
Module 6: Data Mining (Machine Learning)
Part-14: Data Mining, Machine Learning / Artificial Intelligence
Functional Models
- Prediction
- Support Vector Machines (SVM)
- Gaussian Models
- Neural Network
Classification Models
- Binary Regression/Logit Model
- Probit Model
- Naïve Bayes
- Naïve Bayes Multinomial
- Ordinal Regression
- Multinomial Regression
- Discriminant Analysis
Clustering Models
- DBSCAN
- EM (Expectation Maximization)
- K-Means Clustering
- Simple Cluster
- Hierarchical Cluster
- k-Nearest Neighbor Classification
Tree Models
- Random Forests: Bagging & Boosting
- Decision Stump
- CHAID Analysis
- C4.5 / C5.0
- J48: Pruning, Unpruning
- Decision Trees
Survival Analysis
- Mantel-Haenszel Test
- Kaplan-Meier (Product-Limit) Estimator
- Cox's Proportional Hazards Model
- Cox-Snell Residual
- Hazard Functions
- Proportional Hazards Assumption
Part-15: Time Series
- Autoregressive Models
- Moving Average Model
- Multiplicative Model
- ARIMA Model
- Additive Model
Part-16: Model Validation and Testing
- Kappa Statistics
- AIC
- BIC
- Error/Confusion Matrices
- ROC
- APE
- MAPE
- Lift Curve
- Sensitivity
- Misclassification Rate
- Specificity
- Maximum Absolute Error
- Root Final Prediction Error
- Gini Coefficient
- Schwarz's Bayesian Criterion