## Data Science Course

### About this course

Data Science Training by Sathya Technologies lets you build and master skills such as descriptive and inferential statistics, probability distributions, and predictive analytics, using tools such as RStudio, data visualization, SQL, SAS, and Hadoop. The course prepares you for the role of **Data Scientist**, working with large amounts of data to build prediction models using statistical concepts. Using machine learning concepts and modeling tools, a Data Scientist helps organizations process large amounts of unstructured data and extract information that shapes business goals.

#### Why This Course?

- The average salary of a Data Scientist is $120,931 per annum.
- Demand for Data Scientists is projected to rise 28% by 2020 across all industry verticals.
- Financial institutions, insurance companies, and FMCG companies depend on Data Scientists to shape their business goals and strategies.

#### Best Data Science Training in Ameerpet, Hyderabad

A Data Scientist is a person who employs various methods and tools to extract meaningful information from data. Data Scientists deal with huge amounts of data and make predictions using statistical concepts. They formulate and write queries to derive information from raw data.

At Sathya Technologies we begin by learning inferential statistics in detail. We then move on to the data science workflow, learning how to collect, explore, model, and validate data using various predictive analysis tools.

#### Course Objective

- Learn the key features of Data Science.
- Understand probability distributions in detail.
- Work on real-time problems.
- Work with data handling concepts.
- Integrate with other tools.

#### How the program will be conducted

Sathya Technologies, with its state-of-the-art classrooms and lab infrastructure at Ameerpet, Hyderabad, offers the best and most conducive learning environment, with a team of highly skilled trainers who have years of industry experience. Classroom training is conducted daily. Practical exercises are provided for each day's topics, to be worked on during the lab session. Online sessions conducted through the virtual classroom follow the same program flow of theory and practical sessions. Our labs can be accessed online from anywhere in the world, allowing online training students to make the best use of the infrastructure from the comfort of their homes.

#### Career Opportunities in Data Science

With the popularity of Big Data increasing exponentially, opportunities for Data Scientists and architects have been growing in all major industry sectors. Training programs on Data Science by Sathya Technologies focus on empowering students with the latest concepts and industry-specific topics. Our experienced trainers and well-planned course materials ensure 100% success in interviews.

### Who can learn?

#### Targeted Audience

- Software Developers
- Statisticians
- College students / freshers with a statistics and math background
- Statistics Professionals

#### Prerequisite to learn the course

Experience in statistics and machine learning will help in becoming a Data Scientist. An understanding of business and domain concepts is an added advantage. Basic R programming concepts will come in handy. Knowledge of Big Data is also helpful.

##### INTRODUCTION TO DATA SCIENCE

- Need of Data Science
- History of Data Science
- What is Data Science
- Data Science vs Data Analytics
- What is Data Analytics
- What is Data Analysis
- Data Mining

##### INTRODUCTION TO MACHINE LEARNING

- What is machine learning
- Types of learning
- Supervised Machine Learning
- Unsupervised Machine Learning
- Machine learning algorithms
- Flow of Supervised and Unsupervised Machine Learning
- Simple Linear Regression
- Multiple Linear Regression
- Logistic Regression
- K-Nearest Neighbour
- Support Vector Machine
- Decision Tree
- Random Forest
- Ensemble Machine Learning
- Naive Bayes
- Clustering
- K-Means
- Hierarchical Clustering

##### PYTHON

- Data Science Essentials
- Numpy
- Introduction
- Numpy Package
- Ndarray Object
- Data Types
- Array Attributes
- Array from Numerical Ranges
- Indexing & Slicing
- Advanced Indexing
- Iterating over arrays
- String Functions
- Arithmetic Operations
- Statistical Functions

##### Pandas

- Introduction
- Pandas Package
- Series
- DataFrame
- Panel
- Descriptive Statistics
- Indexing and Selecting Data
- Iteration
- Sorting
- Aggregations
- Missing Data
- GroupBy
- Merging/Joining
- Concatenation
- Data Functionality
- Pandas Visualization
- Pandas IO Tools
- CSV to DataFrame
- loc and iloc
- DataFrame Filtering

##### Manipulating DataFrames with Pandas

- Extracting and Transforming Data
- Reshaping Data
- Grouping Data

##### Data Visualization using Python

- matplotlib
- Bar Graph
- Histogram
- Scatter Plot
- Pie Chart

##### Statistical and Mathematical Essentials for Data Science

- Measures of Central Tendency
- Mean
- Mode
- Median
- Range
- Inter Quartile Range
- Variance
- Standard Deviation
- Correlation
- Regression models in Machine Learning
- Residuals
- Correlation Coefficient (Pearson)
- Accuracy Measurement
- Least Square Regression
- Root Mean Square Error
- Coefficient of Determination (R2 Score)
- Cost Function
- Gradient Descent
- Hypothesis Testing and p-values
- T-values
- Z-score
- Create Dummy Variables
- Cross Validation
- Confusion Matrix
- Compute Precision, Recall, F-Measure and Support
- TPR, FPR, FNR, TNR
- Accuracy
- Learning rate
- Sensitivity and Specificity
- ROC Curve (Receiver Operating Characteristic)
- Calculating similarity based on Euclidean/Manhattan Distance
- Calculation of Entropy and Information Gain
- Calculation of Gini index
- Basic Probability
- Randomness
- Conditional Probability
- Naive Bayes Theorem
- Multiplication rule for dependent and independent events
- Differential Equations and Partial Derivatives
- Linear Algebra
- Correlation, Covariance
- Matrices and Vectors
- Addition and Scalar Multiplication
- Matrix Vector Multiplication
- Matrix Multiplication
- Matrix Transformations
- Inverse and Transpose of Matrices
- Eigen Values and Eigen Vectors
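
As an illustration of the confusion-matrix topics listed above (precision, recall, F-measure, TPR/FPR, accuracy, sensitivity and specificity), here is a minimal sketch in plain Python. The function name `classification_metrics` and the counts are hypothetical, chosen only for illustration, and the sketch assumes none of the denominators is zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive common metrics from the four confusion-matrix counts.

    Assumes every denominator is non-zero (illustrative sketch only).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also called sensitivity or TPR
    specificity = tn / (tn + fp)   # also called TNR
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "fpr": fp / (fp + tn),
        "fnr": fn / (fn + tp),
        "f1": 2 * precision * recall / (precision + recall),
    }


# Hypothetical counts, not taken from any real dataset.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that FPR equals 1 minus specificity and FNR equals 1 minus recall, which is why the four rates are usually studied together.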

##### Machine Learning using Python

- Regression
- Linear Regression
- What is Regression
- Types of Regression
- Model Description
- Ordinary Least Squares method
- Import and Read the Data
- Perform Exploratory Data Analysis
- Interpreting Model Coefficients
- Feature Selection
- Training and Testing the data
- Model Evaluation Using Train/Test Split
- Training the model
- Predicting Test Data
- Model Evaluation Metrics for Regression
- Use Case – Linear Regression using Advertising Dataset and Housing Dataset
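
To make the ordinary least squares and model evaluation topics above concrete, the following is a small sketch of simple (one-feature) linear regression in plain Python, together with RMSE and the R2 score. The data points and the helper name `fit_simple_ols` are made up for illustration; this is not the Advertising or Housing dataset, and in class a library such as scikit-learn would normally be used instead.

```python
import math


def fit_simple_ols(xs, ys):
    """Closed-form least squares fit for y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical toy data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

slope, intercept = fit_simple_ols(xs, ys)
predictions = [intercept + slope * x for x in xs]
residuals = [y - p for y, p in zip(ys, predictions)]

# Evaluation metrics: root mean square error and coefficient of determination.
sse = sum(r ** 2 for r in residuals)
mean_y = sum(ys) / len(ys)
sst = sum((y - mean_y) ** 2 for y in ys)
rmse = math.sqrt(sse / len(ys))
r2 = 1 - sse / sst

print(f"slope={slope:.2f} intercept={intercept:.2f} rmse={rmse:.3f} r2={r2:.2f}")
```

For brevity the metrics here are computed on the same points used for fitting; the train/test split listed above would evaluate them on held-out data instead.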

##### Logistic Regression

- Introduction
- Data Exploration
- Data Visualization
- Feature Selection (Recursive Feature Elimination)
- Implementing the Model
- Logistic Regression Model Fitting
- Predicting Test Set Results and Calculate Accuracy
- Cross Validation
- Confusion Matrix
- Compute Precision, Recall, F-Measure and Support
- ROC Curve (Receiver Operating Characteristic)
- Classification Report
- Logistic Regression Hypothesis
- Use Case – Logistic Regression using Banking dataset

##### K-Nearest Neighbor

- Understanding classification using Nearest Neighbors
- Finding K-Nearest Neighbors
- Rescale using min-max normalization
- Diagnosing cancer with the K-NN algorithm
- Import / Load Data
- Exploring and Preparing the data
- Transformation: normalizing numeric data
- Data preparation: creating training and test datasets
- Training a model on the data
- Evaluating model performance
- Improve model performance
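
The min-max normalization and nearest-neighbor voting steps listed above can be sketched in a few lines of plain Python. The two-feature measurements, the labels, and the function names below are hypothetical and exist only to show the mechanics; they are not the cancer dataset used in the course.

```python
import math
from collections import Counter


def min_max_scale(rows):
    """Rescale every column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def knn_predict(train_x, train_y, query, k=3):
    """Return the majority label among the k nearest training points."""
    nearest = sorted(
        (math.dist(x, query), label) for x, label in zip(train_x, train_y)
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical measurements on very different scales.
rows = [[1.0, 200.0], [2.0, 180.0], [1.5, 210.0],
        [8.0, 900.0], [9.0, 950.0], [8.5, 880.0]]
labels = ["benign", "benign", "benign",
          "malignant", "malignant", "malignant"]
query = [7.5, 860.0]

# Simplification: the query is scaled together with the training rows.
scaled = min_max_scale(rows + [query])
predicted = knn_predict(scaled[:-1], labels, scaled[-1], k=3)
print(predicted)
```

Without rescaling, the second feature would dominate the Euclidean distance simply because its values are larger, which is the motivation for normalizing before running K-NN.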

##### Support Vector Machine (SVM)

- Goal of Support Vector Machine (SVM)
- Support Vector Machine-Basics
- Advantages and Disadvantages of SVMs
- Hyperplane and Margin
- Classification with Hyperplanes
- Linear Separable Case
- Kernel and Radial Functions
- Constructing the Maximal margin classifier
- Use Case – SVM using cancer dataset

##### Decision Tree and Random Forest

- Understanding decision trees
- Calculation of Entropy and Information Gain
- Choosing the best split
- Pruning the decision tree
- Collecting data
- Exploring and preparing the data
- Training a model on the data
- Evaluating model performance
- Improving model performance
- Boosting the accuracy of decision trees
- What is a Random Forest algorithm?
- Advantages of Random Forest algorithm
- Use Case – Decision Tree and Random Forest in Medicine
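
The entropy and information gain calculation that drives "choosing the best split" can be sketched as follows, with the Gini index included for comparison. The label lists and function names are hypothetical examples, not course data.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical node with 5 "yes" and 5 "no" examples.
parent = ["yes"] * 5 + ["no"] * 5
perfect_split = [["yes"] * 5, ["no"] * 5]
noisy_split = [["yes"] * 4 + ["no"], ["yes"] + ["no"] * 4]

print(entropy(parent), gini(parent))
print(information_gain(parent, perfect_split))
print(information_gain(parent, noisy_split))
```

A decision tree learner would evaluate every candidate split this way and keep the one with the highest gain (or the lowest weighted Gini index).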

##### PROBABILISTIC LEARNING - CLASSIFICATION USING NAIVE BAYES

- Understanding naive Bayes
- Basic concepts of Bayesian methods
- Probability
- Joint probability
- Conditional probability with Bayes theorem
- The Naive Bayes algorithm
- The naive Bayes classification
- Using numeric features with naive Bayes
- Naive Bayes algorithm Example
- Collecting data
- Exploring and preparing the data
- Training a model on the data
- Evaluating model performance
- Improving model performance

##### FINDING GROUPS OF DATA - CLUSTERING WITH K-MEANS

- Understanding clustering
- Clustering as a machine learning task
- The K-means algorithm for clustering
- Using distance to assign and update clusters
- Choosing the appropriate number of clusters
- Finding segments using K-means clustering
- Collecting data
- Exploring and preparing the data
- Data preparation: dummy coding missing values
- Data preparation: imputing missing values
- Training a model on the data
- Evaluating model performance
- Improving model performance
- Principal component analysis (PCA)
- Dimensionality Reduction
- Use Case – KMeans Clustering using Wholesale Customers dataset
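
The "assign and update" loop at the heart of K-means can be sketched in plain Python as below. This is a simplified illustration under stated assumptions: the first `k` points serve as initial centroids (real implementations usually initialize randomly or with k-means++), the number of iterations is fixed, and the points are invented rather than drawn from the Wholesale Customers dataset.

```python
import math


def kmeans(points, k, iterations=10):
    """Very small K-means: returns (centroids, assignments)."""
    # Assumption: naive deterministic initialization from the first k points.
    centroids = [list(p) for p in points[:k]]

    def nearest(point):
        return min(range(k), key=lambda i: math.dist(point, centroids[i]))

    for _ in range(iterations):
        # Assignment step: attach each point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]

    return centroids, [nearest(p) for p in points]


# Hypothetical 2-D points forming two obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0),
          (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centroids, assignments = kmeans(points, k=2)
print(centroids)
print(assignments)
```

Choosing `k` itself is a separate question; a common heuristic is to run the algorithm for several values of `k` and compare the within-cluster distances.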

##### DIMENSIONALITY REDUCTION AND VISUALIZATION

- What is Dimensionality reduction?
- Row Vector and Column Vector
- How to represent a data set?
- How to represent a dataset as a matrix?
- Data Pre-processing: Feature Normalisation
- Mean of a data matrix
- Data Pre-processing: Column Standardization
- Covariance of a Data Matrix

##### PCA (PRINCIPAL COMPONENT ANALYSIS)

- Why learn PCA?
- Geometric intuition of PCA
- Mathematical objective function of PCA
- Eigenvalues and Eigenvectors (PCA): Dimensionality reduction
- PCA for Dimensionality Reduction and Visualization

##### Deep learning

- Introduction to Deep Learning
- Building
- Neural network architecture
- Convolutional Neural Networks (CNN)

##### Artificial Neural Networks (ANN)

- Deep Learning with Keras & Tensorflow
- Image Classification with Keras

##### Artificial Intelligence

- Natural Language Processing
- Introduction to NLP and NLTK
- Preprocessing data using tokenization
- Stemming text data
- Converting text to its base form using lemmatization
- Building a bag-of-words model
- Building a text classifier
- Text to Features
- TF-IDF Extraction
- Word Vectors
- Analyzing the sentiment of the sentence

##### Building Recommendation Engines

- What is a Recommendation Engine
- Types of Recommendation Engines
- Collaborative Filtering
- Item-Based Collaborative Filtering
- User-Based Collaborative Filtering
- Content-Based Filtering

##### Optical Character Recognition

- Extraction of text from PDF
- Extraction of text from the image
