data science tutorial – DatabaseTown

19 Basic Machine Learning Interview Questions and Answers

Zubair Akhtar — Thu, 27 Feb 2020 17:27:00 +0000

Many companies hire data engineers and data scientists to make their data more reliable and secure, and they use machine learning for that purpose.

A company may hire several kinds of engineers: data analysts, machine learning engineers and deep learning engineers.

All these roles are similar in nature. An employer can ask different types of interview questions to find the best candidate for the company.

A common theme is how to solve real-world problems using machine learning. We consulted some senior practitioners and collected suitable answers below.

Machine Learning Interview Questions and Answers

1 – What is Machine learning?

Machine learning is an application of artificial intelligence in which a system accesses data and learns from it automatically, improving with experience.

The primary objective of machine learning is to let a system learn from data and make decisions without human intervention.

2 – How will you explain machine learning in easy words?

The interviewer wants to see how you explain machine learning in simple words and how you describe its basic components with the help of examples.

An easy way to explain machine learning is with an example. Suppose your friend invites you to a party where you do not know any of the guests. You group the guests just by looking at them, for instance by gender, age and dress.

You have no prior knowledge of the guests, so you are grouping them without any labels. This is known as un-supervised learning.

On the other hand, when you already know something about the guests and use that knowledge to sort them into known groups, it is known as supervised learning.

3 – How many types of machine learning are there?

There are three major types:

Supervised learning
Un-Supervised learning
Reinforcement learning

4 – What is supervised learning?

Labeled training data is given to the machine: each example comes with its characteristics (features) and the group it belongs to (label).

For example, the shape and color of different fruits, together with the name of each fruit, are given to the machine as training data. The machine then uses what it learned from that data to label new fruits in the future.

5 – What is Un-Supervised learning?

In un-supervised learning, the data is given to the computer without any labels. It is learning without a teacher.

The algorithm groups the data on the basis of the relationships and shared characteristics that it finds in the data itself.

6 – What is Reinforcement learning?

This type of learning is based on an agent that interacts with an environment. The agent performs an action, the environment responds, and the agent uses that response to choose its next action.

An example of reinforcement learning is playing a game, where an agent has the goal of getting a high score and receives feedback in the form of rewards and punishments.

7 – What is deep learning?

Deep learning is not completely different from machine learning. It is a subset of machine learning.

Deep learning is based on neural networks with many layers. The inspiration for these networks came from the structure of the human brain. A network learns to detect features in the data, loosely in the way the human mind does.

8 – What is a neural network?

An artificial neural network is a model that allows a computer or machine to learn by incorporating new data.

It is loosely modeled on the human brain. The neuron is the basic unit of the brain, and an artificial neural network is built from simple units that play a similar role.

9 – What is classification and regression in machine learning?

Both classification and regression are parts of supervised learning. Regression predicts continuous values, such as a stock price or a sales figure.

Classification predicts a class, for example whether a customer is going to buy a product or not, or whether a salary is high or low. It assigns labels on the basis of characteristics.

10 – What do you understand by selection bias?

In statistical terms, selection bias arises when the sample of data does not represent the population. For example, suppose you want information about the use of gaming computers in a specific state. To get accurate information you have to take data from all the markets that deal in gaming computers in that state.

If you collect data from only one city, your data collection is biased, because you are not collecting data from all over the state. This may lead to a wrong conclusion.

11 – What is precision and recall?

Recall is the share of the relevant events that you manage to bring back. For example, suppose a friend has given you a gift on each of your last ten birthdays.

One day your friend asks you to name all those gifts, so you go through the previous birthdays and try to remember each gift.

Each gift you name may be right or wrong. Recall is the ratio of gifts you name correctly to all the gifts actually given: if you correctly name 8 of the 10 gifts, recall is 80%. Precision is the ratio of correct answers to all the answers you gave: if you named 12 gifts and 8 of them were correct, precision is 8/12, about 67%.
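The two ratios can be computed directly from counts. The sketch below uses the standard definitions (precision = TP / (TP + FP), recall = TP / (TP + FN)); the function name `precision_recall` and the gift counts are hypothetical and chosen only for illustration.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from true positive, false positive
    and false negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts: 8 gifts named correctly (TP), 4 gifts named that
# were never given (FP), and 2 real gifts forgotten (FN).
precision, recall = precision_recall(tp=8, fp=4, fn=2)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Here recall is 8 out of the 10 real gifts, while precision is 8 out of the 12 gifts that were named.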

12 – What are true positive, true negative, false positive and false negative?

Let us take an example to understand these terms. We have a model in which an alarm either goes off or stays silent, and there either is a fire or there is not. "Positive" and "negative" refer to what the system predicts (fire or no fire); "true" and "false" say whether that prediction was correct.

True positive:

If the alarm goes off when there is a fire, it is a true positive. The system predicted a fire (positive) and the prediction is true.

False Positive:

If the alarm goes off when there is no fire, it is a false positive. The system predicted a fire (positive) but the prediction is false. This is a false alarm.

True Negative:

If the alarm does not go off when there is no fire, it is a true negative. The system predicted no fire (negative) and the prediction is true.

False Negative:

If the alarm does not go off when there is a fire, it is a false negative. The system predicted no fire (negative) but the prediction is false. For a fire alarm this is usually the most dangerous case.

13 – What is confusion Matrix?

A confusion matrix is a table that compares the predictions of a model with the actual values. It is also known as an error matrix. The table layout makes errors easy to identify, although its terminology can look confusing.
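As a minimal sketch, the four outcomes of the fire alarm example can be counted and arranged into a 2 x 2 confusion matrix. The observations and the function name `confusion_counts` are hypothetical; the row/column layout (rows = actual, columns = predicted) is one common convention, not the only one.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary 0/1 labels."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1          # alarm on, fire present
        elif p == 1 and a == 0:
            fp += 1          # alarm on, no fire
        elif p == 0 and a == 0:
            tn += 1          # alarm off, no fire
        else:
            fn += 1          # alarm off, fire present
    return tp, fp, tn, fn


# Hypothetical observations: 1 = fire / alarm on, 0 = no fire / alarm off.
fire = [1, 1, 0, 0, 1, 0, 0, 1]
alarm = [1, 0, 0, 1, 1, 0, 0, 1]

tp, fp, tn, fn = confusion_counts(fire, alarm)

# Rows are the actual class (0, 1); columns are the predicted class (0, 1).
confusion_matrix = [[tn, fp],
                    [fn, tp]]
print(confusion_matrix)
```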

14 – What is inductive learning and deductive learning?

Inductive learning is learning in which the learner discovers general rules from specific examples. Based on some examples, the learner reaches a conclusion.

Deductive learning is learning in which the learner starts from general rules and applies them to specific observations. It works from the more general to the more specific.

15 – What is clustering in machine learning?

The method of identifying groups of similar data within a data set is called clustering.

In other words, it is the process of forming groups on the basis of the structure of the data.

Similar data is put into one group, or cluster. For example, a retailer wants to improve its business and collects reviews from different customers. All reviews are sorted into groups, called clusters, which are then used to suggest improvements to the business.

16 – What is the difference between KNN and K-means clustering?

KNN stands for K-Nearest Neighbors. It is a supervised learning technique, not a clustering method. The algorithm classifies a new point, or predicts a continuous value for it (regression), from the labels or values of its k nearest labeled neighbors.

K-means is an un-supervised learning technique used for clustering. The algorithm groups unlabeled data into k clusters on the basis of the features.
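To illustrate why KNN needs labeled data, here is a minimal sketch of a k-nearest-neighbors classifier using only the standard library. The function name `knn_predict`, the points and the labels are all hypothetical.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points."""
    # Pair every training point's distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Hypothetical labeled training data: two well-separated groups.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
labels = ["low", "low", "low", "high", "high", "high"]

print(knn_predict(points, labels, (1.1, 1.0)))
print(knn_predict(points, labels, (5.1, 5.0)))
```

K-means, by contrast, would receive only `points` and would have to discover the two groups without ever seeing `labels`.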

17 – What is the ROC curve, and how and when do you use it?

ROC stands for Receiver Operating Characteristic. The ROC curve is a fundamental tool for evaluating a classification algorithm in machine learning.

It plots the true positive rate against the false positive rate at different classification thresholds. The larger the area under the curve, the better the algorithm. For a good algorithm the true positive rate rises quickly while the false positive rate stays low.
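A minimal sketch of how the points of an ROC curve and the area under it can be computed by hand. The labels, scores and thresholds are hypothetical, and the helper names `roc_points` and `area_under_curve` are illustrative only; libraries such as scikit-learn provide ready-made equivalents.

```python
def roc_points(actual, scores, thresholds):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = []
    for threshold in thresholds:
        predicted = [1 if score >= threshold else 0 for score in scores]
        tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
        fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under points ordered by increasing false positive rate."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Hypothetical true labels and model scores.
actual = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

# Thresholds in decreasing order, so the false positive rate never decreases.
roc = roc_points(actual, scores, thresholds=[0.9, 0.5, 0.4, 0.3, 0.0])
auc = area_under_curve(roc)
print(roc)
print(auc)
```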

18 – What is the difference between type-I and type-II error?

A type-I error is a false positive: the model reports that something is true when it is actually not. For example, the algorithm reports that a man is pregnant, which can never be the case.

A type-II error is a false negative: the model reports that something is not true when it actually is. For example, a woman is pregnant and the machine reports that she is not pregnant.

19 – What is more important: model accuracy or model performance?

Model accuracy is a part of model performance; it is a subset of it. For example, suppose there is a large volume of data and the system has to identify fraud in it.

Higher model accuracy contributes to higher model performance. In data like this, however, fraud cases are rare, so accuracy alone can be misleading and other measures such as precision and recall should be considered as well.


Linear Algebra in TensorFlow (Scalars, Vectors & Matrices)

Zubair Akhtar — Wed, 22 Jan 2020 14:58:07 +0000

Linear Algebra in TensorFlow: TensorFlow is open source software, released under the Apache open source license, for dataflow programming. It is frequently used for machine learning applications such as deep neural networks, for example to improve search engines like Google, image captioning, recommendation and translation.

For example, when a user types a keyword in Google’s search bar, it offers suggestions that could be helpful for users or researchers. TensorFlow was developed by the Google Brain Team to improve services such as Gmail and the Google search engine, and its first stable version appeared in 2017.

Its workflow has three parts: data preprocessing, model building, and model training and estimation.

TensorFlow takes input as a multi-dimensional array (a tensor), and its library offers various APIs for building deep learning architectures such as CNNs or RNNs at scale.

TensorFlow runs on both GPU and CPU. It is based on graph computation, which lets the developer visualize the construction of the neural network with TensorBoard.

These are the algorithms supported by TensorFlow.

Classification – tf.estimator.LinearClassifier
Deep Learning Classification – tf.estimator.DNNClassifier
Deep Learning wide and deep – tf.estimator.DNNLinearCombinedClassifier
Boosted Tree Classification – tf.estimator.BoostedTreesClassifier
Linear Regression – tf.estimator.LinearRegressor
Boosted Tree Regression – tf.estimator.BoostedTreesRegressor

You can see more details here.

Before starting a practical example of TensorFlow, it is essential to recall the concepts of scalar, vector and matrix.

A scalar is always one by one, so it has the lowest dimensionality. Each element of a vector is a scalar, and a vector is an (m x 1) or (1 x m) matrix. A matrix (m x n) is a collection of vectors, or equivalently a collection of scalars.

A few instances of scalar, vector, and matrix are given below.

Examples of 1 x 1 Scalar:

Examples of m x 1 Vector:

$\displaystyle \left[ {\begin{array}{*{20}{c}} 1 \\ 2 \\ 3 \\ 4 \end{array}} \right]$
$\displaystyle \left[ {\begin{array}{*{20}{c}} 4 \\ 6 \\ 2 \end{array}} \right]$

Examples of m x n Matrices:

$\displaystyle \left[ {\begin{array}{*{20}{c}} 1 & 4 & 1 \\ 2 & 5 & 3 \\ 3 & 6 & 2 \end{array}} \right]$

$\displaystyle \left[ {\begin{array}{*{20}{c}} 3 & 4 & 7 \\ 1 & 3 & 0 \\ 8 & 2 & 5 \end{array}} \right]$

Let’s start a practical example of TensorFlow…

Practical Example of TensorFlow

Before creating a tensor, first import the relevant library in Jupyter Notebook, as shown in the snapshot below.

Import the relevant library:

Creating a tensor and checking its shape: Now we are going to create a tensor and check its shape. A tensor can be stored in an array like this,

In this example, we first take two matrices, t1 and t2, and create an array with the two elements t1 and t2. The result is an array that contains these two matrices.

Now we check its shape like this,

The above result shows that this array contains two matrices, each of which is 2 by 3.

Manually creating a Tensor:

We can also create a tensor manually, although it is a bit difficult because several levels of brackets are involved.
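A minimal sketch of the steps above, using NumPy arrays as a stand-in (an assumption; the values in t1 and t2 are hypothetical). In TensorFlow, the same nested arrays could be passed to tf.constant.

```python
import numpy as np

# Two hypothetical 2 x 3 matrices.
t1 = np.array([[1, 2, 3],
               [4, 5, 6]])
t2 = np.array([[7, 8, 9],
               [10, 11, 12]])

# An array whose two elements are the matrices t1 and t2.
tensor = np.array([t1, t2])
print(tensor.shape)  # (2, 2, 3): two matrices, each 2 by 3

# The same tensor written out manually, with one extra level of brackets.
manual = np.array([[[1, 2, 3], [4, 5, 6]],
                   [[7, 8, 9], [10, 11, 12]]])
print(np.array_equal(tensor, manual))
```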

This is an example of Linear Algebra in TensorFlow


Read related article: Linear Algebra for Data Science

Scalars, Vector and Matrices in Python (Using Arrays)

Tariq Aziz Rao — Thu, 16 Jan 2020 16:07:35 +0000

Arrays in Python are frequently used to work with scalars, vectors and matrices, the topic of today’s post. This post is a continuation of Linear Algebra for Data Science.

We use NumPy, a library for Python that allows us to work with multidimensional arrays and matrices, along with a large collection of high-level mathematical functions to operate on these arrays. Let’s start with a practical example,

import numpy as np

Declaring Scalar, Vectors and Matrices

declaring scalars

As you already know, a scalar has no dimension, and the above example showed how to declare a scalar quantity in Python. In the next example, we declare a new variable, a vector, which is equal to an array of 2, 4 and 6 enclosed in brackets like this,

declaring vectors

Now we are going to declare a matrix with two rows and three columns. The elements of each row are enclosed in brackets, the two rows are separated by a comma, and both rows are enclosed in an outer pair of brackets. The result is given below,

declaring matrix
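The three declarations described above can be sketched as follows. The vector values 2, 4 and 6 come from the text; the scalar value, the matrix values and the variable names `s`, `v` and `m` are assumptions made for illustration.

```python
import numpy as np

# A scalar: a single number with no dimension (value is hypothetical).
s = 5

# A vector: an array of 2, 4 and 6 enclosed in one pair of brackets.
v = np.array([2, 4, 6])

# A matrix with two rows and three columns: each row has its own brackets,
# and both rows sit inside an outer pair (values are hypothetical).
m = np.array([[1, 2, 3],
              [4, 5, 6]])

print(s)
print(v)
print(m)
```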

Data Types

To get the data type of any variable, such as the previously declared scalar, vector and matrix, we simply write type followed by the variable name in parentheses, as shown in the picture,

data types

In the above example, the vector is a one-dimensional array and the matrix is a two-dimensional array. Note that within NumPy, integers, floats and arrays with one element behave in the same way in linear algebraic operations.

Data Shapes

We can also check the shape attribute of our objects to confirm their dimensionality, like this,

It is clear from the above snapshot that the vector is a one-dimensional array with three elements, i.e. (3,), whereas the matrix has two rows and three columns, i.e. (2, 3).
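A short sketch of checking types and shapes. The variable names and the scalar and matrix values are assumptions carried over for illustration; only the vector 2, 4, 6 comes from the text.

```python
import numpy as np

s = 5                                   # hypothetical scalar
v = np.array([2, 4, 6])                 # vector from the text
m = np.array([[1, 2, 3], [4, 5, 6]])    # hypothetical 2 x 3 matrix

print(type(s))   # <class 'int'>
print(type(v))   # <class 'numpy.ndarray'>
print(type(m))   # <class 'numpy.ndarray'>

print(v.shape)   # (3,)   one dimension with three elements
print(m.shape)   # (2, 3) two rows and three columns
```

A plain Python integer has no shape attribute, which is why only the two arrays are checked.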

Create a column vector

Here we use the reshape method, which gives an array a new shape without altering its data, like this,

create a column vector in python

You can see that the resulting matrix has three rows and one column, whereas the original was a one-dimensional array with three elements.
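A minimal sketch of the reshape step, assuming the vector declared earlier holds 2, 4 and 6; the name `column` is hypothetical.

```python
import numpy as np

v = np.array([2, 4, 6])       # one-dimensional array, shape (3,)

# reshape returns an array with the new shape; the data itself is unchanged.
column = v.reshape(3, 1)

print(column)
print(column.shape)           # (3, 1): three rows and one column
```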

Linear Algebra for Data Science

Tariq Aziz Rao — Thu, 09 Jan 2020 18:34:10 +0000

Linear algebra is essential for data science and machine learning, because its concepts are used to understand how algorithms work. In this post, we are going to discuss the basic concepts of linear algebra.

Why Linear Algebra?

Large datasets can contain anywhere from hundreds to a very large number of individual data objects.

It is simpler to work with this data and operate on it when it is represented in the form of matrices and vectors.

Linear algebra is a branch of mathematics that deals with vectors and operations on vectors. It is essential for artificial intelligence and data processing algorithms.

Data scientists must have basic knowledge of mathematics to solve complex data problems efficiently and boost company revenue.

Linear Algebra is mostly concerned with operations on vectors and matrices, so let’s start learning about matrices and vectors.

What is a Matrix?

Informally, a matrix is a collection of information stored in an arranged manner. Mathematically, it refers to a set of numbers, variables or functions arranged in rows and columns.

Matrices are generally represented by capital letters such as A, B, C, etc. For example,

$\displaystyle A=\left[ {\begin{array}{*{20}{c}} 3 & 4 & 5 \\ 6 & 7 & 8 \\ 9 & 2 & 3 \end{array}} \right] $

The order of a matrix is defined by the number of rows and columns in a matrix.

Order of a matrix = number of rows × number of columns

$\displaystyle A=\left[ {\begin{array}{*{20}{c}} 1 & 2 & 3 \\ 2 & 5 & 8 \\ 8 & 2 & 3 \end{array}} \right]$

In the above example, the number of rows is 3 and the number of columns is also 3; therefore,

Order of matrix A is 3 × 3.

The following are the types of matrices:

Row Matrix
Column Matrix
Unit or Identity Matrix
Null or Zero matrix
Square Matrix
Rectangular Matrix
Diagonal Matrix
Scalar Matrix
Negative of a Matrix
Transpose of a Matrix
Symmetric Matrix
Skew-Symmetric Matrix.

The basic operations on matrices are:

Addition of Matrices
Subtraction of Matrices
Product of Matrices

What is a Vector?

A vector can be considered an array of numbers in which the order of the numbers matters. Geometrically it has two independent properties: magnitude and direction.

A vector is the simplest linear algebraic object. Vectors are usually represented by a lowercase bold letter like x, y. For example,

$\displaystyle x=\left[ {\begin{array}{*{20}{c}} {{{x}_{1}}} \\ {{{x}_{2}}} \\ \vdots \\ {{{x}_{n}}} \end{array}} \right]$

Usually, there are two types of vectors i.e. row vectors and column vectors, examples of which are given below:-

Row Vector

$\displaystyle \left[ {\begin{array}{*{20}{c}} 1 & 2 & 3 \end{array}} \right] $

The length of this vector is 3.

Column Vector

$\displaystyle \left[ {\begin{array}{*{20}{c}} 1 \\ 2 \\ 3 \\ 4 \end{array}} \right] $

The length of this vector is 4.

The addition of two vectors can be performed by adding the corresponding elements of each vector.

The dot product, or scalar product, of two vectors is the sum of the products of their corresponding components.

Furthermore, two vectors ‘a’ and ‘b’ are called orthogonal to each other if their dot product is zero. If, in addition, both orthogonal vectors have unit norm, they are known as orthonormal vectors.
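Vector addition, the dot product and the orthogonality check can be sketched in a few lines of plain Python. The helper names `add`, `dot` and `norm` and all example vectors are hypothetical.

```python
import math


def add(a, b):
    """Add two vectors element by element."""
    return [x + y for x, y in zip(a, b)]


def dot(a, b):
    """Sum of the products of corresponding components."""
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))


print(add([1, 2, 3], [4, 5, 6]))   # [5, 7, 9]
print(dot([1, 2, 3], [4, 5, 6]))   # 1*4 + 2*5 + 3*6 = 32

a = [1, 0]
b = [0, 1]
is_orthogonal = dot(a, b) == 0
is_orthonormal = (
    is_orthogonal
    and math.isclose(norm(a), 1.0)
    and math.isclose(norm(b), 1.0)
)
print(is_orthogonal, is_orthonormal)
```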

A set of vectors (v₁, v₂, .., v_n) is known as linearly independent if no vector of the set can be written as a linear combination of the other vectors.

Simply, you can say that a matrix is a collection of vectors. For example,

$\displaystyle \left[ {\begin{array}{*{20}{c}} 1 & 4 & 1 \\ 2 & 5 & 3 \\ 3 & 6 & 2 \end{array}} \right]$

In the above example, there are three column vectors. The first vector contains 1, 2, 3, the second contains 4, 5, 6 and the third contains 1, 3, 2.

Therefore, a matrix has two dimensions, m by n, whereas a vector has a single dimension, m by 1, and a scalar has no dimension, as shown below:

$\displaystyle Matrix\,\,{{A}_{{m\times n}}}=\left[ {\begin{array}{*{20}{c}} {{{a}_{{11}}}} & {{{a}_{{12}}}} & {{{a}_{{13}}}} & \cdots & {{{a}_{{1n}}}} \\ {{{a}_{{21}}}} & {{{a}_{{22}}}} & {{{a}_{{23}}}} & \cdots & {{{a}_{{2n}}}} \\ {{{a}_{{31}}}} & {{{a}_{{32}}}} & {{{a}_{{33}}}} & \cdots & {{{a}_{{3n}}}} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ {{{a}_{{m1}}}} & {{{a}_{{m2}}}} & {{{a}_{{m3}}}} & \cdots & {{{a}_{{mn}}}} \end{array}} \right]$

$\displaystyle Vector\,\,\,\,\,x=\left[ {\begin{array}{*{20}{c}} {{{x}_{1}}} \\ {{{x}_{2}}} \\ \vdots \\ {{{x}_{m}}} \end{array}} \right]$

$\displaystyle Scalar\,\,=\,\,\,\left[ x \right]$ A scalar has no dimension and direction.

Now, let’s explore linear algebra and geometry.

A two-dimensional space can be defined by two lines, and two lines mean two vectors.

As we have already learned, a matrix is a collection of vectors. So any two-dimensional space can be represented by a matrix.

The following two vectors can be represented graphically in two dimensions as

$\displaystyle \,\left[ {\begin{array}{*{20}{c}} 2 \\ 4 \end{array}} \right]$

$\displaystyle \left[ {\begin{array}{*{20}{c}} {-2} \\ {-4} \end{array}} \right]$

two dimensional matrix

You can see that the direction is always from the origin of the graph to the end point. The vector (-2, -4) is exactly opposite to the first one. Now, if we take the two vectors

$\displaystyle \left[ {\begin{array}{*{20}{c}} 1 \\ 0 \end{array}} \right]$ and

$\displaystyle \,\left[ {\begin{array}{*{20}{c}} 0 \\ 1 \end{array}} \right]$

then by taking these two vectors together we can generate a matrix, i.e. $\displaystyle \,\left[ {\begin{array}{*{20}{c}} 1 & 0 \\ 0 & 1 \end{array}} \right]$, which is made up of unit portions of the two axes x and y, as shown in the figure.

matrix graph

Linear Algebra Applications in Data Science

Vectorized Code (also known as Array Programming)
Image Recognition
- Deep Learning
- CNNs (Convolutional Neural Networks)
Dimensionality Reduction
- Eigenvalues
- Eigenvector

Next Post: Scalars, Vector and Matrices in Python

Read also: Statistics for Data Science

Logistic Regression (Python) Explained using Practical Example

Zubair Akhtar — Tue, 01 Oct 2019 14:40:46 +0000

Logistic regression is a predictive analysis used to describe data and explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables. It is widely used in the biological and social sciences. For instance, it can predict whether a received email is spam or not, or whether a customer will purchase a product or not.

Statistical software is used to conduct the analysis, as logistic regression is a bit more difficult to interpret than linear regression.

There are several kinds of logistic regression analysis:

Binary Logistic Regression – 2 possible outcomes, e.g. an email is spam or not.
Multinomial Logistic Regression – 3 or more categories with no ordering, e.g. at college admission, students choose among a general program, an academic program or a vocational program.
Ordinal Logistic Regression – 3 or more categories with ordering, e.g. a mobile phone rating from 1 to 5.

Logistic Regression Model

Practical example of Logistic Regression

Import the relevant libraries and load the data.

For quantitative analysis, we must convert the ‘yes’ and ‘no’ entries into 1 and 0, as shown in the figure.

Now we are going to visualize our data. We are predicting job, so Job is our Y variable and Code (used for education) is our X variable.

Here we observe that for all the observations at the bottom the outcome is zero, i.e. those persons are jobless, whereas all the persons at the top successfully got a job. Now we are going to plot a regression line, as shown in the figure below.

Linear regression is a great technique, but it is not suitable for this kind of analysis because it does not know that our values are bounded between 0 and 1. Our data is non-linear, so we must use a non-linear approach. Hence, we now plot a logistic regression curve.

This function gives the probability of getting a job for a given educational code. When the education level is low, the probability of getting a job is close to 0, whereas when the education level is high, the probability is close to 1, or 100%.

It is clear from the above snapshot that when the education is ‘BA’, the probability of getting a job is about 60%.
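The S-shaped curve comes from the logistic (sigmoid) function. The sketch below shows its general form; the intercept and slope are hypothetical values chosen for illustration and are not the coefficients fitted in this example.

```python
import math


def logistic(x, intercept, slope):
    """Probability of the outcome 1 for input x under a logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))


# Hypothetical coefficients; a fitted model would estimate these from data.
intercept, slope = -4.0, 2.0

for code in range(5):
    print(code, round(logistic(code, intercept, slope), 2))
```

With these assumed coefficients the probability is exactly 0.5 at code 2, approaches 0 for lower codes and approaches 1 for higher codes, and it never leaves the range between 0 and 1.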

Logistic Regression Summary is shown in below figure.

MLE stands for Maximum Likelihood Estimation.

Likelihood function

It is a function that estimates how likely it is that the model at hand describes the real underlying relationship of the variables. The larger the likelihood, the higher the probability that our model is correct.

Maximum likelihood estimation tries to maximize the likelihood function. The computer goes through various values until it finds the model for which the likelihood is highest. When no more improvement is possible, it stops the optimization.

Pseudo R-squared (Pseudo R-squ.) is mostly useful for comparing variations of the same model. Different models have different pseudo R-squared values. A value between 0.2 and 0.4 is considered decent.

LL-Null stands for Log Likelihood-Null: the log-likelihood (LL) of a model that has no independent variables.

LLR stands for Log Likelihood Ratio, which measures whether our model is statistically different from LL-Null.
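These quantities can be related to each other with a small hand calculation. The sketch below assumes McFadden's definition of pseudo R-squared (1 − LL / LL-Null) and the usual likelihood ratio statistic 2 × (LL − LL-Null); the outcomes and fitted probabilities are hypothetical, not taken from the model in this post.

```python
import math


def log_likelihood(actual, probabilities):
    """Log-likelihood of binary outcomes given predicted probabilities of 1."""
    return sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(actual, probabilities)
    )


# Hypothetical outcomes and hypothetical probabilities from some model.
actual = [0, 0, 1, 1, 1]
fitted = [0.2, 0.3, 0.6, 0.8, 0.9]

ll_model = log_likelihood(actual, fitted)

# A model with no independent variables predicts the overall rate for everyone.
base_rate = sum(actual) / len(actual)
ll_null = log_likelihood(actual, [base_rate] * len(actual))

pseudo_r2 = 1.0 - ll_model / ll_null          # McFadden's pseudo R-squared
llr_statistic = 2.0 * (ll_model - ll_null)    # likelihood ratio statistic

print(round(ll_model, 3), round(ll_null, 3))
print(round(pseudo_r2, 3), round(llr_statistic, 3))
```

The closer the model's log-likelihood is to zero relative to LL-Null, the larger both the pseudo R-squared and the likelihood ratio statistic become.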

Calculating the accuracy of the model

To find the accuracy of the model, we use results_log.predict(), which returns the values predicted by our model. We also apply some formatting to make the results more readable, using this command

np.set_printoptions(formatter={'float': lambda x: "{0:0.2f}".format(x)})

Here, a value less than 0.5 means the chance of getting a job is below 50%, and a value of 0.93 means the chance of getting a job is 93%.

Now, we compare the actual values with the predicted values

If 90% of the values predicted by the model match the actual values, we say that the model has 90% accuracy.

To compare the predicted and actual values in the form of a table, we use results_log.pred_table(), as shown in the figure.

This result is a bit difficult to understand, so we present it in the form of a confusion matrix, as shown in the figure below

Let’s explain this confusion matrix. For 3 observations the model predicted 0 and the actual value was also 0; similarly, for 9 observations the model predicted 1 and the actual value was also 1. Here the model did a good job.

Furthermore, for 2 observations the model predicted 0 whereas the actual value was 1; similarly, for 1 observation the model predicted 1 and the actual value was 0. Here the model got confused.

Finally, this confusion matrix shows that the model made an accurate estimate in 12 out of 15 cases, which means our model works with (12/15)*100 = 80% accuracy.

We can also calculate the accuracy of the model by using this code

cm = np.array(cm_df)
accuracy_model = (cm[0,0]+cm[1,1])/cm.sum()*100
accuracy_model
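For a self-contained check of the same calculation, the sketch below builds the confusion matrix directly from the counts described in the text (3, 9, 2 and 1). The layout, with rows as actual values and columns as predicted values, is an assumption.

```python
import numpy as np

# Rows = actual value (0, 1); columns = predicted value (0, 1).
# 3 correct zeros, 9 correct ones, 1 actual 0 predicted as 1,
# and 2 actual 1s predicted as 0.
cm = np.array([[3, 1],
               [2, 9]])

# Correct predictions sit on the diagonal.
accuracy_model = (cm[0, 0] + cm[1, 1]) / cm.sum() * 100
print(round(accuracy_model, 1))
```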


How to install Anaconda (Python Distribution) on Windows

Zubair Akhtar — Mon, 09 Sep 2019 15:53:58 +0000

Anaconda installation and setup guide for Windows. You will learn how to install Python on Windows, step by step.

Anaconda is the most widely used open source distribution of Python/R for data science and machine learning. Data scientists can easily analyze data and visualize the results using Python/R.

Anaconda Download Link

How to install Anaconda (Python Distribution) on Windows

Select your operating system and download the software. Here we are installing Anaconda on a Windows 10 64-bit operating system. Choose yours.

Let’s move to the step-by-step process for installing Python on your system.

Download the latest version of python 3.7

Now run the downloaded application.

Proceed to next step

Select users for whom you want to install anaconda.

Select the directory where you want to install Anaconda.

You can set the environment variables here. Select the second option to register Anaconda as the primary Python on the system.

Proceed to installation.

The installation has been completed. Now you can launch the Anaconda Navigator

42 Must Know Data Science Interview Questions and Answers

Tariq Aziz Rao — Mon, 02 Sep 2019 15:55:12 +0000

To get a Data Scientist job, you must have a grip on both practical and theoretical knowledge of data science. You should be fully prepared before going to an interview. For your convenience, we have gathered 42 data science interview questions and their answers. These will help you grasp the basic concepts of data science.

Further Reading: Introduction to Data Science (Beginner’s Guide)

Data Science Interview Questions

Q1. What is Data Science?

Data Science deals with the processes of data mining, cleansing, analysis, visualization, and actionable insight generation. Data Science is the mining and analysis of relevant information from data to solve analytically complicated problems. It works closely with Artificial Intelligence and Machine Learning techniques. For example, when you log on to an e-commerce website and browse some categories and products before a purchase, you generate data that helps analysts understand your purchasing behavior.

Q2. What are the differences between supervised and unsupervised learning?

In supervised learning, all the data is labeled and the algorithms learn to predict the output from the input data, whereas in unsupervised learning, all data is unlabeled and the algorithms learn the inherent structure of the input data.

Supervised machine learning can be categorized into the following:

Classification – where the output variable is a category, like black or white, plus or minus. Naïve Bayes, Support Vector Machine and Decision Tree are among the most popular classification algorithms.
Regression – where the output variable is a real value, like a quantity.

Un-supervised machine learning can be categorized into the following:

Clustering – where you discover the inherent groupings, like grouping customers by purchasing behavior. K-means clustering, hierarchical clustering and density-based spatial clustering are popular clustering algorithms.
Association – where you discover rules that describe large portions of your data.

Q3. What are Recommender Systems?

Recommender systems are a subclass of information filtering systems that are intended to predict the preference or rating that a user would give to an item. They are widely used for music, pictures, research, news, articles, social tags, and so on.

Q4. Can you utilize machine learning for time series analysis?

Yes, machine learning can be utilized for time series analysis but it depends on the applications.

Q5. How will you assess the statistical significance of an insight, i.e. whether it is a real insight or just due to chance?

By using hypothesis testing, we can assess the statistical significance of an insight.

Q6. For text analytics, which would you prefer, Python or R?

Python is a good choice for text analytics, as it has the Pandas library, which provides easy-to-use data structures and high-performance data analysis tools.

Q7. Which method is used to predict categorical responses?

Classification, a supervised machine learning technique, is widely used in data mining for classifying data sets.

Q8. What are the basic assumptions to be made for linear regression?

Statistical independence of errors, normality of the error distribution, constant variance of errors (homoscedasticity), linearity and additivity.

Q9. What is the difference between Data Science and Machine Learning?

Data Science deals with the processes of data mining, cleansing, analysis, visualization, and actionable insight generation, whereas Machine Learning is the part of Data Science that enables a system to process datasets autonomously, without human intervention, by using various algorithms on the massive volume of data generated and extracted from numerous sources.

Q10. What is the formula to calculate R-square?

R-Square can be calculated as:-

1 – (Residual Sum of Squares/ Total Sum of Squares)
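The formula can be applied directly, as in the following sketch. The function name `r_squared` and the actual and predicted values are hypothetical.

```python
def r_squared(actual, predicted):
    """R-square = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted))
    tss = sum((y - mean_actual) ** 2 for y in actual)
    return 1.0 - rss / tss


# Hypothetical observed values and model predictions.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]

r2 = r_squared(actual, predicted)
print(round(r2, 2))  # residual SS = 0.10, total SS = 5.0, so R-square = 0.98
```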

Q11. What basic knowledge is required for a Data Scientist?

A Data Scientist must have basic knowledge of mathematics, computer programming and statistics to solve complex data problems efficiently and boost business revenue.

Q12. What are the basic models of Machine Learning?

The two basic models of machine learning are:

Supervised Machine Learning
Unsupervised Machine Learning

Q13. Do you know about Interpolation and Extrapolation?

Interpolation is estimating a value between two known values in a list of values, whereas extrapolation is estimating a value by extending a known set of values or facts beyond its range.
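A minimal sketch of both ideas using a straight line through two known points. The function name `linear_estimate` and the points are hypothetical; real data may call for a different model than a straight line.

```python
def linear_estimate(x0, y0, x1, y1, x):
    """Estimate y at x from the straight line through (x0, y0) and (x1, y1)."""
    slope = (y1 - y0) / (x1 - x0)
    return y0 + slope * (x - x0)


# Two known points: (1, 10) and (3, 30).
inside = linear_estimate(1, 10, 3, 30, 2)    # x = 2 lies between the known points
outside = linear_estimate(1, 10, 3, 30, 5)   # x = 5 lies beyond the known points

print(inside)    # interpolation: 20.0
print(outside)   # extrapolation: 50.0
```

The same formula serves both cases; the only difference is whether the requested x falls inside or outside the range of the known values.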

Q14. What are the basic benefits of Data Science?

Data Science helps in finding and refining target audiences. It ensures better communication between service providers and service users. It also improves business value and enables better risk analysis.

Q15. Do you know power analysis?

Power analysis is an experimental design technique for determining the sample size needed to detect an effect of a given size with a given level of confidence.

Q16. What basic expertise is required for Data Science?

Mathematics
Statistics
Programming Skills
Data warehousing
Machine Learning
Software Engineering
Data visualization & communication

Q17. What is Collaborative filtering?

It is a technique used by recommender systems to find patterns or information by combining the viewpoints of many users, several data sources and various agents.

Q18. What are the top tools utilized in Data Science?

R (a language for statistical computing and graphics)
Python
Tableau
Keras
Jupyter Notebook

Q19. Are expected value and mean value different?

There is no difference, but the terms are used in different contexts. Generally, mean is used when we are talking about a probability distribution or a sample population, while expected value is used in the context of a random variable. For sampling data, the mean value is the single value that comes from one sample, whereas the expected value is the mean of all the means (the value built from several samples). For distributions, the mean value and the expected value are the same regardless of the distribution, provided that the distribution belongs to the same population.

Q20. What are the main processes of Data Science?

Data Exploration:
Modeling:
Model Testing:
Model deployment:

Q21. Does data cleaning play an important role in analysis?

Yes, data cleaning plays an important role in analysis. As the number of data sources increases, the time spent cleaning data also increases, because of the number of sources and the volume of data they generate. Cleaning data alone can take about 80% of the time, so it is an important part of analysis.

Q22. Can you name some industry players in Data Science?

Google – Google hires some of the best data scientists from all over the world and offers some of the highest data science salaries.

Amazon – Amazon is a worldwide e-commerce and cloud computing giant that is hiring data scientists on a large scale. It hires data scientists to learn about the customer mindset and to extend the geographical reach of both its e-commerce business and its cloud business, among other business-driven goals.

Visa – Visa is an online financial gateway for a majority of organizations and handles transactions in the range of several million in a day. Because of this, the need for data scientists at Visa is huge: to generate more revenue, detect fraudulent transactions and tailor products and services to customer requirements.

Q23. What is the difference between univariate, bivariate and multivariate analysis?

Univariate, bivariate and multivariate analysis are descriptive statistical analysis techniques that are distinguished by the number of variables involved at a given point in time. For instance, a pie chart of sales by area involves only one variable, so it is a univariate analysis. If the analysis aims to understand the relationship between two variables at a time, as in a scatter plot, it is a bivariate analysis. For instance, analyzing sales volume against spending is a bivariate analysis. Multivariate analysis deals with more than two variables.

Q24. What is the difference between Cluster and Systematic Sampling?

Cluster sampling – It is a technique used when it is hard to study a target population spread across an extensive area and simple random sampling cannot be applied.

Systematic sampling – It is a statistical technique in which elements are selected from an ordered sampling frame. The equal-probability method is the best example of systematic sampling.

Q25. Do gradient descent methods always converge to the same point?

Gradient descent methods don’t always converge to the same point. In some cases they reach a local minimum (a local optimum) rather than the global optimum, because the result depends on the data and on the starting conditions.
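The dependence on the starting point can be seen in a small sketch. The function f(x) = x⁴ – 3x² + x, the learning rate and the two starting values are assumptions chosen for illustration; the function has two separate minima, and plain gradient descent ends up in whichever one is nearest to where it starts.

```python
def f_prime(x):
    # Derivative of the illustrative function f(x) = x**4 - 3*x**2 + x
    return 4 * x**3 - 6 * x + 1


def gradient_descent(start, learning_rate=0.01, steps=2000):
    """Run plain gradient descent from `start` and return the final x."""
    x = start
    for _ in range(steps):
        x -= learning_rate * f_prime(x)
    return x


left = gradient_descent(start=-2.0)   # settles in the minimum on the left
right = gradient_descent(start=2.0)   # settles in the minimum on the right
print(left, right)
```

Both runs use the same function and the same settings; only the starting value differs, yet they stop at different points.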

Q26. What is the basic purpose of A/B Testing?

Basically, A/B Testing is statistical hypothesis testing for a randomized experiment with two variants, A and B. The basic purpose of A/B Testing is to identify changes to a web page that increase or maximize an outcome of interest, for instance the click-through rate of a banner advertisement.
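One common way to run such a test on click-through rates is a two-proportion z-test. The sketch below is only one possible approach, and the click and view counts are made-up numbers for illustration.

```python
import math


def two_proportion_z_test(clicks_a, views_a, clicks_b, views_b):
    """Return (z, two-sided p-value) for the difference in click-through rates."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    # Pooled rate under the null hypothesis that A and B perform the same
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_error
    # Standard normal CDF evaluated at |z|, written with the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value


# Hypothetical data: banner A got 200 clicks in 5000 views, banner B got 260 in 5000
z, p_value = two_proportion_z_test(200, 5000, 260, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests that the difference between the two banners is unlikely to be due to chance alone.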

Q27. What is Machine Learning?

Machine Learning is the part of Data Science that enables a system to learn from datasets with minimal human intervention, using various algorithms to work on the massive volumes of data generated and extracted from numerous sources. A social media platform such as Facebook is a good example of a machine learning implementation, where fast algorithms gather behavioral information about every user and recommend appropriate articles, multimedia files and much more according to their preferences.

Q28. What are the applications of Data Science?

Internet Search Engines
Speech Recognition
Recommender Systems
Self-driving cars
Image Recognition
Comparative analysis of Price
Fraud and risk detection
Gaming
Robotics
Airline route planning

Q29. What is the difference between a Test Set and a Validation Set?

The validation set is used for parameter selection and to avoid overfitting of the model being built, so it can be considered a part of the training set, whereas the test set is used for testing or assessing the performance of a trained machine learning model. In other words, the training set is used to fit the parameters, the validation set is used to tune the hyperparameters, and the test set is used to assess the final model.

Q30. How can you assess a good logistic model?

Various techniques are used to assess the outcome of a logistic regression analysis:

Using a classification matrix (confusion matrix) to look at the true positives, true negatives, false positives and false negatives.
Concordance, which helps identify the ability of the logistic model to distinguish between the event happening and not happening.

Q31. What is the basic objective of clustering?

The basic aim of clustering is to group related entities in such a way that the entities within a group are similar to each other while the groups are dissimilar from each other. In K-Means clustering, “K” defines the number of clusters.

Q32. What is the difference between Eigenvalue and Eigenvector?

Eigenvectors are used for understanding linear transformations, and we usually calculate the eigenvectors of a correlation or covariance matrix, whereas an eigenvalue can be thought of as the strength of the transformation in the direction of its eigenvector.

Q33. What steps are involved in making a Decision Tree?

Take the whole data set as input.
Look for a split that maximizes the separation of the classes. A split is any test that divides the data into two sets.
Apply the split to the input data (divide step).
Re-apply the previous two steps to each of the separated subsets.
Stop when you meet some stopping criteria.
Clean up the tree if you went too far doing splits. This step is called pruning.

Q34. Do you know about selection bias?

Selection bias is a problematic situation in which error is introduced because the population sample is not random.

Q35. What types of biases can occur during sampling?

Selection bias
Survivorship bias
Undercoverage bias

Q36. What are the different kernel functions in Support Vector Machine?

There are four types of kernels in Support Vector Machine:

Linear Kernel
Polynomial kernel
Sigmoid kernel
Radial basis kernel

Q37. What is pruning in Decision Tree?

Pruning is the process of removing the sub-nodes of a decision node; it is the reverse of splitting.

Q38. What is deep learning?

It is a sub-field of machine learning that uses Artificial Neural Networks (ANNs), which are inspired by the structure and function of the brain. Deep learning is an extension of neural networks, while machine learning as a whole includes many algorithms such as Linear Regression, Support Vector Machine (SVM), Neural Network, etc.

Q39. What statistical methods are useful for a data scientist?

Bayesian method
Markov process
Simplex algorithm
Mathematical optimization
Spatial and cluster processes
Rank statistics, percentiles, outlier detection
Imputation techniques

Q40. What tools are utilized for data analysis?

RapidMiner
Tableau
KNIME
Google Fusion Tables
Google Search Operators
Solver
io
NodeXL
OpenRefine

Q41. What are the properties of clustering algorithms?

Hard and soft
Hierarchical or flat
Iterative
Disjunctive

Q42. In how many domains can Time Series Analysis be performed?

Time Series Analysis can be performed in the following two domains:-

Time domain
Frequency domain.

This article will also be helpful for you in interview preparation. 7 Step Process to Ace Data Science Interviews.


Different Types of Probability Distribution (Characteristics & Examples)

Tariq Aziz Rao — Mon, 26 Aug 2019 15:04:02 +0000

What is distribution?

A distribution represents the possible values a random variable can take and how often they occur.

Mean – it represents the average value, is denoted by µ (mu) and is measured in the same units as the data (for example, seconds).

Variance – it represents how spread out the data is and is denoted by σ² (sigma squared). Note that variance is measured in squared units (for example, seconds squared), which are hard to interpret; therefore, spread is usually reported as the standard deviation, which is the square root of the variance, σ = √σ², and has the same unit as the mean.

There are two kinds of data i.e. population data and sample data.

Population and Sample Data Notation:

The more crowded the middle of the distribution, the more data falls within that interval, as shown in the figure.

The less data falls within the interval, the more spread out the data is, as shown in the figure.

Notation of Distributions:

Y – Actual outcome

y – one of the possible outcomes

P(Y = y) – the probability that Y takes the value y, also written p(y)

Types of Probability Distributions

The two major kinds of distributions, based on the type of likely values for the variables, are:

Discrete Distributions
Continuous Distributions

Discrete Distribution Vs Continuous Distribution

A comparison table showing difference between discrete distribution and continuous distribution is given here.

Discrete Distributions	Continuous Distributions
Discrete distributions have a finite number of different possible outcomes	Continuous distributions have infinitely many consecutive possible values
We can add up individual values to find the probability of an interval	We cannot add up individual values to find the probability of an interval because there are infinitely many of them
Discrete distributions can be expressed with a graph, piece-wise function or table	Continuous distributions can be expressed with a continuous function or graph
In discrete distributions, the graph consists of bars lined up one after the other	In continuous distributions, the graph consists of a smooth curve
Expected values might not be achievable	To calculate the probability of an interval, we require integrals

Notation Explanation:

In a notation such as X ~ N(µ, σ²), X is the variable, the tilde (~) reads “is distributed as”, N is the type of distribution and (µ, σ²) are its characteristics.

1. DISCRETE DISTRIBUTIONS:

Discrete distributions have a finite number of different possible outcomes.

Characteristics of Discrete Distribution

We can add up individual values to find out the probability of an interval
Discrete distributions can be expressed with a graph, piece-wise function or table
In discrete distributions, graph consists of bars lined up one after the other
Expected values might not be achievable
P(Y≤y) = P(Y < y + 1)

In a graph, a discrete distribution looks like this:

Examples of Discrete Distributions:

Bernoulli Distribution
Binomial Distribution
Uniform Distribution
Poisson Distribution

1.1 Bernoulli Distribution

In a Bernoulli distribution there is only one trial and only two possible outcomes, i.e. success or failure. It is denoted by Y ~ Bern(p).

Characteristics of Bernoulli distributions

It consists of a single trial
Two possible outcomes
E(Y) = p
Var(Y) = p × (1 – p)

Examples and Uses:

Guessing a single True/False question.
It is mostly used when trying to find out what we expect to obtain from a single trial of an experiment.

1.2 Binomial Distribution

A sequence of identical Bernoulli events is called Binomial and follows a Binomial distribution. It is denoted by Y ~B(n, p).

Characteristics of Binomial distribution

Over the n trials, it measures the frequency of occurrence of one of the possible results.
E(Y) = n × p
P(Y = y) = C(n, y) × p^y × (1 – p)^(n – y)
Var(Y) = n × p × (1 – p)
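The characteristics above can be checked with a short sketch. The function name `binomial_pmf` is an illustrative choice, and the example uses n = 10 fair coin flips, with C(n, y) computed by `math.comb` (Python 3.8 or later).

```python
import math


def binomial_pmf(y, n, p):
    """P(Y = y) for Y ~ B(n, p): C(n, y) * p^y * (1 - p)^(n - y)."""
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)


n, p = 10, 0.5                        # 10 flips of a fair coin
prob_five_heads = binomial_pmf(5, n, p)
expected = n * p                      # E(Y) = n * p
variance = n * p * (1 - p)            # Var(Y) = n * p * (1 - p)
total = sum(binomial_pmf(y, n, p) for y in range(n + 1))
print(prob_five_heads, expected, variance, total)
```

The probabilities of all possible numbers of heads, from 0 to 10, add up to 1, as they must for any probability distribution.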

Examples and Uses:

Determining how many times we obtain a head if we flip a coin 10 times.
It is mostly used when we try to predict how likely an event is to occur over a series of trials.

1.3 Uniform Distribution

In uniform distribution all the outcomes are equally likely. It is denoted by Y ~U(a, b). If the values are categorical, we simply indicate the number of categories, like Y ~U(a).

Characteristics of Uniform Distribution

In uniform distribution all the outcomes are equally likely.
In graph, all the bars are equally tall
The expected value and variance have no predictive power

Examples and Uses:

Result obtained after rolling a die
Due to its equality, it is mostly used in shuffling algorithms

1.4 Poisson Distribution

The Poisson distribution is used to determine how likely a certain event is to occur over a given interval of time or distance. It is denoted by Y ~ Po(λ).

Characteristics of poisson distribution

It measures the frequency over an interval of time or distance.

Examples and Uses

It is used to determine how likely a certain event is to occur over a given interval of time or distance.
It is mostly used in marketing analysis to find out whether a higher than average number of visits is out of the ordinary or not.
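The marketing use case can be sketched with the standard Poisson probability function, P(Y = k) = λ^k × e^(–λ) / k!, which is not spelled out in the text above. The average of 4 visits per hour and the threshold of 8 visits are assumed numbers for illustration.

```python
import math


def poisson_pmf(k, lam):
    """P(Y = k) for Y ~ Po(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)


average_visits = 4  # hypothetical average number of visits per hour
# Probability of seeing at most 8 visits in an hour (k = 0, 1, ..., 8)
prob_at_most_8 = sum(poisson_pmf(k, average_visits) for k in range(9))
prob_more_than_8 = 1 - prob_at_most_8
print(round(prob_more_than_8, 4))
```

If the probability of more than 8 visits is very small, an hour with that many visits can reasonably be treated as out of the ordinary.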

2. CONTINUOUS DISTRIBUTIONS:

Continuous distributions have infinitely many consecutive possible values.

Characteristics of Continuous Distributions

We cannot add up individual values to find the probability of an interval because there are infinitely many of them
Continuous distributions can be expressed with a continuous function or graph
In continuous distributions, the graph consists of a smooth curve
To calculate the probability of an interval, we require integrals
P(Y = y) = 0 for any distinct value y.
P(Y < y) = P(Y ≤ y)

Examples of Continuous Distributions

Normal Distribution
Chi-Squared Distribution
Exponential Distribution
Logistic Distribution
Student’s t Distribution

2.1 Normal Distribution

It represents a distribution that many natural events follow. It is denoted by Y ~ N(µ, σ²). The main characteristics of the normal distribution are:

Characteristics of normal distribution

The graph of a normal distribution is a bell-shaped curve that is symmetric and has thin tails.
About 68% of all its values fall in the interval (µ – σ, µ + σ)
E(Y) = µ
Var(Y) = σ²

Examples and Uses

Normal distributions are often observed in nature, for instance in the sizes of animals in the wild.
We can convert any normal distribution into a standard normal distribution. A normal distribution can be standardized in order to use the Z-table.

Z = (Y – µ) / σ

Where subtracting µ ensures that the mean is 0 and dividing by σ ensures that the standard deviation is 1.
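Standardization can be sketched with `statistics.NormalDist` from the Python standard library (Python 3.8 or later), used here in place of a printed Z-table. The mean of 100, standard deviation of 15 and value of 130 are assumed numbers for illustration.

```python
from statistics import NormalDist

mu, sigma = 100, 15   # hypothetical mean and standard deviation
y = 130               # hypothetical observed value

z = (y - mu) / sigma  # standardized value: subtract the mean, divide by sigma
prob_original = NormalDist(mu, sigma).cdf(y)   # P(Y <= y) on the original scale
prob_standard = NormalDist().cdf(z)            # P(Z <= z) on the standard normal scale

# Share of values within one standard deviation of the mean
within_one_sigma = NormalDist().cdf(1) - NormalDist().cdf(-1)
print(z, round(prob_standard, 4), round(within_one_sigma, 4))
```

The two probabilities agree, which is why one table for the standard normal distribution is enough for every normal distribution.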

2.2 Chi-Squared Distribution

The Chi-Squared distribution is frequently used. It is mostly used to test goodness of fit. It is denoted by Y ~ χ²(k).

Characteristics of Chi-Squared distribution

The graph obtained from the Chi-Squared distribution is asymmetric and skewed to the right.
It is the distribution of the sum of the squares of k independent standard normal variables.
E(Y) = k
Var(Y) = 2k

Examples and Uses:

It is mostly used to test goodness of fit.
It has a table of known values for its CDF called the χ² table.

2.3 Exponential Distribution

It is usually observed in events which considerably change early on. It is denoted by Y ~ Exp(λ).

Characteristics of exponential distribution

The Probability Density Function (PDF) and the Cumulative Distribution Function (CDF) plateau after a certain point.
We do not have a table of known values as we do for the Normal or Chi-Squared distributions; therefore, we mostly use the natural logarithm to transform the values of exponential distributions.

Examples and Uses

It is mostly used with dynamically changing variables, such as online websites traffic.

2.4 Logistic Distribution

It is used to observe how continuous variable inputs can affect the probability of a binary result. It is denoted by Y ~ Logistic(µ, s).

Characteristics of logistic distribution

The Cumulative Distribution Function picks up when we reach values near the mean.
The smaller the scale parameter, the faster it reaches values close to 1.

Examples and Uses

It is mostly used in sports to predict how a player’s or team’s performance can determine the result of the match.

2.5 Student’s t Distribution

Student’s t distribution, or simply the t distribution, is used to estimate population parameters when the sample size is small and the population variance is not known. It is denoted by Y ~ t(k).

Characteristics of Student’s t Distribution

A small-sample approximation of a normal distribution
Its graph is a symmetric, bell-shaped curve; however, it has fatter tails than the normal distribution.

Examples and Uses

It is used in the analysis of small samples of data that usually follow a normal distribution.


Probability for Data Science

Zubair Akhtar — Thu, 22 Aug 2019 17:53:24 +0000

Probability is simply defined as the chance of something happening, or the likelihood that an event will happen.

Probability plays a vital role in decision making; that is why company executives rely on it before making any investment decision.

An event can be likely or unlikely and can have a single outcome or several outcomes. Probability can be expressed as a decimal number, a percentage or a fraction (0.30, 33%, 1/6).

The probability of an event A is denoted as P(A). The general formula of probability is given below:

P(A) = Number of favorable outcomes / Total number of possible outcomes

The probability of an event A lies between 0 and 1, i.e. 0 ≤ P(A) ≤ 1
If P(A) > P(B) then event A is more likely to occur than event B.
If P(A) = P(B) then events A and B are equally likely to occur.

Example: What is the probability of getting tail when tossing a coin?

Solution:

Sample Space = {Head, Tail}
Total number of possible outcomes = 2
Number of favorable outcomes = 1 (because only one of the two outcomes is a tail)

Probability of getting a tail = 1/2 = 0.5

Experimental Probability:

It is the ratio of the number of successful trials to the total number of trials performed.

For instance, if a die is rolled 500 times and the number ‘6’ occurs 100 times, then the experimental probability that ‘6’ shows up on the die is 100/500 = 0.2
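Experimental probability can be illustrated with a small simulation. This is only a sketch: the function name, the number of trials and the fixed random seed are assumptions made so that the run is repeatable.

```python
import random


def experimental_probability(target, trials, seed=0):
    """Roll a simulated six-sided die `trials` times and return hits / trials."""
    rng = random.Random(seed)  # fixed seed so the simulation is repeatable
    hits = sum(1 for _ in range(trials) if rng.randint(1, 6) == target)
    return hits / trials


estimate = experimental_probability(target=6, trials=100_000)
print(estimate)  # compare with the theoretical value 1/6
```

With more trials, the experimental probability tends to move closer to the theoretical probability of 1/6.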

Let’s discuss trial and experiment, as these terms are different from each other.

Trial – In a trial, we observe an event occurring and record its outcome, whereas,

Experiment – is a collection of one or more trials

Trial	Experiment
Rolling a die and recording the outcome	Rolling a die 6 times and recording the 6 individual outcomes

The theoretical probability described above cannot be calculated in some cases, so we need to depend on experimental probability. Experimental probability describes the probability of an event happening when an experiment is conducted. It is commonly used in research and experiments.

Expected Value:

It is the outcome that we expect to occur when we run an experiment. The expected value can be Boolean, numerical, categorical or of some other kind, depending on the type of event. The basic expected value formula for the number of times an event occurs in n trials is given below:

E(X) = P(X) × n

For example, if P(A) = 0.50 and n = 30 trials, then E(A) = P(A) × n = 0.50 × 30 = 15

If there are multiple outcomes with different probabilities, then the expected value formula will be E(X) = ∑ x × P(x), summed over all possible values x

For instance, when we roll a six-sided die, it has an equal chance i.e. 1/6 of landing on 1, 2, 3, 4, 5, or 6. So, we can calculate as follow:

(1/6 × 1) + (1/6 × 2) + (1/6 × 3) + (1/6 × 4) + (1/6 × 5) + (1/6 ×6) = 3.5

If we roll a six-sided die a very large number of times, the average value will approach 3.5.
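The die calculation can be reproduced exactly with fractions, as in the sketch below; the variable names are illustrative.

```python
from fractions import Fraction

faces = [1, 2, 3, 4, 5, 6]
probability = Fraction(1, 6)  # each face of a fair die is equally likely

# Expected value: sum of (value x probability) over all faces
expected_value = sum(face * probability for face in faces)
print(expected_value, float(expected_value))
```

Note that 3.5 is not a face of the die, which is an example of an expected value that cannot itself be achieved in a single trial.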

Probability Frequency Distribution

A probability frequency distribution is a collection of the probabilities for each possible outcome. It is a way to express how frequently an event can occur.

Example:

Out of 30 girls, 15 have black hair, 5 have brown hair, 5 have blond hair and 5 have red hair. Find the probability that a girl has neither blond nor red hair.

Solution:

Hair Color	Frequency	Probability
Black	15	15/30
Brown	5	5/30
Blond	5	5/30
Red	5	5/30

Probability of a girl having black hair = 15/30

Probability of a girl having brown hair = 5/30

Probability of a girl having black or brown hair, i.e. neither blond nor red hair = 15/30 + 5/30 = 20/30. Therefore, 20 out of 30 girls have either black or brown hair.
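The same table can be built in a few lines as a sketch, using the hair color frequencies from the example; the variable names are illustrative.

```python
from fractions import Fraction

frequency = {"Black": 15, "Brown": 5, "Blond": 5, "Red": 5}
total = sum(frequency.values())

# Probability of each hair color = its frequency / total number of girls
distribution = {color: Fraction(count, total) for color, count in frequency.items()}

neither_blond_nor_red = distribution["Black"] + distribution["Brown"]
print(neither_blond_nor_red)
```

The result is the fraction 20/30 in lowest terms.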

Complements:

A’ or A^c is the complement of A, which means everything that the event A is not.

A + A^c = Sample space

P(A) + P(A^c) = 1

P(A) = 1 – P(A^c)

(A^c)^c = A

Similarly, if the sample space consists of three events A, B and C that cannot occur together, then

A^c = B + C

It is important to note that the sum of the probabilities of all possibilities must be equal to one: P(A) + P(B) + P(C) = 1 (which means 100% certain)

We can explain the above equation with this example: a coin has two sides, A and B, where A denotes Head and B denotes Tail; then P(A) + P(B) = 1. In general,

If P = 1, the event is absolutely certain
If P = 1.5, the value does not make sense, because a probability cannot exceed 1
If P < 1, the event is not guaranteed to occur

Example:

When we roll a die, we either get one of 1, 2, 3, 5 or 6, or we get 4.

Solution:

Here,

A → the result is 1, 2, 3, 5 or 6
B’ → the result is 4 (so B means “not 4”)

P(A) = P(1) + P(2) + P(3) + P(5) + P(6) = 1/6 + 1/6 + 1/6 + 1/6 + 1/6 = 5/6
P(B’) = P(4) = 1/6 (B’ is the complement of B, so B is the absence of 4)

Then,

P(B) = 1 – P(B’) = 1 – 1/6 = 5/6

Therefore, P(A) = 5/6 = P(B)

Combinatorics

It is a branch of mathematics which deals with the combination of objects belonging to a specific finite set. The important parts of combinatorics are,

Permutations
Variations
Combinations

i. Permutations

It is the number of different possible ways we can arrange a set of elements, where the elements can be objects, digits or people.

P(n) = n × (n – 1) × (n – 2) × … × 1 = n!

There is no repetition. For instance, if we have to arrange 3 students in a row then we have P(3) = 3! = 6 ways to arrange the students.

Factorial:

It is simply the product of the integers from 1 to n. It is nothing more than a notation, denoted by the sign !

n! = 1 × 2 × 3 × … × n

5! = 1 × 2 × 3 × 4 × 5 = 120

Note:

0! = 1
If n < 0 then n! does not exist, i.e. there is no factorial of a negative number

Important properties:

Example:

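How many two-letter arrangements can be made from the letters in “SPEAK”?

Solution:

We know that the number of ways to arrange r elements picked from n distinct elements is n! / (n – r)!. Here n = 5 and r = 2, so the number of arrangements is 5! / 3! = 5 × 4 = 20.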
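As a quick check of the “SPEAK” example, the sketch below lists every two-letter arrangement with `itertools.permutations` and compares the count with the factorial formula n! / (n – r)! for n = 5 and r = 2.

```python
import math
from itertools import permutations

letters = "SPEAK"

# Every ordered pair of two different letters, e.g. "SP" and "PS" are both counted
arrangements = ["".join(pair) for pair in permutations(letters, 2)]

# Same count from the formula n! / (n - r)!
count_by_formula = math.factorial(5) // math.factorial(5 - 2)
print(len(arrangements), count_by_formula)
```

Both approaches give the same number of arrangements, because order matters and no letter is used twice.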

Two dependent task possibilities:

If a first task can be performed in m different ways and, once it is done, a second task that depends on it can be performed in n different ways, then the two tasks can be performed together in m × n ways.

Example:

If a stadium has 4 gates, then in how many ways can a person enter the stadium through one gate and come out through a different gate?

Solution: In this situation, the person can enter through any of the 4 gates and come out through any of the remaining 3 gates; therefore, the total number of ways is 4 × 3 = 12

Two independent task possibilities:

If one task can be performed in m different ways and another, independent task can be performed in n different ways, and only one of the two tasks is to be performed, then there are m + n ways to do so.

Example:

In a classroom, there are 20 students, of which 12 are boys and 8 are girls. The class teacher intends to select a class monitor, who can be either a girl or a boy. In how many ways can the class teacher select the monitor?

Solution:

In this situation, the class teacher can select a monitor from 12 boys or 8 girls, so the number of ways is 12 + 8 = 20

ii. Variations

Variation is the total number of different possible ways we can pick and arrange some elements of a given set.

Variation with repetition:

V(n, p) = n^p

Where, n = number of different elements we have available and p = number of elements we are going to arrange

Example:

If we have three letters a, b and c and 2 positions in which we can arrange them, then how many different possible ways are there to pick and arrange these letters?

Solution:

Here n = 3 and p = 2, and a letter may be repeated, so there are 3^2 = 9 possible ways.
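For the three letters and two positions in this example, the variations with repetition can be listed with `itertools.product`, as in the sketch below. The count should match n^p with n = 3 and p = 2.

```python
from itertools import product

letters = ["a", "b", "c"]
positions = 2

# Each position can hold any letter, so repeats such as "aa" are allowed
variations = ["".join(choice) for choice in product(letters, repeat=positions)]
print(len(variations), variations)
```

The list contains pairs such as "aa" as well as both "ab" and "ba", because repetition is allowed and order matters.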

iii. Combinations

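It represents the number of different possible ways we can pick a number of elements of a set, where the order of the picked elements does not matter.

Combination with repetition:

C(n + p – 1, p) = (n + p – 1)! / (p! × (n – 1)!)

Combination without repetition:

C(n, p) = n! / (p! × (n – p)!)

For instance, if we pick 4 students out of 10 students to send them to a quiz program, then C(10, 4) = 10! / (4! × 6!) = 210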

Caution:

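All the different permutations of a single combination are different variations.

Picking more elements does not always give more combinations: once more than half of the elements are picked, the number of combinations starts to decrease.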

Symmetry of combinations:

We apply the symmetry of combinations, C(n, p) = C(n, n – p), in order to avoid calculating factorials of large numbers and to simplify calculations.

For instance, picking the 6 students out of 10 who do not attend the quiz program is the same as picking the 4 who do, so C(10, 6) = C(10, 4) = 210
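The quiz example can be checked with `math.comb` (Python 3.8 or later), which counts combinations without repetition. The sketch compares picking 4 students out of 10 with picking the 6 who stay behind, and also evaluates the factorial formula n! / (p! × (n – p)!) directly.

```python
import math

chosen = math.comb(10, 4)     # ways to pick the 4 students who attend
left_out = math.comb(10, 6)   # ways to pick the 6 students who do not attend

# Direct evaluation of n! / (p! * (n - p)!) for n = 10, p = 4
by_formula = math.factorial(10) // (math.factorial(4) * math.factorial(6))
print(chosen, left_out, by_formula)
```

All three values are equal, which is the symmetry of combinations in action.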

Sometimes, a combination can be a mixture of various smaller individual events. In such cases, we simply multiply the number of options available for each individual event. For instance, if we go for lunch in a restaurant that offers 3 different kinds of juice and 12 dishes, then we simply multiply 3 × 12 = 36 to get the number of possible lunches.

BAYESIAN NOTATIONS:

A set is a collection of elements having certain values, and every event has a set of outcomes that satisfy it. A set which has no elements is called the null set or empty set, which is denoted as Ø.

An element is denoted by a small letter like ‘x’, whereas a set is denoted by a capital letter like ‘A’.

Multiple Events:

In multiple events, there are two or more events.

Events never touch:

Two events may never touch at all, as shown in the figure.

This means that these events never happen simultaneously. If event A occurs, it is guaranteed that event B does not occur, and if event B occurs, it is guaranteed that event A does not occur.

Events partially intersect:

Events that partially intersect or overlap can occur at the same time, as shown in the figure.

Events completely overlap:

Events completely overlap when one event can only occur if the other event occurs as well, as shown in the figure.

In this situation, B lies entirely inside A: if event A does not occur, then it is guaranteed that event B does not occur, but event B not occurring does not guarantee that event A does not occur.

More precisely, if an outcome is not part of a set, then we can be sure that it cannot be part of any of its subsets. Similarly, an outcome not being part of some subset does not exclude it from the larger set.

INTERSECTIONS:

The intersection of two or more events is the set of outcomes that are favorable for both events A and B at the same time. Generally, we use the intersection in cases where both events happen simultaneously.

UNIONS:

The union of two or more events is the combination of all outcomes that are favorable for either A or B.

MUTUALLY EXCLUSIVE SETS:

Mutually exclusive sets are not allowed to have any overlapping elements. They have the empty set as their intersection. If the intersection of two sets is the empty set, then they must be mutually exclusive.

COMPLEMENT SET:

The complement of a set contains all values that are part of the sample space but not part of the set. Complements and mutually exclusive sets are not the same thing: a set and its complement are always mutually exclusive, but mutually exclusive sets are not necessarily complements of each other. Let us try to clear up this concept with this example,

If set A contains all odd numbers and set B contains all even numbers, as shown in the figure

Dependent Events:

If the likelihood of an event A happening is affected by another event B happening, then we say that A and B are dependent events. Here, the outcome of event A depends on the outcome of event B.

Independent Events:

If the likelihood of an event A happening is not affected by another event B happening, then we say that A and B are independent events. Here, the outcome of event A does not depend on the outcome of event B.

CONDITIONAL PROBABILITY:

Conditional probability is the likelihood of an event A occurring, given that event B has already happened. The formula of conditional probability is given below:

P(A | B) = P(A ∩ B) / P(B)

P(A | B) can be read as “conditional probability of A, given B”

If P(B) > 0 then event B can occur and P(A | B) is defined
If P(B) = 0 then event B never occurs and P(A | B) is undefined

It is important to note that P(A | B) is not the same as P(B | A)
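The difference between P(A | B) and P(B | A) can be seen with one roll of a fair die. In this sketch, the events A (“the result is 6”) and B (“the result is even”) are assumed examples, and conditional probability is computed as P(A ∩ B) / P(B).

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {6}          # the result is 6
B = {2, 4, 6}    # the result is even


def probability(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


def conditional(event, given):
    """P(event | given) = P(event and given) / P(given); requires P(given) > 0."""
    return probability(event & given) / probability(given)


p_a_given_b = conditional(A, B)   # chance of a 6 when we know the result is even
p_b_given_a = conditional(B, A)   # chance of an even result when we know it is a 6
print(p_a_given_b, p_b_given_a)
```

Knowing that the result is even makes a 6 only somewhat more likely, while knowing that the result is a 6 makes an even result certain.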

ADDITIVE LAW:

The probability of the union of two sets A and B is equal to the sum of the individual probabilities of each event minus the probability of their intersection.

P(A ∪ B) = P(A) + P(B) – P(A ∩ B)
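The additive law, P(A ∪ B) = P(A) + P(B) – P(A ∩ B), can be verified on a small example. In the sketch below, the events A (“the result is even”) and B (“the result is greater than 3”) for one roll of a fair die are assumed for illustration.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}    # the result is even
B = {4, 5, 6}    # the result is greater than 3


def probability(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event), len(sample_space))


union_direct = probability(A | B)                                   # count the union directly
union_by_law = probability(A) + probability(B) - probability(A & B)  # additive law
print(union_direct, union_by_law)
```

Subtracting the intersection avoids counting the outcomes 4 and 6 twice, since they belong to both events.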

MULTIPLICATION RULE:

This rule is used to calculate the probability of the intersection based on the conditional probability: P(A ∩ B) = P(A|B) × P(B)

Example:

If event B occurs 60% of the time, then P(B) = 0.6, and if event A occurs 30% of the time that B occurs, then P(A|B) = 0.3. Then,

P(A ∩ B) = P(A|B) × P(B) = 0.3 × 0.6 = 0.18

They would occur simultaneously 18% of the time.

BAYES’ LAW OR BAYES’ THEOREM:

This law helps us understand the relationship between two events by relating their conditional probabilities. It is used in medical research to find out the relationship between symptoms. For instance, 60% of patients with headache wear glasses while 35% of patients with eyesight issues have headache.

Mathematically, Bayes’ theorem is given by,

P(A | B) = P(B | A) × P(A) / P(B)
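Bayes’ theorem lets us reverse a conditional probability. The sketch below reuses P(B) = 0.6 and P(A|B) = 0.3 from the multiplication rule example; the value P(A) = 0.4 is a hypothetical assumption added only so that P(B|A) = P(A|B) × P(B) / P(A) can be computed.

```python
from fractions import Fraction

p_b = Fraction(6, 10)           # P(B), from the multiplication rule example
p_a_given_b = Fraction(3, 10)   # P(A|B), from the multiplication rule example
p_a = Fraction(4, 10)           # P(A): hypothetical value, not given in the text

p_a_and_b = p_a_given_b * p_b           # multiplication rule: P(A and B)
p_b_given_a = p_a_given_b * p_b / p_a   # Bayes' theorem with the roles of A and B swapped
print(p_a_and_b, p_b_given_a)
```

Exact fractions are used instead of floating-point numbers so that the results are not affected by rounding.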

View Part (2) :

Different Types of Probability Distribution
(Characteristics & Examples)
