DatabaseTown (https://databasetown.com): Data Science for Beginners. Posts tagged "python".

Transpose of a Matrix in Python
https://databasetown.com/transpose-of-a-matrix-in-python/ (Thu, 13 Feb 2020)

The transpose of a matrix is obtained by turning its rows into columns and its columns into rows. It is denoted by \(\displaystyle {{A}^{t}}\) or \(\displaystyle {{A}^{'}}\). For example,

If  \(\displaystyle A=\left[ {\begin{array}{*{20}{c}} 1 & 2 & 3 \\ 4 & 3 & 5 \\ 2 & 6 & 2 \end{array}} \right]\,\) then  \(\displaystyle {{A}^{t}}=\left[ {\begin{array}{*{20}{c}} 1 & 4 & 2 \\ 2 & 3 & 6 \\ 3 & 5 & 2 \end{array}} \right]\)

Transposing a matrix does not change its values, only their positions. Taking the transpose of the same matrix twice returns the original matrix. Further, the transpose of an m x n matrix is an n x m matrix, because all the rows turn into columns and vice versa.

Let’s work through a practical example of taking the transpose of a matrix in Python.

First, we import the relevant libraries in Jupyter Notebook. We use NumPy, a Python library for working with multidimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.

[Figure: Transpose of a Matrix in Python]
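Since the original screenshot is not available, here is a minimal sketch of the same idea, using the matrix A from the mathematical example above. The variable names are our own choice, not necessarily those used in the original notebook.

```python
import numpy as np

# The 3 x 3 matrix A from the mathematical example above
A = np.array([[1, 2, 3],
              [4, 3, 5],
              [2, 6, 2]])

A_t = A.T  # equivalent to np.transpose(A)
print(A_t)

# Transposing twice gives back the original matrix
print(np.array_equal(A_t.T, A))
```

Both `A.T` and `np.transpose(A)` return the transposed array; `.T` is simply the shorter spelling.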

Let’s see more examples of the transpose of a matrix.

[Figure: Transpose of a Matrix in Python]

In the examples above, the rows of the two matrices B and C are changed into columns and the columns into rows.

Let’s see an example of taking the transpose of a scalar and a vector.

[Figure: Transpose of a scalar and a vector]

These examples show that taking the transpose of a scalar or a vector returns the same result, because in NumPy a one-dimensional array is not changed by transposition.
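A small sketch of this behaviour, with values we made up for illustration:

```python
import numpy as np

s = np.array(5)             # a scalar (0-dimensional array)
v = np.array([1, 2, 3, 4])  # a vector (1-dimensional array)

print(s.T, s.T.shape)  # the shape stays ()
print(v.T, v.T.shape)  # the shape stays (4,)
```

The shape is the thing to watch: a 1-D array has only one axis, so there is nothing for the transpose to swap.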

However, if we intend to get the transpose of a vector, we first reshape it into a 4 x 1 matrix (a 2-dimensional array). Let’s do it in practice:

[Figure: Transpose of a vector]
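The reshape step can be sketched as follows; the vector values are an assumption, since the original screenshot is not available.

```python
import numpy as np

v = np.array([1, 2, 3, 4])

col = v.reshape(4, 1)  # 4 x 1 matrix (a column)
row = col.T            # its transpose is a 1 x 4 matrix (a row)

print(col.shape, row.shape)
print(row)
```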

Dot Product of Two Matrices in Python
https://databasetown.com/dot-product-of-two-matrices-in-python/ (Wed, 05 Feb 2020)

The product of two matrices A and B is defined only if the number of columns of A is equal to the number of rows of B. A mathematical example of the dot product of two matrices A and B is given below.

If

\(\displaystyle A=\left[ {\begin{array}{*{20}{c}} 1 & 2 \\ 3 & 4 \end{array}} \right]\)

and

\(\displaystyle B=\left[ {\begin{array}{*{20}{c}} 3 & 2 \\ 1 & 4 \end{array}} \right]\)

Then,

\(\displaystyle AB=\left[ {\begin{array}{*{20}{c}} 1 & 2 \\ 3 & 4 \end{array}} \right] \left[ {\begin{array}{*{20}{c}} 3 & 2 \\ 1 & 4 \end{array}} \right]\)

\(\displaystyle AB=\left[ {\begin{array}{*{20}{c}} {1\times 3+2\times 1} & {1\times 2+2\times 4} \\ {3\times 3+4\times 1} & {3\times 2+4\times 4} \end{array}} \right]=\left[ {\begin{array}{*{20}{c}} {3+2} & {2+8} \\ {9+4} & {6+16} \end{array}} \right]\)

\(\displaystyle AB=\left[ {\begin{array}{*{20}{c}} 5 & {10} \\ {13} & {22} \end{array}} \right]\)

Let’s start a practical example of the dot product of two matrices A and B in Python. First, we import the relevant libraries in Jupyter Notebook.

[Figure: Dot Product of two Matrices]
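As a sketch of what the missing screenshot most likely showed, the product worked out by hand above can be reproduced with NumPy:

```python
import numpy as np

# A and B from the mathematical example above
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[3, 2],
              [1, 4]])

AB = np.dot(A, B)  # for 2-D arrays this is the same as A @ B
print(AB)
```

The result agrees with the hand calculation: 5 and 10 in the first row, 13 and 22 in the second.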

Let’s see another example of the dot product of two matrices C and D having different values.

If all the diagonal elements of a diagonal matrix are the same, it is called a scalar matrix. We can also take the dot product of two scalars, and the result is also a scalar.

Linear algebra is mostly concerned with operations on vectors and matrices. Let’s take an example of the dot product of one scalar and one vector.

The result of taking the dot product of a scalar and a vector is also a vector: when the scalar 2 is multiplied with each value of the vector (1, 2, 3 and 4), we obtain a vector with the values 2, 4, 6 and 8.
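The scalar cases can be sketched like this. The scalar 2 and the vector 1, 2, 3, 4 come from the text; the second scalar, 3, is our own pick.

```python
import numpy as np

v = np.array([1, 2, 3, 4])

scalar_times_scalar = np.dot(2, 3)  # two scalars give a scalar
scalar_times_vector = np.dot(2, v)  # each element of v is multiplied by 2

print(scalar_times_scalar)
print(scalar_times_vector)
```

When either argument of `np.dot` is a scalar, the call behaves like ordinary element-wise multiplication.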

Matrix Addition and Subtraction in Python
https://databasetown.com/matrix-addition-and-subtraction-in-python/ (Tue, 28 Jan 2020)

Matrix addition and subtraction in the Python programming language are performed like the normal algebraic operations.

Before discussing these operations, a brief word about algebra: the name comes from the Arabic word al-jabr.

Algebra is a branch of mathematics that makes many complex problems easier to solve by representing quantities with symbols that have no fixed arithmetical value.

Addition of Matrices

Generally, the addition of two matrices A and B is possible only if they have the same order (the same number of rows and columns). The addition of two matrices A and B is denoted by A + B. A mathematical example of the addition of two matrices is given here.

If \(\displaystyle A=\left[ {\begin{array}{*{20}{c}} 1 & 2 & 3 \\ 4 & 5 & 6 \end{array}} \right]\) and \(\displaystyle B=\left[ {\begin{array}{*{20}{c}} 2 & 3 & 4 \\ 5 & 2 & 4 \end{array}} \right]\)

Then,

\(\displaystyle A+B=\left[ {\begin{array}{*{20}{c}} 1 & 2 & 3 \\ 4 & 5 & 6 \end{array}} \right]+\left[ {\begin{array}{*{20}{c}} 2 & 3 & 4 \\ 5 & 2 & 4 \end{array}} \right]\) \(\displaystyle =\left[ {\begin{array}{*{20}{c}} {1+2} & {2+3} & {3+4} \\ {4+5} & {5+2} & {6+4} \end{array}} \right]=\left[ {\begin{array}{*{20}{c}} 3 & 5 & 7 \\ 9 & 7 & {10} \end{array}} \right]\)

Let’s start a practical example of the addition of two matrices in Python.

First, we import the relevant libraries in Jupyter Notebook as shown below,

import numpy as np

We use NumPy, a Python library for working with multidimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays.

Now we perform the addition of two matrices A and B like this,

[Figure: addition of matrix in python]
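A minimal sketch of the addition, reusing the matrices from the mathematical example above (the original screenshot may have used different values):

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])
B = np.array([[2, 3, 4],
              [5, 2, 4]])

total = A + B  # element-wise addition; both matrices are 2 x 3
print(total)
```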

Subtraction of Matrices

The subtraction of one matrix from another is possible only if they have the same order, and the subtraction of two matrices A and B is denoted by A – B. A mathematical example of the subtraction of two matrices is given below.

If

\(\displaystyle A=\left[ {\begin{array}{*{20}{c}} 1 & 2 & 3 \\ 5 & 6 & 7 \end{array}} \right]\) and \(\displaystyle B=\left[ {\begin{array}{*{20}{c}} 2 & 5 & 3 \\ 4 & 1 & 8 \end{array}} \right]\)

Then,

\(\displaystyle A-B=\left[ {\begin{array}{*{20}{c}} 1 & 2 & 3 \\ 5 & 6 & 7 \end{array}} \right]-\left[ {\begin{array}{*{20}{c}} 2 & 5 & 3 \\ 4 & 1 & 8 \end{array}} \right]\)

\(\displaystyle =\left[ {\begin{array}{*{20}{c}} {1-2} & {2-5} & {3-3} \\ {5-4} & {6-1} & {7-8} \end{array}} \right]=\left[ {\begin{array}{*{20}{c}} {-1} & {-3} & 0 \\ 1 & 5 & {-1} \end{array}} \right]\)

Let’s start a practical example of the subtraction of two matrices in Python. We have already imported the relevant libraries in Jupyter Notebook as mentioned above, so we now perform the subtraction of two matrices A and B like this,

[Figure: subtraction of matrix in python]
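The subtraction can be sketched the same way, again with the matrices from the mathematical example above:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [5, 6, 7]])
B = np.array([[2, 5, 3],
              [4, 1, 8]])

difference = A - B  # element-wise subtraction
print(difference)
```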

Let’s see an example of the addition of two vectors; note that the two vectors must have the same length.

[Figure: addition of two vectors in python]

An example of the subtraction of two vectors having the same length is given below,

[Figure: subtraction of two vectors in python]
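A short sketch with made-up vectors, which also shows what happens when the lengths differ:

```python
import numpy as np

u = np.array([1, 2, 3])
w = np.array([4, 5, 6])

vec_sum = u + w   # element-wise sum
vec_diff = u - w  # element-wise difference
print(vec_sum, vec_diff)

# Vectors of different lengths (here 3 and 2) cannot be added
try:
    u + np.array([1, 2])
    mismatch_error = False
except ValueError:
    mismatch_error = True
print(mismatch_error)
```

NumPy raises a `ValueError` for the mismatched lengths because shapes (3,) and (2,) cannot be broadcast together.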

Implementing Support Vector Machine (SVM) in Python
https://databasetown.com/implementing-support-vector-machine-svm-in-python/ (Tue, 05 Nov 2019)

Machine learning is a widely used approach for predicting outcomes or classifying data to help people make important decisions.

The algorithms are trained on data, so that they learn from past experience in order to make predictions about new cases.

There are three types of machine learning, i.e. supervised learning, unsupervised learning and reinforcement learning.

In this article, I want to introduce you to a prominent machine learning technique known as the Support Vector Machine (SVM).

Before we start formally, it is essential to know about supervised machine learning.

Supervised Machine Learning

In supervised machine learning, a labeled dataset is used. You must have input variables (X) and output variables (Y); you then apply an appropriate algorithm to learn the mapping function from input to output.

Y = f(X)

Supervised machine learning can be categorized into the following:

  1. Classification – where the output variable is a category, like black or white, plus or minus. Naïve Bayes (NB), Support Vector Machine (SVM) and Decision Tree (DT) are among the most popular classification algorithms.
  2. Regression – where the output variable is a real value like weight, dollars, etc. Linear regression is used for regression problems.

Support Vector Machine

Support Vector Machine (SVM) is a supervised machine learning algorithm which is mostly used for data classification and regression analysis.

We can perform linear and non-linear classification with the help of a Support Vector Machine.

An SVM classifier splits the data into two classes using a hyperplane; in two dimensions, a hyperplane is simply a line that divides the plane into two parts.

[Figure: svm-implementation]

Applications of Support Vector Machine in Real Life

Because the Support Vector Machine (SVM) is a supervised machine learning algorithm, its fundamental aim is to classify unseen data.

It is popular because it is memory efficient, effective in high-dimensional spaces, and versatile. There are several applications of SVM in real life; some of them are mentioned here.

  • Face detection
  • Image classification
  • Handwriting recognition
  • Geo and environmental sciences
  • Bioinformatics
  • Text categorization
  • Protein fold and remote homology detection
  • Generalized predictive control

Examples of SVM Kernels

  1. Polynomial kernel – it is mostly used in image processing.
  2. Linear Splines kernel in one-dimension – it is used in text categorization and is helpful in dealing with large sparse data vectors.
  3. Gaussian Kernel – it is used when there is no prior knowledge about the data.
  4. Gaussian Radial Basis Function (RBF) – it is commonly used when there is no prior knowledge about the data.
  5. Hyperbolic Tangent Kernel – it is used in neural networks.
  6. Bessel Function of the First kind Kernel – it is used to eliminate the cross term in mathematical functions.
  7. Sigmoid Kernel – it can be utilized as the alternative for neural networks.
  8. ANOVA Radial Basis Kernel – it is mostly used in regression problems.

Support Vector Machine (SVM) implementation in Python:

Now let’s start coding in Python. First, we import the important libraries such as pandas, numpy, matplotlib, and sklearn.

import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

Load the dataset:

Now we load the dataset, i.e. apples_and_oranges.csv, which is already placed in the same folder as the svm.ipynb file, and check what is inside the file.

df = pd.read_csv('apples_and_oranges.csv')
df

We can also represent this data frame as a scatter plot.

plt.xlabel('weight')
plt.ylabel('size')
# Plot each fruit as one point: weight on the x-axis, size on the y-axis
plt.scatter(df['weight'], df['size'], color="green", marker='+', linewidths=5)

Split the dataset of Apples and Oranges into training and test samples with a ratio of 80% & 20%.

from sklearn.model_selection import train_test_split
train_set, test_set = train_test_split(df, test_size=0.2)

Now we separate the predictors (features) from the target.

x_train = train_set.iloc[:,0:2].values
y_train = train_set.iloc[:,2].values
x_test = test_set.iloc[:,0:2].values
y_test = test_set.iloc[:,2].values

We can also check the length of train_set and test_set by using this code
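The code for this step is missing from the post. One way to do it is sketched below; because apples_and_oranges.csv is not available here, we use a made-up 40-row stand-in, and the column name "class" is our assumption.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for apples_and_oranges.csv (40 made-up rows)
df = pd.DataFrame({
    "weight": list(range(60, 100)),
    "size": [4.0 + 0.05 * i for i in range(40)],
    "class": ["apple"] * 20 + ["orange"] * 20,
})

train_set, test_set = train_test_split(df, test_size=0.2, random_state=1)

# len() gives the number of rows in each part
print(len(train_set), len(test_set))
```

With 40 rows and `test_size=0.2`, the split leaves 32 rows for training and 8 for testing.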

Next we initialize the Support Vector Machine (SVM) and fit it to the training data.

from sklearn.svm import SVC
model = SVC(kernel='rbf', random_state = 1)
model.fit(x_train, y_train)

Now, we will check the accuracy of our model.

model.score(x_test, y_test)

Our model worked perfectly here, as it gives 100% accuracy on the test set, but this may not happen all the time, especially when a large number of features are involved.

Now, we will predict the class of a fruit whose weight is 55 and size is 4.

model.predict([[55,4]])

Another check to predict the class of a fruit whose weight is 60 and size is 5.50.

model.predict([[60,5.50]])

Hence, the Support Vector Machine (SVM) is an elegant and powerful algorithm.

We can also use another kernel i.e. linear and check the model score like this.

model_linear_kernal = SVC(kernel='linear')
model_linear_kernal.fit(x_train, y_train)
model_linear_kernal.score(x_test, y_test)
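To try the whole workflow without the CSV file, the steps above can be put together on synthetic data. This is only a sketch: the weights and sizes below are randomly generated by us and are not the values in apples_and_oranges.csv.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Made-up data: [weight, size] for 50 apples and 50 oranges
apples = np.column_stack([rng.normal(68, 2, 50), rng.normal(4.4, 0.2, 50)])
oranges = np.column_stack([rng.normal(74, 2, 50), rng.normal(5.4, 0.2, 50)])
X = np.vstack([apples, oranges])
y = np.array(["apple"] * 50 + ["orange"] * 50)

x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=1, stratify=y)

# Fit one model per kernel and record its accuracy on the test set
scores = {}
for kernel in ("rbf", "linear"):
    model = SVC(kernel=kernel)
    model.fit(x_train, y_train)
    scores[kernel] = model.score(x_test, y_test)

print(scores)
```

`score` returns the fraction of test samples that were classified correctly, so the two kernels can be compared directly.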

Problem Statement No.1:

Train a Support Vector Machine (SVM) Classifier by using any suitable dataset and then find out the accuracy of your model by utilizing rbf and linear kernels.

You can download dataset here.

Machine Learning With Python – A Real Life Example
https://databasetown.com/machine-learning-with-python-a-real-life-example/ (Thu, 24 Oct 2019)

In this article we discuss machine learning with Python with the help of a real-life example. Before we proceed to the example, let’s recap the basic concept of linear regression.

Linear regression is usually used for predictive analysis. It is a linear approximation of the underlying relationship between one dependent variable and one or more independent variables.


The main steps of linear regression are to get sample data, design a model that fits that sample best, and make predictions for the whole dataset. Linear regression is mainly used for trend forecasting, for measuring the strength of predictors, and for predicting an effect.

Commonly used types of regression analysis are simple linear regression (one dependent variable and one independent variable), multiple linear regression (one dependent variable and two or more independent variables), and logistic regression (one categorical dependent variable and one or more independent variables).

Let’s start with simple linear regression with one dependent variable and one independent variable.

On the basis of the given data we will build a machine learning model that predicts the price of one kg of mangoes in the upcoming years, i.e. 2020 and 2021.

year mangoes_price (in Rs.)
2011 40
2012 50
2013 55
2014 60
2015 65
2016 70
2017 75
2018 80
2019 90

We can represent the values in the table above as a scatter plot and then draw the straight line that best fits the values on the chart.

We could draw many such lines, but we select the one for which the total sum of error is lowest.

The total sum of error can be calculated as

[Figure: total sum of error formula]
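The formula image is not available. Assuming the usual least-squares definition (the sum of squared differences between the actual and the predicted prices), it can be written as

\(\displaystyle \text{Total sum of error}=\sum\limits_{{i=1}}^{n}{{{{{\left( {{{y}_{i}}-{{{\hat{y}}}_{i}}} \right)}}^{2}}}}\)

where \(\displaystyle {{y}_{i}}\) is the actual price in year i, \(\displaystyle {{\hat{y}}_{i}}\) is the price predicted by the line for that year, and n is the number of data points (9 in our table).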

As we learned in high-school mathematics, a straight line is described by y = mx + b; therefore, mangoes prices can be represented by the following equation.

Mangoes_price = m × year + b

Here, m is slope or gradient and b is intercept.

Now let’s start coding in Python. First we import the important libraries: pandas (for manipulating and analyzing tabular data), numpy (for working with multidimensional arrays and matrices, along with a large collection of high-level mathematical functions that operate on these arrays), matplotlib (a 2D plotting library for Python that works well for visualizing NumPy computations) and sklearn (the import name of scikit-learn, a library for data mining and data analysis).

import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt

Load the dataset:

Now we load the dataset, i.e. mangoes_price.csv, which is already placed in the same folder as the Simple Linear Regression.ipynb file, and check what is inside the file.

df = pd.read_csv('mangoes_price.csv')
df

We can also represent that data frame as a scatter plot as shown here.

%matplotlib inline
plt.xlabel('year')
plt.ylabel('mangoes_price')
plt.scatter(df.year, df.mangoes_price, color='blue', marker='.', linewidths=5)

The basic purpose of plotting the data points on a scatter plot is to look for a linear relationship between the variables; if one is found, we use the linear regression model.

In this scenario, there is a linear relationship between year and mangoes_price because the price of mangoes increased with the passage of time. Before creating the linear model, we create a new data frame in which we drop the mangoes_price column, as the linear model expects a 2-D array of input features.

new_df = df.drop('mangoes_price',axis='columns')
new_df

Also, check the price of mangoes like this

mangoes_price = df.mangoes_price
mangoes_price

In order to train the model, we will create an object of Linear Regression class and call a fit() method like this

reg_model = linear_model.LinearRegression()
reg_model.fit(new_df,df.mangoes_price)

We will predict the price of mangoes in the years 2020 and 2021.

reg_model.predict([[2020], [2021]])

Now we check manually how the model arrives at these values. To do that, we find the slope (coefficient) and the intercept like this

reg_model.coef_
reg_model.intercept_

As we already know, y = mx + b, where ‘m’ is the slope and ‘b’ is the intercept. Putting the coefficient and the intercept into this equation gives the same price for one kg of mangoes in 2020 that our model has already predicted:

2020*5.66666667 + (-11353.333333333334)
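The slope and intercept can also be checked without scikit-learn. The sketch below applies the standard least-squares formulas for a straight line to the table of mango prices given earlier; the variable names are our own.

```python
import numpy as np

# Data from the table above
years = np.arange(2011, 2020)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90])

# Least-squares slope and intercept of the line price = m * year + b
dx = years - years.mean()
dy = prices - prices.mean()
m = (dx * dy).sum() / (dx ** 2).sum()
b = prices.mean() - m * years.mean()

price_2020 = m * 2020 + b
print(m, b, price_2020)
```

This gives a slope of about 5.667 and an intercept of about -11353.33, matching the values used in the manual calculation, and a predicted price of about 93.33 for 2020.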

This means that our linear model works as expected. Now we will check how well it fits the data,

reg_model.score(new_df,mangoes_price)
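For a regression model, `score` returns the coefficient of determination (R²) rather than a classification accuracy. As a sketch, the same number can be recomputed by hand from the table of mango prices, using the rounded slope and intercept found above:

```python
import numpy as np

years = np.arange(2011, 2020)
prices = np.array([40, 50, 55, 60, 65, 70, 75, 80, 90])

# Predictions of the fitted line (slope and intercept from the text)
predicted = 5.66666667 * years - 11353.333333333334

ss_res = ((prices - predicted) ** 2).sum()      # unexplained variation
ss_tot = ((prices - prices.mean()) ** 2).sum()  # total variation
r_squared = 1 - ss_res / ss_tot
print(r_squared)
```

The result is about 0.988: the line explains roughly 98.8% of the variation in the prices.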

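Our model fits the data well: the score (the coefficient of determination, R²) is about 0.988, i.e. the model explains roughly 98.8% of the variation in price.

Now we will load a csv file that lists only years (with no mangoes price), predict the price for each year, and add the predictions as a new column, like this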

year_df = pd.read_csv("year.csv")
year_df
price = reg_model.predict(year_df)
price
year_df['mangoes_price']=price
year_df

A comparison of the actual and predicted prices of mangoes during the last five years, i.e. 2015 to 2019, is given below.

S # Year Actual Price per Kg of mangoes (in Rs.) Predicted Price per Kg of mangoes (in Rs.)
1 2015 65 65.00
2 2016 70 70.66
3 2017 75 76.33
4 2018 80 82.00
5 2019 90 87.66

Lastly, we will save this result in a new csv file namely price_prediction.csv.

year_df.to_csv("price_prediction.csv")

As the saying goes, “Practice makes a man perfect”, so here are two problem statements for you to practice and get a firm grasp of this technique.

Problem Statement No.1:

You are required to build a Regression Model and predict the price of Lux Soap in the upcoming year i.e. 2020. Download the file lux_price.csv

Problem Statement No.2:

You are required to build a Regression Model and predict the per capita income of the citizens of a country in the previous years (1990 & 1994). Download the file country_income.csv

What is Clustering & its Types? K-Means Clustering Example (Python)
https://databasetown.com/clustering-types-k-means-clustering-example-python/ (Mon, 07 Oct 2019)

Cluster Analysis

A cluster is a group of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters.

Cluster analysis is a technique used to group data objects into such related groups, called clusters.

Clustering is an unsupervised learning approach in which there are no predefined classes.

The basic aim of clustering is to group related entities in such a way that the entities within a group are similar to each other while the groups are dissimilar from each other.

In K-Means clustering, “K” defines the number of clusters. K-means clustering, hierarchical clustering, and density-based spatial clustering are among the most popular clustering algorithms.

Examples of Clustering Applications:

  • Cluster analyses are used in marketing for the segmentation of customers based on the benefits obtained from the purchase of the merchandise and find out homogenous groups of the consumers.
  • Cluster analyses are used for earthquake studies.
  • Cluster analyses are used for city planning in order to find out the collection of houses according to their house type, worth and geographical locality.

Major Clustering Approaches:

The major clustering approaches are described below.

Partitioning Clustering

In this technique, the dataset is subdivided into a set of k groups (where k is the number of groups, predefined by the analyst).

K-means is the best-known clustering technique of this kind; each cluster is represented by the center (mean) of the data points belonging to it.

K-medoids clustering is an alternative to K-means which is less sensitive to outliers than K-means.

K-means clustering is also known as hard clustering, as it produces partitions in which each observation belongs to only one cluster.

Hierarchical Clustering

Hierarchical clustering is used to identify groups in the dataset without requiring the analyst to pre-specify the number of clusters to be generated.

The result is a tree-based representation of the objects, known as a dendrogram. Observations can then be subdivided into groups by cutting the dendrogram at the desired similarity level.

Fuzzy Clustering

Fuzzy clustering is also known as soft clustering which permits one piece of data to belong to more than one cluster.

Fuzzy clustering is frequently used in pattern recognition. Fuzzy C-means clustering algorithm is commonly used worldwide.  

Density-based Clustering (DBSCAN)

DBSCAN stands for density-based spatial clustering of applications with noise. It is a method introduced by Ester et al. in 1996 that can find clusters of any shape in a dataset containing noise and outliers.

The main advantage of DBSCAN is that the user does not need to specify the number of clusters to be generated.

Grid-based Clustering

This clustering approach utilizes a multi-resolution grid data structure having high processing speed with a small amount of memory consumption.

Model-based Clustering:

In this clustering approach, it is assumed that the data comes from a distribution that is a mixture of two or more clusters.

Model-based clustering is used to resolve issues that can arise in the K-means or fuzzy K-means algorithms.

Difference between Classification and Clustering

Classification: The classification technique is widely used in data mining for classifying datasets where the output variable is a category, like black or white, plus or minus. Naïve Bayes, Support Vector Machine and Decision Tree are the most popular supervised machine learning algorithms for it.

Clustering: A cluster is a group of data objects that are similar to one another within the same cluster and dissimilar to the objects in other clusters. Cluster analysis groups the data objects into such clusters. Clustering is unsupervised learning in which there are no predefined classes.

Process of applying K-means Clustering

  • Choose the number of clusters
  • Specify the cluster seeds
  • Assign each point to a centroid
  • Adjust the centroid
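The four steps above can be sketched in plain NumPy. This is a minimal illustration rather than a production implementation; the function name `kmeans`, its parameters and the six sample points are our own assumptions.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Minimal K-means: returns a cluster label per point and the centroids."""
    rng = np.random.default_rng(seed)
    # Steps 1 and 2: k is given; pick k distinct data points as the seeds
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign each point to its nearest centroid
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its assigned points
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # the centroids stopped moving
        centroids = new_centroids
    return labels, centroids

# Two obvious groups of made-up 2-D points
points = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
                   [8.0, 8.0], [8.2, 7.9], [7.9, 8.3]])
labels, centroids = kmeans(points, k=2)
print(labels)
print(centroids)
```

Steps 3 and 4 are repeated until the centroids stop moving. Which group gets label 0 and which gets label 1 depends on the random seeds, so only the grouping itself is meaningful.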

Pros and Cons of Clustering

K-means

  • Pros: It is simple to understand and works well on small as well as large datasets. This clustering technique is fast and efficient.
  • Cons: The number of clusters must be chosen in advance.

Hierarchical Clustering

  • Pros: The ideal number of clusters can be acquired by the model itself.
  • Cons: Hierarchical clustering is not suitable for large datasets.

K-Means Clustering Example (Python)

These are the steps to perform the example.

Import the relevant libraries.

[Figure: import libraries]

Load the data

Now we load the data, a .csv file placed in the same folder as the clustering.ipynb file, and check what is inside it.

[Figure: load the data]

In order to map the data, we create a new variable data_mapped, which is equal to data.copy(), and set data_mapped['continent'] to data_mapped['continent'].map(...), mapping Africa to 0, Asia to 1, Europe to 2, North America to 3 and South America to 4.
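The mapping step can be sketched as follows. The five rows are a hypothetical stand-in for the country dataset, which is not included in the post; only the names data, data_mapped and continent come from the text.

```python
import pandas as pd

# Hypothetical rows standing in for the country dataset
data = pd.DataFrame({
    "country": ["Angola", "Afghanistan", "Albania", "Aruba", "Argentina"],
    "continent": ["Africa", "Asia", "Europe", "North America", "South America"],
})

# Work on a copy so that the original data frame stays unchanged
data_mapped = data.copy()
data_mapped["continent"] = data_mapped["continent"].map({
    "Africa": 0, "Asia": 1, "Europe": 2, "North America": 3, "South America": 4,
})
print(data_mapped)
```

Any continent name that is missing from the dictionary would become NaN, so it is worth checking the mapped column for missing values before clustering.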

Further, we select the features that we intend to use for clustering.

Here we select three columns and leave out only one column, i.e. country.

Perform K-Means Clustering

[Figure: perform k means clustering]

Here we perform K-means clustering with 5 clusters.

Now we create a data frame, data_with_clusters, which is equal to data. Furthermore, we add an extra column, Cluster, which is equal to identified_clusters.

In the result, Angola, Burundi & Benin are in cluster 0; Aruba, Anguilla and Antigua & Barb are in cluster 1; Albania, Aland, Andorra, Austria & Belgium are in cluster 2; and Afghanistan, United Arab Emirates & Azerbaijan are in cluster 3.

Finally, we plot a scatter plot in order to obtain a map of the real world. We take the Longitude along the x-axis and the Latitude along the y-axis.

These clusters are based on geographical location, as the resulting plot shows.

[Figure: k means clustering with python]
Logistic Regression (Python) Explained using Practical Example
https://databasetown.com/logistic-regression-python-explained-using-practical-example/ (Tue, 01 Oct 2019)

Logistic regression is a predictive analysis used to describe data and to explain the relationship between one binary dependent variable and one or more nominal, ordinal, interval or ratio-level independent variables. It is mostly used in biological sciences and social science applications. For instance, it can predict whether a received email is spam or not, or whether a customer will purchase a product or not.

Statistical software is used to conduct the analysis, as logistic regression is a bit more difficult to interpret than linear regression.

There are several kinds of logistic regression analysis:

  1. Binary Logistic Regression – two possible outcomes, e.g. an email is spam or not.
  2. Multinomial Logistic Regression – three or more categories with no ordering, e.g. during admission to college, students choose among a general program, an academic program or a vocational program.
  3. Ordinal Logistic Regression – three or more categories with ordering, e.g. rating a mobile set from 1 to 5.

Logistic Regression Model

[Figure: Logistic Regression Model]
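The formula image is not available. The standard form of the logistic regression model with a single independent variable x, which we assume is what the figure showed, is

\(\displaystyle p(x)=\frac{{{{e}^{{{{b}_{0}}+{{b}_{1}}x}}}}}{{1+{{e}^{{{{b}_{0}}+{{b}_{1}}x}}}}}\)

where p(x) is the probability that the outcome is 1, and \(\displaystyle {{b}_{0}}\) and \(\displaystyle {{b}_{1}}\) are the coefficients estimated from the data. Equivalently, the log of the odds is a linear function of x:

\(\displaystyle \ln \left( {\frac{{p(x)}}{{1-p(x)}}} \right)={{b}_{0}}+{{b}_{1}}x\)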

Practical example of Logistic Regression

Import the relevant libraries and load the data.

[Figure: import relevant libraries]

For quantitative analysis, we must convert the ‘yes’ and ‘no’ entries into ‘1’ and ‘0’ respectively.

Now we are going to visualize our data. We are predicting job; therefore, job is our Y variable and Code (used for education) is our X variable.

Here we observe that for the observations at the lower education codes the outcome is zero, i.e. they are jobless, whereas the persons at the higher codes successfully got a job. Now we are going to plot a regression line.

Linear regression is a great technique, but it is not suitable for this kind of analysis because it does not know that our values are bounded between 0 and 1. Our data is non-linear, so we must use a non-linear approach. Hence, we now plot a logistic regression curve.

This function depicts the probability of getting a job, given an education code. When the education is low, the probability of getting a job is close to 0, whereas when the education is high, the probability of getting a job is close to 1 (100%).
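The shape of this curve can be sketched with NumPy. The coefficients b0 and b1 and the education codes below are hypothetical values chosen for illustration; they are not the ones estimated in the post.

```python
import numpy as np

def logistic(x, b0, b1):
    """Probability of the outcome 1 for input x under a logistic model."""
    return 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))

codes = np.arange(0, 7)  # hypothetical education codes 0..6
b0, b1 = -6.0, 2.0       # hypothetical coefficients

probs = logistic(codes, b0, b1)
print(np.round(probs, 2))
```

Whatever the coefficients, the output always stays between 0 and 1, which is exactly the property the straight regression line lacks.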

From the curve, when the education is ‘BA’ the probability of getting a job is about 60%.

The Logistic Regression summary contains the following entries.

MLE stands for Maximum Likelihood Estimation.

Likelihood function

It is a function that estimates how likely it is that the model at hand describes the real underlying relationship of the variables. The larger the likelihood function, the higher the probability that our model is correct.

Maximum likelihood estimation tries to maximize the likelihood function. The computer goes through various values until it finds a model for which the likelihood is the highest. When no more improvement is possible, it stops the optimization.

Pseudo R-squared (Pseudo R-squ.) is mostly useful for comparing variations of the same model. Different models have different pseudo R-squared values. A value between 0.2 and 0.4 is considered decent.

LL-Null stands for Log Likelihood-Null: the log-likelihood (LL) of a model which has no independent variables.

LLR stands for Log Likelihood Ratio, which measures whether our model is statistically different from LL-Null.
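To make these quantities concrete, here is a sketch that computes them by hand. The outcomes and fitted probabilities are made up, and the pseudo R-squared formula used is McFadden's (one minus LL divided by LL-Null), which we assume is the one reported in the summary.

```python
import numpy as np

def log_likelihood(y, p):
    """Log-likelihood of binary outcomes y under predicted probabilities p."""
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Hypothetical outcomes and hypothetical fitted probabilities
y = np.array([0, 0, 0, 1, 0, 1, 1, 1])
p_model = np.array([0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.8, 0.9])

# A model with no independent variables predicts the overall share of 1s
p_null = np.full(len(y), y.mean())

ll = log_likelihood(y, p_model)
ll_null = log_likelihood(y, p_null)

pseudo_r2 = 1 - ll / ll_null  # McFadden's pseudo R-squared
llr = 2 * (ll - ll_null)      # likelihood-ratio statistic

print(ll, ll_null, pseudo_r2, llr)
```

A log-likelihood closer to zero means a better fit, so a useful model has LL above LL-Null and a pseudo R-squared above 0.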

Calculating the accuracy of the model

In order to find the accuracy of the model, we use the results_log.predict() command, which returns the values predicted by our model. We also apply some formatting to make the results more readable by using this command

np.set_printoptions(formatter={'float': lambda x: "{0:0.2f}".format(x)})

Here, a value less than 0.5 means the chance of getting a job is below 50%, and a value of 0.93 means the chance of getting a job is 93%.

Now we compare the actual values with the predicted values.

If 90% of the predicted values of the model match the actual values, we say that the model has 90% accuracy.

In order to compare the predicted and actual values in the form of a table, we use the results_log.pred_table() command.

This result is a bit difficult to understand, so we present it in the form of a confusion matrix.

Let’s read this confusion matrix: for 3 observations the model predicted 0 and the actual value was also 0; similarly, for 9 observations the model predicted 1 and the actual value was also 1. In these cases the model did its job well.

Furthermore, for 2 observations the model predicted 0 whereas the actual value was 1; similarly, for 1 observation the model predicted 1 and the actual value was 0. Here the model got confused.

Finally, the confusion matrix shows that the model made an accurate estimation in 12 out of 15 cases, which means our model works with (12/15)*100 = 80% accuracy.

We can also calculate the accuracy of the model by using this code

cm = np.array(cm_df)
accuracy_model = (cm[0,0]+cm[1,1])/cm.sum()*100
accuracy_model
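As a self-contained check of the same arithmetic, the counts quoted in the text can be typed in by hand. This sketch uses plain Python lists so it runs without extra libraries; the layout (rows are actual values, columns are predictions) is an assumption about how the table is oriented.

```python
# Confusion matrix from the text: rows are actual values, columns are predictions
cm = [[3, 1],   # actual 0: predicted 0 three times, predicted 1 once
      [2, 9]]   # actual 1: predicted 0 twice, predicted 1 nine times

correct = cm[0][0] + cm[1][1]          # predictions on the diagonal
total = sum(sum(row) for row in cm)    # all observations
accuracy_model = correct / total * 100

print(correct, total, accuracy_model)
```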
]]>
Basics of Python for Data Science https://databasetown.com/basics-of-python-for-data-science/ Thu, 19 Sep 2019 14:52:34 +0000

Introduction to Python

Python is an open-source, high-level, general-purpose programming language created by Guido van Rossum and first released in 1991. Python 3.7 was released on 27 June 2018. It is powerful, friendly, and easy to learn. From here onward we will use Python 3.

Jupyter

The Jupyter Notebook is an open-source web application. It lets you create and share documents that contain live code, visualizations, and more. Jupyter is used for data cleaning and transformation, numerical simulation, statistical modeling, data visualization, machine learning, and so on.

Jupyter’s Interface – Dashboard

The interface is very simple and easy to understand.

Clicking the ‘New’ button lets us create a new Python 3 notebook, a text file, a folder, or a terminal. Here we click on ‘Notebook: Python 3’ to create a notebook.

To execute the code, press Shift + Enter.

By default a cell is a code cell. If you want to write comments, messages, or documentation instead, select the Markdown cell option; the text in a Markdown cell is not executed.

From here onward we will mostly focus on practical examples to understand the basic concepts.

Variables

Declaring a variable in Python is very simple: assign a value to a name with the = sign.

For example, when ‘x’ is declared with an initial value of 12 and the code is run, the output value ‘12’ is printed.
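In code, the declaration described above can be sketched as follows; the reassignment in the second half is an added illustration, not part of the original figure.

```python
x = 12        # assign the value 12 to the name x
print(x)      # prints 12

x = x + 3     # a variable can be reassigned at any time
print(x)      # prints 15
```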

Numbers and Boolean Values:

Integer

Positive or negative whole numbers without a decimal point, e.g. 3, 6, -4, -12.

Float

Real numbers with a decimal point, such as 3.6 and -4.12.

Boolean Value

A Boolean takes one of two values, True or False (think of 1 or 0, on or off). In Python the two values are written with a capital first letter.
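A quick way to see these three types is the built-in type() function. The variable names in this sketch are hypothetical.

```python
whole = -4                  # int
real = 3.6                  # float
flag = 5 > 3                # bool, the result of a comparison

print(type(whole))          # <class 'int'>
print(type(real))           # <class 'float'>
print(type(flag), flag)     # <class 'bool'> True
print(True + True)          # 2, because True behaves like 1 in arithmetic
```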

Strings

Strings consist of a sequence of characters, like ‘test’ or ‘book’. If we type Karachi without quotes or a print statement, the result is an error, because Python treats it as the name of an undefined variable.

If we put the same string in quotes, 'Karachi', or use print('Karachi'), the code runs successfully.

Two strings can be joined (concatenated) with the ‘+’ sign.
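The three cases just described can be sketched as follows; the variable names and the 'Hello' text are assumptions made for illustration.

```python
city = 'Karachi'              # quotes make this a string literal
print(city)

# The + sign joins two strings; the space must be added explicitly
greeting = 'Hello' + ' ' + city
print(greeting)               # Hello Karachi

# Without quotes, Python looks for a variable named Karachi and fails
try:
    Karachi
except NameError as err:
    print('Error:', err)
```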

Arithmetic Operators

Arithmetic operations are simple. The basic operations are addition, subtraction, multiplication, and division. Consider x = 3 ** 2. The output is 9. What does it mean? It means that 3 is raised to the power of 2.

The % operator gives us the remainder of a division; in the last operation of the example the remainder is 2.
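Since the original figure is not available, here is a short sketch of the same operators. The operands are chosen for illustration; in particular, 17 % 5 is only one of many expressions that leave a remainder of 2.

```python
total = 7 + 3          # addition: 10
difference = 7 - 3     # subtraction: 4
product = 7 * 3        # multiplication: 21
quotient = 7 / 2       # division always returns a float: 3.5
power = 3 ** 2         # exponent: 9
remainder = 17 % 5     # remainder of 17 divided by 5: 2

print(total, difference, product, quotient, power, remainder)
```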

SHORT KEYS:

A few important shortcuts are given below:

Key | Purpose
Ctrl + Enter | Run the current cell
Shift + Enter | Run the current cell and move to the next cell
D, D (press D twice) | Delete the selected cell (the cell must be selected, with the cursor outside its input field)
\ (backslash) | Line continuation

Comparison, Logical and Identity Operators:

Comparison Operators

Operator | Description
== | Checks that the left and right sides are equal
!= | Checks that the left and right sides are not equal
> | Greater than
< | Less than
>= | Greater than or equal to
<= | Less than or equal to

Logical Operators

Operator | Description
and | True only when both statements around it are True
or | True when at least one of the two statements is True
not | Reverses the statement that follows it: True becomes False and vice versa

Identity Operators

Operator | Description
is | Checks whether both sides refer to the same object
is not | Checks whether the two sides refer to different objects

Operator precedence:

The not operator is evaluated first, then and, and finally or. Practical examples of the comparison and identity operators are shown below.

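5 == 6        # False
4 == 4        # True
4 != 5        # True
4 is 5        # False (Python 3.8+ also warns about using "is" with a literal)
4 is not 5    # True (the same warning applies)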
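The difference between equality and identity, and the precedence order, are easier to see with a few more lines. This is an illustrative sketch; the variable names are hypothetical.

```python
a = [1, 2]
b = a            # b is another name for the same list object
c = [1, 2]       # c is a different list with equal contents

print(a == c)    # True: the values are equal
print(a is c)    # False: they are two separate objects
print(a is b)    # True: both names point to one object

# Precedence: not binds tightest, then and, then or
first = True or False and False    # read as True or (False and False)
second = not False and False       # read as (not False) and False
print(first, second)               # True False
```

This is also why comparing numbers with is, as in 4 is 5, is discouraged: is tests object identity, while == tests the value.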

Adding Comments

Adding comments is an essential part of programming in any language; comments help while writing the code and when the code is reused later. A comment starts with the hash sign (#), as in this example.

# This is just a comment; it has no effect on the code
x = 3 * 12
x

Indexing Elements:

Syntax: variable_name[index]. Indexing must use square brackets [], not parentheses () or braces {}.

Note that Python starts counting from 0 instead of 1. In our example, the 4th letter is ‘s’.
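The string from the original figure is not available, so this sketch uses a hypothetical word whose 4th letter also happens to be ‘s’.

```python
word = 'Tuesday'    # hypothetical example string

print(word[0])      # T, the first character has index 0
print(word[3])      # s, index 3 is the 4th character
print(word[-1])     # y, negative indexes count from the end
```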

Conditional Statements (if, elif, else):

Basic syntax of the 'if' statement:
if condition:
    conditional code

if 5 == 5:
    print('Right')

Syntax of the 'if-else' statement:
if condition:
    conditional code
else:
    else code

if 5 == 6:
    print('Right')
else:
    print('Wrong')

Syntax of the 'if-elif-else' statement:
if condition:
    conditional code
elif another condition:
    conditional code
else:
    else code

if 6 > 5:
    print('Yes')
elif 6 < 5:
    print('No')
else:
    print('Blank')

While Loop:

a = 0
while a <= 10:
    print (a)
    a = a + 2

Creating a Function with a parameter:

def sample(x):
    return x + 5

sample(3)

Creating a function with multiple parameters:

def sample(x,y,z):
    result = x + y + z
    print ('parameter x equals', x)
    print ('parameter y equals', y) 
    print ('parameter z equals', z)
    return result 

sample (2,4,6)

Built-in Functions in Python:

Function | Description
int() | Converts its argument to the integer data type
float() | Converts its argument to the float data type
str() | Converts its argument to the string data type
abs() | Returns the absolute value of its argument
sum() | Calculates the sum of all the elements
max() | Returns the highest value from a sequence of numbers
min() | Returns the lowest value from a sequence of numbers
len() | Returns the number of elements in an object
round(x,y) | Returns its argument (x) rounded to a specified number of digits (y) after the decimal point
pow(x,y) | Returns x to the power of y
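Each function in the table can be tried on a small example. The list of numbers and the variable name below are made up for illustration.

```python
numbers = [4, -7, 2.5, 10]   # hypothetical sample data

print(int('12'))             # 12
print(float('3'))            # 3.0
print(str(12) + ' items')    # 12 items
print(abs(-7))               # 7
print(sum(numbers))          # 9.5
print(max(numbers))          # 10
print(min(numbers))          # -7
print(len(numbers))          # 4
print(round(3.14159, 2))     # 3.14
print(pow(2, 3))             # 8
```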

Lists:

A list is an ordered, changeable sequence of data points such as integers, strings, or floats.

students = ['Shahid', 'Khalid', 'Ali']
students

students[1]
students[-1]

Tuples:

A tuple is also a sequence of data points, but unlike a list it cannot be modified.

a = (12, 4, 6)
a

x, y, z = 2, 4, 6
z
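The practical difference between the two sequence types shows up when we try to change an element. This sketch reuses the values from the examples above; the replacement name 'Rizwan' is an arbitrary choice.

```python
students = ['Shahid', 'Khalid', 'Ali']
students[1] = 'Rizwan'          # lists can be changed in place
print(students)                 # ['Shahid', 'Rizwan', 'Ali']

a = (12, 4, 6)
try:
    a[0] = 1                    # tuples do not support item assignment
except TypeError as err:
    print('Error:', err)

x, y, z = a                     # unpacking a tuple into three names
print(y)                        # 4
```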

Dictionaries:

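A dictionary is another way of storing data: each value is stored under a key and looked up by that key.

fruits = {'d1': "Apple", 'd2': "Mango"}   # avoid naming a variable dict, which would hide the built-in type
fruits['d1']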
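A few more common dictionary operations are sketched below. The variable name fruits and the extra entries are hypothetical; the methods used (get and items) are standard dictionary methods.

```python
fruits = {'d1': 'Apple', 'd2': 'Mango'}

fruits['d3'] = 'Banana'         # add a new key-value pair
fruits['d1'] = 'Apricot'        # overwrite an existing value

# get() returns a default value instead of raising KeyError for a missing key
print(fruits.get('d9', 'missing'))

for key, value in fruits.items():   # loop over all pairs
    print(key, value)
```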

Built-in Methods (Append and Extend):

Passengers =['Shahid', 'Khalid', 'Ali']
Passengers

Passengers.append("Rizwan")
Passengers

Passengers.extend(["Jan"])
Passengers
Syntax: object.method()
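The two methods look alike when a single name is added, so here is a sketch that shows where they differ. The lowercase variable name and the extra names in the list are chosen for illustration.

```python
passengers = ['Shahid', 'Khalid', 'Ali']
passengers.append(['Jan', 'Rizwan'])   # append adds its argument as ONE element
print(len(passengers))                 # 4, the last element is itself a list

passengers = ['Shahid', 'Khalid', 'Ali']
passengers.extend(['Jan', 'Rizwan'])   # extend adds each item of the iterable
print(len(passengers))                 # 5
print(passengers)
```

In short, use append for a single item and extend to merge another list into the existing one.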
]]>
How to install Anaconda (Python Distribution) on Windows https://databasetown.com/anaconda-python-distribution-installation-setup-guide-for-windows/ Mon, 09 Sep 2019 15:53:58 +0000

Anaconda installation and setup guide for Windows. You will learn how to install Python on Windows, step by step.

Anaconda is one of the most widely used open-source distributions of Python and R for data science and machine learning. With it, data scientists can easily analyze data and visualize the results using Python or R.

Anaconda Download Link


Select your operating system and download the installer. Here we are installing Anaconda on a 64-bit Windows 10 system; choose the version that matches yours.

Let’s go through the step-by-step process of installing Python on your system.

Download the latest installer for Python 3.7.

Now run the downloaded installer.

Proceed to the next step.

Select the users for whom you want to install Anaconda.

Select the directory where you want to install it.

You can set the environment variables here. Select the second option to register Anaconda as the primary Python on the system.

Proceed with the installation.

The installation is now complete. You can launch Anaconda Navigator.
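One simple way to confirm that the new installation is the one actually being used is to run a couple of lines in a notebook or Python prompt. This is a minimal sketch; the exact version string and path depend on your machine.

```python
import sys

# The interpreter version; after the steps above this should be a 3.x release
print(sys.version)

# The path of the running interpreter; for an Anaconda install it normally
# points inside the installation directory chosen during setup
print(sys.executable)
```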


]]>