Decisions with Data


A practical approach to Data Science

Machine learning pipelines

Below we introduce the Pipeline class, a tool for chaining together the processing steps in a machine learning workflow. Real-world applications of machine learning normally consist of sequential processing steps. Pipelines allow us to string multiple steps together into a single Python object that exposes scikit-learn’s familiar fit, predict, and transform methods. When we evaluate a model with cross-validation or tune parameters with grid search, the Pipeline class captures all the processing steps so that they are evaluated properly, condenses the code, and reduces the likelihood of mistakes. Building a machine learning model involves many steps and many possible combinations of them, which can easily lead to confusion and disorganization. Pipelines help eliminate that overcomplication and simplify our workflow.
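As a rough illustration of the idea, here is a minimal sketch of a two-step pipeline tuned with grid search. The dataset, the choice of scaler and classifier, the step names "scaler" and "clf", and the grid of C values are all assumptions picked for the example, not a recommended setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Example data only: a small dataset bundled with scikit-learn.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0, stratify=y
)

# Each step is a (name, estimator) pair; the names are arbitrary labels.
pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters of a step are addressed as <step name>__<parameter name>.
param_grid = {"clf__C": [0.01, 0.1, 1, 10]}

# Because the scaler lives inside the pipeline, it is refit on the training
# part of every fold, so the validation part never leaks into preprocessing.
search = GridSearchCV(pipe, param_grid, cv=5)
search.fit(X_train, y_train)

best_C = search.best_params_["clf__C"]
test_accuracy = search.score(X_test, y_test)
print(best_C, round(test_accuracy, 3))
```

The whole workflow is fitted and scored through one object, which is what keeps the cross-validation honest: scaling the full dataset before splitting would let test-fold statistics influence training.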


Introduction to Neural Networks

What is a neural network?

A neural network is a group of algorithms that tries to recognize the underlying relationships in a set of data, in a way loosely modeled on the human brain. The network can adapt to changing input, so it produces the best possible result without the output criteria having to be redesigned. A neural network generally consists of three kinds of layers: an input layer, one or more hidden layers, and an output layer.
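To make the three layers concrete, here is a minimal sketch of a forward pass through a network with one hidden layer, written with NumPy. The layer sizes, the random weights, and the tanh and sigmoid activations are assumptions chosen for illustration; a real network would learn its weights during training.

```python
import numpy as np


def sigmoid(z):
    """Squash values into the open interval (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Input layer -> hidden layer (tanh) -> output layer (sigmoid)."""
    hidden = np.tanh(x @ W1 + b1)
    return sigmoid(hidden @ W2 + b2)


rng = np.random.default_rng(0)
n_inputs, n_hidden, n_outputs = 4, 3, 1  # arbitrary sizes for the example

# Untrained, randomly initialized weights and zero biases.
W1 = rng.normal(size=(n_inputs, n_hidden))
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_hidden, n_outputs))
b2 = np.zeros(n_outputs)

x = rng.normal(size=(5, n_inputs))  # five made-up samples
y_hat = forward(x, W1, b1, W2, b2)
print(y_hat.shape)  # one output per sample
```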


Introduction to Experimental Design - A/B Testing

Companies large and small continuously run experiments in order to stay competitive, attract new customers, retain current customers and, last but not least, increase revenue. Data scientists can help by evaluating experiments that test new or existing features, implementing those tests, drawing a conclusion about which features are better suited to the occasion, and providing recommendations that streamline decision making.
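One common way to evaluate such an experiment is a two-proportion z-test on conversion rates. The sketch below uses only the Python standard library; the function name and the visitor and conversion counts are made up for illustration, and the test itself is just one reasonable choice among several.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both variants convert equally.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: variant A converts 200 of 5000 visitors, B 250 of 5000.
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

A small p-value suggests the difference in conversion rate is unlikely to be chance alone, but the significance threshold and the sample size should be fixed before the experiment starts.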


Data Science process from Start to Finish

When I started my data science journey, I was in for a rude awakening. I dove neck deep into heavy statistics, new programming languages, and the detailed data science process. One thing I wish I had seen at the very beginning is the whole data science process worked through on an example dataset. That is exactly what I plan to do in this blog. I’ll show what a typical learner’s data science notebook looks like, with examples. The dataset I’m using is the Northwind database, a free, open-source dataset created by Microsoft that contains data from a fictional company.


The Importance of Preprocessing

According to a Forbes article, data preparation, cleaning, preprocessing, wrangling, or whatever term you use, takes up roughly 80% of a data scientist’s time.