Tutorial: Data Analysis and Predictive Modeling in Python

Speaker: Andressa Sivolella
ThoughtWorks Brasil


Over the past few years, thanks to strong community support, Python has gained dedicated libraries for data analysis and predictive modeling. The main reason for this is that Python is easy to learn and integrates well with other data platforms and tools, such as Spark and Hadoop.

This workshop will guide you through the main steps of data science (reading, analyzing, visualizing and making predictions), split into:

  1. Exploratory analysis: the first step is to explore the available dataset.
  2. Performing data munging: the dataset will probably contain some issues, such as missing values or outliers (samples whose values differ markedly from the rest). These cases must be dealt with before moving on.
  3. Building a predictive model: after data munging, the dataset is clean and ready for building a predictive model. At this step, a classifier or regressor can be trained and its performance evaluated with the scikit-learn library.
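The three steps above can be sketched end to end with pandas and scikit-learn. This is a minimal sketch under stated assumptions, not the workshop's own material: the dataset is a synthetic, hypothetical toy table with made-up `age`, `income` and `target` columns; median imputation and the 1.5 * IQR rule are just one common way to handle missing values and outliers; and logistic regression stands in for whichever classifier the workshop actually uses. The helper names `explore`, `munge` and `fit_and_score` are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical toy dataset (synthetic, for illustration only).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.normal(40, 10, n),
    "income": rng.normal(50_000, 15_000, n),
})
# The label is a simple function of the features, so there is something to learn.
df["target"] = (df["income"] + 1_000 * (df["age"] - 40) > 50_000).astype(int)
# Inject the kinds of issues step 2 deals with: missing values and an outlier.
df.loc[[3, 17, 42], "age"] = np.nan
df.loc[10, "income"] = 5_000_000


def explore(frame: pd.DataFrame) -> None:
    """Step 1: exploratory analysis (size, summary statistics, missing values)."""
    print(frame.shape)
    print(frame.describe())
    print(frame.isna().sum())


def munge(frame: pd.DataFrame) -> pd.DataFrame:
    """Step 2: data munging on a copy, leaving the raw frame untouched."""
    clean = frame.copy()
    # Fill missing ages with the column median.
    clean["age"] = clean["age"].fillna(clean["age"].median())
    # Keep only rows whose income lies inside the 1.5 * IQR fences.
    q1, q3 = clean["income"].quantile([0.25, 0.75])
    iqr = q3 - q1
    inside = clean["income"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return clean[inside]


def fit_and_score(frame: pd.DataFrame) -> float:
    """Step 3: train a classifier and report its accuracy on a held-out split."""
    X = frame[["age", "income"]]
    y = frame["target"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0, stratify=y
    )
    # Scale features first: income and age live on very different ranges.
    model = make_pipeline(StandardScaler(), LogisticRegression())
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    explore(df)
    clean = munge(df)
    print(f"rows kept after munging: {len(clean)} of {len(df)}")
    print(f"test accuracy: {fit_and_score(clean):.2f}")
```

For simplicity, the median here is computed on the whole dataset before splitting. In a real project the imputation should be fitted on the training split only (for example with scikit-learn's `SimpleImputer` placed inside the pipeline) so that no information leaks from the test set.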