Giovanni Bricconi

IBM Data Science training

Learn Python, Pandas, Scipy and Matplotlib quickly, with many practical examples.

For a while I have been reading about neural networks, mainly research papers and books. This has been very useful: I learned a lot about successful architectures and how they really work, and it helped me understand a bit of what is behind ChatGPT and other deep learning tools.

It is good to have the theory, but at some point you also want to experiment a bit and see what you can do by yourself. This seems difficult with deep neural networks, because you have to pay for long training runs and do many experiments with different hyperparameters. I also realized that I lack experience with many of the tools involved. I come from C++ and Java development, whereas most machine learning tools are Python based, run in Jupyter notebooks, and require math concepts I do not have.

For these reasons I started looking for training material. Having had a good experience with Coursera, I decided to pay out of my own pocket for one of the trainings available there: IBM Data Science. It is actually a group of ten small courses:

  • Machine Learning with Python
  • Tools for Data Science
  • Python Project for Data Science
  • Python for Data Science, AI & Development
  • Applied Data Science Capstone
  • Data Visualization with Python
  • Data Science Methodology
  • What is Data Science?
  • Data Analysis with Python
  • Databases and SQL for Data Science with Python

Some people will not be happy to see the IBM name, thinking it is all about IBM tools and services. This is not true: I did all the courses, and the IBM-specific parts are very limited. DB2 and IBM Watson tools come up during the training, but always as optional modules and in a non-invasive way. If you want, you can do nearly all the labs with a local Jupyter instance on your laptop. In the end they have been really fair in how they present the content, and all the labs are well implemented. The cloud environment they provide worked smoothly and was reliable.

The training is full of closed-question quizzes: you have to listen carefully to the videos, but the questions are not very difficult. The whole training took me roughly 3 weeks at 8 hours a day. Of course it is a remote training and you can follow it little by little whenever you have time, but it is definitely not a small, quick training!

You can start with limited Python knowledge, and little by little you become proficient with Pandas, SciPy, NumPy, Dash, Jupyter… It has been very useful to me. The content is mainly practical; I still need to find something more on the math needed to evaluate the quality of a model.

The final course, “Applied Data Science Capstone”, puts all the puzzle pieces together and makes you work on a case study where you will: do data collection, data wrangling, exploratory data analysis and data visualization; write Python code to create machine learning models, including support vector machines and decision trees; evaluate the results of those models for predictive analysis; and compare their strengths and weaknesses.
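
To give an idea of what the modelling part of such a case study can look like, here is a minimal sketch of my own (it is not taken from the course material). It assumes scikit-learn is installed and uses a synthetic dataset from `make_classification` as a stand-in for the real case-study data; the feature counts, the `max_depth` value and the choice of accuracy as the metric are all illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (assumption): the real capstone dataset is not used here.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=0
)

# Hold out 20% of the rows to evaluate the models on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Two of the model families mentioned above. The SVM is scale sensitive,
# so it is wrapped in a pipeline that standardizes the features first.
models = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
}

# Fit each model and record its accuracy on the held-out set.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

# Print the models from best to worst accuracy.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: accuracy = {score:.3f}")
```

Accuracy on a single split is only a first look; comparing strengths and weaknesses properly would also need things like cross-validation and a confusion matrix.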

If you want to quickly learn machine learning tools like Python, Pandas, SciPy and Matplotlib, IBM Data Science is definitely something to try.

Written by Giovanni

January 7, 2024 at 11:21 am