---
title: 1- Introduction
author: "Bilel MOULAHI"
date: "2023-03-21"
format: 
  html:
    code-fold: true
jupyter: python3
---

# Introduction

Welcome to our data science bootcamp, an intensive journey into the fascinating world of data science. Throughout this course, we'll delve into the essentials of machine learning and deep learning. We'll explore the Python programming language, along with popular data analysis libraries and frameworks that are indispensable to any data scientist.

Our course emphasizes a hands-on approach, enabling you to apply the concepts you've learned to real-world data sets through a series of guided projects. These projects are designed to reinforce your understanding and provide you with practical experience. 

In addition to these projects, the course is centered around a fundamental belief that to become a proficient data scientist, one must acquire a deeper and more comprehensive understanding of the models used on a daily basis. To achieve this deeper understanding, it is essential to build these models from the ground up, piece by piece, layer by layer. This is why coding models from scratch plays a significant role in our curriculum. By constructing each model yourself, you will gain invaluable insights into their inner workings and develop the ability to adapt them to various real-world scenarios with ease.

Our support extends beyond the bootcamp. To ensure your continued growth and success in data science, we offer a range of resources, including access to online courses, comprehensive documentation, and learning materials to further develop your skills.

The primary goal of this data science bootcamp is to equip you with a robust foundation in data science principles and techniques. We aim to empower you with the knowledge and practical skills needed to address the most challenging aspect of data science: productionizing your models.

Are you prepared to delve into the compelling world of data science? Join us as we embark on this journey together!

## Table of Contents
1. [Course Welcome and Overview](#course-welcome)
    1. [What to Expect and How We'll Navigate the Course](#expectations)
    1. [Expected Background and Prerequisites](#prerequisites)
    1. [Why Our Course Stands Out](#unique-course)
    1. [What Not to Expect from This Course](#not-expectations)
2. [Introducing Your Instructor](#instructor-introduction)
3. [Your Roadmap: The Course Outline](#course-outline)
4. [Diving into Key Concepts, Tools, and Projects](#key-concepts)
    1. [Essential Tools and Libraries (pandas, transformers, streamlit & gradio)](#tools-libraries)
    1. [Preview of Projects and Applications: A Sneak Peek](#projects-preview)
5. [Your Data Scientist Journey and How to Make the Most of It](#data-scientist-journey)
    1. [Recommended Course Path and Timeline](#course-path)
    1. [Navigating the Course: Tailoring Your Learning Experience with Animated Guidance](#course-navigation)
6. [Our Learning Approach: Course Methodology](#learning-approach)
7. [Accessing Course Resources and Code Repository on GitHub](#github-resources)
8. [Flashcards for Efficient Learning](#flashcards)
9. [Acknowledgements and References](#references)


## 1.1 Course Welcome and Overview (Duration: 3 minutes) <a id='course-welcome'></a>
 
Welcome to the course! In this section, we'll set the stage for your learning journey by providing an overview of the course content and structure. We'll also discuss what you can expect from this course, the prerequisites, and why our course stands out from others.

### 1.1.1 What to Expect and How We'll Navigate the Course <a name="expectations"></a>
This course provides a comprehensive guide to the fundamentals of machine learning and deep learning, covering both theory and practice. Our goal is to help you understand the entire process of creating data science projects, from business understanding to deployment. We aim to give you a solid foundation to advance in a data science career, whether you're a beginner or a practitioner in the field.

Throughout the course, we'll tackle:

- The distinctions between AI, machine learning, and deep learning
- Various job roles in the data field, such as data scientist, data analyst, and data engineer
- Key machine learning concepts and techniques
- Common data science tools and technologies
- Hands-on projects to gain practical experience
- Bridging the gap between data collection, training machine learning models, and deploying AI systems in the real world

Each part of the course will include:

- A problem statement
- Coding from scratch with clear explanations and mapping between theory and code
- Development using standard Python libraries, such as scikit-learn and PyTorch, along with practical examples

We'll start by providing a big-picture context for every concept and explain the underlying math behind the models. To help you gain a deeper understanding, we'll code the models from scratch. All code used in this course is available on GitHub, organized by branches to track your progress, and follows coding standards commonly used in data science.
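To give a flavor of what "coding a model from scratch" can look like, here is a minimal sketch of linear regression trained with gradient descent in plain Python. It is an illustration only, not the course's actual code: the function name `fit_line`, the learning rate, the number of epochs, and the toy data are all assumptions made for this example.

```python
# Illustrative sketch: fit y = w * x + b by gradient descent, no libraries.
# The name fit_line and all hyperparameters are assumptions for this example.
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y ≈ w * x + b by minimizing the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Prediction errors for the current parameters.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of MSE = (1/n) * sum(error^2) with respect to w and b.
        grad_w = (2.0 / n) * sum(e * x for e, x in zip(errors, xs))
        grad_b = (2.0 / n) * sum(errors)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (made up for this example).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w = {w:.3f}, b = {b:.3f}")
```

Each line of the loop maps to one piece of the math: the error term, the two partial derivatives of the loss, and the parameter update. A library such as scikit-learn hides these steps behind a single call, which is why writing them out once is useful.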

The course is organized into several weeks or modules, each focusing on a specific set of topics. In addition to the lectures, you'll have the opportunity to participate in hands-on activities, exercises, and projects, allowing you to apply what you've learned and gain practical experience.

The schedule for the course, including specific topics covered in each module and any deadlines or assignments, will be provided in a roadmap. This will help you plan your time accordingly and ensure you make the most of the thorough and practical learning experience this course offers. We hope that our course will serve as a valuable resource for you throughout your data science journey.

### 1.1.2 Expected Background and Prerequisites <a name="prerequisites"></a>
Machine learning practitioners and researchers come from diverse backgrounds, leading to a wide variety of methods and approaches in the field. While we strive to make our explanations as accessible as possible, there are certain prerequisites we assume students have before starting the course.

To fully benefit from this course, you should have:

- High school mathematics: A basic understanding of mathematical concepts, including algebra and probability, will be helpful in grasping machine learning algorithms and techniques.

- Basic Python and programming skills: Familiarity with Python programming and general programming concepts will enable you to effectively engage with the course materials and hands-on exercises.

Our aim is to make the content accessible and enjoyable for learners with diverse backgrounds. The course will contain pointers to material suitable for covering these topics if needed. Many techniques described in the course may exceed this basic background, but we will provide references to source material for deeper understanding when necessary. Throughout the course, we will use abbreviations such as ML (machine learning), NN (neural networks), IR (information retrieval), NLP (natural language processing), and CV (computer vision).

Additionally, you should have:

- Patience and time: Be prepared to invest time and effort into understanding complex concepts and applying them to real-world problems.
- A strong mathematical foundation (helpful, not required): It is possible to learn machine learning without a strong math background, but having one will significantly help when tackling complex real-world problems.

By meeting these prerequisites, you'll be well-equipped to make the most of the comprehensive and practical learning experience this course offers.

### 1.1.3 Why Our Course Stands Out <a name="unique-course"></a>

There are numerous very good courses out there, so what makes ours different? Let's break it down:

- Comprehensive coverage: Our course delves into both the practical, real-world applications of data science and the theory behind machine learning. We provide a clear understanding of the data science workflow, model development, and deployment, ensuring that students can create models from scratch with confidence.

- Top Python libraries: We focus on the most popular and widely used Python libraries for machine learning: PyTorch and scikit-learn. This ensures that you're learning the tools most commonly used in the industry.

- Clarity and simplicity: We take complex concepts and break them down into digestible chunks, making them easier for you to apply. Our course is designed to be accessible and accommodating for various learning styles.

- Emphasis on practical applications: We don't just cover theory. Our course shows you how to apply your knowledge to build and deploy models in real-world, industrialized products.

- A holistic approach: Our course offers a comprehensive overview of the field, drawing from a wide range of resources, including books, blog posts, tweets, and podcasts.


We strive to deliver a holistic, accessible, and practical learning experience that prepares you for success in the world of data science.

### 1.1.4 What Not to Expect from This Course <a name="not-expectations"></a>
Before diving in, it is crucial to understand what you should not expect from this course:

1. A quick path to mastering machine learning: This course is not designed to be a "learn machine learning in one week" type of program. It is an in-depth course that requires time and effort to build the skills necessary to succeed in the field.

2. A shortcut to success: This is not a course for those seeking instant gratification but rather a long-term learning journey that requires dedication and perseverance.

3. A static resource: The field of machine learning is constantly evolving, and this course is designed to keep up with those changes. As a result, the course material may be updated and adjusted as needed to ensure it remains relevant and up-to-date. This course serves as a personal guide and resource not only for your career but also for mine, and I am committed to maintaining its quality and value.

By understanding these points, you'll be better prepared to make the most of this course and have a realistic view of what to expect. With the right mindset and commitment, this course can provide you with the knowledge and skills needed to thrive in the machine learning field.



## 1.2 Introducing Your Instructor (Duration: 2 minutes) <a id='instructor-introduction'></a>

Hey there! I'm excited to be your instructor for this course, and I'd like to share a bit about my background, experience, and what you can expect from me as your guide through the world of data science and machine learning.

![shady.png](shady.png)

I earned my Doctorate in Computer Science from the University of Toulouse in 2015, with a focus on information retrieval. Throughout my academic and professional journey, I have published 21 scientific articles and have been lucky to attend several international conferences on information retrieval and machine learning.

Over the years, I've had the opportunity to lecture on various subjects like advanced algorithms, databases, information retrieval, and web development.

I've also worked on some pretty exciting projects, such as applying machine learning and sentiment analysis to detect and prevent suicide in social networks at LIRMM in Montpellier, France. This work was part of a project in collaboration with the University Hospital Center (CHU) of Montpellier.
As a research and development engineer at the University of Montpellier, I developed a mobile app for clinical evaluation, risk prediction, and intervention in suicide risk management. 
I've also had hands-on experience in the industry as a data scientist at ACELYS INFORMATIQUE and Akio, where I developed, experimented with, and industrialized machine learning and big data solutions.

I'm not just about work, though! I love staying up-to-date with the latest technologies in DevOps, back-end and front-end development, and machine learning frameworks. 

I'm here to provide you with a clear understanding of the concepts, tools, and techniques needed to succeed in the field. As an instructor, I not only aim to share my expertise and knowledge with you, but I also look forward to learning from you and this unique experience.

Let's embark on this exciting journey together!


## 1.3 Your Roadmap: The Course Outline (Duration: 4 minutes) <a id='course-outline'></a>

Here, we'll walk you through the course outline and provide a roadmap for your learning journey. This will give you a clear understanding of the topics we'll cover and the order in which they'll be presented, helping you make the most of your time with us.

![./roadmap.png](./roadmap.png)


## 1.4 Diving into Key Concepts, Tools, and Projects (Duration: 7 minutes) <a id='key-concepts'></a>

In this section, we'll provide an overview of the key concepts, tools, and libraries we'll be using throughout the course. We'll also give you a sneak peek at the exciting projects and applications we'll be working on, showcasing the practical applications of the concepts you'll learn.


### 1.4.1 Essential Tools and Libraries (pandas, transformers, streamlit & gradio) <a name="tools-libraries"></a>

Throughout this course, we'll be using several powerful tools and libraries to help you understand and implement machine learning and deep learning concepts. Some of the essential libraries we'll be working with include:

- Pandas: A powerful data manipulation and analysis library, providing data structures and functions needed to work with structured data.
- Scikit-learn: A machine learning library that features various classification, regression, and clustering algorithms.
- PyTorch: An open-source deep learning platform that provides a flexible and efficient way to build, train, and deploy neural networks.
- Transformers: A state-of-the-art NLP library that provides pre-trained models and architectures for various natural language processing tasks.
- Streamlit: A Python library that allows you to quickly and easily create custom web applications to showcase your data, models, and visualizations.
- Gradio: A user-friendly library for creating interactive demos and web applications, enabling you to build and share your machine learning models with minimal effort.
- FastAPI: A modern, fast web framework for building APIs with Python, based on standard Python type hints.
- Docker: A platform that enables developers to automate the deployment of applications inside lightweight, portable containers.

These libraries and tools will be instrumental in bringing our projects to life, and we'll dive deeper into their functionality as we progress through the course.

### 1.4.2 Preview of Projects and Applications: A Sneak Peek <a name="projects-preview"></a>
Throughout this course, we'll be working on a variety of exciting projects and applications that demonstrate the real-world potential of machine learning and deep learning. We'll provide snapshots, demos, and videos of the projects up and running, so you can see the practical applications of the concepts you'll learn.

We'll cover machine learning and deep learning from scratch. Some examples include:

- tinygrad: A minimalistic tensor library that demonstrates the core concepts of machine learning, such as tensor operations and derivatives.
- micrograd: A simple autograd engine and neural network library for educational purposes.
- makemore: A project that showcases the creation of new data samples using generative models.
- fastai: A library that simplifies the training of neural networks using modern best practices.

We'll explore topics ranging from gradient descent and backpropagation to transformers and stable diffusion from scratch.
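As a small taste of backpropagation built from scratch, the sketch below implements reverse-mode automatic differentiation for scalars, supporting only addition and multiplication. It is written in the spirit of educational autograd engines and is not taken from any of the libraries above: the class name `Value` and its attributes are chosen for this illustration.

```python
# Illustrative sketch of scalar reverse-mode autodiff (add and mul only).
# The Value class and its attribute names are assumptions for this example.
class Value:
    """A scalar that remembers how it was computed so gradients can flow back."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._backward = lambda: None  # replaced by the op that creates this node

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a + b)/da = 1 and d(a + b)/db = 1
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # d(a * b)/da = b and d(a * b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Build a topological order so that each node's gradient is complete
        # before it is propagated to its parents.
        order, seen = [], set()

        def visit(node):
            if node not in seen:
                seen.add(node)
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0  # df/df = 1
        for node in reversed(order):
            node._backward()


# f(x, y) = x * y + x, so df/dx = y + 1 and df/dy = x.
x, y = Value(3.0), Value(4.0)
f = x * y + x
f.backward()
print(f.data, x.grad, y.grad)
```

Note that gradients are accumulated with `+=` rather than assigned: `x` is used twice in `f`, and the chain rule requires summing the contributions from both uses.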

To make the learning experience interactive and engaging, we'll be using Gradio and fastpages to host demos and showcase projects, both as standalone web applications and integrated into course materials. This will allow you to experiment with the tools and techniques we discuss in the course in real time.

Additionally, we'll use Quarto to convert the course materials to HTML, making them easily accessible and interactive. Interactive widgets will also be incorporated to enhance your understanding of complex concepts.

Lastly, we'll use the debugger of an IDE such as IntelliJ to step through algorithms line by line, providing a deeper understanding of the inner workings of machine learning and deep learning techniques.



## 1.5 Your Data Scientist Journey and How to Make the Most of It (Duration: 3 minutes) <a id='data-scientist-journey'></a>

We understand that every learner is unique, so we'll provide recommendations on how to tailor your learning experience and make the most of this course. We'll share insights on the recommended course path, timeline, and how to navigate through the content based on your background and goals.

### 1.5.1 Recommended Course Path and Timeline <a name="course-path"></a>

### 1.5.2 Navigating the Course: Tailoring Your Learning Experience with Animated Guidance <a name="course-navigation"></a>


## 1.6 Our Learning Approach: Course Methodology (Duration: 4 minutes) <a id='learning-approach'></a>

This section will introduce our course methodology and pedagogical approach. We believe in a hands-on, practical learning experience that encourages active engagement, and we'll share how our teaching methods support this philosophy.


In the development of this data science course, we have adhered to a carefully designed methodology to guarantee that the material is comprehensive, accurate, and up-to-date.

Initially, we conducted an extensive review of the current state of the field to pinpoint the key concepts and techniques that are most pertinent and crucial for students to learn. Subsequently, we devised a curriculum that addresses these concepts and techniques in a logical and coherent manner, making sure that each topic builds upon the foundation of previous topics.

We then chose a variety of teaching methods, such as lectures, discussions, hands-on activities, and projects, to facilitate students' learning and application of the material in a meaningful way. To supplement the lectures and offer additional learning opportunities, we incorporated various resources and materials, including textbooks, videos, and online tutorials.

Throughout the course, we will closely monitor student progress and deliver feedback and support to ensure that everyone has a positive and successful learning experience. We are dedicated to providing the highest quality education and continuously update and enhance the course to keep it current and relevant.

We trust that this methodology will assist you in gaining a profound understanding of data science and help you develop the skills necessary to excel in the field.


## 1.7 Accessing Course Resources and Code Repository on GitHub (Duration: 2 minutes) <a id='github-resources'></a>

In this section, we'll introduce the GitHub repository that contains all the course materials, including code snippets, Jupyter notebooks, and datasets. We'll show you how to access these resources, clone the repository, and keep it up-to-date throughout the course.


## 1.8 Flashcards for Efficient Learning (Duration: 1 minute) <a id='flashcards'></a>

Learning smarter is key to success in this course. In this section, we'll introduce flashcards that you can use to review and reinforce the key concepts covered in the course. These flashcards will help you retain information and make the most of your learning experience.


## 1.9 Acknowledgements and References (Duration: 1 minute) <a id='references'></a>

Finally, we'll conclude the introduction by acknowledging the work of others who have inspired and informed our course content. We'll also provide a list of references and resources that you can consult to further your understanding of the topics covered in this course.


In developing this data science course, we want to express our gratitude and acknowledge the invaluable contributions made by various experts, institutions, and resources in the field. Their work has helped shape our understanding and approach to data science, machine learning, and deep learning.

We would like to extend our appreciation to Jeremy Howard, co-founder of fast.ai, for his groundbreaking work in making deep learning more accessible through his courses and the fastai library. His dedication to teaching and simplifying complex concepts has inspired our approach to this course.

We are also grateful for the excellent foundational material provided by the Stanford University course on Machine Learning, taught by Andrew Ng. This course has been instrumental in introducing countless students to the world of machine learning and has influenced our curriculum development.

Sebastian Raschka's book "Python Machine Learning" has been a key resource in understanding the practical implementation of machine learning algorithms using Python. We have incorporated some of his insights and techniques in our course material.

We cannot overlook the numerous books, YouTube videos, open-source projects on GitHub, and blog posts that have played a role in shaping our understanding and knowledge of the subject matter. These resources have provided diverse perspectives and best practices that we have integrated into our course.

Our course is a result of these collective efforts, and we have made every effort to bring together the most relevant and up-to-date content to provide a comprehensive learning experience. We hope that this course will help you gain a deep understanding of data science, machine learning, and deep learning, and empower you to excel in the field.

If you find any content within the course that requires proper attribution or reference, please do not hesitate to contact us so that we can make the necessary updates and ensure appropriate credit is given.