Math Foundations for Data Scientists

Prepared by TULIP Lab


💡 Content

Designed primarily for aspiring data scientists, this course (aka unit) lays the groundwork in key areas such as linear algebra, matrix operations, probability theory, and statistical inference, culminating in an introduction to convex optimization.

The course begins with basic Linear Algebra, providing insights into vectors, matrices, and their operations, which are pivotal for understanding common data representations in data science. It progresses to Matrix Factorization, where students explore decomposition techniques such as QR and LU factorization, which underpin solving linear systems and least-squares problems. A significant focus is given to Singular Value Decomposition (SVD), a cornerstone technique for data compression and dimensionality reduction, often employed in Principal Component Analysis (PCA).
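As a brief illustration of why SVD serves compression and PCA, the relations below sketch the standard results. The symbols ($A$, $X$, $k$, $r$, $n$) are notation assumed here for illustration and are not taken from the course materials.

```latex
% Thin SVD of a rank-r matrix A, with singular values sorted in decreasing order
A = U \Sigma V^{\top} = \sum_{i=1}^{r} \sigma_i u_i v_i^{\top},
\qquad \sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0

% Rank-k truncation and its Frobenius-norm error (Eckart-Young theorem)
A_k = \sum_{i=1}^{k} \sigma_i u_i v_i^{\top},
\qquad \lVert A - A_k \rVert_F^2 = \sum_{i=k+1}^{r} \sigma_i^2

% PCA: for a column-centered data matrix X (n samples) with thin SVD X = U \Sigma V^{\top},
% the sample covariance is diagonalized by V
\frac{1}{n-1} X^{\top} X = V \, \frac{\Sigma^2}{n-1} \, V^{\top}
```

Keeping only the $k$ largest singular values therefore discards exactly the energy in the remaining ones, and the columns of $V$ are the principal directions of the centered data.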

In Probability Theory, the course covers fundamental concepts including random variables, distributions, and Bayes' Theorem, essential for modeling uncertainty in data science. The Statistical Inference module delves into estimation, hypothesis testing, and regression analysis, empowering students to draw informed conclusions from data.
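To make Bayes' Theorem concrete, here is a minimal sketch in Python. The function name `posterior` and the numbers (a 1% prior, 95% sensitivity, 5% false-positive rate) are hypothetical values chosen only for illustration, not figures from the course.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' theorem."""
    # Law of total probability: P(positive) summed over both hypotheses.
    evidence = sensitivity * prior + false_positive_rate * (1.0 - prior)
    # Bayes' theorem: P(condition | positive) = P(positive | condition) * P(condition) / P(positive)
    return sensitivity * prior / evidence


# Hypothetical numbers, for illustration only.
p = posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")
```

With these assumed numbers the posterior is roughly 16%, which shows how a low prior can dominate even a fairly accurate test.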

Finally, the course introduces Convex Optimization, a critical area in machine learning, focusing on optimization problems, gradient descent methods, and their applications in algorithmic development.
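The gradient descent idea can be sketched in a few lines of Python. The objective $f(x) = (x - 3)^2$, the step size, and the names `gradient_descent` and `grad_f` are assumptions made for this example; the course may use different formulations.

```python
def gradient_descent(grad, x0, step=0.1, iterations=100):
    """Minimize a differentiable function of one variable given its gradient."""
    x = x0
    for _ in range(iterations):
        # Move against the gradient, scaled by a fixed step size.
        x = x - step * grad(x)
    return x


def grad_f(x):
    """Gradient of the convex example objective f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


x_min = gradient_descent(grad_f, x0=0.0)
print(x_min)  # approaches the minimizer x = 3
```

For this quadratic, each iteration shrinks the distance to the minimizer by a constant factor of 0.8, so the iterates converge geometrically. A fixed step is used here for simplicity; a step that is too large would make the iterates diverge.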

This course is a blend of theory and practical application, aimed at providing a comprehensive mathematical toolkit for future data scientists.

📒 Sessions

Students will have access to a comprehensive range of subject materials, comprising slide handouts and relevant readings. Students are advised to begin each session by thoroughly reviewing the corresponding slide handouts and readings to gain a solid understanding of the content.

Additionally, students are encouraged to supplement their knowledge by conducting independent research, utilizing online resources or referring to textbooks that cover relevant information related to the topics under study.

🗓️ Session Plan

This unit requires a total of 48 class hours, including 40 hours of teaching and 10 hours of student presentation/discussion. The unit plan is as follows:

| 🔬 Session | 🏷️ Category | 📒 Topic | 🎯 ULOs |
|---|---|---|---|
| 0️⃣ | Preliminary | 📖 Induction | ULO1 |
| 1️⃣ | Core | 📖 Linear Model and Matrix Operations | ULO1 |
| 2️⃣ | Core | 📖 Matrix Factorization (I) | ULO1 |
| 3️⃣ | Core | 📖 Matrix Factorization (II) | ULO1 ULO2 |
| 4️⃣ | Core | 📖 Matrix Factorization (III) | ULO1 ULO2 |
| 🅰️ | Student Work | 📖 Selected Topics | ULO3 |
| 5️⃣ | Core | 📖 Prime to Probability | ULO1 ULO2 |
| 6️⃣ | Core | 📖 Independence | ULO1 ULO2 |
| 7️⃣ | Core | 📖 Random Variable | ULO1 ULO2 |
| 8️⃣ | Core | 📖 Stochastic Processes | ULO1 ULO2 |
| 9️⃣ | Core | 📖 Concentration Inequalities and LLN | ULO1 ULO2 |
| 🔟 | Core | 📖 Statistical Inference | ULO1 ULO2 ULO3 |
| 🅱️ | Student Work | 📖 Selected Topics | ULO3 |
| 🏆 | Advanced | 📖 Convex Optimization | ULO1 ULO2 |

🈵 Assessment

Each cohort may be assessed differently, depending on the specific requirements of your university.

The assessment of the unit mainly aims to measure students' achievement of the unit learning outcomes (ULOs, a.k.a. objectives) and to check their mastery of the theory and methods covered in the unit.

📖 Assessment Plan

The detailed assessment specification and marking rubrics can be found at: S00D-Assessment. The relationship between each assessment task and the ULOs is shown below:

| 🔬 Task | 👨‍🏫 Category | 🎯 ULO1 | 🎯 ULO2 | 🎯 ULO3 | Percentage |
|---|---|---|---|---|---|
| 1️⃣ | Presentation | 50% | 25% | 25% | 100% |

🗓️ Submission Due Dates

  • 2024 - The final submission of assessment files is due on 🗓️ Saturday, 18/05/2024 (tentative). All tasks are individual work (groups of one member only).

You are expected to submit each assessment component on time. Do not leave everything to the last moment, because we will provide feedback that you are expected to use in future assessments.

㊙️ If you find that you are having trouble meeting your deadlines, contact the Unit Chair.

👉 Contributors

Thanks go to these wonderful people 🌷
