- This course (unit) was originally designed for elite-class Bachelor's and research students at several leading Asia Pacific universities, including Hunan University, University of Chinese Academy of Sciences, Nanjing University of Science and Technology, Vellore Institute of Technology, and SRM Institute of Science & Technology, and has been offered since 2012.
- Materials in this course include resources collected from various open-source online repositories. You are free to use, change and distribute this package.
- If you find any issues or bugs on this site, please submit an issue at tulip-lab/math-foundation-for-data-scientists.
- Pull requests are welcome.
- Subsequent unit 👉 :
- Point of Contact 👉 : Prof. Gang Li
Prepared by TULIP Lab
Designed primarily for aspiring data scientists, this course (a.k.a. unit) lays the groundwork in key areas such as linear algebra, matrix operations, probability theory, and statistical inference, culminating in an introduction to convex optimization.
The course begins with basic Linear Algebra, covering vectors, matrices, and their operations, which underpin the way data is commonly represented in data science. It then progresses to Matrix Factorization, where students explore decomposition techniques such as QR and LU factorization, which are central to solving linear systems and least-squares problems. Particular attention is given to Singular Value Decomposition (SVD), a cornerstone technique for data compression and dimensionality reduction that is commonly used to compute Principal Component Analysis (PCA).
In Probability Theory, the course covers fundamental concepts including random variables, distributions, and Bayes' theorem, which are essential for modeling uncertainty in data science. The Statistical Inference module covers estimation, hypothesis testing, and regression analysis, enabling students to draw well-founded conclusions from data.
Finally, the course introduces Convex Optimization, a critical area in machine learning, focusing on optimization problems, gradient descent methods, and their applications in algorithmic development.
This course is a blend of theory and practical application, aimed at providing a comprehensive mathematical toolkit for future data scientists.
Students will have access to a comprehensive range of subject materials, comprising slide handouts and relevant readings. Students are advised to begin each session by reviewing the relevant slide handouts and readings to build a thorough understanding of the content.
Additionally, students are encouraged to supplement their knowledge through independent research, using online resources or textbooks that cover the topics under study.
This unit requires a total of 48 class hours, including 40 hours of teaching and 10 hours of student presentations/discussion. The unit plan is as follows:
| 🔬 Session | 🏷️ Category | 📒 Topic | 🎯 ULOs |
|---|---|---|---|
| 0️⃣ | Preliminary | 📖 Induction | ULO1 |
| 1️⃣ | Core | 📖 Linear Model and Matrix Operations | ULO1 |
| 2️⃣ | Core | 📖 Matrix Factorization (I) | ULO1 |
| 3️⃣ | Core | 📖 Matrix Factorization (II) | ULO1 ULO2 |
| 4️⃣ | Core | 📖 Matrix Factorization (III) | ULO1 ULO2 |
| | Student Work | 📖 Selected Topics | ULO3 |
| 5️⃣ | Core | 📖 Prime to Probability | ULO1 ULO2 |
| 6️⃣ | Core | 📖 Independence | ULO1 ULO2 |
| 7️⃣ | Core | 📖 Random Variable | ULO1 ULO2 |
| 8️⃣ | Core | 📖 Stochastic Processes | ULO1 ULO2 |
| 9️⃣ | Core | 📖 Concentration Inequalities and LLN | ULO1 ULO2 |
| 🔟 | Core | 📖 Statistical Inference | ULO1 ULO2 ULO3 |
| | Student Work | 📖 Selected Topics | ULO3 |
| 🏆 | Advanced | 📖 Convex Optimization | ULO1 ULO2 |
Assessment may differ between cohorts, depending on the specific requirements of each university.
The assessment is mainly designed to evaluate students' achievement of the unit learning outcomes (ULOs, a.k.a. objectives) and their mastery of the theories and methods covered in the unit.
The detailed assessment specification and marking rubrics can be found in S00D-Assessment. The relationship between each assessment task and the ULOs is shown below:
| 🔬 Task | 👨🏫 Category | 🎯 ULO1 | 🎯 ULO2 | 🎯 ULO3 | Percentage |
|---|---|---|---|---|---|
| 1️⃣ | Presentation | 50% | 25% | 25% | 100% |
- 2024: The due date for final assessment file submissions is 🗓️ Saturday, 18/05/2024 (tentative). All tasks are individual work (groups of one member only).
You are expected to submit each assessment component on time. Do not leave everything until the last moment, because the feedback we provide is meant to be applied in subsequent assessments.
㊙️ If you are having trouble meeting your deadlines, contact the Unit Chair.
This course recommends the following key references:
- Introduction to Linear Algebra, Gilbert Strang, MIT
- Probability and Statistics: The Science of Uncertainty, Michael J. Evans and Jeffrey S. Rosenthal, University of Toronto
Thanks go to these wonderful people 🌷
Made with contributors-img.