Haojian Jin

DSC 291 Privacy-sensitive Data Systems

Privacy is changing how we build data systems. Recent regulations (e.g., GDPR, CCPA) require developers to offer greater privacy protections. However, practitioners struggle to turn these high-level privacy principles into low-level code. This course will introduce widely adopted privacy principles, discuss the systems research associated with each principle, and use these principles to analyze a few real-world privacy-sensitive data systems (e.g., Apple AirTag, COVID-19 contact tracing systems). Students will learn privacy-relevant technologies and integrate perspectives that span product design, software development, cybersecurity, human-computer interaction, and business and legal considerations.

[Winter 2023 website]

DSC 204 Scalable Data Systems

Today, we have an overwhelming variety of data tools, including relational databases, NoSQL datastores, stream and batch processors, and message brokers. This class helps novice data scientists peer under the hood of the systems they will use and learn how to use and operate them more effectively.

[Spring 2023 website]

DSC 102 Systems for Scalable Analytics

This course covers the principles of computing systems and tools for scaling data analytics to large datasets. Topics include computer organization, memory hierarchy, basics of operating systems, scalable and parallel computing, cloud computing, design and use of parallel dataflow systems, and the use of deep learning tools. It will cover how relational algebra, SQL, linear algebra, and more general dataflow operations in such systems can be used to perform data preparation and feature engineering for machine learning (ML) at scale, how to scale ML training, how to perform ML model selection and deployment at scale, and how to handle data heterogeneity. It will also introduce the implementation of such data systems and touch upon the latest research in this space.

[Spring 2024 website] [Fall 2023 website]