Introduction to Data Science with Python
📍 This is the required introductory course of the NTU Specialization Program in Computational Linguistics.
LING 5505/142 U0860
Instructor: 謝舒凱
Classroom: Room 104, Common Subjects Classroom Building (共 104)
Email: shukaihsieh@ntu.edu.tw
Time: Thursdays 14:20-17:20
🤗 Teaching Assistants
Name | Office Hours | Email |
---|---|---|
陳品而 (Una) | Tue, Wed 14:00-16:00 (other times by email) | f10142001@ntu.edu.tw |
連大成 (Richard) | Mon, Wed 16:00-18:00 (other times by email) | d08944019@ntu.edu.tw |
陳韋伶 (Linda) | Tue, Wed 14:00-17:00 | d10142007@ntu.edu.tw |
Amy/Joanne/Mia/Irene/Amber | In-class assistance | --- |
📜 Course Description
This course uses generative AI tools to help students learn the fundamentals of data science, along with the relevant Python programming and computational skills.
What you'll learn in this class: Python basics, including data types, expressions, variables, and built-in data structures; programming logic using branching, loops, functions, objects, and classes; and Python libraries such as Pandas and NumPy, together with accessing web data through APIs and web scraping, for your first data science project.
🗝️ Enrollment
Prerequisite(s): None
Recommended Preparation: Statistics 101
📚 Readings
- McKinney, Wes (2023). Python for Data Analysis, 3rd ed. O'Reilly.
- Crocker, Nathan B. (2023). The AI-Powered Developer. Manning (MEAP).
- Guja, Artur, Marlena Siwiak, and Marian Siwiak (2023). Generative AI for Data Analytics. Manning (MEAP).
Course Schedule
📍 Weekly format: lecture + break + TA lab session
Week | Date | Topic | Lab | HW |
---|---|---|---|---|
1 | September 7, 2023 | Orientation slide | Orientation slide | |
2 | September 14, 2023 | Introducing Data Science slide | Tools and Environment Setup slide | link |
3 | September 21, 2023 | Data and Corpus slide | Data Type; Basic Operations and Statements slide | link ANS |
4 | September 28, 2023 | No class (Teachers' Day holiday) | | |
5 | October 5, 2023 | More on Statements and File I/O slide | More on Data Types and statements slide | link ANS |
6 | October 12, 2023 | Functions, Modules and Packages slide | Pandas slide | link ANS |
7 | October 19, 2023 | Text manipulation slide | Advanced Pandas slide | link ANS |
8 | October 26, 2023 | ✏️ Midterm | Command-line tutorial slide; Colab practice | |
9 | November 2, 2023 | Text collection and preprocessing slide Colab | Data Cleaning and Preparation slide | link ANS |
10 | November 9, 2023 | Exploratory Data Analysis slide Kaggle notebook | Data Wrangling slide | link ANS |
11 | November 16, 2023 | Exploratory data analysis slide | Plotting and Visualization slide | link ANS |
12 | November 23, 2023 | Time Series | Data aggregation and Group operations slide | ANS |
13 | November 30, 2023 | ML/NLP slide | HTML, CSS, JS, & Web Scraping slide | |
14 | December 7, 2023 | ML/NLP slide | Demo Project Using Streamlit slide | |
15 | December 14, 2023 | Data Project | Plotting Animation + Uploading to Streamlit Cloud slide | |
16 | December 21, 2023 | 💯 Final Presentation | | |
🏆 Grading
Breakdown
- Weekly individual assignments and in-class exercises: 40%
- Midterm exam or mini-project: 30%
- Final project (group work): 30%
Assignment Submission
Submit assignments to the course's GitHub Classroom.
Late Assignments
Late assignments are not accepted unless you have a valid, excused reason for absence.
😥 Plagiarism
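In the age of AI, plagiarism is hard to define, but remember that your learning is your own responsibility. Cutting corners just for a grade only hurts you.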
🧠 Final Project
The final project may be done in groups of up to 4 people. Oral presentations take place in Week 16; the written report and code are due in Week 17.