Introduction to Data Science with Python

📍 Introduction to Programming and Data Science (程式設計與資料科學導論). This is the required introductory course of the NTU Specialization Program in Computational Linguistics (計算語言學領域專長).

LING 5505/142 U0860

Instructor: 謝舒凱 (Shu-Kai Hsieh)

Classroom: Room 104, Common Subjects Classroom Building (共 104)

Email: shukaihsieh@ntu.edu.tw

Time: Thursdays, 14:20-17:20


🤗 Teaching Assistants

| Name | Office Hours | Email |
| --- | --- | --- |
| 陳品而 (Una) | Tue and Wed, 14:00-16:00 (email at other times) | f10142001@ntu.edu.tw |
| 連大成 (Richard) | Mon and Wed, 16:00-18:00 (email at other times) | d08944019@ntu.edu.tw |
| 陳韋伶 (Linda) | Tue and Wed, 14:00-17:00 | d10142007@ntu.edu.tw |
| Amy/Joanne/Mia/Irene/Amber | In-class assistance | --- |


📜 Course Description

With the assistance of generative AI tools, this course guides students through the fundamentals of data science and the Python programming and computational skills that go with them.

What you'll learn in this class: Python basics, including data types, expressions, variables, and built-in data structures; program logic using branching, loops, functions, objects, and classes; and Python libraries such as pandas and NumPy, together with accessing web data through APIs and web scraping, for your first data science project.
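As a rough illustration of how the Python basics listed above fit together, here is a minimal sketch that counts words in a toy corpus. It is not taken from the course materials: the `TinyCorpus` class, its methods, and the sample sentences are all hypothetical and exist only for this example.

```python
from collections import Counter


class TinyCorpus:
    """A hypothetical container for a few sentences (illustration only)."""

    def __init__(self, sentences):
        # A built-in list of strings is the only state we keep.
        self.sentences = list(sentences)

    def tokens(self):
        """Yield lowercased, whitespace-separated tokens."""
        for sentence in self.sentences:  # loop over sentences
            for token in sentence.lower().split():
                yield token

    def word_counts(self, min_length=1):
        """Count tokens that have at least `min_length` characters."""
        counts = Counter()
        for token in self.tokens():
            if len(token) >= min_length:  # branching
                counts[token] += 1
        return counts


# Made-up sample data, not a course dataset.
corpus = TinyCorpus(["Data science is fun", "Python makes data science easier"])
counts = corpus.word_counts(min_length=4)
print(counts.most_common(2))
```

Whitespace splitting is a deliberately naive tokenizer here; real text usually needs more careful preprocessing, which is a later topic in the schedule.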


🗝️ Enrollment

Prerequisite(s): None

Recommended Preparation: Statistics 101


📚 Readings

  • McKinney, Wes. (2023). Python for Data Analysis, 3rd edition.
  • Crocker, Nathan B. (2023). The AI-Powered Developer. MEAP.
  • Guja, Artur, Marlena Siwiak, and Marian Siwiak. (2023). Generative AI for Data Analytics. MEAP.


Course Schedule

📍 Weekly class format: lecture + break + TA lab session

| Week | Date | Topic | Lab | HW |
| --- | --- | --- | --- | --- |
| 1 | September 7, 2023 | Orientation (slide) | Orientation (slide) | |
| 2 | September 14, 2023 | Introducing Data Science (slide) | Tools and Environment Setup (slide) | link |
| 3 | September 21, 2023 | Data and Corpus (slide) | Data Types; Basic Operations and Statements (slide) | link, ANS |
| 4 | September 28, 2023 | No class (Teachers' Day) | | |
| 5 | October 5, 2023 | More on Statements and File I/O (slide) | More on Data Types and Statements (slide) | link, ANS |
| 6 | October 12, 2023 | Functions, Modules and Packages (slide) | Pandas (slide) | link, ANS |
| 7 | October 19, 2023 | Text Manipulation (slide) | Advanced Pandas (slide) | link, ANS |
| 8 | October 26, 2023 | ✏️ Midterm | Command-line tutorial (slide); colab practice | |
| 9 | November 2, 2023 | Text Collection and Preprocessing (slide, colab) | Data Cleaning and Preparation (slide) | link, ANS |
| 10 | November 9, 2023 | Exploratory Data Analysis (slide, kaggle notebook) | Data Wrangling (slide) | link, ANS |
| 11 | November 16, 2023 | Exploratory Data Analysis (slide) | Plotting and Visualization (slide) | link, ANS |
| 12 | November 23, 2023 | Time Series | Data Aggregation and Group Operations (slide) | ANS |
| 13 | November 30, 2023 | ML/NLP (slide) | HTML, CSS, JS, & Web Scraping (slide) | |
| 14 | December 7, 2023 | ML/NLP (slide) | Demo Project Using Streamlit (slide) | |
| 15 | December 14, 2023 | Data Project | Plotting Animation + Uploading to Streamlit Cloud (slide) | |
| 16 | December 21, 2023 | 💯 Final Presentation | | |


🏆 Grading

Breakdown

  • Weekly individual assignments and in-class exercises: 40%
  • Midterm exam or mini-project: 30%
  • Final project (group work): 30%

Assignment Submission

Submit assignments to the course's GitHub Classroom.

Late Assignments

Late assignments are not accepted unless you have a legitimate reason for absence.


😥 Plagiarism

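In the age of AI it is hard to define what counts as plagiarism, but remember that learning is your own responsibility. Cutting corners just for the grade only hurts yourself.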


🧠 Final Project

The final project may be done in groups of up to four people. Oral presentations take place in Week 16; the written report and code are due in Week 17.