MA-INF 4115: INTRODUCTION TO NATURAL LANGUAGE PROCESSING

Winter Semester 2023 – 2024

Content:

What is the Introduction to NLP course about?

This course provides a technical perspective on NLP: methods for building computer software that understands and manipulates human language. Contemporary data-driven approaches are emphasized, with a focus on machine learning techniques. The covered applications vary in complexity and include, for example, Entity Recognition, Argument Mining, and Emotion Analysis.
 

Through lectures, exercises, and a final project, you will gain a thorough introduction to cutting-edge research in NLP, from the linguistic basis of computational language methods to recent advances in deep learning and large language models.
 

Recommended participation requirements:

  • Basic programming knowledge in Python
  • Basics of Machine Learning
  • Basic knowledge of Python Libraries for ML (NumPy, Scikit-Learn, Pandas)
  • Basics of Probability, Linear Algebra and Statistics

Logistics:

  • Lectures: Thursdays, 10:15 AM - 11:45 AM, in B-IT-Max 0.109 (Friedrich-Hirzebruch-Allee 6). ZOOM LINK
  • Exercises: Wednesdays in B-IT-Max 0.109. Choose one of the following exercise groups to attend. ZOOM LINK
    • Group 1: 2:15 PM - 3:45 PM (Vahid)
    • Group 2: 4:00 PM - 5:30 PM (Ulvi)
  • Course Materials: Uploaded to eCampus every week.
  • Contact: Students should ask all course-related questions in our discussion forum on eCampus. For external inquiries, emergencies, or personal matters, you can email us at itnlp.uni.bonn(at)gmail.com.
  • Office Hours: Please contact us by email first to arrange an in-person meeting.
    • Prof. Dr. Lucie Flek: Friedrich-Hirzebruch-Allee 6 (B-IT) – Room: 2.123
    • Vahid Sadiri Javadi: Friedrich-Hirzebruch-Allee 6 (B-IT) – Room: 2.120

NEWS / UPDATES:

  • 21.11.2023: Both Q & A exercises (Group 1 & 2) on 28.11 will take place at 2:15 PM.
  • 20.11.2023: The exercise (Group 2) on 22.11 has been rescheduled to 2:15 PM.
  • 26.10.2023: The exercise (Group 1) on 15.11 will be held ONLY ONLINE ON ZOOM!
  • 17.10.2023: The first exercise starts on Wednesday, 25.10.2023 at 2:15 PM.
  • 17.10.2023: The first lecture starts on Thursday, 26.10.2023 at 10:15 AM.

Instructors:

Prof. Dr. Lucie Flek

flek(at)bit.uni-bonn.de

Head of CAISA Lab

Vahid Sadiri Javadi

vahidsj(at)bit.uni-bonn.de

Course Coordinator

Teaching Assistants:

Farizeh Aldabbas

farizeh(at)uni-bonn.de

Ulvi Shukurzade

ulvi(at)uni-bonn.de


Coursework:

Assignments (Prerequisite for the exam):

Assignments will be uploaded to eCampus.

  • Credits:
    • Assignment 1 (10%): Word Operations
    • Assignment 2 (20%): Text Classification (Scikit-Learn)
    • Assignment 3 (20%): Word Vectors (SpaCy)
    • Assignment 4 (30%): Fine-tuning with LLMs (Hugging Face)
    • Assignment 5 (20%): Hidden Markov Model
  • Deadlines: All assignments are due at 11:59 PM on the Tuesday before the exercise class. All deadlines are listed in the schedule.
  • Submission: Assignments should be submitted via eCampus. Further instructions are given in each assignment file. Please do not email us your assignments.
  • Collaboration: Working on assignments in groups of 2 students is allowed. Please name your file with both student names. File name: <FirstName_LastName>
  • Grade / Feedback: You will receive your graded assignment every week on eCampus.
    **NOTE:** You need to achieve at least 50% of the points to be allowed to take the exam.

Final Project (40%):

  • Project Types: Students choose one of the following project types
    • Default Project: Students choose one of the datasets we listed here (new, interesting datasets are welcome, but you need to contact Vahid beforehand), formulate a real-world problem (PF), and try to solve it (PS) by training a model or fine-tuning a pre-trained LLM.
      Submission: [Code + Report for final results]
    • Resource Creation Project: To answer the question of how to generate an NLP dataset from any internet source, students design a pipeline to build and annotate a dataset. They should define at least one downstream NLP task for their dataset.
      Submission: [Crawling script + Dataset + Report for final results]
    • Robustness and Reproducibility Project: To measure the ability of a model or an NLP system to perform consistently and accurately across a wide range of inputs and conditions, students collect and annotate an evaluation set in a new domain with 100 – 200 instances and test at least two existing models (e.g., from GitHub) with the new evaluation set.
      Submission: [Crawling script + Code + Evaluation set + Report for final results]
  • Submission: Depending on which project type the students choose, they submit each project component on eCampus in the following format:
    • PF: A PDF file with this name: Team_<Team number>.pdf
    • PP: A PDF file with this name: Team_<Team number>.pdf
    • PS + PR: A ZIP file containing all the necessary files with this name: Team_<Team number>.zip
  • Deadlines: All deadlines for PF, PP, and PS + PR are listed in the schedule.
  • Mentors: Every team has a mentor, who gives feedback and advice during the project.
  • Computing resources:
    • CS Faculty: You can add your Student ID to this list. GSG will provide you with additional computing resources on behalf of the CAISA lab.
    • Saturn Cloud: You can use 150 free hours per month on instances with 64 GB RAM and GPUs. Check this out.
    • Google Colaboratory: Colab is a hosted Jupyter Notebook service that provides free access to computing resources, including GPUs and TPUs. Check this out.
  • Using external resources: You can use any machine learning or deep learning framework you like (Scikit-learn, PyTorch, TensorFlow, etc.). You may use any existing code, libraries, etc., and consult papers, books, online references, etc. for your project. However, you must cite your sources in your final project report.
  • Team:
    • Team size: Students should do final projects in teams of 3 to 5 people. Larger teams are expected to do correspondingly larger projects.
    • Building a team: You can either find your teammates on your own or ask us to find teammates for you. You may join the CS Master Bonn Discord Server.
    • Submission:
      • Please send us the list of your team members via itnlp.uni.bonn(at)gmail.com in the following format:
        Subject: ITNLP - WS2023 - <Matr. Nr.>
        Team Speaker:   <Name>, <Matr. Nr.>, <Mail Addr.>
        Team Members: <Name>, <Matr. Nr.>, <Mail Addr.>
        <Name>, <Matr. Nr.>, <Mail Addr.>
         
      • If you need a teammate, please email us at itnlp.uni.bonn(at)gmail.com.
        Subject: ITNLP - WS2023 - Looking for a team
        <Name>, <Matr. Nr.>, <Mail Addr.>
  • Deadline: Listed in the schedule.
  • Contribution: In the final report we ask for a statement of what each team member contributed to the project. Team members will typically get the same grade, but we may differentiate in extreme cases of unequal contribution. You can contact us in confidence in the event of unequal contribution.

Exam (60%):

  • Exam dates: will be announced as soon as we receive the rooms and dates from the examination office.
  • Allowed material: A calculator is permitted.

Allocation:

  • 3 + 1 SWS
  • Master in Media Informatics: 6 ECTS credits
  • Master in Computer Science at the University of Bonn: MA-INF 4115, 6 CP
  • Students must register for the exam on POS/BASIS.

Literature:

  • J. Eisenstein: Introduction to Natural Language Processing
  • D. Jurafsky, J. H. Martin: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
  • S. Bird, E. Klein, E. Loper: Natural Language Processing with Python

Schedule:

| Week | Date | Description | Events | Deadlines |
|---|---|---|---|---|
| Week 1 | Lecture (Thu Oct 26) | | | |
| | Exercise (Wed Oct 25) | Introduction & Python basics | | |
| Week 2 | Lecture (Thu Nov 2) | | | |
| | Exercise (Wed Nov 1) | HOLIDAY | | |
| Week 3 | Lecture (Thu Nov 9) | | | |
| | Exercise (Wed Nov 8) | Word operations and feature extraction using Pandas & Sklearn | Assignment 1 OUT | Team Members DUE |
| Week 4 | Lecture (Thu Nov 16) | | | |
| | Exercise (Wed Nov 15) | Linear classification using TF-IDF | Assignment 2 OUT | Assignment 1 DUE |
| Week 5 | Lecture (Thu Nov 23) | | | |
| | Exercise (Wed Nov 22) | Word embeddings using spaCy | Assignment 3 OUT | Assignment 2 DUE |
| Week 6 | Lecture (Thu Nov 30) | | | |
| | Exercise (Wed Nov 29) | Q & A: PF + PS | | Problem Formulation DUE |
| Week 7 | Lecture (Thu Dec 7) | | | |
| | Exercise (Wed Dec 6) | Dies academicus (No Exercise) | | Assignment 3 DUE |
| Week 8 | Lecture (Thu Dec 14) | | | |
| | Exercise (Wed Dec 13) | Transformers and Generative Models I | | |
| Week 9 | Lecture (Thu Dec 21) | | | |
| | Exercise (Wed Dec 20) | Transformers and Generative Models II | Assignment 4 OUT | |
| Week 10 | Lecture (Thu Jan 11) | | | |
| | Exercise (Wed Jan 10) | POS tagging & HMMs | Assignment 5 OUT | Assignment 4 DUE (02.01.24) |
| Week 11 | Lecture (Thu Jan 18) | | | |
| | Exercise (Wed Jan 17) | Project development | | Assignment 5 DUE |
| Week 12 | Lecture (Thu Jan 25) | | | Poster DUE |
| | Exercise (Wed Jan 24) | Project development | | |
| Week 13 | Lecture (Thu Feb 01) | | | |
| | Exercise (Wed Jan 31) | Project Presentation (Poster) | | |