MA-INF 4115: INTRODUCTION TO NATURAL LANGUAGE PROCESSING

Winter Semester 2024 – 2025

Content:

What is the Introduction to NLP course about?

This course provides a technical perspective on NLP: methods for building computer software that understands and manipulates human language. It emphasizes contemporary data-driven approaches, with a focus on machine learning techniques. The applications covered vary in complexity and include, for example, Entity Recognition, Argument Mining, and Emotion Analysis.
 

Through lectures, exercises, and a final project, you will gain a thorough introduction to cutting-edge research in NLP, from the linguistic basis of computational language methods to recent advances in deep learning and large language models.
 

Recommended participation requirements:

  • Basic programming knowledge in Python
  • Basics of Machine Learning
  • Basic knowledge of Python Libraries for ML (NumPy, Scikit-Learn, Pandas)
  • Basics of Probability, Linear Algebra and Statistics

Logistics:

  • Lectures: Thursdays, 10:15 AM - 11:45 AM, in B-IT-Max 0.109 (Friedrich-Hirzebruch-Allee 6). [Zoom link]
  • Exercises: Wednesdays in B-IT-Max 0.109. Choose one of the following exercise groups to attend. [Zoom link]
    • Group 1: 2:15 PM - 3:45 PM (Vahid)
    • Group 2: 4:15 PM - 5:45 PM (David)
  • Course Materials: Uploaded to eCampus every week.
  • Contact: Please ask all course-related questions in the discussion forum on eCampus. For external inquiries, emergencies, or personal matters, you can email us at itnlp.uni.bonn(at)gmail.com.
  • Office Hours: Please contact us by email first to arrange an in-person meeting.
    • Prof. Dr. Lucie Flek: Friedrich-Hirzebruch-Allee 6 (B-IT) – Room: 2.123
    • Vahid Sadiri Javadi: Friedrich-Hirzebruch-Allee 6 (B-IT) – Room: 2.126
    • David Kaczér: Friedrich-Hirzebruch-Allee 6 (B-IT) – Room: 2.120

NEWS / UPDATES:

  • 18.10.2024: The first exercise starts on Wednesday, 23.10.2024 at 2:15 PM.
  • 18.10.2024: The first lecture starts on Thursday, 24.10.2024 at 10:15 AM.

Instructors:

Prof. Dr. Lucie Flek

flek(at)bit.uni-bonn.de

Head of CAISA Lab

Vahid Sadiri Javadi

vahidsj(at)bit.uni-bonn.de

Course Coordinator

David Kaczér

dkaczer(at)uni-bonn.de

Exercise Instructor

Teaching Assistants:


Coursework:

Assignments (Prerequisite for the exam):

Assignments will be uploaded to eCampus.

  • Credits:
    • Assignment 1 (25%): Word Operations & Text Classification
    • Assignment 2 (20%): Word Vectors
    • Assignment 3 (30%): Fine-tuning with LLMs
    • Assignment 4 (25%): Result Analysis
       
  • Deadlines: Each assignment is due at 11:59 PM on the Tuesday before the exercise class. All deadlines are listed in the schedule.
  • Submission: Submit assignments via eCampus. Further instructions are given in each assignment file. Please do not email us your assignments.
  • Collaboration: You may work on assignments in groups of 2 students. Please name your file with both students' names. File name: <FirstName_LastName>
  • Grade / Feedback: You will receive your graded assignment every week on eCampus.
    **NOTE:** You need to achieve at least 50% of the points to be allowed to take the exam.
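
The weights and the 50% requirement above can be combined into a quick self-check. The following is a minimal sketch in Python, not the official grading procedure. It assumes that each assignment is scored as a percentage, that the listed weights are applied to those percentages, and that the 50% threshold refers to the weighted total; this page does not spell out these details. The function names and the sample scores are hypothetical.

```python
# Weights taken from the assignment list above (they sum to 100%).
ASSIGNMENT_WEIGHTS = {
    "Assignment 1": 0.25,
    "Assignment 2": 0.20,
    "Assignment 3": 0.30,
    "Assignment 4": 0.25,
}


def weighted_assignment_score(scores):
    """Return the weighted total (0-100) of per-assignment percentages.

    Assumption: `scores` maps an assignment name to a percentage in [0, 100].
    Assignments that are missing from `scores` count as 0.
    """
    return sum(w * scores.get(name, 0.0) for name, w in ASSIGNMENT_WEIGHTS.items())


def admitted_to_exam(scores, threshold=50.0):
    """Assumed rule: admission requires a weighted total of at least 50%."""
    return weighted_assignment_score(scores) >= threshold


# Hypothetical scores, for illustration only.
example = {"Assignment 1": 60, "Assignment 2": 40, "Assignment 3": 50, "Assignment 4": 70}
print(weighted_assignment_score(example))
print(admitted_to_exam(example))
```

If the threshold turns out to be based on raw points rather than weighted percentages, only `weighted_assignment_score` would need to change.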

Final Project (40%):

  • Project Topic: Students choose one of the following subtasks from SemEval 2025 Task 10:
    • Subtask 1: Entity Framing
    • Subtask 2: Narrative Classification
    • Subtask 3: Narrative Extraction
  • Project components:
    • Problem Formulation (PF) (10%)
    • Problem Solving (PS) (15%)
    • Project Poster (PP) (5%):
      • [Guideline for Poster]
    • Project Report (PR) (10%):
      • [Guideline for Final Report]
  • Submission: You submit each project component on eCampus in the following format:
    • PF: A PDF file with this name: Team_<Team number>.pdf
    • PP: A PDF file with this name: Team_<Team number>.pdf
    • PS + PR: A ZIP file containing all the necessary files with this name: Team_<Team number>.zip
  • Deadlines: All deadlines for PF, PP, and PS + PR are listed in the schedule.
  • Mentors: Every team has a mentor, who gives feedback and advice during the project.
  • Computing resources:
    • CS Faculty: You can add your Student ID to this list. GSG will provide you with additional computing resources on behalf of the CAISA lab.
    • Saturn Cloud: You get 150 free hours a month of 64 GB RAM and GPU instances. Check this out.
    • Google Colaboratory: Colab is a hosted Jupyter Notebook service that provides free access to computing resources, including GPUs and TPUs. Check this out.
  • Using external resources: You can use any machine learning or deep learning framework you like (Scikit-learn, PyTorch, TensorFlow, etc.). You may use any existing code, libraries, etc., and consult papers, books, online references, etc. for your project. However, you must cite your sources in your final project report.
  • Team:
    • Team size: Students should do final projects in teams of 3 to 5 people. Larger teams are expected to do correspondingly larger projects.
    • Building a team: You can either find your teammates on your own or ask us to find teammates for you. You may also join the CS Master Bonn Discord server.
    • Submission:
      • Please send us the list of your team members via itnlp.uni.bonn(at)gmail.com in the following format:
        Subject: ITNLP - WS2024 - <Matr. Nr.>
        Team Speaker:   <Name>, <Matr. Nr.>, <Mail Addr.>
        Team Members: <Name>, <Matr. Nr.>, <Mail Addr.>
        <Name>, <Matr. Nr.>, <Mail Addr.>
         
      • If you need a teammate, please email us at itnlp.uni.bonn(at)gmail.com.
        Subject: ITNLP - WS2024 - Looking for a team
        <Name>, <Matr. Nr.>, <Mail Addr.>
  • Deadline: The deadline for submitting the team member list is given in the schedule.
  • Contribution: In the final report we ask for a statement of what each team member contributed to the project. Team members will typically get the same grade, but we may differentiate in extreme cases of unequal contribution. You can contact us in confidence in the event of unequal contribution.

Exam (60%):

  • Exam dates: To be announced as soon as we receive the rooms and dates from the examination office.
  • Allowed material: A calculator is permitted.
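
The percentages on this page (Final Project 40%, split into its four components, and Exam 60%) suggest how the parts add up to an overall course score. The following is a minimal sketch in Python under that reading. It assumes each component is reported as the share of points earned (0.0 to 1.0) and that the listed percentages are applied directly; the mapping from this score to a final grade is not given here and is not modeled. The function name `course_score` is hypothetical.

```python
# Component weights as listed on this page, in percent of the course score.
COMPONENT_WEIGHTS = {
    "Problem Formulation": 10,
    "Problem Solving": 15,
    "Project Poster": 5,
    "Project Report": 10,
    "Exam": 60,
}


def course_score(fractions):
    """Combine per-component results into a course score in percent.

    Assumption: `fractions` maps each component name to the share of
    points earned, between 0.0 and 1.0. Every component must be present.
    """
    missing = set(COMPONENT_WEIGHTS) - set(fractions)
    if missing:
        raise KeyError(f"missing components: {sorted(missing)}")
    return sum(COMPONENT_WEIGHTS[name] * fractions[name] for name in COMPONENT_WEIGHTS)


# The four project components (40) plus the exam (60) cover the full score.
assert sum(COMPONENT_WEIGHTS.values()) == 100
```

For example, calling `course_score` with every component set to `0.5` yields half of the available score.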

Allocation:

  • 3 + 1 SWS
  • Master in Media Informatics: 6 ECTS credits
  • Master in Computer Science: MA-INF 4115, 6 CP
  • Students must register for the exam on POS/BASIS.

Literature:

  • J. Eisenstein: Introduction to Natural Language Processing
  • D. Jurafsky, J. H. Martin: Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition
  • S. Bird, E. Klein, E. Loper: Natural Language Processing with Python

Schedule:

| Week | Date | Description | Events | Deadlines |
|---|---|---|---|---|
| Week 1 | Lecture (Thu Oct 24) | Intro, Types of NLP tasks, Preprocessing | | |
| | Exercise (Wed Oct 23) | Introduction & Python basics | | |
| Week 2 | Lecture (Thu Oct 31) | Word vectors, Embeddings | | |
| | Exercise (Wed Oct 30) | Word operations and Text Classification using Sklearn | Assignment 1 OUT | |
| Week 3 | Lecture (Thu Nov 7) | Neural Nets - RNNs, Attention, ELMo | | |
| | Exercise (Wed Nov 6) | Word Embeddings using spaCy | Assignment 2 OUT | Team Members DUE, Assignment 1 DUE |
| Week 4 | Lecture (Thu Nov 14) | Transformer overall, BERT | | |
| | Exercise (Wed Nov 13) | Transformers and Generative Models I | | Assignment 2 DUE |
| Week 5 | Lecture (Thu Nov 21) | BERT continued (specifics), BPE | | |
| | Exercise (Wed Nov 20) | Q & A: PF | | Problem Formulation DUE |
| Week 6 | Lecture (Thu Nov 28) | Other encodings, Bertology | | |
| | Exercise (Wed Nov 27) | Transformers and Generative Models II | Assignment 3 OUT | |
| Week 7 | Lecture (Thu Dec 5) | GPT2, T5, XLNet | | |
| | Exercise (Wed Dec 4) | Dies academicus (No Exercise) | | Assignment 3 DUE |
| Week 8 | Lecture (Thu Dec 12) | Decoding strategies | | |
| | Exercise (Wed Dec 11) | Result Analysis & Interpretation + Q & A: PS | Assignment 4 OUT | |
| Week 9 | Lecture (Thu Dec 19) | From "Transformers" to "LLMs", Instruction tuning, Reasoning, Alignment | | |
| | Exercise (Wed Dec 18) | Project development | | Assignment 4 DUE |
| Week 10 | Lecture (Thu Jan 9) | Guest lecture - Multilingual LLMs | | |
| | Exercise (Wed Jan 8) | Project development | | |
| Week 11 | Lecture (Thu Jan 16) | Repetition for the exam | | |
| | Exercise (Wed Jan 15) | Project development | | |
| Week 12 | Lecture (Thu Jan 23) | Guest lecture - LLM agents | | |
| | Exercise (Wed Jan 22) | Project development | | Poster DUE |
| Week 13 | Lecture (Thu Jan 30) | Project Presentation (Poster) | | |
| | Exercise (Wed Jan 29) | Project development | | |