Maximum number of FITech students: 10 (1st part) & 5 (2nd part)
Explore generative AI’s potential in our accessible course: Learn to fine-tune large language models and embrace ethical AI principles for real-world impact.
Course contents
This course provides a detailed exploration of the practical application, ethical considerations, and open-source landscape of fine-tuning large language models (LLMs), aimed at students with basic programming skills.
Beginning with foundational data management techniques, the curriculum emphasizes the critical role of high-quality, privacy-conscious dataset preparation, specifically tailored to enhance open-source LLMs. Through detailed instruction on the architecture and functionality of these models, students will gain insight into the vast potential and challenges of working with open-source AI technologies.
Throughout the course, participants will engage in practical exercises and projects that apply fine-tuning techniques to open-source LLMs, emphasizing the creation of AI systems that are not only technologically advanced but also ethically sound, transparent, and secure. This hands-on experience is designed to equip students with the technical skills necessary to contribute to and innovate within the open-source AI community.
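The privacy-conscious dataset preparation emphasized above can be illustrated with a minimal sketch. This is an assumption-laden toy example, not the course's actual tooling (the course may use the Hugging Face Datasets library instead): it strips whitespace, drops empty and duplicate records, and redacts e-mail addresses before the data would be used for fine-tuning.

```python
import re

# Simple e-mail pattern; real pipelines use more thorough PII detection.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def redact_pii(text: str) -> str:
    """Replace e-mail addresses with a placeholder token."""
    return EMAIL_RE.sub("[EMAIL]", text)

def prepare_dataset(records: list[str]) -> list[str]:
    """Strip whitespace, drop empty and duplicate records, redact PII."""
    seen: set[str] = set()
    cleaned: list[str] = []
    for record in records:
        text = redact_pii(record.strip())
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned

raw = [
    "Contact me at alice@example.com for details.",
    "Contact me at alice@example.com for details.",  # duplicate
    "   ",                                           # empty after stripping
    "Fine-tuning changes only a small part of the model.",
]
print(prepare_dataset(raw))
```

Even a sketch like this shows why cleaning matters: duplicates bias training, and unredacted personal data can be memorized and leaked by the fine-tuned model.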
Learning outcomes
After the course, the students are able to
- understand the fundamental architecture, operation, and potential of Generative AI and Large Language Models, recognizing the importance of developing and deploying trustworthy AI systems
- create, clean, and prepare datasets, emphasizing the importance of data quality and relevance for effective model training, with a focus on leveraging open-source LLMs
- apply practical fine-tuning techniques to large language models, focusing on creating systems that are fair, transparent, secure, and aligned with ethical guidelines, utilizing open-source tools and frameworks for model enhancement
- evaluate the ethical implications of deploying large language models, advocating for and implementing practices that ensure their trustworthiness and beneficial use in society, particularly through the use of open-source models to foster innovation and accessibility
- implement safeguards for an open-source language model, ensuring its responsible use, protecting against misuse, and maintaining the integrity and security of the model within various applications.
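The fine-tuning techniques named in the outcomes above include parameter-efficient methods such as LoRA (low-rank adaptation). The following toy-sized, pure-Python sketch illustrates the core idea only and is not the course's actual implementation: instead of updating the full d x d weight matrix W, two small matrices B (d x r) and A (r x d) are trained, and the adapted weights are W' = W + B @ A.

```python
def matmul(X, Y):
    """Multiply two matrices given as nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def apply_lora(W, B, A):
    """Return W + B @ A, leaving the frozen base weights W untouched."""
    delta = matmul(B, A)
    return [[w + d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

d, r = 4, 1  # full dimension vs. low rank (toy values for illustration)
W = [[1 if i == j else 0 for j in range(d)] for i in range(d)]  # frozen base
B = [[1], [2], [3], [4]]   # d x r, trainable
A = [[1, 0, 0, 1]]         # r x d, trainable

W_adapted = apply_lora(W, B, A)
# Trainable parameters: d*r + r*d = 8, versus d*d = 16 for full fine-tuning;
# at realistic model sizes the saving is several orders of magnitude.
```

In practice this is what makes fine-tuning feasible on the free GPU resources mentioned under course material: only the small adapter matrices need gradients and optimizer state.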
Course material
- Goodfellow, I., Bengio, Y. and Courville, A., 2016. Deep learning. MIT press. Provides foundational knowledge in deep learning, relevant for understanding the underlying principles of LLMs.
- Hapke, H., Howard, C. and Lane, H., 2019. Natural Language Processing in Action: Understanding, analyzing, and generating text with Python. Simon and Schuster. Offers practical insights into working with text data, crucial for LLM applications.
- Vakkuri, V., Kemell, K.K., Jantunen, M., Halme, E. and Abrahamsson, P., 2021. ECCOLA—A method for implementing ethically aligned AI systems. Journal of Systems and Software, 182.
- More literature, articles and papers will be delivered during the course.
- Software and Tools:
- Python programming language: Essential for all practical exercises and projects within the course.
- Jupyter notebooks: Used for coding exercises, allowing for interactive development and documentation of code, notes, and equations.
- Google Colab: Provides a cloud-based Python programming environment with access to free GPU resources, beneficial for training models without requiring personal hardware.
- Datasets for Practice (optional)
- Kaggle datasets: A wide variety of text datasets suitable for fine-tuning LLMs on specific tasks or domains.
- Hugging Face Datasets library: Offers easy access to numerous datasets, ideal for experimenting with different types of natural language processing tasks.
- Online Courses and Tutorials (optional)
- Hugging Face’s transformers library documentation and tutorials: Detailed guides on using one of the most popular open-source libraries for LLMs, ideal for practical fine-tuning exercises.
- Coursera’s “Deep Learning Specialization” by Andrew Ng: Covers deep learning fundamentals, with some courses focusing on sequence models and natural language processing.
- edX’s “Ethics and Law in Data and Analytics”: Provides insights into ethical and legal considerations in handling data, relevant for AI projects.
Teaching schedule
You can earn 5 or 10 ECTS from this course by applying to the 1st part only or to both parts.
Part 1: 7.1.–25.2.2026 (5 ECTS)
- Completing core modules and practical labs.
- Weekly lectures from 14.1.2026, 16.00–18.00, until 11.2.2026.
- Weekly exercises (labs) from 15.1.2026, 10.00–12.00, until 12.2.2026. The submission deadline for the exercises is announced at the beginning of the course.
Part 1 (7.1.–25.2.2026) + Part 2 (18.2.–31.3.2026), 10 ECTS
- Completing core modules and practical labs, plus an in-depth Capstone project focused on applying trustworthiness principles in LLM fine-tuning. These skills are applied in industry-inspired group projects.
Completion methods
Evaluation is based on practical assignments, participation in discussions on trustworthiness, and a practical project that applies fine-tuning techniques to build a trustworthy LLM application.
More information in the Tampere University study guide.
You can get a digital badge after completing this course.