Master's thesis projects

Master's thesis projects at the Machine Learning Group

If you are enrolled in a master's program and interested in writing your thesis in collaboration with the UiT Machine Learning Group, we have proposed some interesting projects you can get involved with.

Photo: Nataliia Anisimova/Mostphotos

Constructing a Norwegian De-identification tool for clinical text

Published: 12th October 2020

Research using data-hungry machine learning methods is growing, but not for less-resourced languages such as Norwegian and Swedish, and especially not for clinical text. Patient records contain personal information that may reveal the patient's identity, and disclosing it is prohibited both on ethical grounds and under the EU GDPR. Therefore, a master's thesis topic is proposed to create a Norwegian de-identification tool for clinical text. The tool should use NER (Named Entity Recognition) methods to identify so-called PHI (Protected Health Information), such as personal names, locations, phone numbers, ages, dates and social security numbers, and then replace or obscure the sensitive entities it finds.

Norwegian training data are available in open resources such as NorNE (Norwegian Named Entities corpus). For evaluation, one may use the Norwegian synthetic clinical text corpus NorSynthClinical-PHI.
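To make the "identify, then replace" step concrete, here is a minimal sketch in Python. It covers only the pattern-like PHI types (dates, phone numbers, identity numbers) with regular expressions, which is a swapped-in simplification: names and locations would need a trained NER model. The placeholder tags, the patterns and the sample sentence are illustrative assumptions, not part of the proposed tool.

```python
import re

# Illustrative patterns only (assumptions, not a validated rule set).
# Order matters: the 11-digit identity number is replaced first.
PHI_PATTERNS = [
    ("SSN", re.compile(r"\b\d{11}\b")),                    # 11-digit national identity number
    ("PHONE", re.compile(r"\b\d{8}\b")),                   # 8-digit phone number
    ("DATE", re.compile(r"\b\d{1,2}\.\d{1,2}\.\d{4}\b")),  # dates such as 12.10.2020
]

def deidentify(text):
    """Replace every matched PHI span with a placeholder tag such as <DATE>."""
    for tag, pattern in PHI_PATTERNS:
        text = pattern.sub(f"<{tag}>", text)
    return text

# Made-up example sentence, not real patient data.
sample = "Pasienten, f. 12.10.1950, tlf 77644000, fnr 12105012345."
print(deidentify(sample))
```

Replacing each entity with a typed tag keeps the sentence readable for later text mining. A real tool might instead substitute realistic surrogate values.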

Prerequisites:

  • Programming skills in any programming language, but preferably Python, Perl or Java.

  • Some experience with machine learning platforms.

  • Natural language skills, ideally in Norwegian, or access to someone who knows Norwegian and can evaluate the results.

Contact person: Professor Hercules Dalianis, Norwegian Centre for E-health Research, Tromsø


Building and annotating a corpus of Norwegian biomedical text

Published: 12th October 2020

Research in biomedical text mining is growing, but less-resourced languages such as Norwegian and Swedish lack resources. Therefore, a master's thesis topic is proposed to create a corpus of Norwegian biomedical text. A starting point is to use Tidsskriftet for Den norske legeforening: download the files, extract the text and build the corpus. Once the corpus is created, it should be annotated using automatic methods, for example by assigning MeSH terms or ICD-10 diagnosis codes, which are available for English, Norwegian and Swedish. The aim is to automatically categorise the texts by scientific topic. Useful tools include lemmatisation, text matching and word embeddings.
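As a rough illustration of annotation by text matching, the Python sketch below counts dictionary hits per topic and picks the most frequent topic. The term list and topic labels are hypothetical stand-ins. In the project they would be derived from MeSH or ICD-10, and lemmatisation would normally be applied before matching.

```python
import re
from collections import Counter

# Hypothetical mini term list (lower-cased Norwegian term -> topic label).
# These entries are illustrative only and are not real MeSH or ICD-10 codes.
TERM_TO_TOPIC = {
    "diabetes": "endocrinology",
    "insulin": "endocrinology",
    "hjerteinfarkt": "cardiology",
    "blodtrykk": "cardiology",
}

def annotate(text):
    """Count how many matched terms fall under each topic."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(TERM_TO_TOPIC[t] for t in tokens if t in TERM_TO_TOPIC)

def main_topic(text):
    """Return the topic with the most matched terms, or None if nothing matched."""
    counts = annotate(text)
    return counts.most_common(1)[0][0] if counts else None

print(main_topic("Pasienten har diabetes og bruker insulin. Blodtrykk er normalt."))
```

Exact token matching misses inflected forms, which is one reason the proposal mentions lemmatisation and word embeddings.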

Prerequisites:

  • Programming skills in any programming language, but preferably Python, Perl or Java.

  • Some experience with machine learning platforms.

  • Natural language skills, ideally in Norwegian, or access to someone who knows Norwegian and can evaluate the results.

Contact person: Professor Hercules Dalianis, Norwegian Centre for E-health Research, Tromsø


Building and evaluating a Norwegian NegEx negation detection system for clinical text

Published: 12th October 2020

Research in biomedical text mining is growing, but less-resourced languages such as Norwegian and Swedish are falling behind. Therefore, a master's thesis topic is proposed to create a negation detection system for Norwegian. In the clinical context, a negation detection system finds symptoms that are negated in a text. When reasoning about a diagnosis, the physician rules out symptoms, and these negated symptoms must not be treated as findings for the patient.

To build a negation detection system, the student can take an available system for English (NegEx or ConText, both written in Python) and port it to Norwegian. A NegEx system finds negated symptoms in clinical text and tags them automatically.
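The core idea can be sketched in a few lines of Python. This is a much-simplified, NegEx-style rule and not a port of the actual NegEx or ConText code: a trigger word negates any listed symptom within a fixed token window after it, and a terminator word ends the scope. The trigger words, terminator, symptom list and window size are illustrative assumptions, not a validated Norwegian lexicon.

```python
import re

NEGATION_TRIGGERS = {"ikke", "ingen", "uten"}  # illustrative triggers (assumption)
TERMINATORS = {"men"}                          # "but" ends the negation scope
SYMPTOMS = {"feber", "hoste", "kvalme"}        # hypothetical symptom list
WINDOW = 5                                     # tokens after a trigger that count as negated

def tag_symptoms(sentence):
    """Return (symptom, is_negated) pairs in order of appearance."""
    tokens = re.findall(r"\w+", sentence.lower())
    results = []
    remaining = 0  # how many upcoming tokens are still inside a negation scope
    for tok in tokens:
        if tok in NEGATION_TRIGGERS:
            remaining = WINDOW
            continue
        if tok in TERMINATORS:
            remaining = 0
            continue
        if tok in SYMPTOMS:
            results.append((tok, remaining > 0))
        remaining = max(0, remaining - 1)
    return results

print(tag_symptoms("Pasienten har ingen feber eller hoste, men har kvalme."))
```

Porting to Norwegian would largely mean replacing these toy lists with curated trigger and symptom lexicons and evaluating the result against annotated text.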

A small Norwegian biomedical negation corpus exists and can be used for evaluation, but it needs to be extended with more annotated negations.

One of the difficulties in this project is creating the symptom list for NegEx. The student can either extract the symptoms from the Norwegian ICD-10 diagnosis code list or, for example, use word2vec to extract diagnoses from Norwegian biomedical corpora.

Prerequisites:

  • Good programming skills in any programming language but preferably Python.

  • Natural language skills in Norwegian or another Scandinavian language, or access to someone who knows one of these languages and can evaluate the results.

Contact person: Professor Hercules Dalianis, Norwegian Centre for E-health Research, Tromsø


Non-negative least squares

Published: 13th October 2020

Background: Non-negative least squares (NNLS) is a methodology that frequently appears in image processing. The classical treatment is found in Lawson and Hanson (1974), who present an active-set method that is still the go-to algorithm in modern applications. Some improvements have been made (e.g. Bro and De Jong (1997) and Myre et al. (2019)), but the efficiency of the active-set method is theoretically limited. Alternative approaches have recently been proposed, e.g. Gazzola and Wiaux (2017), and these could offer a significant improvement in processing speed when handling large datasets subject to noise.

Research questions: NNLS can be treated through quadratic programming (QP) and could therefore lend itself well to other approaches such as interior point, gradient descent or Lagrangian methods. A systematic treatment of its properties and the various algorithms for solving it would be useful for many image processing applications and could be used to process hyperspectral images and magnetic resonance image data more efficiently than is possible with the active-set methods.
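For orientation, NNLS seeks x ≥ 0 minimising ½‖Ax − b‖², whose gradient is Aᵀ(Ax − b). The following Python sketch shows one of the simplest alternatives to the active-set method, projected gradient descent: take a gradient step, then clip negative entries to zero. It is written in plain Python for small dense problems and is meant only to illustrate the idea, not as an efficient or recommended solver. The step size 1/‖A‖_F² is a conservative assumption that guarantees convergence.

```python
def nnls_projected_gradient(A, b, iterations=5000):
    """Minimise 0.5 * ||Ax - b||^2 subject to x >= 0 by projected gradient descent.

    A is a list of rows, b a list of floats. Illustrative sketch only.
    """
    m, n = len(A), len(A[0])
    # The squared Frobenius norm of A bounds the largest eigenvalue of A^T A,
    # so 1 / ||A||_F^2 is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    x = [0.0] * n
    for _ in range(iterations):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        gradient = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by projection onto the non-negative orthant.
        x = [max(0.0, x[j] - step * gradient[j]) for j in range(n)]
    return x

# With A = I and b = (1, -1), the constraint is active for the second component.
print(nnls_projected_gradient([[1.0, 0.0], [0.0, 1.0]], [1.0, -1.0]))
```

The method needs only matrix-vector products, but its convergence can be slow for ill-conditioned A, which is one motivation for the interior point and Krylov approaches mentioned above.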

Recommended prerequisites: FYS-2021, FYS-3012

Bibliography:

  1. Bro, R., & De Jong, S. (1997). A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics, 11(5), 393-401. doi:10.1002/(sici)1099-128x(199709/10)11:5<393::Aid-cem483>3.0.Co;2-l

  2. Gazzola, S., & Wiaux, Y. (2017). Fast Nonnegative Least Squares Through Flexible Krylov Subspaces. SIAM Journal on Scientific Computing, 39(2), A655-A679. doi:10.1137/15m1048872

  3. Lawson, C. L., & Hanson, R. J. (1974). Solving least squares problems. Englewood Cliffs, N. J. : Prentice-Hall, Inc.

  4. Myre, J. M., Lascu, I., Lima, E. A., Feinberg, J. M., Saar, M. O., & Weiss, B. P. (2019). Using TNT-NN to unlock the fast full spatial inversion of large magnetic microscopy data sets. Earth, Planets and Space, 71(1), 14. doi:10.1186/s40623-019-0988-8

Contact persons: Karstein Heia, Nofima, and Fred Godtliebsen, UiT Machine Learning Group


Computer-aided assessment of anatomy and thickness of mandibular cortex in the young population living in Northern Norway using data from Fit Futures 3

Published: 12th October 2020

Background: The anatomy and thickness of the mandibular cortex on panoramic radiographs have been extensively studied as predictors of systemic osteoporosis. Panoramic radiographs taken for dental reasons are a frequent examination that may contribute to osteoporosis diagnosis. However, the average thickness and anatomy of the mandibular cortex, and their distribution across age groups in the population, are still unknown (1). So far, most studies assessing cortical thickness and anatomy have been carried out in menopausal women. It is largely unclear how the thickness of the mandibular cortex changes over time. A growing number of studies use computer-aided assessment of mandibular cortical thickness, as it gives more precise measurements and less observer-dependent variation.

Aims: The current project aims to analyze mandibular cortical thickness and anatomy in the young population in the Fit Futures 3 study. The analyses will be carried out automatically with machine learning methods in close collaboration with the Machine Learning Group at UiT The Arctic University of Norway. The tentative research questions are: what is the average mandibular cortical thickness, and how is it distributed in the given population? How does it change over time? As the Fit Futures study performs repeated measurements on the same participants, it will be possible to analyze long-term changes in mandibular cortical thickness and their relation to skeletal bone mineral density and osteoporosis.

Materials and Methods: Data from the Fit Futures 3 study (2021) will be used. The study population will consist of men and women aged 20-25 years. The mandibular cortical thickness will be measured as proposed by Devlin and Horner (2). It will be measured along the line drawn through the middle of the mental foramen and perpendicular to the tangent to the inferior border of the mandible, from the inferior border to the inner edge of the cortex (see the cropped panoramic radiograph below). The Machine Learning Group will apply existing and novel algorithms to find the upper and lower edges of the mandibular cortex and measure the distance between them.
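The geometry of this measurement can be sketched as follows, assuming the landmarks have already been located in pixel coordinates. The function name, its arguments and the pixel-spacing parameter are hypothetical. Finding the landmarks automatically is the actual machine learning task and is not shown.

```python
import math

def cortical_thickness(tangent_a, tangent_b, foramen, inner_edge, mm_per_pixel=1.0):
    """Measure thickness along the line through `foramen` perpendicular to the tangent.

    tangent_a, tangent_b: two points on the tangent to the inferior mandibular border.
    foramen: centre of the mental foramen; inner_edge: inner edge of the cortex
    on the measurement line. All points are (x, y) pixel coordinates.
    Returns (border_point, thickness).
    """
    tx, ty = tangent_b[0] - tangent_a[0], tangent_b[1] - tangent_a[1]
    length = math.hypot(tx, ty)
    nx, ny = -ty / length, tx / length  # unit normal to the tangent

    def normal_distance(p):
        # Signed distance from the tangent line, measured along the normal.
        return (p[0] - tangent_a[0]) * nx + (p[1] - tangent_a[1]) * ny

    d = normal_distance(foramen)
    # Where the measurement line meets the inferior border (foot of the perpendicular).
    border_point = (foramen[0] - d * nx, foramen[1] - d * ny)
    thickness = abs(normal_distance(inner_edge)) * mm_per_pixel
    return border_point, thickness

border, width = cortical_thickness((0, 0), (10, 0), (4, 8), (4, 3), mm_per_pixel=0.1)
print(border, width)
```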

Recommended background: The candidate should have a background in machine learning and statistical methodology in addition to good programming skills.

References:

  1. Roberts M, Yuan J, Graham J, Jacobs R, Devlin H. Changes in mandibular cortical width measurements with age in men and women. Osteoporos Int. 2011;22(6):1915-25.

  2. Horner K, Devlin H. The relationship between mandibular bone mineral density and panoramic radiographic measurements. J Dent. 1998;26(4):337-43.

Contact persons: Fred Godtliebsen, UiT Machine Learning Group; Napat Limchaichana Bolstad, Department of Clinical Dentistry, UiT


Population counting using Drone Images for Marine Surveys

Published: 12th October 2020

Marine surveys require valuable resources (experts' time and boats). UiT, in collaboration with the Norwegian Polar Institute and the University of Southern Denmark, is working towards a solution for population counting based on images captured by a drone. The initial plan is to develop a supervised-learning-based methodology for detecting the number of porpoises in an image. Later, the plan is to extend the framework to accommodate other similar mammal species (with fewer training samples).

Prerequisites: FYS-2021, FYS-3033

Contact person: Puneet Sharma, UiT Machine Learning Group


Investigating visual attention as visual vocabulary

Published: 12th October 2020

With current state-of-the-art deep learning models, we can build an object-based representation of an image. Current eye tracking datasets extensively record which image locations observers look at over time (in the form of eye fixations). In this project, the student will first investigate whether our visual attention builds an interpretable vocabulary of the scene. She/he will then study whether this visual vocabulary is consistent for a particular observer or a particular image and, finally, whether it can account for inter-personal differences.

Prerequisites: FYS-2021, FYS-3012

Contact person: Puneet Sharma, UiT Machine Learning Group


Adaptive sampling for atmospheric profiling using HALE UAVs

Published: 12th October 2020

Recent developments in electric power generation and propulsion technology have made it possible to develop a new class of UAVs. HALE or HAPS (High Altitude Long Endurance, High Altitude Pseudo Satellites) UAVs have, in some cases, the potential to replace expensive satellite technology with reusable platforms capable of sustained flight and long-term measurements. A new HALE UAV is under development in a recently funded Polish-Norwegian research cooperation (Figure 1). The project LEADER (Long-Endurance UAV for assessing atmospheric pollution profiles) will develop new technology for high-resolution sampling of atmospheric pollutants transported at high altitudes. The platform will carry equipment for measuring wind, temperature, pressure and aerosol concentration in real time. The mission, if you choose to accept it, is to develop methods and algorithms for optimizing the sampling strategy for determining the total concentration and the temporal and spatial distribution of the aerosols being measured. Some key issues to address will be:

  • Creating and updating 3D maps or distributions in real time based on sparse local measurements

  • Generating new flight trajectories for optimized sampling based on the current distribution of aerosols and the local wind field

  • Optimizing the flight trajectory with respect to energy consumption, using the estimated local wind field and a simple aerodynamic model of the aircraft
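To give a feel for the first two issues, the Python sketch below keeps a coarse 3D grid map that is updated incrementally from point measurements and proposes the next sampling target with a simple greedy rule: visit the least-sampled cell, breaking ties by distance. Everything here is an assumption made for illustration (the class name, the grid representation and the selection rule). It ignores the wind field and energy consumption, and it is far simpler than the approach in the reference below.

```python
class GridMap:
    """Running-mean concentration map on a coarse 3D grid (illustrative sketch)."""

    def __init__(self, nx, ny, nz):
        self.shape = (nx, ny, nz)
        self.mean = {}   # cell index -> running mean of measured concentration
        self.count = {}  # cell index -> number of measurements in that cell

    def update(self, cell, value):
        """Fold one new measurement into the cell's running mean."""
        n = self.count.get(cell, 0) + 1
        m = self.mean.get(cell, 0.0)
        self.mean[cell] = m + (value - m) / n
        self.count[cell] = n

    def next_target(self, position):
        """Pick the least-sampled cell, breaking ties by squared distance to `position`."""
        nx, ny, nz = self.shape
        cells = ((i, j, k) for i in range(nx) for j in range(ny) for k in range(nz))

        def key(cell):
            d2 = sum((a - b) ** 2 for a, b in zip(cell, position))
            return (self.count.get(cell, 0), d2)

        return min(cells, key=key)

grid = GridMap(2, 1, 1)
grid.update((0, 0, 0), 2.0)
grid.update((0, 0, 0), 4.0)
print(grid.mean[(0, 0, 0)], grid.next_target((0, 0, 0)))
```

A thesis would likely replace the visit count with a proper uncertainty estimate and add the wind and energy terms to the selection criterion.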

A good reference to start with:

Reymann, C., Renzaglia, A., Lamraoui, F. et al. Adaptive sampling of cumulus clouds with UAVs. Auton Robot 42, 491–512 (2018). https://doi.org/10.1007/s10514-017-9625-1

Contact persons: Agnar H. Sivertsen, NORCE and Fred Godtliebsen, UiT

Image: HALE UAV under development in the LEADER project.


Early detection of caries from x-ray image data

Published: 12th October 2020

X-ray images of the teeth are an important tool for detecting caries at an early stage. At present, this calls for time-consuming work performed by the dentist. The underlying idea in the proposed task is to investigate whether ML and statistical methods can be used to develop a decision support tool for dentists in their daily work. To this end, we will use a large amount of data available through the Fit Futures project. The project will be performed in close collaboration with researchers at the Department of Clinical Dentistry at UiT. The work will contain two main tasks:

Task 1: ML and statistical methods will be used to detect caries from the most recent observed x-ray data of the patient.

Task 2: The approach in task 1 will be combined with previous data from the same patient. With this approach, it will also be possible to search for potential changes between consultations at the dentist.

Background: The candidate should have a strong background in machine learning and statistics. Analysis of image data is an important ingredient of the work, and good programming skills are also needed.

Contact persons: Fred Godtliebsen, Jonas Nordhaug Myhre and Thomas Haugland Johansen at UiT Machine Learning Group; Napat Limchaichana Bolstad and Anna Teterina at Clinical Dentistry UiT


Safe AI using Bayesian Deep Learning

Published: 12th October 2020

Current decision support tools are usually designed using expert knowledge or data-driven techniques. However, these methods mostly depend on a high level of understanding of the subject or on a dataset of unrealistically high quality to achieve optimal or desired performance. Many real-world problems are highly complex and require new techniques that can model uncertainties and make decisions based on the availability and quality of data. Approaches to building personalized decision support tools include developing a prediction model of the risk and outcome, or deriving safe and effective data-driven decision algorithms. With the development of artificial intelligence, deep learning has been used extensively in modelling and prediction. Combining deep learning with Bayesian inference allows information and uncertainties to be accurately estimated from the training data. The AI agent needs to be designed carefully so that it can safely explore the environment and propose actions that are both risk-averse and robust. Integrating deep learning and Bayesian inference with a reinforcement learning framework will bring great opportunities to solve the problem and contribute toward safe AI.

Background: A background in Bayesian inference, deep learning and reinforcement learning would be ideal, but a general background in machine learning and statistical methodology will be sufficient. Good programming skills are required.

Contact persons: Fred Godtliebsen, UiT Machine Learning Group and Phuong Ngo, Norwegian Centre for E-health Research

Reference:

[1] Ngo, P. and Godtliebsen, F., “Data-Driven Robust Control Using Reinforcement Learning,” 2020. [Online]. Available: https://arxiv.org/pdf/2004.07690.pdf.


An example of how robust and safe AI was used in controlling blood glucose for patients with type-1 diabetes [1]. Cases 1 to 4 represent systems ranging from lowest to highest uncertainty. The target blood glucose level is 80 mg/dL. Safe actions ensure that blood glucose does not go much lower than this value, which can lead to hypoglycemia, a very dangerous situation. In risky situations, or when there is higher uncertainty in the data, risk reduction must be emphasized over performance.


Learning from limited labeled data

Published: 12th October 2020

Most successful applications of deep learning have in the past relied largely on supervised approaches, fueled by large amounts of labeled data. However, in many application domains, such as the medical domain, obtaining labels can be challenging (it requires expert knowledge, is costly, etc.). This project aims to develop new deep-learning-based algorithms for learning from limited labels. Potential directions include approaches for zero-shot and few-shot learning, clustering and/or domain adaptation.
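As one concrete picture of few-shot learning, the Python sketch below implements a nearest-class-mean baseline: each class is summarised by the mean of its few labeled feature vectors, and a query is assigned to the closest mean. This is a commonly used baseline chosen here for illustration, not the method the project will develop. The function names and toy data are assumptions, and in practice the feature vectors would come from a trained deep network.

```python
def class_prototypes(support):
    """Average the few labeled feature vectors of each class.

    support: dict mapping label -> list of equal-length feature vectors.
    """
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return prototypes

def classify(query, prototypes):
    """Assign the query to the class whose prototype is nearest (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(prototypes, key=lambda label: sqdist(query, prototypes[label]))

# Toy two-shot example with made-up 2D features.
protos = class_prototypes({"a": [[0, 0], [0, 2]], "b": [[10, 10], [12, 10]]})
print(classify([1, 1], protos), classify([9, 9], protos))
```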

Prerequisites: FYS-2021, FYS-3012, FYS-3033

Contact person: Michael Kampffmeyer, UiT Machine Learning Group


Co-registration of multimodal medical images

Published: 12th October 2020

Positron emission tomography (PET), magnetic resonance (MR) and computed tomography (CT) are examples of medical imaging technologies that have become very important tools both for diagnostics and therapy. The individual imaging modalities provide insight in different aspects of the patient’s anatomy and physiology, but together they offer a more complete picture and can help reveal the underlying medical truth.

In this project the student will address a practical problem that complicates the joint use of multiple medical image modalities: even though images are captured near simultaneously, it is often nontrivial to align them such that the different image layers can be placed on top of each other and compared pixel by pixel. Organs and tissue may be in relative motion due to breathing and other movements of the patient, which must be corrected before the images can be used together in further analysis. However, this is inherently difficult when the images are taken by multiple scanners using different physical measurement principles.

The work will build upon previous research at the Machine Learning Group, where methods have been developed for comparing pixels and objects of multimodal images. These techniques will be incorporated in deep neural network architectures to create a model for automatic co-registration (that is, alignment) of multimodal images.

Prerequisites: FYS-2010, FYS-2021, FYS-3012, FYS-3033

Contact person: Stian Normann Anfinsen, UiT Machine Learning Group


Probabilistic analysis of power systems and distributed generation

Published: 12th October 2020

Master’s projects are available in relation to the Machine Learning Group’s energy analytics initiative, where the purpose is to deliver machine learning algorithms that can support the transition to efficient, secure and flexible power systems with high penetration of renewable energy. Our current work focuses on the development of deep learning methods for load forecasting, power flow analysis, grid stability analysis, flexibility asset management, and integration of distributed generation (renewable energy). The tools we use include temporal convolutional networks for analysis of time series data and graph neural networks for network modelling. In addition, we aim to make all models probabilistic, such that they predict not only point estimates but also uncertainties in terms of prediction intervals, so that the forecasts can be used for risk-based management of power grids and smart grids. More information is found in the description of our energy analytics activities:

https://machine-learning.uit.no/home/energy-analytics
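To illustrate what "probabilistic" means in practice, the Python sketch below shows two standard quantities for interval forecasts: the pinball (quantile) loss, which can be used to train or score a model that predicts a given quantile, and the empirical coverage of a prediction interval. This is one common way to work with prediction intervals and is not necessarily the approach used in the group's models. The numbers in the example are made up.

```python
def pinball_loss(y_true, y_pred, q):
    """Mean pinball loss of predictions `y_pred` for the quantile level q in (0, 1)."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        diff = y - p
        # Under-prediction is weighted by q, over-prediction by (1 - q).
        total += max(q * diff, (q - 1) * diff)
    return total / len(y_true)

def coverage(y_true, lower, upper):
    """Fraction of observations that fall inside their prediction interval."""
    inside = sum(1 for y, lo, hi in zip(y_true, lower, upper) if lo <= y <= hi)
    return inside / len(y_true)

# Made-up load values and interval bounds.
print(pinball_loss([10.0], [8.0], q=0.9))
print(coverage([1, 5, 9], [0, 0, 0], [2, 4, 10]))
```

For a well-calibrated 90 % prediction interval, the coverage on held-out data should be close to 0.9.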

Prerequisites: FYS-2010, FYS-2021, FYS-3012, FYS-3033

Contact person: Stian Normann Anfinsen, UiT Machine Learning Group