Learning to compare with few data for personalised human activity recognition



Introduction
Integrated human activity recognition (HAR) and assistive technologies promise to enable people to live their lives well regardless of their chronic conditions. A systematic review of interventions to promote physical activity [10] illustrated that interventions involving behaviour change strategies are more effective for sustaining longer-term physically active lifestyles than time-limited interventions involving structured exercise alone. Advances in telecommunications and Artificial Intelligence (AI) technologies pave the way for personalised virtual health companions to provide adherence monitoring alongside behaviour-change digital interventions. Innovative, person-centred strategies are required to monitor and predict physical activity and exercise behaviours, to scan for and anticipate environmental barriers to activity, and to provide social and motivational support.
In this paper, we focus on one specific aspect of self-management: reasoning from sensing data to monitor adherence to personalised self-management plans. A plan requires a user to follow physical activities such as walking and specific physiotherapy exercises. Pervasive and ubiquitous AI-enabled devices are arguably best placed to continuously monitor a person's adherence to self-management plans, and to make real-time predictions about the likelihood of adherence and its impact. What is lacking are HAR algorithms that can adapt to differences in person-specific movements (e.g. gait, disabilities, weight, height), and do so with few data.
The idea of meta-learning is to train exactly as we would expect to deploy the system [7]. What this means is that rather than treating a specific "activity" as a class to be recognised across all persons, we instead learn to recognise the "person-activity" pair as the class; and, importantly, do so with a limited number of data instances per person. This can be viewed as a few-shot classification scenario [15,13], commonly used in image classification, where the aim is to train with one or few data instances per class. Meta-learning is arguably the state-of-the-art in few-shot classification [5,11], where a wide range of tasks abstract their learning to a meta-model such that it is transferable to any unseen task. Meta-learning algorithms such as MAML [5] and Relation Networks (RN) [14] are grounded in theories of metric learning and parametric optimisation, and are capable of learning generalised models. The meta-learning concept of learning-to-compare aligns well with personalisation, where modelling a person can be viewed as a single task, whereby a meta-model must help learn a model that is rapidly adaptable or directly applicable to a new person at deployment. Here we propose Personalised Meta-Learning to create personalised HAR models with a small amount of data (about one minute's worth of calibration data) extracted from a person's sensing devices. We make the following contributions: we present personalised meta-learning in the context of matching networks (MN), relation networks (RN) and MAML; we perform a comparative evaluation with a self-management dataset from the selfBACK EU project to compare the utility of personalised meta-learning algorithms over conventional learning algorithms, with a focus on using few data; and we provide results from an exploratory study on the transferability of meta-models from physiotherapy experts to non-experts.

Previous work has demonstrated the effectiveness of applying decision support and reasoning systems to the management of a specific chronic disease. For
instance, Case-Based Reasoning (CBR) has been successfully used to incorporate evidence-based practices. In managing diabetes types 1 and 2, CBR uses records that provide details about periodical visits to a physician, in a case consisting of features that represent a problem (e.g. weight, blood glucose level), its solution (e.g. levels of insulin) and the outcome (e.g. hyper/hypo-glycemia) observed after applying the solution [8,9]. More recent work [4] explored the self-management of diabetes type 1 to support monitoring of blood glucose levels before, during and after exercise. Interventions recommend carbohydrate intake based on similar cases retrieved for given HAR and exercise types.
In related work on the self-management of low back pain (LBP) [1], the SelfBACK CBR system recommends personalised care plans from similar patients. Management involves a human activity recognition (HAR) component to monitor patient activity using sensor data that is continuously polled from a wearable device. Here a combination of patient-reported monitoring and HAR from sensor data is used by the SelfBACK system to manage exercise adherence. Monitoring allows the system to detect periods of low activity behaviour, at which point a notification is generated to nudge the user to be more active (the intervention). An important contribution of this work is the integration of behaviour change techniques, such as goal setting, to focus the expected level of activity; thereafter, expected and actual behaviours are compared to analyse goal achievement. Personalisation is important to ensure that care plans are tailored to the needs of the individual. Although there has been recent work on personalised learning using matching networks [17], more work is needed to understand them in the context of other state-of-the-art meta-learners like MAML and RN with few data.

HAR using selfBACK Accelerometer Data
Wearables, such as smartwatches or phones, are the most common form of physical activity monitoring device and a common means of delivering digital interventions. These are embedded with inertial measurement devices (e.g. accelerometers or gyroscopes) that generate time-series data which can be exploited for human activity recognition of ambulatory activities, activities of daily living, gait analysis and pose recognition [17,12,3]. In the selfBACK project the HAR dataset has 6 ambulatory and 3 stationary activities³. Each activity has approximately 3 minutes of data at a 100Hz sampling rate, recorded with 33 participants using two accelerometers on the wrist (W) and the thigh (T).

Multi-modal Exercise Recognition with MEx Data
Exercise recognition requires more sophisticated sensors, such as pressure mats and depth cameras, to capture complex human movements. The MEx sensor-rich dataset⁴ contains data from 7 exercises, selected by physiotherapists for the self-management of LBP. Data is recorded with 30 participants, each performing the 7 exercises for a maximum of 60 seconds. Of the 30 participants, 7 were qualified in physiotherapy exercises (i.e. expert users) whilst the others were general users (i.e. non-expert users). Figure 1 shows the 4 modalities: a depth camera (DC) with a frame rate of 15Hz and a frame size of 240×320; a pressure mat (PM) with a frame rate of 15Hz and a frame size of 32×16; and two accelerometers at a 100Hz sampling rate, on the wrist (ACW) and the thigh (ACT).

Personalisation with Non-iid Data
Analysis of a single person's pressure mat data, compared to data from the general population, shows that there are inherent variations between persons' data. For instance in Figure 2, we have visualised 2-dimensional compressed pressure mat data (using PCA), colour-coded by exercise class. The class distribution observed using all 30 persons' data is very different from that observed with individuals (e.g. Persons 1 and 2 in the figure). We view this as a non-identically and independently distributed (non-iid) data problem, where personalised meta-learning needs to be able to cope with such distributions at deployment. Accordingly we ensure that meta-models are trained such that they are exposed to learning from such non-iid samples.
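The 2-D projection used for Figure 2 can be sketched in a few lines of numpy; the synthetic arrays below stand in for flattened 32×16 pressure-mat frames, so the shapes and values are illustrative only:

```python
import numpy as np

def pca_2d(X):
    """Project row-vector samples onto their top-2 principal components."""
    Xc = X - X.mean(axis=0)                      # centre the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T                         # (n_samples, 2) embedding

# Synthetic stand-ins for flattened 32x16 pressure-mat frames (512 features)
rng = np.random.default_rng(0)
population = rng.normal(size=(300, 512))         # all persons pooled
person_1 = rng.normal(loc=0.5, size=(30, 512))   # one person's frames

pop_2d = pca_2d(population)   # class distribution over the whole population
p1_2d = pca_2d(person_1)      # distribution for a single person
```

Plotting the two embeddings side by side, colour-coded by exercise class, reproduces the kind of per-person versus population comparison shown in Figure 2.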

Learning to Personalise with Few Data
A meta-learner learns a meta-model, θ, trained over many tasks, where a task is equivalent to a "data instance" or "labelled example" in conventional machine learning. In few-shot classification, meta-learning can be seen as the optimisation of a parametric model over many few-shot tasks (i.e. meta-train instances). Personalised meta-learning for HAR learns a meta-model θ from a population, P, while treating activity recognition for a person as an independent task. Figure 3 illustrates the task composition for such a setting, where a dataset, D, is organised over a person population, P, by creating person-centric tasks, where each "person-task", P_i, contains data for a specific person. For example, in Figure 3, P_1 has a support set of distinct human activities (or activity classes) formed with data from one person. In this example we have just a single instance (i.e. K_s = 1) to represent each class (where |C| = 5). Meta-train and meta-test sets are formed by randomly selecting K_s × |C| labelled data instances from a person, stratified across activity classes, C, such that there are K_s representatives for each class. We follow a similar approach when selecting a query set, D^q, for P_i. Each task contains an equal number of classes but not necessarily the same subset of classes. Typically the query set, D^q, has no overlap with the support set, D^s, similar to a train/test split in supervised learning; and unlike the support set, the composition of the query set need not be constrained to represent all of C.

Fig. 2: MEx data distributions visualised with MEx PM data
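The person-centric task composition described above can be sketched as follows; the function name and the toy person data are illustrative, not part of the original pipeline:

```python
import random

def sample_person_task(person_data, n_classes, k_s, k_q):
    """Sample one person-centric few-shot task (support + query sets).

    person_data: dict mapping activity label -> list of instances,
    all belonging to a single person P_i.
    """
    classes = random.sample(sorted(person_data), n_classes)
    support, pool = [], []
    for c in classes:
        shuffled = random.sample(person_data[c], len(person_data[c]))
        support += [(x, c) for x in shuffled[:k_s]]   # K_s per class (stratified)
        pool += [(x, c) for x in shuffled[k_s:]]      # kept disjoint from support
    query = random.sample(pool, k_q)  # query need not cover every class
    return support, query

# Toy person with 5 activities, 10 instances each (K_s = 1, |C| = 5)
person = {a: [f"{a}_{i}" for i in range(10)] for a in "ABCDE"}
d_s, d_q = sample_person_task(person, n_classes=5, k_s=1, k_q=5)
```

Repeating this sampling over every person in the training population yields the meta-train tasks; held-out persons yield the meta-test tasks.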
Once the meta-model is trained using the meta-train tasks, it is tested using the meta-test tasks. An instance of a meta-test task, P, has a similar composition to a meta-train task instance, in that it also has a support set, D^s, and a query set, D^q. Unlike traditional classifier testing, with meta-testing we use the support set in conjunction with the trained meta-model to classify instances in the query set. How the meta-test support and query sets are used changes depending on the aim of the meta-learning. Matching Networks (MN) [15] can be viewed as an end-to-end neural implementation of the otherwise static kNN algorithm. They aim to learn a feature space by iteratively matching a query instance to a support set, which contains both positive and negative matches to the query instance. It is essentially "training to match" over representative instances from multiple classes in each iteration, which is what sets it apart from other metric learners such as Siamese [2] and triplet networks [6]. Furthermore, training to match (instead of focusing solely on classification) makes it possible to add examples from new or unseen classes with no re-training of the model, enabling transfer to related domains [17].

Learning to Match
Figure 4 illustrates the Personalised Matching Network, MN_p, where each support set instance, x^s_i in D^s, and a query instance, x^q in D^q, are created for the person-specific task (i.e. using instances from P_i). All instances in a task are transformed into feature vectors using the feature embedding function, θ_f (a neural network model). Thereafter the process of matching is applied to every pair formed by each instance, x^q in D^q, with every instance in D^s. In the figure we can see that all pair-wise combinations are formed once D^s is duplicated thrice for a D^q with three query instances.
Similarity between a query instance and each of its support set instance pairs is calculated with an appropriate similarity metric (e.g. cosine similarity). Finally an attention mechanism, att, in the form of a similarity-weighted majority vote estimates the class, ŷ^q (see Equations 1 and 2).
a(x^q, x^s_i) = e^{sim(θ_f(x^q), θ_f(x^s_i))} / Σ_{j=1}^{|D^s|} e^{sim(θ_f(x^q), θ_f(x^s_j))}   (1)

ŷ^q = Σ_{i=1}^{|D^s|} a(x^q, x^s_i) · y^s_i   (2)

During training, the network iteratively updates the weights of θ_f to maximise the similarity between the query instance and support set instance pairs from the same activity class. This is enforced by the loss function, categorical cross-entropy, which quantifies the difference between the estimated, ŷ^q, and actual class, y, distributions. One-hot encoding is used to represent classes, enabling the attention kernel multiplication with the similarity values.
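The attention mechanism of Equations 1 and 2 can be sketched in numpy; the embeddings below are illustrative toy vectors, with cosine similarity as the metric:

```python
import numpy as np

def matching_attention(q, support, labels, n_classes):
    """Softmax-attention class estimate for one query embedding (Eqs. 1 and 2).

    q: (d,) query feature vector; support: (|D_s|, d) support feature vectors;
    labels: integer class label per support instance.
    """
    # Cosine similarity between the query and every support instance
    sims = support @ q / (np.linalg.norm(support, axis=1) * np.linalg.norm(q))
    a = np.exp(sims) / np.exp(sims).sum()   # softmax attention weights (Eq. 1)
    one_hot = np.eye(n_classes)[labels]     # |D_s| x |C| one-hot label matrix
    return a @ one_hot                      # similarity-weighted vote (Eq. 2)

# Toy support set: two instances of class 0, one of class 1
support = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
labels = np.array([0, 1, 0])
y_q = matching_attention(np.array([1.0, 0.1]), support, labels, n_classes=2)
```

The output is a distribution over classes; taking its argmax gives the predicted activity for the query instance.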
The concept of "learning to match" is achieved with attention, where pair-wise similarity computations influence the network's back-propagation and consequent weight updates. This means that the embedding function that is learnt is optimised for matching, which is a proxy for class prediction. At deployment, given a meta-test instance, P, MN predicts the label for a query instance x^q with respect to its support set D^s. In other words, the network learns to retrieve the best match from the support set elements, thereafter using them with similarity-weighted majority voting to predict the class. A RN_p has two parametric modules, one for feature representation learning, θ_f (as with MN_p), and a further one for relationship learning, θ_r (Figure 5). Instead of capturing the relationship with a similarity metric (e.g. cosine) in the feature space, it is predicted as a score, r_{q,s_i}, by θ_r, which is a Convolutional Neural Network (CNN), based on |C| pair-wise relations.

Learning to Relate
Unlike MN_p, the similarity-weighted attention layer is replaced with a parameterised relation learning model, θ_r, which takes as input (concatenated) query and support instance pairs to learn similarity such that matched pairs have similarity 1 and mismatched pairs have similarity 0. Learning similarity scores is viewed as a regression problem, with mean squared error forming the loss function that optimises both θ_f and θ_r. At deployment, as with MN_p, given a test query instance x^q the RN_p predicts the class label with respect to its support set D^s. Unlike MN_p and RN_p, with Personalised MAML (MAML_p) there is less focus on similarity and instance pairing; instead the aim is to learn a generic model prototype (i.e. a meta-model), θ, such that it can be rapidly adapted to any new person, P, encountered at test time. Task design for MAML_p is as described in Section 3. Adaptation-optimised learning is illustrated in Figure 6. At the start of each iteration (epoch), a set of person tasks, P_i, is sampled to optimise their person-specific models using a generic model θ_j as the model initialisation. Thereafter each person-specific model, θ_i, is locally trained by optimising over D^s_i using one or a few steps of gradient descent. The loss computed using D^q by each person-task is passed on to the meta-learner, which in turn aggregates these losses and optimises θ_j using its own gradient descent step, forming the meta-update for the epoch. This process is repeated for n epochs, to learn a generic model prototype θ that can be rapidly adapted to a new P.
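The relation scoring and its regression loss can be sketched as follows; a tiny numpy MLP stands in for the CNN-based θ_r, and all weights and shapes are illustrative:

```python
import numpy as np

def relation_scores(q, support, W1, W2):
    """Score each (query, support) pair in [0, 1] with a tiny relation module.

    q: (d,) query embedding; support: (n, d) support embeddings.
    The MLP (W1, W2) is a stand-in for the CNN-based theta_r of RN_p.
    """
    pairs = np.hstack([np.tile(q, (len(support), 1)), support])  # concatenate
    hidden = np.maximum(pairs @ W1, 0)                           # ReLU layer
    return 1 / (1 + np.exp(-(hidden @ W2)))                      # sigmoid scores

def relation_loss(scores, match_mask):
    """MSE regression loss: matched pairs target 1, mismatched pairs target 0."""
    return float(np.mean((scores - match_mask) ** 2))

# Illustrative usage with random weights and embeddings
rng = np.random.default_rng(1)
d, h = 4, 8
W1, W2 = rng.normal(size=(2 * d, h)), rng.normal(size=h)
support = rng.normal(size=(3, d))
scores = relation_scores(rng.normal(size=d), support, W1, W2)
loss = relation_loss(scores, np.array([1.0, 0.0, 0.0]))
```

In RN_p this loss is back-propagated through both the relation module and the embedding function, so θ_f and θ_r are optimised jointly.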

Learning to Adapt
At deployment, a person P, not seen during training, uses their support set, D^s, for local training of the parametric model, initialised by the meta-model θ. Thereafter, the adapted model is used to classify instances in P's query set, D^q. Personalised MAML is model-agnostic, which is advantageous for HAR applications with heterogeneous sensor modalities or modality combinations.
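The meta-training loop and the deployment-time adaptation above can be sketched using a first-order approximation of MAML (second-order gradients omitted) on a toy linear model; the model, synthetic data and learning rates are illustrative stand-ins, not the paper's configuration:

```python
import numpy as np

def mse_grad(w, X, y):
    """Gradient of mean squared error for the linear stand-in model f(x) = Xw."""
    return 2 * X.T @ (X @ w - y) / len(y)

def adapt(w, X_s, y_s, alpha=0.05, steps=10):
    """Local training on a person's support set D^s, starting from w."""
    w = w.copy()
    for _ in range(steps):
        w -= alpha * mse_grad(w, X_s, y_s)
    return w

def maml_step(w, person_tasks, beta=0.05):
    """One first-order meta-update aggregating query-set losses of adapted models."""
    meta_grad = np.zeros_like(w)
    for (X_s, y_s), (X_q, y_q) in person_tasks:
        w_i = adapt(w, X_s, y_s)                # inner loop on D^s
        meta_grad += mse_grad(w_i, X_q, y_q)    # query loss drives the meta-update
    return w - beta * meta_grad / len(person_tasks)

# Toy population: each person's true weights deviate slightly from a shared base
rng = np.random.default_rng(0)
base = np.array([1.0, -2.0, 0.5])
tasks = []
for _ in range(8):
    w_p = base + rng.normal(scale=0.1, size=3)            # person-specific target
    X_s, X_q = rng.normal(size=(5, 3)), rng.normal(size=(20, 3))
    tasks.append(((X_s, X_s @ w_p), (X_q, X_q @ w_p)))

theta = np.zeros(3)          # meta-model prototype
for _ in range(200):         # meta-training epochs
    theta = maml_step(theta, tasks)
```

After meta-training, `adapt(theta, X_s, y_s)` on a new person's support set mimics the deployment step: a few gradient steps from the meta-model yield the personalised model.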

Evaluations
The aim of the evaluations is to compare the performance of the 3 personalised meta-learners discussed in Section 3 with several established benchmark algorithms.
We follow the person-aware evaluation methodology Leave-One-Person-Out (LOPO) in our experiments, where data from one person is left out to create meta-test instances and the rest is used to create meta-train instances. Accuracy on meta-test is presented and any significance reported is at the 95% confidence level using the Wilcoxon signed-rank test. Sensor data streams are converted into instances by applying a sliding window of size 5 seconds, with an overlap of 3 and 2.5 seconds for the MEx and selfBACK data sources, creating 30 and 88 data instances per person-activity on average (K). We select K_s = 5 and K_q = K − K_s to create meta-train and meta-test instances.
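A minimal sketch of the windowing step; the zero-filled stream and its length are placeholders standing in for a real tri-axial accelerometer recording:

```python
import numpy as np

def sliding_windows(stream, rate_hz, win_s=5.0, overlap_s=2.5):
    """Segment a (samples, channels) sensor stream into overlapping windows."""
    win = int(win_s * rate_hz)                   # samples per window
    step = int((win_s - overlap_s) * rate_hz)    # hop between window starts
    return np.array([stream[i:i + win]
                     for i in range(0, len(stream) - win + 1, step)])

# 3 minutes of tri-axial accelerometer data at 100Hz (zeros as placeholder)
stream = np.zeros((180 * 100, 3))
wins = sliding_windows(stream, rate_hz=100, win_s=5.0, overlap_s=2.5)
```

Each resulting window becomes one labelled data instance for the person-activity pair it was cut from.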
MN and MN_p are trained for 20 epochs, and MAML, MAML_p, RN and RN_p for 100 epochs, all using early stopping. MAML and MAML_p use 5 and 10 gradient steps when training and testing respectively. MN, MN_p, MAML and MAML_p use a single dense layer with 1200 units as θ_f and θ. The θ_f in RN and RN_p consists of a single-layer CNN (64 kernels of size 3 × 3); θ_r is a single-layer CNN (64 kernels of size 3 × 3), followed by 2 dense layers (120 units and 1 unit); here, the last dense layer has an output of size 1 for the regression task (Section 3.2).

HAR Comparative Study
Results appear in Table 1 on 6 datasets derived from selfBACK and MEx. As expected, personalised meta-learning models significantly outperform conventional DL and (non-personalised) meta-learning models on all datasets. The two visual datasets, MEx_DC and MEx_PM, recorded the best performance with MAML_p. Both accelerometer datasets from MEx and one dataset from selfBACK achieved best performance with RN_p. Notably, both MAML_p and RN_p fail to outperform the personalised few-shot learning algorithm MN_p on the SB_W dataset, which consists of sensing data obtained from the wrist, which has the greatest degree of freedom and is therefore most prone to "noisy" movements. Interestingly, MN_p has comparable performance against MAML_p on the MEx_ACT dataset and the RN_p model on the MEx_DC dataset.

Discussion
In a real-world situation the data for meta-model training is likely to be provided by physiotherapy experts performing exercises. Thereafter learnt models can be applied to non-physio users. We can simulate this situation with the MEx dataset, where the 7 physio experts can be used to train a meta-model and observe how it transfers to the rest (23 persons). Figure 7 plots meta-test accuracy for incrementally increasing values of meta-train epochs for RN_p for all 23 persons (in grey) against the averaged results plot. We can observe the elbow point at about 40-50 epochs. Local learning with no meta-learning, as expected, is very low (results at x-axis = 0). The general trend is that most persons show improvements in transferability with increasing epochs. Even those that struggle to improve accuracy early on seem to benefit from using the meta-model with increasing epochs. MAML_p has benefited from its local training and presents a gradual increase (∼5%) with increasing meta-training. For comparison, results of MAML_p before local training (no adaptation) are included and highlight the importance of personalised model adaptation.
Figure 8 shows the impact of meta-learning on model adaptation with MAML_p. The rising and falling cyclic pattern can be explained by observing that local models, θ_i, are initialised at each epoch with the meta-model, and so have a low meta-test accuracy starting point. As the θ_i are refined through local training, three behaviours can be observed (Figure 8): the meta-model being easily adapted with 1 or 2 gradient steps (blue); the meta-model successfully adapted over several gradient steps (orange); and failure to adapt (green). Overall it is evident that gains from personalised model transfer in most cases gradually improve with increasing meta-training.

Conclusion
Personalised meta-learning supports model transfer to new situations in applications where there is little data. With HAR, models can be transferred with a few instances of calibration data obtained from the end-user at deployment. MAML_p uses calibration data to adapt through re-training, whilst MN_p and RN_p use calibration data directly for matching (without re-training). Our results on the MEx and selfBACK datasets show that personalised meta-learning achieves significant performance improvements over conventional and non-personalised meta-learning algorithms. Importantly we find that, while RN_p outperforms MAML_p, MAML_p performs significantly faster due to the absence of paired matching. We hope that the parameterised learning-to-compare methods discussed here will help inspire new ideas relevant for CBR research.

Fig. 1: Multi-modal data in the MEx dataset

Fig. 3: Composing a meta-task for training and testing a meta-learner

Table 1: Comparative Study: mean accuracy results, LOPO, 5-shot

When comparing conventional meta-learners (i.e. RN, MAML) and the Personalised Few-Shot Learner, MN_p, we see that MN_p models achieve comparable performances or significantly outperform at least one conventional meta-learner in all four experiments, which further confirms the importance of personalisation. Overall, we find that the optimisation-based meta-learning algorithm (i.e. MAML_p) performs well on visual sensing modalities, whilst comparison-based meta-learners (i.e. MN_p and RN_p) perform well on time-series data.