
Multiclass Classification and Multicluster Binary Classification Assignment

This project consists of three assessed components: a group report (group submission), an accompanying
code archive (group submission) and an individual reflective report (individual submission). Guidance
on these three parts is provided in Sections 2, 3 and 4 respectively. The group report
counts for 35% of your final grade for the module, the code counts for 15%, and the
individual report counts for 30%; the individual report is due one week after the deadline for your group
report and code submission.

On completion, each group should submit files with the following titles:
• fomlads group report <group id>.pdf containing the group report
• fomlads code archive <group id>.zip containing a zip archive of the associated code
Individuals should also submit a file titled
• fomlads individual report <student id>.pdf containing their individual report
In the titles above, <group id> and <student id> should be replaced with your group id and student
id respectively. For group submissions, only one person from the group needs to submit. Every student
needs to individually submit the individual report.
DO NOT include your names anywhere in the group report, code or individual report.

RT1: Model/algorithm comparison
Machine learning practitioners often have many algorithms at their disposal to solve a particular problem.
However, it is not always clear a priori which model/algorithm works best for the problem at hand.
This research template involves choosing one multiclass classification dataset with 5 or more classes,
one data representation for that dataset and one evaluation metric. Your objective is to evaluate
a number of models to see which performs best and understand a little about why. For this research
template you should use three or four models (no more), including at least one lab model, and
at least one external model.
You will need to briefly explain the principles of any external classification model you use in your
report. This is not a tutorial explanation, but a high-level description that gives the intuition behind
what is happening and the characteristic outcomes of the approach, e.g. under what conditions it works
well and under what conditions it fails.
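As a hedged illustration, the comparison loop this template calls for might look like the sketch below. The digits dataset, the particular three models and the macro-F1 metric are assumptions made for the example, not requirements of the template:

    # A minimal sketch of an RT1-style comparison: one fixed dataset, one fixed
    # metric, several models evaluated on the same held-out split.
    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.metrics import f1_score

    X, y = load_digits(return_X_y=True)  # 10 classes; stands in for your chosen dataset
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)

    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),  # lab model
        "random_forest": RandomForestClassifier(random_state=0),   # external model
        "knn": KNeighborsClassifier(n_neighbors=5),                # external model
    }

    for name, model in models.items():
        model.fit(X_train, y_train)                     # only fit/predict are used
        y_pred = model.predict(X_test)
        print(name, f1_score(y_test, y_pred, average="macro"))  # one fixed metric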

RT2: Dimensionality Reduction
One of the most vital ingredients of a machine learning algorithm is the choice of representation. In
this research template, you are going to explore how dimensionality reduction techniques can be used.

RT3: Multicluster linear classification
In this research template, you should explore the use of clustering to partition your data, so that separate
linear models can be trained on each cluster. The argument here is that some classification datasets
may best be modelled with a non-linear classifier, but that this can be approximated by a collection of
linear classifiers applied to different regions (and that these regions can be discovered with a clustering
algorithm); a minimal sketch of this pipeline follows the list below. This research template involves choosing one dataset with a binary target, one data
representation, one evaluation metric and one clustering approach from the following list:
• K-means, see the lecture notes. You may use the SciKit implementation of sklearn.cluster.KMeans
• Agglomerative Clustering, see the lecture notes. You may use the SciKit implementation of
sklearn.cluster.AgglomerativeClustering, but you will need to cut the hierarchy to give a
flat clustering.
• DBSCAN, see [EKXM96] or [SSE+17]. You may use the SciKit implementation of sklearn.cluster.DBSCAN
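As a hedged illustration of the multicluster pipeline described above, the sketch below uses K-means and logistic regression; both choices, and the helper names fit_multicluster/predict_multicluster, are illustrative assumptions rather than a required design:

    # A minimal sketch: partition the training data into clusters, fit one linear
    # classifier per cluster, and route each test point to its cluster's model.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.linear_model import LogisticRegression

    def fit_multicluster(X_train, y_train, n_clusters=3):
        km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_train)
        models = {}
        for k in range(n_clusters):
            mask = km.labels_ == k
            if np.unique(y_train[mask]).size < 2:
                # single-class cluster: fall back to predicting its majority label
                models[k] = int(np.bincount(y_train[mask]).argmax())
            else:
                models[k] = LogisticRegression(max_iter=1000).fit(
                    X_train[mask], y_train[mask])
        return km, models

    def predict_multicluster(km, models, X_test):
        clusters = km.predict(X_test)            # assign each point to a cluster
        y_pred = np.empty(len(X_test), dtype=int)
        for k, model in models.items():
            mask = clusters == k
            if mask.any():
                y_pred[mask] = model if isinstance(model, int) else model.predict(X_test[mask])
        return y_pred

For example, on a non-linear binary dataset such as sklearn.datasets.make_moons, three clusters with one logistic regression each can approximate a curved decision boundary that a single linear model cannot fit.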

1.2.1 Choosing a dataset
You must choose one dataset, and this requires you to search for an appropriate dataset online.
Good sources for datasets are Kaggle (https://www.kaggle.com/datasets) or UCI
(https://archive.ics.uci.edu/ml/index.php). The dataset should ideally have no more than 100,000
datapoints, but you can always take a larger dataset and then take a subset for your experiments.
However, you will need to make this subset available to the markers, so take the subset first, save it
to file, work with that file, and submit that file as part of your code submission. Your final code should
run on a desktop computer (CPU only) in less than 15 minutes to obtain, from scratch, all the results
presented in your report.
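The subsetting workflow might look like the hedged sketch below; the 100,000-point cap follows the guidance above, while the CSV format and the file names are assumptions for the example:

    # A minimal sketch: draw a reproducible subset once, save it, then work only
    # with the saved file (and submit it with your code).
    import pandas as pd

    full = pd.read_csv("full_dataset.csv")        # hypothetical file name
    subset = full.sample(n=min(len(full), 100_000), random_state=0)
    subset.to_csv("subset_dataset.csv", index=False)

    # All later experiments load the saved subset, not the original dataset.
    data = pd.read_csv("subset_dataset.csv")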

Model/Algorithm
This is something we have spent some time on in the lectures and labs: how to model a classifier on
your data and how to fit the best parameters for that model. The lab models – those described in the
lectures – are:
• Fisher’s linear discriminant
• Logistic Regression
• Shared Covariance Model
Although we have implementations for these in the fomlads library, you may instead use the SciKit
implementations of these models, but you may only use their fit, predict and predict_proba methods.
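As a hedged illustration of that restriction, the sketch below touches only the three permitted methods, using SciKit's logistic regression (note that SciKit spells the probability method predict_proba); the iris data is just a stand-in for your own splits:

    # A minimal sketch of the permitted SciKit usage for a lab model.
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression

    X, y = load_iris(return_X_y=True)   # stand-in for your own data
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, y)                       # allowed
    y_pred = clf.predict(X)             # allowed
    y_prob = clf.predict_proba(X)       # allowed: per-class probabilities
    # Any functionality beyond these three methods should come from your own code.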
Under some circumstances, you may also use one (or more) of the following external models (again
this depends on your choice of research template – see Section 1.1). For these models, you are allowed
to directly use the Sklearn API but consult Section 1.5 for the allowed functionality. The appropriate
library function is indicated for each.
• Lasso regression – see here
• Elastic net – see here
• Random Forest Classifier – see here.
• Support Vector Machine (SVM) – see here.
Treat SVMs with different kernel types as different models. In particular, an SVM with a linear
kernel is a linear model (it fits a linear decision boundary), while an SVM with a Gaussian kernel is
non-linear. Polynomial kernels can be slow and tend to perform less well than Gaussian kernels. (A short sketch follows this list.)
• K-Nearest Neighbours (kNN) – see here.
• Multilayer Perceptron (MLP) – see here.
You can use at most one hidden layer.
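To make the SVM kernel point concrete, a hedged sketch (the hyperparameter values are illustrative):

    # Two distinct models for comparison purposes, differing only in kernel.
    from sklearn.svm import SVC

    svm_linear = SVC(kernel="linear", C=1.0)           # linear decision boundary
    svm_rbf = SVC(kernel="rbf", C=1.0, gamma="scale")  # non-linear (Gaussian) boundary
    # Count these as two separate models in your comparison, not one.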
When using either lab models or external models, some effort to select the best-performing hyperparameters
is expected. This can include model hyperparameters, which influence the family of functions
fit (such as RBF location and width, random forest tree depth or branching factor, or MLP activation function
or number of hidden units), as well as training hyperparameters (such as gradient step size or
regularisation parameter). Note that some models have a high number of model hyperparameters and
you may not be able to evaluate everything. A general rule of thumb is that at most 3 hyperparameters
per model should be investigated.
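A hedged sketch of such a search is below; the random forest, the three chosen hyperparameters and the macro-F1 scorer are illustrative choices, and whether GridSearchCV itself falls within the allowed functionality depends on Section 1.5:

    # A minimal sketch: grid search over exactly three model hyperparameters.
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import GridSearchCV

    X, y = load_digits(return_X_y=True)   # stand-in for your training data
    param_grid = {
        "n_estimators": [100, 300],
        "max_depth": [5, 10, None],
        "max_features": ["sqrt", "log2"],
    }
    search = GridSearchCV(RandomForestClassifier(random_state=0),
                          param_grid, cv=5, scoring="f1_macro")
    search.fit(X, y)
    print(search.best_params_, search.best_score_)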
If using external models, some effort should be made in the report to demonstrate an understanding
of the model and the influence of each model hyperparameter on it (e.g. in terms of
flexibility of function and/or bias/variance). For each of the models above, there are suggested references
on the associated SciKit-learn documentation page which you can read. You should only read up on the
models you use.