We have talked about two types of regression: linear and logistic.
We also covered how to use gradient descent to optimize a linear regressor and a logistic (i.e. binary) classifier.
Linear and binary classification are extremely powerful. However, sometimes we want to separate data into $C$ distinct categories. This is known as multiclass classification.
As an example, imagine you want to classify the different vowels of human speech.
The Persian Consonant Vowel Combination (PCVC) dataset contains audio for Persian vowels, pronounced in combination with a preceding consonant.
Imagine that we want to use this dataset to develop a classification algorithm that can hear and identify which Persian vowel is being said.
You can read more about this dataset in the original publication.
Softmax is a function that allows you to take an input and use it to generate "probabilities" across each of the $C$ classes.
The equation of softmax is

$$\text{softmax}(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}$$

If we want to use softmax to classify datapoints $x$, then $z = Wx - \max(Wx)$ (where $\max(Wx)$ is subtracted for computational stability purposes) and $W$ is our matrix with the parameters we are optimizing to carry out classification.
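The definition above can be sketched directly in NumPy. The stability trick of subtracting the maximum is included; the parameter matrix $W$ and datapoint $x$ here are random placeholders, not values from the PCVC dataset.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax: subtracting max(z) leaves the
    output unchanged but prevents overflow in np.exp."""
    z = z - np.max(z)          # stability trick from the notes
    e = np.exp(z)
    return e / np.sum(e)

# A datapoint x scored with a placeholder parameter matrix W:
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))    # C = 3 classes, 4 input features
x = rng.normal(size=4)
p = softmax(W @ x)             # "probabilities" across the C classes
```

Note that `p` always sums to 1, and shifting every entry of `z` by the same constant does not change the output, which is why the max-subtraction is safe.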
Last class we saw that we can use the binary cross entropy loss to optimize a logistic regression classifier.
To optimize a softmax (i.e. multiclass) classifier, we will use the categorical cross entropy loss

$$L_{CE} = -\sum_{i=1}^{C} y_i \log(\hat{y}_i)$$

$y$, the ground truth, is a one-hot vector, where only one entry is the number $1$ and all other entries are $0$. Each index in $y$ represents a different category. The index with the number $1$ indicates which class the corresponding datapoint (i.e. $x$) truly belongs to.
So far, the optimization routines we have implemented (linear and logistic regression) can give different solutions for the parameters $W$.
This is because the objective function is constrained only by the comparison between a predicted value $\hat{y}$ and its ground truth $y$.
We can add a term to the loss to regularize the parameters $W$, so that we impose a condition over the type of values that can be present in $W$.
The most common type of regularization is the squared L2 norm, which results in the loss function $L_{reg} = L_{CE} + \lambda \lVert W \rVert_2^2$, where $\lambda$ is the "regularization strength" term.
There are other types of regularization, though.
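The L2 penalty and its gradient contribution can be sketched as follows; the value of `lam` is a hypothetical choice for illustration, and in practice it is tuned on a validation set.

```python
import numpy as np

def l2_penalty(W, lam):
    """Squared L2 norm penalty, lam * ||W||_2^2, added to the data loss."""
    return lam * np.sum(W ** 2)

lam = 0.01                          # hypothetical regularization strength
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))

# total loss = data loss (e.g. cross entropy) + l2_penalty(W, lam)
# During gradient descent, the penalty adds 2 * lam * W to the gradient,
# which shrinks large parameter values toward zero at every step.
grad_penalty = 2 * lam * W
```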
Due Mar 15 at 11:59PM (Eastern Standard Time)
When you are done, you MUST post a picture on our course subreddit with the performance and confusion matrix you got on the evaluation set (example)
In this homework you will work with raw audio signals.
Standardizing audio signals is challenging when the number of datapoints is limited.
What would happen if we standardized raw audio samples the same way we standardized other features?
As a result, our standardization will have to be different.
For this assignment, we will standardize each datapoint to have samples with zero mean and floating-point values normalized to be in the range of $-1$ and $1$.
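One way to implement this per-datapoint standardization is sketched below: subtract the mean, then divide by the peak absolute value so samples land in $[-1, 1]$. The zero-peak guard for silent clips is my own defensive addition.

```python
import numpy as np

def standardize_audio(x):
    """Per-datapoint standardization: zero-mean samples,
    then scale so all values lie in [-1, 1]."""
    x = x - np.mean(x)
    peak = np.max(np.abs(x))
    if peak > 0:               # avoid dividing a silent clip by zero
        x = x / peak
    return x

clip = np.array([0.0, 2.0, 4.0])    # toy "audio" samples
out = standardize_audio(clip)       # mean 0, peak |value| of 1
```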
When training data is limited, we can use a few techniques to expand the number of datapoints.
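Two of the cheapest such techniques, sketched here with hypothetical noise-level and shift-range values, are additive Gaussian noise and a random circular time shift; pitch shifting (see the libraries below) is another common option.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(x, noise_std=0.005, max_shift=160):
    """Return a new training example derived from raw audio x:
    adds Gaussian noise, then applies a random circular time shift.
    noise_std and max_shift are hypothetical example values."""
    noisy = x + rng.normal(0.0, noise_std, size=x.shape)
    shift = int(rng.integers(-max_shift, max_shift + 1))
    return np.roll(noisy, shift)

# A 1-second, 440 Hz sine at 16 kHz stands in for a real recording:
clip = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
new_clip = augment(clip)    # same length, slightly different datapoint
```

Each call produces a new variant of the same clip, so a small dataset can be expanded many-fold while the label stays the same.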
librosa.effects.pitch_shift and audiomentations provide you with readily-available functions to augment audio data.

© Iran R. Roman 2022