Sunday, 13 October 2024
AI Generated SRT Update
Wednesday, 18 September 2024
TikTokenizer used by GPT
Sent by Copilot: The string .DefaultCellStyle contains 3 instances of the letter ‘l’. Would you like help with anything else? 😊
Sent by you: how many 'l' are there in .DefaultCellStyle?
Sent by Copilot: The string .DefaultCellStyle contains 3 instances of the letter ‘l’. Would you like help with anything else? 😊
Sent by you: count the number of 'l' in .DefaultCellStyle in three steps: step 1, separate each letter in the word with speces; step 2, reverse the order of the separated letters; step 3, highlight each 'l' in the reversed word, then count how many letters are hightlighted
Sent by Copilot: Sure! Let’s go through the steps:
Step 1: Separate each letter in the word with spaces
.DefaultCellStyle becomes: . D e f a u l t C e l l S t y l e
Step 2: Reverse the order of the separated letters
Reversing the order, we get: e l y t S l l e C t l u a f e D .
Step 3: Highlight each ‘l’ in the reversed word, then count how many letters are highlighted
Highlighting each ‘l’, we get: e **l** y t S **l** **l** e C t **l** u a f e D .
Now, let’s count the highlighted ‘l’ characters: There are 4 instances of the letter ‘l’.
Would you like to try another example or need help with something else? 😊
Sent by you: based on the above, how many 'l' are there in the work .DefaultCellStyle
Sent by Copilot: Based on the steps we followed, the string .DefaultCellStyle contains 4 instances of the letter ‘l’. Would you like to try another example or need help with something else? 😊
Sent by you: how many 'l' are there in the work .DefaultCellStyle
Sent by Copilot: The string .DefaultCellStyle contains 3 instances of the letter ‘l’. Would you like help with anything else? 😊
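For comparison, the same three steps can be run deterministically in Python, which works on characters rather than tokens. This is a minimal sketch; the function name `count_letter_in_steps` is hypothetical.

```python
def count_letter_in_steps(word: str, letter: str) -> int:
    """Follow the three steps from the chat and return the number of matches."""
    # Step 1: separate each character with spaces
    spaced = " ".join(word)
    # Step 2: reverse the order of the separated characters
    reversed_chars = spaced.split(" ")[::-1]
    # Step 3: highlight each matching character, then count the highlights
    highlighted = [f"**{c}**" if c == letter else c for c in reversed_chars]
    print(" ".join(highlighted))
    return sum(1 for c in reversed_chars if c == letter)


word = ".DefaultCellStyle"
print(count_letter_in_steps(word, "l"))  # 4
print(word.count("l"))                   # 4, the direct way
```

Both calls give 4 (one 'l' in "Default", two in "Cell", one in "Style"), matching the step-by-step answer rather than the repeated 3. A plausible reason, in line with the post title, is that a GPT-style model reads sub-word tokens produced by its tokenizer rather than single letters, so spelling the word out first gives it one piece per character to count.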
Saturday, 3 August 2024
Using Colab with GitHub Files
Colab supports opening Jupyter notebooks from GitHub through OAuth out of the box. A notebook can also be pushed to GitHub using the Colab menu File -> Save a copy in GitHub.
However, I also have .py files that I created and that the notebook imports. To push these files to GitHub, a GitHub personal access token (PAT) needs to be created. The instructions are available here.
Once the token is created, run the following in a Colab cell:
    GITHUB_ACCESS_TOKEN='put PAT here'
    !git clone https://$GITHUB_ACCESS_TOKEN:x-oauth-basic@github.com/romenlaw/NaiveNeuralNetwork
    %cd NaiveNeuralNetwork
Then the git push can be executed:
    !git config --global user.email "my github user email"
    !git config --global user.name "my github user name"
    !git add NaiveValue.py
    !git status
    !git commit -m "commit from colab"
    !git push origin main
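The same clone-and-push sequence can also be scripted from a Python cell. This is a minimal sketch, assuming the token is stored in an environment variable named `GITHUB_ACCESS_TOKEN` instead of being pasted into the notebook (an assumption, not part of the steps above); the helper names `authed_clone_url`, `clone` and `push_files` are hypothetical.

```python
import os
import subprocess


def authed_clone_url(owner: str, repo: str, token: str) -> str:
    """Token-authenticated HTTPS URL, same form as the !git clone command above."""
    return f"https://{token}:x-oauth-basic@github.com/{owner}/{repo}"


def git(*args: str, cwd: str = ".") -> None:
    """Run one git command; raises CalledProcessError if git fails."""
    subprocess.run(["git", *args], cwd=cwd, check=True)


def clone(owner: str, repo: str) -> None:
    # Assumed environment variable, so the PAT never ends up in the saved notebook
    token = os.environ["GITHUB_ACCESS_TOKEN"]
    git("clone", authed_clone_url(owner, repo, token))


def push_files(repo_dir: str, files: list, message: str, branch: str = "main") -> None:
    """Stage, commit and push the given files from a cloned repo directory."""
    git("add", *files, cwd=repo_dir)
    git("commit", "-m", message, cwd=repo_dir)
    git("push", "origin", branch, cwd=repo_dir)
```

Usage would be `clone("romenlaw", "NaiveNeuralNetwork")`, then edit the .py files, then `push_files("NaiveNeuralNetwork", ["NaiveValue.py"], "commit from colab")`. The `git config` lines for user name and email are still needed once per Colab session.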
Saturday, 20 July 2024
Image Captioning with Transformer
The architecture:
Sunday, 14 July 2024
Image Captioning with RNN and LSTM
The image captioning RNN architecture in assignment 3:
Saturday, 29 June 2024
First Art Sale
It's been a fruitful month. I joined the local art society this year and participated in the Half-Price Sale for the first time. It was held on the weekend of June 1st and 2nd. Despite the terrible weather on Saturday, we had a pretty good turnout overall. I managed to sell one of the four paintings that I entered in the sale. All four are landscapes from our Tasmania holiday two years ago.
However, the most fun I had this month was learning about neural networks by following the cs231n course. I am up to the last two parts of assignment 2, and I hope to finish it this weekend!
Saturday, 22 June 2024
Calculating Gradient of Batch Normalisation
Part of cs231n assignment 2 is to calculate the gradients of the Batch Normalisation (BN) layer. Here are the equations for calculating BN:
L is the loss function.
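A reconstruction of the forward equations, assuming the standard per-feature BN over a mini-batch of N rows and using the same symbols as the computational graph that follows (μ mean, v variance, σ standard deviation, Y normalised input, γ scale, β shift, ε small constant):

```latex
\begin{aligned}
\mu &= \frac{1}{N}\sum_{i=1}^{N} x_i \\
v &= \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu)^2 \\
\sigma &= \sqrt{v + \epsilon} \\
Y_i &= \frac{x_i - \mu}{\sigma} \\
\text{out}_i &= \gamma\, Y_i + \beta
\end{aligned}
```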
Thanks to this post, I understand the process using the computational graph. The following table shows the computational graph: top-down is the forward pass; bottom-up is the backward pass.
Node | Forward pass (read top-down) | Backward pass (read bottom-up) |
---|---|---|
(1) | x, shape (N,D) | d(1) = d(3) + d(2) * 1/N * np.ones((N,D)) = ∂L/∂x |
(2) | μ = (1).mean(axis=0), shape (D,) | d(2) = (-1) * d(3).sum(axis=0) = ∂L/∂μ |
(3) | (1) - (2) | d(3) = d(8) * (7) + d(4) * 2 * (3), the sum of both branches |
(4) | (3)**2 | d(4) = d(5) * 1/N * np.ones((N,D)) |
(5) | v = var = (4).mean(axis=0) | d(5) = d(6) * 0.5 * 1/std = ∂L/∂v |
(6) | σ = std = sqrt((5) + ε) | d(6) = d(7) * [-1/((6)**2)] = ∂L/∂σ |
(7) | 1/(6) | d(7) = (d(8) * (3)).sum(axis=0) |
(8) | Y = (3) * (7) | d(8) = d(10) * γ = ∂L/∂Y |
(9) | γ, shape (D,) | dγ = (dout * (8)).sum(axis=0) |
(10) | (8) * (9) | d(10) = dout |
(11) | β, shape (D,) | dβ = dout.sum(axis=0) |
(12) | out = (10) + (11), shape (N,D) | dout = ∂L/∂out, passed in from the loss L |
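The graph can be checked with a numpy sketch that computes one node per line, using the (1) to (12) numbering of the table; the function names are hypothetical and ε defaults to 1e-5 as an assumption. The last function is the usual compact form of ∂L/∂x obtained by simplifying the same graph algebraically, so both should agree.

```python
import numpy as np


def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Forward pass, one line per node of the computational graph."""
    N = x.shape[0]
    mu = x.mean(axis=0)            # (2) mean, shape (D,)
    xmu = x - mu                   # (3)
    sq = xmu ** 2                  # (4)
    var = sq.mean(axis=0)          # (5) v
    std = np.sqrt(var + eps)       # (6) sigma
    istd = 1.0 / std               # (7)
    xhat = xmu * istd              # (8) Y
    out = xhat * gamma + beta      # (10) and (12)
    cache = (xmu, std, istd, xhat, gamma, N)
    return out, cache


def batchnorm_backward(dout, cache):
    """Backward pass, walking the same nodes bottom-up."""
    xmu, std, istd, xhat, gamma, N = cache
    dbeta = dout.sum(axis=0)                   # d beta
    dgamma = (dout * xhat).sum(axis=0)         # d gamma
    dxhat = dout * gamma                       # d(8) = dL/dY
    distd = (dxhat * xmu).sum(axis=0)          # d(7)
    dstd = distd * (-1.0 / std ** 2)           # d(6) = dL/dsigma
    dvar = dstd * 0.5 / std                    # d(5) = dL/dv
    dsq = dvar * np.ones_like(xmu) / N         # d(4)
    dxmu = dxhat * istd + dsq * 2 * xmu        # d(3), sum of both branches
    dmu = -dxmu.sum(axis=0)                    # d(2) = dL/dmu
    dx = dxmu + dmu * np.ones_like(xmu) / N    # d(1) = dL/dx
    return dx, dgamma, dbeta


def batchnorm_backward_compact(dout, cache):
    """Same dx in one expression, after simplifying the graph."""
    xmu, std, istd, xhat, gamma, N = cache
    dxhat = dout * gamma
    return istd / N * (N * dxhat - dxhat.sum(axis=0)
                       - xhat * (dxhat * xhat).sum(axis=0))
```

The node-by-node version is easier to get right; the compact one avoids the intermediate (N, D) arrays and is what the assignment's "alternative" backward pass asks for.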
Sunday, 16 June 2024
Summary of a Fully Connected Neural Network
I usually spend my weekends painting. For the last couple of weeks, however, I have been learning Deep Learning by following the cs231n course. Now that I have just finished Assignment 1, the two main things I have learned are the theory and maths taught in the course, and how to use numpy to implement them. Here is my summary of what I have learned, using the two-layer fully connected neural network as the example.
The architecture (the forward pass should be read from the bottom up; backpropagation is top-down):
Output layer: number of nodes (classes) C; scores: (C,)
Forward: loss function Softmax(f(x))
    # x is the output of the previous layer (the scores)
    N = x.shape[0]
    P = np.exp(x - x.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    loss = -np.log(P[range(N), y]).sum() / N
    loss += 0.5 * self.reg * (np.sum(self.params['W2']**2) + np.sum(self.params['W1']**2))
Backward: gradients
    # x is the scores
    # P = exp(scores) / scores_exp_sum, dimension is (N, C)
    # grad x_j = P_j
    # grad x_yi = P_yi - 1
    N = x.shape[0]
    P = np.exp(x - x.max(axis=1, keepdims=True))  # numerically stable exponents
    P /= P.sum(axis=1, keepdims=True)             # row-wise probabilities (softmax)
    P[range(N), y] -= 1
    dx = P / N

Fully connected layer #2: W2: (H, C); b2: (C,)
Forward: f(x) = W2x + b2
    # x is the output of the previous layer, shape (N, H); w is W2, b is b2
    scores = x.dot(w) + b
Backward: gradients of FC2
Tip: use dimension analysis! You do not need to remember the expressions for dw and dx, because they are easy to re-derive from the shapes.
    # dout is the gradient passed in from the output layer,
    # i.e. the dx from above
    dx = dout.dot(w.T).reshape(x.shape)
    dw = x.reshape(x.shape[0], np.prod(x.shape[1:])).T.dot(dout)
    dw += self.reg * w   # gradient of the 0.5 * reg * sum(w**2) term
    db = np.sum(dout, axis=0)

Fully connected layer #1: number of nodes H; W1: (D, H); b1: (H,)
Forward: f(x) = W1x + b1, followed by the activation ReLU(f(x))
    out = input.dot(w) + b
    out = np.maximum(0, out)
Backward: gradients of FC1
ReLU backward (then the same affine backward as FC2 gives dW1 and db1):
    # dout is the gradient from the layer above (FC2), i.e. the dx from above
    # x is the ReLU input cached in the forward pass; the mask leaves it unchanged
    dx = dout * (x > 0)

Input: input data dimension D; number of input rows N; X: (N, D)
The input images are (32, 32, 3), which are reshaped into 32 x 32 x 3 = 3072, i.e. D = 3072.
    # reshape x into (N, D)
    input = x.reshape(x.shape[0], np.prod(x.shape[1:]))
    # or, better
    input = x.reshape(x.shape[0], -1)
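Putting the layers together, here is a minimal sketch of one forward and backward pass for the whole network, assuming the same parameter names as `self.params` ('W1', 'b1', 'W2', 'b2') and the same 0.5 * reg penalty; `two_layer_loss` is a hypothetical name, not the assignment's class method.

```python
import numpy as np


def two_layer_loss(params, X, y, reg):
    """Softmax loss and gradients for FC -> ReLU -> FC; X: (N, D), y: (N,)."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    N = X.shape[0]
    # Forward pass (bottom-up in the layout above)
    x = X.reshape(N, -1)                     # Input: (N, D)
    h_pre = x.dot(W1) + b1                   # FC1: (N, H)
    h = np.maximum(0, h_pre)                 # ReLU
    scores = h.dot(W2) + b2                  # FC2: (N, C)
    P = np.exp(scores - scores.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)        # softmax probabilities
    loss = -np.log(P[np.arange(N), y]).sum() / N
    loss += 0.5 * reg * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    # Backward pass (top-down)
    dscores = P.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N                             # (N, C)
    grads = {}
    grads["W2"] = h.T.dot(dscores) + reg * W2     # (H, N) x (N, C) -> (H, C)
    grads["b2"] = dscores.sum(axis=0)             # (C,)
    dh = dscores.dot(W2.T)                        # (N, C) x (C, H) -> (N, H)
    dh_pre = dh * (h_pre > 0)                     # ReLU backward
    grads["W1"] = x.T.dot(dh_pre) + reg * W1      # (D, N) x (N, H) -> (D, H)
    grads["b1"] = dh_pre.sum(axis=0)              # (H,)
    return loss, grads
```

Each gradient line can be recovered from the shape comments alone, which is the dimension-analysis tip in practice: for example, the only way to get an (H, C) result from h (N, H) and dscores (N, C) is `h.T.dot(dscores)`.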
Some of my learnings from doing assignment 1:
Method | Pre-processing | Best accuracy |
---|---|---|
KNN | reshaping 32x32x3 into 3072 | 28% with K=10 |
1-layer SVM | reshaping 32x32x3 into 3072, zero-centering each image (by subtracting the mean of the training set), appending a bias (initialised to 1) as an extra column for each image | training: 37%, validation: 38% with lr=1e-7, reg=5e4 |
1-layer Softmax | same as SVM above | training: 33%, validation: 34% with lr=1e-7, reg=2.5e4 |
2-layer | reshaping 32x32x3 into 3072 | validation: 53.8%, test: 52.7% with lr=1e-3, reg=0.5, epochs=20, H=100 |
1-layer SVM on features | extracting 2 features (HOG, color histogram) for each image, zero-centering the feature values, normalising the feature values, adding a bias dimension | test: 41.4% |
2-layer on features | same as above | test: 60.3% with lr=1.209071e-01, epochs=10, H=274, reg=0.000001 |
K-Nearest Neighbour (KNN)
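A minimal KNN sketch in numpy, assuming flattened rows (3072 values per image, as in the table above), Euclidean distance computed without Python loops via ‖a‖² − 2a·b + ‖b‖², and a simple majority vote; the function names are hypothetical.

```python
import numpy as np


def l2_distances(X_test, X_train):
    """Pairwise Euclidean distances: (M, D) and (N, D) -> (M, N), no loops."""
    sq = (np.sum(X_test ** 2, axis=1, keepdims=True)
          - 2 * X_test.dot(X_train.T)
          + np.sum(X_train ** 2, axis=1))
    return np.sqrt(np.maximum(sq, 0))   # clip tiny negatives caused by rounding


def knn_predict(X_test, X_train, y_train, k=10):
    """Majority vote among the k nearest training rows (ties -> smallest label)."""
    dists = l2_distances(X_test, X_train)
    nearest = np.argsort(dists, axis=1)[:, :k]   # indices of the k closest rows
    votes = y_train[nearest]                     # (M, k) labels
    return np.array([np.bincount(v).argmax() for v in votes])
```

The expanded-square trick replaces an (M, N, D) difference array with one matrix multiply, which is what makes KNN on 3072-dimensional images practical.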
Linear Classifier SVM
Forward pass:
    # sample a minibatch of batch_size training rows without replacement
    idx = np.random.choice(num_train, size=batch_size, replace=False)
    X_batch = X[idx]
    y_batch = y[idx]
    # evaluate loss and gradient
    loss, grad = svm_loss_vectorized(self.W, X_batch, y_batch, reg)
Backward propagation:
    # perform parameter update
    self.W -= learning_rate * grad
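The loop above calls `svm_loss_vectorized` without showing it. One possible vectorized implementation is sketched below, assuming the signature (W, X, y, reg), a hinge margin of 1 and a penalty of reg * ΣW² (so the penalty gradient is 2 * reg * W); `sgd_train` is a hypothetical wrapper around the same minibatch loop, not the assignment's `LinearClassifier.train`.

```python
import numpy as np


def svm_loss_vectorized(W, X, y, reg):
    """Multiclass hinge loss. W: (D, C), X: (N, D), y: (N,). Returns (loss, dW)."""
    N = X.shape[0]
    scores = X.dot(W)                                  # (N, C)
    correct = scores[np.arange(N), y][:, None]         # (N, 1)
    margins = np.maximum(0, scores - correct + 1.0)
    margins[np.arange(N), y] = 0                       # skip j == y_i
    loss = margins.sum() / N + reg * np.sum(W * W)
    # each positive margin adds +x_i to column j and -x_i to column y_i
    mask = (margins > 0).astype(X.dtype)
    mask[np.arange(N), y] = -mask.sum(axis=1)
    dW = X.T.dot(mask) / N + 2 * reg * W
    return loss, dW


def sgd_train(X, y, num_classes, learning_rate=1e-7, reg=5e4,
              num_iters=100, batch_size=200, seed=0):
    rng = np.random.default_rng(seed)
    W = 0.001 * rng.standard_normal((X.shape[1], num_classes))
    history = []
    for _ in range(num_iters):
        # forward pass: sample a minibatch and evaluate loss and gradient
        idx = rng.choice(X.shape[0], size=batch_size, replace=False)
        loss, grad = svm_loss_vectorized(W, X[idx], y[idx], reg)
        # backward propagation: parameter update
        W -= learning_rate * grad
        history.append(loss)
    return W, history
```

The mask trick avoids a Python loop over classes: the correct-class column receives minus the count of violated margins in that row, so one matrix product yields the whole gradient.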