CSC 737: Assignment 3



Question:

CSC 737: Assignment 3

Total: 100 points

Problems

1. (30 points) In which cases would you want to use each of the following activation functions: SELU, leaky ReLU (and its variants), ReLU, tanh, logistic, and softmax? Justify your answer in each case.

2. (20 points) Does dropout slow down training? Does it slow down inference (i.e., making predictions on new instances)? What is Monte Carlo (MC) dropout? Does MC dropout slow down training and inference? Explain your answers.

3. (10 points) Is it okay to initialize all the weights to the same value as long as that value is selected randomly using He initialization? Justify your answer.

4. (20 points) Provide a paper review on the following paper:

Link: https://arxiv.org/pdf/1906.06821.pdf

5. (20 points) Provide a review for the following paper:

Jin, X., Xu, C., Feng, J., Wei, Y., Xiong, J., Yan, S. (2016). Deep Learning with S-Shaped Rectified Linear Activation Units. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1). Link: https://ojs.aaai.org/index.php/AAAI/article/view/10287

or, on the guest lecture by Dr. Belkhouche on Nov. 16. Your review should include a summary and a critical analysis. The critical analysis should include what you like and dislike about the paper and any questions you have regarding the content. Your summary should not exceed 500 words.

General Instructions

  • Submit a single PDF or Word file for your submission.
  • Type your answers; do not hand-write, scan, and upload.
  • All problems must be indexed as per their indexing in the assignment.
  • Provide a list of references, including books, websites, people, and papers.
  • No late assignments accepted.

Solution:

REVIEW PAPER

Table of Contents

Question 1
Question 2
Question 3
Question 4
Question 5
References

Question 1.

SELU (Scaled Exponential Linear Unit) is a self-normalising activation function used in neural networks. With appropriately initialised weights and biases, the activations it produces converge toward a mean of zero and a standard deviation of one as they pass through the layers. Because the SELU activation function normalises its own outputs, this is called “internal normalisation” (Mlfromscratch.com, 2022); it differs from “external normalisation”, in which batch normalisation and other methods are applied. SELU is therefore the activation to choose when building a self-normalising neural network.

Figure 1: SELU analysis with exponential operations (Source: Mlfromscratch.com, 2022)

In this figure, if x is greater than zero the output is simply the scaled input, y = λx. If x is less than or equal to zero, the output is the scaled “alpha value” multiplied by the exponential term, y = λα(e^x − 1).

“The Leaky Rectified Linear Unit” is an activation function with an “alpha value” that typically ranges between 0.1 and 0.3 (Paperswithcode.com, 2022). It is widely used as an activation function; however, it has certain drawbacks in comparison to ELU. Its form is

“LReLU(x) = x    if x > 0
 LReLU(x) = αx   if x ≤ 0”

Thus, if the input x is greater than zero, the output is x itself; if the input is less than or equal to zero, the output is alpha times the input.

Figure 2: Leaky ReLU plotted (Source: Paperswithcode.com, 2022)

The equations above show that the output y is the same as the input x whenever x is greater than 0. When x is less than 0, the input is multiplied by the “coefficient alpha”, here 0.2; for example, an input of x = −5 maps to an output of −1.

The tanh activation used in neural networks is “f(x) = (e^x − e^(−x)) / (e^x + e^(−x))” (Yang et al., 2018).

This function is generally preferred over the sigmoid function and gives better performance in multi-layer neural networks, largely because its output is centred around zero.

Figure 3: Tanh Activation (Source: Paperswithcode.com, 2022)

The logistic (sigmoid) activation function takes any real-valued input and produces an output in the range 0 to 1. The larger (more positive) the input, the closer the output is to 1.0; the smaller (more negative) the input, the closer the output is to 0.0.

“Softmax” is a mathematical function that converts a “vector of numbers” into a vector of probabilities. It is mostly used in machine learning as an activation function in the output layer of a “neural network model” (Pedamonti, 2018) configured with N output values, one per class. The softmax function normalises the outputs, converting the weighted sums into probabilities that sum to one, so each output value can be interpreted as the probability of membership of a particular class. Mathematically, softmax can be thought of as a softer, probabilistic version of the argmax function.
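
To make the formulas in this answer concrete, the following is a minimal NumPy sketch (not part of the original assignment brief) of the activation functions discussed above; the SELU constants λ ≈ 1.0507 and α ≈ 1.6733 are the standard published values, and the example input values are purely illustrative.

    import numpy as np

    def selu(x, lam=1.0507, alpha=1.6733):
        # Scaled Exponential Linear Unit: lam*x for x > 0, lam*alpha*(exp(x) - 1) otherwise.
        return np.where(x > 0, lam * x, lam * alpha * (np.exp(x) - 1.0))

    def leaky_relu(x, alpha=0.2):
        # x for x > 0, alpha*x otherwise (alpha typically between 0.1 and 0.3).
        return np.where(x > 0, x, alpha * x)

    def relu(x):
        # max(0, x): zero for negative inputs, identity for positive inputs.
        return np.maximum(0.0, x)

    def tanh(x):
        # (e^x - e^(-x)) / (e^x + e^(-x)), output centred around zero in (-1, 1).
        return np.tanh(x)

    def logistic(x):
        # Squashes any real input into the range (0, 1).
        return 1.0 / (1.0 + np.exp(-x))

    def softmax(x):
        # Converts a vector of scores into probabilities that sum to one.
        e = np.exp(x - np.max(x))  # subtract the max for numerical stability
        return e / e.sum()

    x = np.array([-5.0, -1.0, 0.0, 1.0, 5.0])
    print(leaky_relu(x))                       # leaky_relu(-5) == -1 when alpha = 0.2
    print(softmax(np.array([1.0, 2.0, 3.0])))  # probabilities summing to 1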

Question 2.

In dropout, the neurons omitted in a given iteration are not updated by back-propagation; for that iteration they effectively do not exist. As a result, training converges more slowly, i.e., the training phase is slowed down. Dropout rates are usually kept at fairly small values (Dureja & Pahwa, 2019), and the loss decreases gradually; starting from 0.2 or 0.3 and tuning against the bias vs. variance trade-off helps to find good dropout values.

Dropout is a regularisation technique for the neural networks used in deep learning models. Machine learning suffers from the problems of underfitting and overfitting. Overfitting occurs when the model learns the training data effectively but performs poorly at test time: such a model has a high accuracy score on the training dataset but a low score on the test dataset (Nguyen et al., 2021). Underfitting occurs when the model neither learns the training data nor performs well at test time.
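
As a minimal illustration of dropout as a regulariser, the following Keras sketch (assuming TensorFlow is installed; the layer sizes and the 0.3 dropout rate are illustrative, not taken from the assignment) adds Dropout layers between dense layers:

    import tensorflow as tf
    from tensorflow.keras import layers

    # Dropout randomly zeroes a fraction of activations during training only,
    # which regularises the network and reduces overfitting.
    model = tf.keras.Sequential([
        layers.Dense(128, activation="relu", input_shape=(784,)),
        layers.Dropout(0.3),   # 30% of the units are dropped at each training step
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.3),
        layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X_train, y_train, epochs=30)
    # model.evaluate(X_test, y_test)  # dropout is disabled automatically at evaluation time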

Figure 4: Neural Network (Source: Towardsdatascience.com, 2022)

In Figure 4, the left network shows the original network with all of its neurons active. The right network shows the same network after dropout: the crossed-out neurons have been removed and only the remaining neurons take part in that training step (KILIÇARSLAN et al., 2021). The accuracy on the test dataset is then obtained with the following code: “model.evaluate(X_test, y_test)”.

The test score comes out to about 0.7692, i.e., 76.92%, which shows that the training accuracy is greater than the testing accuracy.

Monte Carlo (MC) dropout provides a scalable way of learning predictive distributions. It works by randomly switching off neurons in the neural network, which regularises the network: dropout simply switches off neurons at each training step, and in every step a different set of neurons is switched off (Nero et al., 2018), each neuron being ignored with some probability. Dropout rates typically take values between 0 and 0.5.

MC dropout does not change the training process itself; it is applied at test time, and it is inference that it slows down: predictions are made with several stochastic forward passes (effectively several different sub-models), and the resulting predictive distribution is averaged. The good thing is that it does not require any change to the model's architecture, so the trick can be applied to models that have already been trained.
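
A minimal sketch of MC dropout at prediction time, assuming a trained Keras model containing Dropout layers (such as the one sketched earlier) and hypothetical test data X_test: calling the model with training=True keeps dropout active, and averaging several stochastic forward passes is exactly what makes MC-dropout inference slower.

    import numpy as np

    def mc_dropout_predict(model, X, n_samples=50):
        # training=True keeps the Dropout layers active at prediction time
        # ("training-like behaviour"), so every pass uses a different random sub-network.
        preds = np.stack([model(X, training=True).numpy() for _ in range(n_samples)])
        return preds.mean(axis=0), preds.std(axis=0)  # predictive mean and uncertainty

    # mean_probs, uncertainty = mc_dropout_predict(model, X_test)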

Figure 5: MNIST dataset (Source: Towardsdatascience.com, 2022)

The figure above shows training for 30 epochs, after which the model achieves an accuracy of about 96.7% on the test set (Labach et al., 2019). To use dropout at prediction time in this way, the network has to be made to keep its “training-like behaviour”, i.e., dropout must remain active during the forward passes.

Question 3.

No matter what the input is, if all the weights are the same then all the units in the hidden layer are also the same: they compute identical outputs and receive identical gradient updates. This is the symmetry problem, and it is the reason the weights must be initialised randomly; initialising them all to a single value, even one chosen randomly with He initialization, affects every connection in the architecture in the same way. The weights of an artificial neural network should instead be initialised to small random numbers (Cai et al., 2019), as expected by the stochastic optimisation algorithm used to train the model, known as stochastic gradient descent.

Artificial neural networks are trained with a stochastic optimisation algorithm known as “stochastic gradient descent”. The algorithm uses randomness to find a good set of weights for mapping inputs to outputs in the data being learned. A network trained on a specific training dataset therefore ends up as a different network, with different skill, each time the training algorithm is run (Chen et al., 2019). Stochastic gradient descent requires the network weights to be initialised to small random values, for example in the range 0.0 to 0.1. Randomness is also used during the search when the training dataset is shuffled before each epoch, which in turn leads to differences in the gradient estimates for each batch.

The progression of the search toward lower loss in a neural network is referred to as convergence; discovering only a sub-optimal, locally optimal solution is referred to as “premature convergence”. One could try to use the same set of weights every time the network is trained, for example a value of 0.0 for all weights (Inoue, 2019). In that case, however, the learning algorithm fails: every weight receives the same update, the network weights hardly change, and the model ends up stuck.
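
The symmetry problem can be seen in a few lines of NumPy; the sketch below (with illustrative layer sizes) shows that identical initial weights make every hidden unit compute the same value, whereas He initialisation, i.e. random values drawn with variance 2/fan_in, breaks the symmetry.

    import numpy as np

    rng = np.random.default_rng(0)
    fan_in, hidden = 4, 3
    x = rng.normal(size=(1, fan_in))       # one illustrative input

    # Case 1: every weight set to the same (randomly chosen) value.
    w_same = np.full((fan_in, hidden), rng.normal())
    h_same = np.maximum(0.0, x @ w_same)   # ReLU hidden layer
    print(h_same)                          # all hidden units identical -> identical gradients

    # Case 2: He initialisation N(0, 2/fan_in) breaks the symmetry.
    w_he = rng.normal(scale=np.sqrt(2.0 / fan_in), size=(fan_in, hidden))
    h_he = np.maximum(0.0, x @ w_he)
    print(h_he)                            # hidden units differ, so they can learn different features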

Question 4.

There has been a great increase in the use of machine learning, which is attracting a large number of researchers and practitioners. One can employ a wide range of loss functions in supervised learning, including the square of the Euclidean distance, cross-entropy, contrastive loss, hinge loss, information gain, and other loss functions. The simplest way of handling regression problems is to use the loss function equal to the square of the Euclidean distance and minimise the squared errors on the training samples (Sun et al., 2019). Various optimisation methods have been adopted that have had a strong influence on machine learning; for example, training Transformer networks with Adam optimisation, applied to machine translation tasks, or applying stochastic optimisation methods to “Markov chain Monte Carlo” to improve efficiency when handling a large number of samples. These developments in optimisation have brought great progress in machine learning.
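
As a small concrete example of the squared-Euclidean-distance loss mentioned above (with illustrative numbers, not data from the paper), the loss and its gradient with respect to the predictions can be computed as follows:

    import numpy as np

    y_pred = np.array([2.5, 0.0, 2.1])
    y_true = np.array([3.0, -0.5, 2.0])

    loss = 0.5 * np.sum((y_pred - y_true) ** 2)  # squared Euclidean distance (scaled by 1/2)
    grad = y_pred - y_true                       # gradient of the loss w.r.t. the predictions
    print(loss, grad)                            # approx 0.255 and [-0.5, 0.5, 0.1]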

However, many challenges remain in machine learning, and improving optimisation performance with less data is problematic for deep neural networks. In the view of Yang et al. (2018), when there are not enough samples available for the depth of the network, high variance and overfitting become likely. Samples are also often truncated when sequences are long, which introduces deviation. The use of stochastic variational methods is practical and elegant, and developing methods that apply higher-order information to variational inference is probably a good direction.

The first step in machine learning is to build the model from its building blocks and assemble them; an appropriate objective function is then determined and analysed with numerical optimisation methods. In the opinion of Pedamonti (2018), learning algorithms are divided into supervised learning, semi-supervised learning, and reinforcement learning. Several families of methods are used in numerical optimisation:

  • Gradient descent: the most widely used optimisation method. It moves the variables in the direction opposite to the gradient of the objective function.
  • Stochastic Gradient Descent (SGD): plain gradient descent has high complexity on large-scale data and cannot be updated online (Dureja & Pahwa, 2019); SGD instead updates on individual samples or mini-batches, which makes it suitable for dealing with a large number of samples (see the sketch after this list).
  • Nesterov Accelerated Gradient Descent: improves the learning rate and the speed of convergence and helps prevent the search from getting trapped at a local minimum by looking ahead along the search direction.
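
The following is a minimal NumPy sketch contrasting full-batch gradient descent with stochastic gradient descent on a simple linear-regression objective; the synthetic data, learning rate and step counts are illustrative.

    import numpy as np

    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 3))
    w_true = np.array([1.5, -2.0, 0.5])
    y = X @ w_true + 0.1 * rng.normal(size=200)

    def grad(w, Xb, yb):
        # Gradient of the mean squared error 0.5 * ||Xb @ w - yb||^2 / len(yb).
        return Xb.T @ (Xb @ w - yb) / len(yb)

    # Full-batch gradient descent: every step uses the whole dataset.
    w_gd = np.zeros(3)
    for _ in range(200):
        w_gd -= 0.1 * grad(w_gd, X, y)

    # Stochastic gradient descent: every step uses a small random mini-batch,
    # which keeps the per-step cost low on large datasets and allows online updates.
    w_sgd = np.zeros(3)
    for _ in range(200):
        idx = rng.integers(0, len(y), size=8)
        w_sgd -= 0.1 * grad(w_sgd, X[idx], y[idx])

    print(w_gd, w_sgd)  # both estimates approach w_true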

Meta-learning is also a popular research direction in machine learning. It addresses the problem of learning the learning algorithm itself: human beings can learn new tasks from very few training samples, and meta-learning aims to make models similarly efficient.

Question 5.

Convolutional neural networks have made huge progress in different areas such as detection, object classification and character recognition. A key factor in the success of modern deep learning is the use of non-saturated activation functions (Nguyen et al., 2021). APL units can represent non-convex functions, but they require the rightmost linear piece of each component function to have a bias of 0 and a unit slope, an inappropriate constraint that undermines their representational ability. The reviewed article gives a detailed analysis of rectified linear units, which are important components of state-of-the-art deep convolutional networks, and proposes an “S-shaped rectified linear activation unit” (SReLU) that can learn both non-convex and convex functions. As per the views of KILIÇARSLAN et al. (2021), it imitates multiple fundamental laws, namely the Stevens law and the Weber-Fechner law. SReLU is learned jointly with the training of the whole deep network through back-propagation; in the early part of the training phase a freezing method is used, during which SReLU is set to a predefined leaky ReLU unit. SReLU is defined as a combination of three piecewise linear functions performing the mapping, expressed with a small set of learnable parameters.
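
A minimal NumPy sketch of the SReLU forward pass as described in the reviewed paper, built from three linear pieces joined at two learnable thresholds; the four parameters t_r, a_r, t_l, a_l are learnable in the paper, and the values used below are purely illustrative.

    import numpy as np

    def srelu(x, t_r=1.0, a_r=0.5, t_l=-1.0, a_l=0.1):
        # x >= t_r      : t_r + a_r * (x - t_r)   (right linear piece)
        # t_l < x < t_r : x                       (identity in the middle segment)
        # x <= t_l      : t_l + a_l * (x - t_l)   (left linear piece)
        return np.where(x >= t_r, t_r + a_r * (x - t_r),
               np.where(x <= t_l, t_l + a_l * (x - t_l), x))

    x = np.linspace(-3.0, 3.0, 7)
    print(srelu(x))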

The article proposes SReLU, which imitates the logarithm function and the power function given by the Weber-Fechner law and the Stevens law respectively, and employs piecewise linear functions to approximate non-linear convex and non-convex functions (Jin et al., 2016); the idea builds on earlier research into these two laws. In the opinion of Labach et al. (2019), it could also be used universally in deeper network analysis. Experiments on four datasets, CIFAR-10, MNIST, CIFAR-100 and ImageNet, using GoogLeNet and NIN, demonstrate that SReLU effectively boosts the performance of deep networks; further exploitation of the idea is left to future work.

 

 

References

Books:

Zhou, Z. H. (2021). Machine learning. Springer Nature.

Website:

Mlfromscratch.com, 2022, Activation Functions Explained - GELU, SELU, ELU, ReLU and more, Available from: https://mlfromscratch.com/activation-functions-explained/#leaky-relu, [Retrieved on: 11.11.2022]

Paperswithcode.com, 2022, Tanh Activation, Available from: https://paperswithcode.com/method/tanh-activation, [Retrieved on: 11.11.2022]

Towardsdatascience.com, 2022, Available from:  https://towardsdatascience.com/monte-carlo-dropout-7fd52f8b6571, [Retrieved on: 11.11.2022]

Journal:

Cai, S., Shu, Y., Chen, G., Ooi, B. C., Wang, W., & Zhang, M. (2019). Effective and efficient dropout for deep convolutional neural networks. arXiv preprint arXiv:1904.03392. DOI:https://doi.org/10.48550/arXiv.1904.03392

Chen, G., Chen, P., Shi, Y., Hsieh, C. Y., Liao, B., & Zhang, S. (2019). Rethinking the usage of batch normalization and dropout in the training of deep neural networks. arXiv preprint arXiv:1905.05928. DOI:https://doi.org/10.48550/arXiv.1905.05928

Dureja, A., & Pahwa, P. (2019). Analysis of non-linear activation functions for classification tasks using convolutional neural networks. Recent Patents on Computer Science, 12(3), 156-161. DOI: 10.1088/1742-6596/1732/1/012026

Inoue, H. (2019). Multi-sample dropout for accelerated training and better generalization. arXiv preprint arXiv:1905.09788. DOI:https://doi.org/10.48550/arXiv.1905.09788

Jin, X., Xu, C., Feng, J., Wei, Y., Xiong, J., & Yan, S. (2016). Deep learning with S-shaped rectified linear activation units. Proceedings of the AAAI Conference on Artificial Intelligence, 30(1).

KILIÇARSLAN, S., Kemal, A. D. E. M., & Çelik, M. (2021). An overview of the activation functions used in deep learning algorithms. Journal of New Results in Science, 10(3), 75-88. DOI: https://doi.org/10.54187/jnrs.1011739

Labach, A., Salehinejad, H., & Valaee, S. (2019). Survey of dropout methods for deep neural networks. arXiv preprint arXiv:1904.13310. DOI:https://doi.org/10.48550/arXiv.1904.13310

Nero, M., Shan, C., Wang, L. C., & Sumikawa, N. (2018). Discovering Interesting Plots in Production Yield Data Analytics. arXiv preprint arXiv:1807.03920. DOI:https://doi.org/10.48550/arXiv.1807.03920

Nguyen, A., Pham, K., Ngo, D., Ngo, T., & Pham, L. (2021, August). An analysis of state-of-the-art activation functions for supervised deep neural network. In 2021 International Conference on System Science and Engineering (ICSSE) (pp. 215-220). IEEE. DOI: 10.1109/ICSSE52999.2021.9538437

Pedamonti, D. (2018). Comparison of non-linear activation functions for deep neural networks on MNIST classification task. arXiv preprint arXiv:1804.02763. DOI:https://doi.org/10.48550/arXiv.1804.02763

Sun, S., Cao, Z., Zhu, H., & Zhao, J. (2019). A survey of optimization methods from a machine learning perspective. IEEE Transactions on Cybernetics, 50(8), 3668-3681.

Yang, T., Wei, Y., Tu, Z., Zeng, H., Kinsy, M. A., Zheng, N., & Ren, P. (2018). Design space exploration of neural network activation function circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 38(10), 1974-1978. DOI: 10.1109/TCAD.2018.2871198