[{"data":1,"prerenderedAt":-1},["ShallowReactive",2],{"metadata-\u002Fdata-science-blog\u002Fpublication\u002Fintroduction-to-convolutional-neural-networks-cnns":3,"blogArticle":4,"opportunityTypes":36},{},{"_id":5,"robots":6,"slug":7,"title":8,"blogCategory":9,"blogDomain":13,"company":14,"contact":14,"publishingDate":15,"content":16,"intro":17,"featuredImage":18,"blogTags":19,"author":20,"createdAt":30,"updatedAt":31,"__v":32,"domain":33},"5fceb7f30fd6e70240e28956","index\u002Ffollow","introduction-to-convolutional-neural-networks-cnns","Introduction to Convolutional Neural Networks CNNs",{"_id":10,"slug":11,"title":12},"5ee117077ee8d47edb7d5f0b","publication","Publication","5f7c21782175a07c208127a3",null,"2020-12-07T23:05:54.800Z","\u003Csection class=\"section section--body\">\n\u003Cdiv class=\"section-divider\">Artificial Intelligence (AI) is making massive advances in narrowing the gap between machines and humans. Researchers and practitioners around the world are working on many aspects of AI toward this goal. Computer vision is one such domain. Its main aim is to enable machines to view the world the way humans do and to use that understanding for a wide range of tasks, including image recognition, video recognition, image analysis and classification, recommendation systems, and many more. Much of the recent progress in computer vision is driven by one specific algorithm: the Convolutional Neural Network (CNN). 
CNNs are built on the deep learning branch of machine learning.\u003C\u002Fdiv>\n\u003C\u002Fsection>\n\u003Csection class=\"section section--body\">\n\u003Cdiv class=\"section-divider\">\u003Chr class=\"section-divider\" \u002F>\u003C\u002Fdiv>\n\u003Cdiv class=\"section-content\">\n\u003Cdiv class=\"section-inner sectionLayout--insetColumn\">\n\u003Ch3 class=\"graf graf--h3\">\u003Cstrong class=\"markup--strong markup--h3-strong\">Introduction:\u003C\u002Fstrong>\u003C\u002Fh3>\n\u003Cp class=\"graf graf--p\">The Convolutional Neural Network (CNN) works by taking an image, assigning learnable weights to the different objects in the image, and then distinguishing those objects from one another. CNNs require very little pre-processing compared to other \u003Ca href=\"https:\u002F\u002Faigents.co\u002Flearn\u002Fdeep-learning\" target=\"_blank\" rel=\"noopener\">deep learning\u003C\u002Fa> algorithms. One of their main strengths is that, with relatively simple training procedures, they learn the characteristics of the target object on their own.\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">The architecture of a CNN is analogous to that of the neurons in the human brain, specifically the visual cortex. Each neuron responds to stimuli only within a specific region of the visual field known as its receptive field. 
A collection of such fields overlaps to cover the entire visual area.\u003C\u002Fp>\n\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\u003C\u002Fsection>\n\u003Csection class=\"section section--body\">\n\u003Cdiv class=\"section-divider\">\u003Chr class=\"section-divider\" \u002F>\u003C\u002Fdiv>\n\u003Cdiv class=\"section-content\">\n\u003Cdiv class=\"section-inner sectionLayout--insetColumn\">\n\u003Ch3 class=\"graf graf--h3\">\u003Cstrong class=\"markup--strong markup--h3-strong\">The workflow of&nbsp;CNN:\u003C\u002Fstrong>\u003C\u002Fh3>\n\u003Cp class=\"graf graf--p\">The CNN algorithm is built from several modules arranged in a specific workflow, listed as follows:\u003C\u002Fp>\n\u003Cul class=\"postList\">\n\u003Cli class=\"graf graf--li\">Input Image\u003C\u002Fli>\n\u003Cli class=\"graf graf--li\">Convolution Layer (Kernel)\u003C\u002Fli>\n\u003Cli class=\"graf graf--li\">Pooling Layer\u003C\u002Fli>\n\u003Cli class=\"graf graf--li\">Classification &mdash; Fully Connected Layer\u003C\u002Fli>\n\u003Cli class=\"graf graf--li\">Architectures\u003C\u002Fli>\n\u003C\u002Ful>\n\u003Ch4 class=\"graf graf--h4\">\u003Cstrong class=\"markup--strong markup--h4-strong\">Input Image:\u003C\u002Fstrong>\u003C\u002Fh4>\n\u003Cp class=\"graf graf--p\">CNN takes an image as input, distinguishes its objects based on the three color planes, identifies the color space, and measures the image dimensions. To explain this process, we will use the RGB image given below.\u003C\u002Fp>\n\u003Cfigure class=\"graf graf--figure\">\u003Cimg class=\"graf-image\" src=\"https:\u002F\u002Fcdn-images-1.medium.com\u002Fmax\u002F1600\u002F1*g6qPMZTpO2Nl9Y2dxwgvCA.png\" data-image-id=\"1*g6qPMZTpO2Nl9Y2dxwgvCA.png\" data-width=\"875\" data-height=\"468\" \u002F>\u003C\u002Ffigure>\n\u003Cp class=\"graf graf--p\">This image combines three color planes: Red, Green, and Blue, together known as RGB. 
Images come in various color spaces, such as RGB, CMYK, Grayscale, and many more. Processing can become very demanding as image dimensions grow, for example with an 8K image (7680x4320). One of the handy capabilities of CNN is that it reduces the image to dimensions that are easier to process while keeping its important features intact. This matters for obtaining good predictions, and it is critical when designing architectures that not only learn features well but can also work on massive datasets of images.\u003C\u002Fp>\n\u003Ch4 class=\"graf graf--h4\">Convolution Layer (Kernel):\u003C\u002Fh4>\n\u003Cp class=\"graf graf--p\">The kernel of a CNN works on the basis of the following formula.\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">Image Dimensions = n1 x n2 x 1\u003Cbr \u002F>&nbsp;where n1 = height, n2 = breadth, and 1 = the number of channels (an RGB image would have 3).\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">So, in the example below, the image dimensions are 5 x 5 x 1. We will explain this using the image given below.\u003C\u002Fp>\n\u003Cfigure class=\"graf graf--figure\">\u003Cimg class=\"graf-image\" src=\"https:\u002F\u002Fcdn-images-1.medium.com\u002Fmax\u002F1600\u002F1*InCZ_725NYjwU7_aC-9_mA.png\" data-image-id=\"1*InCZ_725NYjwU7_aC-9_mA.png\" data-width=\"674\" data-height=\"439\" \u002F>\u003C\u002Ffigure>\n\u003Cp class=\"graf graf--p\">In this image, the green section represents the 5 x 5 x 1 input. The yellow box slides from the first position to the last, performing the convolution operation on every 3x3 patch. 
This sliding window is called the Kernel (K), and it operates as illustrated in the figures below.\u003C\u002Fp>\n\u003Cfigure class=\"graf graf--figure\">\u003Cimg class=\"graf-image\" src=\"https:\u002F\u002Fcdn-images-1.medium.com\u002Fmax\u002F1600\u002F1*AGjvZiARHmZfGLBWLffnTw.png\" data-image-id=\"1*AGjvZiARHmZfGLBWLffnTw.png\" data-width=\"380\" data-height=\"272\" \u002F>\u003C\u002Ffigure>\n\u003Cfigure class=\"graf graf--figure\">\u003Cimg class=\"graf-image\" src=\"https:\u002F\u002Fcdn-images-1.medium.com\u002Fmax\u002F1600\u002F1*fQqPJxQhHytklvCQ181IHQ.png\" data-image-id=\"1*fQqPJxQhHytklvCQ181IHQ.png\" data-width=\"326\" data-height=\"463\" \u002F>\u003C\u002Ffigure>\n\u003Cp class=\"graf graf--p\">In the above figure, the kernel moves to the right by a defined value called the &ldquo;Stride,&rdquo; parsing the image until it completes the breadth. It then hops down to the start of the next row and moves across just as before, repeating until the entire image has been traversed.\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">If there are multiple channels, as in RGB images, the kernel has the same depth as the input image. Matrix multiplication is then performed between each kernel slice and the corresponding image channel as a \u003Cem class=\"markup--em markup--p-em\">stack\u003C\u002Fem>, for example, {K1, I1}, {K2, I2}, and so on. The results are summed together with a bias, producing a squeezed &ldquo;1-depth channel&rdquo; of convolved feature output.\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">The goal of the convolution operation is to extract the high-level features of the image, such as edges. The layer is not limited to high-level features; it also operates on low-level features, such as color and gradient orientation. 
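The sliding-kernel operation described above can be sketched in plain NumPy. This is a minimal illustration under the article's example sizes (a 5 x 5 x 1 input and a 3 x 3 kernel); the function name `convolve2d` and the stride parameter are ours, not part of the article:

```python
import numpy as np

def convolve2d(image, kernel, stride=1):
    """Slide `kernel` over `image`, summing elementwise products (valid padding)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    oh = (ih - kh) // stride + 1   # output height after the kernel traverses the rows
    ow = (iw - kw) // stride + 1   # output width after the kernel traverses the breadth
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            # The 3x3 patch currently covered by the kernel (the "yellow box").
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            out[i, j] = np.sum(patch * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)  # the 5 x 5 x 1 input from the example
kernel = np.ones((3, 3))                          # a 3 x 3 kernel
print(convolve2d(image, kernel).shape)            # (3, 3): valid convolution shrinks 5x5 to 3x3
```

With stride 1 and no padding, a 5x5 input and a 3x3 kernel yield a 3x3 feature map, matching the figures.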
There are two padding schemes that govern how the convolution is applied, known as valid padding and same padding.\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">Their objective is to control the dimensionality of the convolved output: it can be reduced, left unchanged, or even increased, depending on the required output. Same padding pads the input so that the convolved matrix keeps the same dimensions as the input, while valid padding applies no padding, so the matrix dimensions are reduced.\u003C\u002Fp>\n\u003Cfigure class=\"graf graf--figure\">\u003Cimg class=\"graf-image\" src=\"https:\u002F\u002Fcdn-images-1.medium.com\u002Fmax\u002F1600\u002F1*bC9drJd5A4CwvlI9VDMJFg.png\" data-image-id=\"1*bC9drJd5A4CwvlI9VDMJFg.png\" data-width=\"910\" data-height=\"506\" \u002F>\u003C\u002Ffigure>\n\u003Cp class=\"graf graf--p graf--empty\">&nbsp;\u003C\u002Fp>\n\u003Ch4 class=\"graf graf--h4\">\u003Cstrong class=\"markup--strong markup--h4-strong\">Pooling layer:\u003C\u002Fstrong>\u003C\u002Fh4>\n\u003Cp class=\"graf graf--p\">Much like the convolutional layer, the main aim of the Pooling layer is to decrease the spatial size of the convolved feature. In short, it reduces the computational power required to process the data through dimensionality reduction. 
It is also useful for extracting dominant features that are rotationally and positionally invariant, which keeps the training process effective.\u003C\u002Fp>\n\u003Ch4 class=\"graf graf--h4\">\u003Cstrong class=\"markup--strong markup--h4-strong\">Types of&nbsp;Pooling:\u003C\u002Fstrong>\u003C\u002Fh4>\n\u003Cp class=\"graf graf--p\">There are two main types of Pooling:\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">\u003Cstrong class=\"markup--strong markup--p-strong\">Max Pooling: \u003C\u002Fstrong>Max Pooling returns the maximum value from the portion of the image covered by the kernel.\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">\u003Cstrong class=\"markup--strong markup--p-strong\">Average Pooling: \u003C\u002Fstrong>Average Pooling returns the average value from the portion of the image covered by the kernel.\u003C\u002Fp>\n\u003Cfigure class=\"graf graf--figure\">\u003Cimg class=\"graf-image\" src=\"https:\u002F\u002Fcdn-images-1.medium.com\u002Fmax\u002F1600\u002F1*ODDBelSSa1drUjCHGgPt2w.png\" data-image-id=\"1*ODDBelSSa1drUjCHGgPt2w.png\" data-width=\"596\" data-height=\"439\" \u002F>\u003C\u002Ffigure>\n\u003Cp class=\"graf graf--p\">Max Pooling also acts as a noise suppressant, discarding noisy activations outright, whereas Average Pooling suppresses noise only as a side effect of \u003Ca href=\"https:\u002F\u002Faigents.co\u002Flearn\u002Fdimensionality-reduction\" target=\"_blank\" rel=\"noopener\">dimensionality reduction\u003C\u002Fa>. In short, Max Pooling usually performs better than Average Pooling.\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">The convolutional layer and the pooling layer together form the &ldquo;i-th layer&rdquo; of a Convolutional Neural Network. 
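The two pooling types can be sketched side by side in NumPy. This is a minimal illustration with non-overlapping 2x2 windows (the function name `pool2d` and the example values are ours, not from the article):

```python
import numpy as np

def pool2d(feature_map, size=2, mode="max"):
    """Downsample with non-overlapping `size` x `size` windows (stride = size)."""
    h, w = feature_map.shape
    out = np.zeros((h // size, w // size))
    for i in range(h // size):
        for j in range(w // size):
            window = feature_map[i*size:(i+1)*size, j*size:(j+1)*size]
            # Max Pooling keeps the strongest activation; Average Pooling keeps the mean.
            out[i, j] = window.max() if mode == "max" else window.mean()
    return out

fm = np.array([[1., 3., 2., 4.],
               [5., 7., 6., 8.],
               [1., 2., 3., 4.],
               [5., 6., 7., 8.]])
print(pool2d(fm, mode="max"))      # [[7. 8.] [6. 8.]]
print(pool2d(fm, mode="average"))  # [[4.  5. ] [3.5 5.5]]
```

Both variants halve each spatial dimension here; only the value kept per window differs, which is why Max Pooling discards weak (noisy) activations while Average Pooling blends them in.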
Depending on the intricacies of the images, the number of such layers may be increased to capture ever finer details, at the cost of more computational power. With the process described above, the model has learned to understand the features. Its output is then provided as input to a regular neural network for classification.\u003C\u002Fp>\n\u003Ch4 class=\"graf graf--h4\">\u003Cstrong class=\"markup--strong markup--h4-strong\">Classification: Fully Connected Layer (FC&nbsp;Layer)\u003C\u002Fstrong>\u003C\u002Fh4>\n\u003Cp class=\"graf graf--p graf--hasDropCapModel graf--hasDropCap\">\u003Cspan class=\"graf-dropCap\">A\u003C\u002Fspan>dding an FC layer is usually the easiest way to learn non-linear combinations of the high-level features represented by the output of the convolutional layers. The FC layer provides the space for learning \u003Ca href=\"https:\u002F\u002Faigents.co\u002Flearn\u002Fnon-linear-functions\" target=\"_blank\" rel=\"noopener\">non-linear functions\u003C\u002Fa>. Having converted the image into a form suitable for a Multi-Layer Perceptron, we must flatten the output into a column vector. 
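The flattening step and a single fully connected layer can be sketched in NumPy. This is an illustrative sketch only: the pooled-map shape, the 10-class output, and the softmax at the end are assumptions for the example, not specifics from the article:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed output of the conv/pool stages: a 4x4 spatial map with 8 channels.
pooled = rng.standard_normal((4, 4, 8))
flat = pooled.reshape(-1, 1)          # flatten into a 128 x 1 column vector

# A fully connected layer is a weight matrix plus a bias (10 classes, illustrative).
W = rng.standard_normal((10, flat.shape[0]))
b = rng.standard_normal((10, 1))
logits = W @ flat + b                 # every input feature connects to every output

# A softmax (one common choice) turns the logits into class probabilities.
shifted = np.exp(logits - logits.max())
probs = shifted / shifted.sum()
print(flat.shape)    # (128, 1)
print(probs.shape)   # (10, 1)
```

The dense matrix multiply is what makes the layer "fully connected": each of the 10 outputs is a learned combination of all 128 flattened features.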
Over a number of epochs, the model learns to distinguish between dominant and low-level features.\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">Here are some impressive examples of CNN architectures:\u003C\u002Fp>\n\u003Cul class=\"postList\">\n\u003Cli class=\"graf graf--li\">AlexNet\u003C\u002Fli>\n\u003Cli class=\"graf graf--li\">GoogLeNet\u003C\u002Fli>\n\u003Cli class=\"graf graf--li\">ZFNet\u003C\u002Fli>\n\u003Cli class=\"graf graf--li\">LeNet\u003C\u002Fli>\n\u003Cli class=\"graf graf--li\">ResNet\u003C\u002Fli>\n\u003C\u002Ful>\n\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\u003C\u002Fsection>\n\u003Csection class=\"section section--body\">\n\u003Cdiv class=\"section-divider\">\u003Chr class=\"section-divider\" \u002F>\u003C\u002Fdiv>\n\u003Cdiv class=\"section-content\">\n\u003Cdiv class=\"section-inner sectionLayout--insetColumn\">\n\u003Ch3 class=\"graf graf--h3\">\u003Cstrong class=\"markup--strong markup--h3-strong\">Summary:\u003C\u002Fstrong>\u003C\u002Fh3>\n\u003Cp class=\"graf graf--p\">In this article, we explained the Convolutional Neural Network, a deep learning algorithm, and walked through its workflow with examples. 
Some of the most powerful CNN architectures are listed at the end; they can serve as a starting point for building powerful AI algorithms for computer vision.\u003C\u002Fp>\n\u003Cp class=\"graf graf--p\">If you want to connect with me, please follow me on Twitter \u003Ca href=\"https:\u002F\u002Ftwitter.com\u002Fbajcmartinez\">@bajcmartinez\u003C\u002Fa> or visit my \u003Ca href=\"https:\u002F\u002Flivecodestream.dev\u002F\">blog on computer science\u003C\u002Fa>.\u003C\u002Fp>\n\u003C\u002Fdiv>\n\u003C\u002Fdiv>\n\u003C\u002Fsection>","Take the first steps in understanding how computer vision works with CNN networks","https:\u002F\u002Faigents-files.s3.eu-west-1.amazonaws.com\u002Fproduction\u002F1607382997458_kalea-jerielle-fuBj4vkp4-g-unsplash.jpg",[],{"_id":21,"profile":22},"5f7a38aed8ed733087c855b5",{"_id":23,"email":24,"firstName":25,"lastName":26,"github":27,"linkedin":28,"twitter":29},"5f7a38aed8ed733087c855b4","bajcmartinez@gmail.com","Juan Cruz","Martinez","https:\u002F\u002Fgithub.com\u002Fbajcmartinez","https:\u002F\u002Fwww.linkedin.com\u002Fin\u002Fbajcmartinez\u002F","bajcmartinez","2020-12-07T23:17:07.108Z","2024-08-14T13:31:47.978Z",0,{"_id":13,"slug":34,"title":35},"data-analytics-and-ai","Data Analytics & AI",true]