Design and Building of a Javanese Script Classification in the Sonobudoyo State Museum, Yogyakarta

The Sonobudoyo State Museum is one of the state museums in Yogyakarta that stores historical objects, including artifacts written in Javanese script. Javanese script also appears on street signs, especially in the city of Yogyakarta, and represents local content taught in elementary, middle, and high schools. To read and understand Javanese script, people must study it for a considerable period, whereas the Latin alphabet is easier and faster to learn. The purpose of this paper is to design and build a Javanese script classification dataset that serves as effective learning media for adults, children, and parents. We construct the dataset using Deep Learning with a Convolutional Neural Network (CNN). The stages of making the dataset are inputting the data, building the model, and training it so that it can recognize Javanese script images. We collect the dataset from the internet and from several different people to train the machine. In this paper, we construct the Javanese script classification dataset to help users detect Javanese characters. The trained Javanese script classification application achieves a measurable level of recognition of Javanese script patterns in a real application.

A cloud service that provides high-specification GPUs free of charge, for up to 12 hours or more of advanced computing, can be used instead. One such service is Google Colaboratory, or Colab [4]. Based on this problem, this paper presents the design of a Javanese script classification dataset for mobile learning in the Yogyakarta Sonobudoyo State Museum. The goal is to detect Javanese script patterns so that the script can later be translated.

Object and Basis Theory
In this experiment, each class in the dataset is symbolized by a letter. In building the dataset, the machine learning algorithm works with a training set, a test set, and a validation set. There are no fixed rules for choosing a training set, validation set, and test set. Generally, experts choose these three parts randomly from the complete database, with a training set portion of 50%, a validation set of 25%, and a test set of 25%. If the training set and validation set are grouped into the training phase while the test set forms the testing phase, then the split between the training phase and the testing phase ranges from 75%/25% to 80%/20% [2].
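The random 50/25/25 split described above can be sketched in plain Python; the function name and the fixed seed are illustrative choices, not from the paper:

```python
import random

def split_dataset(samples, train=0.5, val=0.25, test=0.25, seed=0):
    """Randomly split a dataset into training, validation, and test sets."""
    assert abs(train + val + test - 1.0) < 1e-9
    items = list(samples)
    random.Random(seed).shuffle(items)  # shuffle before splitting
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# 100 samples -> 50 training, 25 validation, 25 test
train_set, val_set, test_set = split_dataset(range(100))
```

Merging the training and validation parts into one training phase yields the 75%/25% split mentioned above.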
Deep Learning is a part of Machine Learning that consists of many stacked layers (hidden layers). Each layer applies a specific transformation to its input to produce an output [1]. One Deep Learning algorithm is the Convolutional Neural Network, a network widely used in image processing. The input passes through convolution layers and is processed with the specified filters. Each of these layers extracts patterns from several parts of the image for the classification process.

The Method
Convolutional Neural Network (CNN) is a development of artificial neural networks inspired by the human nervous system and commonly used on image data to detect and recognize objects in an image [2]. A CNN consists of neurons that have weights, biases, and activation functions. The CNN process flow is shown in Figure 1 [9]. The convolution layer is the part of the network that performs convolution operations, applying linear filters to local regions of the input. This layer is the first to receive the image at the input of the architecture. Each filter has a length (pixels), a width (pixels), and a depth matching the number of channels of the input image. These filters shift across all parts of the image; at each position a dot product is computed between the input region and the filter values, producing an output called an activation map or feature map. The convolution process can be seen in Figure 2. Figure 2 shows an image of size 32x32x3; the model convolves the image in the convolution layer, cutting it with a 5x5x3 filter so that the resulting feature map has a smaller size of 28x28x1 [9]. Figure 3 shows the convolution output-size formula in a simple form, (N - F)/stride + 1, where N is the number of rows (or columns) of the input, F is the number of rows (or columns) of the filter, and stride is the step of the shift.
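The output-size formula from Figure 3 can be checked with a few lines of Python; the function below is a small illustration, not code from the paper:

```python
def conv_output_size(n, f, stride=1):
    """Output size of a valid convolution: (N - F) / stride + 1."""
    assert (n - f) % stride == 0, "filter and stride must fit the input evenly"
    return (n - f) // stride + 1

# The 32x32x3 image with a 5x5x3 filter from Figure 2:
conv_output_size(32, 5)  # -> 28, i.e. a 28x28 feature map
```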

b. Pooling Layer
There are many types of pooling in neural networks, such as RunPool, a new way to downsize the feature map [14]. The pooling layer receives the output of the convolution layer and reduces the size of the image data. The principle of the pooling layer is a filter of a specific size that shifts with a certain stride, or step, across the entire feature map. In most CNN architectures, the pooling method used is max pooling. Max pooling divides the convolution layer output into several grids, and at each filter position takes the largest value from the grid. Depending on the stride, the method reduces the spatial dimensions of the data to a fraction of the original size, thereby reducing the number of parameters passed to the next layer. Figure 4 depicts the process of the pooling layer.

Figure 4 The Pooling Layer
The figure shows that the convolution layer output is divided into several grids; a 2x2 filter shifting with a stride of two then takes the largest value from each grid [10].
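The 2x2 max pooling with stride 2 described above can be sketched in NumPy; the 4x4 feature map below is a made-up example:

```python
import numpy as np

def max_pool_2x2(feature_map):
    """2x2 max pooling with stride 2 on a square feature map (NumPy sketch)."""
    n = feature_map.shape[0]
    assert n % 2 == 0, "expects an even-sized feature map"
    # Reshape into non-overlapping 2x2 grids, then take the max of each grid.
    grids = feature_map.reshape(n // 2, 2, n // 2, 2)
    return grids.max(axis=(1, 3))

fm = np.array([[1, 3, 2, 1],
               [4, 6, 5, 0],
               [7, 2, 9, 8],
               [1, 0, 3, 4]])
max_pool_2x2(fm)  # -> [[6, 5], [7, 9]]
```

Each 2x2 grid collapses to its largest value, halving both spatial dimensions.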

c. Fully Connected Layer
The fully connected layer takes the feature map output by the pooling layer as its input. The feature map, a multidimensional array, is reshaped (flattened) into an n-dimensional vector. For example, if the layer consists of 500 neurons, a softmax is then applied that returns the largest probability among the 10 class labels as the final classification of the network. Figure 5 shows the processes in the fully connected layer [9].
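The flatten-then-softmax step can be sketched in NumPy; the layer sizes and random weights below are illustrative stand-ins, not the paper's trained values:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over class scores."""
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()

rng = np.random.default_rng(0)
pooled = rng.random((2, 2))              # hypothetical pooled feature map
flat = pooled.reshape(-1)                # flatten to a vector
weights = rng.random((10, flat.size))    # dense-layer weights (random here)
probs = softmax(weights @ flat)          # one probability per class label
predicted_class = int(probs.argmax())    # index of the largest probability
```

The softmax outputs sum to 1, so the largest entry can be read directly as the network's final classification.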

Results and Discussion
The following results and discussion refer to the training process in Figure 9.
a) The collection of Javanese script images is done by gathering images of different handwriting, some also obtained from the internet. The images are then resized to the same dimensions, compressed (zipped), and uploaded to Mediafire. b) Transfer Learning is prepared by importing the required modules through Google Colab. c) The previously uploaded images are downloaded. d) An ImageDataGenerator is used to set the image scale, starting from the image size and the batch size, and to split the data for the training and testing processes. The batch size itself is a grouping of the data sent to the neural network to speed up each repetition (epoch).
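Step d) can be sketched with Keras's ImageDataGenerator; the dummy in-memory images, the 4 classes, and the small batch size are assumptions for illustration (the paper uses image size 224 and batch size 200 on the downloaded dataset):

```python
import numpy as np
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Dummy in-memory images stand in for the downloaded dataset.
images = np.random.rand(8, 224, 224, 3).astype("float32")
labels = np.eye(4)[np.random.randint(0, 4, size=8)]  # 4 hypothetical classes

# Rescale pixel values and reserve 25% of the data for testing.
datagen = ImageDataGenerator(rescale=1.0 / 255, validation_split=0.25)
train_flow = datagen.flow(images, labels, batch_size=4, subset="training")
test_flow = datagen.flow(images, labels, batch_size=4, subset="validation")

batch_x, batch_y = next(train_flow)  # one batch of images and one-hot labels
```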

Figure 9 Script Image Data Generator
From Figure 9 we can see that the research uses an image size of 224 and a batch size of 200. The results of the script can be seen in Figure 10. e) At this stage, the system downloads the labels for the testing process, so that the output can be obtained in the form of classes, i.e. letters. The labels contain the letters used in the dataset, as shown in Figure 11. f) A base model is created from MobileNetV2. Two ways are used to optimize the training: Transfer Learning and Fine Tuning. Transfer Learning offers two methods to build a model: the functional API and the sequential API. The functional API is a way of creating a model with more flexibility, because we can explicitly determine how layers are connected to one another; it is used for multi-output models and directed graphs of layers. The sequential API, in contrast, builds the model layer by layer, which suffices for most problems, but it is limited when a model shares layers or has multiple inputs or outputs. Figure 12 depicts the model. 2. We compile the model to find out how many parameters are trainable and how many are not. Figure 15 illustrates the compilation result.

Figure 15 Results Compile Model
From Figure 15, it can be concluded that the model that has been through the training process has a total of 2,627,316 parameters, of which only 395,392 cannot be trained while 2,231,924 are trainable. The figure shows that the model is better than the model before it went through the training process.
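The MobileNetV2 transfer-learning setup from step f) can be sketched with the Keras sequential API; the number of classes (20) is an assumption, and weights=None avoids downloading pretrained weights here (the paper's pipeline would load pretrained ImageNet weights):

```python
import tensorflow as tf

# Frozen MobileNetV2 base; weights=None is used here only to keep the
# sketch self-contained and offline.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights=None)
base.trainable = False  # freeze the base for transfer learning

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(20, activation="softmax"),  # one unit per letter
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.summary() reports the trainable vs. non-trainable parameter counts
# shown in Figure 15.
```

Freezing the base is what makes most of the parameter count non-trainable; fine-tuning would later unfreeze some of the top layers.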
3. Continue Train Model is the second training, an improvement process over the first training. We conduct the training process by tuning the number of epochs to 50. Figure 16 depicts the results of the training. After this long process, a classification of the train and test sets is produced as a reference to find out the accuracy for each letter. The accuracy calculation refers to the confusion matrix of the test set, which can be seen in Figure 19. Figure 19 Confusion Matrix Test k) Convert to TF Lite. At this stage we save the model using tf.saved_model.save and then convert it to the TF Lite format, which is compatible with Android. l) Downloading the labels and the model is the last step in creating the dataset. The downloaded labels and model are used to build the application through Android Studio. m) Implementation. We construct the application with a single interface: the user only points the camera at the object (Javanese script). The application automatically detects the image, categorizes the letter according to the picture, and displays the accuracy percentage. Figure 20 depicts the application identifying Javanese characters.
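The conversion in step k) can be sketched with the TF Lite converter; the tiny untrained model and the output filename below are stand-ins for the trained MobileNetV2 classifier:

```python
import tensorflow as tf

# Stand-in for the trained classifier; the real pipeline would convert the
# MobileNetV2 model after training.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(20, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()  # serialized FlatBuffer bytes

with open("javanese_script.tflite", "wb") as f:
    f.write(tflite_model)  # this file ships inside the Android app
```

The resulting .tflite file, together with the downloaded labels, is what Android Studio bundles into the application in steps l) and m).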

Conclusion
In this paper, we construct a Javanese script classification dataset, using deep learning, to help users detect Javanese characters. It can also be used as learning media that is more interesting and more effective. We build the dataset using the Convolutional Neural Network (CNN) method. The stages of making the dataset are inputting the data, building the model, and training it to recognize Javanese script images. The experiment produces a Javanese script classification application with an accurate level of recognition of Javanese script patterns. This accuracy can serve as a benchmark of how well the model has learned to recognize Javanese script patterns in a real application.