A Migration Story — MySQL → MongoDB

Published in

Teknasyon Engineering

8 min readApr 25, 2022

In this article, I will try to explain the databases we used in our project (which came in the 1st place in more than five countries in its category), the new database integrations in addition to these, and the experiences we gained during these integrations.

Today, start-up projects usually begin with the most commonly used technologies and databases so that they can emerge faster and create products quickly. Because projects aim to launch as soon as possible and see the effect they will create. For this reason, the architectural infrastructure of the project, whether it is scalable when too many users join, is often not taken into account. Many projects, which are now Unicorn, started this way and later improved their infrastructure as much as possible; some of them created products for themselves and shared them with the industry as open-source, and some turned these technologies into a product and started to sell them.

What I want to point out is that our project started with the mindset of a start-up, then became the first in its sector, and was scaled as much as the platform it uses as a database allows. Since we are using a MySQL database and think that the project cannot be scaled any more than its growth targets, we searched for different databases. During this process, especially NoSQL databases and open-source projects were examined, as they can be expanded horizontally more easily.

Databases We Researched

Cassandra

Cassandra is an open-source distributed NoSQL database developed with Java by Facebook. It was transferred to Apache in 2009. Its first reason for development was to create a NoSQL database on MySQL where data can be transferred quickly and SQL queries can also run.

Why We Didn’t Choose Cassandra: Cassandra met most of our infrastructure needs. However, we did not prefer it because DevOps processes and monitoring operations require a lot of experience.

DynamoDB

Amazon DynamoDB is a cloud-based NoSQL key/value database. Developed by Amazon and used within AWS. It is similar in structure to Cassandra.

Why We Didn’t Choose DynamoDB : We didn’t choose to use DynamoDB because if we did, it would create AWS dependency, and cost management would be difficult as the data grew.

MongoDB

MongoDB is a document-based and scalable NoSQL database that was developed by MongoDB Inc in 2009 using the C++ programming language. Today, it forms the infrastructure of many big projects. It supports two different types of horizontal growth: ReplicaSet and Sharding.

We decided to use it in MongoDB because it is more suitable for our data structure. In addition, the fact that DevOps processes are easier and more manageable compared to other databases was one of the first reasons why we chose it.

In our project, we decided to keep and respond to the information about the users and the statistical data that we gave to the users in the application from MySQL to MongoDB. In terms of numbers, user data is around 5 billion documents, and statistical data is around 10 billion.

Using a migration library that we coded with Python; we moved this data to MongoDB in 2 days in real-time with an average of 200 workers working simultaneously. Once the migration finished, we started to respond through MongoDB via the API of the application. During these processes, we completed the migration in the background without closing the project.

We have decided to use Replica Set as the MongoDB growth method. In this way, we are able to manage 1 Master N Slave machine. By giving the read-out load to the slave completely, we provided only the write operation of the master server. Thus, we ensured that the master only does his own work in case of traffic.

When we need to deliver services to a new country or expect a shift in traffic, we can increase the slave machines as much as we want, or we can turn them off when not needed. Thus, it strengthens our hands in both resource and cost management.

What we gained with the migration

We are now more flexible in resource management, and we can scale up or down as much as we want and whenever we want. Thus, our concerns about the database have decreased, and we reduced the need for our DevOps and development teams to think about the database while developing. (Altought, it will never be possible to completely eradicate this need :) )
By using one-third of the resources we spent on MySQL, we are able to respond to more than four times the number of users than the time we were using MySQL. Although the number of users increased, the resources we used decreased, and this gave us great advantages both in terms of cost and server management.

Despite the Increasing Number of Users Without Increasing Resources, We Continued to Respond Without Problem

3. We have achieved a faster response time. Thus, our application opening speed and screen transitions within the application became five times faster. By providing users with a better experience, we have provided ability to our users to use the product more.

4. We got rid of the dependency on multiple databases. By transferring all the data in MySQL and ElasticSearch in our project to MongoDB, we started to give the responses to the relevant services through MongoDB without any problems. Thus, since we only do MongoDB management and monitoring, we can spare more time for other tasks of ours.

So, What Have We Learned and What Kind of Problems Have We Encountered During the Migration?

Using the Right Document Structure

MongoDB stores data as JSON due to its structure. Like RDBMSs, the records are not stored line by line but as JSON data you specify. Therefore, when creating a MongoDB collection, it will make your work much easier in the future to determine your data structure as if you were using MySQL and stick to it without getting caught up too much with the data freedom it provides.

It supports queries like sum, distinct, and avg with the MongoDB Aggregation operator. However, these inquiries under high load lead to very serious resource usage. Therefore, the use of Aggregation should be avoided as much as possible in the results given to the end-user; such needs should be planned during the database design phase.

While we were saving the data in statistics collections, we also stored the data by performing inc operations with MongoDB’s mathematical commands. Thus, we no longer needed to sum with inquiry. We are now able to obtain the data of the user we want with a single inquiry and document. This process gave us the advantage of doing the operation that we could do in 1000 rows in MySQL with a single document.

The Right Usage of Resources

MongoDB stores the data in files in the BSON format it has created itself. The created indexes are also kept separately in files in the same format, and the results are given through them.

Since each collection and index are in separate BSON files and inquiries are made from these files, it performs more reading and writing operations on the disk than MySQL. This inquiry affects the performance of the server’s disk has an IOPS restriction.

MongoDB keeps frequently used queries or data on RAM with a cache architecture it uses, and when an inquiry about these records is received, it responds to it using RAM with. Therefore, the server’s RAM with can also affect the inquiries.

Since MongoDB does all the operations on the files on the disk, the read-write features of the disk used must be good. Features here may vary depending on the project traffic and data size.

After the MongoDB migration, the biggest issue we encountered was not realizing that IOPS limits had a very serious impact on inquiry performance. Since MongoDB does a lot of reading from the disk due to its structure, using the highest possible IOPS limit alongside the disk that offers the fastest possible read and write support will seriously affect your performance.

The Right Usage of Index

As in all databases, indexes are essential in inquiry performance in MongoDB. Therefore, the indexes and index methods to be used together with the document design should be determined during design.

MongoDB has multiple index types. The most appropriate one should be selected according to the data structure.

As the number of indexes used in MongoDB increases, the inquiry performance will decrease accordingly. For this reason, it is recommended to index the most needed fields.

MongoDB allows adding indexes too, even when under load. Depending on the size of the data, it can be completed at varying times. Therefore, an index can be added later, taking traffic and data size into account. It takes a very short time to delete an existing index.

Once we released our product on the market, we analyzed our inquiries and deleted the indexes that we had thought might be needed, and we improved our performance by nearly 30%. Therefore, using as few and smooth indexes as possible in MongoDB becomes very important as data and traffic grow.

Conclusion

We think that we made the right decision by using MongoDB. now it is easier to act on the database, and we think that it is more suitable for the growth potential of the project. The more we use it, the more we love it. Of course, each database is unique and may not meet every project’s needs. Therefore, care should be taken to choose the right database according to the project’s structure.

I explained more on the subjects in this article at the Teknasyon Tech event. If you would like to watch the video:

If you would like to take a look at the presentation that I benefited from:

https://www.canva.com/design/DAEkijqrmfk/VGpr_WeKXTK4pFUvJmTokQ/view?utm_content=DAEkijqrmfk&utm_campaign=designshare&utm_medium=link&utm_source=publishpresent#3

Thank you for reading this far patiently. For questions and different situations, you can contact me at the links below.

Linkedin: https://www.linkedin.com/in/beytullah/
Twitter: https://twitter.com/beytullah