Ben Chuanlong Du's Blog

It is never too late to learn.

Popular Databases

Things on this page are fragmentary and immature notes/thoughts of the author. Please read with your own judgement!

The YouTube video How to Choose The Right Database has great advice on how to choose the right database.

Types of Databases

  • relational
  • non-relational
    • key-value database
    • document database
    • wide column database
  • graph database
  • search engine database
  • time series database
  • time series database

Advantages of Relational Databases

  • consistency
  • security
  • ease of backup and recovery

Advantages of Non-relational Databases

  • flexibility
  • scalability
  • cost-effectiveness
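The consistency advantage of relational databases can be made concrete with a small sketch. It uses Python's built-in sqlite3 module only because it needs no setup; the accounts table, its columns, and the transfer helper are made up for illustration. A CHECK constraint plus a transaction means a transfer either applies fully or not at all.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst atomically; return True on success."""
    try:
        # The connection context manager commits on success
        # and rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance,
        # so the credit to dst was rolled back as well.
        return False

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

ok = transfer(conn, "alice", "bob", 30)       # fits within alice's balance
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

The second transfer leaves both rows untouched, which is the kind of guarantee that many non-relational stores relax in exchange for flexibility and scalability.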

Storage Format

  • row storage
  • columnar storage

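Columnar storage is good for analytical operations, because a query that aggregates a few columns only has to read those columns rather than every full row.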
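A rough way to picture the two layouts is the sketch below. It is a toy in-memory model, not how any particular database stores data on disk, and the field names (id, city, amount) are invented for the example.

```python
# Row storage: each record is kept together.
rows = [
    {"id": 1, "city": "Seattle", "amount": 20},
    {"id": 2, "city": "Austin", "amount": 35},
    {"id": 3, "city": "Seattle", "amount": 45},
]

def to_columns(rows):
    """Convert a list of row dicts into a dict of column lists."""
    return {name: [row[name] for row in rows] for name in rows[0]}

# Columnar storage: each column is kept together.
columns = to_columns(rows)

def total_row_store(rows, field):
    # Every row (with all of its fields) is visited to read one field.
    return sum(row[field] for row in rows)

def total_column_store(columns, field):
    # Only the one column that the aggregate needs is visited.
    return sum(columns[field])

row_total = total_row_store(rows, "amount")
column_total = total_column_store(columns, "amount")
print(row_total, column_total)
```

Both functions return the same total. The difference is how much data has to be touched to compute it, which matters when a table has many columns and billions of rows.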

Comparison of Databases

| Name | Language | Opensource/Free | PACELC | Advantages | Disadvantages | Comment |
| --- | --- | --- | --- | --- | --- | --- |
| MySQL [1] | SQL | Opensource | PC/EC | | | The most popular opensource RDBMS |
| Cassandra [1] | CQL (Cassandra Query Language) | Opensource | PA/EL | real-time | no join | |
| HBase [1] | | Opensource | PC/EC | real-time | no join | |
| ClickHouse [2] | SQL | Opensource | | OLAP for big data | | Has very good performance |
| TiDB [3] | SQL | Opensource | | OLAP for big data | | Good performance, supports integration with Spark |
| Redis [4] | DSL (hashmap API-like) | Opensource | | Distributed in-memory cache for real-time applications | Queries or joins | |
| neo4j [5] | Cypher (Graph Query Language) | Opensource | | Graph applications | | The most popular graph database |
| Elasticsearch [6] | DSL, SQL | Opensource | | Out-of-the-box search engine for large documents | | Designed as a search engine but also popularly used as a database |
| TDengine [7] | SQL | Opensource | | IoT | | IoT, good performance |

[2] ClickHouse is an open-source column-oriented database management system that allows generating analytical data reports in real time.

yugabyte-db

scylladb

Scylla is the real-time big data database that is API-compatible with Apache Cassandra and Amazon DynamoDB. Scylla embraces a shared-nothing approach that increases throughput and storage capacity to realize order-of-magnitude performance improvements and reduce hardware costs.

MongoDB

MongoDB is a document-oriented, disk-based database optimized for operational simplicity, schema-free design and very large data volumes.
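What "schema-free" means in practice can be sketched with plain Python dictionaries. This is not MongoDB's API: the collection list and the find helper are hypothetical stand-ins, and only the `_id` field name follows MongoDB's convention.

```python
import json

# Documents in the same collection do not have to share the same fields.
collection = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    {"_id": 2, "name": "Linus", "languages": ["C", "Shell"]},
    {"_id": 3, "name": "Grace", "address": {"city": "Arlington"}},
]

def find(collection, **criteria):
    """Return documents whose top-level fields equal all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in criteria.items())
    ]

matches = find(collection, name="Linus")
with_email = [doc["name"] for doc in collection if "email" in doc]
serialized = json.dumps(collection[2])  # nested documents map onto JSON
print(matches, with_email, serialized)
```

Adding a new field requires no migration, but the application has to cope with documents that lack it, as the `"email" in doc` check shows.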

Distributed In-memory Cache

A distributed in-memory cache is essentially a distributed key-value store/database. You can think of it as a hashmap over the network.
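The "hashmap over the network" idea can be sketched as follows. This is a toy model under loud assumptions: each "node" is just a local dict instead of a remote server, keys are assigned to nodes by a simple hash modulo, and the class and method names (ShardedCache, set, get) are invented for the example rather than taken from any real cache client.

```python
import hashlib
import time

class ShardedCache:
    """Toy stand-in for a distributed cache: each 'node' is a local dict."""

    def __init__(self, num_nodes=3):
        self.nodes = [{} for _ in range(num_nodes)]

    def _node_for(self, key):
        # A stable hash decides which node owns the key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def set(self, key, value, ttl=None, now=None):
        # `now` can be passed explicitly to make expiry easy to test.
        now = time.monotonic() if now is None else now
        expires_at = None if ttl is None else now + ttl
        self._node_for(key)[key] = (value, expires_at)

    def get(self, key, now=None):
        now = time.monotonic() if now is None else now
        node = self._node_for(key)
        entry = node.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del node[key]  # lazily evict the expired entry
            return None
        return value

cache = ShardedCache(num_nodes=3)
cache.set("user:1", "alice")
cache.set("session:9", "token", ttl=30, now=0)

user = cache.get("user:1")
fresh = cache.get("session:9", now=10)    # still within the 30 second TTL
expired = cache.get("session:9", now=31)  # past the TTL
print(user, fresh, expired)
```

In a real deployment the nodes are separate processes reached over a network protocol, and the key-to-node mapping is usually more careful than a plain modulo so that adding or removing a node does not reshuffle most keys.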

Redis, implemented in C, is the most popular in-memory cache. memcached is another (less popular) in-memory cache, also implemented in C. pelikan is Twitter's unified cache backend, implemented in C and Rust.
