Skip to main content

· 5 min read

Databend is a powerful cloud data warehouse. Built for elasticity and efficiency. Free and open. Also available in the cloud: https://app.databend.com .

What's New

Check out what we've done this week to make Databend even better for you.

New Features ✨

Format

  • better checking of format options (#8981)
  • add basic schema infer for parquet (#9043)

Query

  • QualifiedName support 'db.table.' and 'table.' (#8965)
  • support bulk insert without exprssion (#8966)

Storage

  • add cache layer for fuse engine (#8830)
  • add system table system.memory_statistics (#8945)
  • add optimize statistic ddl support (#8891)

Code Refactoring 🎉

Base

  • remove common macros (#8936)

Format

  • TypeDeserializer get rid of FormatSetting (#8950)

Planner

  • refactor extract or predicate (#8951)

Processors

  • optimize join by merging build data block (#8961)

New Expression

  • allow sparse column id in chunk, redo #8789 with a new approach. (#9008)

Documentation 📔

Bug Fixes 🔧

Base

  • try fix lost tracker (#8932)

Meta

  • fix share db bug, create DatabaseIdToName if need (#9006)

Mysql handler

  • fix mysql conns leak (#8894)

Processors

  • try fix update list memory leak (#9023)

Storage

  • read and write block in parallel when compact (#8921)

What's On In Databend

Stay connected with the latest news about Databend.

Infer Schema at a Glance

You usually need to create a table before loading data from a file stored on a stage or somewhere. Unfortunately, sometimes you might not know the file schema to create the table or are unable to input the schema due to its complexity.

Introducing the capability to infer schema from an existing file will make the work much easier. You will even be able to query data directly from a stage using a SELECT statement like select * from @my_stage.

INFER 's3://mybucket/data.csv' FILE_FORMAT = ( TYPE = CSV );
+-------------+---------+----------+
| COLUMN_NAME | TYPE | NULLABLE |
|-------------+---------+----------|
| CONTINENT | TEXT | True |
| COUNTRY | VARIANT | True |
+-------------+---------+----------+

We've added support for inferring the basic schema from parquet files in #9043, and we're now working on #7211 to implement select from @stage.

Learn More

What's Up Next

We're always open to cutting-edge technologies and innovative ideas. You're more than welcome to join the community and bring them to Databend.

Add Tls Support for Mysql Handler

opensrv-mysql v0.3.0 that was released recently includes support for TLS. It sounds like a good idea to introduce it to Databend.

let (is_ssl, init_params) = opensrv_mysql::AsyncMysqlIntermediary::init_before_ssl(
&mut shim,
&mut r,
&mut w,
&Some(tls_config.clone()),
)
.await
.unwrap();

opensrv_mysql::secure_run_with_options(shim, w, ops, tls_config, init_params).await

Issue 8983: Feature: tls support for mysql handler

If you're interested, click the link above to join the discussion and PR review work, or pick up a good first issue at https://link.databend.rs/i-m-feeling-lucky to get started.

Changelog

You can check the changelog of Databend Nightly for details about our latest developments.

Contributors

Thanks a lot to the contributors for their excellent work this week.

andylokandyariesdevilb41shBohuTANGdantengskydrmingdrmer
andylokandyariesdevilb41shBohuTANGdantengskydrmingdrmer
everpcpcflaneur2020leiyskylichuangmergify[bot]PsiACE
everpcpcflaneur2020leiyskylichuangmergify[bot]PsiACE
sandfleesoyeric128sundy-liTCeasonTracyZYJXuanwo
sandfleesoyeric128sundy-liTCeasonTracyZYJXuanwo
xudong963youngsofunyufan022zhang2014zhyass
xudong963youngsofunyufan022zhang2014zhyass

Connect With Us

We'd love to hear from you. Feel free to run the code and see if Databend works for you. Submit an issue with your problem if you need help.

DatafuseLabs Community is open to everyone who loves data warehouses. Please join the community and make a pull request to share your thoughts.

· 2 min read

Backgroup

now databend support hive catalog to run hive queries, this docs shows how to set up databend-hive enviroment and run hive sqls.

How to set up databend-hive cluster

hiveserver&metastore&hdfs is supposed to be pre-installed.

  1. download a databend-release with hive support, or build from source
## make sure JAVA_HOME is set
export JAVA_HOME=/path/to/java
export LD_LIBRARY_PATH=${JAVA_HOME}/lib/server:${LD_LIBRARY_PATH}
cargo build --features hive,storage-hdfs
  1. setup a databend cluster, refer to deploying-databend
  2. add hive catalog and hdfs storage to databend-query.toml
[storage]
type = "hdfs"

[storage.hdfs]
# hdfs namenode address,such as 127.0.0.1:8020
name_node = "xx"
root = ""

[catalogs.hive]
type = "hive"
# hive metastore address, such as 127.0.0.1:9083
address = "xx"

  1. run databend-query with java&hadoop enviroment
export HADOOP_HOME=xxx
export JAVA_HOME=xxx, such as /Library/Java/JavaVirtualMachines/openjdk-11.jdk/Contents/Home
export LD_LIBRARY_PATH=$JAVA_HOME/lib/server:$LD_LIBRARY_PATH

./bin/databend-query -c ./databend-query.toml > query.log 2>&1 &
  1. setup hive related settings with mysql client
set global sql_dialect = 'hive';

suggest settings:

-- for chinese users
set global timezone = 'Asia/Shanghai';
set global max_execute_time = 180000;

-- support hive nvl function
create FUNCTION nvl as (a,b) -> ifnull(a,b);
  1. query hive data using mysql client or mysql jdbc client. Note: hive tables must be reffered as hive.db.table
select * from hive.$db.$table limit 10;

Limititions

  1. only support parquet table, not support orc,txt
  2. not support struct&map&decimal hive data types
  3. only support hive select queries, not support DDL, insert, DML sqls
  4. not support hive udfs, hive functions are limited supported

hive features is now in beta stage, please feel free to report bugs&suggestions in databend issues.

· 4 min read

Many optimizations are usually required for big data analytics to "reduce the distance to data." For example, using the Bloom Filter Index, queries can be filtered to determine whether data should be fetched from backend storage:

Alt text

Why We Replaced the Bloom Filter

Most popular databases use Bloom Filters to handle equivalent queries and avoid useless data readings. Databend also used the classic Bloom Filter algorithm in the first version (databend#6639). However, we found the Bloom Filter Index required plenty of storage space which even exceeded the data storage size (Databend automatically created Bloom indexes for some data types to make it easier for users to work with). The Bloom Filter Index didn't show a significant performance improvement because it is not much different from reading data directly from storage.

The reason is that Bloom Filter does not know the cardinality of the data when it is generated. Take the Boolean type as an example, the algorithm allocates space for it without considering the cardinality (2, True or False).

As a result, the Databend community began exploring new solutions and determined a feasible solution that uses HyperLoglog to sort out distinct values before allocating space.

At the TiDB User Conference on a Saturday in September, I met XP (@drmingdrmer) and talked about the solution again with him. He came up with using Trie to solve the problem. An excellent idea, but lots of work. 

XP is a master for Trie, so I'm sure implementation would be no big deal for him. But I think some existing technologies might be able to help.

Alt text

Why Xor Filter?

A few explorations later, the Xor Filter algorithm, proposed by Daniel Lemire and his team in 2019: [Xor Filters: Faster and Smaller Than Bloom Filters, caught my attention(https://lemire.me/blog/2019/12/19/xor-filters-faster-and-smaller-than-bloom-filters/).

Alt text

I did a test (Xor Filter Bench) with the Rust version (xorfilter) and got a very positive result, so we replaced Bloom Filters with Xor Filters by databend#7860. Let's do a test and see how it performs with Xor Filters.

u64: 
xor bitmap encode:1230069 bytes, raw:8000000 bytes, ratio:0.15375863

bool:
xor bitmap encode:61 bytes, raw:1000000 bytes, ratio:0.000061

string:
xor bitmap encode:123067 bytes, raw:3000000 bytes, ratio:0.041022334

100000 records of the same key:
xor bitmap encode: 61 bytes, raw:3000000 bytes, ratio:0.000020333333

Test Environment

Databend: v0.8.122-nightly, single node
VM: 32 vCPU, 32 GiB (Cloud VM)
Object Storage: S3
Dataset: 10 billion records, 350G Raw Data, Xor Filter Index 700MB, all indexes and data are stored in object storage.
Table:

mysql> desc t10b;
+-------+-----------------+------+---------+-------+
| Field | Type | Null | Default | Extra |
+-------+-----------------+------+---------+-------+
| c1 | BIGINT UNSIGNED | NO | 0 | |
| c2 | VARCHAR | NO | | |
+-------+-----------------+------+---------+-------+

Deploying Databend

Step 1: Download installation package

wget https://github.com/datafuselabs/databend/releases/download/v0.8.122-nightly/databend-v0.8.122-nightly-x86_64-unknown-linux-musl.tar.gz 
tar zxvf databend-v0.8.122-nightly-x86_64-unknown-linux-musl.tar.gz

You can find the following content after extracting the package:

tree
.
├── bin
│ ├── databend-meta
│ ├── databend-metabench
│ ├── databend-metactl
│ └── databend-query
├── configs
│ ├── databend-meta.toml
│ └── databend-query.toml
├── readme.txt
└── scripts
├── start.sh
└── stop.sh

Step 2: Start Databend Meta

./bin/databend-meta -c configs/databend-meta.toml

Step 3:Configure Databend Query

For more information, refer to https://databend.rs/doc/deploy/deploying-databend

vim configs/databend-query.toml
... ...

[meta]
endpoints = ["127.0.0.1:9191"]
username = "root"
password = "root"
client_timeout_in_second = 60
auto_sync_interval = 60

# Storage config.
[storage]
# fs | s3 | azblob | obs
type = "s3"

# To use S3-compatible object storage, uncomment this block and set your values.
[storage.s3]
bucket = "<your-bucket-name>"
endpoint_url = "<your-s3-endpoint>"
access_key_id = "<your-key>"
secret_access_key = "<your-access-key>"

Step 4: Start Databend Query

./bin/databend-query -c configs/databend-query.toml

Step 5: Construct the test data set

mysql -uroot -h127.0.0.1 -P3307

Construct 10 billion rows of test data (time: 16 min 0.41 sec):

create table t10b as select number as c1, cast(rand() as string) as c2 from numbers(10000000000)

Run a query (no cache, all data and indexes are in object storage):

mysql> select * from t10b where  c2='0.6622377673133426';
+-----------+--------------------+
| c1 | c2 |
+-----------+--------------------+
| 937500090 | 0.6622377673133426 |
+-----------+--------------------+
1 row in set (20.57 sec)
Read 40000000 rows, 1009.75 MiB in 20.567 sec., 1.94 million rows/sec., 49.10 MiB/sec.

With the Xor Filter Index, it takes about 20 seconds for a single-node Databend to complete a point query on a scale of 10 billion rows. You can also speed up point queries by expanding a single Databend node into a cluster. Refer to https://databend.rs/doc/deploy/expanding-to-a-databend-cluster#deploying-a-new-query-node for details.

References

[1] Arxiv: Xor Filters: Faster and Smaller Than Bloom and Cuckoo Filters

[2] Daniel Lemire’s blog: Xor Filters: Faster and Smaller Than Bloom Filters

[3] Databend, Cloud Lakehouse: https://github.com/datafuselabs/databend

· 4 min read

Life is a journey, sometimes you must go back to move forward. When working with databases, you need to access the data history now and then. Time Travel is one of the most valuable features that Databend has rolled out. The time travel feature acts as a data recovery utility that enables you to restore a dropped table or get back a previous version of your data in a table. For example, when you accidentally delete a table or update some rows by mistake, you will need the help from Time Travel. This post sheds some light on what you can do with Time Travel in Databend.

First things first, you must know that not all the historical data can be restored based on the Databend retention policy. The default retention period is 24 hours, which means you can restore your historical data within 24 hours after it is deleted or outdated.

If you run SHOW TABLES HISTORY against a database, you will find the dropped tables (if any) and their drop time. The command does not list the dropped tables that have passed their retention period.

Restore a Dropped Table

When you delete a file from your computer, the file goes to the trash bin and you can restore the file by putting it back to its original folder.

In Databend, restoring a table is as easy as you restore a file from the trash bin. The UNDROP TABLE command makes a dropped table become available again with the data of the latest version. The "latest version" means that Databend recovers a table as well as the data that the table was holding when you deleted it.

Query Old Data

This is the most glorious part of the Time Travel story in Databend. When we say that you can get back a previous version of your data in a table, it does not mean that you roll back your table to an earlier point in time, it shows you the table's data at that point instead.

Databend automatically takes and saves a snapshot of your tables after each transaction that updates your table data. A version of a table's data practically refers to a snapshot that saves the data of the table when the snapshot was taken.

Databend provides a system function named FUSE_SNAPSHOT that enables you to find the saved snapshots. Each snapshot comes with a snapshot ID and a timestamp.

The saved snapshots are the behind-the-scenes heroes that make the time travel become true. So when you try to get back your history data, you need to tell Databend which version you want by the snapshot ID or the timestamp with an AT clause in the SELECT statement.

Create a New Table from Old Data

The Time Travel feature makes it possible to create an OLD table, which means you can create a table to hold and move on from a previous version of your data.

The CREATE TABLE statement can include a SNAPSHOT_LOCATION clause that allows you to specify a snapshot file that holds your old data. This command enables you to insert the data stored in the snapshot file when you create a table. Please note that the table you create must have same column definations as the data from the snapshot.

Go without Time Travel

Tables in Databend support Time Travel out-of-the-box. However, you might not need it for some cases, for example, when you're running low of your storage space or the data is big but unimportant. Databend currently does not provide a setting to switch it off, but you can CREATE TRANSIENT TABLE.

Transient tables are used to hold transitory data that does not require a data protection or recovery mechanism. Dataebend does not hold historical data for a transient table so you will not be able to query from a previous version of the transient table with the Time Travel feature, for example, the AT clause in the SELECT statement will not work for transient tables. Please note that you can still drop and undrop a transient table.

· 6 min read

JSON (JavaScript Object Notation) is a commonly used semi-structured data type. With the self-describing schema structure, JSON can hold all data types, including multi-level nested data types, such as Array, Object, etc. JSON takes advantage of high flexibility and easy dynamic expansion compared with the structured data types that must strictly follow the fields in a tabular data structure.

As data volume increases rapidly in recent years, many platforms have started to use and get the most out of semi-structured data types (such as JSON). For example, the JSON data shared by various platforms through open interfaces, and the public datasets and application logs stored in JSON format.

Databend supports structured data types, as well as JSON. This post dives deeply into the JSON data type in Databend.

Working with JSON in Databend

Databend stores semi-structured data as the VARIANT (also called JSON) data type:

CREATE TABLE test
(
id INT32,
v1 VARIANT,
v2 JSON
);

The JSON data needs to be generated by calling the parse_json or try_parse_json function. The input string must be in the standard JSON format, including Null, Boolean, Number, String, Array, and Object. In case of parsing failure due to invalid string, the parse_json function will return an error while the try_parse_json function will return a NULL value.

INSERT INTO test VALUES
(1, parse_json('{"a":{"b":1,"c":[1,2]}}'), parse_json('[["a","b"],{"k":"a"}]')),
(2, parse_json('{"a":{"b":2,"c":[3,4]}}'), parse_json('[["c","d"],{"k":"b"}]'));

SELECT * FROM test;
+----+-------------------------+-----------------------+
| id | v1 | v2 |
+----+-------------------------+-----------------------+
| 1 | {"a":{"b":1,"c":[1,2]}} | [["a","b"],{"k":"a"}] |
| 2 | {"a":{"b":2,"c":[3,4]}} | [["c","d"],{"k":"b"}] |
+----+-------------------------+-----------------------+

JSON usually holds data of Array or Object type. Due to the nested hierarchical structure, the internal elements can be accessed through JSON PATH. The syntax supports the following delimiters:

  • :: Colon can be used to obtain the elements in an object by the key.

  • .: Dot can be used to obtain the elements in an object by the key. Do NOT use a dot as the first delimiter in a statement, or Databend would consider the dot as the delimiter to separate the table name from the column name.

  • []: Brackets can be used to obtain the elements in an object by the key or the elements in an array by the index.

You can mix the three types of delimiters above.

SELECT v1:a.c, v1:a['b'], v1['a']:c, v2[0][1], v2[1].k FROM test;

+--------+-----------+-----------+----------+---------+
| v1:a.c | v1:a['b'] | v1['a']:c | v2[0][1] | v2[1].k |
+--------+-----------+-----------+----------+---------+
| [1,2] | 1 | [1,2] | "b" | "a" |
| [3,4] | 2 | [3,4] | "d" | "b" |
+--------+-----------+-----------+----------+---------+

The internal elements extracted through JSON PATH are also of JSON type, and they can be converted to basic types through the cast function or using the conversion operator ::.

SELECT cast(v1:a.c[0], int64), v1:a.b::int32, v2[0][1]::string FROM test;

+--------------------------+---------------+------------------+
| cast(v1:a.c[0] as int64) | v1:a.b::int32 | v2[0][1]::string |
+--------------------------+---------------+------------------+
| 1 | 1 | b |
| 3 | 2 | d |
+--------------------------+---------------+------------------+

Parsing JSON from GitHub

Many public datasets are stored in JSON format. We can import these data into Databend for parsing. The following introduction uses the GitHub events dataset as an example.

The GitHub events dataset (downloaded from GH Archive) uses the following JSON format:

{
"id":"23929425917",
"type":"PushEvent",
"actor":{
"id":109853386,
"login":"teeckyar-bot",
"display_login":"teeckyar-bot",
"gravatar_id":"",
"url":"https://api.github.com/users/teeckyar-bot",
"avatar_url":"https://avatars.githubusercontent.com/u/109853386?"
},
"repo":{
"id":531248561,
"name":"teeckyar/Times",
"url":"https://api.github.com/repos/teeckyar/Times"
},
"payload":{
"push_id":10982315959,
"size":1,
"distinct_size":1,
"ref":"refs/heads/main",
"head":"670e7ca4085e5faa75c8856ece0f362e56f55f09",
"before":"0a2871cb7e61ce47a6790adaf09facb6e1ef56ba",
"commits":[
{
"sha":"670e7ca4085e5faa75c8856ece0f362e56f55f09",
"author":{
"email":"support@teeckyar.ir",
"name":"teeckyar-bot"
},
"message":"1662804002 Timehash!",
"distinct":true,
"url":"https://api.github.com/repos/teeckyar/Times/commits/670e7ca4085e5faa75c8856ece0f362e56f55f09"
}
]
},
"public":true,
"created_at":"2022-09-10T10:00:00Z",
"org":{
"id":106163581,
"login":"teeckyar",
"gravatar_id":"",
"url":"https://api.github.com/orgs/teeckyar",
"avatar_url":"https://avatars.githubusercontent.com/u/106163581?"
}
}

From the data above, we can see that the actor, repo, payload, and org fields have a nested structure and can be stored as JSON. Others can be stored as basic data types. So we can create a table like this:

CREATE TABLE `github_data`
(
`id` VARCHAR,
`type` VARCHAR,
`actor` JSON,
`repo` JSON,
`payload` JSON,
`public` BOOLEAN,
`created_at` timestamp,
`org` json
);

Use the COPY INTO command to load the data:

COPY INTO github_data
FROM 'https://data.gharchive.org/2022-09-10-10.json.gz'
FILE_FORMAT = (
compression = auto
type = NDJSON
);

The following code returns the top 10 projects with the most commits:

SELECT   repo:name,
count(id)
FROM github_data
WHERE type = 'PushEvent'
GROUP BY repo:name
ORDER BY count(id) DESC
LIMIT 10;

+----------------------------------------------------------+-----------+
| repo:name | count(id) |
+----------------------------------------------------------+-----------+
| "Lombiq/Orchard" | 1384 |
| "maique/microdotblog" | 970 |
| "Vladikasik/statistic" | 738 |
| "brokjad/got_config" | 592 |
| "yanonono/booth-update" | 537 |
| "networkoperator/demo-cluster-manifests" | 433 |
| "kn469/web-clipper-bed" | 312 |
| "ufapg/jojo" | 306 |
| "bj5nj7oh/bj5nj7oh" | 291 |
| "appseed-projects2/500f32d3-8019-43ee-8f2a-a273163233fb" | 247 |
+----------------------------------------------------------+-----------+

The following code returns the top 10 users with the most forks:

SELECT   actor:login,
count(id)
FROM github_data
WHERE type='ForkEvent'
GROUP BY actor:login
ORDER BY count(id) DESC
LIMIT 10;

+-----------------------------------+-----------+
| actor:login | count(id) |
+-----------------------------------+-----------+
| "actions-marketplace-validations" | 191 |
| "alveraboquet" | 59 |
| "ajunlonglive" | 50 |
| "Shutch420" | 13 |
| "JusticeNX" | 13 |
| "RyK-eR" | 12 |
| "DroneMad" | 10 |
| "UnqulifiedEngineer" | 9 |
| "PeterZs" | 8 |
| "lgq2015" | 8 |
+-----------------------------------+-----------+

Performance Optimization

The JSON data generally is saved in plaintext format and needs to be parsed to generate the enumeration value of serde_json::Value every time the data is read. Compared to other basic data types, handling JSON data takes more parsing time and needs more memory space.

Databend has improved the read performance of JSON data using the following methods:

  • To speed up the parsing and reduce memory usage, Databend stores the JSON data as JSONB in binary format and uses the built-in j_entry structure to hold data type and offset position of each element.

  • Adding virtual columns to speed up the queries. Databend extracts the frequently queried fields and the fields of the same data type and stores them as separate virtual columns. Data will be directly read from the virtual columns when querying, which makes Databend achieve the same performance as querying other basic data types.

· 2 min read

The most impressive part of the movie Spider-Man: No Way Home is "three generations coming together. In the story, when the spider-man's friend repeats the magic words "Find Peter Parker!", we surprisingly see two old friends on the screen, the previous generations of spider-man. They travel from other universes to join forces and develop cures for the villains.

Did you know that you have a similar magic power in Databend? That is, you can always get back the previous versions of your data in a few simple steps whenever you need them. The secret is Databend automatically creates and saves snapshots with timestamps of your tables when a data updating occurs, so you can track the history of a table and see how what it looked like at a time point in the past.

The following code creates a table first, and then inserts values with three separate SQL statements:

create table spiderman(gen int, nickname varchar);

insert into spiderman values(1,'Peter-1');
insert into spiderman values(2,'Peter-2');
insert into spiderman values(3,'Peter-3');

Databend creates and saves three snapshots for the code above. Each one holds a historical version of the data in the table.

To find them, use the system function FUSE_SNAPSHOT. The function returns everything you may need to know about the saved snapshots of a table, such as the snapshot IDs, timestamps, and locations.

select snapshot_id,previous_snapshot_id, timestamp from fuse_snapshot('default','spiderman');

---
+----------------------------------+----------------------------------+----------------------------+
| snapshot_id | previous_snapshot_id | timestamp |
+----------------------------------+----------------------------------+----------------------------+
| 34b8df220edc4d8cb9e3e76118788686 | 4bb479751b7144d8aa2b53e5b281453f | 2022-08-30 01:18:53.202724 |
| 4bb479751b7144d8aa2b53e5b281453f | a2801ed9656d42c9812f2921214f0795 | 2022-08-30 01:18:35.597615 |
| a2801ed9656d42c9812f2921214f0795 | NULL | 2022-08-30 01:18:21.750208 |
+----------------------------------+----------------------------------+----------------------------+

You can now query the history data with a snapshot or timestamp by including the AT clause in the SELECT statement:

select * from spiderman at(snapshot=>'a2801ed9656d42c9812f2921214f0795');

---
+------+----------+
| gen | nickname |
+------+----------+
| 1 | Peter-1 |
+------+----------+
select * from spiderman at(timestamp=>'2022-08-30 01:18:21.750208'::timestamp);

---
+------+----------+
| gen | nickname |
+------+----------+
| 1 | Peter-1 |
+------+----------+

The "magic" explained above is part of the powerful Time Travel feature of Databend that enables you to query, back up, or restore from a specified historical version of your data. That's not all about the feature. You can do more with the snapshots to make your work easier. Join the Databend community to find out more "magic tricks".

· 7 min read

Hello, everyone! I'm Xuanwo. Today, on behalf of the Databend community, I would like to announce the official release of v0.8.

Development of Databend v0.8 started on March 28th, with 5000+ commits and 4600+ file changes. In the last 5 months, the community of 120+ contributors added 420K lines of code and removed 160K lines, equivalent to rewriting Databend once. In this release, the community made significant improvements to the SQL Planner framework and migrated all SQL statements to the new Planner, providing full JOIN and subquery support.

Click here to download Databend v0.8

Let's see what has been done in v0.8.

What's Databend?

Databend is a modern cloud data warehouse based on Rust that enables high-performance, elastic and scalable real-time data analysis and activates the data potential of users.

databend-arch

Significant improvements

New Planner: JOIN! JOIN! JOIN!

To better support complex SQL queries and improve user experience, Databend v0.8 is designed with a new Planner framework.

Databend has added JOIN and proper subquery support, driven by New Planner.

select vip_info.Client_ID, vip_info.Region 
from vip_info right
join purchase_records
on vip_info.Client_ID = purchase_records.Client_ID;

New Parser: The Best Parser!

While refactoring Planner, the databend community has implemented a new nom-based Parser that balances development efficiency with user experience.

New Parser makes it easy for developers to design/develop/test complex SQL syntax in an intuitive way

COPY
~ INTO ~ #copy_unit
~ FROM ~ #copy_unit
~ ( FILES ~ "=" ~ "(" ~ #comma_separated_list0(literal_string) ~ ")")?
~ ( PATTERN ~ "=" ~ #literal_string)?
~ ( FILE_FORMAT ~ "=" ~ #options)?
~ ( VALIDATION_MODE ~ "=" ~ #literal_string)?
~ ( SIZE_LIMIT ~ "=" ~ #literal_u64)?

It also gives the user specific and precise information about the error.

MySQL [(none)]> select number from numbers(10) as t inner join numbers(30) as t1 using(number);
ERROR 1105 (HY000): Code: 1065, displayText = error:
--> SQL:1:8
|
1 | select number from numbers(10) as t inner join numbers(30) as t1 using(number)
| ^^^^^^ column reference is ambiguous

No more worrying about not knowing what's wrong with SQL. Visit The New Databend SQL Planner for more information.

New Features

In addition to the newly designed Planner, the Databend community has implemented a number of new features.

COPY Enhancement

COPY capabilities have been greatly enhanced, and Databend can now:

  • Copy data from any supported storage service (even https!)
COPY 
INTO ontime200
FROM 'https://repo.databend.rs/dataset/stateful/ontime_2006_[200-300].csv'
FILE_FORMAT = (TYPE = 'CSV')
  • Support for copying compressed files
COPY 
INTO ontime200
FROM 's3://bucket/dataset/stateful/ontime.csv.gz'
FILE_FORMAT = (TYPE = 'CSV' COMPRESSION=AUTO)
  • UNLOAD data to any supported storage service
COPY 
INTO 'azblob://bucket/'
FROM ontime200
FILE_FORMAT = (TYPE = 'PARQUET‘)

Hive Support

Databend v0.8 designed and developed the Multi Catalog and implemented Hive Metastore support on top of it!

Databend can now interface directly to Hive and read data from HDFS.

select * from hive.default.customer_p2 order by c_nation;

Time Travel

A long time ago, the Databend community shared an implementation of the underlying FUSE Engine, From Git to Fuse Engine, where one of the most important features was the support for time travel, allowing us to query data tables at any point in time.

Starting from v0.8, this feature is now officially installed and we can now

  • Query the data table for a specified time
-- Travel to the time when the last row was inserted
select * from demo at (TIMESTAMP => '2022-06-22 08:58:54.509008'::TIMESTAMP);
+----------+
| c |
+----------+
| batch1.1 |
| batch1.2 |
| batch2.1 |
+----------+
  • Recover mistakenly deleted data tables
DROP TABLE test;

SELECT * FROM test;
ERROR 1105 (HY000): Code: 1025, displayText = Unknown table 'test'.

-- un-drop table
UNDROP TABLE test;

-- check
SELECT * FROM test;
+------+------+
| a | b |
+------+------+
| 1 | a |
+------+------+

Make business data have more security!

CTE Support

CTE (Common Table Expression) is a frequently used feature in OLAP business to define a temporary result set within the execution of a single statement, which is valid only during the query period, enabling the reuse of code segments, improving readability and better implementation of complex queries.

Databend v0.8 re-implements the CTE based on New Planner and now users can happily use WITH to declare the CTE.

WITH customers_in_quebec 
AS (SELECT customername,
city
FROM customers
WHERE province = 'Québec')
SELECT customername
FROM customers_in_quebec
WHERE city = 'Montréal'
ORDER BY customername;

In addition to these features mentioned above, Databend v0.8 also supports UDFs, adds DELETE statements, further enhances support for semi-structured data types, not to mention the numerous SQL statement improvements and new methods added. Thanks to all the contributors to the Databend community, without you all the new features mentioned here would not have been possible!

Quality Enhancement

Feature implementation is just the first part of product delivery. In Databend v0.8, the community introduced the concept of engineering quality, which evaluates the quality of Databend development in three dimensions: users, contributors, and community.

Reassuring users

In order for users to use Databend with confidence, the community has added a lot of tests over the last three months, fetching stateless test sets from YDB and others, adding stateful tests for ontime, hits and other datasets, putting SQL Logic Test online to cover all interfaces, and enabling SQL Fuzz testing to cover boundary cases.

Furthermore, the community has also gone live with Databend Perf to do continuous performance testing of Databend in production environments to catch unexpected performance regressions in time.

Make contributors comfortable

Databend is a large Rust project that has been criticized by the community for its build time.

To improve this issue and make contributors feel comfortable, the community went live with a highly configurable, specially tuned Self-hosted Runner to perform integration tests for PR and enabled several services or tools such as Mergify, mold, dev-tools, etc. to optimize the CI process.

We also initiated a new plan to restructure the Databend project, splitting the original huge query crate into multiple sub-crates to avoid, as much as possible, the situation of changing one line of code and check execution for five minutes.

Keeping the community happy

Databend is a contributor and participant in the open source community. During the development of v0.8, the Databend community established the principle of Upstream First, actively following and adopting the latest upstream releases, giving feedback on known bugs, contributing their own patches, and starting Tracking issues of upstream first violation to keep up with the latest developments.

The Databend community is actively exploring integration with other open source projects and has already implemented integration and support for third-party drivers such as Vector, sqlalchemy, clickhouse-driver, etc.

Next Steps

Databend v0.8 is a solid foundation release with a new Planner that makes it easier to implement features and make optimizations. In version 0.9, we expect improvements in the following areas.

  • Query Result Cache
  • JSON Optimization
  • Table Share
  • Processor Profiling
  • Resource Quota
  • Data Caching

Please check the Release proposal: Nightly v0.9 for the latest news~

Get going now!

Visit the release log and download the latest version to learn more, and feel free to submit feedback using Github Issues if you encounter problems!

· 5 min read

This post gives you a general idea about the TPC-H benchmark and explains how to run a TPC-H benchmark on Databend.

What's TPC-H?

TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. This benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.

The TPC-H benchmark simulates a system for online sales of parts and components and defines eight tables in total. The structure, data volume, and mutual relationship of each table are shown in the figure below:

The benchmark workload consists of twenty-two decision support queries that must be executed as part of the TPC-H benchmark. Each TPC-H query asks a business question and includes the corresponding query to answer the question. More information about TPC-H can be found at https://www.tpc.org/tpch/.

Running TPC-H Benchmark on Databend

This section describes the steps to run the TPC-H benchmark on Databend and provides the related scripts.

Step 1: Generate test data with TPC-H Docker

The following code pulls a docker image and allocates the data in the path where you are running the TPC-H benchmark.

docker pull ghcr.io/databloom-ai/tpch-docker:main
docker run -it -v "$(pwd)":/data ghcr.io/databloom-ai/tpch-docker:main dbgen -vf -s 1

TPC-H comes with various data set sizes to test different scale factors. You can use the -s option to set scale factor in the command (for example, the code above sets the scale factor to 1). For more information about the command, see https://github.com/databloom-ai/TPCH-Docker.

SF (Gigabytes)Size
1Consists of the base row size (several million elements).
10Consists of the base row size x 10.
100Consists of the base row size x 100 (several hundred million elements).
1000Consists of the base row size x 1000 (several billion elements).

Step 2: Create database and tables

CREATE DATABASE IF NOT EXISTS tpch;

USE tpch;

CREATE TABLE IF NOT EXISTS nation
(
n_nationkey INTEGER NOT NULL,
n_name VARCHAR NOT NULL,
n_regionkey INT NOT NULL,
n_comment VARCHAR
);

CREATE TABLE IF NOT EXISTS region
(
r_regionkey INT NOT NULL,
r_name VARCHAR NOT NULL,
r_comment VARCHAR
);

CREATE TABLE IF NOT EXISTS part
(
p_partkey INT NOT NULL,
p_name VARCHAR NOT NULL,
p_mfgr VARCHAR NOT NULL,
p_brand VARCHAR NOT NULL,
p_type VARCHAR NOT NULL,
p_size INT NOT NULL,
p_container VARCHAR NOT NULL,
p_retailprice FLOAT NOT NULL,
p_comment VARCHAR NOT NULL
);

CREATE TABLE IF NOT EXISTS supplier
(
s_suppkey INT NOT NULL,
s_name VARCHAR NOT NULL,
s_address VARCHAR NOT NULL,
s_nationkey INT NOT NULL,
s_phone VARCHAR NOT NULL,
s_acctbal FLOAT NOT NULL,
s_comment VARCHAR NOT NULL
);

CREATE TABLE IF NOT EXISTS partsupp
(
ps_partkey INT NOT NULL,
ps_suppkey INT NOT NULL,
ps_availqty INT NOT NULL,
ps_supplycost FLOAT NOT NULL,
ps_comment VARCHAR NOT NULL
);

CREATE TABLE IF NOT EXISTS customer
(
c_custkey INT NOT NULL,
c_name VARCHAR NOT NULL,
c_address VARCHAR NOT NULL,
c_nationkey INT NOT NULL,
c_phone VARCHAR NOT NULL,
c_acctbal FLOAT NOT NULL,
c_mktsegment VARCHAR NOT NULL,
c_comment VARCHAR NOT NULL
);

CREATE TABLE IF NOT EXISTS orders
(
o_orderkey INT NOT NULL,
o_custkey INT NOT NULL,
o_orderstatus VARCHAR NOT NULL,
o_totalprice FLOAT NOT NULL,
o_orderdate DATE NOT NULL,
o_orderpriority VARCHAR NOT NULL,
o_clerk VARCHAR NOT NULL,
o_shippriority INT NOT NULL,
o_comment VARCHAR NOT NULL
);

CREATE TABLE IF NOT EXISTS lineitem
(
l_orderkey INT NOT NULL,
l_partkey INT NOT NULL,
l_suppkey INT NOT NULL,
l_linenumber INT NOT NULL,
l_quantity FLOAT NOT NULL,
l_extendedprice FLOAT NOT NULL,
l_discount FLOAT NOT NULL,
l_tax FLOAT NOT NULL,
l_returnflag VARCHAR NOT NULL,
l_linestatus VARCHAR NOT NULL,
l_shipdate DATE NOT NULL,
l_commitdate DATE NOT NULL,
l_receiptdate DATE NOT NULL,
l_shipinstruct VARCHAR NOT NULL,
l_shipmode VARCHAR NOT NULL,
l_comment VARCHAR NOT NULL
);

Step 3: Load test data to Databend

This step uses the HTTP API v1/streaming_load to load the test data to Databend. More information about this API can be found at https://databend.rs/doc/load-data/local.

The code below connects to Databend using the Root user. Please note that the root user only works when you access Databend from localhost. You will need to create new users and grant proper privileges first to connect to Databend remotely.

#!/bin/bash

for t in customer lineitem nation orders partsupp part region supplier
do
echo "$t"
curl -XPUT 'http://root:@127.0.0.1:8000/v1/streaming_load' -H 'insert_sql: insert into tpch.'$t' format CSV' -H 'format_field_delimiter:|' -H 'format_record_delimiter: \n' -F 'upload=@"./'$t'.tbl"'
done

Step 4: Run TPC-H queries

All the definitions of the TPC-H queries can be found at https://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v3.0.1.pdf. You can simply run them by copying and pasting the scripts to Databend.

The Databend team ran the TPC-H benchmark around two months ago and uploaded their queries and results to GitHub. You can find them at https://github.com/datafuselabs/databend/tree/main/tests/suites/0_stateless/13_tpch. Please note that Databend now uses the new planner by default, so you DO NOT need to enable it any more before running the queries.

· One min read

image

Data is important. If you're moving your business to the cloud to take advantage of the cloud-native features, the first thing you need to consider might be how to load your data to the cloud.

There are many ways to load data into Databend. You can use the command-line interface (CLI), API, or the mysqldump client utility, depending on where your data is stored.

image

The powerful COPY INTO commands allow you to load data from:

  • Files in an S3 bucket or a blob storage container.
  • Staged files (internal or external).
  • Files in a remote server.

If you have local data files to load, use the HTTP API v1/streaming_load to upload them to Databend.

Please note that Databend supports loading data from files in these formats:

  • CSV
  • JSON
  • Parquet

If you're coming from MySQL, Databend can also use a dump file (*.sql) created by the mysqldump client utility to load your data from MySQL.

Detailed explanations about loading data in different scenarios can be found at https://databend.rs/doc/load-data.

· 5 min read
tip

This post was originally published by Anne-Laure Civeyrac on https://mergify.com.

image

Every day, major projects use Mergify to automate their GitHub workflow. Whether they have a core team of 3 or 50 people, the one thing they all have in common is that the project leads are willing to let their developers focus on what’s really important—code. So we decided to meet with some of them to get to know more about the challenges they face and discover how Mergify helps their teams be more efficient when it comes to pull requests. This time, we sat down (virtually) with Xuanwo, an infrastructure engineer who oversees automation, distributed systems and storage for the Databend project.

image

Xuanwo

Please could you give us a brief outline of the Databend project.

Of course! So Databend is a modern elasticity and performance cloud data warehouse. It uses the latest techniques in vectorized query processing to allow people to do blazing-fast data analytics on object storage platforms like S3, Azure Blob, or MinIO. It was mainly inspired by ClickHouse and Snowflake and focuses on online analytical processing.

How many people are currently working on the project?

Right now, we have more than 100 contributors on Databend, with about 30 of them contributing continuously.

What’s your GitHub workflow on this project?

Databend is a very new project and it doesn’t have a stable release yet, so our pull request [PR] workflow is quite simple. All our contributions go through GitHub PRs. For every PR, we use GitHub Actions as the CI, where we run cargo check, cargo fmt, cargo clippy, and all our test cases. If all the checks pass, we merge the PR. And once on the main branch, the PR runs production CI, which goes through all test cases with the release build. We then release a nightly version daily, uploading our release builds to GitHub releases and the Docker Hub Registry.

image

How many people need to approve PRs?

We need two approvals for a PR to be merged.

What are the main challenges with this workflow?

Mainly the PR merge speed! Our developers are coding in Rust, which is known for not being very good with compilation speed. We typically need about 30 minutes to finish all our checks, and most of this time is consumed by rustc, the compiler for Rust. To make Rust compile faster, we set up our own self-hosted GitHub Actions runners, which have a very high performance. Thanks to this change, our PR merge time was reduced from 60 to 30 minutes.

What made the project team start using Mergify in the first place?

Before coming across Mergify, we enabled the option on GitHub that requires code to be updated before merging. But it added a lot of work for our maintainers, who had to merge the main branch repeatedly.

image

So to make their lives easier, we implemented Mergify to update the branch automatically and merge PRs after all tests have passed.

Which Mergify features do you use the most?

The automatic merge! And with the help of the Mergify team, we enabled the merge queue feature as well. So now, we can merge multiple PRs simultaneously with only one CI check. This helped us significantly reduce the waiting time for the CI.  

What is your favorite Mergify feature and why?

Oh, I love PR actions. Although I can implement the same features with GitHub Actions, I find Mergify’s PR actions more simple and exciting. For example, last week, we introduced a new requirement that every PR must be semantic—we want all PRs to contain a valid title starting with a type like “fix”, “feat”, or “refactor”.

image

Actions 

With the help of Mergify, I only needed to create some rules, such as adding corresponding labels if the PR title starts with “fix”, comment in the PR with a help message if the PR title doesn’t fulfill the requirements, and add post checks so that we can mark the PR as not mergeable.

This feature is very cool, and I would love to have it on issues too!

What has been the most significant impact of using Mergify on your team’s performance so far?

Our teams don’t need to worry about merges anymore! We can start jobs without having to wait on the PRs.  

What would be your n°1 tip for someone new to Mergify?

Don’t try to migrate all your workloads to Mergify in one go. Migrating things one by one and progressively will make your lives easier.

If you had time to contribute to the Mergify project, what would your contribution be about?

I used to contribute to Mergify regarding a small documentation typo. But I am not comfortable contributing to more complex issues because Mergify uses Python, a programming language I’m unfamiliar with. But if I could contribute to one feature, it would be about commands. Mergify only supports a small set of commands. Maybe we could define new commands in PR rules and allow users to call them with Mergify bots. This feature would be exciting to me!

· 5 min read

To support complex SQL queries and improve user experience, a large-scale refactoring work for Databend's SQL planner was started several months ago. At present, the refactoring is coming to an end. You can now modify the Session settings of Databend as follows to enable the new planner for early access:

image

Feature Highlights

A more friendly query experience

Data analysts and developers usually get various errors when coding SQL queries, and troubleshooting can be a nightmare when the queries are complex. I hate MySQL's error prompts because I have coded a query with dozens of JOIN clauses.

The new planner now includes some passes for strict semantic checking so that most errors can be intercepted during the compilation. A new error prompt algorithm was also introduced to help users locate the errors. When there is invalid syntax in your SQL query (for example, misspelled keywords or missing clauses), you will receive an error message that is more instructive.

image

If your SQL query has a semantic error (for example, you reference a column that is ambiguous, or a column does not exist at all), Databend can help you locate it.

image

You can also get a better experience when coding complex queries:

image

Support for JOIN queries and correlated subqueries

The new SQL planner supports JOIN queries (INNER JOIN, OUTER JOIN, CROSS JOIN) and correlated subqueries, and provides a Hash Join algorithm to execute JOIN queries.

For more information about how to use JOIN in Databend, go to https://databend.rs/doc/reference/sql/query-syntax/dml-join

JOIN is a very important part of the OLAP query. In traditional star and snowflake schemas, we join dimensional tables with fact tables through the JOIN query to generate the resulting report.

TPC-H Benchmark is a set of OLAP query benchmarks developed by the TPC committee to evaluate the OLAP capabilities of database systems. It contains the following eight tables:

  • Lineitem: Holds product information.

  • Orders: Holds order information.

  • Customer: Holds customer information.

  • Part: Holds parts information.

  • Supplier: Holds supplier information.

  • Partsupp: Parts-Supplier Relationship Table

  • Nation: Holds nation information.

  • Region: Holds region information.

    TPC-H includes 22 complex queries, corresponding to different business needs. The new SQL planner now supports the Q9 query that calculates the profit amount for a specified year and region using a large number of JOIN calculations:

    image

    Correlated subqueries are also an essential part of SQL for coding complex queries. The Q4 query of TPC-H shows the order delivery status of various priority levels over a period of time and uses a correlated subquery with the EXISTS clause to filter overdue orders:

    image

    Brand New Architecture

    We redesigned the process of SQL parsing for the new SQL planner to support more complex semantic analysis and SQL optimization.

After the client sends a SQL statement to the databend-query server, the components in the new SQL planner process the SQL statement in the order shown in the flowchart below before returning the query result to the client:

image

The Parser starts to parse a SQL query after receiving it. If a syntax error is found during the parsing, the error information will be directly returned to the client; If the parsing is successful, an AST (Abstract Syntax Tree) for the query will be constructed.

Parser

To provide more powerful syntax analysis functions and a better development experience, we have developed a DSL (Domain Specific Language) nom-rule based on the nom Parser combinator and rewritten SQL Parser based on this framework.

With this framework, we can easily define the syntax for a statement. Taking the CREATE TABLE statement as an example, we can use DSL to describe it as follows:

image

The elegant syntax brings more fun to the work of coding a parser. Try it out if you’re interested.

Binder

After the AST is successfully parsed by the Parser, we will semantically analyze it through Binder and generate an initial logical plan. During this process, we perform different types of semantic analysis:

  • Name resolution: Check the validity of the variables referenced in the SQL query by querying the relevant table and column object information in the Databend Catalog and bind the valid variables to their corresponding objects for subsequent analysis.

  • Type check: Check the validity of the expression according to the information obtained in the name resolution, and find a proper return type for the expression.

  • Subquery unnesting: Extract the subquery from the expression and translate it into relational algebra.

  • Grouping check: For queries with aggregate calculations, check whether non-aggregate columns are referenced.

With semantic analysis, we can eliminate most semantic errors and return them to the user during the compilation to provide the best troubleshooting experience.

Optimizer

After getting the initial logical plan, the optimizer will rewrite and optimize it and, finally, generate an executable physical plan.

The new planner introduced a set of Transformer Rule-based optimizer frameworks (Volcano/Cascades). An independent rule can be implemented by defining a relational algebra sub-tree structure pattern with related transform logic.

Take Predicate Push Down as a simple example:

image

We only need to define the pattern of the input plan:

image

And then implement a conversion function:

image

Interpreter

After the physical plan is generated by the Optimizer, we will translate it into an executable pipeline and hand it over to Databend's processor execution framework for calculation.

What's Next

Building a SQL planner from the ground up is a very challenging job, but the redesign and development let us find the most suitable architecture and functionalities for the system. In the future, we will continue to improve and consolidate the new SQL planner on these functions:

  • Cost-based Optimization (CBO)

  • Distributed query optimization

  • More optimization rules

Currently, we’re in the middle of migrating to the new SQL planner. We will release an announcement when the migration is complete (around July 2022). Stay tuned.

· 7 min read

Databend, developed with Rust, is a new open-source data warehouse architected toward the cloud. It is committed to providing fast elastic expansion capabilities and a pay-as-you-go user experience. GitHub:https://github.com/datafuselabs/databend

Introduction

This post explains the Databend base: Fuse Engine, a powerful columnar storage engine. The engine was designed by the Databend community with the following principles: Powerful performance, simple architecture, and high reliability.

Before we start, check out a challenging task that Databend completed: With the Fuse Engine deployed on AWS S3, a transaction wrote 22.89 TB of raw data in around one and a half hour.

mysql> INSERT INTO ontime_new SELECT * FROM ontime_new;
Query OK, 0 rows affected (1 hour 34 min 36.82 sec)
Read 31619274180 rows, 22.89 TB in 5675.207 sec., 5.57 million rows/sec., 4.03 GB/sec.

Meanwhile, the following conditions were met as well:

  • Distributed transactions: Multiple computing nodes can read and write the same data simultaneously (This is the first problem that an architecture that separates storage from compute must solve).
  • Snapshot isolation: Different versions of data do not affect each other so you can do Zero-Copy Cloning for tables.
  • Retrospective ability: You are allowed to switch to any version of the data, so you can recover with the Time Travel feature.
  • Data merging: A new version of data can be generated after merging.
  • Simple and robust: Data relationships are described using files, and you can recover entire data system based on these files.

From above, you will find that Fuse Engine is "Git-inspired". Before introducing the design of Fuse Engine, let's take a look at how the bottom layer of Git works.

How Git Works

Git implements data version control (including branch, commit, checkout, and merge) in a distributed environment. Based on Git semantics, it is possible to create a distributed storage engine. There are also some products built on Git-like on the market, such as Nessie - Transactional Catalog for Data Lakes and lakeFS.

To better explore the underlying working mechanism of Git, we use Git semantics to complete a series of "data" operations from the perspective of the database.

  1. Prepare a file named cloud.txt with the content:
2022/05/06, Databend, Cloud
  1. Commit the file cloud.txt to Git.
git commit -m "Add cloud.txt"
  1. Git generates a snapshot (Commit ID: 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71).
git log 
commit 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71 (HEAD)
Author: BohuTANG <overred.shuttler@gmail.com>
Date: Fri May 6 16:44:21 2022 +0800

Add cloud.txt
  1. Prepare another file named warehouse.txt.
2022/05/07, Databend, Warehouse
  1. Commit the file warehouse.txt to Git.
git commit -m "Add warehouse.txt"
  1. Git generates another snapshot (Commit ID: 15af34e4d16082034e1faeaddd0332b3836f1424).
commit 15af34e4d16082034e1faeaddd0332b3836f1424 (HEAD)
Author: BohuTANG <overred.shuttler@gmail.com>
Date: Fri May 6 17:41:43 2022 +0800

Add warehouse.txt

commit 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71
Author: BohuTANG <overred.shuttler@gmail.com>
Date: Fri May 6 16:44:21 2022 +0800

Add cloud.txt

Git now keeps two versions of the data:

ID 15af34e4d16082034e1faeaddd0332b3836f1424,Version2
ID 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71,Version1

We can switch between versions by the Commit ID, which implements the functions of Time Travel and Table Zero-Copy. How does Git make it possible in the bottom layer? It's not rocket science. Git introduces these types of object files to describe the relationship:

  • Commit: Describes tree object information
  • Tree: Describes blob object information
  • Blob: Describes file information

image

HEAD File

First, we need to know the HEAD pointer:

cat .git/HEAD
15af34e4d16082034e1faeaddd0332b3836f1424

Commit File

The Commit file records metadata related to the commit, such as the current tree and parent, as well as the committer, etc.

File path:

.git/objects/15/af34e4d16082034e1faeaddd0332b3836f1424

File content:

git cat-file -p 15af34e4d16082034e1faeaddd0332b3836f1424

tree 576c63e580846fa6df2337c1f074c8d840e0b70a
parent 7d972c7ba9213c2a2b15422d4f31a8cbc9815f71
author BohuTANG <overred.shuttler@gmail.com> 1651830103 +0800
committer BohuTANG <overred.shuttler@gmail.com> 1651830103 +0800

Add warehouse.txt

Tree File

The Tree file records all the files of the current version.

File path:

.git/objects/57/6c63e580846fa6df2337c1f074c8d840e0b70a

File content:

git cat-file -p 576c63e580846fa6df2337c1f074c8d840e0b70a

100644 blob 688de5069f9e873c7e7bd15aa67c6c33e0594dde cloud.txt
100644 blob bdea812b9602ed3c6662a2231b3f1e7b52dc1ccb warehouse.txt

Blob File

The Blob files are raw data files. You can veiw the file content using git cat-file (if you use Git to manage code, blobs are the code files).

git cat-file -p 688de5069f9e873c7e7bd15aa67c6c33e0594dde
2022/05/06, Databend, Cloud

git cat-file -p bdea812b9602ed3c6662a2231b3f1e7b52dc1ccb
2022/05/07, Databend, Warehouse

Fuse Engine

Databend's Fuse Engine was designed in a way similar to Git. It introduces three description files:

  • Snapshot: Describes segment object information.
  • Segment: Describes block object information.
  • Block: Describes parquet file information.

image

Let's repeat the operations we just did with Git in Fuse Engine.

  1. Create a table.
CREATE TABLE git(file VARCHAR, content VARCHAR);
  1. Write cloud.txt to Fuse Engine.

     INSERT INTO git VALUES('cloud.txt', '2022/05/06, Databend, Cloud');
  2. Fuse Engine generates a snapshot (Snapshot ID: 6450690b09c449939a83268c49c12bb2).

    CALL system$fuse_snapshot('default', 'git');
    *************************** 1. row ***************************
    snapshot_id: 6450690b09c449939a83268c49c12bb2
    snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
    format_version: 1
    previous_snapshot_id: NULL
    segment_count: 1
    block_count: 1
    row_count: 1
    bytes_uncompressed: 68
    bytes_compressed: 351

  3. Write warehouse.txt to Fuse Engine.

    INSERT INTO git VALUES('warehouse.txt', '2022/05/07, Databend, Warehouse');
  4. Fuse Engine generates another snapshot (Snapshot ID efe2687fd1fc48f8b414b5df2cec1e19) that is linked to the previous one (Snapshot ID: 6450690b09c449939a83268c49c12bb2).

    CALL system$fuse_snapshot('default', 'git');
    *************************** 1. row ***************************
    snapshot_id: efe2687fd1fc48f8b414b5df2cec1e19
    snapshot_location: 53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json
    format_version: 1
    previous_snapshot_id: 6450690b09c449939a83268c49c12bb2
    segment_count: 2
    block_count: 2
    row_count: 2
    *************************** 2. row ***************************
    snapshot_id: 6450690b09c449939a83268c49c12bb2
    snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
    format_version: 1
    previous_snapshot_id: NULL
    segment_count: 1
    block_count: 1
    row_count: 1

    Fuse Engine now keeps two versions of the data:

    ID efe2687fd1fc48f8b414b5df2cec1e19,Version2
    ID 6450690b09c449939a83268c49c12bb2,Version1

    That's very similar to Git. Right?

Git needs a HEAD as an entry. So does Fuse Engine. Check the HEAD of Fuse Engine:

SHOW CREATE TABLE git\G;
*************************** 1. row ***************************
Table: git
Create Table: CREATE TABLE `git` (
`file` VARCHAR,
`content` VARCHAR
) ENGINE=FUSE SNAPSHOT_LOCATION='53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json'

SNAPSHOT_LOCATION is the HEAD, which by default points to the latest snapshot efe2687fd1fc48f8b414b5df2cec1e19, then how do we switch to the snapshot data whose ID is 6450690b09c449939a83268c49c12bb2? First, check the snapshot information of the current table:

CALL system$fuse_snapshot('default', 'git')\G;
*************************** 1. row ***************************
snapshot_id: efe2687fd1fc48f8b414b5df2cec1e19
snapshot_location: 53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json
format_version: 1
previous_snapshot_id: 6450690b09c449939a83268c49c12bb2
segment_count: 2
block_count: 2
row_count: 2
*************************** 2. row ***************************
snapshot_id: 6450690b09c449939a83268c49c12bb2
snapshot_location: 53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json
format_version: 1
previous_snapshot_id: NULL
segment_count: 1
block_count: 1
row_count: 1

Then create a new table (git_v1) and point SNAPSHOT_LOCATION to the snapshot file you need:

CREATE TABLE git_v1(`file` VARCHAR, `content` VARCHAR) SNAPSHOT_LOCATION='53/133/_ss/6450690b09c449939a83268c49c12bb2_v1.json';

SELECT * FROM git_v1;
+-----------+-----------------------------+
| file | content |
+-----------+-----------------------------+
| cloud.txt | 2022/05/06, Databend, Cloud |
+-----------+-----------------------------+

Snapshot File

Stores the segment information.

File path:

53/133/_ss/efe2687fd1fc48f8b414b5df2cec1e19_v1.json

File content:

{
"format_version":1,
"snapshot_id":"efe2687f-d1fc-48f8-b414-b5df2cec1e19",
"prev_snapshot_id":[
"6450690b-09c4-4993-9a83-268c49c12bb2",
1
],

"segments":[
[
"53/133/_sg/df56e911eb26446b9f8fac5acc65a580_v1.json"
],
[
"53/133/_sg/d0bff902b98846469480b23c2a8f93d7_v1.json"
]
]
... ...
}

Segment File

Stores block information.

File path:

 53/133/_sg/df56e911eb26446b9f8fac5acc65a580_v1.json

File content:

{
"format_version":1,
"blocks":[
{
"row_count":1,
"block_size":76,
"file_size":360,
"location":[
"53/133/_b/ba4a60013e27479e856f739aefeadfaf_v0.parquet",
0
],
"compression":"Lz4Raw"
}
]
... ...
}

Block File

The underlying data of Fuse Engine uses Parquet format, and each file is composed of multiple blocks.

Summary

In the early design period (October 2021) of Databend's Fuse Engine, the requirements were very clear, but the solution selection didn't go smoothly. At that time, the Databend community investigated a large number of Table Format solutions (such as Iceberg) on the market. The challenge was to choose between using an existing solution and building a new one. Finally, we decided to develop a simple and suitable Storage Engine that uses the Parquet standard as the storage format. Fuse Engine stores the Parquet Footer separately to reduce unnecessary Seek operations, and introduces a more flexible indexing mechanism, for example, operations such as Aggregation and Join can have their own indexes for acceleration.

Feel free to deploy Fuse Engine with your object storage to have a different experience on the big data analysis: https://databend.rs/doc/deploy

Databend on GitHub: https://github.com/datafuselabs/databend

· 3 min read

Deploying Databend on Your Laptop in Minutes

Deploying a data warehouse sounds like a big job to you? Definitely NOT. Databend can be deployed to your laptop and uses the local file system as storage. You can complete the deployment in a few minutes even if you're new to Databend. Now let's get started!

tip

Databend requires a scalabe storage (for example, object storage) to work. This blog uses local file system to provide you a hands-on experience. Never use a local file system as storage for production purposes.

STEP 1. Downloading Databend

a. Create a folder named databend in the directory /usr/local. Then create the following subfolders in the folder databend:

  • bin
  • data
  • etc
  • logs

b. Download and extract the latest Databend package for your platform from https://github.com/datafuselabs/databend/releases.

c. Move the extracted files databend-meta and databend-query to the folder /usr/local/databend/bin.

STEP 2. Deploying a Standalone databend-meta

a. Create a file named databend-meta.toml in the folder /usr/local/databend/etc with the following content:

dir = "metadata/_logs"
admin_api_address = "127.0.0.1:8101"
grpc_api_address = "127.0.0.1:9101"

[raft_config]
id = 1
single = true
raft_dir = "metadata/datas"

b. Open a terminal window and navigate to the folder /usr/local/databend/bin.

c. Run the following command to start databend-meta:

./databend-meta -c ../etc/databend-meta.toml > meta.log 2>&1 &

d. Run the following command to check if databend-meta was started successfully:

curl -I  http://127.0.0.1:8101/v1/health

STEP 3. Deploying a Standalone databend-query

a. Create a file named databend-query.toml in the folder /usr/local/databend/etc with the following content:

[log]
level = "INFO"
dir = "benddata/_logs"

[query]
# For admin RESET API.
admin_api_address = "127.0.0.1:8001"

# Metrics.
metric_api_address = "127.0.0.1:7071"

# Cluster flight RPC.
flight_api_address = "127.0.0.1:9091"

# Query MySQL Handler.
mysql_handler_host = "127.0.0.1"
mysql_handler_port = 3307


# Query HTTP Handler.
http_handler_host = "127.0.0.1"
http_handler_port = 8081

tenant_id = "tenant1"
cluster_id = "cluster1"

[meta]
address = "127.0.0.1:9101"
username = "root"
password = "root"

[storage]
# s3
type = "fs"

[storage.fs]
data_path = "benddata/datas"

b. Open a terminal window and navigate to the folder /usr/local/databend/bin.

c. Run the following command to start databend-meta:

./databend-query -c ../etc/databend-query.toml > query.log 2>&1 &

d. Run the following command to check if databend-meta was started successfully:

curl -I  http://127.0.0.1:8001/v1/health

There you go! You have successfully deployed Databend on your computer. If you have a SQL client on your computer, try the steps below to verify the deployment:

a. Create a connection to 127.0.0.1 from your SQL client. In the connection, set the port to 3307, and set the username to root.

b. Run the following commands to check if the query is successful:

CREATE TABLE t1(a int);

INSERT INTO t1 VALUES(1), (2);

SELECT * FROM t1;