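adds partition tool (Flink + Iceberg: Building a Real-Time Data Warehouse for All Scenarios)

Author: anonymous | Source: 4889软件园 | Date: 2023-01-29 20:26:43

adds partition tool article list:

1. Flink + Iceberg: Building a Real-Time Data Warehouse for All Scenarios
2. One Vanderbilt, New York
3. Source Code Analysis of the Apache Spark Delta Lake Transaction Log
4. A 770 m² Modern Light-Luxury Villa, Refined and Elegant
5. Fedora Announces a New Partition Manager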

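Flink + Iceberg: Building a Real-Time Data Warehouse for All Scenarios

Compiled by: 路培杰 (Flink community volunteer)

Author: 苏舒 @ Tencent

Source: WeChat official account "Flink 中文社区"

Original: https://mp.weixin.qq.com/s?__biz=MzU3Mzg4OTMyNQ==&mid=2247490749&idx=1&sn=7d9a7c1f4fa2a4b02458929f953bf3f1

Abstract: Apache Flink is one of the most popular unified stream and batch compute engines in big data today. The data lake is a newer architecture suited to the cloud era, and Iceberg, Hudi and Delta are its representative solutions. Iceberg currently supports writing data into Iceberg tables from Flink through the DataStream API or the Table API, and provides integration with Apache Flink 1.11.x.

This article is based on a talk by 苏舒, a senior engineer in Tencent's data platform department. It describes how Tencent's big data team builds a real-time data warehouse on Apache Flink and Apache Iceberg, and covers the following topics:

Background and pain points

Introduction to the Apache Iceberg data lake

Building a real-time data warehouse with Flink + Iceberg

Future plans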

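I. Background and Pain Points

Figure 1 shows some of the internal applications we currently support. Two of them, Mini Programs (小程序) and Channels (视频号), generate data at the PB or even EB scale every day or every month.

Figure 1

When the teams behind these applications build their own data analysis platforms, they often adopt an architecture like the one in Figure 2, which will be familiar to most readers.

1. Data platform architecture

Business teams, such as those behind Tencent Kandian (腾讯看点) or Channels, typically collect front-end event tracking data and application service logs. This data reaches the data warehouse or a real-time compute engine through message middleware (Kafka/RocketMQ) or data synchronization services (Flume/NiFi/DataX).

The warehouse stack contains many big data components: storage such as Hive/HBase/HDFS/S3, and compute engines such as MapReduce, Spark and Flink. Users assemble a storage and processing platform to fit their needs. After the data has been processed and analyzed there, the results are saved to relational or non-relational databases that support fast queries, such as MySQL and Elasticsearch. The application layer can then build BI reports and user profiles on this data, or run interactive queries with an OLAP tool such as Presto.

Figure 2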

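2. Pain points of the Lambda architecture

Throughout this process we often use an offline scheduling system to run Spark analysis jobs periodically (T+1 or every few hours) for data input, output or ETL. Offline processing inevitably introduces latency: whether in ingestion or in intermediate analysis, the delay is large, from hours to days. In other scenarios we also build a separate real-time pipeline to meet latency requirements, for example a stream processing system built on Flink + Kafka.

Overall, the warehouse architecture contains a great many components, which significantly increases its complexity and operations cost.

The figure below shows the Lambda architecture that many companies used, or still use. It splits the warehouse into an offline layer and a real-time layer, with independent batch and stream processing pipelines. The same data is processed at least twice, and the same business logic has to be developed twice, once for each pipeline. Since the Lambda architecture is well known, I will focus on the pain points we ran into when building a warehouse with it.

Figure 3

For example, when computing user metrics in real time, such as the current PV and UV, we put the data into the real-time layer and the metric values are displayed as they are computed. To see the user growth trend, however, we also need to compute the past day's data. That requires a scheduled batch job, for example a Spark job started on the scheduling system at two or three in the morning that reprocesses all of the day's data.

Because the two pipelines run at different times over the same data, their results can disagree. When one or a few records are updated, the whole offline analysis pipeline has to be rerun, so updates are expensive. In addition, two compute platforms, offline and real-time, must be maintained, so development and operations costs for both layers are high.

The Kappa architecture emerged to solve these problems with Lambda; it should also be familiar to most readers.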

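3. Pain points of the Kappa architecture

As Figure 4 shows, the Kappa architecture uses message queues in the middle and Flink to connect the whole pipeline. It removes the high operations and development costs that Lambda incurs from using different engines for the offline and real-time layers. But Kappa has its own pain points.

First, Kappa is used to build near-real-time business scenarios. But if you want to run simple OLAP analysis on an intermediate warehouse layer such as ODS, or process the data further, for example by writing it to Kafka in the DWD layer, you need to attach another Flink job. Importing data from the DWD-layer Kafka into ClickHouse, Elasticsearch, MySQL or Hive for further analysis adds even more complexity to the architecture.

Second, Kappa depends heavily on message queues. The correctness of computation over a message queue depends strictly on the order of the upstream data, and the more queues are chained together, the more likely data is to arrive out of order. ODS-layer data is generally treated as exact, but it may be reordered when it is sent to the next Kafka topic, and again when DWD data is sent on to DWS, so the inconsistency can become severe.

Third, Kafka is a sequential storage system, and OLAP optimizations such as predicate pushdown cannot be applied directly on top of sequential storage; implementing them on Kafka is difficult.

Is there an architecture that meets both real-time and offline computing requirements, reduces development and operations cost, and also solves the pain points of a Kappa architecture built on message queues? The answer is yes, as the rest of this article explains in detail.

Figure 4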

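4. Summary of pain points

■ Traditional T+1 jobs

Delays in massive, TB-scale T+1 jobs make the delivery time of downstream data unstable.

Recovering from a failure by retrying a job is expensive.

The data architecture struggles with deduplication and exactly-once semantics.

The architecture is complex: it involves coordinating multiple systems, and job dependencies are built through the scheduling system.

■ Lambda architecture

Two engines, one real-time and one offline, have to be maintained at once, so operations cost is high.

The two platforms need two codebases on different frameworks with the same business logic, so development cost is high.

Data flows through two different pipelines, which easily leads to inconsistency.

Data updates are expensive because the pipeline has to be rerun.

■ Kappa architecture

It places high demands on message queue storage, and a message queue's ability to replay history is weaker than that of offline storage.

Message queues retain data only for a limited time, and OLAP engines currently cannot analyze data in a message queue directly.

Real-time computation that relies on message queues end to end can produce incorrect results because of data ordering.

Figure 5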

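5. Requirements for a real-time data warehouse

Is there a storage technology that supports efficient data backtracking and data updates, supports both batch and streaming reads and writes, and can ingest data with minute-level to second-level latency?

This is the pressing need in building a real-time data warehouse (Figure 6). In practice, the Kappa architecture can be upgraded to solve some of the problems described above. The rest of this article focuses on a data lake technology that is currently popular: Iceberg.

Figure 6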

II. Introduction to the Apache Iceberg Data Lake

1. What is Iceberg

First, what is Iceberg? The official website describes it as follows:

Apache Iceberg is an open table format for huge analytic datasets. Iceberg adds tables to Presto and Spark that use a high-performance format that works just like a SQL table.

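Iceberg is officially defined as a table format. Put simply, it is a middle layer between the compute layer (Flink, Spark) and the storage layer (ORC, Parquet, Avro). Data is written into Iceberg with Flink or Spark, and the table can then be read by other engines such as Spark, Flink or Presto.

Figure 7

2. Iceberg's table format

Iceberg is designed for analyzing massive datasets and is defined as a table format, which sits between the compute layer and the storage layer.

A table format manages the files in the storage system underneath it and exposes interfaces to the compute layer above it. Files in a storage system are always organized in some way. For example, a Hive table on HDFS carries information such as its partitions, data storage format, compression format and HDFS directory. All of this is stored in the Metastore, so the Metastore can itself be regarded as a file organization format.

A good file organization format such as Iceberg lets the compute layer access files on disk more efficiently for operations such as list, rename or lookup.

3. Summary of Iceberg's capabilities

Iceberg currently supports three file formats: Parquet, Avro and ORC. As Figure 7 shows, files on HDFS or S3 can be row-oriented or column-oriented; their roles are described in detail later. Iceberg's own capabilities are summarized below (Figure 8). They matter a great deal for building a real-time data warehouse on Iceberg.

Figure 8

Snapshot-based read/write isolation and backtracking

Unified streaming and batch writes and reads

No hard binding to a particular compute or storage engine

ACID semantics and multi-versioned data

Table, schema and partition evolution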

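4. Iceberg's file organization

The figure below shows Iceberg's overall file organization. From top to bottom:

The top layer is the snapshot. A snapshot is the basic unit of data a user can read: every read of all the data in a table reads the data under one snapshot.

Next come the manifests. A snapshot contains multiple manifests; in the figure, snapshot-0 has two manifests and snapshot-1 has three. Each manifest manages one or more data files.

Third are the data files. A manifest file stores metadata about the data: if you open one, you will see that it is essentially a list of data file paths, one per line.

The figure also shows that snapshot-1 contains the data of snapshot-0, while the data written at the time of snapshot-1 is only manifest2. This property is what makes incremental reads possible later on.

Figure 9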

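5. Iceberg's read and write process

■ Apache Iceberg reads and writes

First, suppose a write to snapshot-1 is in progress. Snapshot-1 is drawn with a dashed outline, meaning that no commit has happened yet. At this point snapshot-1 cannot be read, because readers only see snapshots that have been committed. It becomes readable only after the commit. The same applies to snapshot-2 and snapshot-3.

An important capability of Iceberg is the separation of reads and writes. Writing snapshot-4 does not affect reads of snapshot-2 and snapshot-3 at all. This is one of the key capabilities for building a real-time data warehouse.

Figure 10

Reads can likewise be concurrent: the data of s1, s2 and s3 can be read at the same time, which makes it possible to go back and read snapshot-2 or snapshot-3. Once snapshot-4 has been written, a commit takes place; snapshot-4 becomes solid in the figure and can now be read. The current snapshot pointer also moves to s4. By default, a read of a table reads the snapshot that the current snapshot pointer refers to, and this does not affect reads of earlier snapshots.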

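■ Apache Iceberg incremental reads

Next, incremental reads. Iceberg reads can only be based on a committed snapshot, snapshot-1 in this case. Then snapshot-2 appears. Each snapshot contains all the data of the snapshots before it, so if every read were a full read, reading would be very expensive for the compute engine across the whole pipeline.

If you only want the data added since the previous snapshot, you can use Iceberg's snapshot backtracking mechanism and read only the increment from snapshot-1 to snapshot-2, that is, the purple data in the figure.

Figure 11

Similarly, for s3 you can read only the yellow region, or read the increment from s1 to s3. We have implemented this incremental read internally on top of the streaming reader of the Flink source, and it is already running in production. This raises an important point: Iceberg already offers read/write separation, concurrent reads and incremental reads, so to integrate Iceberg with Flink, an Iceberg sink must be implemented.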
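
The incremental read described above can be sketched in a few lines of Scala. This is a simplified in-memory model, not the Iceberg API: the `Manifest` and `Snapshot` case classes and the `incrementalFiles` helper are hypothetical names introduced for illustration, and the sketch assumes append-only snapshots in which each snapshot lists every manifest of its predecessor plus the new ones.

```scala
object IncrementalReadSketch {
  // Hypothetical, simplified model: a manifest lists data file paths,
  // and a snapshot lists the manifests visible at that point in time.
  final case class Manifest(name: String, dataFiles: List[String])
  final case class Snapshot(id: Long, manifests: List[Manifest])

  // All data files visible in one snapshot (a full read).
  def allFiles(s: Snapshot): List[String] = s.manifests.flatMap(_.dataFiles)

  // Data files added after `from` and up to `to` (an incremental read).
  // Assumes append-only snapshots, so the increment is exactly the set of
  // manifests present in `to` but not in `from`.
  def incrementalFiles(from: Snapshot, to: Snapshot): List[String] = {
    val seen = from.manifests.map(_.name).toSet
    to.manifests.filterNot(m => seen.contains(m.name)).flatMap(_.dataFiles)
  }

  // Mirrors the figure: snapshot-0 has two manifests, snapshot-1 adds manifest2.
  val m0 = Manifest("manifest0", List("f0.parquet"))
  val m1 = Manifest("manifest1", List("f1.parquet"))
  val m2 = Manifest("manifest2", List("f2.parquet", "f3.parquet"))

  val s0 = Snapshot(0L, List(m0, m1))
  val s1 = Snapshot(1L, List(m0, m1, m2))
}
```

With these sample values, `incrementalFiles(s0, s1)` touches only the two files in manifest2, while `allFiles(s1)` has to visit all four.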

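■ The real-time small-file problem

The community has refactored the FlinkIcebergSink for Flink and added a global committer. Our architecture is consistent with the community's. The part inside the curved frame in the figure is the FlinkIcebergSink.

With multiple IcebergStreamWriter instances and a single IcebergFileCommitter, upstream data flows into the IcebergStreamWriters, and each writer simply writes data files.

Figure 12

When a writer has finished writing its current batch of small data files, it sends a message to the IcebergFileCommitter to say they are ready to commit. When the IcebergFileCommitter receives these messages, it submits all the data files at once in a single commit.

The commit itself only modifies metadata: the data is already on disk, and the commit merely flips it from invisible to visible. Iceberg therefore needs just one commit to make the data visible.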
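
The writer and committer split can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the FlinkIcebergSink implementation: `Table`, `TableSnapshot` and `Committer` are hypothetical names, and storage is reduced to a list of file paths.

```scala
object CommitSketch {
  // Hypothetical model of a table: readers only ever see the committed snapshot.
  final case class TableSnapshot(id: Long, dataFiles: Vector[String])

  final class Table {
    @volatile private var current = TableSnapshot(0L, Vector.empty)

    def read(): TableSnapshot = current

    // One metadata-only step: publish a new snapshot that references files
    // already written to storage. No data is copied or rewritten here.
    def commit(newFiles: Seq[String]): TableSnapshot = synchronized {
      current = TableSnapshot(current.id + 1, current.dataFiles ++ newFiles)
      current
    }
  }

  // Stand-in for the single committer: it buffers the file lists reported
  // by the parallel writers and publishes them together in one commit.
  final class Committer(table: Table) {
    private var pending = Vector.empty[String]

    def filesWritten(files: Seq[String]): Unit = {
      pending = pending ++ files
    }

    def commitPending(): TableSnapshot = {
      val snapshot = table.commit(pending)
      pending = Vector.empty
      snapshot
    }
  }
}
```

Until `commitPending()` runs, `read()` keeps returning the previous snapshot, which is the invisible-to-visible switch described above.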

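■ Real-time small-file compaction

Flink real-time jobs usually run in the cluster for a long time. To keep data fresh, the Iceberg commit interval is usually set to 30 seconds or one minute. With one commit per minute, a job that runs for a day produces 1,440 commits, and a job that runs for a month produces many more. The shorter the commit interval, the more snapshots are generated. A running streaming job therefore produces a large number of small files.

Unless this problem is solved, the Iceberg sink is not usable on the Flink engine. Internally we implemented a feature called the data compaction operator, which runs alongside the Flink sink. Each time the FlinkIcebergSink completes a commit, it sends a message downstream to FileScanTaskGen to tell it that a commit has finished.

Figure 13

FileScanTaskGen contains the logic that generates file compaction tasks according to user configuration or the characteristics of the current disk. What FileScanTaskGen sends to DataFileRewrite is the list of files to be merged that it has generated. Because merging files takes time, the work is distributed asynchronously to different rewrite task operators.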
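
The talk does not spell out how FileScanTaskGen chooses which files to merge. As a minimal sketch, assuming a simple greedy size-based rule, the planning step could look like the following. `DataFile`, `planRewrites` and both thresholds are hypothetical and are not taken from Tencent's implementation or from Iceberg.

```scala
object CompactionPlanSketch {
  final case class DataFile(path: String, sizeBytes: Long)

  // Greedy planner (an assumption, not the internal algorithm): files smaller
  // than `smallFileBytes` are packed in arrival order into groups whose total
  // size stays within `targetBytes`. Groups holding a single file are dropped
  // because rewriting one file alone gains nothing.
  def planRewrites(
      files: Seq[DataFile],
      smallFileBytes: Long,
      targetBytes: Long): List[List[DataFile]] = {
    val small = files.filter(_.sizeBytes < smallFileBytes)
    val groups = small.foldLeft(List.empty[List[DataFile]]) {
      case (current :: done, f)
          if current.map(_.sizeBytes).sum + f.sizeBytes <= targetBytes =>
        (f :: current) :: done // still fits: add to the open group
      case (acc, f) =>
        List(f) :: acc // does not fit (or no group yet): open a new group
    }
    groups.reverse.map(_.reverse).filter(_.size > 1)
  }
}
```

Each returned group corresponds to one rewrite task that could be handed to a separate rewrite operator.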

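As mentioned above, Iceberg works through commits, and the rewritten files need a new snapshot. For Iceberg this is also a commit, so it is handled by a single-parallelism operator, just like the regular commit.

Across the whole pipeline, small-file compaction currently goes through the commit step, so if a commit blocks, it affects the upstream writes. We will keep optimizing this. We have also opened a design doc in the Iceberg community and are discussing the compaction work with the community.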

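III. Building a Real-Time Data Warehouse with Flink + Iceberg

1. Near-real-time data ingestion

As described above, Iceberg supports read/write separation, concurrent reads, incremental reads and small-file compaction, with latency from seconds to minutes. Building on these strengths, we use Iceberg to build a Flink-based, end-to-end real-time data warehouse that unifies batch and stream processing.

As the figure below shows, each Iceberg commit changes the visibility of data, turning it from invisible to visible. This is how data is recorded in near real time.

Figure 14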

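2. Real-time data warehouse: a data lake analysis system

Previously, data had to be ingested first, for example with offline Spark jobs that pull and extract data and finally write it into Hive tables, which has high latency. With Iceberg's table structure, Flink or Spark Streaming can be used in the middle to ingest data in near real time.

With these capabilities in mind, let us revisit the Kappa architecture, whose pain points were described above. Since Iceberg is a good table format that supports both a streaming reader and a streaming sink, can Kafka be replaced with Iceberg?

Iceberg relies on inexpensive storage such as HDFS or S3, and it supports the columnar formats Parquet and ORC as well as the row-oriented Avro. With columnar storage, basic OLAP optimizations such as predicate pushdown can be applied, and computation can run directly on the intermediate layer. Combined with the streaming reader based on Iceberg snapshots, the latency of offline jobs can be cut from days or hours down to near real time, turning the system into a near-real-time data lake analysis system.

Figure 15

In the intermediate processing layer, Presto can be used for simple queries. Because Iceberg supports streaming reads, Flink can also be attached directly to the intermediate layer to run batch or streaming jobs there, compute further on the intermediate results and write them downstream.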

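■ Pros and cons of replacing Kafka

Overall, the advantages of replacing Kafka with Iceberg are:

Unified streaming and batch at the storage layer

OLAP analysis on the intermediate layers

Full support for efficient backtracking

Lower storage cost

There are also drawbacks:

Data latency goes from real time to near real time

Integrating with other data systems requires extra development work

Figure 16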

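■ Second-level analysis: data lake acceleration

Iceberg stores all of its data files on HDFS, and HDFS reads and writes cannot fully meet the needs of second-level analysis. We therefore plan to support a cache such as Alluxio underneath Iceberg and use it to accelerate the data lake. This architecture is part of our future planning and construction.

Figure 17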

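3. Best practices

■ Real-time small-file compaction

As Figure 18 shows, Tencent has made Iceberg fully usable through SQL internally. Small-file compaction parameters can be set in the table properties, for example how many snapshots should trigger a compaction. A single insert statement then starts the Flink ingestion job, the job keeps running, and the data files are compacted automatically in the background.

Figure 18

The figure below shows the data files in Iceberg and their corresponding metadata files. The open-source Iceberg Flink sink does not yet support file compaction. If you start a small streaming job on your own machine, you will see the number of files in the target directory surge after the Flink job has run for a while.

Figure 19

With Iceberg's real-time small-file compaction enabled, the number of files stays at a fairly stable level.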

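■ Flink real-time incremental reads

Real-time incremental reads can be configured in the Iceberg table properties, and you can specify the snapshot from which to start consuming. Once a starting snapshot is specified, each time the Flink job starts it reads only the data newly added in the current latest snapshot.

Figure 20

In this example, small-file compaction is enabled, and a Flink sink ingestion job is started with SQL.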

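■ Managing files with SQL extensions

Users strongly prefer to handle every task with SQL. The small-file compaction feature is really only suited to Flink jobs running online, where, compared with offline jobs, the number and size of the files produced in each commit interval are not particularly large.

But after a job has run for a long time, there may already be thousands of files underneath. Merging them with the online real-time job is clearly inappropriate and may hurt its timeliness. Instead, SQL extensions can be used to compact small files, delete leftover files, or expire snapshots.

Internally we already manage Iceberg's data files and metadata files on disk through SQL extensions. We will keep adding functionality to the SQL extensions to make Iceberg more usable and improve the user experience.

Figure 21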

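IV. Future Plans

Figure 22

1. Improving Iceberg's core capabilities

Row-level delete. What happens when data needs to be updated in a pipeline built on Iceberg? Iceberg currently supports updates only through copy-on-write, which amplifies writes. To build a truly real-time pipeline end to end, an efficient merge-on-read update capability is needed. This is very important; we will continue to work with the community, and also experiment inside Tencent, to complete row-level delete.

SQL extensions. We will continue to improve the SQL extension capabilities.

A unified index to speed up data retrieval. Iceberg currently has no unified index for speeding up retrieval. We are working with the community, which has proposed a Bloom filter index; a unified index would speed up how Iceberg locates files.

For the Iceberg core, these are the capabilities we want to complete first.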

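2. Platform building

On the platform side, we plan to:

First, create tables automatically by detecting and extracting schemas. We want to create the table automatically from the schema of the front-end data, making the data ingestion flow easier to use.

Second, make metadata management more convenient. Iceberg's metadata is currently exposed raw, stored directly in the Hive Metastore, and users have to run SQL to inspect it. We want to improve this as part of the platform.

Third, build a data acceleration layer on Alluxio, so that upper layers can achieve second-level analysis.

Fourth, integrate with internal systems. We have many internal systems for real-time and offline analysis, and the platform needs to be connected to them.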

One Vanderbilt, New York

One Vanderbilt Building / KPF

Compiled by 小R, 专筑网

来自建筑事务所的描述：首座塔楼是纽约城市分区的组成部分，并且和地区以及当地交通体系相连，One Vanderbilt大厦象征着城市的恢复力，很好地展望了中央商务区的未来发展，建筑有着一系列的公共优势，形成锥子形态，构成引人注意的天际线。

Text description provided by the architects. The first tower completed as part of New York City’s East Midtown Rezoning, and with a direct connection to its metro and regional transit system, One Vanderbilt symbolizes the city’s resilience and looks to the future of its central business district with a number of public realm benefits, carefully crafted materiality, and a tapered form that establishes a striking skyline presence.

One Vanderbilt大楼是曼哈顿中心最高的办公建筑，其高度为1401英尺（约427米），这座建筑改变了区域的市民感受，建筑师将建筑语汇进行了分层次设计，满足了社会对于高端当代办公空间的需求。该项目将获得LEED与WELL双重认证，建筑有着170万平方英尺（约15.79万平方米）的A级办公空间，其中有着宽阔的无柱空间，使用者透过大玻璃窗能够感受到惊人的城市美景，在面积为3000平方英尺（278.7平方米）的楼层中，建筑师结合了室外花园露台，另外这里还有由米其林厨师Daniel Boulud为首的世界顶级餐饮空间，让建筑的功能更加丰富。

Reaching 1,401 feet (427 meters) in height, One Vanderbilt is the tallest office tower in Midtown Manhattan, and transforms the civic experience of the Grand Central district, layering its architectural language and skillfully meeting market demands for cutting-edge, contemporary office space. Expected to achieve LEED and WELL certifications, One Vanderbilt offers 1.7 million square feet of Class-A office space, featuring column-free expanses and stunning views through floor-to-ceiling windows. A 30,000 square-foot amenity floor with outdoor garden terraces, as well as world-class dining headed by esteemed, Michelin star-rated chef Daniel Boulud, will round out the building’s offerings.

公共交通与公共区域的结合

One Vanderbilt大楼结合了私人区域和公共空间，构成了独特的功能业态。综合的地下设施能够直接通向中央区域，以及Vanderbilt大道上面积为14000平方英尺（约1300平方米）的步行广场。未来到了2022年，这座塔楼还将能够完全整合东侧道路体系，在未来的规划中，长岛铁路（LIRR）将能够服务于中央广场。建筑塔楼位于42号大街与43号大街之间，沿着Madison 与Vanderbilt大道，建筑也是交通枢纽序列的重要组成部分，这里每天都有成千上万的通勤人士。

Transit Connection and Public Realm Benefits
One Vanderbilt blends private enterprise and the public realm with its unique program. An integrated complex of below-grade conditions offers direct connections to Grand Central and an active, 14,000-square-foot pedestrian plaza on Vanderbilt Avenue. By 2022, the tower will also fully integrate the new plan for East Side Access, which extends Long Island Rail Road (LIRR) service to Grand Central. Filling an entire city block between 42nd and 43rd Streets along Madison and Vanderbilt Avenues, the tower is part of the spatial sequence of the terminal and a doorstep to the city, greeting thousands of commuters daily.

在项目设计的前期，设计团队构思了多种不同的体量方案，大大提升贯穿整个公共空间的行人流量。设计团队构思了评估工具，通过数据分析来达到目的，最终构成了塔楼的整体设计。设计方案很好地考虑了流线，在让规模更大的前提下，允许更充足的阳光能够来到街道楼层。

Early in the design phase, the team engaged KPFui to study different massing options to enhance pedestrian flow throughout the public spaces in and around the building. The team created custom evaluation tools and used data analytics to reconcile competing objectives and facilitate the design of the tower. The resulting design prioritizes movement and allows more daylight to street level than the building on site previously, despite its larger size.

室内外空间的材料

为了呼应建筑旁侧的中央车站，One Vanderbilt大楼的设计团队应用了一种与其类似的砖石材料以及Gustavino砖，这应用于建筑的大堂天花板和凹槽拱肩之中，在玻璃幕墙中构成了自然的线条和明亮的肌理，这与中央车站的色调相呼应。

Materiality – Interior and Exterior
In acknowledgement of the building’s historic neighbor, Grand Central Terminal, the One Vanderbilt design team chose terracotta – an organic material akin to Grand Central’s masonry construction and famed Guastavino tiles – for the building’s lobby ceiling and fluted spandrels, which line the rising glass façade with natural, luminous texture and echo the color palette of the nearby station.

设计团队与Christine Jetten工作室进行了长达5年的密切合作，设计了有着自然背景但又现代感十足的表面形态，陶土在不同的批次中有着不同的表现形式，那么这就给One Vanderbilt大楼的整体应用带来了很大的挑战。设计团队和波士顿Valley Terra Cotta共同合作，通过一系列的模型在真实环境下进行测试，这些模型有着多种形态和表面，最终构成了与中央车站以及曼哈顿历史建筑的温暖色调相协调的设计策略，One Vanderbilt大楼的陶土面板应用在这座超高层建筑之中，看起来十分温和，呈现出明亮的珍珠色调，就整体而言，整个空间有着视觉上的统一感，这也让One Vanderbilt大楼成为了纽约的城市新地标。

Over the course of five years, the design team collaborated closely with Studio Christine Jetten to create a glaze that is contextual in nature yet modern enough to stand on its own. As a live material, terracotta exhibits variations in each batch, posing a significant challenge in determining an appropriate finish for its use on One Vanderbilt. Working with Boston Valley Terra Cotta, a series of mockups were field tested in real world conditions, which combined a number of shapes and glazes to create a design sympathetic to the warm tones of the train hall and other historic Manhattan buildings. Rising the full height of the supertall building, One Vanderbilt’s terracotta panels are shaped as gentle scoops and rendered in a luminescent pearl tone. Taken as a whole, they present a visual uniformity that establishes One Vanderbilt as a new landmark in New York City.

One Vanderbilt大楼大堂有着由KPF设计的大型装置，它由多种青铜元素而构成，并且呈星星状排列，装置悬挂在一系列集成高压电缆上，每个元素进一步深化了塔楼的材质概念，它们有着手工肌理，并且经过了手工抛光，还有着一定深度的边缘，每个部分都有着独特的形态、位置，以及旋转角度，在灯光照射下，整体形态有机且多样。

The focal point of One Vanderbilt’s lobby is a large-scale KPF-designed installation. Comprised of a variety of bronze elements arranged in a starburst-like spread, the installation is suspended on a series of integrated high-tension cables. Furthering the concept of materiality in the tower, each element is hand-textured, hand-polished, and features chamfered edges for additional depth. These pieces each have a unique shape, position, and rotation and are carefully lit to highlight their organic and varied effect.

体量与天际线

One Vanderbilt大楼延续了纽约城市标志性建筑的多层次建筑语汇，与克莱斯勒大厦以及帝国大厦共同组成了城市的天际线。就形式而言，One Vanderbilt大楼为向上逐步变细的体量构成，构成优雅的形态，并且和周围的建筑相互呼应，在塔楼的底部，一系列切割的斜角构成了同样中央广场的视觉引导，各个元素都很好地展示了Vanderbilt角落，而这个角度近一个世纪都被遮挡了起来。

Massing and Skyline Presence
Following the layered architectural language of neighboring New York City icons, One Vanderbilt joins the Chrysler Building and Empire State Building to define the city’s renowned skyline. Formally, One Vanderbilt’s massing comprises four interlocking and tapering volumes that spiral toward the sky, an elegant shape in sympathetic proportion to these iconic neighbors. At the tower’s base, a series of angled cuts organize a visual procession to Grand Central. They reveal the Vanderbilt corner of the terminal’s magnificent cornice – a view that has been obstructed for nearly a century.

One Vanderbilt大楼设计团队评论One Vanderbilt大楼

KPF总裁也是项目的主要设计负责人James von Klemperer说：“One Vanderbilt大楼让人联想起了纽约高层建筑的黄金时代。建筑体量呈现为长方体锥形，其顶部连接着帝国大厦和克莱斯勒大厦，同时，我们给高层建筑带来了全新的社会和环境特质，新建筑在空间和功能上都和中央车站连接在一起，因此也在地面层构成了视觉通廊，建立了主要的公共广场，让人们能够直接从大堂进入到车站。”

One Vanderbilt Design Team Comments One Vanderbilt
“The One Vanderbilt tower recalls the golden age of New York high rise architecture,” says KPF President and Design Principal James von Klemperer. “As a rectangular plan tapered point tower, its prominent top joins the Empire State and Chrysler buildings on the skyline. At the same time, the design gives the high rise a new relevance of social and environmental purpose. The new building connects both spatially and programmatically to Grand Central Terminal. It opens up a visual corridor at the ground plane and establishes a major public plaza, while providing direct access to the station from its lobby.”

von Klemperer继续说：“我们很荣幸能够设计这样一座商业摩天大楼，支持如今可持续建设，也丰富了公共区域的重要功能，总体而言，这个项目能够很好地推动地区的发展，引领着曼哈顿历史CBD的复兴。”

von Klemperer adds, “we’re very happy that we’ve been able to create a commercial skyscraper that supports today’s critical agendas of building sustainably and enriching the public realm. Overall, the project has already proven to be a boost for East Midtown, leading the way for a progressive rejuvenation of Manhattan’s historical CBD.”

KPF设计负责人Jeffrey Kenoff说道：“建筑的材料强调了曼哈顿的历史，这些细部设计的真实性和材质不仅仅对于塔楼自身来说非常重要，同时也关系到周边的建筑，比如中央车站和克莱斯勒大厦，其中有定制的釉面陶土立面，以及青铜裙房框架，除此之外，还有主要大堂的青铜‘艺术墙体’装置和桌子，这代表了到达的意思。”

“The materials of the building reinforce a Manhattan DNA,” says Jeffrey Kenoff, KPF Design Principal. “The authenticity and quality of these details are not only critical to their relationship within the tower itself, but also to the neighboring buildings including Grand Central and the Chrysler Building. This includes the custom glazed terra-cotta facade and soffits and the bronze podium framing, as well as the main lobby’s bronze “art wall” installation and hammered desk marking the arrival.”

KPF管理负责人Dominic Dunn说道：“这是地区的重要枢纽，One Vanderbilt大楼给人们的通勤带来了更加便利的通道，同时也给未来入驻中央广场的各种交通设施构成了直接的路径。建筑独特的形式强调了纽约的天际线，而我们的设计也构成了地面流线的体验，欢迎成千上万通勤人士的到来，这里也成为密集的城市、繁忙的交通的标志。”

“As a new hub of Midtown, One Vanderbilt will facilitate GCT commuting patterns by providing additional direct access to all levels of below grade transit that feed or will feed into Grand Central in the future,” says Dominic Dunn, KPF Managing Principal. “As its unique form enhances the experience of the New York skyline, so too does KPF’s design craft a ground plane experience that welcomes thousands of commuters, becoming a hallmark of transit-oriented design for our dense, bustling city.”

KPF技术负责人Andrew Cleary说道：“控制项目的进度和One Vanderbilt大楼的整体交付大概是很大的挑战了，这个项目非常复杂，需要按照时间节点完成各个部分，这很明确地证明了设计团队和施工团队的密切合作关系。”

“Maintaining the fast-tracked schedule to design and deliver One Vanderbilt was perhaps one of the biggest challenges,” says Andrew Cleary, KPF Technical Director. “The fact that a project of this complexity has repeatedly achieved all the major construction milestones on time is a clear testament to the tight collaboration that the design and construction teams forged from the outset of the design process.”

KPF高级设计师Darina Zlateva说：“我认为One Vanderbilt大楼是一座人文主义摩天大楼，它有着细致的细部设计，这也分解了建筑的整体规模，给到人们适当的尺度感。锥形体量让光线和空气能够直接来到街道，在拱肩部分，倾斜的陶土片将人们的视线引导向天空，而在大堂中，这里有专门设计的悬挂青铜艺术作品，这让人联想起城市的韵律和节奏，最后，顶部的采光结构重新表达了克莱斯勒标志性的对角线，从城市的各个方位，人们都能够欣赏到它。”

“I’ve always thought of One Vanderbilt as the humanist skyscraper,” says Darina Zlateva, KPF Senior Designer. “The details throughout break down the scale of the building for the human experience and to the delight of the human eye. The tapered massing allows more light and air down to the street. In the spandrel, diagonally oriented terra cotta pieces lift your eye up to the sky. In the lobby, a suspended bronze art piece that we designed specifically for the space recalls the movement and rhythm of our beloved city. Finally, the lit structural tracery of the crown is a reinterpretation of the iconic diagonals in the Chrysler Building, now experienced volumetrically from all directions in the city.”

来自KPF的城市塑造经验

One Vanderbilt大楼也成为了KPF又一具有代表作的纽约项目，其代表项目还有正在进行的作品，该作品位于哈德逊广场，其中包括总体规划以及多个建筑的设计。而One Vanderbilt大楼也加入了KPF在曼哈顿的定位项目，例如麦迪逊达到1号，这是与SL Green 和Hines的另一次合作，还有麦迪逊大道390号，这个项目通过一定的设计重新把各个功能空间分布在8个全新的竖向楼层中，另外还有哈德逊公共空间，这个项目是在一个翻新的老仓库上进行17层的扩建，为使用者带来最为先进的办公场所。这些项目都很好地表达了KPF对于城市设计的卓越策略，同时又能够把建筑和当地的基础设施完美地结合在一起。

Building on KPF’s City-Shaping Experience
One Vanderbilt joins KPF’s portfolio of impactful New York projects, including the firm’s ongoing work at Hudson Yards, which comprises the design of its master plan and numerous buildings – 10, 20, 30, and 55 Hudson Yards, as well as the newly-opened outdoor observation deck Edge. It also joins KPF’s ongoing repositioning work in Manhattan, such as One Madison Avenue – another collaboration with SL Green and Hines – as well as 390 Madison Avenue, for which a surgical re-massing redistributes existing square footage in the form of eight new vertical stories, and Hudson Commons, which adds 17 stories above a renovated former warehouse to create state-of-the-art office space for tech tenants. Together, these projects demonstrate the firm’s penchant for urban design and thoughtful integration of architecture with local infrastructure and zoning conditions.

Architects: KPF
Type: Office building / skyscraper
Area: 160,722 m²
Year: 2020
Photographs: Raimund Koch, Liane Curtis, Donna Dotan, Sam Morgan, Peter Walker, Atchain
Chairman: A. Eugene Kohn
Lead Architect: James von Klemperer
Managing Principal: Dominic Dunn
Design Principal: Jeffrey Kenoff
Executive Manager: Andrew Cleary
Senior Designer: Darina Zlateva
Project Lead: Nicole McGlinn-Morrison

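Source Code Analysis of the Apache Spark Delta Lake Transaction Log

An earlier article described in detail what the Apache Spark Delta Lake transaction log is, what it is used for and how it works. That article already gives a good account of Delta Lake's internals and its atomicity guarantees. For learning purposes, this article walks through the implementation of the transaction log at the source level. Before reading on, it is strongly recommended to read 《深入理解 Apache Spark Delta Lake 的事务日志》 ("Understanding the Apache Spark Delta Lake Transaction Log in Depth").

How Delta Lake implements data update transactions

Every update to table data in Delta Lake (inserting, updating or deleting data) goes through the steps below. Their main purpose is to record which files are removed and which are added in the transaction log, that is, in the JSON files under the _delta_log directory. This is how Delta Lake implements ACID and time travel. We start from the entry point for committing to the transaction log, org.apache.spark.sql.delta.OptimisticTransaction#commit; every operation that persists to the transaction log has to call this function. The commit function is implemented as follows: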

def commit(actions: Seq[Action], op: DeltaOperations.Operation): Long = recordDeltaOperation(
    deltaLog,
    "delta.commit") {
  val version = try {
    // Some work is needed before the commit: on the first update the Protocol must be
    // initialized, and the user's table settings must be persisted to the transaction log
    var finalActions = prepareCommit(actions, op)

    // isBlindAppend is true only when the commit contains nothing but AddFile actions and
    // the transaction did not depend on files it read; if earlier files must be removed
    // it is false
    val isBlindAppend = {
      val onlyAddFiles =
        finalActions.collect { case f: FileAction => f }.forall(_.isInstanceOf[AddFile])
      onlyAddFiles && !dependsOnFiles
    }

    // If commitInfo.enabled is true, commitInfo is also recorded in the transaction log
    if (spark.sessionState.conf.getConf(DeltaSQLConf.DELTA_COMMIT_INFO_ENABLED)) {
      commitInfo = CommitInfo(
        clock.getTimeMillis(),
        op.name,
        op.jsonEncodedValues,
        Map.empty,
        Some(readVersion).filter(_ >= 0),
        None,
        Some(isBlindAppend))
      finalActions = commitInfo +: finalActions
    }

    // Actually write the transaction log; on a version conflict, retry until it succeeds
    val commitVersion = doCommit(snapshot.version + 1, finalActions, 0)
    logInfo(s"Committed delta #$commitVersion to ${deltaLog.logPath}")
    // Checkpoint the transaction log if needed
    postCommit(commitVersion, finalActions)
    commitVersion
  } catch {
    case e: DeltaConcurrentModificationException =>
      recordDeltaEvent(deltaLog, "delta.commit.conflict." + e.conflictType)
      throw e
    case NonFatal(e) =>
      recordDeltaEvent(
        deltaLog, "delta.commit.failure", data = Map("exception" -> Utils.exceptionString(e)))
      throw e
  }
  version
}

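Let us start with the function's two parameters.

actions: Seq[Action]: the new files produced by the table update (AddFile) and the list of files to be removed (RemoveFile). For a Structured Streaming job, a SetTransaction record is included as well, which stores the job's query id (sql.streaming.queryId), the batchId and the current time. This is the data that has to be persisted to the transaction log.

op: DeltaOperations.Operation: the type of Delta operation, such as WRITE, STREAMING UPDATE, DELETE, MERGE or UPDATE.

The commit function has three main steps: prepareCommit, doCommit and postCommit. prepareCommit is implemented as follows: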

protected def prepareCommit(
    actions: Seq[Action],
    op: DeltaOperations.Operation): Seq[Action] = {

  assert(!committed, "Transaction already committed.")

  // If the table's Metadata was updated, it must be written to the transaction log
  var finalActions = newMetadata.toSeq ++ actions
  val metadataChanges = finalActions.collect { case m: Metadata => m }
  assert(
    metadataChanges.length <= 1,
    "Cannot change the metadata more than once in a transaction.")
  metadataChanges.foreach(m => verifyNewMetadata(m))

  // On the first commit, make sure the _delta_log directory exists, then check whether
  // finalActions contains a Protocol; if not, initialize the protocol version
  if (snapshot.version == -1) {
    deltaLog.ensureLogDirectoryExist()
    if (!finalActions.exists(_.isInstanceOf[Protocol])) {
      finalActions = Protocol() +: finalActions
    }
  }

  finalActions = finalActions.map {
    // On the first commit, merge the Delta Lake configuration into the Metadata
    case m: Metadata if snapshot.version == -1 =>
      val updatedConf = DeltaConfigs.mergeGlobalConfigs(
        spark.sessionState.conf, m.configuration, Protocol())
      m.copy(configuration = updatedConf)
    case other => other
  }

  deltaLog.protocolWrite(
    snapshot.protocol,
    logUpgradeMessage = !actions.headOption.exists(_.isInstanceOf[Protocol]))

  // If actions contains removed files, check whether the table allows deletes
  val removes = actions.collect { case r: RemoveFile => r }
  if (removes.exists(_.dataChange)) deltaLog.assertRemovable()

  finalActions
}

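prepareCommit does relatively little; mainly it fills in missing pieces of the transaction log. Specifically:

1. Delta Lake allows the schema of an existing table to be changed, for example by adding a column or overwriting the schema. In that case the new Metadata has to be written to the transaction log. Metadata stores the table's schema, partition columns, configuration and creation time. Note that only the schema and the partition columns can be changed later; the other information cannot.

2. On the first commit, check whether the table's _delta_log directory exists and create it if not. Then check whether a protocol version has been set; if not, use the default protocol version, which has readerVersion = 1 and writerVersion = 2.

3. If this is the first commit and the action is a Metadata, the Delta Lake configuration is merged into the Metadata. Delta Lake table configuration is mainly defined in the org.apache.spark.sql.delta.sources.DeltaSQLConf class; for example, when creating a Delta Lake table you can specify how often to checkpoint.

4. A table can be made append-only through the spark.databricks.delta.properties.defaults.appendOnly parameter, so if actions contains a RemoveFile we have to check whether the table allows deletes.

Back in the commit function: after prepareCommit we have the finalActions list, which is the data to be written to the transaction log. Next, commit determines whether this change needs to remove earlier files; if so, isBlindAppend is false, otherwise it is true.

When commitInfo.enabled is true (the default), commitInfo is also written to the transaction log file. CommitInfo contains important information such as the operation time, the operation type (WRITE, UPDATE) and the operation mode (Overwrite). Finally doCommit is called. Note that the first argument is snapshot.version + 1, where snapshot.version is the latest version in the transaction log. For example, if the newest file in the _delta_log directory is 00000000000000000003.json, then snapshot.version is 3 and this update will be version 4. The doCommit function is implemented as follows: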

private def doCommit(
    attemptVersion: Long,
    actions: Seq[Action],
    attemptNumber: Int): Long = deltaLog.lockInterruptibly {
  try {
    logDebug(s"Attempting to commit version $attemptVersion with ${actions.size} actions")

    // Actually write the transaction log
    deltaLog.store.write(
      deltaFile(deltaLog.logPath, attemptVersion),
      actions.map(_.json).toIterator)
    val commitTime = System.nanoTime()

    // Data has changed, so refresh the in-memory snapshot of the log and validate it
    val postCommitSnapshot = deltaLog.update()
    if (postCommitSnapshot.version < attemptVersion) {
      throw new IllegalStateException(
        s"The committed version is $attemptVersion " +
          s"but the current version is ${postCommitSnapshot.version}.")
    }

    // Emit some statistics
    var numAbsolutePaths = 0
    var pathHolder: Path = null
    val distinctPartitions = new mutable.HashSet[Map[String, String]]
    val adds = actions.collect {
      case a: AddFile =>
        pathHolder = new Path(new URI(a.path))
        if (pathHolder.isAbsolute) numAbsolutePaths += 1
        distinctPartitions += a.partitionValues
        a
    }
    val stats = CommitStats(
      startVersion = snapshot.version,
      commitVersion = attemptVersion,
      readVersion = postCommitSnapshot.version,
      txnDurationMs = NANOSECONDS.toMillis(commitTime - txnStartNano),
      commitDurationMs = NANOSECONDS.toMillis(commitTime - commitStartNano),
      numAdd = adds.size,
      numRemove = actions.collect { case r: RemoveFile => r }.size,
      bytesNew = adds.filter(_.dataChange).map(_.size).sum,
      numFilesTotal = postCommitSnapshot.numOfFiles,
      sizeInBytesTotal = postCommitSnapshot.sizeInBytes,
      protocol = postCommitSnapshot.protocol,
      info = Option(commitInfo).map(_.copy(readVersion = None, isolationLevel = None)).orNull,
      newMetadata = newMetadata,
      numAbsolutePaths,
      numDistinctPartitionsInAdd = distinctPartitions.size,
      isolationLevel = null)
    recordDeltaEvent(deltaLog, "delta.commit.stats", data = stats)

    attemptVersion
  } catch {
    case e: java.nio.file.FileAlreadyExistsException =>
      checkAndRetry(attemptVersion, actions, attemptNumber)
  }
}

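1. This is where the transaction log is actually written. The store is specified by the spark.delta.logStore.class parameter and currently supports HDFS, S3, Azure and local storage; the default is HDFS. The write itself is described below.

2. After the transaction log has been persisted, the latest in-memory snapshot of the transaction log is refreshed and validity checks are performed.

3. Some statistics are emitted. This appears to be functionality from the Databricks version; the open-source version does nothing here.

Now let us look at how the transaction log is actually written. For simplicity we go straight to the corresponding method in the HDFSLogStore class, writeInternal, which is implemented as follows: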

private def writeInternal(path: Path, actions: Iterator[String], overwrite: Boolean): Unit = {
  // Get the HDFS FileContext used below to write the transaction log
  val fc = getFileContext(path)

  // If the transaction log file already exists, throw so the caller can retry later
  if (!overwrite && fc.util.exists(path)) {
    // This is needed for the tests to throw error with local file system
    throw new FileAlreadyExistsException(path.toString)
  }

  // Write the transaction log to a temporary file first
  val tempPath = createTempPath(path)
  var streamClosed = false // This flag is to avoid double close
  var renameDone = false // This flag is to save the delete operation in most of cases.
  val stream = fc.create(
    tempPath, EnumSet.of(CREATE), CreateOpts.checksumParam(ChecksumOpt.createDisabled()))

  try {
    // Write the actions produced by this change into the temporary log file
    actions.map(_ + "\n").map(_.getBytes(UTF_8)).foreach(stream.write)
    stream.close()
    streamClosed = true
    try {
      val renameOpt = if (overwrite) Options.Rename.OVERWRITE else Options.Rename.NONE
      // Move the temporary file to the final log file; if the move fails, throw so the
      // caller can retry later
      fc.rename(tempPath, path, renameOpt)
      renameDone = true
    } catch {
      case e: org.apache.hadoop.fs.FileAlreadyExistsException =>
        throw new FileAlreadyExistsException(path.toString)
    }
  } finally {
    if (!streamClosed) {
      stream.close()
    }
    // Delete the temporary log file
    if (!renameDone) {
      fc.delete(tempPath, false)
    }
  }
}

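The logic of writeInternal is simple; it is an ordinary file write, as follows:

1. Get the HDFS FileContext used later to write the transaction log.

2. If the transaction log file to be written already exists, throw an exception so the write can be retried later. For example, suppose the newest transaction log file on disk before our write was 00000000000000000003.json, so this write should produce 00000000000000000004.json. Because Delta Lake allows multiple users to write, another user may have written a new 00000000000000000004.json between the time we read the latest version and the time we write. Our write must then fail, and FileAlreadyExistsException is thrown so that it can be retried.

3. The transaction log is first written to a temporary file under the table's _delta_log directory. For example, if this write is for 00000000000000000004.json, the data is written to a file such as .00000000000000000004.json.0887f7da-5920-4214-bd2e-7c14b4244af1.tmp.

4. Write the transaction records of this update to the temporary file.

5. Once written, the temporary file is moved to the final log file, for example from .00000000000000000004.json.0887f7da-5920-4214-bd2e-7c14b4244af1.tmp to 00000000000000000004.json. Note that other users may also be modifying the table while the log file is being written, so 00000000000000000004.json may already have been taken by another change. In that case FileAlreadyExistsException is thrown as well, so that the write can be retried.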
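
The five steps above can be reproduced on a local filesystem with java.nio. This is only an analogue for illustration, not HDFSLogStore: `LogWriteSketch`, `logFileName` and `writeVersion` are hypothetical names, and a local `Files.move` does not necessarily give the same atomicity guarantee that the rename on HDFS is relied on for.

```scala
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{FileAlreadyExistsException, Files, Path}
import java.util.UUID

object LogWriteSketch {
  // Log files are named after the version, zero-padded to 20 digits.
  def logFileName(version: Long): String = f"$version%020d.json"

  // Write one version of the log: temp file first, then move into place.
  // Throws FileAlreadyExistsException if another writer already owns the version.
  def writeVersion(logDir: Path, version: Long, actions: Seq[String]): Path = {
    val target = logDir.resolve(logFileName(version))
    if (Files.exists(target)) throw new FileAlreadyExistsException(target.toString)

    val temp = logDir.resolve(s".${logFileName(version)}.${UUID.randomUUID()}.tmp")
    var renamed = false
    try {
      // One JSON action per line, as in the transaction log.
      Files.write(temp, actions.map(_ + "\n").mkString.getBytes(UTF_8))
      // Without REPLACE_EXISTING, move fails if the target already exists.
      Files.move(temp, target)
      renamed = true
      target
    } finally {
      // Clean up the temporary file when the move did not happen.
      if (!renamed) Files.deleteIfExists(temp)
    }
  }
}
```

Calling `writeVersion` twice with the same version makes the second call fail with FileAlreadyExistsException, which is the signal doCommit uses to trigger a retry.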

That completes the write of the transaction log. Back in doCommit: because writeInternal may throw FileAlreadyExistsException, the deltaLog.store.write(xxx) call may throw it too. doCommit catches this exception and, in the handler, calls checkAndRetry(attemptVersion, actions, attemptNumber). This is how the transaction log write is retried. checkAndRetry is implemented as follows:

protected def checkAndRetry(
    checkVersion: Long,
    actions: Seq[Action],
    attemptNumber: Int): Long = recordDeltaOperation(
      deltaLog,
      "delta.commit.retry",
      tags = Map(TAG_LOG_STORE_CLASS -> deltaLog.store.getClass.getName)) {

  // Re-read the transaction log persisted on disk and refresh the in-memory snapshot
  deltaLog.update()
  // The version to retry is the refreshed snapshot version + 1
  val nextAttempt = deltaLog.snapshot.version + 1

  // Validity checks
  (checkVersion until nextAttempt).foreach { version =>
    val winningCommitActions =
      deltaLog.store.read(deltaFile(deltaLog.logPath, version)).map(Action.fromJson)
    val metadataUpdates = winningCommitActions.collect { case a: Metadata => a }
    val txns = winningCommitActions.collect { case a: SetTransaction => a }
    val protocol = winningCommitActions.collect { case a: Protocol => a }
    val commitInfo = winningCommitActions.collectFirst { case a: CommitInfo => a }.map(
      ci => ci.copy(version = Some(version)))
    val fileActions = winningCommitActions.collect { case f: FileAction => f }
    // If the log protocol version was upgraded, make sure we are still okay.
    // Fail the transaction if we're trying to upgrade protocol ourselves.
    if (protocol.nonEmpty) {
      protocol.foreach { p =>
        deltaLog.protocolRead(p)
        deltaLog.protocolWrite(p)
      }
      actions.foreach {
        case Protocol(_, _) => throw new ProtocolChangedException(commitInfo)
        case _ =>
      }
    }
    // Fail if the metadata is different than what the txn read.
    if (metadataUpdates.nonEmpty) {
      throw new MetadataChangedException(commitInfo)
    }
    // Fail if the data is different than what the txn read.
    if (dependsOnFiles && fileActions.nonEmpty) {
      throw new ConcurrentWriteException(commitInfo)
    }
    // Fail if idempotent transactions have conflicted.
    val txnOverlap = txns.map(_.appId).toSet intersect readTxn.toSet
    if (txnOverlap.nonEmpty) {
      throw new ConcurrentTransactionException(commitInfo)
    }
  }
  logInfo(s"No logical conflicts with deltas [$checkVersion, $nextAttempt), retrying.")
  // Retry the transaction log write
  doCommit(nextAttempt, actions, attemptNumber + 1)
}

checkAndRetry is called only when a transaction log write conflicts. Its main purpose is to rewrite the current transaction log entry:

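1. Because the previous attempt to update the transaction log conflicted, the transaction log persisted on disk is read again and the in-memory snapshot is refreshed.

2. The version to retry is the version of the freshly updated in-memory snapshot + 1.

3. The validity checks are run.

4. The transaction log write is retried.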
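The four steps above amount to an optimistic-concurrency retry loop. The following is a minimal sketch of that idea against an in-memory log, not Delta Lake's actual implementation: InMemoryLog, VersionExists, and commitWithRetry are hypothetical names introduced for illustration, actions are modeled as plain strings, and the only conflict check kept is "a winning commit changed the metadata".

```scala
import scala.annotation.tailrec
import scala.collection.mutable

// Hypothetical in-memory stand-in for the transaction log directory.
// None of these names are Delta Lake APIs.
object OptimisticCommitSketch {
  final class VersionExists(version: Long)
      extends RuntimeException(s"version $version already exists")

  final class InMemoryLog {
    private val entries = mutable.TreeMap.empty[Long, Seq[String]]

    // Mimics a put-if-absent write: fails when the version is already taken.
    def write(version: Long, actions: Seq[String]): Unit = synchronized {
      if (entries.contains(version)) throw new VersionExists(version)
      entries.update(version, actions)
    }

    def read(version: Long): Seq[String] = synchronized(entries(version))

    // Latest committed version, or -1 for an empty log.
    def latestVersion: Long =
      synchronized(entries.lastOption.map(_._1).getOrElse(-1L))
  }

  // Tries to commit at attemptVersion. On a conflict it inspects the commits
  // that won the race and retries at latest + 1, unless one of them
  // contains a "metadata" action, in which case the transaction fails.
  @tailrec
  def commitWithRetry(log: InMemoryLog, attemptVersion: Long, actions: Seq[String]): Long = {
    val written =
      try { log.write(attemptVersion, actions); true }
      catch { case _: VersionExists => false }
    if (written) attemptVersion
    else {
      val nextAttempt = log.latestVersion + 1
      (attemptVersion until nextAttempt).foreach { v =>
        if (log.read(v).contains("metadata"))
          throw new IllegalStateException(s"metadata changed in version $v")
      }
      commitWithRetry(log, nextAttempt, actions)
    }
  }
}
```

In this sketch, a writer that attempts version 1 after another writer has already committed it ends up committing at version 2, provided none of the winning commits conflict logically.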

Once the transaction log has been persisted to disk, the commit runs its last step, the postCommit function, which is implemented as follows:

protected def postCommit(commitVersion: Long, commitActions: Seq[Action]): Unit = {
  committed = true
  if (commitVersion != 0 && commitVersion % deltaLog.checkpointInterval == 0) {
    try {
      deltaLog.checkpoint()
    } catch {
      case e: IllegalStateException =>
        logWarning("Failed to checkpoint table state.", e)
    }
  }
}

postCommit is simple: it decides whether the transaction log needs a checkpoint. deltaLog.checkpointInterval is set through the spark.databricks.delta.properties.defaults.checkpointInterval parameter, and by default a checkpoint is taken every 10 transaction log writes.
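To see which commits trigger a checkpoint, the condition in postCommit can be pulled out into a small standalone predicate. This is only an illustrative sketch: CheckpointSchedule and its method names are hypothetical, and the interval is passed in as a plain parameter rather than read from deltaLog.

```scala
object CheckpointSchedule {
  // Same condition as in postCommit: never at version 0,
  // then at every multiple of the interval.
  def shouldCheckpoint(commitVersion: Long, interval: Int): Boolean =
    commitVersion != 0 && commitVersion % interval == 0

  // All versions in [0, upTo] that would trigger a checkpoint.
  def checkpointVersions(upTo: Long, interval: Int): Seq[Long] =
    (0L to upTo).filter(shouldCheckpoint(_, interval))
}
```

With an interval of 10, the first 26 commits (versions 0 through 25) checkpoint at versions 10 and 20. With an interval of 5, version 5 is the first commit to produce a checkpoint.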

A checkpoint simply persists the latest in-memory snapshot of the transaction log to disk, as shown below:

-rw-r--r-- 1 yangping.wyp wheel 811B 8 28 19:12 00000000000000000000.json
-rw-r--r-- 1 yangping.wyp wheel 514B 8 28 19:14 00000000000000000001.json
-rw-r--r-- 1 yangping.wyp wheel 711B 8 29 10:54 00000000000000000002.json
-rw-r--r-- 1 yangping.wyp wheel 865B 8 29 10:56 00000000000000000003.json
-rw-r--r-- 1 yangping.wyp wheel 668B 8 29 14:36 00000000000000000004.json
-rw-r--r-- 1 yangping.wyp wheel 13K 8 29 14:36 00000000000000000005.checkpoint.parquet
-rw-r--r-- 1 yangping.wyp wheel 514B 8 29 14:36 00000000000000000005.json
-rw-r--r-- 1 yangping.wyp wheel 514B 8 29 14:36 00000000000000000006.json
-rw-r--r-- 1 yangping.wyp wheel 24B 8 29 14:36 _last_checkpoint

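The 00000000000000000005.checkpoint.parquet file is the checkpoint of the transaction log. It consolidates all transaction records from 00000000000000000000.json through 00000000000000000005.json. The next time a snapshot of the transaction log has to be built, it can be constructed from 00000000000000000005.checkpoint.parquet and 00000000000000000006.json alone, without rereading the operations in 00000000000000000000.json through 00000000000000000005.json.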
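The selection rule just described (latest checkpoint, then only the later JSON files) can be sketched as a pure function over file names. This is an illustration under simplifying assumptions, not Delta Lake's code: SnapshotFiles and filesToReplay are hypothetical names, and only single-file checkpoints named as in the listing are handled.

```scala
object SnapshotFiles {
  private val Delta = """(\d+)\.json""".r
  private val Checkpoint = """(\d+)\.checkpoint\.parquet""".r

  // Returns the latest checkpoint file (if any), followed by the delta files
  // whose version is higher than that checkpoint, in version order.
  // Names that match neither pattern (such as _last_checkpoint) are ignored.
  def filesToReplay(names: Seq[String]): Seq[String] = {
    val checkpoints = names.collect { case n @ Checkpoint(v) => v.toLong -> n }
    val deltas = names.collect { case n @ Delta(v) => v.toLong -> n }
    val latest = checkpoints.sortBy(_._1).lastOption
    val startAfter = latest.map(_._1).getOrElse(-1L)
    latest.map(_._2).toSeq ++
      deltas.filter(_._1 > startAfter).sortBy(_._1).map(_._2)
  }
}
```

Applied to the directory listing above, the function keeps only the version 5 checkpoint file and the version 6 JSON file. Without any checkpoint it falls back to replaying every JSON file from version 0.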

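Notice also that a _last_checkpoint file is produced after the checkpoint. It is the persisted form of the CheckpointMetaData class, and it records the version of the last checkpoint and the number of Actions in the checkpoint file, as follows: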

$ cat _last_checkpoint
{"version":5,"size":10}

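Note that CheckpointMetaData also has a parts field, which indicates how many parts the checkpoint is split into. Checkpoint files grow over time, and writing everything into a single file becomes inefficient, so the checkpoint can be split into several files, with the number of files recorded in parts. The open-source version of Delta Lake does not yet have this feature, however, and it is unclear whether Databricks will open-source it later.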
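As a rough illustration of the three fields mentioned (version, size, and the optional parts), the sketch below models them as a case class and reads them from the flat JSON shown in _last_checkpoint. The field types and the regex-based parser are assumptions made for this example. This is not Delta Lake's own deserialization code, and the parser only handles a flat object with unsigned integer values.

```scala
object LastCheckpoint {
  // Mirrors the fields described in the text; parts is absent for
  // single-file checkpoints. Field types are assumed for illustration.
  final case class CheckpointMetaData(version: Long, size: Long, parts: Option[Int])

  // Extracts an unsigned integer field such as "version":5 from a flat JSON object.
  private def field(json: String, name: String): Option[Long] =
    ("\"" + name + "\"\\s*:\\s*(\\d+)").r
      .findFirstMatchIn(json)
      .map(_.group(1).toLong)

  // Minimal parser for the flat object shown above; not a general JSON parser.
  def parse(json: String): Option[CheckpointMetaData] =
    for {
      version <- field(json, "version")
      size <- field(json, "size")
    } yield CheckpointMetaData(version, size, field(json, "parts").map(_.toInt))
}
```

A reader can use the recorded version to jump straight to the matching checkpoint file instead of listing the whole log directory.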


Author: 明惠

This article is content from the Yunqi Community (云栖社区) and may not be reproduced without permission.

A 770 m² Modern Light-Luxury Villa, Refined and Elegant

As Alain de Botton writes in The Architecture of Happiness, the elusive, authentic, creative, and spontaneous parts of our character are, to some degree, determined by the places we happen to be in.

The home we tirelessly pursue might look like this: it has unbounded views, is quiet yet rich, tasteful yet relaxed, and brings together lively, artistic, noble, and secluded elements to create a home that shows individuality and taste.

The project is a five-story detached villa entered from the first floor. The design gives equal weight to functionalism and artistic aesthetics, and distributes three functions (entertaining guests and friends, family gatherings, and private leisure for family members) through sensible spatial planning.

Welcoming Guests

The open first floor houses the living room, wine cellar, and billiard room together, and this open, interactive layout encourages emotional exchange. The pleasant atmosphere is like composing music: guided by a sense of beauty, a chance event becomes a theme and is written into the score of life. Just as a composer writes the main melody of a sonata, the themes of life keep recurring, replaying, being revised, and extending.

The sofas are arranged in an enclosing layout that defines the seating area and keeps it distinct from the leisure area sharing the same space. The designer oriented the seating area toward the garden, so the view outside can be enjoyed at leisure. This builds a good connection with the outdoors and gives the space a sense of boundless extension and strong cohesion.

In the living room, natural light and function are combined in an orderly way. Simple, streamlined forms leave room for the functions to interact, people and space are in harmony, and the absence of a main ceiling light adds layering to the space.

A full row of floor-to-ceiling cabinets pairs glass doors in narrow black frames with wood-veneer shelves. Together with concealed light strips, they ease any sense of confinement and enlarge the overall feeling of space.

In a noisy, restless world, the wine cellar may be the best place to look for a sense of belonging. Dark brown sets the tone, and the distinctive wine cabinet seems both to blend in and to stand apart, to divide the space and yet not divide it. The solid wood has its own fragrance, so the owner can experience a natural, unadorned atmosphere in the wine storage space, simple with a touch of retro.

The tasting area is more than a place to set down glasses and have an occasional drink; it also serves for wine tasting and leisure. Its bar counter is finished in pure white marble, which makes it easy to observe the color of the wine while tasting.

Art

The spiral staircase runs through the whole space like a large sculpture, linking the artworks in the hall of each floor and setting up a playful dialogue: evening comes from among the unfamiliar trees, like a path at night, quietly listening to the footsteps of memory.

As the bridge linking the upper and lower floors, the stair treads continue the use of marble, and the handrail is black woodwork. Set off by warm lighting, the black, white, and gold palette brings out the tension and release in the composition of the space.

The staircase has elegant lines and is paired with a cascading crystal chandelier. The meeting of metal and marble is like beautiful music offered for people to enjoy.

The art staircase uses shifting light and shadow to express an elegant mood. Up the staircase is the reception room for casual gatherings, where the social character of the space is more pronounced.

Around the Hearth

The reception room brings the quiet and calm of nature indoors and merges nature, space, and light and shadow into one. In the spirit of the classical line "those who talk and laugh here are learned scholars, and no one who comes and goes is unlettered", it creates a resort-style setting for receiving guests.

The overall palette is black and white, accented with blocks of blue and orange that add a few pops of color to the calm space. The feature wall unfolds generously from top to bottom, each section composed as carefully as a painting. A light fabric sofa, a single lounge chair and coffee table in Hermès orange, and a rug with a blurred, ink-wash pattern all show attention to detail and create a warm atmosphere.

The Banquet

Compared with the mild elegance of the living room, the colors and materials of the kitchen and dining area are a little fresher and livelier. The ceiling design extends a third spatial perspective and records each happy moment, so that both food and joy are doubled!

The semi-open kitchen is mainly in gray and white, bright and spotless. Here one can feel the warmth of everyday cooking and experience a real, lively, and welcoming living space.

The dining table is round, so that sitting around it creates a warm, harmonious family dining scene. White is the main tone, accented by gray dining chairs, which keeps the space unified. The pendant light emphasizes design and shows quality in its simplicity. This balance of comfort and aesthetics reflects the owner's preference for an elegant and beautiful way of life.

Quiet Splendor

The elders' room on the third floor is pared down to essentials yet keeps its artistic charm. In its clean beige-gray tones, one can set aside the anxiety and clutter of life and release one's own vision of beauty and art.

On a night of clear moon and gentle breeze, open the window and let the moonlight flow in quietly. A light, graceful charm and a fresh, understated mood settle naturally in the heart, which grows clear and soft in the moonlight. In that trance, one senses the many moving and lively moments of life.

Jade Blossom

In the daughter's room, pink adds a light touch of fashionable, new-elite character, and the strong contrast of several colors gives a highly individual, stylish feel. The dark brown floor and pure white feature wall make the pink look more refined, creating a reassuring atmosphere that is sweet without being cloying.

The modern, art-inspired pendant lamp beside the bed gives the space a soft yet refined character. With textured bedding and a thin blanket casually draped over it, everything in sight is of fine quality. The relaxed, lustrous feel seems to make the poetry a little purer and life a little freer.

Seclusion

The master suite occupies an entire floor. The deep, open master bedroom gives a pleasant feeling of breathing freely, and the spacious bedroom, bathroom, and leisure area give daily life in the master suite both order and dignity.

Balancing the opposites of more and less, light and heavy, soft and hard, the master bedroom continues the use of stone slabs while bringing in a variety of other elements. It conveys a view of happiness based on moderation in life and peace of mind: be inclusive, keep a steady mindset, let everything be just right, and what you gain will be just right too.

Set against marble throughout, the modern, minimalist luxury gains extra charm, and the elegant character is plain to see. The mirrors in the bathroom are used to good effect, visually enlarging the bathing area and making its clearly zoned spaces more convenient to use.

Courtyard

The design extends the open natural space to the outdoors. The courtyard fills the already bright and airy rooms with warm sunlight. With its lush planting, the vast sky of white clouds, and the endless fields in the distance, the scenery of the four seasons and the ideal life grow happily together here.