1. 版本控制基础概念

A version control system (or revision control system) is a system that tracks incremental versions (or revisions) of files and, in some cases, directories over time. Of course, merely tracking the various versions of a user's (or group of users') files and directories isn't very interesting in itself. What makes a version control system useful is the fact that it allows you to explore the changes which resulted in each of those versions and facilitates the arbitrary recall of the same.

In this section, we'll introduce some fairly high-level version control system components and concepts. We'll limit our discussion to modern version control systems—in today's interconnected world, there is very little point in acknowledging version control systems which cannot operate across wide-area networks.

1.1. 版本库

At the core of the version control system is a repository, which is the central store of that system's data. The repository usually stores information in the form of a filesystem tree—a hierarchy of files and directories. Any number of clients connect to the repository, and then read or write to these files. By writing data, a client makes the information available to others; by reading data, the client receives information from others. 图 1.1 “一个典型的客户/服务器系统” illustrates this.

图 1.1. 一个典型的客户/服务器系统

一个典型的客户/服务器系统

Why is this interesting? So far, this sounds like the definition of a typical file server. And indeed, the repository is a kind of file server, but it's not your usual breed. What makes the repository special is that as the files in the repository are changed, the repository remembers each version of those files.

When a client reads data from the repository, it normally sees only the latest version of the filesystem tree. But what makes a version control client interesting is that it also has the ability to request previous states of the filesystem from the repository. A version control client can ask historical questions such as What did this directory contain last Wednesday? and Who was the last person to change this file, and what changes did he make? These are the sorts of questions that are at the heart of any version control system.

1.2. 工作副本

A version control system's value comes from the fact that it tracks versions of files and directories, but the rest of the software universe doesn't operate on versions of files and directories. Most software programs understand how to operate only on a single version of a specific type of file. So how does a version control user interact with an abstract—and, often, remote—repository full of multiple versions of various files in a concrete fashion? How does his or her word processing software, presentation software, source code editor, web design software, or some other program—all of which trade in the currency of simple data files—get access to such files? The answer is found in the version control construct known as a working copy.

A working copy is, quite literally, a local copy of a particular version of a user's VCS-managed data upon which that user is free to work. Working copies[5] appear to other software just as any other local directory full of files, so those programs don't have to be version-control-aware in order to read from and write to that data. The task of managing the working copy and communicating changes made to its contents to and from the repository falls squarely to the version control system's client software.

1.3. 版本模型

If the primary mission of a version control system is to track the various versions of digital information over time, a very close secondary mission in any modern version control system is to enable collaborative editing and sharing of that data. But different systems use different strategies to achieve this. It's important to understand these different strategies, for a couple of reasons. First, it will help you compare and contrast existing version control systems, in case you encounter other systems similar to Subversion. Beyond that, it will also help you make more effective use of Subversion, since Subversion itself supports a couple of different ways of working.

1.3.1. 文件共享的问题

所有的版本控制系统都需要解决这样一个基础问题:怎样让系统允许用户共享信息,而不会让他们因意外而互相干扰?版本库里意外覆盖别人的更改非常的容易。

考虑图 1.2 “需要避免的问题”的情景。假设我们有两个共同工作者,Harry 和 Sally。他们想同时编辑版本库里的同一个文件,如果 Harry 先保存它的修改,(过了一会)Sally 可能凑巧用自己的版本覆盖了这些文件,Harry 的更改不会永远消失(因为系统记录了每次修改),但 Harry 所有的修改不会出现在 Sally 新版本的文件中,因为她没有在开始的时候看到 Harry 的修改。所以 Harry 的工作还是丢失了—至少是从最新的版本中丢失了—而且可能是意外的。这就是我们要明确避免的情况!

图 1.2. 需要避免的问题

需要避免的问题

1.3.2. “锁定-修改-解锁”方案

许多版本控制系统使用锁定-修改-解锁机制解决这种问题,在这样的模型里,在一个时间段里版本库的一个文件只允许被一个人修改。首先在修改之前,Harry要锁定住这个文件,锁定很像是从图书馆借一本书,如果Harry锁住这个文件,Sally不能做任何修改,如果Sally想请求得到一个锁,版本库会拒绝这个请求。在Harry结束编辑并且放开这个锁之前,她只可以阅读文件。Harry解锁后,就要换班了,Sally得到自己的轮换位置,锁定并且开始编辑这个文件。图 1.3 ““锁定-修改-解锁”方案”描述了这样的解决方案。

图 1.3. “锁定-修改-解锁”方案

“锁定-修改-解锁”方案

锁定-修改-解锁模型有一点问题就是限制太多,经常会成为用户的障碍:

  • 锁定可能导致管理问题。有时候 Harry 会锁住文件然后忘了此事,这就是说 Sally 一直等待解锁来编辑这些文件,她在这里僵住了。然后 Harry 去旅行了,现在 Sally 只好去找管理员放开锁,这种情况会导致不必要的耽搁和时间浪费。

  • 锁定可能导致不必要的串行开发。如果 Harry 编辑一个文件的开始,Sally 想编辑同一个文件的结尾,这种修改不会冲突,设想修改可以正确的合并到一起,他们可以轻松的并行工作而没有太多的坏处,没有必要让他们轮流工作。

  • 锁定可能导致错误的安全状态。假设 Harry 锁定和编辑一个文件 A,同时 Sally 锁定并编辑文件 B。但是如果 A 和 B 互相依赖,修改导致它们不兼容会怎么样呢?这样 A 和 B 不能正确的工作了,锁定机制对防止此类问题将无能为力—从而产生了一种处于安全状态的假相。很容易想象 Harry 和 Sally 都以为自己锁住了文件,而且从一个安全,孤立的情况开始工作,因而没有尽早发现他们不匹配的修改。锁定经常成为真正交流的替代品。

1.3.3. “拷贝-修改-合并”方案

Subversion, CVS, and many other version control systems use a copy-modify-merge model as an alternative to locking. In this model, each user's client contacts the project repository and creates a personal working copy. Users then work simultaneously and independently, modifying their private copies. Finally, the private copies are merged together into a new, final version. The version control system often assists with the merging, but ultimately, a human being is responsible for making it happen correctly.

这是一个例子,Harry 和 Sally 为同一个项目各自建立了一个工作副本,工作是并行的,修改了同一个文件 A,Sally 首先保存修改到版本库,当 Harry 想去提交修改的时候,版本库提示文件 A 已经过期,换句话说,A在他上次更新之后已经更改了,所以当他通过客户端请求合并版本库和他的工作副本之后,碰巧 Sally 的修改和他的不冲突,所以一旦他把所有的修改集成到一起,他可以将工作拷贝保存到版本库,图 1.4 ““拷贝-修改-合并”方案”图 1.5 ““拷贝-修改-合并”方案(续)”展示了这一过程。

图 1.4. “拷贝-修改-合并”方案

“拷贝-修改-合并”方案

图 1.5. “拷贝-修改-合并”方案(续)

“拷贝-修改-合并”方案(续)

但是如果 Sally 和 Harry 的修改交迭了该怎么办?这种情况叫做冲突,这通常不是个大问题,当 Harry 告诉他的客户端去合并版本库的最新修改到自己的工作副本时,他的文件 A 就会处于冲突状态:他可以看到一对冲突的修改集,并手工的选择保留一组修改。需要注意的是软件不能自动的解决冲突,只有人可以理解并作出智能的选择,一旦 Harry 手工的解决了冲突—也许需要与 Sally 讨论—它可以安全的把合并的文件保存到版本库。

拷贝-修改-合并模型感觉有一点混乱,但在实践中,通常运行的很平稳,用户可以并行的工作,不必等待别人,当工作在同一个文件上时,也很少会有交迭发生,冲突并不频繁,处理冲突的时间远比等待解锁花费的时间少。

最后,一切都要归结到一条重要的因素:用户交流。当用户交流贫乏,语法和语义的冲突就会增加,没有系统可以强制用户完美的交流,没有系统可以检测语义上的冲突,所以没有任何证据能够承诺锁定系统可以防止冲突,实践中,锁定除了约束了生产力,并没有做什么事。



[5] The term working copy can be generally applied to any one file version's local instance. When most folks use the term, though, they are referring to a whole directory tree containing files and subdirectories managed by the version control system.