IPFS — Content Addressed, Versioned, P2P File System (1)


2021-02-05 08:39:37

This article is a translation of "IPFS - Content Addressed, Versioned, P2P File System (DRAFT 3)". The translation is based primarily on the IPFS white paper, with adjustments made according to my own understanding.

Author: Juan Benet (juan@benet.ai)


The InterPlanetary File System (IPFS) is a peer-to-peer distributed file system that seeks to connect all computing devices with the same system of files. In some ways, IPFS is similar to the Web, but IPFS could be seen as a single BitTorrent swarm, exchanging objects within one Git repository. In other words, IPFS provides a high-throughput content-addressed block storage model, with content-addressed hyperlinks. This forms a generalized Merkle DAG, a data structure upon which one can build versioned file systems, blockchains, and even a Permanent Web. IPFS combines a distributed hash table, an incentivized block exchange, and a self-certifying namespace. IPFS has no single point of failure, and nodes do not need to trust each other.

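The abstract's key idea, content addressing, means a block's identifier is derived from the block's bytes rather than from its location. A minimal sketch of the principle (illustrative only; real IPFS CIDs add multihash and multibase prefixes on top of the raw digest):

```python
import hashlib

def content_address(data: bytes) -> str:
    """Derive a block's address from its content: here, a bare SHA-256
    hex digest. A sketch of the principle, not IPFS's actual CID format."""
    return hashlib.sha256(data).hexdigest()

block = b"hello, interplanetary file system"
addr = content_address(block)

# The same bytes always map to the same address, so any peer that
# receives a block can verify it by re-hashing, with no trust needed.
assert content_address(block) == addr
```

Because the address commits to the content, links between blocks become tamper-evident, which is what allows untrusted peers to serve each other's data.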


There have been many attempts at constructing a global
distributed file system. Some systems have seen significant success, and others failed completely. Among the academic attempts, AFS [6] has succeeded widely and is still in use today. Others [7, ?] have not attained the same success. Outside of academia, the most successful systems have been peer-to-peer file-sharing applications primarily geared toward large media (audio and video). Most notably, Napster, KaZaA, and BitTorrent [2] deployed large file distribution systems supporting over 100 million simultaneous users. Even today, BitTorrent maintains a massive deployment where tens of millions of nodes churn daily [16]. These applications saw greater numbers of users and files distributed than their academic file system counterparts. However, the applications were not designed as infrastructure to be built upon. While there have been successful repurposings[^1], no general file-system has emerged that offers global, low-latency, and decentralized distribution.


Perhaps this is because a “good enough” system for most use cases already exists: HTTP. By far, HTTP is the most successful “distributed system of files” ever deployed. Coupled with the browser, HTTP has had enormous technical and social impact. It has become the de facto way to transmit files across the internet. Yet, it fails to take advantage of dozens of brilliant file distribution techniques invented in the last fifteen years. From one perspective, evolving Web infrastructure is near-impossible, given the number of backwards compatibility constraints and the number of strong parties invested in the current model. But from another perspective, new protocols have emerged and gained wide use since the emergence of HTTP. What is lacking is upgrading design: enhancing the current HTTP web, and introducing new functionality without degrading user experience.


Industry has gotten away with using HTTP this long because moving small files around is relatively cheap, even for small organizations with lots of traffic. But we are entering a new era of data distribution with new challenges: (a) hosting and distributing petabyte datasets, (b) computing on large data across organizations, (c) high-volume high-definition on-demand or real-time media streams, (d) versioning and linking of massive datasets, (e) preventing accidental disappearance of important files, and more. Many of these can be boiled down to “lots of data, accessible everywhere.” Pressed by critical features and bandwidth concerns, we have already given up HTTP for different data distribution protocols. The next step is making them part of the Web itself.


Orthogonal to efficient data distribution, version control systems have managed to develop important data collaboration workflows. Git, the distributed source code version control system, developed many useful ways to model and implement distributed data operations. The Git toolchain offers versatile versioning functionality that large file distribution systems severely lack. New solutions inspired by Git are emerging, such as Camlistore [?], a personal file storage system, and Dat [?], a data collaboration toolchain and dataset package manager. Git has already influenced distributed filesystem design [9], as its content-addressed Merkle DAG data model enables powerful file distribution strategies. What remains to be explored is how this data structure can influence the design of high-throughput oriented file systems, and how it might upgrade the Web itself.

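The Merkle DAG model mentioned above links objects by the hash of their contents, so a parent's hash transitively commits to every descendant. A simplified sketch of Git/IPFS-style Merkle linking (the node encoding here is invented for illustration, not either system's real wire format):

```python
import hashlib
import json

def node_hash(data: bytes, links: list) -> str:
    """Hash of a DAG node = hash over its own payload plus the hashes
    of the nodes it links to. Links are sorted so the hash does not
    depend on link order. Simplified sketch, not a real object format."""
    payload = json.dumps({"data": data.decode(), "links": sorted(links)}).encode()
    return hashlib.sha256(payload).hexdigest()

# Two leaf blocks and a directory-like root that links to them by hash.
leaf_a = node_hash(b"file A contents", [])
leaf_b = node_hash(b"file B contents", [])
root = node_hash(b"dir", [leaf_a, leaf_b])

# Editing any leaf changes its hash, which changes the root hash:
# the structure is tamper-evident, and old roots remain valid versions.
edited_a = node_hash(b"file A contents (edited)", [])
assert node_hash(b"dir", [edited_a, leaf_b]) != root
```

This is the property that makes versioned file systems cheap to build on top of such a DAG: a new version is just a new root linking to mostly unchanged, deduplicated subtrees.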

This paper introduces IPFS, a novel peer-to-peer version-controlled filesystem seeking to reconcile these issues. IPFS synthesizes learnings from many past successful systems. Careful interface-focused integration yields a system greater than the sum of its parts. The central IPFS principle is modeling all data as part of the same Merkle DAG.
