Clustered file system

  1. Shared-disk file system

     Examples 

  2. Distributed file systems

     Design goals  History  Examples 

  3. Network-attached storage

  4. Design considerations

     Avoiding single point of failure  Performance  Concurrency 

  5. History

  6. See also

  7. References

  8. Further reading


A clustered file system is a file system that is shared by being simultaneously mounted on multiple servers. There are several approaches to clustering, most of which do not employ a clustered file system (only direct-attached storage for each node). Clustered file systems can provide features such as location-independent addressing and redundancy, which improve reliability or reduce the complexity of the other parts of the cluster. Parallel file systems are a type of clustered file system that spread data across multiple storage nodes, usually for redundancy or performance.[1]

Shared-disk file system

A shared-disk file system uses a storage area network (SAN) to allow multiple computers to gain direct disk access at the block level. Access control and the translation from the file-level operations that applications use to the block-level operations used by the SAN must take place on the client node. The shared-disk file system is the most common type of clustered file system. By adding mechanisms for concurrency control, it provides a consistent and serializable view of the file system, avoiding corruption and unintended data loss even when multiple clients try to access the same files at the same time. Shared-disk file systems commonly employ some sort of fencing mechanism to prevent data corruption in case of node failures, because an unfenced device can cause data corruption if it loses communication with its sister nodes and tries to access the same information other nodes are accessing.
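The fencing idea can be illustrated with a minimal sketch. It assumes a toy in-memory block device that keeps a set of fenced node identifiers and refuses I/O from them. The names `SharedDisk`, `FencedError`, `fence`, and the node identifiers are hypothetical and chosen only for illustration; real clusters typically fence through the storage fabric or by powering off the failed node.

```python
class FencedError(Exception):
    """Raised when a fenced node attempts block I/O (hypothetical name)."""


class SharedDisk:
    """Toy shared block device that refuses I/O from fenced nodes."""

    def __init__(self, num_blocks):
        self.blocks = [b""] * num_blocks
        self.fenced = set()

    def fence(self, node_id):
        # Assumption: surviving cluster members call this once a node
        # stops responding, before they take over its work.
        self.fenced.add(node_id)

    def unfence(self, node_id):
        # Called after the node has rejoined the cluster cleanly.
        self.fenced.discard(node_id)

    def _check(self, node_id):
        if node_id in self.fenced:
            raise FencedError(f"{node_id} is fenced")

    def write_block(self, node_id, index, data):
        self._check(node_id)
        self.blocks[index] = data

    def read_block(self, node_id, index):
        self._check(node_id)
        return self.blocks[index]


disk = SharedDisk(num_blocks=4)
disk.write_block("node-a", 0, b"journal entry 1")
disk.fence("node-b")  # node-b lost contact with its sister nodes
try:
    disk.write_block("node-b", 0, b"stale update")
except FencedError as err:
    print("rejected:", err)
```

Because the check sits in the device rather than in the node, a node that wrongly believes it is still a cluster member cannot overwrite blocks that other nodes are using.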

The underlying storage area network may use any of a number of block-level protocols, including SCSI, iSCSI, HyperSCSI, ATA over Ethernet (AoE), Fibre Channel, network block device, and InfiniBand.

There are different architectural approaches to a shared-disk file system. Some distribute file information across all the servers in a cluster (fully distributed).[2] Others use a centralized metadata server. Both achieve the same result of enabling all servers to access all the data on a shared storage device.[3]
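The two approaches can be contrasted with a short sketch. It assumes that "file information" means a mapping from a path to the server responsible for it. `CentralMetadataServer` and `hashed_owner` are hypothetical names, and hashing the path is only one possible way to spread that information across servers.

```python
import hashlib


class CentralMetadataServer:
    """Centralized approach: one lookup table answers every query."""

    def __init__(self):
        self._table = {}

    def register(self, path, server):
        self._table[path] = server

    def locate(self, path):
        return self._table[path]


def hashed_owner(path, servers):
    """Fully distributed approach: any node computes the owner locally."""
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(path.encode("utf-8")).digest()
    return servers[int.from_bytes(digest[:8], "big") % len(servers)]


servers = ["s1", "s2", "s3"]

central = CentralMetadataServer()
central.register("/data/a.bin", "s2")
print(central.locate("/data/a.bin"))

print(hashed_owner("/data/a.bin", servers))
```

The central table is simple to reason about but every lookup depends on one service. The hashed variant needs no lookup service, at the cost of remapping paths when the server list changes.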

Examples

  • Silicon Graphics (SGI) clustered file system (CXFS)
  • Veritas Cluster File System
  • DataPlow Nasan File System
  • IBM General Parallel File System (GPFS)
  • Lustre
  • Microsoft Cluster Shared Volumes (CSV)
  • Oracle Cluster File System (OCFS)
  • OpenVMS Files-11 File System
  • PolyServe storage solutions
  • Quantum StorNext File System (SNFS), formerly ADIC, formerly CentraVision File System (CVFS)
  • Blue Whale Clustered file system (BWFS)
  • Red Hat Global File System (GFS2)
  • Sun QFS
  • TerraScale Technologies TerraFS
  • Versity VSM
  • VMware VMFS
  • Apple Xsan
  • LizardFS

Distributed file systems

Distributed file systems do not share block-level access to the same storage but use a network protocol.[4][5] These are commonly known as network file systems, even though they are not the only file systems that use the network to send data.[6] Depending on how the protocol is designed, distributed file systems can restrict access to the file system through access lists or capabilities on both the servers and the clients.

The difference between a distributed file system and a distributed data store is that a distributed file system allows files to be accessed using the same interfaces and semantics as local files, for example mounting and unmounting, listing directories, reading and writing at byte boundaries, and the system's native permission model. Distributed data stores, by contrast, require using a different API or library and have different semantics (most often those of a database).[7]
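A small sketch may make the contrast concrete. It assumes a temporary local directory standing in for a mounted distributed file system, and a hypothetical `ToyObjectStore` with a whole-object `put`/`get` API standing in for a data store. Real data stores differ widely, so this shows only one possible shape of the difference.

```python
import os
import tempfile


class ToyObjectStore:
    """Hypothetical data store: values are read and written whole."""

    def __init__(self):
        self._objects = {}

    def put(self, key, value):
        self._objects[key] = bytes(value)

    def get(self, key):
        return self._objects[key]


def patch_file(path, offset, data):
    # File semantics: seek to a byte offset and overwrite in place.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def patch_object(store, key, offset, data):
    # Store semantics in this sketch: fetch, modify, write back whole.
    old = store.get(key)
    store.put(key, old[:offset] + data + old[offset + len(data):])


with tempfile.TemporaryDirectory() as mount_point:
    path = os.path.join(mount_point, "greeting.txt")
    with open(path, "wb") as f:
        f.write(b"hello world")
    patch_file(path, 6, b"WORLD")
    print(os.listdir(mount_point))  # ordinary directory listing works

store = ToyObjectStore()
store.put("greeting", b"hello world")
patch_object(store, "greeting", 6, b"WORLD")
```

An application written against `open`, `seek`, and `os.listdir` runs unchanged on a mounted distributed file system, whereas the store requires code written against its own API.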

A distributed file system may also be created by software implementing IBM's Distributed Data Management Architecture (DDM), in which programs running on one computer use local interfaces and semantics to create, manage and access files located on other networked computers. All such client requests are trapped and converted to equivalent messages defined by the DDM. Using protocols also defined by the DDM, these messages are transmitted to the specified remote computer on which a DDM server program interprets the messages and uses the file system interfaces of that computer to locate and interact with the specified file.

Design goals

Distributed file systems may aim for "transparency" in a number of aspects. That is, they aim to be "invisible" to client programs, which "see" a system which is similar to a local file system. Behind the scenes, the distributed file system handles locating files, transporting data, and potentially providing other features listed below.

  • Access transparency: clients are unaware that files are distributed and can access them in the same way as local files are accessed.
  • Location transparency: a consistent namespace exists encompassing local as well as remote files. The name of a file does not give its location.
  • Concurrency transparency: all clients have the same view of the state of the file system. This means that if one process is modifying a file, any other processes on the same system or remote systems that are accessing the files will see the modifications in a coherent manner.
  • Failure transparency: the client and client programs should operate correctly after a server failure.
  • Heterogeneity: file service should be provided across different hardware and operating system platforms.
  • Scalability: the file system should work well in small environments (1 machine, a dozen machines) and also scale gracefully to bigger ones (hundreds through tens of thousands of systems).
  • Replication transparency: clients should be unaware of the file replication performed across multiple servers to support scalability.
  • Migration transparency: files should be able to move between different servers without the client's knowledge.

History

The Incompatible Timesharing System used virtual devices for transparent inter-machine file system access in the 1960s. More file servers were developed in the 1970s. In 1976, Digital Equipment Corporation created the File Access Listener (FAL), an implementation of the Data Access Protocol as part of DECnet Phase II, which became the first widely used network file system. In 1985, Sun Microsystems created the file system called "Network File System" (NFS), which became the first widely used Internet Protocol based network file system.[5] Other notable network file systems are Andrew File System (AFS), Apple Filing Protocol (AFP), NetWare Core Protocol (NCP), and Server Message Block (SMB), which is also known as Common Internet File System (CIFS).

In 1986, IBM announced client and server support for Distributed Data Management Architecture (DDM) for the System/36, System/38, and IBM mainframe computers running CICS. This was followed by the support for IBM Personal Computer, AS/400, IBM mainframe computers under the MVS and VSE operating systems, and FlexOS. DDM also became the foundation for Distributed Relational Database Architecture, also known as DRDA.

There are many peer-to-peer network protocols for open-source distributed file systems for the cloud or closed-source clustered file systems, for example 9P, AFS, Coda, CIFS/SMB, DCE/DFS, Lustre, PanFS (https://www.panasas.com/panfs-architecture/panfs/), Google File System, Mnet, and Chord Project.

Examples

Main article: List of distributed file systems

  • Alluxio
  • BeeGFS (Fraunhofer)
  • Ceph (Inktank, Red Hat, SUSE)
  • Windows Distributed File System (DFS) (Microsoft)
  • Infinit
  • GfarmFS
  • GlusterFS (Red Hat)
  • GFS (Google Inc.)
  • HDFS (Apache Software Foundation)
  • IPFS
  • iRODS
  • LizardFS (Skytechnology)
  • MapR FS
  • MooseFS (Core Technology / Gemius)
  • ObjectiveFS
  • OneFS (EMC Isilon)
  • OpenIO
  • OrangeFS (Clemson University, Omnibond Systems), formerly Parallel Virtual File System
  • PanFS (Panasas)
  • Parallel Virtual File System (Clemson University, Argonne National Laboratory, Ohio Supercomputer Center)
  • RozoFS (Rozo Systems)
  • Torus (CoreOS)
  • XtreemFS

Network-attached storage

Main article: Network-attached storage

Network-attached storage (NAS) provides both storage and a file system, like a shared-disk file system on top of a storage area network (SAN). NAS typically uses file-based protocols (as opposed to the block-based protocols a SAN would use) such as NFS (popular on UNIX systems), SMB/CIFS (Server Message Block/Common Internet File System, used with MS Windows systems), AFP (used with Apple Macintosh computers), or NCP (used with OES and Novell NetWare).

Design considerations

Avoiding single point of failure

The failure of disk hardware or a given storage node in a cluster can create a single point of failure that can result in data loss or unavailability. Fault tolerance and high availability can be provided through data replication of one sort or another, so that data remains intact and available despite the failure of any single piece of equipment. For examples, see the lists of distributed fault-tolerant file systems and distributed parallel fault-tolerant file systems.
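As a minimal sketch of replication, the example below assumes a toy cluster in which every key is written to a fixed number of storage nodes and read from the first replica that is still alive. `StorageNode`, `ReplicatedStore`, and the placement rule are hypothetical. The sketch also omits the repair step a real system needs when a failed node returns with stale data.

```python
class StorageNode:
    """Toy storage node that can be marked as failed."""

    def __init__(self, name):
        self.name = name
        self.alive = True
        self.data = {}


class ReplicatedStore:
    """Writes each key to `copies` nodes and reads from any live replica."""

    def __init__(self, nodes, copies=2):
        if copies < 1 or copies > len(nodes):
            raise ValueError("copies must be between 1 and len(nodes)")
        self.nodes = nodes
        self.copies = copies

    def replicas(self, key):
        # Assumption: simple deterministic placement on consecutive nodes.
        start = sum(key.encode("utf-8")) % len(self.nodes)
        return [self.nodes[(start + i) % len(self.nodes)]
                for i in range(self.copies)]

    def write(self, key, value):
        for node in self.replicas(key):
            if node.alive:
                node.data[key] = value

    def read(self, key):
        for node in self.replicas(key):
            if node.alive and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [StorageNode(f"n{i}") for i in range(3)]
store = ReplicatedStore(nodes, copies=2)
store.write("report.txt", b"quarterly numbers")

store.replicas("report.txt")[0].alive = False  # one replica fails
print(store.read("report.txt"))                # still served by the other
```

With two copies the data survives the failure of any single node, which matches the single-failure guarantee described above. Losing both replicas of a key makes it unavailable.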

Performance

A common performance measurement of a clustered file system is the amount of time needed to satisfy service requests. In conventional systems, this time consists of a disk-access time and a small amount of CPU-processing time. But in a clustered file system, a remote access has additional overhead due to the distributed structure. This includes the time to deliver the request to a server, the time to deliver the response to the client, and for each direction, a CPU overhead of running the communication protocol software.
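The components listed above can be written as a simple additive model. The symbols are assumptions introduced here for illustration: T_disk and T_cpu are the disk-access and CPU-processing times of a conventional system, T_req and T_resp are the delivery times of the request and the response, and the two T_proto terms are the protocol-processing overheads in each direction.

```latex
\begin{align}
T_{\text{local}}  &= T_{\text{disk}} + T_{\text{cpu}} \\
T_{\text{remote}} &= T_{\text{local}} + T_{\text{req}} + T_{\text{resp}}
                     + T_{\text{proto}}^{\text{req}} + T_{\text{proto}}^{\text{resp}}
\end{align}
```

Under this model the remote overhead is the difference between the two, which does not depend on how long the disk access itself takes.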

Concurrency

Concurrency control becomes an issue when more than one person or client accesses the same file or block and wants to update it. Updates to the file from one client should not interfere with access and updates from other clients. This problem is more complex with file systems because of concurrent overlapping writes, where different writers write to overlapping regions of the file concurrently.[8] It is usually handled by concurrency control or locking, which may either be built into the file system or provided by an add-on protocol.
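One way to handle overlapping writes is byte-range locking. The sketch below assumes a single in-process lock table. `RangeLockManager` and its methods are hypothetical names, and a real clustered file system would coordinate such locks across nodes, for example through a distributed lock manager.

```python
import threading


class RangeLockManager:
    """Toy byte-range lock table; ranges are half-open [start, end)."""

    def __init__(self):
        self._mutex = threading.Lock()
        self._held = {}  # path -> list of (start, end, owner)

    def try_lock(self, path, start, length, owner):
        end = start + length
        with self._mutex:
            for held_start, held_end, held_owner in self._held.get(path, []):
                overlaps = held_start < end and start < held_end
                if overlaps and held_owner != owner:
                    return False  # another writer holds part of the range
            self._held.setdefault(path, []).append((start, end, owner))
            return True

    def unlock(self, path, start, length, owner):
        entry = (start, start + length, owner)
        with self._mutex:
            ranges = self._held.get(path, [])
            if entry in ranges:
                ranges.remove(entry)


locks = RangeLockManager()
print(locks.try_lock("/shared/log", 0, 100, "client-a"))    # granted
print(locks.try_lock("/shared/log", 50, 100, "client-b"))   # overlaps
print(locks.try_lock("/shared/log", 100, 50, "client-b"))   # disjoint
```

Writers to disjoint regions proceed in parallel, while a writer whose range overlaps a held lock must wait or retry, so overlapping updates are serialized.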

History

IBM mainframes in the 1970s could share physical disks and file systems if each machine had its own channel connection to the drives' control units. In the 1980s, Digital Equipment Corporation's TOPS-20 and OpenVMS clusters (VAX/ALPHA/IA64) included shared disk file systems.[9]

See also

  • Network-attached storage
  • Storage area network
  • Shared resource
  • Direct-attached storage
  • Peer-to-peer file sharing
  • Disk sharing
  • Distributed data store
  • Distributed file system for cloud
  • Global file system
  • Gopher (protocol)
  • List of distributed file systems
  • CacheFS
  • RAID

References

1. Saify, Amina; Kochhar, Garima; Hsieh, Jenwei; Celebioglu, Onur (May 2005). "Enhancing High-Performance Computing Clusters with Parallel File Systems". Dell Power Solutions. Dell Inc. http://i.dell.com/sites/doccontent/business/solutions/power/de/Documents/ps2q05-20040179-Saify-OE_de.pdf. Retrieved 6 March 2019.
2. Mokadem, Riad; Litwin, Witold; Schwarz, Thomas (2006). "Disk Backup Through Algebraic Signatures in Scalable Distributed Data Structures" (PDF). DEXA 2006 Springer. https://www.lamsade.dauphine.fr/~litwin/cours98/Doc-cours-clouds/BackupAlgSign2006.pdf. Retrieved 8 June 2006.
3. Periasamy, Anand Babu; Collins, Eli; Darcy, Jeff; Farnum, Gregory (April 2013). "Gluster: What are advantages of using centralized metadata server ( such as namenode in HDFS ) compared to elastic hashing algorithm used in GlusterFS for a File System?". Quora. Quora Inc. https://www.quora.com/Gluster-What-are-advantages-of-using-centralized-metadata-server-such-as-namenode-in-HDFS-compared-to-elastic-hashing-algorithm-used-in-GlusterFS-for-a-File-System. Retrieved 6 March 2019.
4. Silberschatz, Abraham; Galvin, Peter; Gagne, Greg (2009). "Operating System Concepts, 8th Edition" (PDF). University of Babylon. John Wiley & Sons, Inc. pp. 705-725. http://www.uobabylon.edu.iq/download/M.S%202013-2014/Operating_System_Concepts,_8th_Edition%5BA4%5D.pdf. Retrieved 4 March 2019.
5. Arpaci-Dusseau, Remzi H.; Arpaci-Dusseau, Andrea C. (2014). Sun's Network File System. Arpaci-Dusseau Books. http://pages.cs.wisc.edu/~remzi/OSTEP/dist-nfs.pdf.
6. Sandberg, Russel (1986). "The Sun Network Filesystem: Design, Implementation and Experience". Proceedings of the Summer 1986 USENIX Technical Conference and Exhibition. Sun Microsystems, Inc. https://cse.buffalo.edu/faculty/tkosar/cse710_spring13/papers/nfs.pdf. Retrieved 6 March 2019. Quote: "NFS was designed to simplify the sharing of filesystem resources in a network of non-homogeneous machines."
7. Sobh, Tarek (2008). Advances in Computer and Information Sciences and Engineering. Springer Science & Business Media. pp. 423-440.
8. Pessach, Yaniv (2013). Distributed Storage: Concepts, Algorithms, and Implementations. ISBN 978-1482561043.
9. Murphy, Dan (1996). "Origins and Development of TOPS-20". Dan Murphy. Section "Ambitious Plans for Jupiter". http://tenex.opost.com/hbook.html. Retrieved 6 March 2019. Quote: "Ultimately, both VMS and TOPS-20 shipped this kind of capability."

Further reading

  • A Taxonomy of Distributed Storage Systems
  • A Taxonomy and Survey on Distributed File Systems (https://web.archive.org/web/20131224120756/http://trac.nchc.org.tw/grid/raw-attachment/wiki/jazz/09-05-22/A_Taxonomy_and_Survey_on_Distributed_File_Systems.pdf)
  • A survey of distributed file systems
  • The Evolution of File Systems