|  | ========================== | 
|  | General Filesystem Caching | 
|  | ========================== | 
|  |  | 
|  | ======== | 
|  | OVERVIEW | 
|  | ======== | 
|  |  | 
|  | This facility is a general purpose cache for network filesystems, though it | 
|  | could be used for caching other things such as ISO9660 filesystems too. | 
|  |  | 
|  | FS-Cache mediates between cache backends (such as CacheFS) and network | 
|  | filesystems: | 
|  |  | 
|  | +---------+ | 
|  | |         |                        +--------------+ | 
|  | |   NFS   |--+                     |              | | 
|  | |         |  |                 +-->|   CacheFS    | | 
|  | +---------+  |   +----------+  |   |  /dev/hda5   | | 
|  | |   |          |  |   +--------------+ | 
|  | +---------+  +-->|          |  | | 
|  | |         |      |          |--+ | 
|  | |   AFS   |----->| FS-Cache | | 
|  | |         |      |          |--+ | 
|  | +---------+  +-->|          |  | | 
|  | |   |          |  |   +--------------+ | 
|  | +---------+  |   +----------+  |   |              | | 
|  | |         |  |                 +-->|  CacheFiles  | | 
|  | |  ISOFS  |--+                     |  /var/cache  | | 
|  | |         |                        +--------------+ | 
|  | +---------+ | 
|  |  | 
|  | Or to look at it another way, FS-Cache is a module that provides a caching | 
|  | facility to a network filesystem such that the cache is transparent to the | 
|  | user: | 
|  |  | 
|  | +---------+ | 
|  | |         | | 
|  | | Server  | | 
|  | |         | | 
|  | +---------+ | 
|  | |                  NETWORK | 
|  | ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ | 
|  | | | 
|  | |           +----------+ | 
|  | V           |          | | 
|  | +---------+      |          | | 
|  | |         |      |          | | 
|  | |   NFS   |----->| FS-Cache | | 
|  | |         |      |          |--+ | 
|  | +---------+      |          |  |   +--------------+   +--------------+ | 
|  | |           |          |  |   |              |   |              | | 
|  | V           +----------+  +-->|  CacheFiles  |-->|  Ext3        | | 
|  | +---------+                        |  /var/cache  |   |  /dev/sda6   | | 
|  | |         |                        +--------------+   +--------------+ | 
|  | |   VFS   |                                ^                     ^ | 
|  | |         |                                |                     | | 
|  | +---------+                                +--------------+      | | 
|  | |                  KERNEL SPACE                      |      | | 
|  | ~~~~~|~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~|~~~~~~|~~~~ | 
|  | |                  USER SPACE                        |      | | 
|  | V                                                    |      | | 
|  | +---------+                                           +--------------+ | 
|  | |         |                                           |              | | 
|  | | Process |                                           | cachefilesd  | | 
|  | |         |                                           |              | | 
|  | +---------+                                           +--------------+ | 
|  |  | 
|  |  | 
|  | FS-Cache does not follow the idea of completely loading every netfs file | 
|  | opened in its entirety into a cache before permitting it to be accessed and | 
|  | then serving the pages out of that cache rather than the netfs inode because: | 
|  |  | 
|  | (1) It must be practical to operate without a cache. | 
|  |  | 
|  | (2) The size of any accessible file must not be limited to the size of the | 
|  | cache. | 
|  |  | 
|  | (3) The combined size of all opened files (this includes mapped libraries) | 
|  | must not be limited to the size of the cache. | 
|  |  | 
|  | (4) The user should not be forced to download an entire file just to do a | 
|  | one-off access of a small portion of it (such as might be done with the | 
|  | "file" program). | 
|  |  | 
|  | It instead serves the cache out in PAGE_SIZE chunks as and when requested by | 
|  | the netfs('s) using it. | 
|  |  | 
|  |  | 
|  | FS-Cache provides the following facilities: | 
|  |  | 
|  | (1) More than one cache can be used at once.  Caches can be selected | 
|  | explicitly by use of tags. | 
|  |  | 
|  | (2) Caches can be added / removed at any time. | 
|  |  | 
|  | (3) The netfs is provided with an interface that allows either party to | 
|  | withdraw caching facilities from a file (required for (2)). | 
|  |  | 
|  | (4) The interface to the netfs returns as few errors as possible, preferring | 
|  | rather to let the netfs remain oblivious. | 
|  |  | 
|  | (5) Cookies are used to represent indices, files and other objects to the | 
|  | netfs.  The simplest cookie is just a NULL pointer - indicating nothing | 
|  | cached there. | 
|  |  | 
|  | (6) The netfs is allowed to propose - dynamically - any index hierarchy it | 
|  | desires, though it must be aware that the index search function is | 
|  | recursive, stack space is limited, and indices can only be children of | 
|  | indices. | 
|  |  | 
|  | (7) Data I/O is done direct to and from the netfs's pages.  The netfs | 
|  | indicates that page A is at index B of the data-file represented by cookie | 
|  | C, and that it should be read or written.  The cache backend may or may | 
|  | not start I/O on that page, but if it does, a netfs callback will be | 
|  | invoked to indicate completion.  The I/O may be either synchronous or | 
|  | asynchronous. | 
|  |  | 
|  | (8) Cookies can be "retired" upon release.  At this point FS-Cache will mark | 
|  | them as obsolete and the index hierarchy rooted at that point will get | 
|  | recycled. | 
|  |  | 
|  | (9) The netfs provides a "match" function for index searches.  In addition to | 
|  | saying whether a match was made or not, this can also specify that an | 
|  | entry should be updated or deleted. | 
|  |  | 
|  | (10) As much as possible is done asynchronously. | 
|  |  | 
|  |  | 
|  | FS-Cache maintains a virtual indexing tree in which all indices, files, objects | 
|  | and pages are kept.  Bits of this tree may actually reside in one or more | 
|  | caches. | 
|  |  | 
|  | FSDEF | 
|  | | | 
|  | +------------------------------------+ | 
|  | |                                    | | 
|  | NFS                                  AFS | 
|  | |                                    | | 
|  | +--------------------------+                +-----------+ | 
|  | |                          |                |           | | 
|  | homedir                     mirror          afs.org   redhat.com | 
|  | |                          |                            | | 
|  | +------------+           +---------------+              +----------+ | 
|  | |            |           |               |              |          | | 
|  | 00001        00002       00007           00125        vol00001   vol00002 | 
|  | |            |           |               |                         | | 
|  | +---+---+     +-----+      +---+      +------+------+            +-----+----+ | 
|  | |   |   |     |     |      |   |      |      |      |            |     |    | | 
|  | PG0 PG1 PG2   PG0  XATTR   PG0 PG1   DIRENT DIRENT DIRENT        R/W   R/O  Bak | 
|  | |                                            | | 
|  | PG0                                       +-------+ | 
|  | |       | | 
|  | 00001   00003 | 
|  | | | 
|  | +---+---+ | 
|  | |   |   | | 
|  | PG0 PG1 PG2 | 
|  |  | 
|  | In the example above, you can see two netfs's being backed: NFS and AFS.  These | 
|  | have different index hierarchies: | 
|  |  | 
|  | (*) The NFS primary index contains per-server indices.  Each server index is | 
|  | indexed by NFS file handles to get data file objects.  Each data file | 
|  | objects can have an array of pages, but may also have further child | 
|  | objects, such as extended attributes and directory entries.  Extended | 
|  | attribute objects themselves have page-array contents. | 
|  |  | 
|  | (*) The AFS primary index contains per-cell indices.  Each cell index contains | 
|  | per-logical-volume indices.  Each of volume index contains up to three | 
|  | indices for the read-write, read-only and backup mirrors of those volumes. | 
|  | Each of these contains vnode data file objects, each of which contains an | 
|  | array of pages. | 
|  |  | 
|  | The very top index is the FS-Cache master index in which individual netfs's | 
|  | have entries. | 
|  |  | 
|  | Any index object may reside in more than one cache, provided it only has index | 
|  | children.  Any index with non-index object children will be assumed to only | 
|  | reside in one cache. | 
|  |  | 
|  |  | 
|  | The netfs API to FS-Cache can be found in: | 
|  |  | 
|  | Documentation/filesystems/caching/netfs-api.txt | 
|  |  | 
|  | The cache backend API to FS-Cache can be found in: | 
|  |  | 
|  | Documentation/filesystems/caching/backend-api.txt | 
|  |  | 
|  | A description of the internal representations and object state machine can be | 
|  | found in: | 
|  |  | 
|  | Documentation/filesystems/caching/object.txt | 
|  |  | 
|  |  | 
|  | ======================= | 
|  | STATISTICAL INFORMATION | 
|  | ======================= | 
|  |  | 
|  | If FS-Cache is compiled with the following options enabled: | 
|  |  | 
|  | CONFIG_FSCACHE_STATS=y | 
|  | CONFIG_FSCACHE_HISTOGRAM=y | 
|  |  | 
|  | then it will gather certain statistics and display them through a number of | 
|  | proc files. | 
|  |  | 
|  | (*) /proc/fs/fscache/stats | 
|  |  | 
|  | This shows counts of a number of events that can happen in FS-Cache: | 
|  |  | 
|  | CLASS	EVENT	MEANING | 
|  | =======	=======	======================================================= | 
|  | Cookies	idx=N	Number of index cookies allocated | 
|  | dat=N	Number of data storage cookies allocated | 
|  | spc=N	Number of special cookies allocated | 
|  | Objects	alc=N	Number of objects allocated | 
|  | nal=N	Number of object allocation failures | 
|  | avl=N	Number of objects that reached the available state | 
|  | ded=N	Number of objects that reached the dead state | 
|  | ChkAux	non=N	Number of objects that didn't have a coherency check | 
|  | ok=N	Number of objects that passed a coherency check | 
|  | upd=N	Number of objects that needed a coherency data update | 
|  | obs=N	Number of objects that were declared obsolete | 
|  | Pages	mrk=N	Number of pages marked as being cached | 
|  | unc=N	Number of uncache page requests seen | 
|  | Acquire	n=N	Number of acquire cookie requests seen | 
|  | nul=N	Number of acq reqs given a NULL parent | 
|  | noc=N	Number of acq reqs rejected due to no cache available | 
|  | ok=N	Number of acq reqs succeeded | 
|  | nbf=N	Number of acq reqs rejected due to error | 
|  | oom=N	Number of acq reqs failed on ENOMEM | 
|  | Lookups	n=N	Number of lookup calls made on cache backends | 
|  | neg=N	Number of negative lookups made | 
|  | pos=N	Number of positive lookups made | 
|  | crt=N	Number of objects created by lookup | 
|  | tmo=N	Number of lookups timed out and requeued | 
|  | Updates	n=N	Number of update cookie requests seen | 
|  | nul=N	Number of upd reqs given a NULL parent | 
|  | run=N	Number of upd reqs granted CPU time | 
|  | Relinqs	n=N	Number of relinquish cookie requests seen | 
|  | nul=N	Number of rlq reqs given a NULL parent | 
|  | wcr=N	Number of rlq reqs waited on completion of creation | 
|  | AttrChg	n=N	Number of attribute changed requests seen | 
|  | ok=N	Number of attr changed requests queued | 
|  | nbf=N	Number of attr changed rejected -ENOBUFS | 
|  | oom=N	Number of attr changed failed -ENOMEM | 
|  | run=N	Number of attr changed ops given CPU time | 
|  | Allocs	n=N	Number of allocation requests seen | 
|  | ok=N	Number of successful alloc reqs | 
|  | wt=N	Number of alloc reqs that waited on lookup completion | 
|  | nbf=N	Number of alloc reqs rejected -ENOBUFS | 
|  | int=N	Number of alloc reqs aborted -ERESTARTSYS | 
|  | ops=N	Number of alloc reqs submitted | 
|  | owt=N	Number of alloc reqs waited for CPU time | 
|  | abt=N	Number of alloc reqs aborted due to object death | 
|  | Retrvls	n=N	Number of retrieval (read) requests seen | 
|  | ok=N	Number of successful retr reqs | 
|  | wt=N	Number of retr reqs that waited on lookup completion | 
|  | nod=N	Number of retr reqs returned -ENODATA | 
|  | nbf=N	Number of retr reqs rejected -ENOBUFS | 
|  | int=N	Number of retr reqs aborted -ERESTARTSYS | 
|  | oom=N	Number of retr reqs failed -ENOMEM | 
|  | ops=N	Number of retr reqs submitted | 
|  | owt=N	Number of retr reqs waited for CPU time | 
|  | abt=N	Number of retr reqs aborted due to object death | 
|  | Stores	n=N	Number of storage (write) requests seen | 
|  | ok=N	Number of successful store reqs | 
|  | agn=N	Number of store reqs on a page already pending storage | 
|  | nbf=N	Number of store reqs rejected -ENOBUFS | 
|  | oom=N	Number of store reqs failed -ENOMEM | 
|  | ops=N	Number of store reqs submitted | 
|  | run=N	Number of store reqs granted CPU time | 
|  | pgs=N	Number of pages given store req processing time | 
|  | rxd=N	Number of store reqs deleted from tracking tree | 
|  | olm=N	Number of store reqs over store limit | 
|  | VmScan	nos=N	Number of release reqs against pages with no pending store | 
|  | gon=N	Number of release reqs against pages stored by time lock granted | 
|  | bsy=N	Number of release reqs ignored due to in-progress store | 
|  | can=N	Number of page stores cancelled due to release req | 
|  | Ops	pend=N	Number of times async ops added to pending queues | 
|  | run=N	Number of times async ops given CPU time | 
|  | enq=N	Number of times async ops queued for processing | 
|  | can=N	Number of async ops cancelled | 
|  | rej=N	Number of async ops rejected due to object lookup/create failure | 
|  | ini=N	Number of async ops initialised | 
|  | dfr=N	Number of async ops queued for deferred release | 
|  | rel=N	Number of async ops released (should equal ini=N when idle) | 
|  | gc=N	Number of deferred-release async ops garbage collected | 
|  | CacheOp	alo=N	Number of in-progress alloc_object() cache ops | 
|  | luo=N	Number of in-progress lookup_object() cache ops | 
|  | luc=N	Number of in-progress lookup_complete() cache ops | 
|  | gro=N	Number of in-progress grab_object() cache ops | 
|  | upo=N	Number of in-progress update_object() cache ops | 
|  | dro=N	Number of in-progress drop_object() cache ops | 
|  | pto=N	Number of in-progress put_object() cache ops | 
|  | syn=N	Number of in-progress sync_cache() cache ops | 
|  | atc=N	Number of in-progress attr_changed() cache ops | 
|  | rap=N	Number of in-progress read_or_alloc_page() cache ops | 
|  | ras=N	Number of in-progress read_or_alloc_pages() cache ops | 
|  | alp=N	Number of in-progress allocate_page() cache ops | 
|  | als=N	Number of in-progress allocate_pages() cache ops | 
|  | wrp=N	Number of in-progress write_page() cache ops | 
|  | ucp=N	Number of in-progress uncache_page() cache ops | 
|  | dsp=N	Number of in-progress dissociate_pages() cache ops | 
|  | CacheEv	nsp=N	Number of object lookups/creations rejected due to lack of space | 
|  | stl=N	Number of stale objects deleted | 
|  | rtr=N	Number of objects retired when relinquished | 
|  | cul=N	Number of objects culled | 
|  |  | 
|  |  | 
|  | (*) /proc/fs/fscache/histogram | 
|  |  | 
|  | cat /proc/fs/fscache/histogram | 
|  | JIFS  SECS  OBJ INST  OP RUNS   OBJ RUNS  RETRV DLY RETRIEVLS | 
|  | ===== ===== ========= ========= ========= ========= ========= | 
|  |  | 
|  | This shows the breakdown of the number of times each amount of time | 
|  | between 0 jiffies and HZ-1 jiffies a variety of tasks took to run.  The | 
|  | columns are as follows: | 
|  |  | 
|  | COLUMN		TIME MEASUREMENT | 
|  | =======		======================================================= | 
|  | OBJ INST	Length of time to instantiate an object | 
|  | OP RUNS		Length of time a call to process an operation took | 
|  | OBJ RUNS	Length of time a call to process an object event took | 
|  | RETRV DLY	Time between an requesting a read and lookup completing | 
|  | RETRIEVLS	Time between beginning and end of a retrieval | 
|  |  | 
|  | Each row shows the number of events that took a particular range of times. | 
|  | Each step is 1 jiffy in size.  The JIFS column indicates the particular | 
|  | jiffy range covered, and the SECS field the equivalent number of seconds. | 
|  |  | 
|  |  | 
|  | =========== | 
|  | OBJECT LIST | 
|  | =========== | 
|  |  | 
|  | If CONFIG_FSCACHE_OBJECT_LIST is enabled, the FS-Cache facility will maintain a | 
|  | list of all the objects currently allocated and allow them to be viewed | 
|  | through: | 
|  |  | 
|  | /proc/fs/fscache/objects | 
|  |  | 
|  | This will look something like: | 
|  |  | 
|  | [root@andromeda ~]# head /proc/fs/fscache/objects | 
|  | OBJECT   PARENT   STAT CHLDN OPS OOP IPR EX READS EM EV F S | NETFS_COOKIE_DEF TY FL NETFS_DATA       OBJECT_KEY, AUX_DATA | 
|  | ======== ======== ==== ===== === === === == ===== == == = = | ================ == == ================ ================ | 
|  | 17e4b        2 ACTV     0   0   0   0  0     0 7b  4 0 0 | NFS.fh           DT  0 ffff88001dd82820 010006017edcf8bbc93b43298fdfbe71e50b57b13a172c0117f38472, e567634700000000000000000000000063f2404a000000000000000000000000c9030000000000000000000063f2404a | 
|  | 1693a        2 ACTV     0   0   0   0  0     0 7b  4 0 0 | NFS.fh           DT  0 ffff88002db23380 010006017edcf8bbc93b43298fdfbe71e50b57b1e0162c01a2df0ea6, 420ebc4a000000000000000000000000420ebc4a0000000000000000000000000e1801000000000000000000420ebc4a | 
|  |  | 
|  | where the first set of columns before the '|' describe the object: | 
|  |  | 
|  | COLUMN	DESCRIPTION | 
|  | =======	=============================================================== | 
|  | OBJECT	Object debugging ID (appears as OBJ%x in some debug messages) | 
|  | PARENT	Debugging ID of parent object | 
|  | STAT	Object state | 
|  | CHLDN	Number of child objects of this object | 
|  | OPS	Number of outstanding operations on this object | 
|  | OOP	Number of outstanding child object management operations | 
|  | IPR | 
|  | EX	Number of outstanding exclusive operations | 
|  | READS	Number of outstanding read operations | 
|  | EM	Object's event mask | 
|  | EV	Events raised on this object | 
|  | F	Object flags | 
|  | S	Object work item busy state mask (1:pending 2:running) | 
|  |  | 
|  | and the second set of columns describe the object's cookie, if present: | 
|  |  | 
|  | COLUMN		DESCRIPTION | 
|  | ===============	======================================================= | 
|  | NETFS_COOKIE_DEF Name of netfs cookie definition | 
|  | TY		Cookie type (IX - index, DT - data, hex - special) | 
|  | FL		Cookie flags | 
|  | NETFS_DATA	Netfs private data stored in the cookie | 
|  | OBJECT_KEY	Object key	} 1 column, with separating comma | 
|  | AUX_DATA	Object aux data	} presence may be configured | 
|  |  | 
|  | The data shown may be filtered by attaching the a key to an appropriate keyring | 
|  | before viewing the file.  Something like: | 
|  |  | 
|  | keyctl add user fscache:objlist <restrictions> @s | 
|  |  | 
|  | where <restrictions> are a selection of the following letters: | 
|  |  | 
|  | K	Show hexdump of object key (don't show if not given) | 
|  | A	Show hexdump of object aux data (don't show if not given) | 
|  |  | 
|  | and the following paired letters: | 
|  |  | 
|  | C	Show objects that have a cookie | 
|  | c	Show objects that don't have a cookie | 
|  | B	Show objects that are busy | 
|  | b	Show objects that aren't busy | 
|  | W	Show objects that have pending writes | 
|  | w	Show objects that don't have pending writes | 
|  | R	Show objects that have outstanding reads | 
|  | r	Show objects that don't have outstanding reads | 
|  | S	Show objects that have work queued | 
|  | s	Show objects that don't have work queued | 
|  |  | 
|  | If neither side of a letter pair is given, then both are implied.  For example: | 
|  |  | 
|  | keyctl add user fscache:objlist KB @s | 
|  |  | 
|  | shows objects that are busy, and lists their object keys, but does not dump | 
|  | their auxiliary data.  It also implies "CcWwRrSs", but as 'B' is given, 'b' is | 
|  | not implied. | 
|  |  | 
|  | By default all objects and all fields will be shown. | 
|  |  | 
|  |  | 
|  | ========= | 
|  | DEBUGGING | 
|  | ========= | 
|  |  | 
|  | If CONFIG_FSCACHE_DEBUG is enabled, the FS-Cache facility can have runtime | 
|  | debugging enabled by adjusting the value in: | 
|  |  | 
|  | /sys/module/fscache/parameters/debug | 
|  |  | 
|  | This is a bitmask of debugging streams to enable: | 
|  |  | 
|  | BIT	VALUE	STREAM				POINT | 
|  | =======	=======	===============================	======================= | 
|  | 0	1	Cache management		Function entry trace | 
|  | 1	2					Function exit trace | 
|  | 2	4					General | 
|  | 3	8	Cookie management		Function entry trace | 
|  | 4	16					Function exit trace | 
|  | 5	32					General | 
|  | 6	64	Page handling			Function entry trace | 
|  | 7	128					Function exit trace | 
|  | 8	256					General | 
|  | 9	512	Operation management		Function entry trace | 
|  | 10	1024					Function exit trace | 
|  | 11	2048					General | 
|  |  | 
|  | The appropriate set of values should be OR'd together and the result written to | 
|  | the control file.  For example: | 
|  |  | 
|  | echo $((1|8|64)) >/sys/module/fscache/parameters/debug | 
|  |  | 
|  | will turn on all function entry debugging. |