turbonfs/inc/fcsm.h:

#ifndef __FCSM_H__
#define __FCSM_H__

#include <queue>

#include "aznfsc.h"

// Forward declarations.
struct nfs_inode;
struct nfs_client;
struct rpc_task;

namespace aznfsc {

/**
 * This is the flush-commit state machine responsible for ensuring cached file
 * data is properly synced with the backend blob, with the following goals:
 * - Keep dirty/uncommitted data in check as per configured limits.
 *   This is to make sure we flush dirty data and commit uncommitted data at
 *   the appropriate time so that it doesn't grow beyond configured limits,
 *   and also because flushing and committing at regular intervals results in
 *   better utilization of storage bandwidth. There are two sub-goals when
 *   honoring local limits for unstable writes:
 *   - Flush in units of full block size.
 *   - Flush multiple blocks in parallel to maximize storage throughput.
 *   - Spread out storage IOs uniformly for better throughput.
 * - Keep global memory pressure in check as per configured limits.
 *   Note that global memory pressure can force premature flush (resulting in
 *   smaller blocks) and premature commit (resulting in more PBL calls), but
 *   when under high memory pressure we have no option.
 *
 * This follows a declarative model where the caller calls ensure_flush() and
 * ensure_commit() to tell the state machine to flush and commit at the next
 * opportune time. These two are the state machine event handlers called by
 * the fuse threads executing application writes. The other two event handlers
 * are called by libnfs threads when a flush or commit completes; those are:
 * - on_flush_complete()
 * - on_commit_complete()
 *
 * The state machine starts in idle state till one of the ensure handlers is
 * called. If the ensure handler finds out that it has to flush or commit and
 * the state machine is not running, it kicks off the state machine by sending
 * a flush/write or commit RPC request. Any other fuse thread which comes in
 * the meantime and has to flush or commit finds out that the state machine
 * is already running, so it just queues a flush/commit job (fctgt) expressing
 * the flush/commit goal. When the ongoing flush or commit completes, the
 * callback (libnfs thread) checks the queue and finds out that more flush
 * and/or commit needs to be done, so it issues the required flush/commit
 * request, keeping the state machine moving, till the flush/commit callback
 * finds no fctgt queued, at which point the state machine goes back to idle
 * state, to be kicked again when new ensure calls are made.
 *
 * Note: fcsm doesn't introduce new locks; various members of fcsm are
 * protected by flush_lock.
 */
#define FCSM_MAGIC *((const uint32_t *)"FCSM")

class fcsm
{
public:
    const uint32_t magic = FCSM_MAGIC;

    /**
     * Initialize fcsm.
     * nfs_client is for convenience, nfs_inode identifies the target file.
     */
    fcsm(struct nfs_client *_client, struct nfs_inode *_inode);
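    /*
     * Illustrative sketch (not compiled): how a fuse thread might drive the
     * state machine declaratively. The get_fcsm() accessor, the
     * flush_lock()/flush_unlock() helpers on nfs_inode and the sleep-based
     * wait are assumptions for illustration only, not part of this header.
     */
#if 0
    void example_flush_all(struct nfs_inode *inode)
    {
        std::atomic<bool> done = false;

        // Caller MUST hold the flush_lock across the ensure_*() call.
        inode->flush_lock();
        // Declare the goal: flush and commit *all* dirty data. We pass
        // 'done' (and no task), so only one of the two is non-null.
        inode->get_fcsm()->ensure_commit(0 /* write_off */,
                                         0 /* write_len */,
                                         nullptr /* task */,
                                         &done,
                                         true /* commit_full */);
        inode->flush_unlock();

        // Since 'done' was passed, caller must wait for the state machine
        // to set it to true, in a loop, as completion callbacks progress.
        while (!done) {
            ::usleep(1000);
        }
    }
#endif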
    /**
     * Ensure *all* dirty bytes are flushed or scheduled for flushing (if a
     * flush or commit is already ongoing).
     * If the state machine is currently not running, it'll kick off the state
     * machine by calling sync_membufs(), else it'll add a new flush target to
     * ftgtq, which will be run by on_flush_complete() when the ongoing flush
     * completes. If task is non-null it is the frontend write task which
     * initiated this flush, and in that case a blocking target will be added,
     * which means the specified task will block till the requested target
     * completes; else a non-blocking target will be added which just requests
     * dirty bytes to be flushed w/o having the application wait.
     *
     * ensure_flush() provides the following guarantees:
     * - For stable write, it'll flush *all* cached dirty bytes starting from
     *   the lowest offset. More dirty data can be added, but that may not
     *   necessarily be flushed, only what is returned by
     *   get_dirty_nonflushing_bcs_range() at the time of the call.
     * - For unstable write, it'll flush contiguous cached dirty bytes
     *   starting from the lowest offset. If the caller wants to flush *all*
     *   then they need to pass 'flush_full_unstable'. If all dirty data is
     *   not contiguous, then it'll involve a switch to stable write.
     * - On completion of that flush:
     *   - If 'task' is non-null, it will be completed.
     *   - If 'done' is non-null, it will be set to true to signal completion.
     *     Caller must wait for that in a loop.
     *   Only one of 'task' and 'done' can be non-null.
     *
     * write_off and write_len describe the current application write call.
     * They are needed for logging when task is nullptr.
     *
     * Caller MUST hold the flush_lock.
     */
    void ensure_flush(uint64_t write_off,
                      uint64_t write_len,
                      struct rpc_task *task = nullptr,
                      std::atomic<bool> *done = nullptr,
                      bool flush_full_unstable = false);

    /**
     * Ensure all or some commit-pending bytes are committed or scheduled for
     * commit (if a flush or commit is already ongoing). If not already
     * flushed, data will be flushed before committing. Caller can pass
     * 'commit_full' as true to convey flush/commit *all dirty data*, else
     * ensure_commit() will decide how much to flush/commit based on
     * heuristics and configuration.
     * If 'task' is null it'll add a non-blocking commit target to ctgtq, else
     * it'll add a blocking commit target for completing task when the given
     * commit goal is met. The goal is decided by ensure_commit() based on
     * configured limits, or if 'commit_full' is true it means the caller
     * wants the entire dirty data to be flushed and committed.
     *
     * Caller MUST hold the flush_lock.
     *
     * See ensure_flush() for more details.
     */
    void ensure_commit(uint64_t write_off,
                       uint64_t write_len,
                       struct rpc_task *task = nullptr,
                       std::atomic<bool> *done = nullptr,
                       bool commit_full = false);

    /**
     * Callbacks to be called when flush/commit successfully complete.
     * These will update flushed_seq_num/committed_seq_num and run
     * flush/commit targets from ftgtq/ctgtq as appropriate.
     */
    void on_flush_complete(uint64_t flush_bytes);
    void on_commit_complete(uint64_t commit_bytes);

    /**
     * Is the state machine currently running, i.e., has it sent (one or
     * more) flush requests or a commit request that it's waiting to
     * complete?
     * At any point only one flush (executed as one or more parallel write
     * calls for different blocks) or commit can be running. Once the
     * currently running flush/commit completes it checks ftgtq/ctgtq to see
     * if it needs to perform more flush/commit; if yes the state machine
     * continues to run, till it has no more targets to execute, at which
     * point the state machine is no longer actively running. It's said to be
     * "idling" and needs to be "kicked" again if a new flush/commit is to be
     * executed.
     * The state machine starts in idle state.
     */
    bool is_running() const
    {
        return running;
    }
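    /*
     * Illustrative sketch (not compiled): how the libnfs completion side
     * keeps the state machine moving. The callback signature and the
     * get_fcsm() accessor are assumptions for illustration; only
     * on_flush_complete() is the real event handler.
     */
#if 0
    static void example_write_done(struct nfs_inode *inode,
                                   uint64_t bytes_flushed)
    {
        /*
         * Advances flushed_seq_num by 'bytes_flushed', completes any queued
         * flush targets whose flush_seq goal is now met, dispatches the next
         * queued flush/commit from ftgtq/ctgtq, or clears 'running' when
         * both queues are empty (state machine goes back to idling).
         */
        inode->get_fcsm()->on_flush_complete(bytes_flushed);
    }
#endif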
    /**
     * Mark the state machine as "running".
     * This will be done by the fuse thread that calls one of the ensure
     * methods and finds the state machine idling. It kicks off the state
     * machine by triggering the flush/commit and calls mark_running() to
     * mark the state machine as running.
     * clear_running() will be called by a libnfs callback thread that calls
     * one of the on_flush_complete()/on_commit_complete() callbacks and
     * finds out that there are no more flush/commit targets.
     */
    void mark_running();
    void clear_running();

    /**
     * Nudge the flush-commit state machine.
     * After the fuse thread copies the application data into the cache, it
     * must call this to let FCSM know that some more dirty data has been
     * added. It checks the dirty data against the configured limits and
     * decides which of the following actions to take:
     * - Do nothing, as dirty data is still within limits.
     * - Start flushing.
     * - Start committing.
     * - Flush/commit while blocking the task till the flush/commit completes.
     */
    void run(struct rpc_task *task,
             uint64_t extent_left,
             uint64_t extent_right);

    /**
     * Call when more writes are dispatched, or prepared to be dispatched.
     * These must correspond to membufs which are dirty and not already
     * flushing.
     * This MUST be called before the write_iov_callback() can be called,
     * i.e., before the actual write call is issued.
     */
    void add_flushing(uint64_t bytes);

    struct nfs_inode *get_inode() const
    {
        return inode;
    }

    void fc_cb_enter()
    {
        ++in_fc_callback;
    }

    void fc_cb_exit()
    {
        assert(fc_cb_running());
        --in_fc_callback;
    }

    uint64_t fc_cb_count() const
    {
        return in_fc_callback;
    }

    bool fc_cb_running() const
    {
        return (fc_cb_count() > 0);
    }

    /**
     * Call when more commits are dispatched, or prepared to be dispatched.
     * These must correspond to membufs which are already flushed, i.e., they
     * must not be dirty or flushing and must be commit pending.
     * This MUST be called before the commit_callback can be called, i.e.,
     * before the actual commit call is issued.
     */
    void add_committing(uint64_t bytes);

    /**
     * ctgtq_cleanup() is called when we switch to stable writes.
     * It clears up all the queued commit targets as stable writes will not
     * cause a commit and those targets would never normally complete.
     */
    void ctgtq_cleanup();
    void ftgtq_cleanup();

private:
    /*
     * The singleton nfs_client, for convenience.
     */
    struct nfs_client *const client;

    /*
     * File inode for which we are tracking flush/commit.
     */
    struct nfs_inode *const inode;
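    /*
     * Illustrative sketch (not compiled): the required ordering around
     * add_flushing(). issue_write_rpc() is a hypothetical helper standing in
     * for the actual write dispatch path.
     */
#if 0
    void example_dispatch_flush(struct fcsm *fcsm,
                                struct nfs_inode *inode,
                                uint64_t flush_bytes)
    {
        /*
         * Account the bytes as "flushing" *before* issuing the write call,
         * so that write_iov_callback() (and hence on_flush_complete()) can
         * never observe flushed bytes that were not first accounted as
         * flushing.
         */
        fcsm->add_flushing(flush_bytes);
        issue_write_rpc(inode, flush_bytes);    // hypothetical helper
    }
#endif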
    /**
     * Flush/commit target.
     * One such target is added to fcsm::ftgtq/ctgtq by ensure_flush() or
     * ensure_commit() to request the flush/commit state machine to perform a
     * specific flush/commit operation when flushed_seq_num/committed_seq_num
     * match the target values. This also has an implied meaning that the
     * requestor wants appropriate flush/commit to be performed (by fcsm) to
     * reach those flush/commit goals. Note that this follows a declarative
     * syntax where fctgt specifies what flushed_seq_num/committed_seq_num it
     * wants to reach and then the state machine issues the appropriate
     * flush/commit requests to reach that goal.
     * If task is non-null it is the requestor frontend write task that wants
     * to be completed when the flush/commit seq goal is met. Such targets
     * are called blocking targets as they cause the requesting task to block
     * till the flush/commit goal is met. Non-blocking targets have
     * task==nullptr. Blocking targets are added to slow down the writer
     * application, while non-blocking targets are added to "initiate"
     * flush/commit which the task would not wait for, more of a background
     * activity vs the inline nature of blocking targets.
     */
    struct fctgt
    {
        fctgt(struct fcsm *fcsm,
              uint64_t _flush_seq,
              uint64_t _commit_seq,
              struct rpc_task *_task = nullptr,
              std::atomic<bool> *_done = nullptr,
              bool _commit_full = false);

        /*
         * Flush and commit targets (in terms of
         * flushed_seq_num/committed_seq_num) that this target wants to reach.
         */
        const uint64_t flush_seq = 0;
        const uint64_t commit_seq = 0;

        /*
         * If non-null this is the frontend write task that must be completed
         * once the above target is reached.
         */
        struct rpc_task *const task = nullptr;

        /*
         * If non-null, its initial value must be false, and it will be set
         * to true when the target completes. Caller will typically wait for
         * it to become true, in a loop.
         */
        std::atomic<bool> *done = nullptr;

        /*
         * Pointer to the containing fcsm.
         */
        struct fcsm *const fcsm = nullptr;

        /*
         * Commit *all* dirty data. This means for unstable writes, if all
         * dirty data is not contiguous then we will switch to stable write.
         * This is true only when ensure_commit() is called from
         * flush_cache_and_wait().
         */
        bool commit_full = false;

#if 0
        /*
         * Has the required flush/commit task started?
         * Once triggered, it just waits for the flush/commit callback and,
         * once completed, if task is non-null, it'll complete the task.
         */
        bool triggered = false;
#endif
    };

    /*
     * Queue of flush targets for the state machine.
     * These indicate how much needs to be flushed and what task to complete
     * when the requested flush target is met.
     */
    std::queue<struct fctgt> ftgtq;

    /*
     * Queue of commit targets for the state machine.
     * These indicate how much needs to be committed and what task to
     * complete when the requested commit target is met.
     */
    std::queue<struct fctgt> ctgtq;

    /*
     * Continually increasing seq number of the last byte successfully
     * flushed and committed. Flush-commit targets (fctgt) are expressed in
     * terms of these. These are actually numerically one more than the last
     * flushing/committing and flushed/committed byte's seq number, e.g., if
     * the last byte flushed was byte #1 (0 and 1 are flushed), then
     * flushed_seq_num would be 2.
     * Note that these are *not* offsets in the file but cumulative values
     * for total bytes flushed/committed till now since the file was opened.
     * In case of overwrite the same byte(s) may be repeatedly flushed and/or
     * committed so these can grow higher than AZNFSC_MAX_FILE_SIZE.
     */
    std::atomic<uint64_t> flushed_seq_num = 0;
    std::atomic<uint64_t> flushing_seq_num = 0;
    std::atomic<uint64_t> committed_seq_num = 0;
    std::atomic<uint64_t> committing_seq_num = 0;

    std::atomic<uint64_t> in_fc_callback = 0;

    /*
     * The state machine starts in an idle state.
     */
    std::atomic<bool> running = false;
};

struct FC_CB_TRACKER
{
    explicit FC_CB_TRACKER(struct nfs_inode *_inode);
    ~FC_CB_TRACKER();

private:
    struct nfs_inode *const inode;
};

}

#endif /* __FCSM_H__ */
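/*
 * Illustrative sketch (not compiled): FC_CB_TRACKER appears intended as an
 * RAII guard bracketing flush/commit callback bodies, presumably invoking
 * fcsm::fc_cb_enter()/fc_cb_exit() via the inode. A hypothetical usage:
 *
 *     static void example_commit_done(struct nfs_inode *inode)
 *     {
 *         aznfsc::FC_CB_TRACKER _tracker(inode);
 *         // ... handle the commit completion; the destructor drops the
 *         // in_fc_callback count even on early return.
 *     }
 */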