| PROPER CARE AND FEEDING OF RETURN VALUES FROM rcu_dereference() | 
 |  | 
 | Most of the time, you can use values from rcu_dereference() or one of | 
 | the similar primitives without worries.  Dereferencing (prefix "*"), | 
 | field selection ("->"), assignment ("="), address-of ("&"), addition and | 
 | subtraction of constants, and casts all work quite naturally and safely. | 
 |  | 
 | It is nevertheless possible to get into trouble with other operations. | 
 | Follow these rules to keep your RCU code working properly: | 
 |  | 
 | o	You must use one of the rcu_dereference() family of primitives | 
 | 	to load an RCU-protected pointer, otherwise CONFIG_PROVE_RCU | 
 | 	will complain.  Worse yet, your code can see random memory-corruption | 
 | 	bugs due to games that compilers and DEC Alpha can play. | 
 | 	Without one of the rcu_dereference() primitives, compilers | 
 | 	can reload the value, and won't your code have fun with two | 
 | 	different values for a single pointer!  Without rcu_dereference(), | 
 | 	DEC Alpha can load a pointer, dereference that pointer, and | 
 | 	return data preceding initialization that preceded the store of | 
 | 	the pointer. | 
 |  | 
 | 	In addition, the volatile cast in rcu_dereference() prevents the | 
 | 	compiler from deducing the resulting pointer value.  Please see | 
 | 	the section entitled "EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH" | 
 | 	for an example where the compiler can in fact deduce the exact | 
 | 	value of the pointer, and thus cause misordering. | 
 |  | 
 | o	Do not use single-element RCU-protected arrays.  The compiler | 
 | 	is within its right to assume that the value of an index into | 
 | 	such an array must necessarily evaluate to zero.  The compiler | 
 | 	could then substitute the constant zero for the computation, so | 
 | 	that the array index no longer depended on the value returned | 
 | 	by rcu_dereference().  If the array index no longer depends | 
 | 	on rcu_dereference(), then both the compiler and the CPU | 
 | 	are within their rights to order the array access before the | 
 | 	rcu_dereference(), which can cause the array access to return | 
 | 	garbage. | 
 |  | 
 | o	Avoid cancellation when using the "+" and "-" infix arithmetic | 
 | 	operators.  For example, for a given variable "x", avoid | 
 | 	"(x-x)".  There are similar arithmetic pitfalls from other | 
 | 	arithmetic operatiors, such as "(x*0)", "(x/(x+1))" or "(x%1)". | 
 | 	The compiler is within its rights to substitute zero for all of | 
 | 	these expressions, so that subsequent accesses no longer depend | 
 | 	on the rcu_dereference(), again possibly resulting in bugs due | 
 | 	to misordering. | 
 |  | 
 | 	Of course, if "p" is a pointer from rcu_dereference(), and "a" | 
 | 	and "b" are integers that happen to be equal, the expression | 
 | 	"p+a-b" is safe because its value still necessarily depends on | 
 | 	the rcu_dereference(), thus maintaining proper ordering. | 
 |  | 
 | o	Avoid all-zero operands to the bitwise "&" operator, and | 
 | 	similarly avoid all-ones operands to the bitwise "|" operator. | 
 | 	If the compiler is able to deduce the value of such operands, | 
 | 	it is within its rights to substitute the corresponding constant | 
 | 	for the bitwise operation.  Once again, this causes subsequent | 
 | 	accesses to no longer depend on the rcu_dereference(), causing | 
 | 	bugs due to misordering. | 
 |  | 
 | 	Please note that single-bit operands to bitwise "&" can also | 
 | 	be dangerous.  At this point, the compiler knows that the | 
 | 	resulting value can only take on one of two possible values. | 
 | 	Therefore, a very small amount of additional information will | 
 | 	allow the compiler to deduce the exact value, which again can | 
 | 	result in misordering. | 
 |  | 
 | o	If you are using RCU to protect JITed functions, so that the | 
 | 	"()" function-invocation operator is applied to a value obtained | 
 | 	(directly or indirectly) from rcu_dereference(), you may need to | 
 | 	interact directly with the hardware to flush instruction caches. | 
 | 	This issue arises on some systems when a newly JITed function is | 
 | 	using the same memory that was used by an earlier JITed function. | 
 |  | 
 | o	Do not use the results from the boolean "&&" and "||" when | 
 | 	dereferencing.	For example, the following (rather improbable) | 
 | 	code is buggy: | 
 |  | 
 | 		int a[2]; | 
 | 		int index; | 
 | 		int force_zero_index = 1; | 
 |  | 
 | 		... | 
 |  | 
 | 		r1 = rcu_dereference(i1) | 
 | 		r2 = a[r1 && force_zero_index];  /* BUGGY!!! */ | 
 |  | 
 | 	The reason this is buggy is that "&&" and "||" are often compiled | 
 | 	using branches.  While weak-memory machines such as ARM or PowerPC | 
 | 	do order stores after such branches, they can speculate loads, | 
 | 	which can result in misordering bugs. | 
 |  | 
 | o	Do not use the results from relational operators ("==", "!=", | 
 | 	">", ">=", "<", or "<=") when dereferencing.  For example, | 
 | 	the following (quite strange) code is buggy: | 
 |  | 
 | 		int a[2]; | 
 | 		int index; | 
 | 		int flip_index = 0; | 
 |  | 
 | 		... | 
 |  | 
 | 		r1 = rcu_dereference(i1) | 
 | 		r2 = a[r1 != flip_index];  /* BUGGY!!! */ | 
 |  | 
 | 	As before, the reason this is buggy is that relational operators | 
 | 	are often compiled using branches.  And as before, although | 
 | 	weak-memory machines such as ARM or PowerPC do order stores | 
 | 	after such branches, but can speculate loads, which can again | 
 | 	result in misordering bugs. | 
 |  | 
 | o	Be very careful about comparing pointers obtained from | 
 | 	rcu_dereference() against non-NULL values.  As Linus Torvalds | 
 | 	explained, if the two pointers are equal, the compiler could | 
 | 	substitute the pointer you are comparing against for the pointer | 
 | 	obtained from rcu_dereference().  For example: | 
 |  | 
 | 		p = rcu_dereference(gp); | 
 | 		if (p == &default_struct) | 
 | 			do_default(p->a); | 
 |  | 
 | 	Because the compiler now knows that the value of "p" is exactly | 
 | 	the address of the variable "default_struct", it is free to | 
 | 	transform this code into the following: | 
 |  | 
 | 		p = rcu_dereference(gp); | 
 | 		if (p == &default_struct) | 
 | 			do_default(default_struct.a); | 
 |  | 
 | 	On ARM and Power hardware, the load from "default_struct.a" | 
 | 	can now be speculated, such that it might happen before the | 
 | 	rcu_dereference().  This could result in bugs due to misordering. | 
 |  | 
 | 	However, comparisons are OK in the following cases: | 
 |  | 
 | 	o	The comparison was against the NULL pointer.  If the | 
 | 		compiler knows that the pointer is NULL, you had better | 
 | 		not be dereferencing it anyway.  If the comparison is | 
 | 		non-equal, the compiler is none the wiser.  Therefore, | 
 | 		it is safe to compare pointers from rcu_dereference() | 
 | 		against NULL pointers. | 
 |  | 
 | 	o	The pointer is never dereferenced after being compared. | 
 | 		Since there are no subsequent dereferences, the compiler | 
 | 		cannot use anything it learned from the comparison | 
 | 		to reorder the non-existent subsequent dereferences. | 
 | 		This sort of comparison occurs frequently when scanning | 
 | 		RCU-protected circular linked lists. | 
 |  | 
 | 	o	The comparison is against a pointer that references memory | 
 | 		that was initialized "a long time ago."  The reason | 
 | 		this is safe is that even if misordering occurs, the | 
 | 		misordering will not affect the accesses that follow | 
 | 		the comparison.  So exactly how long ago is "a long | 
 | 		time ago"?  Here are some possibilities: | 
 |  | 
 | 		o	Compile time. | 
 |  | 
 | 		o	Boot time. | 
 |  | 
 | 		o	Module-init time for module code. | 
 |  | 
 | 		o	Prior to kthread creation for kthread code. | 
 |  | 
 | 		o	During some prior acquisition of the lock that | 
 | 			we now hold. | 
 |  | 
 | 		o	Before mod_timer() time for a timer handler. | 
 |  | 
 | 		There are many other possibilities involving the Linux | 
 | 		kernel's wide array of primitives that cause code to | 
 | 		be invoked at a later time. | 
 |  | 
 | 	o	The pointer being compared against also came from | 
 | 		rcu_dereference().  In this case, both pointers depend | 
 | 		on one rcu_dereference() or another, so you get proper | 
 | 		ordering either way. | 
 |  | 
 | 		That said, this situation can make certain RCU usage | 
 | 		bugs more likely to happen.  Which can be a good thing, | 
 | 		at least if they happen during testing.  An example | 
 | 		of such an RCU usage bug is shown in the section titled | 
 | 		"EXAMPLE OF AMPLIFIED RCU-USAGE BUG". | 
 |  | 
 | 	o	All of the accesses following the comparison are stores, | 
 | 		so that a control dependency preserves the needed ordering. | 
 | 		That said, it is easy to get control dependencies wrong. | 
 | 		Please see the "CONTROL DEPENDENCIES" section of | 
 | 		Documentation/memory-barriers.txt for more details. | 
 |  | 
 | 	o	The pointers are not equal -and- the compiler does | 
 | 		not have enough information to deduce the value of the | 
 | 		pointer.  Note that the volatile cast in rcu_dereference() | 
 | 		will normally prevent the compiler from knowing too much. | 
 |  | 
 | o	Disable any value-speculation optimizations that your compiler | 
 | 	might provide, especially if you are making use of feedback-based | 
 | 	optimizations that take data collected from prior runs.  Such | 
 | 	value-speculation optimizations reorder operations by design. | 
 |  | 
 | 	There is one exception to this rule:  Value-speculation | 
 | 	optimizations that leverage the branch-prediction hardware are | 
 | 	safe on strongly ordered systems (such as x86), but not on weakly | 
 | 	ordered systems (such as ARM or Power).  Choose your compiler | 
 | 	command-line options wisely! | 
 |  | 
 |  | 
 | EXAMPLE OF AMPLIFIED RCU-USAGE BUG | 
 |  | 
 | Because updaters can run concurrently with RCU readers, RCU readers can | 
 | see stale and/or inconsistent values.  If RCU readers need fresh or | 
 | consistent values, which they sometimes do, they need to take proper | 
 | precautions.  To see this, consider the following code fragment: | 
 |  | 
 | 	struct foo { | 
 | 		int a; | 
 | 		int b; | 
 | 		int c; | 
 | 	}; | 
 | 	struct foo *gp1; | 
 | 	struct foo *gp2; | 
 |  | 
 | 	void updater(void) | 
 | 	{ | 
 | 		struct foo *p; | 
 |  | 
 | 		p = kmalloc(...); | 
 | 		if (p == NULL) | 
 | 			deal_with_it(); | 
 | 		p->a = 42;  /* Each field in its own cache line. */ | 
 | 		p->b = 43; | 
 | 		p->c = 44; | 
 | 		rcu_assign_pointer(gp1, p); | 
 | 		p->b = 143; | 
 | 		p->c = 144; | 
 | 		rcu_assign_pointer(gp2, p); | 
 | 	} | 
 |  | 
 | 	void reader(void) | 
 | 	{ | 
 | 		struct foo *p; | 
 | 		struct foo *q; | 
 | 		int r1, r2; | 
 |  | 
 | 		p = rcu_dereference(gp2); | 
 | 		if (p == NULL) | 
 | 			return; | 
 | 		r1 = p->b;  /* Guaranteed to get 143. */ | 
 | 		q = rcu_dereference(gp1);  /* Guaranteed non-NULL. */ | 
 | 		if (p == q) { | 
 | 			/* The compiler decides that q->c is same as p->c. */ | 
 | 			r2 = p->c; /* Could get 44 on weakly order system. */ | 
 | 		} | 
 | 		do_something_with(r1, r2); | 
 | 	} | 
 |  | 
 | You might be surprised that the outcome (r1 == 143 && r2 == 44) is possible, | 
 | but you should not be.  After all, the updater might have been invoked | 
 | a second time between the time reader() loaded into "r1" and the time | 
 | that it loaded into "r2".  The fact that this same result can occur due | 
 | to some reordering from the compiler and CPUs is beside the point. | 
 |  | 
 | But suppose that the reader needs a consistent view? | 
 |  | 
 | Then one approach is to use locking, for example, as follows: | 
 |  | 
 | 	struct foo { | 
 | 		int a; | 
 | 		int b; | 
 | 		int c; | 
 | 		spinlock_t lock; | 
 | 	}; | 
 | 	struct foo *gp1; | 
 | 	struct foo *gp2; | 
 |  | 
 | 	void updater(void) | 
 | 	{ | 
 | 		struct foo *p; | 
 |  | 
 | 		p = kmalloc(...); | 
 | 		if (p == NULL) | 
 | 			deal_with_it(); | 
 | 		spin_lock(&p->lock); | 
 | 		p->a = 42;  /* Each field in its own cache line. */ | 
 | 		p->b = 43; | 
 | 		p->c = 44; | 
 | 		spin_unlock(&p->lock); | 
 | 		rcu_assign_pointer(gp1, p); | 
 | 		spin_lock(&p->lock); | 
 | 		p->b = 143; | 
 | 		p->c = 144; | 
 | 		spin_unlock(&p->lock); | 
 | 		rcu_assign_pointer(gp2, p); | 
 | 	} | 
 |  | 
 | 	void reader(void) | 
 | 	{ | 
 | 		struct foo *p; | 
 | 		struct foo *q; | 
 | 		int r1, r2; | 
 |  | 
 | 		p = rcu_dereference(gp2); | 
 | 		if (p == NULL) | 
 | 			return; | 
 | 		spin_lock(&p->lock); | 
 | 		r1 = p->b;  /* Guaranteed to get 143. */ | 
 | 		q = rcu_dereference(gp1);  /* Guaranteed non-NULL. */ | 
 | 		if (p == q) { | 
 | 			/* The compiler decides that q->c is same as p->c. */ | 
 | 			r2 = p->c; /* Locking guarantees r2 == 144. */ | 
 | 		} | 
 | 		spin_unlock(&p->lock); | 
 | 		do_something_with(r1, r2); | 
 | 	} | 
 |  | 
 | As always, use the right tool for the job! | 
 |  | 
 |  | 
 | EXAMPLE WHERE THE COMPILER KNOWS TOO MUCH | 
 |  | 
 | If a pointer obtained from rcu_dereference() compares not-equal to some | 
 | other pointer, the compiler normally has no clue what the value of the | 
 | first pointer might be.  This lack of knowledge prevents the compiler | 
 | from carrying out optimizations that otherwise might destroy the ordering | 
 | guarantees that RCU depends on.  And the volatile cast in rcu_dereference() | 
 | should prevent the compiler from guessing the value. | 
 |  | 
 | But without rcu_dereference(), the compiler knows more than you might | 
 | expect.  Consider the following code fragment: | 
 |  | 
 | 	struct foo { | 
 | 		int a; | 
 | 		int b; | 
 | 	}; | 
 | 	static struct foo variable1; | 
 | 	static struct foo variable2; | 
 | 	static struct foo *gp = &variable1; | 
 |  | 
 | 	void updater(void) | 
 | 	{ | 
 | 		initialize_foo(&variable2); | 
 | 		rcu_assign_pointer(gp, &variable2); | 
 | 		/* | 
 | 		 * The above is the only store to gp in this translation unit, | 
 | 		 * and the address of gp is not exported in any way. | 
 | 		 */ | 
 | 	} | 
 |  | 
 | 	int reader(void) | 
 | 	{ | 
 | 		struct foo *p; | 
 |  | 
 | 		p = gp; | 
 | 		barrier(); | 
 | 		if (p == &variable1) | 
 | 			return p->a; /* Must be variable1.a. */ | 
 | 		else | 
 | 			return p->b; /* Must be variable2.b. */ | 
 | 	} | 
 |  | 
 | Because the compiler can see all stores to "gp", it knows that the only | 
 | possible values of "gp" are "variable1" on the one hand and "variable2" | 
 | on the other.  The comparison in reader() therefore tells the compiler | 
 | the exact value of "p" even in the not-equals case.  This allows the | 
 | compiler to make the return values independent of the load from "gp", | 
 | in turn destroying the ordering between this load and the loads of the | 
 | return values.  This can result in "p->b" returning pre-initialization | 
 | garbage values. | 
 |  | 
 | In short, rcu_dereference() is -not- optional when you are going to | 
 | dereference the resulting pointer. |