Acta Mathematica .4cademiae Scientiarum Hungaricae Tomus 20 (1--2), (1969), pp. 89--104. O N SETS OF I N T E G E R S C O N T A I N I N G N O F O U R ELEMENTS IN ARITHMETIC PROGRESSION By E. SZEMERI~DI (Budapest) In what follows we use capital letters to denote sequences of integers, A + B to denote the sum of two sets of integers formed elementwise, and A-q B to denote the complement of the set B with respect to the set A. Let us for convenience call an arithmetic progression of k (distinct) terms a k-progression. I f a set A contains no k-progression we say that A is k-free. The maximal number of elements a k-free set A ~ [0, n) can have is denoted by Zk(n). Furthermore we set ~k = i ~ n~eo ~(n) n Actually we can replace lim on the right hand side by lim. For, given e > 0 and n, we can find arbitrarily large m so that Zk(m)>=(Tk--*)m; in particular we m a y assume that qn<m<=(q + 1)n holds for a positive integer q. In other words there is a k-free set A c= [0, m) with cardinality IAI-->(~k-~)m. Now [0, m) can be split into (q + 1) subintervals of length at most n. One of these must contain at least {q~) [Al elements of A which clearlyform a k-free set. Hence *k(n) =~ ' [ ~ - + 1 } far--> ( ~ k - ~ ) q+lm -> q n. ( ~ - ~)-~-i- Since e can be taken arbitrarily small and q arbitrarily large, we have *k(n) >= ~'k n , whence 7k = lim zk(n): n < 1 Clearly 7k = 1 - - ~ - , and 7a<_-Ta<_ - . . . . It has been proved by F. BEHREND* that either all 7k are zero, or 7k ~ 1 as k - ~ ~o. * On sequences of integers containing no arithmetic progression, ~asopis Mat. Fis. Praha, 67 (1938), pp. 235--239. Aeta Mathematica Academiae Scientiarum Hungarlcae 20, z969 90 E. SZEMER~DI In 1953 ROTH* proved that 73 = 0 . In fact he proved more than that, namely n z a (n) << log log n " 9Roth's p r o o f uses estimates of exponential sums. In this paper we shall prove the following THEOREM. 74=0, i.e. r4(n)=o(n). The p r o o f is elementary. The problem of 7s, 76 . . . . is left open. The p r o o f is indirect, so from now on we assume that 74>0. For convenience we write 7=~4 9 We shall formulate in this section the two main lemmas and deduce the theorem from them. We write Q(b, c, d, e) for the system b-2c+d = c-2d+e = O, which means that either b, c, d, e form an arithmetic progression, or they are identical. Throughout the paper n~(8) shall mean a number (for example the smallest one) with the property that f o r n ->n4(e) a 4-free set A c=[0, n) cannot contain more than (7 +e)n elements. Occasionally we use the analogue meaning for na(a ) as well. Let B, C, D ~=[0, q). We regard B and C as fixed while D varies. We then define D* = {e; eE[0, q) and there are bEB, cEC, dED such that Q(b, c, d, e)}. With this notation we shall prove LEMMA (Ho, ..., H,).** There are absolute constants % > 0 , 7 ' > 0 , k o and qo with the following property: I f q>-qo, 3[q, and if B, C are 4-free sets contained in [0, q), [BI-->(7-zo)q, [el->(7 -~o)q, then there are disjoint sets Ho, .... Hk, k<=ko, such that 6 HK=[lq, 2q], K=O 1 IHol-<--ff7q; IH~l -> ~' q for K=l,2,...,k, * On certain sets of integers. I; II, J. Lond. Math. Soc., 28 (1953), pp. 104--109; 29 (1953), pp. 20--26. ** The full force of the hypothesis that (say) C is 4-free is not needed for the proof of this lemma: see the footnote on page 95. Acta Mathematlca Academlae Sclentlarum Hungaricae 20, z969 ON SETS OF INTEGERS 91 and such that if for some K # O G~HK, then ]G[ => 21--~ IHK], Ill IO*l ~ I--~7 IH;I. The other main lemma is LEMMABCDE. Let e1E(0, 7) u and qo be given. Then there is a q>=qo and there are sets Bo, Co, 31 ..... Du, E1 ..... Eu~ [0, q), all 4-free, all with at least ( 7 - el)q elements, such that Q(b, c, d, e) with b E Bo, cECo, dE Di, eE Ei is insolvable for all i= 1, ..., u, and such that for each xE[0, q) the set of all i' s for which x EEi holds is 4-free. We now prove the theorem using these two lemmas. Let Co, 7 and ko have the meaning of lemma (H o ..... Hk). Put e~ = rain eo, 20 ' " and t--n4(el). Van der Waerden's Theorem* gives a number u = N(k o , t) such that in any partition of [0, u) into at most ko classes there is at least one class which contains a t-progresssion. We apply lemma BCDE with this el, and u, and with qo = 3n ~(el)1 From [Dd --->(7-~l)q, ~- q-->n4(~l) we see that D~(~[3q, 2 q I = ]D~[- D~O[O,I} -- D ~ A [ 2 q , q] >= => (y--el)q--2(7+e,) 1 q ~ ( 7 - - 5 ~ 0 1 q . We now define the sets HK by lemma (Ho ..... Hk), using Bo, Co for B, C respectively. For each i E (0, u] there is a j =j(i) E (0, k] such that 1 ]D, A nj[ ~ ~ 7 ]Hjl. * Beweis einer Baudetschen Vermutung, N i e n n . A r c h . Wiskunde, 15 (1927), pp. 212--216. Acta Matbematlca Academlae Sclentiarum Hungaricae 20, i969 92 E. SZEMEREDI For otherwise we should get the contradiction 1 ' ( ? - 5~1) T q <- q' [Ho[+I~s~= , q = ~ j=O IDi G Hsl < 1 ~ 7 +[1111 ~ T q < - (V-5a~)~-q 1 since e~ <=~ ?. Attaching such a j(i) to each i, it gives a partition of the i's into k classes. Since u = N(ko, t) and k <=ko one of these classes contains a t-progression. In other words, there is a Jo and an arithmetic progression il ..... it such that ID~ N//So I for i = i,, it. F r o m lemma (Ho . . . . . Hk) we then have that I(DiNHso)* 1 => I-T7 IHfol where the * is taken with respect to B o and Co. With the trivial relatior~ ( U N V)*c= U* N V* this implies that {'/ ID*NH*ol--> 1-?-~ IH*ol. Now D* N E~ = 0, for this is merely a restatement of the fact that the relations Q(b, c, d, e) with bE Bo, cE Co, dE Di, eE E~ are impossible. Hence IEiNH*oI+ID*NH*oI <= [H*o], so that IE~nH~ol < 1 for i = i l , ..., it. Put [0, q ) - H * o = M . IHfol=~.q, We notice that M is not empty, since otherwise the last inequality would imply that [Eil<_-897 q, in contradiction with the fact that [Ei[ >= (7 -- 81) q >= [ ? ' 2 ~ ' 1 q. Furthermore, lemma ( H o . . . . . Hk) shows that ~-->V'. Therefore 1 _ [MI _> q - IH*o] ] 1 --y+____> - 1 - c~ , ] Acta Matbematica Academiae Sclentiarltm ,Hungarica~ zb, ~96~ , 1 - c, .93 O N SETS OF I N T E G E R S for i = i I . . . . . it. Summing over these i's we see that t [E~ (~M[ >= (7+281)tiM 1. "C=I We conclude that there is at least one x CM which occurs in not less than (7 +251)t of the sets Ei. By lemma BCDE those i~'s for which xCE~ form a 4-free set. They are contained 5n an arithmetic progression of t terms and by the choice of t =n~(el), there cannot be more than (7 +el) t numbers i~ for which xCE~. Thus we have reached a contradiction and the theorem is proved. In this section we shall prove lemma (Ho . . . . ;H~). For this we need three other lemmas. The first is almost obvious. We call it therefore THE SIMPLE LEMMA. Let A~=[0, n) be 4-fi'ee and IAI >-(7-5)n. Let MC=[0, n) have a complement that is the union of disjoint arithmetic progressions Po, ~ = 1..... r each of length ]P~[>-n4(5"). Then we have [A NM[ ~ 7]MI-(e+5")n. PROOF. Each A 0 Pe as a 4-free subset of a progression fulfils IA n~~ =(7 +5 t )le~I. Hence we have the following inequalities: [ANM[ = I A I - Z I ANPel --> ( 7 - e ) n - ( v + 5 ' ) Z l P ~ [ Q = 0 -- (7 - 5 ) n - (7 § 5') (n - IM1) -- (7 + 59 !M] - (5 + 5')n -> 7 ]MI - (5 + e')n. LEMMA p(6, l). For any real 6 E (0, 1) and any natural number l there exists a number p(6, l) with the following property." I f u>=p(6,1), a~[O,u), Ial>=6u, ,then G contains a set St of the form st = {y} + {0, x~} + . . . + {0, x~} with natural numbers xl ..... xt. PROOF. The proof goes by complete induction and uses the box principle. The case l = 1 is trivial, since it states only that there is a pair of elements of G. [1] A suitable choice of p(6, 1) is 1 + ~ - since this exceeds ~- so that the hypothesis concerning G shows that I~1 => 6u > 1 . Nowtake l_->2 and assume the case l q=p 1 has been already proved. We set , l-- 11 . Any number u can be represented as u = kq+r, 0 <=r < q . Acta Mathematica Academiae Scientiarum Hungaricae 20, r969 94 ~. SZEMERt~DI We choose p(6, l) so that u_~p(6, l) implies that 4 5 1)t_ , k > 5-~, ~ k > ( q - . A possible choice is, for example p(6,1)=max[[l+~--~-]q, [1 + 2 ] q ~ ) . Let R be the number of those sets GK=GO[(K--1)q, Kq], for which lUll ~ - q . Then R _-> K=I ..... k k, otherwise " k 6 6kq <=6u <= Ial ~ q + ~ ' IGK[ ~ (1 + R ) q + ( k - R ) - ~ q = K=I =[1---~}Rq+[l+k--f2}q<[1--~]~kq+[1+k---~}q= = fikq-I~,l}q<6kq' By the introduction hypothesis, in each of the sets Gr a set of the type Sz_ 1 can be found. In each St-1 we have 1<=xl, ..., xt_l<=q-1. Thus there are not more 6 > ( q - 1)z_ 1 there are than ( q - 1)1-1 diffeient choices of xl . . . . , xz. Since R>=~k two sets GK containing St_ 1 and Sj_ 1 formed with the same numbers xa, ..., xt but different y, y', say with y' > y . Then with xt =y" - y we have G ~ St-1 U St_ 1 = St-1 U(St_I + xt) = St. LEMMA ]G*[, There are absolute constants Co>0 and 7 ' > 0 and a function go(h) for 0 < 3 < 1 with the following property: I f q>=qo(6), 81q, B, C~[0, q) are both 4-free, IB1 --> ( V - s o ) q , then I C l --> ( ? - s o ) q , G ~ q, ~ q IGI _-> , 9 ]G*[ >=v"q. REMARK. An analogous lemma can be similarly proved with ~ = 73 (instead of ? = 7 4 ) on the assumption that 73 >0. We then easily arrive at a contradiction, which proves Roth's theorem 73 =0. For this purpose choose a q~3nz(e ). Next choose a 3-free set A ~ [0, 3q) with IAI =>37q and represent it as A = B U ( C + q ) U(D+2q) Acla Malhemalica Academlae Scie*atiarum Hungaric~e zo, i969 ON SETS OF INTEGERS 95 with B, C, D ~ [0, q); and finally set G = D f q ~-q, ~-q . One easily obtains the inequalities ]BI =>(7--2e)g, IC] =>(7-2~)q, ]GI =>(7-8s) q 1 1 If we take e ~ ~- eo, e <=-j~ 7 and q large enough, we can apply the lemma with 6 = 1 ? and get ]G*I->7'q>0 which means that there is a triplet (b, c, d) with b-2c+d = O. But (b, c + q, d + 2q) is then a 3-progression in A, a set that was supposed to be 3-free. PROOF OF LEMMA IG*X.S e t 1 2 eo = 1 0 0 7 , m=n,(eo), m and fix an l such that l ~ 2 4 - - , say ? W e shall prove the lemma with ~2 qo(6) = 3p(6, l) + 3m, 7= 50.2 t" With these choices we have q >=p(6, l) and can therefore find a set of type St in G. We consider s,= {y}+ {o, x l } + ... + {o, x,} for all i = 0 , 1, ..., l; where we take S o = {y}. For each i we define L i= 2c-s; cCCO ~q,~-q , sCSi . Since S i ~ ~ q, ~- q one has L ~ [0, q). 1 With [C 1_->(7-%)q and ~ q > m = n~(eo) we obtain [Lo/ C ~-q, q => ( ? - 5 e o ) ~ = ~ v q , * 1 since 5eo< ~- 7* The derivation of this inequality is the only extent to which we use the hypothesis that C is 4-free. Acta 3/lathematlca Academlae Scientiarum Hungaricae 2o, !969 E. SZEMEREDI From the fact that ILt[<--q and LoC=L~C=... we infer that there is some i<=l such that [Z~l- IZ~-~] < q: l " We decompose this L~-I into maximal progression (rood x~). We shall denote by L t h e union of those of these progressions which have 3m or more elements, and by ~ the union of the remaining ones. From S~ = S~_~ U(S~_~ +x3 one sees that Li = L~_1 U (L~_1 - xi). Each maximal progression (mod xi) in L~_ 1 produces therefore one and only one new element in L,. Hence IL[ < 3m(lL~]- ]Li_~]) < 3m q and ILl = [ Z ' - l l - IEl --> lZ~ IEl --> t q~-8~q since by our choice of l we have l ~ 24m. 7 Now let us drop m elements from each end of each of the progressions (mod x~ composing E, and denote the remaining set by M. Since every progression in L has a length of at least 3rn we have 1 [MI => ~-ILl > ~ q. By construction [0, q ) - M can be represented as the union of disjoint progressions (rood x~) each of length at least m. Thus we can apply the Simple Lemma with e = d = eo and obtain 72 y2 1L~nBl ~ I~nBI ~ IMnBI ~ ~[M[--2eoq ~ ~ q--28oq ~ ~ q , since eo has been chosen suitably. By definition, L~ A B is the set of those b in B which have a representation In S~ there are at most 2~ elements. Therefore at least one y contained in S~ has the property that the equation b-2c+ y = 0 2 has at least ~ solutions (b, r In another notation this means that I{Y}*t->~'q, Acre Matbematica Academlae $cientlarum Hungarlcae zo, z969 97 ON SETS OF INTEGERS where we have put ]j2 50.2 Z The statement of lemma [G*[ is now immediate. F r o m y E St ~ G we see that Ia*l->I{Y}*[ =>~'q. PROOF OF LEMMA(H0 ..... Hk). We first fix some number h such that 1 - < ?', for example log 7' 1 2 ) 1 We now start from some G 0 ~ ~-q, ~-q with [Go[-->]~7q and put go = [G*[. Next we define by recursion for i = 1, ..., h r~ = {~ ',} , G c G ~ - I , IGI --> ~ - [ a ~ - i -~- , g~ =m,nO* G6rl and fixe one Gi in Fi for which IG*[ : g ~ . From G~E F~ we see that IGI >= ~~ I a i-1 I >. . . . . > o >= [2) 12 ' 1 (~h+l [G~I~ [ ~ J q, Thus, if we take 6 = ~ - and qo = qo(6) we can apply lemma [G*I for all q ->qo and obtain gi=IG*l >--?'q, for i = 1 , 2 . . . . . h. Since clearly go ~- q there is a j <=h such that otherwise we should have the contradiction ?'q~=gh< 1-7 go<-- 1--~ q<?'q. Acta Mathematica Academiar Scientiarum Hungaricae zo, I969 98 E. SZEMERI~DI j H = Gs_ ~. From the meaning of gs IGI~ 2 ]HI, then GCFj and therefore Set with this G~H, and = gJ -- and gs- ~ it follows that if gJ- Moreover we have [HI = ]Gs-I[ => ~ - [ ~ J q. At first we apply this process to G o = ~- q, ~ q and call the set H obtained H 1 . Then we take Go = ~-q, ~-q 7 H, and if this set contains at least 7q elements we obtain a s e t / / 2 from it. Next we take Go = 3 q' 3- q q (H~ U Hz) to get a set Hs, and so on. As soon as we are left with [lq, 2q}-I(H1UH2(3...OHk) < ~ 2 q we stop the procedure and call this remaining set Ho. Since the sets HK are obviously disjoint and IHKI_-> [ j q for K=l,2,...,k this occurs certainly after a finite number of steps. To be precise, we see that k<=..~ By {~_1[72_/h+lq}-1=2[2/h+l. constructionHoUH1U...UHk= Io*l for all 2, ..., q,~q and if GC=HK, IG[=~IH~I then This is precisely the statement of lemma (Ho, ..., H~). and PROOFOFLl~MMABCDE.Letustakenandqtobeintegersso let A be a 4-free set contained in [0, 4nq) which satisfies IAI -->74nq. Then we can decompose A into with B, C, D, E ~ A = B (J (C+nq) U (D +2nq) (5 ( E + 3nq) [0, nq) and (in an obvious notation) B = U ( B + x q ) with B~=C[0,q), x<n Acta Mathematica Acaderaiae Scientiarum Hungaricete 20, 1969 that nq>=6n4(3} 99 ON SETS OF INTEGERS similarly for C, D, E. For their respective cardinalities we get easily the estimates 1BI, ICI, Iol, That A is 4-free is reflected in the fact that Q(b, c, d, e) has no solutions with bEB, cEC, d~D, eEE. More precisely: If Q(x,y, z, w) holds, then Q(b, c,d,e) is insolvable with b CB~, c E Cy, d ~ D~, e CEw. Moreover all of the sets B~, Cr, D,, Ew are 4-free. Let us call a set B etc. c= [0, q) full if ]B[ _->(~-/31)q, and poor otherwise. Clearly lemma BCDE will be proved if we can show that there are u quadruples (b, c, d, e) such that all Bb'S are equal, all Cc's are equal, all Bb's , Co's, Dd'S, Ee's are full, and the e's form an arithmetic progression. We shall use all the ideas from the proof of lemma [G*[ but not only these; moreover the technique will be more involved. We can easily provide a set ~3 with positive density (about 2-q) such that all Bb for b ~ ~3 are equal and full. Similarly we find a dense set E with all Ce for c E equal and full. We have then a set of type Se in ~ through which we 'project' ~3 onto the levels of D and E. The points e defined by Q(b, s, ~, e) are plentiful and are arranged into long progressions. Hence it can be shown that almost all E~ with these e's are full. The same could be done for the sets De with d from Q(b, s, d, ~) but unfortunately not in the necessary simultaneous way, since the relation between the e's and the d's is not unique and this relationship weakens the larger I is taken. The idea which overcomes this difficulty is to use not only one set ~, but a large number of them, go, g i ..... ~r-~ generated from one of them by shifting go = go + ~, such that C~ = C~,, if c and c' belong to the same set EQ. This again introduces long progressions on the levels of D and E, which can be exploited independently of the former ones. As a result we get u quadruples of the required type for at least one ~ with b E ~3 and all C E go, and so all B b as well as C~ coincide. We shall use the following simple counting argument a couple of times: If n ~ a ~ >=(~-ea)n and a ~ ( 7 a~=l satisfy a~<=(y-eO is +/32)for all x, then the number R of terms a~ which R<= /32 q- e3 /31 PROOF. (7--e3)n<=(7-e~)R+(7+~2)(n--R), (el +e2)R<=(e2 +e3)n. We list n o w the parameters used in the proof, in the order of their dependence. The reader may check them as they occur. e, u and qo are supposed to be given, /32- e1 16U' q = max (qo, n4(e2)), /32 /33 - 150.2q' m = max (2u, n4(e3)), 7* l = 75m- 2 q, e4- 600.2q+2 z, r = rt4(e4) , /3 = sufficiently small n = sufficiently large, 6r]n. Acta Mathematica Academiae Scientiarum Hungaricae zo, z969 100 B. SZEMERI~DI We can safely dispense with specifying e and n since there is no feedback to the other parameters. A small ~ only demands a large n. By an already repeatedly used argument we get ~'lBx] = BA[O,lnq I >(V-e)~ X<-6 Z [C l n 11 ~-~y<~- : cn[7' 7)--> provided only that n is large enough. We set ez = ~ ,~t (V-e) nq6 and take q >=n~(~z),so that we then have for all x, y, z, w, [B I, It, I, IDA, IE [ <= (Y+ez)q. Bx,O ~ = x < ~n- and the By the above counting argument the number of poor number of poor Cy, --<-6 -y<-3 is each at most 1 - ~ + 6-<=8-'--6 if~ is small enough. Consequently more than half of the Bx are full. There are only 2 q subsets of (0, so there is a full B(o) = (0, q) such that q), Bb=B(o ) for bE~ 0, , with 1 ~ t = > 1 2 . 2 q. We next look at C, and assuming that rln we consider the r-tuples n (Cm,, C,,,+1 . . . . . Cmr+r-1), n ~ <= m < 3r" Since not more than 1 of the Cj are poor, not more than 1 of the r-tuples contain more than 1 poor sets. There are only 2 qr different r-tuples, so we find C(o), .... C(r- x), not more than ~- of them being poor, and Cc+~=C(e) for ~c n = 6 ' so that t/ cE~ and oE[O,r), [El > 12r.2 ~r By lemma p(3,/) we see that E contains a subset of type S, = {y} + {0, xi} + . . . + {0, x,}. With the sets s,= (y} + (o, xl} +... +{0, x3 we form L~= { 3 5 - 2 b ; sE Si, bE~}. Acta Matbematica Academ~ae Sdentlamra Hungar~cae zoi z969 i 101 ON SETS OF INTEGERS Then we have ti C = , rl , = o >" = 12.2q ' Li = Li- 1 U (Li- 1 + 3xi)" For a suitable i<=l we have r/ IZ,l-[r,_~[ -<_ 7 " We decompose L~_, into maximal progression (mod 3x3, collect those progressions which are longer than 3m into L, and the remaining ones into 7,; as in the proof of lemma IG*] we get ILl <= 3m(IZ,l--lZ,- ,l) <= ]L'I-> I L ~ ]~l ~ {.11 272q 3mn l ' 3m) n l " n~25.2q. (Here we have taken l ~72m. 2q). Dropping the first m and the last m elements of each of the progressions collected into L, we obtain a set we shall call g. Then 1 . n lel ~ T IEI > 75.2q and [0, n)7 g is the union of disjoint progressions (rood 3xi), none of which contains fewer than m elements. If we start from &+OC=g+O instead of &, O<=o<r we get ~ + 3 0 instead of E. Thus the complement of ~ + 3 0 too is composed of disjoint progressions, each of length not less than m. We now show that if m is large enough then almost all E~ with e E C (or # + 30) are full. In particular we show that the following conditions are sufficient: m => n,(g3) , ~2 where % = 150.2q " The set M = U [eq, (e + 1)q) eEg has the property of the set M in the Simple Lemma. (The progressions have the modulus 3qxf and are each of length at least m; d =g3). Therefore Z [Eel = [EAM} >=vIMl --(g+%)qn = Yqlel--(e+~a)qn -> eEg >=eqIe1-2%qn =>(y - 150"2%a)ql•/ = (~ -e2)qle 1. Since lee[ =<(~ +e2)q for all e, the 'counting argument' applies, showing that the number of poor Ee, e <E is at most 2gz 1 e, [#l = g h - l g l 9 Aeta Maeberaatlca Ac~demiae Sclentlarum Hungaricae zo, z9@ 102 E. SZEMEP,~DI More generally, for each O = 0 . . . . . r - 1 1 there are a t most 8uu 18[ poor sets Ee+30, e E& Each e E g by construction occurs in at least one quadruple (b, s, d, e) with b E ~3 and s E Sl. To each e E 8 we attach one such quadruple making the d, as well as the b and the s, a function of e, d = q~(e). Let N = {~o(e); eEg}. Since S1 has at most 21 elements any particular d in D can arise as a value qo(e) at most 2 ~ times. We consider the quadruples (b, s + O, ~o(e)+ 20, e +30), e E S , oE[O, r). We want now to show that for at least one 0 Cs+0 is full (independent of e since Cs+o= C(~)), and almost all Do(e) + 20 are full (counted with multiplicity). We do this by considering all the O together. The basic tool is again the Simple Lemma. Before applying it, however, we have to remove the multiplicities with which the Cg(e)+ 0 occur. There are two sources of multiplicity: the m~pping ~o(e) = d , and the forming of the sum d + ~. We deal first with the case when ~o is one to one, where only one of these sources is present. Set / ' ~' -- d ; d E ~ , Z [Da+2.ol <- (7-~2)qr } 9 0=0 We construct a subset N " c=N, with the property that consecutive elements have a difference of at least 4r, but 1 , lYl --> ~ 19 I. For this purpose we may go from left to right retaining for our set ~ " the first element not ruled out by the restriction upon the differences. Since we exclude at most 4 r - 1 elements for each one which we keep we obtain the stated inequality. Now, each element in ~ " = ~ " + {0, 2, 4 . . . . . 2(r - 1)} is uniquely represented. Therefore we have r-J- I U Dxl = Z x E ~ ~' Z IDa+2el <= (7-ez)rq [@"l. dE~"~=0 By construction the complement of ~ " consists of progressions (mod 2), each of length at least r. (No difficulty arises when considering elements to the left of the first and to the right of the last elements in N " , respectively, since N + 2~ c_ 6- n, n . Therefore the left hand side can be estimated by the Simple Lemma. Acta Matbematica Academiae Sclentiarum Hungaricae 2o~ z969 103 ON SETS OF INTEGERS We take M= U [xq,(x+l)q)~ ~~ r=n4(54), e4 <- ~2z 600.2 ~ and obtain I Y Dx] = I D A M I >= y q l ~ " l - - ( ~ + ~ 4 ) q n = xE ~" -~ rqr 1@"1- (~ + 54)qn = > yqr ]~"1 - z54qn. Putting these estimates together gives r e r l Y l <= 254n, [~'1 <- 4r[Yi <= 8 5 ~ n . Next we have the estimate r--1 r--1 Z Z IDd+2~l~ Z (*) --> ( l ~ l - [ ~ ' l ) ( ~ - ~ 2 ) r q -> Z [Dd+2ol-> (:'-~2)[ l~l-854nlrq'e2 ) n In the present special case we have [~]=18]->75.2~ " We therefore get the further inequality Z Z a~e=o [Dd+20] ~ ( 7 - - 8 2 ) [1-8"75 "2q• / e2) r q l ~ [ >= -~ (y - 5 2 ) ( 1 - ez)rq [@1 >- (~ - 2e2)rq 1@[. By the 'counting argument' we infer that not more than 3 e 2 r l D ] = . 3 r 51 IOU [8[ sets Da+2o, taken with their multiplicity, are poor. For at most one half of the O's 3 can we have more than [E[ p o o r sets Dd+2o. 1 If we drop these numbers 0, of which there at most ~ r, and also those 0 for which C(.o) is poor, there being no more than 1 r of them, some of the numbers e remain. So far we have proved: 3 There is a number oE[0, r) such that Cs+o=C(o) is full, at most ~u 18] of 1 the sets D~(o+2o , e E 8 are poor, and at most 8uu lel of the sets E~+3o are poor. 1 Hence for at most 2u-u]81 elements e E C we have either E~+so or D~(~)+: o poor. We call these e E 8 'bad'. The density of the bad elements in g is at most 1 N o w recall that ~ is composed of disjoint arithmetic progressions of length at least m. Acta Mathematica Academiae Scientiarum Hungaricae 20, ~969 104 E. SZEMEREDI: ON SETS OF INTEGERS W e c a n take m =>2u. If one of every u consecutive elements of such a progression were a bad one, the density of bad elements in any particular progression in g would be at least 2 2 3u - 1 3u and so therefore would be the density of bad elements in the whole of g. Since we have disproved this there exists an arithmetic progression of at least u good elements in g, q.e.d. Rather little has to be changed in the general case when the elements d E ~ are taken with the multiplicities of d = cp(e) not necessarily all equal to one. Set @i = {d; d = cp(e) for exactly i elements e E g } , Each N~ can be treated in exactly the same way that N was until we reach the formula ( ~ ) . However, in order to make the formula useful this time we must take a smaller e4 (and therefore a larger r): ~2 /3'$ -- 600.2 q+2t ' We have then ,._1 .~' ~ ' ]Dd+2~l => (7-e2) r = n4(e4). [[Nil- S e 4 n I rq. dE~io=O ~2 J Multiplying by i and summing gives ~' ID~(e)+2el=> ( 7 - e z ) eEg 0 = 0 Ig[ =8/34n i rq e /32 The counting argument again shows that there is an o E [0, r) such that for at most 3 8u Ig[ elements e Eg the sets Do(e)+2o are full, and the proof is finished as above. We have now completed the proof of lemma B C D E and with it the proof of the theorem. The author wishes to express his thanks to E. WIRSING and P. D. T. A. EU~OT% who helped considerably in the final formulation of the proof. (Received 20 October 1967) MTA MATEMATIKAI KUTATO INTI~ZETE, BUDAPEST, V., RE/~LTANODA U. 13-15 Acta Mathematica Academlae Scientiarum Hungaricae 20, i969

1/--pages