01 Fundamentals of Statistics: Population, Sample, and Statistic

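Introduces the basic concepts of statistics: population and sample, sampling methods and related notions, the statistics most commonly used in practice, and the empirical distribution function.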

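Lec1: Population, Sample, and Statistic

Zhang Weiping

February 14, 2012

1 Population and Sample

First, let us look at several definitions of statistics:

"A branch of mathematics dealing with the collection, analysis, interpretation and presentation of masses of numerical data" (Webster's New Collegiate Dictionary)

"The branch of the scientific method which deals with the data obtained by counting or measuring the properties of populations" (Fraser, 1958)

"The entire science of decision making in the face of uncertainty" (Freund and Walpole, 1987)

"The technology of the scientific method concerned with (1) the design of experiments and investigations, (2) statistical inference" (Mood, Graybill and Boes, 1974)

All of these definitions imply that statistics is a theory whose purpose is the inference of knowledge.

1.1 Population

Simply put, a population is the set of all the individuals we are interested in. For example:

Example 1.1. Suppose a batch contains 10000 products, some good and some defective. To estimate the defect rate, we usually draw part of the batch, say 100 items, and inspect them. Each product is then an individual, the batch of 10000 products is called the population, and the number of products in the batch, 10000, is called the population size.

If the number of individuals in a population is finite, it is called a finite population; otherwise it is called an infinite population. A subset of a population is called a subpopulation. If different subpopulations have different characteristics (or properties), then separating them at the start of a study gives a better understanding of the population. For example, if a particular drug has different effects on particular subpopulations, then without separating these subpopulations the effect of the drug may be masked and cannot be identified. Moreover, separating the subpopulations also yields more accurate estimates (inferences). For example, to study the distribution of people's heights, dividing people into subpopulations by sex and ethnicity leads to more accurate inference.

Accordingly, depending on whether a population contains subpopulations with very different properties, populations can be divided into homogeneous and heterogeneous ones (homogeneity, heterogeneity). The great majority of statistical methods require the population to be homogeneous. Here we consider only homogeneous populations.

Furthermore, what people really care about is not the individuals themselves but one (or several) numerical characteristics of each individual, such as the lifetime of a fluorescent lamp or the size of a machine part. In the example above, if a good product is coded 0 and a defective product is coded 1, then we care whether the value of an individual is 0 or 1. This gives another definition of a population: a population can be regarded as the set of the values of some numerical characteristic over all individuals, so it is a set of numbers.

Since each individual appears at random, the corresponding numerical characteristic also appears at random. Such a characteristic can therefore be regarded as a random variable (abbreviated r.v.), and the distribution of the characteristic in the population is the distribution of this random variable. To illustrate with the example above, suppose that 100 of the 10000 products are defective and the rest are good, so that the defect rate is 0.01. Define the random variable $X$ by

\[
X = \begin{cases} 1, & \text{defective product}, \\ 0, & \text{good product}. \end{cases}
\]

Its probability distribution is the 0-1 distribution $B(1, 0.01)$. The numerical characteristic of a particular individual is thus an observation of the r.v. $X$. In this way a population can be described by a random variable and its distribution, which gives the following definition.

Definition 1.1. A population is a probability distribution.

Since a population can be regarded as a probability distribution, when this distribution is the "xx distribution" the population is often called an "xx population". For example, if it is a normal distribution we speak of a "normal population"; if it is an exponential distribution we speak of an "exponential population".

1.2 Sample

A sample is the set formed by some of the individuals drawn from the population (according to some rule). The number of individuals in the sample is called the sample size. The process of drawing individuals to form a sample is called sampling. There are two kinds of sampling: probability sampling and nonprobability sampling.

Probability sampling. In probability sampling, every individual in the population is drawn into the sample with a probability that can be determined (computed) in advance. Probability sampling methods include simple random sampling, systematic (equal-interval) sampling, stratified sampling, and multistage sampling (see a text on survey sampling for details). The most commonly used is simple random sampling, which has two characteristics:

(1) Every individual is drawn with the same probability. This means that every individual is representative.

(2) The values of the individuals in the sample are mutually independent.

Hence, writing the sample as $X_1, \dots, X_n$ and the population as $X$, under simple random sampling $X_1, \dots, X_n$ are independent and identically distributed with the same distribution as $X$, commonly written

\[
X_1, \dots, X_n \ \text{i.i.d.} \sim X .
\]

A sample $(X_1, \dots, X_n)$ obtained by simple random sampling is called a simple random sample. Definition 1.2 states this in mathematical language.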
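The two properties of simple random sampling can be illustrated with a small simulation. The following is a minimal Python sketch, assuming the population of Example 1.1 (10000 products, 100 of them defective) and sampling with replacement; the function name `simple_random_sample` and the seed are illustrative choices, not part of the lecture.

```python
import random


def simple_random_sample(population, n, rng):
    """Draw n units with replacement; every unit has probability 1/len(population)
    on every draw, and the draws are independent."""
    return [rng.choice(population) for _ in range(n)]


# Population of Example 1.1: 100 defective items (coded 1) and 9900 good items (coded 0).
population = [1] * 100 + [0] * 9900

rng = random.Random(2012)  # arbitrary fixed seed, only for reproducibility
sample = simple_random_sample(population, 100, rng)

population_rate = sum(population) / len(population)  # true defect rate, 0.01
defect_rate_estimate = sum(sample) / len(sample)     # estimate from the sample

print("population defect rate:", population_rate)
print("estimated defect rate: ", defect_rate_estimate)
```

Each drawn value is an observation of a $B(1, 0.01)$ random variable, so the estimate varies from run to run when the seed is changed.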

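Definition 1.2. Let $F$ be a population and let $X_1, \dots, X_n$ be a sample of size $n$ drawn from $F$. If

(i) $X_1, \dots, X_n$ are mutually independent, and

(ii) $X_1, \dots, X_n$ are identically distributed, all with distribution $F$,

then $X_1, \dots, X_n$ is called a simple random sample, sometimes simply a simple sample or a random sample.

If the population is $F$ and $X_1, \dots, X_n$ is a simple random sample from it, then the joint distribution of $X_1, \dots, X_n$ is

\[
F(x_1) \cdot F(x_2) \cdots F(x_n) = \prod_{i=1}^{n} F(x_i).
\]

If $F$ has density $f$, then the joint density is

\[
f(x_1) \cdot f(x_2) \cdots f(x_n) = \prod_{i=1}^{n} f(x_i).
\]

When the population size is finite, a simple random sample can be obtained only by sampling with replacement. When the population size is very large, or the sample makes up a very small proportion of the population, a sample drawn without replacement can be regarded approximately as a simple random sample. Here we consider only simple random samples, and from now on call them simply samples.

Nonprobability sampling. This refers to sampling methods in which some individuals of the population have no chance of being drawn, or in which the probability of an individual being drawn cannot be determined accurately. The criterion for drawing individuals is therefore based on assumptions about the population of interest. Because the individuals are not drawn at random, nonprobability sampling does not allow the sampling error to be estimated. Nonprobability sampling methods include accidental sampling, quota sampling, and purposive sampling.

Before the individuals are drawn, the sample of size $n$ that is about to be drawn can be regarded as a collection of random variables, and is therefore usually written $X_1, \dots, X_n$. After the individuals have been drawn, the sample appears as concrete numbers $x_1, \dots, x_n$ (called the observed values of the sample, or a set of sample values). The range of possible sample values is usually called the sample space, denoted $\mathcal{X}$. A sample can thus be regarded either as random variables (before drawing) or as concrete numbers (after drawing).

1.3 Sampling Distribution

Since a sample can be regarded as random variables, it has a probability distribution, called the sampling distribution. To determine the sampling distribution one must rely on the nature of the specific characteristic being observed (this often involves subject-matter knowledge) and on an understanding of how the sampling and the experiment are carried out; in addition, some artificial assumptions are often necessary. Here are some examples.

Example 1.2. A large batch contains $N$ products, of which $M$ are defective; $N$ is known and $M$ is unknown. We draw $n$ products and count the defective ones among them in order to estimate $M$ or the defect rate $p = M/N$. The sampling scheme is sampling without replacement: one item is drawn at a time, in sequence, until $n$ items have been drawn. Find the sampling distribution.

First we quantify the problem. Let $X_i$ denote the result of the $i$-th draw, with

\[
X_i = \begin{cases} 1, & \text{the item drawn is defective}, \\ 0, & \text{the item drawn is acceptable}. \end{cases}
\]

Each of $X_1, \dots, X_n$ can take only the values 0 and 1. Given a set of sample values $x_1, \dots, x_n$, each $x_i$ is 0 or 1, and the sampling distribution we seek is $P(X_1 = x_1, \dots, X_n = x_n)$. Writing $A_i = \{X_i = x_i\}$, the sampling distribution is easily obtained from the multiplication rule for probabilities,

\[
P(A_1 \cdots A_n) = P(A_1)\, P(A_2 \mid A_1) \cdots P(A_n \mid A_1 A_2 \cdots A_{n-1}).
\]

For ease of discussion, first take $n = 3$ and $x_1 = 1$, $x_2 = 0$, $x_3 = 1$. Then

\[
\begin{aligned}
P(X_1 = 1, X_2 = 0, X_3 = 1)
&= P(X_1 = 1)\, P(X_2 = 0 \mid X_1 = 1)\, P(X_3 = 1 \mid X_1 = 1, X_2 = 0) \\
&= \frac{M}{N} \cdot \frac{N-M}{N-1} \cdot \frac{M-1}{N-2}
 = \frac{M}{N} \cdot \frac{M-1}{N-1} \cdot \frac{N-M}{N-2}.
\end{aligned}
\]

In the general case, write $\sum_{i=1}^{n} x_i = a$. The multiplication rule gives

\[
P(X_1 = x_1, X_2 = x_2, \dots, X_n = x_n)
= \frac{M}{N} \cdot \frac{M-1}{N-1} \cdots \frac{M-a+1}{N-a+1} \cdot \frac{N-M}{N-a} \cdots \frac{N-M-n+a+1}{N-n+1}
\tag{1.1}
\]

when $x_1, \dots, x_n$ are all 0 or 1 and $\sum_{i=1}^{n} x_i = a$ (the probability is 0 in all other cases).

The computation shows that $X_1, \dots, X_n$ are not mutually independent: the sampling distribution is obtained from the multiplication rule through conditional probabilities.

Example 1.3. Take the previous example again, but change the sampling scheme to sampling with replacement: after each draw the result is recorded and the item is put back, then the next item is drawn, until $n$ items have been drawn. Find the sampling distribution.

Under sampling with replacement, at each draw every one of the $N$ products is drawn with probability $1/N$, so $P(X_i = 1) = M/N$ and $P(X_i = 0) = (N - M)/N$. Hence

\[
P(X_1 = x_1, \dots, X_n = x_n) = \left(\frac{M}{N}\right)^{a} \left(\frac{N-M}{N}\right)^{n-a}
\tag{1.2}
\]

when $x_1, \dots, x_n$ are all 0 or 1 and $\sum_{i=1}^{n} x_i = a$ (the probability is 0 in all other cases).

This example is simpler than the previous one, because here $X_1, \dots, X_n$ are independent and identically distributed, whereas in the previous example they are not independent. When $n/N$ is very small, (1.1) and (1.2) differ very little. Hence, when $n/N$ is very small, the sampling without replacement of the previous example can be treated as sampling with replacement.

1.4 Statistical Inference

The method of drawing a sample of a certain size from a population in order to infer the probability distribution of the population is called statistical inference. When the form of the population distribution $F$ is known and it contains only finitely many unknown parameters, the problem under study usually takes the form of some inference about the parameters. For example:

Example 1.4. Suppose the measurement error follows the normal distribution $N(0, \sigma^2)$, and an object of weight $a$ is weighed $n$ times. The sampling distribution of the sample $X_1, \dots, X_n$ has joint density

\[
f(x_1, \dots, x_n) = (2\pi\sigma^2)^{-n/2} \exp\left\{ -\frac{1}{2\sigma^2} \sum_{i=1}^{n} (x_i - a)^2 \right\}.
\]

Statistical inference for this experiment can be an estimate of the weight $a$ of the object (using $\bar{X}$ as the estimate), or an estimate of the precision of the weighing. Problems of this kind are called parametric statistics.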
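Example 1.4 can be sketched numerically. The following Python sketch assumes a hypothetical true weight `a_true`, error standard deviation `sigma`, and number of weighings `n`, all chosen only for illustration; it simulates the weighings, estimates $a$ by the sample mean, and evaluates the logarithm of the joint density given above.

```python
import math
import random


def joint_log_density(xs, a, sigma):
    """Log of the joint density of n i.i.d. N(a, sigma^2) observations xs."""
    n = len(xs)
    sum_sq = sum((x - a) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - sum_sq / (2 * sigma ** 2)


# Hypothetical values, not taken from the lecture.
a_true, sigma, n = 10.0, 0.5, 50

rng = random.Random(0)  # arbitrary fixed seed
# Each weighing is the true weight plus an N(0, sigma^2) measurement error.
weighings = [a_true + rng.gauss(0.0, sigma) for _ in range(n)]

a_hat = sum(weighings) / n  # the sample mean, used as the estimate of a

print("estimate of a:", a_hat)
print("log density at a_hat: ", joint_log_density(weighings, a_hat, sigma))
print("log density at a_true:", joint_log_density(weighings, a_true, sigma))
```

For fixed data and fixed `sigma`, the joint density viewed as a function of $a$ is largest at the sample mean, because $\sum (x_i - a)^2$ is minimized there; this is one way to see why $\bar{X}$ is a natural estimate of $a$.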

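Statistical inference carried out when the form of the population distribution is unknown is called nonparametric statistical inference. Its main purpose is to make inferences about the population distribution itself.

Statistical inference comprises the following three aspects:

(1) Proposing various methods of statistical inference.

(2) Computing numerical measures of the performance of an inference method. In the preceding example, where $\bar{X}$ is used to estimate $a$ in $N(a, \sigma^2)$, the probability $P(|\bar{X} - a| > c)$ serves as a numerical measure of the performance of the inference.

(3) Finding the optimal method of statistical inference under given conditions and optimality criteria, or proving that a certain method of statistical inference is optimal.

2 Statistic

Definition 2.1. A quantity computed from the sample is a statistic; in other words, a statistic is a function of the sample.

We make the following remarks on this definition.

(1) A statistic depends only on the sample and must not depend on unknown parameters. For example, let $X \sim N(a, \sigma^2)$ and let $X_1, \dots, X_n$ be an i.i.d. sample from the population $X$. Then $\sum_{i=1}^{n} X_i$ and $\sum_{i=1}^{n} X_i^2$ are both statistics, whereas, when $a$ and $\sigma^2$ are unknown parameters, $\sum_{i=1}^{n} (X_i - a)$ and $\sum_{i=1}^{n} X_i^2 / \sigma^2$ are not statistics.

(2) A sample has a dual nature: it can be viewed either as concrete numbers or as random variables. Since a statistic is a function of the sample, a statistic has the same dual nature. It is precisely because a statistic can be regarded as a random variable (or random vector) that it has a probability distribution, and this is the basis on which we use statistics to carry out statistical inference.

II. Some commonly used statistics

1. Sample mean. Let $X_1, \dots, X_n$ be a sample from a population $X$. Then

\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i
\]

is called the sample mean. It reflects information about the mathematical expectation of the population.

2. Sample variance. Let $X_1, \dots, X_n$ be a sample from a population $X$. Then

\[
S_n^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2
\]

is called the sample variance. It reflects information about the population variance, and $S_n$ reflects information about the population standard deviation.

3. Sample moments. Let $X_1, \dots, X_n$ be a sample from a population $F$. Then

\[
a_{n,k} = \frac{1}{n} \sum_{i=1}^{n} X_i^k, \qquad k = 1, 2, \dots
\]

is called the sample $k$-th moment about the origin. In particular, for $k = 1$, $a_{n,1} = \bar{X}$ is the sample mean. The quantity

\[
m_{n,k} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})^k, \qquad k = 2, 3, \dots
\]

is called the sample $k$-th central moment. In particular, for $k = 2$, $m_{n,2} = \frac{n-1}{n} S_n^2$, which agrees with the sample variance up to the factor $(n-1)/n$. Sample moments about the origin and sample central moments are together called sample moments.

4. Sample moments of a two-dimensional random vector. Let $(X_1, Y_1), \dots, (X_n, Y_n)$ be a sample from a two-dimensional population $F(x, y)$. Then

\[
\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i, \qquad S_X^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2,
\]
\[
\bar{Y} = \frac{1}{n} \sum_{i=1}^{n} Y_i, \qquad S_Y^2 = \frac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2,
\]
\[
S_{XY} = \frac{1}{n} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})
\]

are called, respectively, the sample means and sample variances of $X$ and $Y$, and the sample covariance of $X$ and $Y$.

5. Order statistics and related statistics. Let $X_1, \dots, X_n$ be a sample from a population $F$, and arrange it in increasing order as $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$. Then $(X_{(1)}, X_{(2)}, \dots, X_{(n)})$ is called the order statistic, and any part of $(X_{(1)}, \dots, X_{(n)})$ is also called an order statistic. The following statistics can be defined from the order statistics.

(1) Sample median:

\[
m_{1/2} = \begin{cases}
X_{\left(\frac{n+1}{2}\right)}, & n \text{ odd}, \\[4pt]
\frac{1}{2}\left[ X_{(n/2)} + X_{(n/2+1)} \right], & n \text{ even}.
\end{cases}
\tag{2.1}
\]

The sample median reflects information about the population median. When the population distribution is symmetric about some point, the center of symmetry is both the population median and the population mean, and then $m_{1/2}$ also reflects information about the population mean.

(2) Sample extreme values: $X_{(1)}$ and $X_{(n)}$ are called the sample minimum and the sample maximum, and together they are called the extreme values of the sample. Extreme-value statistics are commonly used in the statistical analysis of disaster problems and of materials testing.

(3) Sample $p$-quantile ($0 < p < 1$): it can be defined as $X_{([(n+1)p])}$, where $[a]$ denotes the integer part of the real number $a$. When $p = 1/2$ and $n$ is odd, this definition coincides with the sample median in (1). The sample $p$-quantile (sample $p$-fractile) reflects information about the population $p$-quantile.

(4) Sample range: $R = X_{(n)} - X_{(1)}$ is called the sample range. It reflects information about the dispersion of the population distribution.

6. Sample coefficient of variation. Let $X_1, \dots, X_n$ be a sample from a population $F$. Then

\[
\hat{V} = S_n / \bar{X}
\tag{2.2}
\]

is called the sample coefficient of variation. It reflects information about the population coefficient of variation $c_\nu$, which is defined as $c_\nu = \sqrt{\mathrm{Var}(X)} / E(X)$. This is a measure of the dispersion of the population distribution, with the dispersion measured in units of the population mean.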
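The statistics in items 1 to 6 above are straightforward to compute. Below is a minimal Python sketch using only the standard library; the function names and the data set are made up for illustration, and the $p$-quantile follows the definition $X_{([(n+1)p])}$ given above, which is only one of several conventions in use.

```python
import math


def sample_mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    """S_n^2, with divisor n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)


def central_moment(xs, k):
    """m_{n,k}, with divisor n."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** k for x in xs) / len(xs)


def sample_median(xs):
    order = sorted(xs)  # order statistics X_(1) <= ... <= X_(n)
    n = len(order)
    if n % 2 == 1:
        return order[(n + 1) // 2 - 1]  # X_((n+1)/2), shifted to 0-based indexing
    return 0.5 * (order[n // 2 - 1] + order[n // 2])


def sample_quantile(xs, p):
    """X_([(n+1)p]); defined only when 1 <= [(n+1)p] <= n."""
    order = sorted(xs)
    k = math.floor((len(order) + 1) * p)
    if not 1 <= k <= len(order):
        raise ValueError("[(n+1)p] falls outside 1..n for this sample size")
    return order[k - 1]


def sample_range(xs):
    return max(xs) - min(xs)


def coefficient_of_variation(xs):
    return math.sqrt(sample_variance(xs)) / sample_mean(xs)


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 11.0]  # made-up observations

print("mean    ", sample_mean(data))
print("variance", sample_variance(data))
print("m_{n,2} ", central_moment(data, 2))
print("median  ", sample_median(data))
print("range   ", sample_range(data))
print("CV      ", coefficient_of_variation(data))
```

Keeping the divisor $n - 1$ in `sample_variance` and the divisor $n$ in `central_moment` mirrors the two definitions in the text, so the two quantities differ by the factor $(n-1)/n$.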

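7. Sample skewness. Let $X_1, \dots, X_n$ be a sample from a population $F$. Then

\[
\hat{\beta}_1 = \frac{m_{n,3}}{m_{n,2}^{3/2}}
= \sqrt{n}\, \sum_{i=1}^{n} (X_i - \bar{X})^3 \Big/ \Big[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \Big]^{3/2}
\tag{2.3}
\]

is called the sample skewness. It reflects information about the population skewness, which is defined as $\beta_1 = \mu_3 / \mu_2^{3/2}$, where $\mu_i$ ($i = 2, 3$) is the $i$-th central moment of the population. $\beta_1$ is a measure of the asymmetry, or "skew", of the population distribution. The skewness of the normal distribution $N(a, \sigma^2)$ is zero.

8. Sample kurtosis. Let $X_1, \dots, X_n$ be a sample from a population $F$. Then

\[
\hat{\beta}_2 = \frac{m_{n,4}}{m_{n,2}^{2}} - 3
= n \sum_{i=1}^{n} (X_i - \bar{X})^4 \Big/ \Big[ \sum_{i=1}^{n} (X_i - \bar{X})^2 \Big]^{2} - 3
\tag{2.4}
\]

is called the sample kurtosis. It reflects information about the population kurtosis, which is defined as $\beta_2 = \mu_4 / \mu_2^2 - 3$, where $\mu_i$ ($i = 2, 4$) is as above. $\beta_2$ is a measure of how sharply peaked the density curve of the population distribution is near its mode. The kurtosis of the normal distribution $N(a, \sigma^2)$ is zero.

III. Empirical distribution function

Definition 2.2. Let $X_1, \dots, X_n$ be an i.i.d. sample from a population $F(x)$, arranged in increasing order as $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$. For any real number $x$, the function

\[
F_n(x) = \begin{cases}
0, & x < X_{(1)}, \\[2pt]
\dfrac{k}{n}, & X_{(k)} \le x < X_{(k+1)}, \quad k = 1, 2, \dots, n-1, \\[6pt]
1, & X_{(n)} \le x
\end{cases}
\tag{2.5}
\]

is called the empirical distribution function.

It is easy to see that the empirical distribution function is a nondecreasing, right-continuous function with the basic properties of a distribution function. It is discontinuous at $x = X_{(k)}$, $k = 1, 2, \dots, n$, and it is a step function with a jump of size $1/n$ at each point of discontinuity. With the indicator function

\[
I_{[A]}(x) = \begin{cases} 1, & x \in A, \\ 0, & \text{otherwise}, \end{cases}
\]

$F_n(x)$ can be written as

\[
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} I_{[X_i \le x]}.
\tag{2.6}
\]

By definition $F_n(x)$ is a function that depends only on the sample $X_1, X_2, \dots, X_n$, so it is a statistic. Its possible values are $0, 1/n, 2/n, \dots, (n-1)/n, 1$. Writing $Y_i = I_{[X_i \le x]}$, $i = 1, 2, \dots, n$, we have $P(Y_i = 1) = F(x)$ and $P(Y_i = 0) = 1 - F(x)$, and $Y_1, Y_2, \dots, Y_n$ are i.i.d. $\sim b(1, F(x))$, so $n F_n(x) = \sum_{i=1}^{n} Y_i \sim b(n, F(x))$. Hence

\[
P\left(F_n(x) = \frac{k}{n}\right) = P\left(\sum_{i=1}^{n} Y_i = k\right) = \binom{n}{k} [F(x)]^{k} [1 - F(x)]^{n-k}.
\]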
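The representation (2.6) translates directly into code. The following Python sketch, with an illustrative made-up sample and hypothetical function names, builds $F_n$ by counting the observations not exceeding $x$ and also evaluates the binomial probability $P(F_n(x) = k/n)$ derived above.

```python
import bisect
import math


def make_ecdf(sample):
    """Return the empirical distribution function F_n of the given sample."""
    order = sorted(sample)
    n = len(order)

    def F_n(x):
        # bisect_right returns the number of observations <= x, as in (2.6).
        return bisect.bisect_right(order, x) / n

    return F_n


def prob_ecdf_equals(k, n, F_x):
    """P(F_n(x) = k/n) when F(x) = F_x, using n * F_n(x) ~ b(n, F(x))."""
    return math.comb(n, k) * F_x ** k * (1 - F_x) ** (n - k)


sample = [3.1, 1.2, 4.7, 2.5, 0.4]  # made-up observations
F_n = make_ecdf(sample)

# F_n is 0 to the left of the smallest observation, jumps by 1/n at each
# observation, and equals 1 from the largest observation onward.
values = [F_n(x) for x in (0.0, 0.4, 3.0, 4.7)]
print(values)
```

Sorting once and using binary search keeps each evaluation of `F_n` at logarithmic cost, which is convenient when the function is evaluated at many points.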

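From the properties of the binomial distribution, $F_n(x)$ has the following large-sample properties.

(1) By the central limit theorem, for every $x$ with $0 < F(x) < 1$, as $n \to \infty$,

\[
\frac{\sqrt{n}\,\big(F_n(x) - F(x)\big)}{\sqrt{F(x)\big(1 - F(x)\big)}} \xrightarrow{\ L\ } N(0, 1).
\]

(2) By Bernoulli's law of large numbers, as $n \to \infty$,

\[
F_n(x) \xrightarrow{\ P\ } F(x).
\]

(3) By Borel's strong law of large numbers, as $n \to \infty$,

\[
P\Big( \lim_{n \to \infty} F_n(x) = F(x) \Big) = 1.
\]

(4) Going one step further, we have the following Glivenko-Cantelli theorem (1933).

Theorem 2.1. Let $F(x)$ be the distribution function of the r.v. $X$, let $X_1, \dots, X_n$ be a simple random sample from the population $F(x)$, and let $F_n(x)$ be its empirical distribution function. Write $D_n = \sup_{-\infty < x < \infty} |F_n(x) - F(x)|$. Then

\[
P\Big( \lim_{n \to \infty} D_n = 0 \Big) = 1.
\]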
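The statement of Theorem 2.1 can be illustrated by simulation. The sketch below assumes, purely for illustration, a Uniform(0, 1) population, so that $F(x) = x$ on $[0, 1]$; the function names, the sample sizes, and the seed are arbitrary choices. Because $F_n$ is a step function and $F$ is continuous, the supremum $D_n$ is attained at or just before one of the jump points $X_{(i)}$.

```python
import random


def sup_distance(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous distribution function cdf."""
    order = sorted(sample)
    n = len(order)
    d = 0.0
    for i, x in enumerate(order, start=1):
        fx = cdf(x)
        # F_n jumps from (i-1)/n to i/n at X_(i); compare F with both levels.
        d = max(d, i / n - fx, fx - (i - 1) / n)
    return d


def uniform_cdf(x):
    """Distribution function of Uniform(0, 1)."""
    return min(max(x, 0.0), 1.0)


rng = random.Random(1933)  # arbitrary fixed seed
d_small = sup_distance([rng.random() for _ in range(100)], uniform_cdf)
d_large = sup_distance([rng.random() for _ in range(10000)], uniform_cdf)

print("D_n for n = 100:  ", d_small)
print("D_n for n = 10000:", d_large)
```

A single run is of course only an illustration of the almost-sure convergence, not a proof; typically the value for the larger sample is much smaller than the value for the smaller one.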