Classification; Naïve Bayes classifier; Nearest-neighbor classifier. Eager vs Lazy learners. Eager learners: learn the model as soon as the training data becomes ...

Posted by Hafenkranich on 2018/06/12 00:00

1.Lecture outline: Classification; Naïve Bayes classifier; Nearest-neighbor classifier

2.Eager vs Lazy learners. Eager learners: learn the model as soon as the training data becomes available. Lazy learners: delay model-building until test data needs to be classified. Rote classifier: memorizes the entire training data.

3.k-nearest neighbor classifiers. The k-nearest neighbors of a record x are the k data points with the smallest distance to x.

4.k-nearest neighbor classification. Given a data record x, find its k closest points. Closeness: Euclidean, Hamming, or Jaccard distance. Determine the class of x based on the classes in the neighbor list: majority vote; vote weighted by distance, e.g., weight factor $w = 1/d^2$; probabilistic voting.
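The two voting schemes on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's implementation: the function names `euclidean` and `knn_classify`, the toy points, and the small epsilon guarding against division by zero are all assumptions made for the example.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train, x, k, weighted=False):
    """Classify x from its k nearest neighbors.

    train: list of (point, label) pairs.
    weighted=False -> plain majority vote.
    weighted=True  -> each neighbor votes with weight w = 1/d^2.
    """
    neighbors = sorted(train, key=lambda rec: euclidean(rec[0], x))[:k]
    votes = defaultdict(float)
    for point, label in neighbors:
        if weighted:
            d = euclidean(point, x)
            # Epsilon (an assumption of this sketch) avoids division by zero
            # when x coincides with a training point.
            votes[label] += 1.0 / (d * d + 1e-12)
        else:
            votes[label] += 1.0
    return max(votes, key=votes.get)


# Hypothetical toy data: one close "A" point, two distant "B" points.
train = [((1.0, 0.0), "A"), ((3.0, 0.0), "B"), ((0.0, 3.0), "B")]
query = (0.0, 0.0)

majority_label = knn_classify(train, query, k=3)
weighted_label = knn_classify(train, query, k=3, weighted=True)
print(majority_label, weighted_label)
```

With these toy points the majority vote favors the two distant "B" neighbors, while the $1/d^2$ weighting lets the single nearby "A" neighbor win, which is the motivation for distance weighting.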

5.Characteristics of nearest-neighbor classifiers. An instance of instance-based learning. No model building (lazy learners). Lazy learners spend computational time in classification; eager learners spend it in model building. Decision trees try to find global models; k-NN takes local information into account. k-NN classifiers depend heavily on the choice of proximity measure.

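6.Bayes Theorem. X, Y: random variables. Joint probability: Pr(X=x, Y=y). Conditional probability: Pr(Y=y | X=x). Relationship between joint and conditional probability distributions: $\Pr(X, Y) = \Pr(Y \mid X)\,\Pr(X) = \Pr(X \mid Y)\,\Pr(Y)$. Bayes Theorem: $\Pr(Y \mid X) = \Pr(X \mid Y)\,\Pr(Y) / \Pr(X)$.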

7.Bayes Theorem for Classification. X: attribute set. Y: class variable. Y depends on X in a non-deterministic way. We can capture this dependence using Pr(Y|X), the posterior probability, as opposed to Pr(Y), the prior probability.

8.Building the Classifier. Training phase: learn the posterior probabilities Pr(Y|X) for every combination of X and Y based on training data. Test phase: for a test record X’, compute the class Y’ that maximizes the posterior probability Pr(Y’|X’).

9.Bayes Classification: Example. X’ = (Home Owner = No, Marital Status = Married, Annual Income = 120K). Compute Pr(Yes|X’) and Pr(No|X’), then pick No or Yes, whichever has the larger probability. How can we compute these probabilities?

10.Computing posterior probabilities. Bayes Theorem: $\Pr(Y \mid X) = \Pr(X \mid Y)\,\Pr(Y) / \Pr(X)$. Pr(X) is the same for every class and can be ignored. Pr(Y): estimated from training data as the fraction of training records in each class. Pr(X|Y): ?

11.Naïve Bayes Classifier. Attribute set $X = \{X_1, \dots, X_d\}$ consists of d attributes. Conditional independence: X is conditionally independent of Y, given Z, if Pr(X|Y,Z) = Pr(X|Z); equivalently, Pr(X,Y|Z) = Pr(X|Z) × Pr(Y|Z).

12.Naïve Bayes Classifier. Attribute set $X = \{X_1, \dots, X_d\}$ consists of d attributes.
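The formula for this slide did not survive extraction. Assuming it is the usual naïve Bayes factorization, which matches the per-attribute products used in the worked example on slide 15, the attributes are taken to be conditionally independent given the class:

$$
\Pr(X \mid Y = y) \;=\; \prod_{i=1}^{d} \Pr(X_i \mid Y = y),
\qquad
\Pr(Y = y \mid X) \;\propto\; \Pr(Y = y) \prod_{i=1}^{d} \Pr(X_i \mid Y = y).
$$

The proportionality drops $\Pr(X)$, which is the same for every class and so does not affect which class maximizes the posterior.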

13.Conditional probabilities for categorical attributes. Categorical attribute $X_i$: $\Pr(X_i = x_i \mid Y = y)$ is the fraction of training instances in class y that take value $x_i$ on the i-th attribute. Examples: Pr(HomeOwner = Yes | No) = 3/7; Pr(MaritalStatus = Single | Yes) = 2/3.
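The counting rule above can be sketched as follows. The training table itself is not reproduced in this transcript, so the `records` list is a hypothetical table, chosen only so that its counts agree with the fractions 3/7 and 2/3 quoted on the slide. The helper name `conditional_probability` is also an assumption of this sketch.

```python
from fractions import Fraction

# HYPOTHETICAL training table: (home_owner, marital_status, class_label).
# Not taken from the slides; constructed to match the quoted fractions.
records = [
    ("Yes", "Single", "No"),
    ("No", "Married", "No"),
    ("No", "Single", "No"),
    ("Yes", "Married", "No"),
    ("No", "Divorced", "Yes"),
    ("No", "Married", "No"),
    ("Yes", "Divorced", "No"),
    ("No", "Single", "Yes"),
    ("No", "Married", "No"),
    ("No", "Single", "Yes"),
]


def conditional_probability(records, attr_index, value, label):
    """Fraction of class-`label` records whose attribute equals `value`."""
    in_class = [r for r in records if r[-1] == label]
    matches = sum(1 for r in in_class if r[attr_index] == value)
    return Fraction(matches, len(in_class))


print(conditional_probability(records, 0, "Yes", "No"))      # 3/7
print(conditional_probability(records, 1, "Single", "Yes"))  # 2/3
```

Exact fractions are used instead of floats so the output can be compared directly with the values on the slide.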

14.Estimating conditional probabilities for continuous attributes? Discretization? How can we discretize?

15.Naïve Bayes Classifier: Example. X’ = (HomeOwner = No, MaritalStatus = Married, Income = 120K). Need to compute Pr(Y|X’) or, up to the constant Pr(X’), Pr(Y) × Pr(X’|Y). Pr(X’|Y) for each class: Y = No: Pr(HO = No | No) × Pr(MS = Married | No) × Pr(Inc = 120K | No) = 4/7 × 4/7 × 0.0072 = 0.0024. Y = Yes: Pr(HO = No | Yes) × Pr(MS = Married | Yes) × Pr(Inc = 120K | Yes) = 1 × 0 × $1.2 \times 10^{-9}$ = 0.
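The arithmetic on this slide can be checked with a short script. The per-attribute factors are the values quoted on the slide. The class priors are an assumption: they are inferred from the denominators 7 and 3 on slide 13, which suggest 7 "No" and 3 "Yes" records out of 10.

```python
from math import prod

# Per-attribute conditional probabilities quoted on the slide for
# X' = (HomeOwner=No, MaritalStatus=Married, Income=120K).
factors = {
    "No": [4 / 7, 4 / 7, 0.0072],
    "Yes": [1.0, 0.0, 1.2e-9],
}

# ASSUMPTION: priors inferred from the class sizes implied by slide 13.
priors = {"No": 7 / 10, "Yes": 3 / 10}

# Naive Bayes: the class-conditional likelihood is the product of the factors.
likelihoods = {label: prod(fs) for label, fs in factors.items()}

# Unnormalized posterior scores Pr(Y) * Pr(X'|Y); Pr(X') is ignored.
scores = {label: priors[label] * likelihoods[label] for label in factors}
predicted = max(scores, key=scores.get)

print(likelihoods)
print(predicted)  # No
```

The likelihood for "No" comes out at roughly 0.00235, which rounds to the slide's 0.0024. The zero factor for MaritalStatus = Married wipes out the "Yes" score entirely, whatever the other factors are.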

16.Naïve Bayes Classifier: Example. X’ = (HomeOwner = No, MaritalStatus = Married, Income = 120K). Need to compute Pr(Y|X’) or Pr(Y) × Pr(X’|Y). But Pr(X’|Y = Yes) is 0? Correction process (m-estimate): $\Pr(x_i \mid y_j) = (n_c + m\,p) / (n + m)$, where $n_c$: number of training examples from class $y_j$ that take value $x_i$; n: total number of instances from class $y_j$; m: equivalent sample size (balance between prior and posterior); p: user-specified parameter (prior probability).
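A minimal sketch of the correction, assuming it is the standard m-estimate $(n_c + m\,p)/(n + m)$ built from the four quantities defined on this slide. The counts $n_c = 0$ and $n = 3$ correspond to Pr(MS = Married | Yes) = 0 from the previous slide. The choices m = 3 and p = 1/3 are illustrative assumptions, not values from the deck.

```python
from fractions import Fraction


def m_estimate(n_c, n, m, p):
    """Smoothed estimate of Pr(x_i | y_j): (n_c + m*p) / (n + m).

    n_c: training examples of class y_j with attribute value x_i
    n:   total training examples of class y_j
    m:   equivalent sample size (weight given to the prior p)
    p:   user-specified prior probability of the attribute value
    """
    return (n_c + m * p) / (n + m)


# ASSUMED smoothing parameters for illustration only.
m = 3
p = Fraction(1, 3)

unsmoothed = m_estimate(0, 3, 0, p)  # m = 0 reduces to the raw fraction n_c / n
smoothed = m_estimate(0, 3, m, p)    # no longer zero

print(unsmoothed, smoothed)  # 0 1/6
```

With m = 0 the estimate is the raw fraction, and as m grows it moves toward the prior p, so a single unseen attribute value no longer forces the whole product to zero.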

17.Characteristics of Naïve Bayes Classifier. Robust to isolated noise points: noise points are averaged out. Handles missing values by ignoring missing-value examples. Robust to irrelevant attributes: if $X_i$ is irrelevant, $\Pr(X_i \mid Y)$ becomes almost uniform. Correlated attributes degrade the performance of the NB classifier.

• Hafenkranich
• Independent consultant specializing in database, web and M2M applications
