Bayesian Classifier Report

Updated: 2023-11-19 01:04:01


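Artificial Intelligence: Bayesian Classifier

Department of Automation, Institute of Pattern Recognition and Intelligent Systems. Name: Cao Da (曹达). Student ID: 23220101153239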

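Lab Report

I. Objective

The aim is to deepen understanding of how a Bayesian classifier assigns classes by programming one hands-on, and in doing so to build skill in analysing problems, solving them, and carrying out practical work.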

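II. Description of the Experimental Data

The data come from http://archive.ics.uci.edu/ml/; see Appendix 1 for the full description.

The data set's full name is the Wine Data Set, and the task is to classify three kinds of wine. Every sample is described by 13 attributes: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, and Proline. In the file "wine.data", each row is one wine sample, 178 samples in all. There are 14 columns: the first is the class label, which takes the values "1", "2" and "3", and the remaining 13 columns hold the sample's attribute values. Class 1 has 59 samples, class 2 has 71, and class 3 has 48.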
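
As an illustration of this file layout, the following minimal sketch splits one line of "wine.data" into its class label and 13 attribute values. The names WineSample and parseWineLine are hypothetical and are not part of the program in Appendix 3; the sketch assumes that every field is numeric and that fields are separated by commas.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

// One row of wine.data: a class label (1, 2 or 3) followed by 13 attribute values.
struct WineSample {
    int label;
    std::vector<double> attributes;
};

// Parses one comma-separated line such as "1,14.23,1.71,...".
// Returns false if the line does not hold exactly 14 numeric fields.
bool parseWineLine(const std::string& line, WineSample& out) {
    std::istringstream fields(line);
    std::string field;
    std::vector<double> values;
    while (std::getline(fields, field, ',')) {
        std::istringstream number(field);
        double value;
        if (!(number >> value)) return false;  // non-numeric field
        values.push_back(value);
    }
    if (values.size() != 14) return false;
    out.label = static_cast<int>(values[0]);
    out.attributes.assign(values.begin() + 1, values.end());
    assert(out.attributes.size() == 13);  // 14 columns minus the label
    return true;
}
```

A caller would read the file line by line with std::getline and pass each line to parseWineLine, skipping lines for which it returns false.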

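III. Analysis of the Naive Bayes Classification Algorithm

A Bayesian classifier is a Bayesian network used for classification. The network contains a class node C, whose value is drawn from the class set (c1, c2, ..., cm), and a set of nodes X = (X1, X2, ..., Xn) that represent the features used for classification. For a sample D to be classified, with feature values x = (x1, x2, ..., xn), the classifier assigns D to the class ci whose posterior probability P(C = ci | X1 = x1, X2 = x2, ..., Xn = xn), i = 1, 2, ..., m, is the largest, that is, the class that satisfies: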

P(C = ci | X = x) = max{ P(C = c1 | X = x), P(C = c2 | X = x), ..., P(C = cm | X = x) }

By Bayes' theorem:

P(C = ci | X = x) = P(X = x | C = ci) * P(C = ci) / P(X = x)

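Here P(C = ci) can be supplied by domain experts or estimated from the class frequencies in the training data, whereas P(X = x | C = ci) and P(X = x) are harder to compute. The naive Bayes classifier sidesteps both difficulties. It assumes that the features are conditionally independent given the class, so that P(X = x | C = ci) = P(X1 = x1 | C = ci) * P(X2 = x2 | C = ci) * ... * P(Xn = xn | C = ci), and it ignores P(X = x), which is the same for every class and therefore does not change which class attains the maximum.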
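
The decision rule of this section can be sketched in a few lines. This is a minimal illustration, assuming the per-attribute probabilities P(Xk = xk | C = ci) have already been estimated and that P(X = x | C = ci) is taken as their product (the naive independence assumption). The function name classify and its parameters are hypothetical. Working with logarithms is a common choice for numerical stability and is not taken from the program in Appendix 3.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Returns the index i that maximises P(C = ci) * prod_k P(Xk = xk | C = ci).
//   priors[i]         : P(C = ci)
//   likelihoods[i][k] : P(Xk = xk | C = ci) for the sample being classified
// Logarithms turn the product into a sum and avoid underflow. A probability
// of zero gives log(0) = -infinity, which rules that class out.
std::size_t classify(const std::vector<double>& priors,
                     const std::vector<std::vector<double> >& likelihoods) {
    assert(!priors.empty() && priors.size() == likelihoods.size());
    std::size_t best = 0;
    double bestScore = 0.0;
    for (std::size_t i = 0; i < priors.size(); ++i) {
        double score = std::log(priors[i]);
        for (std::size_t k = 0; k < likelihoods[i].size(); ++k) {
            score += std::log(likelihoods[i][k]);
        }
        if (i == 0 || score > bestScore) {
            best = i;
            bestScore = score;
        }
    }
    return best;
}
```

For example, with the priors 59/178, 71/178 and 48/178 (the class proportions of the full data set) and identical likelihoods for every class, the function returns index 1, the class with the largest prior.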

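IV. Experimental Results

The Bayesian classification algorithm is used to determine which class of wine each test sample belongs to. The implementation is detailed below.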


The experimental results are shown in the figure below. For the source code, see Appendix 3.


Appendix 1:

Description of the experimental data:

1. Title of Database: Wine recognition data

Updated Sept 21, 1998 by C.Blake : Added attribute information

2. Sources:

(a) Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.

(b) Stefan Aeberhard, email: stefan@coral.cs.jcu.edu.au

(c) July 1991

3. Past Usage:

(1) S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Technometrics).

The data was used with many others for comparing various classifiers. The classes are separable, though only RDA has achieved 100% correct classification. (RDA: 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) (All results using the leave-one-out technique)

In a classification context, this is a well posed problem with "well behaved" class structures. A good data set for first testing of a new classifier, but not very challenging.

(2) S. Aeberhard, D. Coomans and O. de Vel, "THE CLASSIFICATION PERFORMANCE OF RDA", Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Journal of Chemometrics).

Here, the data was used to illustrate the superior performance of the use of a new appreciation function with RDA.

4. Relevant Information:

-- These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

-- I think that the initial data set had around 30 variables, but for some reason I only have the 13 dimensional version. I had a list of what the 30 or so variables were, but a.) I lost it, and b.), I would not know which 13 variables are included in the set.

-- The attributes are (donated by Riccardo Leardi, riclea@anchem.unige.it)

1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins
10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline

5. Number of Instances

class 1 59
class 2 71
class 3 48

6. Number of Attributes

13

7. For Each Attribute:

All attributes are continuous.

No statistics available, but suggest to standardise variables for certain uses (e.g. for use with classifiers which are NOT scale invariant)

NOTE: 1st attribute is class identifier (1-3)

8. Missing Attribute Values:

None

9. Class Distribution: number of instances per class

class 1 59
class 2 71
class 3 48
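
Item 7 of the description suggests standardising the variables before using classifiers that are not scale invariant. A minimal sketch of one common way to do this, the z-score, follows. The function name standardise is hypothetical, and the data set description does not prescribe this particular method.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Rescales one attribute column in place to zero mean and unit variance
// (z-score). A constant column is only centred, since its deviation is 0.
void standardise(std::vector<double>& column) {
    assert(!column.empty());
    const double n = static_cast<double>(column.size());
    double mean = 0.0;
    for (std::size_t i = 0; i < column.size(); ++i) mean += column[i];
    mean /= n;
    double variance = 0.0;
    for (std::size_t i = 0; i < column.size(); ++i) {
        variance += (column[i] - mean) * (column[i] - mean);
    }
    variance /= n;  // population variance
    const double deviation = std::sqrt(variance);
    for (std::size_t i = 0; i < column.size(); ++i) {
        column[i] -= mean;
        if (deviation > 0.0) column[i] /= deviation;
    }
}
```

The function would be applied to each of the 13 attribute columns separately. When a test set is standardised too, the usual practice is to reuse the mean and deviation computed on the training set, which this sketch does not do.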

Appendix 2:

Experimental data: http://archive.ics.uci.edu/ml/machine-learning-databases/wine/

Appendix 3: Source code:

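BayesianClassifier.h

#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <map>
using namespace std;

// The 13 attributes of each sample:
// 1) Alcohol
// 2) Malic acid
// 3) Ash
// 4) Alcalinity of ash
// 5) Magnesium
// 6) Total phenols
// 7) Flavanoids
// 8) Nonflavanoid phenols
// 9) Proanthocyanins
// 10) Color intensity
// 11) Hue
// 12) OD280/OD315 of diluted wines
// 13) Proline
int TrainNum = 130; // number of training samples
int TestNum = 48;   // number of test samples

// One record of the data file: the class label followed by the 13 attributes.
// The members must stay consecutive doubles, because the classifier steps
// from A2 to A14 by pointer arithmetic.
struct OriginalData {
    double A1;  // class label (1, 2 or 3)
    double A2; double A3; double A4; double A5; double A6; double A7;
    double A8; double A9; double A10; double A11; double A12; double A13;
    double A14; // A2..A14 hold attributes 1) to 13)
};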

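BayesianClassifier.cpp

#include <iostream>
#include <fstream>
#include <sstream>
#include "BayesianClassifier.h"
using namespace std;

const int Shuxing = 13; // total number of attributes

ifstream f;

vector<OriginalData> trainData; // training data
vector<OriginalData> testData;  // test data
double A[3];                    // prior probabilities of the three classes
int m;                          // counter used to compute the accuracy

// For each class and each attribute: maps an observed value to its count,
// which bayes() later converts into a relative frequency.
map<double, double> C1_map[Shuxing];
map<double, double> C2_map[Shuxing];
map<double, double> C3_map[Shuxing];

// Read the records from a file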

void DataRead(vector<OriginalData> &data, const char* fileName) {
    f.open(fileName);
    // A file whose name starts with 'w' is treated as the training set.
    int ZHjiang; // number of records to read
    if (fileName[0] == 'w')
        ZHjiang = TrainNum;
    else
        ZHjiang = TestNum;
    string line;
    OriginalData wine;
    for (int i = 0; i < ZHjiang; i++) {
        f >> line;
        // Replace every comma with a space so the fields can be read with >>.
        while (line.find(',') != string::npos) {
            line[line.find(',')] = ' ';
        }
        istringstream stream(line);
        stream >> wine.A1 >> wine.A2 >> wine.A3 >> wine.A4 >> wine.A5 >> wine.A6 >> wine.A7
               >> wine.A8 >> wine.A9 >> wine.A10 >> wine.A11 >> wine.A12 >> wine.A13 >> wine.A14;
        data.push_back(wine); // store each record as soon as it is parsed
    }
    f.close();
}

void bayes() {
    int count1 = 0, count2 = 0, count3 = 0;
    int i;
    // Count the training samples in each of the three classes.
    for (i = 0; i < TrainNum; i++) {
        if (trainData[i].A1 == 1) count1++;
        if (trainData[i].A1 == 2) count2++;
        if (trainData[i].A1 == 3) count3++;
    }
    // Prior probabilities
    A[0] = (double)count1 / (double)TrainNum;
    A[1] = (double)count2 / (double)TrainNum;
    A[2] = (double)count3 / (double)TrainNum;

    map<double, double>::iterator pipei;
    for (i = 0; i < TrainNum; i++) {
        if (trainData[i].A1 == 1) { // count each value Xk for P(Xk|C1)
            for (int j = 0; j < Shuxing; j++) {
                double temp = *(&trainData[i].A2 + j); // value of attribute j
                pipei = C1_map[j].find(temp);
                if (pipei == C1_map[j].end()) {
                    C1_map[j].insert(map<double, double>::value_type(temp, 1));
                } else {
                    pipei->second = pipei->second + 1;
                }
            }
        }
        if (trainData[i].A1 == 2) { // count each value Xk for P(Xk|C2)
            for (int j = 0; j < Shuxing; j++) {
                double temp = *(&trainData[i].A2 + j);
                pipei = C2_map[j].find(temp);
                if (pipei == C2_map[j].end()) {
                    C2_map[j].insert(map<double, double>::value_type(temp, 1));
                } else {
                    pipei->second = pipei->second + 1;
                }
            }
        }
        if (trainData[i].A1 == 3) { // count each value Xk for P(Xk|C3)
            for (int j = 0; j < Shuxing; j++) {
                double temp = *(&trainData[i].A2 + j);
                pipei = C3_map[j].find(temp);
                if (pipei == C3_map[j].end()) {
                    C3_map[j].insert(map<double, double>::value_type(temp, 1));
                } else {
                    pipei->second = pipei->second + 1;
                }
            }
        }
    }

    // Turn the counts into probabilities by dividing by the class size.
    for (i = 0; i < Shuxing; i++) {
        for (pipei = C1_map[i].begin(); pipei != C1_map[i].end(); ++pipei) {
            pipei->second = pipei->second / (double)count1;
        }
        for (pipei = C2_map[i].begin(); pipei != C2_map[i].end(); ++pipei) {
            pipei->second = pipei->second / (double)count2;
        }
        for (pipei = C3_map[i].begin(); pipei != C3_map[i].end(); ++pipei) {
            pipei->second = pipei->second / (double)count3;
        }
    }
}

void houyan() // compute the posterior scores and pick the largest
{
    int i, k;
    double p[3];
    map<double, double>::iterator pipei;
    for (i = 0; i < TestNum; i++) {
        double pXC[3] = {0, 0, 0};
        // p(X|C1): the frequencies of the attribute values seen in class 1 are
        // added up; a value never seen in the class contributes nothing.
        for (k = 0; k < Shuxing; k++) {
            pipei = C1_map[k].find(*(&testData[i].A2 + k));
            if (pipei != C1_map[k].end()) {
                pXC[0] = pXC[0] + pipei->second;
            }
        }
        p[0] = A[0] * pXC[0];
        // p(X|C2)
        for (k = 0; k < Shuxing; k++) {
            pipei = C2_map[k].find(*(&testData[i].A2 + k));
            if (pipei != C2_map[k].end()) {
                pXC[1] = pXC[1] + pipei->second;
            }
        }
        p[1] = A[1] * pXC[1];
        // p(X|C3)
        for (k = 0; k < Shuxing; k++) {
            pipei = C3_map[k].find(*(&testData[i].A2 + k));
            if (pipei != C3_map[k].end()) {
                pXC[2] = pXC[2] + pipei->second;
            }
        }
        p[2] = A[2] * pXC[2];

        // Find the largest score, print it with the predicted class, and
        // count the sample in m when the prediction matches its label.
        if (p[0] > p[1] && p[0] > p[2]) {
            cout << p[0] << " " << 1 << endl;
            if (testData[i].A1 == 1) m++;
        } else {
            if (p[1] > p[2]) {
                cout << p[1] << " " << 2 << endl;
                if (testData[i].A1 == 2) m++;
            } else {
                cout << p[2] << " " << 3 << endl;
                if (testData[i].A1 == 3) m++;
            }
        }
    }
}

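int main(int argc, char* argv[]) {
    // Usage: <program> <training file> <test file>
    // DataRead treats a name that starts with 'w' as the training file.
    if (argc < 3) {
        cout << "Usage: " << argv[0] << " <training file> <test file>" << endl;
        return 1;
    }
    double tp, fp;
    cout << "Max probability " << "Class" << endl;
    DataRead(trainData, argv[1]);
    bayes();
    DataRead(testData, argv[2]);
    houyan();
    tp = (double)m / TestNum; // accuracy
    fp = 1 - tp;              // error rate
    cout << "Accuracy: " << tp << endl;
    cout << "Error rate: " << fp << endl;
    return 0;
}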
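
The listing in this appendix estimates P(Xk | Ci) by counting exact attribute values. Because all 13 attributes are continuous, a test value often does not occur in the training data at all. A common alternative, which the listing does not use, is to model each attribute within each class as a normal distribution. The sketch below illustrates that alternative under this assumption; the names GaussianParams, fitGaussian and logLikelihood are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean and variance of one attribute within one class.
struct GaussianParams {
    double mean;
    double variance;
};

// Estimates the parameters from the training values of one attribute
// taken from the samples of one class.
GaussianParams fitGaussian(const std::vector<double>& values) {
    assert(values.size() >= 2);
    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) sum += values[i];
    GaussianParams g;
    g.mean = sum / values.size();
    double squares = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        squares += (values[i] - g.mean) * (values[i] - g.mean);
    }
    g.variance = squares / (values.size() - 1);  // unbiased sample variance
    return g;
}

// Log of the normal density at x; it stands in for log P(Xk = x | C = ci).
double logLikelihood(const GaussianParams& g, double x) {
    assert(g.variance > 0.0);
    const double pi = 3.14159265358979323846;
    const double diff = x - g.mean;
    return -0.5 * std::log(2.0 * pi * g.variance) - diff * diff / (2.0 * g.variance);
}
```

A classifier built this way would fit one GaussianParams per class and attribute, then score a test sample for each class by adding the log prior to the 13 log-likelihoods and choosing the class with the largest total.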
