Bayesian Classifier Report



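Artificial Intelligence: Bayesian Classifier

Institute of Pattern Recognition and Intelligent Systems, Department of Automation. Name: Cao Da. Student ID: 23220101153239

Lab Report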

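I. Objective

To deepen our understanding of how a Bayesian classifier classifies samples by implementing one, and to improve our ability to analyze problems, solve them, and carry out hands-on work.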

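II. Data

The experimental data come from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/); see Appendix 1 for a detailed description.

The data set is called the Wine Data Set, and the task is to classify wines into 3 different types. Each wine is described by 13 attributes: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, and Proline. In the file "wine.data", each line is one wine sample, and there are 178 samples in total. Each line has 14 comma-separated columns: the first column is the class label, which takes one of the three values "1", "2", and "3"; the remaining 13 columns are the sample's values for the corresponding attributes. Class 1 has 59 samples, class 2 has 71, and class 3 has 48.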
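A minimal C++ sketch of reading one line of wine.data, assuming the comma-separated layout described above (class label first, then the 13 attribute values). The struct `WineSample` and the function `ParseWineLine` are hypothetical names chosen for illustration; they are not taken from the report's program.

```cpp
#include <cassert>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical container for one sample: class label plus 13 attribute values.
struct WineSample {
    int label = 0;              // 1, 2 or 3
    std::vector<double> attrs;  // Alcohol, Malic acid, ..., Proline
};

// Parses a line such as "1,14.23,1.71,...". Returns std::nullopt unless the line
// holds exactly 14 numeric fields (label + 13 attributes).
std::optional<WineSample> ParseWineLine(const std::string& line) {
    std::istringstream in(line);
    std::string field;
    std::vector<double> values;
    while (std::getline(in, field, ',')) {
        try {
            values.push_back(std::stod(field));
        } catch (const std::exception&) {  // std::invalid_argument or std::out_of_range
            return std::nullopt;
        }
    }
    if (values.size() != 14) return std::nullopt;
    WineSample s;
    s.label = static_cast<int>(values[0]);
    s.attrs.assign(values.begin() + 1, values.end());
    return s;
}
```

A row with fewer than 14 fields or a non-numeric field yields std::nullopt, so malformed lines can be skipped instead of producing a half-filled sample.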

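III. Naive Bayes Classification Algorithm

A Bayesian classifier is a Bayesian network used for classification. The network contains a class node C, whose value is taken from the class set (c1, c2, ..., cm), and a set of nodes X = (X1, X2, ..., Xn) representing the features used for classification. Given a sample D to be classified, with feature values x = (x1, x2, ..., xn), the classifier assigns D to the class ci whose posterior probability P(C = ci | X1 = x1, X2 = x2, ..., Xn = xn), i = 1, 2, ..., m, satisfies: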

P(C = ci | X = x) = max{ P(C = c1 | X = x), P(C = c2 | X = x), ..., P(C = cm | X = x) }

By Bayes' theorem,

P(C = ci | X = x) = P(X = x | C = ci) * P(C = ci) / P(X = x)

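Here, P(C = ci) can be obtained from the experience of domain experts (in this experiment it is estimated as the fraction of training samples that belong to class ci), whereas P(X = x | C = ci) and P(X = x) are harder to compute. Since P(X = x) is the same for every class, it does not affect which class attains the maximum and can be ignored. The naive Bayes classifier further assumes that the features are conditionally independent given the class, so that P(X = x | C = ci) = P(X1 = x1 | C = ci) * P(X2 = x2 | C = ci) * ... * P(Xn = xn | C = ci), and each factor can be estimated separately from the training data.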
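A minimal C++ sketch of this decision rule, assuming the per-feature probabilities P(Xk = xk | C = ci) have already been estimated and are all positive (a zero would make the logarithm minus infinity). It relies on the naive Bayes assumption that the features are conditionally independent given the class, adds logarithms instead of multiplying probabilities so that the product over many features does not underflow, and divides by P(X = x) = sum over j of P(X = x | C = cj) * P(C = cj) only to report normalized posteriors. `Posteriors` and `Classify` are hypothetical names used for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Posterior P(C = ci | X = x) for every class under the naive Bayes assumption.
// prior[i]         = P(C = ci)
// likelihood[i][k] = P(Xk = xk | C = ci) for the sample being classified
// All values must be > 0 (e.g. smoothed estimates); log(0) is -infinity.
std::vector<double> Posteriors(const std::vector<double>& prior,
                               const std::vector<std::vector<double>>& likelihood) {
    const std::size_t m = prior.size();
    std::vector<double> logScore(m);
    for (std::size_t i = 0; i < m; ++i) {
        double s = std::log(prior[i]);                    // log P(C = ci)
        for (double p : likelihood[i]) s += std::log(p);  // + sum_k log P(Xk = xk | C = ci)
        logScore[i] = s;
    }
    // Normalize by P(X = x) using the log-sum-exp trick for numerical stability.
    const double maxLog = *std::max_element(logScore.begin(), logScore.end());
    double total = 0.0;
    for (double s : logScore) total += std::exp(s - maxLog);
    std::vector<double> post(m);
    for (std::size_t i = 0; i < m; ++i) post[i] = std::exp(logScore[i] - maxLog) / total;
    return post;
}

// 0-based index of the class with the largest posterior.
std::size_t Classify(const std::vector<double>& prior,
                     const std::vector<std::vector<double>>& likelihood) {
    const std::vector<double> post = Posteriors(prior, likelihood);
    return static_cast<std::size_t>(std::max_element(post.begin(), post.end()) - post.begin());
}
```

Because the divisor P(X = x) is shared by all classes, the class returned by `Classify` is the same one that the unnormalized scores P(X = x | C = ci) * P(C = ci) would select.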

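IV. Results

Using the Bayesian classification algorithm, the program determines which class of wine each test sample belongs to. The detailed implementation follows.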
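The report's program (Appendix 3) estimates P(Xk = xk | C = ci) by counting how often each exact attribute value occurs in class ci. Since all 13 Wine attributes are continuous, a commonly used alternative, and not the method used in this report, is Gaussian naive Bayes: each attribute within each class is modelled by a normal density whose mean and variance are estimated from the training samples. The minimal C++ sketch below assumes labels 1 to numClasses, each with at least one training sample, and uses the hypothetical names `GaussianNB`, `Fit`, and `Predict`; the small variance floor of 1e-9 is an assumption added to avoid division by zero.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical Gaussian naive Bayes: one normal density per (class, attribute).
struct GaussianNB {
    std::vector<double> logPrior;                // log P(C = c)
    std::vector<std::vector<double>> mean, var;  // [class][attribute]

    // X: samples (rows of attribute values), y: labels 1..numClasses.
    // Assumes every class occurs at least once in y.
    void Fit(const std::vector<std::vector<double>>& X, const std::vector<int>& y, int numClasses) {
        const std::size_t d = X.at(0).size();
        std::vector<int> count(numClasses, 0);
        mean.assign(numClasses, std::vector<double>(d, 0.0));
        var.assign(numClasses, std::vector<double>(d, 0.0));
        for (std::size_t i = 0; i < X.size(); ++i) {
            const int c = y[i] - 1;
            ++count[c];
            for (std::size_t k = 0; k < d; ++k) mean[c][k] += X[i][k];
        }
        for (int c = 0; c < numClasses; ++c)
            for (std::size_t k = 0; k < d; ++k) mean[c][k] /= count[c];
        for (std::size_t i = 0; i < X.size(); ++i) {
            const int c = y[i] - 1;
            for (std::size_t k = 0; k < d; ++k) {
                const double diff = X[i][k] - mean[c][k];
                var[c][k] += diff * diff;
            }
        }
        logPrior.assign(numClasses, 0.0);
        for (int c = 0; c < numClasses; ++c) {
            for (std::size_t k = 0; k < d; ++k) var[c][k] = var[c][k] / count[c] + 1e-9;  // variance floor
            logPrior[c] = std::log(static_cast<double>(count[c]) / X.size());
        }
    }

    // Predicted label maximizing log P(C = c) + sum_k log N(x_k; mean[c][k], var[c][k]).
    int Predict(const std::vector<double>& x) const {
        const double kPi = 3.14159265358979323846;
        int best = 0;
        double bestScore = -std::numeric_limits<double>::infinity();
        for (std::size_t c = 0; c < logPrior.size(); ++c) {
            double s = logPrior[c];
            for (std::size_t k = 0; k < x.size(); ++k) {
                const double diff = x[k] - mean[c][k];
                s += -0.5 * std::log(2.0 * kPi * var[c][k]) - diff * diff / (2.0 * var[c][k]);
            }
            if (s > bestScore) { bestScore = s; best = static_cast<int>(c); }
        }
        return best + 1;
    }
};
```

With the Wine data one would call `Fit` on the training rows (13 attributes each, labels 1 to 3) and `Predict` on each test row; because each density is evaluated rather than looked up in a table, a test value that never occurred in training still receives a nonzero likelihood.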


The experimental results are shown in the figure below. The source code is given in Appendix 3.
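Classification accuracy is the fraction of test samples whose predicted class equals their true class label; a confusion matrix additionally shows which classes are mistaken for which. A minimal C++ sketch, assuming labels 1 to 3 as in wine.data; `ConfusionMatrix` and `Accuracy` are hypothetical helper names, not part of the report's program.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kNumClasses = 3;  // wine.data uses class labels 1, 2, 3
using Confusion = std::array<std::array<int, kNumClasses>, kNumClasses>;

// cm[t][p] counts test samples of true class t+1 that were predicted as class p+1.
Confusion ConfusionMatrix(const std::vector<int>& truth, const std::vector<int>& predicted) {
    Confusion cm{};  // all counts start at zero
    for (std::size_t i = 0; i < truth.size(); ++i) ++cm[truth[i] - 1][predicted[i] - 1];
    return cm;
}

// Accuracy = (sum of the diagonal) / (total number of samples).
double Accuracy(const Confusion& cm) {
    int correct = 0, total = 0;
    for (int t = 0; t < kNumClasses; ++t)
        for (int p = 0; p < kNumClasses; ++p) {
            total += cm[t][p];
            if (t == p) correct += cm[t][p];
        }
    return total == 0 ? 0.0 : static_cast<double>(correct) / total;
}
```

The error rate is then 1 minus the accuracy, and the off-diagonal entries show which pair of wine classes accounts for the errors.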


Appendix 1:

Description of the experimental data:

1. Title of Database: Wine recognition data
Updated Sept 21, 1998 by C. Blake: Added attribute information

2. Sources:

(a) Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
(b) Stefan Aeberhard, email: stefan@coral.cs.jcu.edu.au
(c) July 1991

3. Past Usage:

(1) S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Technometrics.)

The data was used with many others for comparing various classifiers. The classes are separable, though only RDA has achieved 100% correct classification (RDA: 100%, QDA: 99.4%, LDA: 98.9%, 1NN: 96.1% (z-transformed data); all results using the leave-one-out technique).

In a classification context, this is a well posed problem with "well behaved" class structures. A good data set for first testing of a new classifier, but not very challenging.

(2) S. Aeberhard, D. Coomans and O. de Vel, "RDA", Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Journal of Chemometrics.) Here, the data was used to illustrate the superior performance of the use of a new appreciation function with RDA.

4. Relevant Information:

-- These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.

-- I think that the initial data set had around 30 variables, but for some reason I only have the 13 dimensional version. I had a list of what the 30 or so variables were, but a.) I lost it, and b.), I would not know which 13 variables are included in the set.

-- The attributes are (donated by Riccardo Leardi, riclea@anchem.unige.it):
1) Alcohol
2) Malic acid
3) Ash
4) Alcalinity of ash
5) Magnesium
6) Total phenols
7) Flavanoids
8) Nonflavanoid phenols
9) Proanthocyanins


10) Color intensity
11) Hue
12) OD280/OD315 of diluted wines
13) Proline

5. Number of Instances

class 1 59
class 2 71
class 3 48

6. Number of Attributes

13

7. For Each Attribute:

All attributes are continuous.
No statistics available, but suggest to standardise variables for certain uses (e.g. for use with classifiers which are NOT scale invariant).

NOTE: 1st attribute is class identifier (1-3)

8. Missing Attribute Values:

None

9. Class Distribution: number of instances per class

class 1 59
class 2 71
class 3 48
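Item 7 above suggests standardising the attributes for classifiers that are not scale invariant. A minimal C++ sketch of z-score standardisation, assuming samples are stored row by row as vectors of attribute values; `ZScore`, `FitZScore`, and `ApplyZScore` are hypothetical names. The statistics are meant to be estimated on the training set and then reused unchanged on the test set, so that no information from the test samples leaks into preprocessing.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Rows = std::vector<std::vector<double>>;

// Per-attribute mean and population standard deviation.
struct ZScore {
    std::vector<double> mean, sd;
};

// Estimates the statistics from a set of samples (normally the training set).
ZScore FitZScore(const Rows& X) {
    const std::size_t d = X.at(0).size();
    ZScore z{std::vector<double>(d, 0.0), std::vector<double>(d, 0.0)};
    for (const auto& row : X)
        for (std::size_t k = 0; k < d; ++k) z.mean[k] += row[k];
    for (std::size_t k = 0; k < d; ++k) z.mean[k] /= X.size();
    for (const auto& row : X)
        for (std::size_t k = 0; k < d; ++k) z.sd[k] += (row[k] - z.mean[k]) * (row[k] - z.mean[k]);
    for (std::size_t k = 0; k < d; ++k) z.sd[k] = std::sqrt(z.sd[k] / X.size());
    return z;
}

// Replaces each value by (value - mean) / sd; a constant attribute (sd == 0) becomes 0.
Rows ApplyZScore(const ZScore& z, Rows X) {
    for (auto& row : X)
        for (std::size_t k = 0; k < row.size(); ++k)
            row[k] = z.sd[k] > 0 ? (row[k] - z.mean[k]) / z.sd[k] : 0.0;
    return X;
}
```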

Appendix 2:

Experimental data: http://archive.ics.uci.edu/ml/machine-learning-databases/wine/

Appendix 3: Source code

BayesianClassifier.h

#ifndef BAYESIAN_CLASSIFIER_H
#define BAYESIAN_CLASSIFIER_H

#include <algorithm>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

// Attributes (A1 in OriginalData is the class label; A2..A14 hold attributes 1-13):
// 1) Alcohol  2) Malic acid  3) Ash
// 4) Alcalinity of ash  5) Magnesium
// 6) Total phenols  7) Flavanoids
// 8) Nonflavanoid phenols  9) Proanthocyanins
// 10) Color intensity  11) Hue
// 12) OD280/OD315 of diluted wines  13) Proline

const int TrainNum = 130;  // number of training samples
const int TestNum = 48;    // number of test samples

struct OriginalData {
    double A1;  // class label: 1, 2 or 3
    double A2; double A3; double A4; double A5;
    double A6; double A7; double A8; double A9;
    double A10; double A11; double A12; double A13; double A14;
};

// Returns attribute k of a sample (k = 0 for Alcohol ... 12 for Proline), i.e. A2..A14.
inline double Attr(const OriginalData& d, int k) {
    static double OriginalData::* const cols[13] = {
        &OriginalData::A2, &OriginalData::A3, &OriginalData::A4, &OriginalData::A5,
        &OriginalData::A6, &OriginalData::A7, &OriginalData::A8, &OriginalData::A9,
        &OriginalData::A10, &OriginalData::A11, &OriginalData::A12, &OriginalData::A13,
        &OriginalData::A14};
    return d.*cols[k];
}

#endif

BayesianClassifier.cpp

#include "BayesianClassifier.h"

const int Shuxing = 13;  // total number of attributes

vector<OriginalData> trainData;  // training data
vector<OriginalData> testData;   // test data
double A[3];                     // prior probabilities P(C = c1), P(C = c2), P(C = c3)
int m = 0;                       // number of correctly classified test samples

// For each class and each attribute: attribute value -> its count in the training data,
// converted by bayes() into the relative frequency P(Xk = value | C = ci).
map<double, double> C1_map[Shuxing];
map<double, double> C2_map[Shuxing];
map<double, double> C3_map[Shuxing];

// Reads at most maxCount comma-separated samples ("label,a1,...,a13") from a file.
void DataRead(vector<OriginalData>& data, const char* fileName, int maxCount) {
    ifstream f(fileName);
    if (!f) {
        cerr << "Cannot open " << fileName << endl;
        return;
    }
    string line;
    while ((int)data.size() < maxCount && getline(f, line)) {
        replace(line.begin(), line.end(), ',', ' ');  // "1,14.23,..." -> "1 14.23 ..."
        istringstream stream(line);
        OriginalData wine;
        if (stream >> wine.A1 >> wine.A2 >> wine.A3 >> wine.A4 >> wine.A5 >> wine.A6 >> wine.A7
                   >> wine.A8 >> wine.A9 >> wine.A10 >> wine.A11 >> wine.A12 >> wine.A13 >> wine.A14)
            data.push_back(wine);  // lines that do not parse (e.g. blank lines) are skipped
    }
}

// Training: estimates the priors A[] and the conditional frequency tables.
void bayes() {
    int count[3] = {0, 0, 0};  // number of training samples in each class
    map<double, double>* tables[3] = {C1_map, C2_map, C3_map};

    // Count the samples of each class and, per attribute, how often each value occurs.
    for (size_t i = 0; i < trainData.size(); i++) {
        int c = (int)trainData[i].A1 - 1;  // class index 0..2
        if (c < 0 || c > 2) continue;
        count[c]++;
        for (int k = 0; k < Shuxing; k++)
            tables[c][k][Attr(trainData[i], k)] += 1;
    }

    // Priors: fraction of training samples in each class.
    for (int c = 0; c < 3; c++)
        A[c] = trainData.empty() ? 0.0 : (double)count[c] / trainData.size();

    // Turn counts into P(Xk = value | C = ci) = count / number of samples in class ci.
    for (int c = 0; c < 3; c++) {
        if (count[c] == 0) continue;
        for (int k = 0; k < Shuxing; k++)
            for (map<double, double>::iterator pipei = tables[c][k].begin(); pipei != tables[c][k].end(); ++pipei)
                pipei->second /= count[c];
    }
}

// Classification: computes a posterior score for each class and picks the largest.
void houyan() {
    map<double, double>* tables[3] = {C1_map, C2_map, C3_map};
    for (size_t i = 0; i < testData.size(); i++) {
        double p[3];
        for (int c = 0; c < 3; c++) {
            // Note: this sums the per-attribute frequencies. The naive Bayes rule
            // multiplies them, but a product of exact-value frequencies is 0 whenever
            // any test value never occurred in that class during training.
            double pXC = 0;
            for (int k = 0; k < Shuxing; k++) {
                map<double, double>::iterator pipei = tables[c][k].find(Attr(testData[i], k));
                if (pipei != tables[c][k].end())
                    pXC += pipei->second;
            }
            p[c] = A[c] * pXC;
        }

        // Find the maximum (ties go to the higher-numbered class).
        int best;
        if (p[0] > p[1] && p[0] > p[2]) best = 0;
        else if (p[1] > p[2]) best = 1;
        else best = 2;
        cout << p[best] << "\t" << best + 1 << endl;
        if (best + 1 == (int)testData[i].A1) m++;
    }
}

int main(int argc, char* argv[]) {
    // argv[1]: training file, argv[2]: test file, both in the wine.data format.
    if (argc < 3) {
        cerr << "Usage: " << argv[0] << " <training file> <test file>" << endl;
        return 1;
    }
    DataRead(trainData, argv[1], TrainNum);
    DataRead(testData, argv[2], TestNum);
    if (trainData.empty() || testData.empty()) {
        cerr << "No data read" << endl;
        return 1;
    }
    bayes();

    cout << "Max probability\tClass" << endl;
    houyan();
    double tp = (double)m / testData.size();  // accuracy
    double fp = 1 - tp;                        // error rate
    cout << "Accuracy: " << tp << endl;
    cout << "Error rate: " << fp << endl;
    return 0;
}
