Bayesian Classifier Report
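Artificial Intelligence: Bayesian Classifier
Institute of Pattern Recognition and Intelligent Systems, Department of Automation  Name: Cao Da (曹达)  Student ID: 23220101153239
Lab Report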
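I. Objective
Implementing a Bayesian classifier in code deepens understanding of how it assigns classes, and builds skill in analyzing problems, solving them, and working hands-on.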
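II. Data Description
The data come from the UCI Machine Learning Repository, http://archive.ics.uci.edu/ml/; see Appendix 1 for details.
The data set is the Wine Data Set, used to classify three types of wine. Each sample is described by 13 attributes: Alcohol, Malic acid, Ash, Alcalinity of ash, Magnesium, Total phenols, Flavanoids, Nonflavanoid phenols, Proanthocyanins, Color intensity, Hue, OD280/OD315 of diluted wines, and Proline. In the file "wine.data", each row is one wine sample, 178 samples in total. There are 14 columns: the first is the class label, which takes the values "1", "2", and "3", and the remaining 13 hold the sample's attribute values. Class 1 has 59 samples, class 2 has 71, and class 3 has 48.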
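The row layout described above (one class label followed by 13 numeric fields, separated by commas) can be read with a few lines of C++. The following is a minimal sketch rather than the report's own reader; the names WineSample and parseWineLine are hypothetical and introduced only for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <sstream>
#include <string>

// One row of wine.data: a class label followed by 13 attribute values.
struct WineSample {
    int label;                        // class identifier: 1, 2, or 3
    std::array<double, 13> features;  // Alcohol, Malic acid, ..., Proline
};

// Parses one comma-separated row. Returns false if the row does not hold
// a label in 1..3 followed by 13 numeric fields.
bool parseWineLine(std::string line, WineSample& sample) {
    assert(line.find('\n') == std::string::npos);  // expects a single row
    // Turn the commas into spaces so that >> can split the fields.
    std::replace(line.begin(), line.end(), ',', ' ');
    std::istringstream in(line);
    if (!(in >> sample.label)) return false;
    for (double& value : sample.features) {
        if (!(in >> value)) return false;
    }
    return sample.label >= 1 && sample.label <= 3;
}
```

Storing the attributes in a fixed-size array lets later code loop over them by index instead of naming 13 separate fields.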
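III. Analysis of the Naive Bayes Classification Algorithm
A Bayesian classifier is a Bayesian network used for classification. The network contains a class node C, whose value is drawn from the class set (c1, c2, ..., cm), and a set of nodes X = (X1, X2, ..., Xn) that represent the features used for classification. For a sample D to be classified, with feature values x = (x1, x2, ..., xn), the classifier assigns D to the class ci whose posterior probability P(C = ci | X1 = x1, X2 = x2, ..., Xn = xn), i = 1, 2, ..., m, is the largest, that is, the ci that satisfies:
P(C = ci | X = x) = max{ P(C = c1 | X = x), P(C = c2 | X = x), ..., P(C = cm | X = x) }
By Bayes' theorem:
P(C = ci | X = x) = P(X = x | C = ci) * P(C = ci) / P(X = x)
Here P(C = ci) can be obtained from the experience of domain experts (the program in Appendix 3 estimates it from the class frequencies in the training data), whereas P(X = x | C = ci) and P(X = x) are harder to compute.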
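The two hard terms become tractable under the naive Bayes assumption named in the section title: given the class, the features are treated as conditionally independent. A short derivation in the notation above follows; the symbol \hat{c} for the predicted class is introduced here only for illustration.

```latex
\begin{aligned}
P(X = x \mid C = c_i) &= \prod_{k=1}^{n} P(X_k = x_k \mid C = c_i), \\
P(C = c_i \mid X = x) &\propto P(C = c_i) \prod_{k=1}^{n} P(X_k = x_k \mid C = c_i), \\
\hat{c} &= \arg\max_{1 \le i \le m} \; P(C = c_i) \prod_{k=1}^{n} P(X_k = x_k \mid C = c_i).
\end{aligned}
```

The denominator P(X = x) is the same for every class, so it can be dropped when only the largest posterior is needed.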
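IV. Results
The Bayesian classification algorithm assigns each test sample to one of the three wine classes. The implementation is described below.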
The results are shown in the figure below. The source code is in Appendix 3.
Appendix 1:
Data description:
1. Title of Database: Wine recognition data
   Updated Sept 21, 1998 by C. Blake: Added attribute information
2. Sources:
   (a) Forina, M. et al, PARVUS - An Extendible Package for Data Exploration, Classification and Correlation. Institute of Pharmaceutical and Food Analysis and Technologies, Via Brigata Salerno, 16147 Genoa, Italy.
   (b) Stefan Aeberhard, email: stefan@coral.cs.jcu.edu.au
   (c) July 1991
3. Past Usage:
   (1) S. Aeberhard, D. Coomans and O. de Vel, Comparison of Classifiers in High Dimensional Settings, Tech. Rep. no. 92-02, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Technometrics).
       The data was used with many others for comparing various classifiers. The classes are separable, though only RDA has achieved 100% correct classification. (RDA: 100%, QDA 99.4%, LDA 98.9%, 1NN 96.1% (z-transformed data)) (All results using the leave-one-out technique)
       In a classification context, this is a well posed problem with "well behaved" class structures. A good data set for first testing of a new classifier, but not very challenging.
   (2) S. Aeberhard, D. Coomans and O. de Vel, "THE CLASSIFICATION PERFORMANCE OF RDA", Tech. Rep. no. 92-01, (1992), Dept. of Computer Science and Dept. of Mathematics and Statistics, James Cook University of North Queensland. (Also submitted to Journal of Chemometrics).
       Here, the data was used to illustrate the superior performance of the use of a new appreciation function with RDA.
4. Relevant Information:
   -- These data are the results of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars. The analysis determined the quantities of 13 constituents found in each of the three types of wines.
   -- I think that the initial data set had around 30 variables, but for some reason I only have the 13 dimensional version. I had a list of what the 30 or so variables were, but a.) I lost it, and b.), I would not know which 13 variables are included in the set.
   -- The attributes are (donated by Riccardo Leardi, riclea@anchem.unige.it)
      1) Alcohol
      2) Malic acid
      3) Ash
      4) Alcalinity of ash
      5) Magnesium
      6) Total phenols
      7) Flavanoids
      8) Nonflavanoid phenols
      9) Proanthocyanins
      10) Color intensity
      11) Hue
      12) OD280/OD315 of diluted wines
      13) Proline
5. Number of Instances
   class 1 59
   class 2 71
   class 3 48
6. Number of Attributes
   13
7. For Each Attribute:
   All attributes are continuous.
   No statistics available, but suggest to standardise variables for certain uses (e.g. for use with classifiers which are NOT scale invariant)
   NOTE: 1st attribute is class identifier (1-3)
8. Missing Attribute Values:
   None
9. Class Distribution: number of instances per class
   class 1 59
   class 2 71
   class 3 48
Appendix 2:
Data: http://archive.ics.uci.edu/ml/machine-learning-databases/wine/
Appendix 3:
Source code:
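BayesianClassifier.h
// The target of the original #include line was lost in extraction;
// nothing in this header requires one.
// Attributes of the Wine Data Set, in column order after the class label:
//  1) Alcohol
//  2) Malic acid
//  3) Ash
//  4) Alcalinity of ash
//  5) Magnesium
//  6) Total phenols
//  7) Flavanoids
//  8) Nonflavanoid phenols
//  9) Proanthocyanins
// 10) Color intensity
// 11) Hue
// 12) OD280/OD315 of diluted wines
// 13) Proline
const int TrainNum = 130; // number of training samples
const int TestNum = 48;   // number of test samples
// One row of the data file: A1 is the class label (1, 2, or 3),
// A2..A14 are the 13 attribute values in the order listed above.
struct OriginalData {
    double A1;
    double A2;
    double A3;
    double A4;
    double A5;
    double A6;
    double A7;
    double A8;
    double A9;
    double A10;
    double A11;
    double A12;
    double A13;
    double A14;
};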
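BayesianClassifier.cpp
#include <cstdlib>
#include <fstream>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>
#include "BayesianClassifier.h"
using namespace std;
const int Shuxing = 13; // total number of attributes
ifstream f;
// The declarations below were damaged in extraction; their types are
// inferred from how the names are used in the functions that follow.
vector<OriginalData> trainData; // training samples
vector<OriginalData> testData;  // test samples
double A[3];                    // prior probability of each class
int m = 0;                      // number of correctly classified test samples
// For each class, one map per attribute: attribute value -> probability of that value
map<double, double> C1_map[Shuxing];
map<double, double> C2_map[Shuxing];
map<double, double> C3_map[Shuxing];

// Reads TrainNum or TestNum comma-separated rows from fileName into data.
void DataRead(vector<OriginalData>& data, const char* fileName)
{
    f.open(fileName);
    if (!f) {
        cerr << "Cannot open " << fileName << endl;
        exit(1);
    }
    int ZHjiang; // number of rows to read
    // A file name that begins with 'w' is taken to be the training file.
    if (fileName[0] == 'w')
        ZHjiang = TrainNum;
    else
        ZHjiang = TestNum;
    string line;
    OriginalData wine;
    for (int i = 0; i < ZHjiang; i++) {
        if (!(f >> line)) {
            cerr << fileName << " has fewer than " << ZHjiang << " rows" << endl;
            exit(1);
        }
        // Replace the commas with spaces so that >> can split the fields.
        while (line.find(',') != string::npos) {
            line[line.find(',')] = ' ';
        }
        istringstream stream(line);
        stream >> wine.A1 >> wine.A2 >> wine.A3 >> wine.A4 >> wine.A5 >> wine.A6 >> wine.A7
               >> wine.A8 >> wine.A9 >> wine.A10 >> wine.A11 >> wine.A12 >> wine.A13 >> wine.A14;
        data.push_back(wine);
    }
    f.close();
    f.clear(); // reset the stream state so f can be opened again for the next file
}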
// Estimates the class priors and, for each class, the relative frequency
// of every attribute value seen in the training data.
void bayes()
{
    int count1 = 0, count2 = 0, count3 = 0;
    int i;
    // Count the training samples in each of the three classes.
    for (i = 0; i < TrainNum; i++) {
        if (trainData[i].A1 == 1) {
            count1++;
        }
        if (trainData[i].A1 == 2) {
            count2++;
        }
        if (trainData[i].A1 == 3) {
            count3++;
        }
    }
    // Prior probabilities P(Ci).
    A[0] = (double)count1 / (double)TrainNum;
    A[1] = (double)count2 / (double)TrainNum;
    A[2] = (double)count3 / (double)TrainNum;

    map<double, double>::iterator pipei;
    // Count how often each attribute value occurs within each class.
    // &trainData[i].A2 + j addresses the j-th attribute; this relies on
    // A2..A14 being laid out contiguously in OriginalData.
    for (i = 0; i < TrainNum; i++) {
        if (trainData[i].A1 == 1) { // occurrences of Xk for P(Xk|C1)
            for (int j = 0; j < Shuxing; j++) {
                double temp = *(&trainData[i].A2 + j);
                pipei = C1_map[j].find(temp);
                if (pipei == C1_map[j].end())
                    C1_map[j].insert(map<double, double>::value_type(temp, 1));
                else
                    pipei->second = pipei->second + 1;
            }
        }
        if (trainData[i].A1 == 2) { // occurrences of Xk for P(Xk|C2)
            for (int j = 0; j < Shuxing; j++) {
                double temp = *(&trainData[i].A2 + j);
                pipei = C2_map[j].find(temp);
                if (pipei == C2_map[j].end())
                    C2_map[j].insert(map<double, double>::value_type(temp, 1));
                else
                    pipei->second = pipei->second + 1;
            }
        }
        if (trainData[i].A1 == 3) { // occurrences of Xk for P(Xk|C3)
            for (int j = 0; j < Shuxing; j++) {
                double temp = *(&trainData[i].A2 + j);
                pipei = C3_map[j].find(temp);
                if (pipei == C3_map[j].end())
                    C3_map[j].insert(map<double, double>::value_type(temp, 1));
                else
                    pipei->second = pipei->second + 1;
            }
        }
    }
    // Convert the counts into probabilities by dividing by the class size.
    for (i = 0; i < Shuxing; i++) {
        for (pipei = C1_map[i].begin(); pipei != C1_map[i].end(); ++pipei) {
            double num = pipei->second;
            pipei->second = (double)num / (double)count1;
        }
        for (pipei = C2_map[i].begin(); pipei != C2_map[i].end(); ++pipei) {
            double num = pipei->second;
            pipei->second = (double)num / (double)count2;
        }
        for (pipei = C3_map[i].begin(); pipei != C3_map[i].end(); ++pipei) {
            double num = pipei->second;
            pipei->second = (double)num / (double)count3;
        }
    }
}
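// Computes a posterior score for each class and picks the largest.
void houyan()
{
    int i, k;
    double p[3];
    map<double, double>::iterator pipei;
    for (i = 0; i < TestNum; i++) {
        double pXC[3] = {0, 0, 0};
        // Score for p(X|C1). The per-attribute probabilities are added, not
        // multiplied as the naive Bayes product rule would require; with
        // exact-value lookup, a product would be zero whenever any test
        // value is absent from the training table.
        for (k = 0; k < Shuxing; k++) {
            pipei = C1_map[k].find(*(&testData[i].A2 + k));
            if (pipei != C1_map[k].end()) {
                pXC[0] = pXC[0] + pipei->second;
            }
        }
        p[0] = A[0] * pXC[0];
        // Score for p(X|C2)
        for (k = 0; k < Shuxing; k++) {
            pipei = C2_map[k].find(*(&testData[i].A2 + k));
            if (pipei != C2_map[k].end()) {
                pXC[1] = pXC[1] + pipei->second;
            }
        }
        p[1] = A[1] * pXC[1];
        // Score for p(X|C3)
        for (k = 0; k < Shuxing; k++) {
            pipei = C3_map[k].find(*(&testData[i].A2 + k));
            if (pipei != C3_map[k].end()) {
                pXC[2] = pXC[2] + pipei->second;
            }
        }
        p[2] = A[2] * pXC[2];
        // Find the largest score.
        int best;
        if (p[0] > p[1] && p[0] > p[2]) {
            best = 0;
        } else {
            if (p[1] > p[2]) {
                best = 1;
            } else {
                best = 2;
            }
        }
        cout << p[best] << " " << best + 1 << endl;
        // The original output and counting statements were damaged in extraction.
        // Assumed here: m counts predictions that match the true label in A1.
        if (testData[i].A1 == best + 1) {
            m++;
        }
    }
}

int main()
{
    double tp, fp;
    // Both file names are placeholders: the original string literals were lost
    // in extraction. DataRead requires the training file name to begin with 'w'
    // and the test file name not to.
    const char* trainFile = "wine.data";
    const char* testFile = "test.data";
    cout << "Max probability " << "Class" << endl;
    DataRead(trainData, trainFile);
    bayes();
    DataRead(testData, testFile);
    houyan();
    tp = (double)m / TestNum; // the original divided by 51, which does not match TestNum = 48
    fp = 1 - tp;
    cout << "Accuracy: " << tp << endl;
    cout << "Error rate: " << fp << endl; // assumed: the end of the original output statement was lost
    return 0;
}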
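The program above looks up each test value in a table of exact training values, so a value never seen in training contributes nothing to the score. A common alternative for continuous attributes is Gaussian naive Bayes, which models each attribute within a class as a normal distribution and sums log-probabilities. The following is a minimal sketch of that alternative, not the method used in this report. The names Sample, GaussianNB, fit, and predict are hypothetical, class labels are assumed to be 0-based (the wine labels 1 to 3 would be shifted down by one), and every class is assumed to appear in the training data.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// One labeled sample: a 0-based class index and its feature values.
struct Sample {
    int label;
    std::vector<double> x;
};

// Per-class log prior, plus per-class, per-feature mean and variance.
struct GaussianNB {
    std::vector<double> logPrior;
    std::vector<std::vector<double> > mean;
    std::vector<std::vector<double> > var;
};

// Estimates priors, means, and variances from the training samples.
GaussianNB fit(const std::vector<Sample>& data, int numClasses) {
    assert(!data.empty() && numClasses > 0);
    const std::size_t d = data[0].x.size();
    GaussianNB model;
    model.logPrior.assign(numClasses, 0.0);
    model.mean.assign(numClasses, std::vector<double>(d, 0.0));
    model.var.assign(numClasses, std::vector<double>(d, 0.0));
    std::vector<double> count(numClasses, 0.0);

    // First pass: class counts and per-feature sums.
    for (const Sample& s : data) {
        count[s.label] += 1.0;
        for (std::size_t k = 0; k < d; ++k) model.mean[s.label][k] += s.x[k];
    }
    for (int c = 0; c < numClasses; ++c) {
        assert(count[c] > 0.0);  // assumption: every class is represented
        model.logPrior[c] = std::log(count[c] / data.size());
        for (std::size_t k = 0; k < d; ++k) model.mean[c][k] /= count[c];
    }
    // Second pass: squared deviations from the class mean.
    for (const Sample& s : data) {
        for (std::size_t k = 0; k < d; ++k) {
            const double diff = s.x[k] - model.mean[s.label][k];
            model.var[s.label][k] += diff * diff;
        }
    }
    for (int c = 0; c < numClasses; ++c) {
        for (std::size_t k = 0; k < d; ++k) {
            // The small floor keeps the variance strictly positive.
            model.var[c][k] = model.var[c][k] / count[c] + 1e-9;
        }
    }
    return model;
}

// Returns the class with the largest log prior plus summed log-likelihoods.
int predict(const GaussianNB& model, const std::vector<double>& x) {
    const double twoPi = 2.0 * std::acos(-1.0);
    int best = 0;
    double bestScore = -std::numeric_limits<double>::infinity();
    for (std::size_t c = 0; c < model.logPrior.size(); ++c) {
        double score = model.logPrior[c];
        for (std::size_t k = 0; k < x.size(); ++k) {
            const double v = model.var[c][k];
            const double diff = x[k] - model.mean[c][k];
            score += -0.5 * std::log(twoPi * v) - diff * diff / (2.0 * v);
        }
        if (score > bestScore) {
            bestScore = score;
            best = static_cast<int>(c);
        }
    }
    return best;
}
```

Working in log space turns the product of per-attribute likelihoods into a sum, which avoids numerical underflow when many small probabilities are multiplied.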