
Python Machine Learning: Implementing KNN from Scratch

Views: 23 | Date: 2022-06-16 11:07:57

1. Importing the data

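The data is loaded with pandas, a third-party library (it is not part of the Python standard library and has to be installed), which makes this step simple. The data used is the red wine dataset, downloaded to the local disk.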

The code is as follows (example):

import pandas as pd

def read_xlsx(csv_path):
    # Despite its name, this function reads a CSV file.
    data = pd.read_csv(csv_path)
    print(data)
    return data

2. Normalization

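KNN relies on distances, so rescaling the features is an important step: it removes the effect of the features' different units and ranges. I used min-max normalization. Standardization would also remove these scale differences, but as a beginner I found normalization simpler.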
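To make the difference between the two options concrete, here is a minimal sketch that applies both to a small made-up feature matrix. The array `features` and the helper names `min_max_normalize` and `standardize` are illustrative assumptions, not part of the original project.

```python
import numpy as np

# Hypothetical toy data: rows are samples, columns are features
# measured on very different scales.
features = np.array([[1.0, 100.0],
                     [2.0, 300.0],
                     [3.0, 500.0]])

def min_max_normalize(x):
    """Rescale each column to the range [0, 1]."""
    col_min = x.min(axis=0)
    col_max = x.max(axis=0)
    return (x - col_min) / (col_max - col_min)

def standardize(x):
    """Rescale each column to zero mean and unit standard deviation."""
    return (x - x.mean(axis=0)) / x.std(axis=0)

normalized = min_max_normalize(features)
standardized = standardize(features)
print(normalized)
print(standardized)
```

Both versions assume that no column is constant; a constant column would cause a division by zero and would need separate handling.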

The minimum and maximum are computed with the numpy library. The data imported by pandas is a DataFrame; np.array() converts it into an ndarray that numpy can compute on.

The code is as follows (example):

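import numpy as np

def MinMaxScaler(data):
    col = data.shape[1]
    # Scale every column except the last one, which holds the labels.
    for i in range(0, col - 1):
        # Convert the DataFrame column to an ndarray so numpy can compute on it.
        arr = np.array(data.iloc[:, i], dtype=float)
        # Use col_min/col_max so the built-in min() and max() are not shadowed.
        col_min = np.min(arr)
        col_max = np.max(arr)
        arr = (arr - col_min) / (col_max - col_min)
        # Replace the whole column so an integer column can hold the float result.
        data[data.columns[i]] = arr
    return data

3. Splitting into training and test sets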

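First separate the feature values and the labels into x and y. Then set the random seed random_state; if it is not set, every run produces a different split. test_size is the fraction of the samples that goes into the test set.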
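The effect of the seed can be checked in isolation. The sketch below only shuffles indices; `shuffled_split_indices` is a hypothetical helper introduced for illustration, and it assumes the same `np.random.seed` plus `np.random.permutation` approach described above.

```python
import numpy as np

def shuffled_split_indices(n_samples, test_size=0.2, random_state=None):
    """Return (train_indices, test_indices) for a shuffled split."""
    # Compare against None so that a seed of 0 is honoured as well.
    if random_state is not None:
        np.random.seed(random_state)
    shuffled = np.random.permutation(n_samples)
    n_test = int(n_samples * test_size)
    # The first n_test shuffled indices form the test set, the rest the training set.
    return shuffled[n_test:], shuffled[:n_test]

train_a, test_a = shuffled_split_indices(10, test_size=0.2, random_state=42)
train_b, test_b = shuffled_split_indices(10, test_size=0.2, random_state=42)
print(test_a, test_b)  # the two test index arrays are identical
```

With the same seed the two calls return the same split. Without a seed the indices would generally differ from run to run.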

def train_test_split(data, test_size=0.2, random_state=None):
    col = data.shape[1]
    x = np.array(data.iloc[:, 0:col-1])
    y = np.array(data.iloc[:, -1])
    # Fix the random seed when one is given, so the shuffle is reproducible.
    # Comparing against None also lets a seed of 0 take effect.
    if random_state is not None:
        np.random.seed(random_state)
    # permutation returns the indices 0..len(x)-1 in random order.
    shuffle_indexs = np.random.permutation(len(x))
    # Number of samples that go into the test set.
    test_size = int(len(x) * test_size)
    # The first test_size shuffled indices form the test set,
    # the remaining ones form the training set.
    test_indexs = shuffle_indexs[:test_size]
    train_indexs = shuffle_indexs[test_size:]
    # Extract the training and test sets by index.
    x_train = x[train_indexs]
    y_train = y[train_indexs]
    x_test = x[test_indexs]
    y_test = y[test_indexs]
    return x_train, x_test, y_train, y_test

4. Computing the distance

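Euclidean distance is used here, that is, the square root of the sum of the squared differences between the feature values. pow() computes powers. length is the number of feature values; it is needed when finding the nearest neighbors.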
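As a quick sanity check of the definition, here is a minimal vectorized sketch. The name `euclidean_distance` is an illustrative assumption and not a function from the project. It uses the classic 3-4-5 right triangle as a test case.

```python
import numpy as np

def euclidean_distance(a, b):
    """Square root of the sum of squared feature differences."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return np.sqrt(np.sum((a - b) ** 2))

d = euclidean_distance([0.0, 0.0], [3.0, 4.0])
print(d)  # 5.0
```

Note that the square root is applied once, to the whole sum. Taking the square root of each squared difference before summing would give the sum of absolute differences (Manhattan distance) instead.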

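def CountDistance(train, test, length):
    distance = 0
    for x in range(length):
        # Accumulate the squared difference of each feature.
        distance += pow(test[x] - train[x], 2)
    # Take the square root once, after summing (Euclidean distance).
    return distance ** 0.5

5. Selecting the nearest neighbors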

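Compute the distance between one test sample and every training sample, pick the k nearest samples, and decide the label by majority vote. argsort returns the indices that would sort the values in ascending order; these indices are then used to look up the corresponding labels.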
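The role of argsort is easiest to see on a tiny example. The distances and labels below are made-up values chosen for illustration only.

```python
import numpy as np

# Hypothetical distances from one test sample to four training samples,
# and the labels of those training samples.
distances = np.array([0.9, 0.1, 0.5, 0.3])
labels = np.array([2, 1, 2, 1])
k = 3

# Indices of the training samples, ordered from nearest to farthest.
order = np.argsort(distances)
# Labels of the k nearest training samples.
nearest_labels = labels[order[:k]]

print(order)           # [1 3 2 0]
print(nearest_labels)  # [1 1 2]
```

Here label 1 appears twice among the three nearest neighbors, so a majority vote would predict label 1.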

Tip: computing the mode with numpy

import numpy as np

# bincount(): counts the occurrences of each non-negative integer; it does not accept floats
counts = np.bincount(nums)
# Return the mode
np.argmax(counts)
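A runnable version of the tip, with a concrete `nums` array chosen for illustration. The second half is an assumption on my part about how labels that are not non-negative integers could be handled, using `np.unique` with `return_counts=True`; the original project does not do this.

```python
import numpy as np

# Hypothetical neighbor labels (non-negative integers).
nums = np.array([1, 2, 2, 3, 2, 1])
counts = np.bincount(nums)   # counts[i] is how often the value i occurs
mode = np.argmax(counts)     # the most frequent value
print(counts, mode)          # [0 2 3 1] 2

# Alternative for labels that bincount cannot handle, such as strings.
values, freq = np.unique(np.array(["red", "white", "red"]), return_counts=True)
mode_label = values[np.argmax(freq)]
print(mode_label)            # red
```

In both variants argmax returns the first maximum, so a tie is resolved in favor of the smallest label.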

Following the majority-vote rule, the function below computes the mode of the neighbors' labels and returns it as the predicted label.

def getNeighbor(x_train, test, y_train, k):
    distance = []
    # Number of features per sample
    length = x_train.shape[1]
    # Distance from the test sample to every training sample
    for x in range(x_train.shape[0]):
        dist = CountDistance(test, x_train[x], length)
        distance.append(dist)
    distance = np.array(distance)
    # Indices that sort the distances in ascending order
    distanceSort = distance.argsort()
    # Labels of the k nearest training samples
    neighbors = []
    for x in range(k):
        labels = y_train[distanceSort[x]]
        neighbors.append(labels)
    # Majority vote; bincount requires non-negative integer labels
    counts = np.bincount(neighbors)
    label = np.argmax(counts)
    return label

When calling the function:

getNeighbor(x_train, x_test[0], y_train, 3)

6. Computing the accuracy

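Use the KNN algorithm above to predict the label of every sample in the test set and store the predictions in the result list. Then compare the predictions with the true values: the accuracy is the number of correct predictions divided by the total number of test samples.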
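The ratio itself can be illustrated without running KNN at all. In this minimal sketch the predicted and true labels are made-up values, and `accuracy_percent` is a hypothetical helper rather than a function from the project.

```python
import numpy as np

def accuracy_percent(predicted, actual):
    """Percentage of predictions that match the true labels."""
    predicted = np.asarray(predicted)
    actual = np.asarray(actual)
    # The mean of a boolean array is the fraction of True entries.
    return float(np.mean(predicted == actual) * 100.0)

acc = accuracy_percent([1, 2, 2, 1], [1, 2, 1, 1])
print(acc)  # 75.0
```

Three of the four predictions match, which gives 75 percent.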

def getAccuracy(x_test, x_train, y_train, y_test):
    result = []
    k = 3
    # Predict a label for every test sample.
    for x in range(len(x_test)):
        arr_label = getNeighbor(x_train, x_test[x], y_train, k)
        result.append(arr_label)
    # Count the predictions that match the true labels.
    correct = 0
    for x in range(len(y_test)):
        if result[x] == y_test[x]:
            correct += 1
    accuracy = (correct / float(len(y_test))) * 100.0
    print('Accuracy:', accuracy, '%')
    return accuracy

Summary

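KNN is arguably the simplest algorithm in machine learning and is relatively easy to implement, but as a beginner I still spent the better part of a day getting it to work.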

The project is on GitHub: https://github.com/chenyi369/KNN
