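I am a beginner in machine learning.
Here is what I am trying to do:
I have several kinds of log files (system logs, MSSQL Server logs, Linux logs, MySQL logs, FTP logs, IIS logs). Given any input line, I want to use machine learning to identify which type of log it comes from. Each log type has a different format, and some have no fixed structure (Linux, MySQL, and FTP logs). From my analysis, this kind of classification can be done with the KNN (k-nearest neighbors) algorithm, but I do not know how to implement it. Any suggestions would be appreciated.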
Log type formats:
Linux syslog:
Jan 5 08:39:01 iei-Virtual-Machine CRON[48622]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ] && [ -x /usr/lib/php5/sessionclean ] && [ -d /var/lib/php5 ] && /usr/lib/php5/sessionclean /var/lib/php5 $(/usr/lib/php5/maxlifetime))
MySQL error log:
2018-01-05 10:55:20 18856 [Warning] Unsafe statement is written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. Statements writing to a table with an auto-increment column after selecting from another table are unsafe because the order in which rows are retrieved determines what (if any) rows will be written. This order cannot be predicted and may differ between master and the slave. Statement: CALL SubmitGetChangeDetectionInfo(@_SubmitGetChangeDetectionInfo_0
Event log:
IE038,System log,Error,20/11/2017 12:47:51 PM,TerminalServices-Printers,1111,None,Driver HP Deskjet 3520 series required for printer IEC057(Mahendran) Printer is unknown. Contact the administrator to install the driver before you log in again.
FTP log:
2018-01-04 00:00:01 162.254.209.219 INFOEVOL\EC 192.168.0.13 63346 DataChannelOpened - - 0 0 c1df3130-60e6-4678-9dcd-39177cc60d06 -
IIS Apps log:
2017-11-06 03:25:16 192.168.0.13 GET /IEIAppsLogin.aspx param=OwjgKJLT+ikfpFnxYbvZS/QWXTFP4GEXmT+qM7TeXTMqi5D7DKexzYjZc3aJNB0x 90 - 182.73.50.19 Mozilla/5.0+(Windows+NT+6.3;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/61.0.3163.100+Safari/537.36 http://182.73.50.19/AppsLogin.aspx 302 0 0 234
MSSQL Server:
01/10/2018 07:07:07,Logon,Unknown,Login failed for user 'ms SQL'. Reason: Could not find a login matching the name provided. [CLIENT: 49.65.2.226]
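For reference, the KNN idea described above could be sketched roughly as follows. This is only a minimal illustration under several assumptions, not a tested solution for real data: each line is turned into character 3-gram counts (with digits masked so timestamps and IPs contribute their shape, not their values), similarity is cosine similarity, and the training set is just one abbreviated copy of each sample line above, so `k` is 1. The names `featurize`, `cosine`, `KNNLogClassifier`, the label strings, and the extra test line are all hypothetical.

```python
import math
import re
from collections import Counter


def featurize(line, n=3):
    """Turn a log line into a bag of character n-grams.

    Digits are replaced with '0' so that timestamps, ports, and IDs
    contribute their layout rather than their exact values.
    """
    text = re.sub(r"\d", "0", line.lower())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class KNNLogClassifier:
    """Tiny k-nearest-neighbors classifier over n-gram vectors."""

    def __init__(self, k=1):
        self.k = k
        self.examples = []  # list of (feature vector, label)

    def fit(self, lines, labels):
        self.examples = [(featurize(l), y) for l, y in zip(lines, labels)]
        return self

    def predict(self, line):
        query = featurize(line)
        scored = sorted(
            ((cosine(query, vec), label) for vec, label in self.examples),
            key=lambda pair: pair[0],
            reverse=True,
        )
        # Similarity-weighted vote among the k most similar examples.
        votes = Counter()
        for sim, label in scored[:self.k]:
            votes[label] += sim
        return votes.most_common(1)[0][0]


# Abbreviated copies of the sample lines above; the labels are made up.
SAMPLES = [
    ("linux_syslog", "Jan 5 08:39:01 iei-Virtual-Machine CRON[48622]: (root) CMD ( [ -x /usr/lib/php5/maxlifetime ]"),
    ("mysql_error", "2018-01-05 10:55:20 18856 [Warning] Unsafe statement is written to the binary log using statement format since BINLOG_FORMAT = STATEMENT."),
    ("event_log", "IE038,System log,Error,20/11/2017 12:47:51 PM,TerminalServices-Printers,1111,None,Driver HP Deskjet 3520 series required for printer IEC057(Mahendran) Printer is unknown."),
    ("ftp", r"2018-01-04 00:00:01 162.254.209.219 INFOEVOL\EC 192.168.0.13 63346 DataChannelOpened - - 0 0 c1df3130-60e6-4678-9dcd-39177cc60d06 -"),
    ("iis", "2017-11-06 03:25:16 192.168.0.13 GET /IEIAppsLogin.aspx param=OwjgKJLT+ikfpFnxYbvZS/QWXTFP4GEXmT+qM7TeXTMqi5D7DKexzYjZc3aJNB0x 90 - 182.73.50.19"),
    ("mssql", "01/10/2018 07:07:07,Logon,Unknown,Login failed for user 'ms SQL'. Reason: Could not find a login matching the name provided. [CLIENT: 49.65.2.226]"),
]

clf = KNNLogClassifier(k=1).fit(
    [line for _, line in SAMPLES],
    [label for label, _ in SAMPLES],
)

# Hypothetical unseen line (not taken from any real log), for illustration.
new_line = "01/11/2018 08:15:42,Logon,Unknown,Login failed for user 'sa'. [CLIENT: 10.0.0.5]"
print(clf.predict(new_line))
```

With real data there would be many labeled lines per log type, and `k` would normally be larger (3 or 5, for example) and chosen on a held-out set. If scikit-learn is available, the same approach can be expressed with `TfidfVectorizer(analyzer="char", ngram_range=(3, 3))` feeding `KNeighborsClassifier`, which avoids writing the distance and voting code by hand.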