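Use cases for classification models:
1. Predicting the probability that an internet user clicks on an online ad (binary classification).
2. Detecting fraud (binary classification: fraudulent or not).
3. Predicting loan defaults (binary classification).
4. Classifying images, audio, and video (multiclass classification).
5. Classifying or tagging news articles, web pages, or other content (multiclass classification).
6. Detecting spam email, spam pages, network intrusions, and other malicious behavior.
7. Detecting faults, for example in computer systems or networks.
8. Predicting which customers are likely to stop using a product or service.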
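Types of classification models:
1. Linear models
Principle: model the predicted outcome (the target variable) by applying a simple linear prediction function to the input features.
1.1 Logistic regression
1.2 Linear support vector machine (SVM)
2. Naive Bayes model (premise: the features are assumed to be conditionally independent given the class)
3. Decision tree model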
Decision functions of the three classification models:
[Figure: decision functions of the three classification models]
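As a rough sketch of what such decision functions look like, the textbook forms for logistic regression, a linear SVM, and multinomial naive Bayes can be written in plain Scala. The object and parameter names are illustrative assumptions, not MLlib's API. Which three models the original figure showed is not certain, and the decision tree is left out because its decision function is a learned sequence of feature tests rather than a single formula.

```scala
object DecisionFunctions {
  // Dot product of a weight vector and a feature vector of equal length.
  def dot(w: Array[Double], x: Array[Double]): Double =
    w.zip(x).map { case (wi, xi) => wi * xi }.sum

  // Logistic regression: squash the linear score w.x + b into (0, 1).
  def logisticProbability(w: Array[Double], b: Double, x: Array[Double]): Double =
    1.0 / (1.0 + math.exp(-(dot(w, x) + b)))

  // Predict the positive class when the probability exceeds the threshold.
  def logisticPredict(w: Array[Double], b: Double, x: Array[Double],
                      threshold: Double = 0.5): Double =
    if (logisticProbability(w, b, x) > threshold) 1.0 else 0.0

  // Linear SVM: predict the positive class when the raw margin w.x + b is positive.
  def svmPredict(w: Array[Double], b: Double, x: Array[Double],
                 threshold: Double = 0.0): Double =
    if (dot(w, x) + b > threshold) 1.0 else 0.0

  // Multinomial naive Bayes: choose the class c that maximizes
  // log P(c) + sum_j x_j * log P(feature j | c).
  def naiveBayesPredict(logPriors: Array[Double],
                        logLikelihoods: Array[Array[Double]],
                        x: Array[Double]): Int =
    logPriors.indices.maxBy(c => logPriors(c) + dot(logLikelihoods(c), x))
}
```

Both linear models share the same score w.x + b and differ only in how that score is turned into a label.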
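How to build a classification model with MLlib
1. Extract suitable features from the dataset
Classification models in MLlib operate on LabeledPoint objects, each of which wraps the target variable (label) and a feature vector: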
case class LabeledPoint(label: Double, features: Vector)
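Training data:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
val data = records.map { r => ... LabeledPoint(label, Vectors.dense(features)) }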
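To make the feature-extraction step concrete without requiring Spark, here is a minimal sketch that parses one text record into a label and a feature vector. The `SimpleLabeledPoint` class is a hypothetical stand-in for MLlib's LabeledPoint, and the record layout (comma-separated numbers with the label in the last column) is an assumption chosen for illustration, not something the notes specify.

```scala
// Hypothetical stand-in for MLlib's LabeledPoint so the sketch runs without Spark.
final case class SimpleLabeledPoint(label: Double, features: Vector[Double])

object RecordParsing {
  // Assumed record layout: comma-separated numbers, label in the last column.
  def parse(record: String): SimpleLabeledPoint = {
    val fields = record.split(",").map(_.trim.toDouble)
    SimpleLabeledPoint(label = fields.last, features = fields.init.toVector)
  }
}
```

With a real dataset, the same function body would sit inside the `records.map { ... }` call.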
2. Train the classification model
2.1 Train the logistic regression model:
2.2 Train the SVM model
2.3 Train the NaiveBayes model
2.4 Train the decision tree model
3. Make predictions with the classification model
val prediction = xxModel.predict(dataPoint.features)
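Note: in binary classification, logistic regression, SVM, and NaiveBayes return a prediction of 1 or 0 by default, whereas the decision tree's prediction is a real number between 0 and 1, so a threshold must be applied before using it as a class label.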
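Evaluating classification model performance
Evaluation measures commonly used for binary classification include prediction accuracy and error rate, precision and recall, the area under the precision-recall curve, the ROC curve, the area under the ROC curve (AUC), and the F-measure.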
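- Error rate: the number of misclassified training samples divided by the total number of samples.
- Precision = relevant documents retrieved by the system / all documents retrieved by the system (search engine setting).
- Recall = relevant documents retrieved by the system / all relevant documents (search engine setting).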
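The definitions above can be sketched in plain Scala from a list of (predicted label, true label) pairs. The names `BinaryMetrics` and `Counts` are illustrative assumptions. The F-measure shown is the common F1 form (harmonic mean of precision and recall), and returning 0.0 for an empty denominator is a convention chosen for this sketch.

```scala
object BinaryMetrics {
  // Confusion-matrix counts for labels encoded as 1.0 (positive) and 0.0 (negative).
  final case class Counts(tp: Int, fp: Int, tn: Int, fn: Int)

  // Each pair is (predicted label, true label).
  def counts(pairs: Seq[(Double, Double)]): Counts = Counts(
    tp = pairs.count { case (p, y) => p == 1.0 && y == 1.0 },
    fp = pairs.count { case (p, y) => p == 1.0 && y == 0.0 },
    tn = pairs.count { case (p, y) => p == 0.0 && y == 0.0 },
    fn = pairs.count { case (p, y) => p == 0.0 && y == 1.0 }
  )

  // Convention for this sketch: a ratio with an empty denominator is 0.0.
  private def ratio(num: Double, den: Double): Double =
    if (den == 0.0) 0.0 else num / den

  def errorRate(c: Counts): Double = ratio(c.fp + c.fn, c.tp + c.fp + c.tn + c.fn)
  def precision(c: Counts): Double = ratio(c.tp, c.tp + c.fp)
  def recall(c: Counts): Double = ratio(c.tp, c.tp + c.fn)

  // F1: harmonic mean of precision and recall.
  def f1(c: Counts): Double = {
    val p = precision(c)
    val r = recall(c)
    ratio(2 * p * r, p + r)
  }
}
```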

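Precision and recall are usually negatively correlated: high precision often comes with low recall, and vice versa. If both are low, something is wrong with the model.
The requirements for precision and recall differ by scenario. For search, the goal is to raise precision while guaranteeing recall (better a false alarm than a miss). For disease surveillance and anti-spam, the goal is to raise recall while guaranteeing precision (better a miss than a false alarm).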
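ROC (Receiver Operating Characteristic) curve:
- True positive rate (TPR) = true positives / (true positives + false negatives)
- False positive rate (FPR) = false positives / (false positives + true negatives)
- AUC: the area under the ROC curve (an AUC of 1.0 indicates a perfect classifier, and 0.5 indicates performance equivalent to random guessing)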
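As an illustration of these definitions, the sketch below builds the ROC points from (score, true label) pairs by sweeping the threshold from high to low, then integrates them with the trapezoidal rule. It is plain Scala with hypothetical names (`RocAuc`, `rocPoints`, `auc`), not the MLlib evaluation API.

```scala
object RocAuc {
  // Input: (score, label) pairs, with label 1.0 = positive and 0.0 = negative.
  // Output: ROC points (FPR, TPR) from (0, 0) to (1, 1), one per distinct score.
  def rocPoints(scored: Seq[(Double, Double)]): Seq[(Double, Double)] = {
    val positives = scored.count(_._2 == 1.0).toDouble
    val negatives = scored.size - positives
    require(positives > 0 && negatives > 0, "need at least one sample of each class")
    // Group tied scores so each threshold step adds all samples sharing that score.
    val groups = scored.groupBy(_._1).toSeq.sortBy(-_._1).map(_._2)
    // Running (false positives, true positives) as the threshold is lowered.
    val cumulative = groups.scanLeft((0.0, 0.0)) { case ((fp, tp), group) =>
      val groupTp = group.count(_._2 == 1.0)
      (fp + (group.size - groupTp), tp + groupTp)
    }
    cumulative.map { case (fp, tp) => (fp / negatives, tp / positives) }
  }

  // Area under the curve by the trapezoidal rule over consecutive points.
  def auc(points: Seq[(Double, Double)]): Double =
    points.sliding(2).collect { case Seq((x0, y0), (x1, y1)) =>
      (x1 - x0) * (y0 + y1) / 2.0
    }.sum
}
```

A classifier that ranks every positive above every negative yields an area of 1.0, while one that gives all samples the same score yields 0.5.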

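Notes:
The ROC curve is an intuitive and convenient way to show a classifier's performance. However, people usually want a single number that indicates how good a classifier is.
That is where the Area Under the ROC Curve (AUC) comes in. As the name suggests, the AUC is the size of the area lying under the ROC curve.
AUC values usually fall between 0.5 and 1.0, and a larger AUC indicates better performance.
P/R and ROC are two different evaluation measures computed in different ways. In general, the former is used for retrieval and the latter for classification, recognition, and similar tasks.
AP (average precision) addresses the limitation that P, R, and F-measure are single-point values. To obtain a measure that reflects overall performance, consider the figure below.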
[Figure: precision-recall curves of two systems, one marked with round dots and one with squares]
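As the figure shows, although the performance curves of the two systems overlap, the system marked with round dots performs far better than the one marked with squares in the vast majority of cases. From this we can see that if a system performs well, its curve should bulge upward as much as possible.
More specifically, the area between the curve and the coordinate axes should be as large as possible. For an ideal system, this area is 1.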
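One common way to turn this area into a number is average precision computed over a ranked list: the mean of the precision values measured at the rank of each relevant (positive) item. The sketch below assumes that definition; other variants, such as interpolated AP, give slightly different values. The names are illustrative, and tied scores are simply left in the order the sort produces.

```scala
object AveragePrecision {
  // Input: (score, label) pairs, with label 1.0 = relevant and 0.0 = not relevant.
  def averagePrecision(scored: Seq[(Double, Double)]): Double = {
    // Rank items from highest to lowest score and keep only the labels.
    val ranked = scored.sortBy(-_._1).map(_._2)
    val totalPositives = ranked.count(_ == 1.0)
    if (totalPositives == 0) 0.0
    else {
      // Walk down the ranking, adding precision@rank at every relevant item.
      val (_, precisionSum) = ranked.zipWithIndex.foldLeft((0, 0.0)) {
        case ((hits, sum), (label, index)) =>
          if (label == 1.0) (hits + 1, sum + (hits + 1).toDouble / (index + 1))
          else (hits, sum)
      }
      precisionSum / totalPositives
    }
  }
}
```

A ranking that places every relevant item first reaches the ideal value of 1.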
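How can model performance be improved?
1. Feature standardization
Standardization rescales the raw data so that each feature has zero mean and unit standard deviation (it changes the scale of the distribution, not its shape). Method: subtract the feature's mean from each value and divide by the feature's standard deviation.
Standardization tool:
Spark's StandardScaler.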
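The arithmetic behind standardization can be sketched in plain Scala, independent of Spark. This is an illustrative implementation with hypothetical names, not StandardScaler itself. It uses the population standard deviation (dividing by n), so its output can differ slightly from a library scaler that divides by n - 1, and it leaves constant columns centered at zero rather than dividing by zero.

```scala
object Standardize {
  // rows: one Vector per sample; all rows are assumed to have the same length.
  def standardize(rows: Seq[Vector[Double]]): Seq[Vector[Double]] = {
    require(rows.nonEmpty, "need at least one row")
    val n = rows.size.toDouble
    val dims = rows.head.indices
    // Per-column mean.
    val means = dims.map(j => rows.map(r => r(j)).sum / n)
    // Per-column population standard deviation; constant columns use divisor 1.0.
    val stds = dims.map { j =>
      val variance = rows.map(r => math.pow(r(j) - means(j), 2)).sum / n
      val std = math.sqrt(variance)
      if (std == 0.0) 1.0 else std
    }
    // z-score: (value - mean) / standard deviation, column by column.
    rows.map(r => dims.map(j => (r(j) - means(j)) / stds(j)).toVector)
  }
}
```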

2. Use 1-of-k encoding for categorical features
Example: if a feature has 10 categories, create a vector of length 10, then set the dimension at the index of the sample's category to 1 and all other dimensions to 0.
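The example above can be sketched in a few lines of plain Scala. The helper names are hypothetical, and assigning indices in order of first appearance is just one reasonable convention.

```scala
object OneOfK {
  // Map each distinct category value to an index, in order of first appearance.
  def indexCategories(values: Seq[String]): Map[String, Int] =
    values.distinct.zipWithIndex.toMap

  // Vector of length numCategories with 1.0 at categoryIndex and 0.0 elsewhere.
  def encode(categoryIndex: Int, numCategories: Int): Vector[Double] = {
    require(categoryIndex >= 0 && categoryIndex < numCategories, "index out of range")
    Vector.tabulate(numCategories)(i => if (i == categoryIndex) 1.0 else 0.0)
  }
}
```

For a feature with 10 categories, `OneOfK.encode(3, 10)` yields a length-10 vector whose fourth entry is 1.0.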
3.Ä£ÐͲÎÊýµ÷ÓÅ
¸ù¾Ý²»Í¬Ä£ÐÍѵÁ·Ê±Ê¹ÓõIJ»Í¬²ÎÊý£¬ÀûÓÃÉÏһҳģÐÍÐÔÄÜÆÀ¼ÛÖ¸±ê£¬ÑµÁ·³ö×î¼ÑµÄÄ£ÐÍ¡£
4.½»²æÑéÖ¤
ÔÀí£º²âÊÔÄ£ÐÍÔÚδ֪Êý¾ÝÉϵÄÐÔÄÜ
·½Ê½£º½«Êý¾ÝËæ»úµÄ·ÖΪѵÁ·¼¯ºÍ²âÊÔ¼¯£¬³£Ó÷ַ¨50/50¡¢60/40¡¢80/20¡£
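A random train/test split can be sketched in plain Scala as follows. The function name and the fixed seed are assumptions for illustration. On a Spark RDD, the built-in `randomSplit` method serves the same purpose, although its split sizes are approximate rather than exact.

```scala
import scala.util.Random

object TrainTestSplit {
  // Shuffle with a fixed seed for reproducibility, then cut at the train fraction.
  def split[A](data: Seq[A], trainFraction: Double, seed: Long): (Seq[A], Seq[A]) = {
    require(trainFraction > 0.0 && trainFraction < 1.0, "fraction must be in (0, 1)")
    val shuffled = new Random(seed).shuffle(data)
    val trainSize = math.round(data.size * trainFraction).toInt
    shuffled.splitAt(trainSize)
  }
}
```

Calling `TrainTestSplit.split(samples, 0.8, 42L)` on a hypothetical `samples` collection gives an 80/20 split. The model is trained on the first part and evaluated only on the second.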