What is the difference between libFM and libFFM?
Published: 2019-06-28


https://www.kaggle.com/users/25112/steffen-rendle/forum

 

Congratulations to Yu-Chin, Wei-Sheng, Yong and Michael!

There have been several questions about the relationship between FM and FFM. Here are my thoughts about the differences and similarities.

Notation

  • m categorical variables (="fields")
  • k is the factorization dimension of FM
  • k' is the factorization dimension of FFM

Models (slightly simplified)

  • FM is defined as

      y(x) = sum_i sum_j>i 〈v_i,v_j〉 x_i x_j

  • FFM is defined as

      y(x) = sum_i sum_j>i 〈v^J_i,v^I_j〉 x_i x_j

The difference between the two models is that FFM assumes the factors of different interactions (e.g. the v_i used for interaction (I,J) and the v_i used for (I,L)) are independent, whereas FM uses a single shared parameter space.
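
To make the parameter-sharing difference concrete, here is a minimal sketch (my own, not the libFM/libffm implementations) of the two scoring rules, assuming a dense feature vector x and a field array mapping each feature to its field:

    import numpy as np

    def fm_score(x, V):
        # FM: one shared vector V[i] per feature, reused in every interaction.
        n = len(x)
        s = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                s += (V[i] @ V[j]) * x[i] * x[j]
        return s

    def ffm_score(x, V, field):
        # FFM: V[i, f] is the vector feature i uses against features of field f.
        n = len(x)
        s = 0.0
        for i in range(n):
            for j in range(i + 1, n):
                s += (V[i, field[j]] @ V[j, field[i]]) * x[i] * x[j]
        return s

    rng = np.random.default_rng(0)
    n, m, k, kp = 5, 3, 8, 2             # features, fields, FM dim k, FFM dim k'
    field = np.array([0, 0, 1, 1, 2])    # toy field assignment
    x = rng.random(n)
    print(fm_score(x, rng.normal(size=(n, k))))              # n x k parameters
    print(ffm_score(x, rng.normal(size=(n, m, kp)), field))  # n x m x k' parameters

Both functions use the naive O(m^2) double loop; for FM the shared parameters allow the O(k * m) rewrite quoted in the cost section below.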

Number of parameters and costs

  • FFM has k' * (m-1) parameters per predictor variable.
  • FM has k parameters per predictor variable.
  • FFM has a runtime complexity of k' * m * (m-1) / 2 = O(k' * m^2) per training example
  • FM has a runtime complexity of k * m = O(k * m) per training example (because the nested sums can be decomposed due to parameter sharing; see the identity below).
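
The decomposition behind FM's linear runtime is the standard identity (restated here for completeness; it is only alluded to above):

    sum_i sum_j>i 〈v_i,v_j〉 x_i x_j
      = 1/2 * sum_f [ (sum_i v_i,f * x_i)^2 - sum_i (v_i,f * x_i)^2 ]

For each of the k factor dimensions f, the right-hand side is a single pass over the m variables, hence O(k * m). FFM cannot use this trick because v^J_i changes with the partner's field J.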

That means from a cost point of view, an FFM with dimensionality k' should be compared to an FM with an m times larger dimension, i.e. k=k'*m. With this choice both FFM and FM have the same number of parameters (memory costs) and the same runtime complexity.

Expressiveness

FFM and FM have different assumptions on the interaction matrix V. But given a large enough k and k', both can represent any possible second order polynomial model.

The motivation of FM and FFM is to approximate the (unobserved) pairwise interaction matrix W of polynomial regression by a low-rank solution V * V^t ≈ W. FM and FFM make different assumptions about what V looks like. FFM assumes that V has a block structure:

         | v^2_1  v^1_2  0      0      0     |
         | v^3_1  0      v^1_3  0      0     |
         | v^4_1  0      0      v^1_4  0     |
V(FFM) = | v^5_1  0      0      0      v^1_5 |
         | 0      v^3_2  v^2_3  0      0     |
         | 0      v^4_2  0      v^2_4  0     |
         | ...                               |

FM does not assume such a structure:

V(FM) = | v_1  v_2  v_3 v_4 v_5 |

(Note that the entries v are not scalars but vectors of length k' (for FFM) or of length k (for FM). Also, to shorten notation, one entry v in the matrices above represents all the v vectors of a "field"/categorical variable.)

If the assumption of FFM holds, then FFM needs fewer parameters than FM to describe V, because FM would need parameters to capture the 0s.

If the assumption of FM holds, then FM needs fewer parameters than FFM to describe V, because FFM would have to repeat values across its separate per-field vectors.

 

==============================

You are very welcome!

(2) They are similar but not the same. The FM model in the paper you provided is field-unaware. The difference between equation 1 in the paper and the formula on page 14 of our slides is that our w is not only indexed by j1 and j2, but also by f1 and f2. Consider the example on page 15: if Rendle's FM is applied, it becomes:

w_376^T w_248 x_376 x_248 + w_376^T w_571 x_376 x_571 + w_376^T w_942 x_376 x_942
+ w_248^T w_571 x_248 x_571 + w_248^T w_942 x_248 x_942
+ w_571^T w_942 x_571 x_942
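
For contrast (my notation, not part of the original reply): writing f(j) for the field of feature j, the field-aware version of the same expansion indexes each vector by the partner's field, e.g.

    w_(376,f(248))^T w_(248,f(376)) x_376 x_248 + w_(376,f(571))^T w_(571,f(376)) x_376 x_571 + ...

with one such term per pair, so every feature keeps a separate latent vector per partner field instead of one shared vector.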

BTW, we use k = 4, not k = 2.

Please let me know if you have more questions. :)

 

Inspector wrote:

1) I think this helped a lot. I was confused about what the hashing trick was doing. I was thinking that the value of a feature, say 5a9ed9b0, was REPLACED by an integer. So I understand now that one-hot encoding is still being used; it is just the indexing of the data that is improved (memory-wise) when hashed.

2) So this is essentially the same model as shown in the Rendle paper  (equation 1) , where you used k=2 (the number of factors)? Is this correct?

Thanks very much!!
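
Regarding point (1), here is a minimal sketch of that reading of the hashing trick (illustrative only; the exact hash function and index-space size used by the libffm tools may differ, and site_id is a made-up field name):

    import hashlib

    NUM_BINS = 2 ** 20  # size of the hashed one-hot index space (an assumption)

    def hashed_index(field, value):
        # Hash the "field=value" string into a fixed-size index space.
        h = hashlib.md5(f"{field}={value}".encode()).hexdigest()
        return int(h, 16) % NUM_BINS

    # The raw value 5a9ed9b0 is not replaced by an integer code; it only
    # determines WHICH one-hot column is set to 1.
    print(hashed_index("site_id", "5a9ed9b0"))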

 

Some hints about the usage of libFM:

@Kapil: The order of features in the design matrix has no effect on the model -- but you should of course use the same ordering in the training/test set and in each line of each file. Theoretically there might be a slight difference because the learning algorithm iterates from the first to the last feature, so changing the order might change the convergence slightly.

about K2: The larger K2, the more complex the model gets. Usually, the larger K2, the better, but too large values can also overfit. So start with small values of K2 and increase it (e.g. double it) until you get the best quality (on your holdout set). Runtime depends linearly on K2.
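
A sketch of that doubling search (train_and_eval is a hypothetical stand-in for training libFM with a given K2 and returning the holdout error; lower is better):

    def tune_k2(train_and_eval, k2_max=256):
        # Start small and double K2 each step, keeping the best holdout error.
        best_k2, best_err = None, float("inf")
        k2 = 1
        while k2 <= k2_max:
            err = train_and_eval(k2)
            if err < best_err:
                best_k2, best_err = k2, err
            k2 *= 2
        return best_k2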

about generating libFM files: If your data is purely categorical and in some kind of CSV or TSV format, you can also use the Perl-script in the "script/"-folder of libFM to generate libFM-compatible files.
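
If you prefer Python to Perl, a minimal converter for purely categorical data could look like this (a sketch; the "target,cat1,cat2,..." column layout and the file names are assumptions, and the output is the libSVM-style "target index:value" lines that libFM reads):

    import csv

    def csv_to_libfm(src, dst):
        # Map each (column, value) pair to a global one-hot feature index.
        index = {}
        with open(src, newline="") as fin, open(dst, "w") as fout:
            for row in csv.reader(fin):
                target, cats = row[0], row[1:]
                feats = []
                for col, val in enumerate(cats):
                    idx = index.setdefault((col, val), len(index))
                    feats.append(f"{idx}:1")
                fout.write(target + " " + " ".join(feats) + "\n")

    csv_to_libfm("train.csv", "train.libfm")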

about "linear regression" and libFM: A factorization machine (=FM) includes linear regression. E.g. if you choose K2=0, then libFM does exactly the same as linear regression. If you choose K2>0, then an FM is "linear regression + second order polynomial regression with factorized pairwise interactions".

