{ Snipperize } /hash

Snippets about hash

Here are the latest snippets about hash.

Using simhash for text deduplication

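A traditional hash function gives identical texts identical hash values, but two nearly identical texts generally get unrelated hashes. With simhash, documents that are almost the same also get hash values that are close to each other.

Charikar's hash

With Charikar's hash, highly similar documents get fingerprints that are close to each other. The algorithm works as follows:

* The document is split into tokens (for example, words) or super-tokens (word tuples)
* Each token is represented by its hash value, computed with a traditional hash function
* Each token is assigned a weight
* A vector V of integers is initialized to 0; its length equals the desired hash size in bits
* For each token hash value h, V is updated:
  o the i-th element is decreased by the token's weight if the i-th bit of h is 0
  o the i-th element is increased by the token's weight if the i-th bit of h is 1
* Finally, the signs of the elements of V give the bits of the final fingerprint (1 for a positive element, 0 otherwise)

In other words, the hash is not computed over the document as a whole. Every token in the document is hashed, and the token hashes are summed bit by bit: at each bit position, subtract 1 if the current token's hash has a 0 there, and add 1 if it has a 1 (that is, every token has weight 1). After all tokens have been accumulated this way, the final totals determine the fingerprint: each position with a positive total becomes a 1 bit.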
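A minimal Python sketch of the steps above, under some assumptions of my own: tokens are lower-cased words, each token's weight is its occurrence count (equivalent to adding +1/-1 once per occurrence), the fingerprint is 64 bits, and the "traditional" per-token hash is the first 8 bytes of MD5. The `hamming_distance` helper is an addition showing the usual way to compare two fingerprints; none of these names come from the original snippet.

```python
import hashlib
import re
from collections import Counter

HASH_BITS = 64  # assumed fingerprint size in bits


def token_hash(token: str) -> int:
    # Assumption: any traditional hash works; here, the first 8 bytes of MD5.
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text: str) -> int:
    # Tokens are lower-cased words; weight = number of occurrences (assumption).
    weights = Counter(re.findall(r"\w+", text.lower()))
    v = [0] * HASH_BITS  # vector V, initialized to 0
    for token, weight in weights.items():
        h = token_hash(token)
        for i in range(HASH_BITS):
            if (h >> i) & 1:
                v[i] += weight  # i-th bit is 1: add the weight
            else:
                v[i] -= weight  # i-th bit is 0: subtract the weight
    # Fingerprint bit i is 1 when V[i] is positive, 0 otherwise (including ties).
    fingerprint = 0
    for i, total in enumerate(v):
        if total > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    # Number of differing bits; a small distance suggests near-duplicates.
    return bin(a ^ b).count("1")


a = simhash("The quick brown fox jumps over the lazy dog")
b = simhash("The quick brown fox jumped over the lazy dog")
print(f"{a:016x} {b:016x} distance={hamming_distance(a, b)}")
```

In a deduplication pass, documents whose fingerprints differ in only a few bits would be flagged as candidate duplicates; the bit threshold is a tuning choice and is not specified by the snippet.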

Python / simhash, hash, Charikar, similarity, duplicate / by ThePeppersStudio (25 days, 6.81 hours ago)

Generate Unique Random Hash

PHP / mt_rand, uniqid, sha1, hash / by ThePeppersStudio (308 days, 7.79 hours ago)

Create MySQL Hash in Python

MySQL 4.1+ password hashes are a double SHA-1 (the SHA-1 of the raw binary SHA-1 digest of the password), written as 40 uppercase hex digits with a "*" prepended.
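A minimal Python sketch of that recipe, assuming the password is encoded as UTF-8 before hashing (MySQL hashes the bytes of the string in its own character set). The function name `mysql_password_hash` is hypothetical.

```python
import hashlib


def mysql_password_hash(password: str) -> str:
    # Stage 1: SHA-1 of the password bytes, kept as the raw 20-byte digest.
    stage1 = hashlib.sha1(password.encode("utf-8")).digest()
    # Stage 2: SHA-1 of that binary digest (not of its hex string).
    stage2 = hashlib.sha1(stage1).hexdigest()
    # Uppercase hex with a leading "*", as MySQL 4.1+ stores it.
    return "*" + stage2.upper()


print(mysql_password_hash("password"))
```

The result should match what `SELECT PASSWORD('password')` returned on MySQL 4.1 through 5.7; the `PASSWORD()` function itself was removed in MySQL 8.0.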

Python / mysql, hash, sha1 / by ThePeppersStudio (389 days, 14.76 hours ago)
