
Python Uses Apriori Algorithm for Association Rule Mining

Finding implicit relationships between items in large-scale datasets is known as association analysis or association rule learning. The process consists of two steps:

1. Extract the frequent itemsets.
2. Extract association rules from the frequent itemsets.

A frequent itemset is a collection of items that often appear together.
An association rule suggests a strong relationship between two sets of items.
The support of an itemset is the proportion of records in the dataset that contain that itemset; it measures how frequently the itemset occurs. Note that support is defined on itemsets.
Confidence (or credibility) is defined for an association rule such as {Diapers} → {Wine}. The confidence of this rule is defined as support({Diapers, Wine}) / support({Diapers}).
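As a minimal sketch of these two definitions (the transactions variable and the helper names support and confidence below are illustrative, not part of the tutorial's code):

def support(itemset, transactions):
  # Fraction of transactions that contain every item in the itemset
  itemset = set(itemset)
  return sum(itemset.issubset(t) for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
  # conf(A -> B) = support(A | B) / support(A)
  joint = set(antecedent) | set(consequent)
  return support(joint, transactions) / support(antecedent, transactions)

transactions = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
print(support({2, 5}, transactions))      # 0.75 (3 of 4 transactions)
print(confidence({5}, {2}, transactions)) # support({2,5}) / support({5}) = 0.75 / 0.75 = 1.0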

Finding frequent itemsets

Apriori principle: if an itemset is frequent, then all of its subsets are also frequent. Conversely, if an itemset is infrequent, then all of its supersets are also infrequent. For example, if {Diapers, Wine} is frequent, then {Diapers} and {Wine} must each be frequent; if {Diapers} is infrequent, then no itemset containing {Diapers} can be frequent.

The Apriori algorithm is a method for discovering frequent itemsets. It first generates a list of candidate itemsets containing all single items, then scans the transaction records to check which of them meet the minimum support requirement; those that do not are removed. The remaining sets are then combined to generate candidate itemsets containing two items, the transaction records are scanned again, and the candidates below the minimum support are removed. This process repeats, growing the itemsets by one item per pass, until no candidate itemsets remain.
Apriori pseudocode

While the number of itemsets in the list is greater than 0:
    Scan the dataset to check which candidate itemsets are frequent
    Retain the frequent itemsets and build the list of candidate itemsets composed of k+1 items
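As a concrete trace, using the toy dataset from the code below ([[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]) with a minimum support of 0.5:

    C1 = {1}, {2}, {3}, {4}, {5}; {4} appears in only 1 of 4 records (support 0.25), so L1 = {1}, {2}, {3}, {5}
    C2 is built from L1; after scanning, L2 = {1,3}, {2,3}, {2,5}, {3,5} (e.g. {2,5} has support 3/4 = 0.75, while {1,2} has support 0.25 and is dropped)
    C3 = {2,3,5}, with support 2/4 = 0.5, so L3 = {2,3,5}; no 4-item candidate can be formed and the loop stops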

Mining association rules from frequent itemsets

If a rule's confidence is greater than the minimum confidence, it is considered an association rule. Observe that if a rule does not meet the minimum confidence requirement, then any rule formed from the same frequent itemset by shrinking its left-hand side (i.e., moving more items to the right-hand side) will not meet it either.
You can therefore start with a frequent itemset and create a list of rules whose right-hand side contains only one item. Test these rules, then merge the right-hand sides of the rules that survive to create a new list of rules whose right-hand side contains two items, and so on. A worked trace follows the pseudocode below.

For each frequent itemset L:
    while len(L) > 1:
        build the list of rules with k items on the right-hand side
        keep the rules that satisfy the minimum confidence
        merge their right-hand sides to create the rules with k+1 items on the right
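For example, starting from the frequent itemset {2, 3, 5} found above, with a minimum confidence of 0.7:

    {3,5} -> {2}: support({2,3,5}) / support({3,5}) = 0.5 / 0.5 = 1.0, kept
    {2,5} -> {3}: 0.5 / 0.75 ≈ 0.67, discarded
    {2,3} -> {5}: 0.5 / 0.5 = 1.0, kept
    Merging the surviving right-hand sides {2} and {5} gives the candidate rule {3} -> {2,5}: 0.5 / 0.75 ≈ 0.67, discarded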

Overall code:

def loadDataSet(): # Toy transaction dataset
  return [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5]]
def createC1(dataSet): # Build the list of candidate 1-itemsets
  c1 = []
  for line in dataSet:
    for item in line:
      if not [item] in c1:
        c1.append([item])
  c1.sort()
  return list(map(frozenset, c1)) # frozensets can be used as dictionary keys
def scanData(data, ck, minSupport): # Keep candidate itemsets that meet the minimum support
  ssCnt = {}
  for tid in data:
    for can in ck:
      if can.issubset(tid):
        if can not in ssCnt:
          ssCnt[can] = 0
        ssCnt[can] += 1
  numItems = len(data)
  retList = []
  supportData = {}
  for key in ssCnt:
    support = ssCnt[key] / numItems
    if support >= minSupport:
      retList.append(key)
    supportData[key] = support # Record the support of every candidate, frequent or not
  return retList, supportData
def aprioriGen(Lk, k): # Generate candidate k-itemsets from the frequent (k-1)-itemsets
  retList = []
  lenLk = len(Lk)
  for i in range(lenLk):
    for j in range(i + 1, lenLk):
      # Merge two (k-1)-itemsets only if their first k-2 items match,
      # so each candidate k-itemset is generated exactly once
      l1 = list(Lk[i])[:k-2]
      l2 = list(Lk[j])[:k-2]
      l1.sort()
      l2.sort()
      if l1 == l2:
        retList.append(Lk[i] | Lk[j]) # Set union yields a k-itemset
  return retList
def apriori(dataSet, minSupport=0.5): # Generate all frequent itemsets
  c1 = createC1(dataSet)
  D = list(map(set, dataSet))
  l1, supportData = scanData(D, c1, minSupport)
  L = [l1]
  k = 2
  while len(L[k-2]) > 0: # Stop once no frequent (k-1)-itemsets remain
    ck = aprioriGen(L[k-2], k)
    lk, supk = scanData(D, ck, minSupport)
    k = k + 1
    L.append(lk)
    supportData.update(supk)
  return L, supportData
def generateRules(L, supportData, minConf=0.7): # Generate association rules from the frequent itemsets
  bigRuleList = []
  for i in range(1, len(L)): # Skip L[0]: 1-itemsets cannot form rules
    for freqSet in L[i]:
      H1 = [frozenset([item]) for item in freqSet]
      if i > 1:
        rulesFromConseq(freqSet, H1, supportData, bigRuleList, minConf)
      else:
        calcConf(freqSet, H1, supportData, bigRuleList, minConf)
  return bigRuleList
def calcConf(freqSet, H, supportData, brl, minConf=0.7): # Keep rules that meet the minimum confidence
  prunedH = []
  for conseq in H:
    # conf(A -> B) = support(A | B) / support(A)
    conf = supportData[freqSet] / supportData[freqSet - conseq]
    if conf >= minConf:
      brl.append((freqSet - conseq, conseq, conf))
      prunedH.append(conseq)
  return prunedH
def rulesFromConseq(freqSet, H, supportData, brl, minConf=0.7): # Recursively grow the right-hand sides
  m = len(H[0])
  if len(freqSet) >= (m + 1):
    Hmp1 = calcConf(freqSet, H, supportData, brl, minConf)
    if len(Hmp1) > 1:
      Hmp1 = aprioriGen(Hmp1, m + 1) # Merge surviving consequents into (m+1)-item consequents
      rulesFromConseq(freqSet, Hmp1, supportData, brl, minConf)
# Example run on the mushroom dataset, where each line is one transaction of string-valued items
data = [line.split() for line in open('mushroom.dat').readlines()]
L, support = apriori(data, minSupport=0.3)
for i in range(len(L)):
  for item in L[i]:
    if item & {'2'}: # Print the frequent itemsets that contain item '2'
      print(item)
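As a quick sanity check, the functions above can also be run on the toy dataset (a sketch; the order of itemsets within each list may vary, since it follows dictionary iteration order):

dataSet = loadDataSet()
L, supportData = apriori(dataSet, minSupport=0.5)
print(L[0]) # Frequent 1-itemsets: frozensets of {1}, {2}, {3}, {5}
print(L[2]) # Frequent 3-itemsets: [frozenset({2, 3, 5})]
rules = generateRules(L, supportData, minConf=0.7)
# The rules include ({3, 5} -> {2}, conf 1.0) and ({2, 3} -> {5}, conf 1.0)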

Code and dataset download: Apriori

That's all for this article. I hope it is helpful to your learning, and please continue to support the Yelling Tutorial.

