Apples vs. Oranges formulas


T = type / class
\mathcal{T} = \{ T \mid \exists (s, w3type, T) \in D \}

I = instance
I(T,D) = set of instances with type T in dataset D
I(T,D) = \{ s \mid \exists (s, w3type, T) \in D \}

P(T) = set of distinct properties of the instances of type $T$
P(T) = \{ p \mid s \in I(T,D) \text{ and } \exists (s,p,o) \in D \}

OC(p, I(T,D)) = number of instances in $I(T,D)$ in which property $p$ occurs
OC(p, I(T,D)) = | \{ s \mid s \in I(T,D) \text{ and } \exists (s,p,o) \in D \} |

Coverage
CV(T,D) = \frac{\sum_{p \in P(T)} OC(p, I(T,D))}{|P(T)| \times |I(T,D)|}

Weight
WT(CV(T,D)) = \frac{|P(T)| + |I(T,D)|}{\sum_{T' \in \mathcal{T}} (|P(T')| + |I(T',D)|)}

Coherence
CH(\mathcal{T},D) = \sum_{T \in \mathcal{T}} WT(CV(T,D)) \times CV(T,D)
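The coverage, weight and coherence definitions above can be tried on a toy dataset. The following is only a minimal sketch: the triples, the type names and the function names are made up for illustration, the typing predicate is assumed to be spelled `rdf:type`, and the typing predicate itself is assumed not to count as a member of P(T). The weight uses |P(T)| + |I(T,D)| in the numerator, so that the weights of all types sum to 1.

```python
RDF_TYPE = "rdf:type"  # assumed spelling of the typing predicate

def types(D):
    """Set of all types T that appear in a typing triple of D."""
    return {o for (s, p, o) in D if p == RDF_TYPE}

def instances(D, T):
    """I(T, D): subjects declared with type T."""
    return {s for (s, p, o) in D if p == RDF_TYPE and o == T}

def properties(D, T):
    """P(T): distinct properties of instances of T (typing predicate excluded, by assumption)."""
    inst = instances(D, T)
    return {p for (s, p, o) in D if s in inst and p != RDF_TYPE}

def occurrences(D, p, inst):
    """OC(p, I): number of instances in inst that have at least one triple with property p."""
    return len({s for (s, q, o) in D if q == p and s in inst})

def coverage(D, T):
    """CV(T, D) = sum of OC over P(T), divided by |P(T)| * |I(T, D)|."""
    inst, props = instances(D, T), properties(D, T)
    if not inst or not props:
        return 0.0
    return sum(occurrences(D, p, inst) for p in props) / (len(props) * len(inst))

def coherence(D):
    """CH = sum over types of WT(CV(T, D)) * CV(T, D)."""
    size = {T: len(properties(D, T)) + len(instances(D, T)) for T in types(D)}
    denom = sum(size.values())
    if denom == 0:
        return 0.0
    return sum(size[T] / denom * coverage(D, T) for T in size)

# Toy dataset (hypothetical): two persons, one city.
D = {
    ("a", RDF_TYPE, "Person"), ("b", RDF_TYPE, "Person"),
    ("a", "name", "Ann"), ("b", "name", "Bob"),
    ("a", "email", "ann@example.org"),
    ("x", RDF_TYPE, "City"), ("x", "label", "Xville"),
}

print(coverage(D, "Person"))  # (2 + 1) / (2 * 2) = 0.75
print(coherence(D))           # 4/6 * 0.75 + 2/6 * 1.0, roughly 0.833
```

Here Person has 2 properties and 2 instances (weight 4/6) and City has 1 property and 1 instance (weight 2/6), so the missing email on `b` pulls the coherence below 1.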

D = real dataset
D' = new dataset after removing coins
|D'| < |D| and $D' \subset D$

What is a coin?
A coin is a set of triples with the same subject and property; removing a coin means removing that whole set.

\mathcal{T}(s) = \{T_s^1, \ldots, T_s^n\} (the types that subject $s$ is an instance of)

A1 \Longrightarrow We do not completely remove property $p$ from any of the types \{$T_s^1$, …, $T_s^n$\}. That is, after the removal, each of these types still has instances with property $p$.

A2 \Longrightarrow We do not completely remove instance $s$ from the dataset. This is easily enforced by keeping the triples ($s$, rdf:type, $T_s^i$) in the dataset.

The weight is the same after removing the triples, because A1 and A2 keep |P(T)| and |I(T,D)| unchanged,
but the coverage changes:
CV(T,D') = \frac{\sum_{q \in P(T) - \{p\}} OC(q, I(T,D)) + OC(p, I(T,D)) - 1}{|P(T)| \times |I(T,D)|}

CH(\mathcal{T}, D') = \lambda (the desired coherence)
|D'| = \sigma (the desired size)

coin(\mathcal{T}(s),p) = CH(\mathcal{T},D) - CH(\mathcal{T},D')

|coin(S,p)| = number of subjects that are instances of all the types in $S$ and have at least one triple with property $p$
|coin(S,p)| = | \{ s \in \bigcap_{T \in S} I(T,D) \mid \exists (s,p,v) \in D \} |
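To make the coin value and the coin count concrete, here is a minimal sketch that removes one coin from a toy dataset and measures the drop in coherence. Everything in it is an assumption for illustration: the triples, the function names, the `rdf:type` spelling of the typing predicate, and the choice to leave the typing predicate out of P(T). The weight is taken as (|P(T)| + |I(T,D)|) over the sum of the same quantity for all types.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"  # assumed spelling of the typing predicate

def coherence(D):
    """CH(T, D) as the weighted sum of per-type coverage."""
    inst = defaultdict(set)  # type -> I(T, D)
    for s, p, o in D:
        if p == RDF_TYPE:
            inst[o].add(s)
    stats = []  # one (|P(T)| + |I(T, D)|, CV(T, D)) pair per type
    for members in inst.values():
        occ = defaultdict(set)  # property -> instances that have it, so len gives OC
        for s, p, o in D:
            if s in members and p != RDF_TYPE:
                occ[p].add(s)
        n_p, n_i = len(occ), len(members)
        cv = sum(len(v) for v in occ.values()) / (n_p * n_i) if n_p else 0.0
        stats.append((n_p + n_i, cv))
    denom = sum(w for w, _ in stats)
    return sum(w / denom * cv for w, cv in stats) if denom else 0.0

def types_of(D, s):
    """T(s): the types that subject s is an instance of."""
    return frozenset(o for (s2, p, o) in D if s2 == s and p == RDF_TYPE)

def remove_coin(D, s, p):
    """D': D without any triple whose subject is s and whose property is p."""
    return {t for t in D if not (t[0] == s and t[1] == p)}

def coin_value(D, s, p):
    """coin(T(s), p) = CH(T, D) - CH(T, D')."""
    return coherence(D) - coherence(remove_coin(D, s, p))

def coin_count(D, S, p):
    """|coin(S, p)|: subjects typed with every type in S that have a triple with p."""
    subjects = {s for (s, q, o) in D if q == p}
    return sum(1 for s in subjects if set(S) <= types_of(D, s))

# Toy dataset (hypothetical): three persons, two of them with an email.
D = {
    ("a", RDF_TYPE, "Person"), ("b", RDF_TYPE, "Person"), ("c", RDF_TYPE, "Person"),
    ("a", "name", "Ann"), ("b", "name", "Bob"), ("c", "name", "Cy"),
    ("a", "email", "ann@example.org"), ("b", "email", "bob@example.org"),
    ("x", RDF_TYPE, "City"), ("x", "label", "Xville"),
}

print(coin_value(D, "a", "email"))         # weight 5/7 times 1/(2*3), i.e. 5/42
print(coin_count(D, {"Person"}, "email"))  # 2 (subjects a and b)
```

Removing the coin (`a`, `email`) respects A1 and A2: `b` still has an email and `a` keeps its type triple, so only the coverage of Person changes.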

C1 \Longrightarrow the amount by which we decrease coherence (by removing coins) should be less than or equal to the amount we need to remove to get from $CH(\mathcal{T}, D)$ (the coherence of the original dataset) to $\lambda$ (the desired coherence).

X(S,p) = the integer programming variable representing the number of coins to remove for each coin type (S,p)
\tau = number of types in the dataset
\pi = number of properties in the dataset
In the worst case the number of variables for D can be 2^\tau \pi, one for each pair of a subset S of the types and a property p

For sets x and y, x \subseteq y if all elements of x are also elements of y
C1 \Longrightarrow \sum_{S \subseteq \mathcal{T},p} coin(S,p) \times X(S,p) \leq CH(\mathcal{T},D) - \lambda

M \Longrightarrow the amount by which we decrease coherence should be maximized.

M \Longrightarrow MAXIMIZE \sum_{S \subseteq \mathcal{T},p} coin(S,p) \times X(S,p)

C2 \Longrightarrow \forall S \subseteq \mathcal{T}, p: 0 \leq X(S,p) \leq |coin(S,p)| - 1

ct(S,p) = average number of triples per coin of type (S,p)
C3 \Longrightarrow (1-\rho) \times (|D| - \sigma) \leq \sum_{S \subseteq \mathcal{T},p} X(S,p) \times ct(S,p)

C4 \Longrightarrow \sum_{S \subseteq \mathcal{T},p} X(S,p) \times ct(S,p) \leq (1+\rho) \times (|D| - \sigma)
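The objective M and the constraints C1 to C4 can be checked on a tiny instance. The sketch below does not use an integer programming solver; it swaps in plain brute-force enumeration, which is only workable for a handful of coin types. The function name `solve_coin_removal`, the dictionary layout and all the numbers are hypothetical.

```python
from itertools import product

def solve_coin_removal(coins, ch, lam, size, sigma, rho, eps=1e-9):
    """Enumerate every assignment X(S, p) and keep the feasible one with the largest gain.

    coins maps a coin type (S, p) to a dict with:
      value = coin(S, p), count = |coin(S, p)|, ct = ct(S, p).
    ch = CH(T, D), lam = desired coherence, size = |D|, sigma = desired size,
    rho = relative slack allowed on the number of removed triples.
    """
    keys = list(coins)
    # C2: 0 <= X(S, p) <= |coin(S, p)| - 1
    ranges = [range(max(coins[k]["count"], 1)) for k in keys]
    budget = ch - lam       # right-hand side of C1
    target = size - sigma   # number of triples we would like to remove
    best, best_gain = None, -1.0
    for xs in product(*ranges):
        gain = sum(coins[k]["value"] * x for k, x in zip(keys, xs))
        removed = sum(coins[k]["ct"] * x for k, x in zip(keys, xs))
        if gain > budget + eps:                  # violates C1
            continue
        if removed < (1 - rho) * target - eps:   # violates C3
            continue
        if removed > (1 + rho) * target + eps:   # violates C4
            continue
        if gain > best_gain:                     # objective M
            best, best_gain = dict(zip(keys, xs)), gain
    return best, best_gain

# Hypothetical instance with two coin types for a single type "Person".
coins = {
    (("Person",), "email"): {"value": 0.05, "count": 4, "ct": 1.0},
    (("Person",), "phone"): {"value": 0.02, "count": 6, "ct": 2.0},
}
best, gain = solve_coin_removal(coins, ch=0.9, lam=0.75, size=100, sigma=92, rho=0.25)
print(best, gain)  # two email coins and two phone coins, gain of about 0.14
```

With these numbers the coherence budget is 0.15 and between 6 and 10 triples must be removed. Removing 2 email coins and 2 phone coins removes 6 triples and lowers coherence by 0.14, which is the best feasible choice. A real dataset would need an actual integer programming solver, since the number of variables can grow as 2^\tau \pi.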

Source: https://researcher.ibm.com/researcher/files/us-sduan/sigmod2011_RDF_benchmark_duan.pdf

  1. Wow, this is new knowledge for me.
    Completely new…

  2. Wow, only learned people can read those codes.
    I can't write apples vs. oranges formulas, but crossing mango and longan with durian, that I can do…

  3. My head spins whenever the talk turns to formulas…
    hehe
