**Introduction**

**Probabilistic Context-Free Grammar**

In the above we can see the **Probabilistic Context-Free Grammar**. It looks the same as a context-free grammar, but now we have a probability assigned to each rule in the grammar. Here, I have a set of context-free rules; I will assume that **S** is the start symbol of the context-free grammar, and we have a probability for each rule. These probabilities have one critical property. Take any nonterminal in the grammar, for example **VP**. There are several possible ways of expanding that nonterminal: **VP** can be rewritten in three different ways, as **Vi**, as **Vt NP**, or as **VP PP**. Notice that the three probabilities associated with these three options sum to one **(0.4 + 0.4 + 0.2 = 1)**. This is the key constraint on the probabilities in a probabilistic context-free grammar: for any nonterminal, the probabilities of its expansions must sum to 1. Here is another example: **NP** can be rewritten in two different ways, as a determiner followed by a noun or as a noun phrase followed by a prepositional phrase, and again the two probabilities sum to 1. So these probabilities have a clear interpretation, which is the following.

They are conditional probabilities, conditioned on the particular nonterminal being expanded. A nonterminal can be rewritten in several different ways, and we distribute probability mass over those different ways of rewriting it.
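A minimal sketch of this grammar in code may make the constraint concrete. The rule probabilities below are the ones quoted in the text (the **NP → NP PP** probability of 0.7 is inferred from the text's statement that NP's two options sum to 1); the representation itself is just one convenient choice, a dict from each nonterminal to its possible expansions.

```python
# A sketch of the example PCFG: each nonterminal maps to a list of
# (right-hand side, probability) pairs, with probabilities from the text.
pcfg = {
    "S":  [(("NP", "VP"), 1.0)],
    "VP": [(("Vi",), 0.4), (("Vt", "NP"), 0.4), (("VP", "PP"), 0.2)],
    "NP": [(("DT", "NN"), 0.3), (("NP", "PP"), 0.7)],
}

# The probabilities attached to the expansions of any one nonterminal
# form a conditional distribution, so each total should be 1.0.
for lhs, expansions in pcfg.items():
    total = sum(q for _, q in expansions)
    print(lhs, total)
```

Summing the probabilities per left-hand side is exactly the "sum to 1" check described above.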

Let's take a particular parse tree under this grammar.

Notice that every one of the rules used here is in this grammar, so this is a valid parse tree under the underlying context-free grammar. We will now assign a probability to this parse tree, which is simply the product of the probabilities of the different rules. If we look at the first rule, **S** goes to **NP VP** with a probability of 1.0; **NP** goes to **DT NN** with a probability of 0.3; **DT** goes to **'the'** with a probability of 1.0. To calculate the probability of the entire tree, we multiply together the probabilities of all the rules it uses.

The final expression looks like **1.0 × 0.3 × 1.0 × 0.7 × 0.4 × 1.0 = 0.084**.

Given a parse tree **t ∈ T_G** containing rules **α1 → β1, α2 → β2, . . . , αn → βn**, the probability of t under the PCFG is

**p(t) = q(α1 → β1) × q(α2 → β2) × · · · × q(αn → βn)**

where **q(α → β)** is the probability for the rule **α → β**.

Probabilistic context-free grammars (PCFGs) are defined as follows:

A PCFG consists of:

1. A context-free grammar **G = (N, Σ, S, R)**.

2. A parameter **q(α → β)**

for each rule **α → β ∈ R**. The parameter **q(α → β)** can be interpreted as the conditional probability of choosing rule **α → β** in a left-most derivation, given that the nonterminal being expanded is **α**. For any **X ∈ N**, we have the constraint

**∑ q(α → β) = 1**, where the sum is over all rules **α → β ∈ R** with **α = X**.

In addition, we have **q(α → β) ≥ 0** for any **α → β ∈ R**.
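The two constraints in this definition are easy to check mechanically. The sketch below assumes the same dict-of-expansions representation as before; the toy grammar passed in at the end is hypothetical and serves only to exercise the checks.

```python
def validate_pcfg(rules):
    """Check the two PCFG constraints: q(alpha -> beta) >= 0 for every
    rule, and for each nonterminal X the probabilities of all rules
    rewriting X sum to 1. rules: dict X -> list of (rhs, q) pairs."""
    for lhs, expansions in rules.items():
        assert all(q >= 0.0 for _, q in expansions), f"negative q for {lhs}"
        total = sum(q for _, q in expansions)
        assert abs(total - 1.0) < 1e-9, f"probabilities for {lhs} sum to {total}"
    return True

# Hypothetical toy grammar used only to exercise the checks:
print(validate_pcfg({"S": [(("NP", "VP"), 1.0)],
                     "NP": [(("DT", "NN"), 0.3), (("NP", "PP"), 0.7)]}))  # True
```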

**Parsing with PCFGs**

For a sentence **s**, how do we find the highest-scoring parse tree for s, or more explicitly, how do we find

**arg max p(t)**, taken over all trees **t ∈ T_G(s)**, the set of parse trees for **s** under the grammar?

We can use the CKY algorithm for this problem. The Cocke-Younger-Kasami algorithm applies to a restricted type of PCFG: one in Chomsky normal form (CNF). While the restriction to grammars in CNF might initially seem limiting, it is not a strong assumption: we can convert any PCFG into an equivalent grammar in CNF.

A context-free grammar **G = (N, Σ, R, S)** is in Chomsky normal form (CNF) if each rule **α → β ∈ R** takes one of the two following forms.

• **X → Y1 Y2**, where **X ∈ N, Y1 ∈ N, Y2 ∈ N**.

• **X → Y**, where **X ∈ N, Y ∈ Σ**.

That is, each rule in the grammar either rewrites a nonterminal X as exactly two nonterminal symbols Y1 Y2, or rewrites X as exactly one terminal symbol Y.
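With the grammar in CNF, probabilistic CKY can be sketched as a dynamic program: for every span of the sentence and every nonterminal X, store the highest probability of any subtree rooted at X covering that span, built from a binary rule and a split point. The tiny grammar below is a hypothetical fragment chosen only to make the sketch self-contained; a full parser would also record back-pointers to recover the best tree, which is omitted here.

```python
from collections import defaultdict

# Hypothetical CNF grammar: binary rules X -> Y1 Y2 and lexical rules X -> word.
binary_rules = {
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("DT", "NN"), 1.0)],
    "VP": [(("Vt", "NP"), 1.0)],
}
lexical_rules = {
    "DT": [("the", 1.0)],
    "NN": [("dog", 0.5), ("cat", 0.5)],
    "Vt": [("saw", 1.0)],
}

def cky(words, start="S"):
    n = len(words)
    # pi[(i, j)][X] = highest probability of a parse of words[i..j] rooted at X
    pi = defaultdict(dict)
    # Base case: spans of length one, covered by lexical rules X -> word.
    for i, w in enumerate(words):
        for X, entries in lexical_rules.items():
            for word, q in entries:
                if word == w:
                    pi[(i, i)][X] = max(pi[(i, i)].get(X, 0.0), q)
    # Longer spans: try every binary rule X -> Y1 Y2 and every split point.
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            for X, entries in binary_rules.items():
                best = 0.0
                for (Y1, Y2), q in entries:
                    for s in range(i, j):
                        p = q * pi[(i, s)].get(Y1, 0.0) * pi[(s + 1, j)].get(Y2, 0.0)
                        best = max(best, p)
                if best > 0.0:
                    pi[(i, j)][X] = best
    return pi[(0, n - 1)].get(start, 0.0)

print(cky(["the", "dog", "saw", "the", "cat"]))  # 0.25 under this toy grammar
```

The probability of the best parse comes out as 0.25 here because the two **NN** choices each carry probability 0.5 and every other rule is deterministic.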