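Directed information is an information-theoretic measure that quantifies the information flow from the random string $X^n = (X_1, X_2, \dots, X_n)$ to the random string $Y^n = (Y_1, Y_2, \dots, Y_n)$. The term directed information was coined by James Massey and is defined as[1]
$$I(X^n \to Y^n) \triangleq \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}),$$
where $I(X^i; Y_i \mid Y^{i-1})$ is the conditional mutual information.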
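The definition can be evaluated directly for short strings over small alphabets. The following is a minimal sketch, assuming Massey's definition $I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1})$ and a joint pmf stored as a dictionary; the helper names `marginal` and `directed_information` and the toy model are illustrative assumptions, not part of any library.

```python
import itertools
import math
from collections import defaultdict

def marginal(joint, key):
    """Marginalize a pmf {(x^n, y^n): p} onto key(x^n, y^n)."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[key(x, y)] += p
    return out

def directed_information(joint, n):
    """I(X^n -> Y^n) = sum_i I(X^i; Y_i | Y^{i-1}), in bits."""
    total = 0.0
    for i in range(1, n + 1):
        p_xy = marginal(joint, lambda x, y: (x[:i], y[:i]))            # P(x^i, y^i)
        p_x_ypast = marginal(joint, lambda x, y: (x[:i], y[:i - 1]))   # P(x^i, y^{i-1})
        p_y = marginal(joint, lambda x, y: y[:i])                      # P(y^i)
        p_ypast = marginal(joint, lambda x, y: y[:i - 1])              # P(y^{i-1})
        for (xi, yi), p in p_xy.items():
            if p > 0:
                ratio = p * p_ypast[yi[:-1]] / (p_x_ypast[(xi, yi[:-1])] * p_y[yi])
                total += p * math.log2(ratio)
    return total

# Assumed toy model: X_1, X_2 are i.i.d. fair bits, Y_1 = 0 and Y_2 = X_1.
joint = {((x1, x2), (0, x1)): 0.25 for x1, x2 in itertools.product((0, 1), repeat=2)}
reverse = {(y, x): p for (x, y), p in joint.items()}

forward_di = directed_information(joint, 2)     # information flowing from X to Y
backward_di = directed_information(reverse, 2)  # information flowing from Y to X
```

In this toy model one bit flows from $X^2$ to $Y^2$ and nothing flows back, even though the mutual information between the two strings is symmetric.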
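The essence of directed information is causal conditioning. The probability of $x^n$ causally conditioned on $y^n$ is defined as[5]
$$P(x^n \| y^n) \triangleq \prod_{i=1}^{n} P(x_i \mid x^{i-1}, y^{i}).$$
This is similar to the chain rule for conventional conditioning, $P(x^n \mid y^n) = \prod_{i=1}^{n} P(x_i \mid x^{i-1}, y^{n})$, except that one conditions on "past" and "present" symbols $y^{i}$ rather than all symbols $y^{n}$. To include "past" symbols only, one can introduce a delay by prepending a constant symbol:
$$P(x^n \| (0, y^{n-1})) \triangleq \prod_{i=1}^{n} P(x_i \mid x^{i-1}, y^{i-1}).$$
It is common to abuse notation by writing $P(x^n \| y^{n-1})$ for this expression, although formally all strings should have the same number of symbols.
One may also condition on multiple strings: $P(x^n \| y^n, z^n) \triangleq \prod_{i=1}^{n} P(x_i \mid x^{i-1}, y^{i}, z^{i})$.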
Causally conditioned entropy
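The causally conditioned entropy is defined as:[2]
$$H(X^n \| Y^n) = \mathbf{E}\left[-\log P(X^n \| Y^n)\right] = \sum_{i=1}^{n} H(X_i \mid X^{i-1}, Y^{i}).$$
Similarly, one may causally condition on multiple strings and write
$$H(X^n \| Y^n, Z^n) = \mathbf{E}\left[-\log P(X^n \| Y^n, Z^n)\right].$$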
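To see how causal conditioning differs from ordinary conditioning, the sketch below evaluates $H(X^n \| Y^n) = \sum_{i=1}^{n} H(X_i \mid X^{i-1}, Y^{i})$ next to $H(X^n \mid Y^n)$. It assumes a joint pmf stored as a dictionary; the helper names and the toy model are illustrative assumptions.

```python
import itertools
import math
from collections import defaultdict

def marginal(joint, key):
    """Marginalize a pmf {(x^n, y^n): p} onto key(x^n, y^n)."""
    out = defaultdict(float)
    for (x, y), p in joint.items():
        out[key(x, y)] += p
    return out

def conditional_entropy(joint, target, given):
    """H(target | given) in bits; target and given map (x^n, y^n) to hashable values."""
    p_tg = marginal(joint, lambda x, y: (target(x, y), given(x, y)))
    p_g = marginal(joint, given)
    return -sum(p * math.log2(p / p_g[g]) for (t, g), p in p_tg.items() if p > 0)

def causally_conditioned_entropy(joint, n):
    """H(X^n || Y^n) = sum_i H(X_i | X^{i-1}, Y^i), in bits."""
    return sum(
        conditional_entropy(
            joint,
            lambda x, y, i=i: x[i - 1],               # X_i
            lambda x, y, i=i: (x[:i - 1], y[:i]),     # (X^{i-1}, Y^i)
        )
        for i in range(1, n + 1)
    )

# Assumed toy model: X_1, X_2 are i.i.d. fair bits, Y_1 = 0 and Y_2 = X_1.
joint = {((x1, x2), (0, x1)): 0.25 for x1, x2 in itertools.product((0, 1), repeat=2)}

causal = causally_conditioned_entropy(joint, 2)                      # H(X^2 || Y^2)
ordinary = conditional_entropy(joint, lambda x, y: x, lambda x, y: y)  # H(X^2 | Y^2)
```

Here the causally conditioned entropy is 2 bits while the ordinary conditional entropy is 1 bit: $Y_2$ reveals $X_1$, but only after $X_1$ has already occurred, so causal conditioning gives it no credit.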
Properties
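A decomposition rule for causal conditioning[1] is
$$P(x^n, y^n) = P(x^n \| y^{n-1})\, P(y^n \| x^n).$$
This rule shows that any product of $P(x^n \| y^{n-1})$ and $P(y^n \| x^n)$ gives a joint distribution $P(x^n, y^n)$.
The causal conditioning probability $P(y^n \| x^n) = \prod_{i=1}^{n} P(y_i \mid y^{i-1}, x^{i})$ is a probability vector, i.e.,
$$P(y^n \| x^n) \ge 0 \quad \text{and} \quad \sum_{y^n} P(y^n \| x^n) = 1 \quad \text{for all } (x^n, y^n).$$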
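Both properties can be checked numerically. The sketch below assumes the decomposition $P(x^n, y^n) = P(x^n \| y^{n-1})\, P(y^n \| x^n)$ and builds each causally conditioned factor from a per-symbol kernel. The `channel` and `encoder` kernels are hypothetical toy choices made only for illustration.

```python
import itertools

def causal_prob(kernel, own, other, delay):
    """Return prod_i kernel(own_i | own^{i-1}, other^{i-delay})."""
    p = 1.0
    for i in range(1, len(own) + 1):
        p *= kernel(own[i - 1], own[:i - 1], other[:i - delay])
    return p

def channel(y, y_past, x_upto_now):
    """Hypothetical P(y_i | y^{i-1}, x^i): flips the current input with probability 0.1."""
    return 0.9 if y == x_upto_now[-1] else 0.1

def encoder(x, x_past, y_past):
    """Hypothetical P(x_i | x^{i-1}, y^{i-1}): resend the last bit after an error."""
    if x_past and y_past[-1] != x_past[-1]:
        return 1.0 if x == x_past[-1] else 0.0
    return 0.5

n = 2
strings = list(itertools.product((0, 1), repeat=n))

# P(x^n || y^{n-1}) uses a one-symbol delay; P(y^n || x^n) uses none.
p_x = {(x, y): causal_prob(encoder, x, y, delay=1) for x in strings for y in strings}
p_y = {(x, y): causal_prob(channel, y, x, delay=0) for x in strings for y in strings}
joint = {k: p_x[k] * p_y[k] for k in p_y}

total_mass = sum(joint.values())                                     # should be 1
row_sums = [sum(p_y[(x, y)] for y in strings) for x in strings]       # each should be 1
```

The product of the two factors sums to one over all $(x^n, y^n)$, and $P(y^n \| x^n)$ sums to one over $y^n$ for every fixed $x^n$, as stated.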
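Directed information can be written in terms of causal conditioning:[2]
$$I(X^n \to Y^n) = \mathbf{E}\left[\log \frac{P(Y^n \| X^n)}{P(Y^n)}\right] = H(Y^n) - H(Y^n \| X^n).$$
The relation generalizes to three strings: the directed information flowing from $X^n$ to $Y^n$ causally conditioned on $Z^n$ is
$$I(X^n \to Y^n \| Z^n) = H(Y^n \| Z^n) - H(Y^n \| X^n, Z^n).$$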
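The entropy form follows in two steps from the definition. A short derivation, assuming Massey's definition $I(X^n \to Y^n) = \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1})$ and the chain-rule form of the causally conditioned entropy:

```latex
\begin{aligned}
I(X^n \to Y^n)
  &= \sum_{i=1}^{n} I(X^i; Y_i \mid Y^{i-1}) \\
  &= \sum_{i=1}^{n} \left[ H(Y_i \mid Y^{i-1}) - H(Y_i \mid Y^{i-1}, X^{i}) \right] \\
  &= H(Y^n) - H(Y^n \| X^n).
\end{aligned}
```

The last step uses the chain rule $H(Y^n) = \sum_{i} H(Y_i \mid Y^{i-1})$ together with $H(Y^n \| X^n) = \sum_{i} H(Y_i \mid Y^{i-1}, X^{i})$. Adding $Z^{i}$ to the conditioning of every term gives the three-string version in the same way.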
Conservation law of information
This law, established by James Massey and his son Peter Massey,[12] gives intuition by relating directed information and mutual information. The law states that for any $X^n, Y^n$, the following equality holds:
$$I(X^n; Y^n) = I(X^n \to Y^n) + I(Y^{n-1} \to X^n).$$
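The law can be checked numerically on an arbitrary joint distribution. The sketch below assumes the form $I(X^n; Y^n) = I(X^n \to Y^n) + I(Y^{n-1} \to X^n)$, where the second term is the directed information from the delayed string, $\sum_{i} I(Y^{i-1}; X_i \mid X^{i-1})$. Helper names and the random test pmf are illustrative assumptions.

```python
import itertools
import math
import random
from collections import defaultdict

def marginal(pmf, key):
    """Marginalize a pmf {(a^n, b^n): p} onto key(a^n, b^n)."""
    out = defaultdict(float)
    for (a, b), p in pmf.items():
        out[key(a, b)] += p
    return out

def entropy(pmf, key):
    """Entropy in bits of the random variable key(A^n, B^n)."""
    return -sum(p * math.log2(p) for p in marginal(pmf, key).values() if p > 0)

def directed_information(pmf, n, delay=0):
    """sum_i I(A^{i-delay}; B_i | B^{i-1}) in bits, via four joint entropies per term."""
    total = 0.0
    for i in range(1, n + 1):
        k = i - delay
        total += (entropy(pmf, lambda a, b: (a[:k], b[:i - 1]))
                  + entropy(pmf, lambda a, b: b[:i])
                  - entropy(pmf, lambda a, b: (a[:k], b[:i]))
                  - entropy(pmf, lambda a, b: b[:i - 1]))
    return total

# An arbitrary (randomly drawn) joint pmf over binary strings of length n = 3.
n = 3
rng = random.Random(0)
strings = list(itertools.product((0, 1), repeat=n))
weights = {(x, y): rng.random() for x in strings for y in strings}
norm = sum(weights.values())
pmf = {k: w / norm for k, w in weights.items()}
swapped = {(y, x): p for (x, y), p in pmf.items()}

mutual = (entropy(pmf, lambda a, b: a) + entropy(pmf, lambda a, b: b)
          - entropy(pmf, lambda a, b: (a, b)))            # I(X^n; Y^n)
forward = directed_information(pmf, n)                    # I(X^n -> Y^n)
feedback = directed_information(swapped, n, delay=1)      # I(Y^{n-1} -> X^n)
```

Up to floating-point error, `mutual` equals `forward + feedback`: the symmetric mutual information splits into a forward part and a strictly causal feedback part.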
Estimating and optimizing the directed information is challenging because it has $n$ terms, where $n$ may be large. In many cases, one is interested in optimizing the limiting average as $n$ grows to infinity, which is termed a multi-letter expression.
Estimation
Estimating directed information from samples is a hard problem since the directed information expression does not depend on samples but on the joint distribution $\{P(x_i, y_i \mid x^{i-1}, y^{i-1})\}_{i=1}^{n}$, which may be unknown. There are several algorithms based on context tree weighting,[14] empirical parametric distributions,[15] and long short-term memory.[16]
Optimization
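Maximizing directed information is a fundamental problem in information theory. For example, given the channel distributions $\{P(y_i \mid x^{i}, y^{i-1})\}_{i=1}^{n}$, the objective might be to optimize $I(X^n \to Y^n)$ over the channel input distributions $\{P(x_i \mid x^{i-1}, y^{i-1})\}_{i=1}^{n}$.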
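As a deliberately simple illustration, consider the single-use case $n = 1$, where directed information reduces to the mutual information $I(X_1; Y_1)$ and the only input distribution is $P(x_1)$. The sketch below maximizes it by a naive grid search over a hypothetical binary symmetric channel with crossover probability 0.1. This is not one of the specialized algorithms used for the multi-letter problem, only a toy baseline with assumed names and parameters.

```python
import math

def mutual_information(p_x, channel):
    """I(X; Y) in bits for input pmf p_x[x] and channel[x][y] = P(y | x)."""
    n_x, n_y = len(p_x), len(channel[0])
    p_y = [sum(p_x[x] * channel[x][y] for x in range(n_x)) for y in range(n_y)]
    return sum(
        p_x[x] * channel[x][y] * math.log2(channel[x][y] / p_y[y])
        for x in range(n_x)
        for y in range(n_y)
        if p_x[x] > 0 and channel[x][y] > 0
    )

# Hypothetical channel: binary symmetric with crossover probability 0.1.
bsc = [[0.9, 0.1], [0.1, 0.9]]

# Grid search over P(X_1 = 1) in steps of 0.01.
grid = [k / 100 for k in range(101)]
best_p = max(grid, key=lambda p: mutual_information([1 - p, p], bsc))
best_value = mutual_information([1 - best_p, best_p], bsc)
```

The search returns the uniform input, and `best_value` matches the known capacity $1 - h_2(0.1)$ of this channel, where $h_2$ is the binary entropy function. For $n > 1$ with feedback the search space grows exponentially in $n$, which is why grid search does not scale.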
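Massey's directed information was motivated by Marko's early work (1966) on developing a theory of bidirectional communication.[26][27] Marko's definition of directed transinformation differs slightly from Massey's in that, at time $n$, one conditions on past symbols $X^{n-1}, Y^{n-1}$ only and one takes limits:
$$T_{12} = \lim_{n \to \infty} \mathbf{E}\left[-\log \frac{P(X_n \mid X^{n-1})}{P(X_n \mid X^{n-1}, Y^{n-1})}\right] \quad \text{and} \quad T_{21} = \lim_{n \to \infty} \mathbf{E}\left[-\log \frac{P(Y_n \mid Y^{n-1})}{P(Y_n \mid Y^{n-1}, X^{n-1})}\right].$$
Marko defined several other quantities, including:
Total information: $H_1 = \lim_{n \to \infty} \mathbf{E}\left[-\log P(X_n \mid X^{n-1})\right]$ and $H_2 = \lim_{n \to \infty} \mathbf{E}\left[-\log P(Y_n \mid Y^{n-1})\right]$
Free information: $F_1 = \lim_{n \to \infty} \mathbf{E}\left[-\log P(X_n \mid X^{n-1}, Y^{n-1})\right]$ and $F_2 = \lim_{n \to \infty} \mathbf{E}\left[-\log P(Y_n \mid Y^{n-1}, X^{n-1})\right]$
Coincidence: $K = \lim_{n \to \infty} \mathbf{E}\left[-\log \frac{P(X_n \mid X^{n-1})\, P(Y_n \mid Y^{n-1})}{P(X_n, Y_n \mid X^{n-1}, Y^{n-1})}\right]$
The total information is usually called an entropy rate. Marko showed the following relations for the problems he was interested in:
$K = T_{12} + T_{21}$
$H_1 = T_{12} + F_1$ and $H_2 = T_{21} + F_2$
He also defined quantities he called residual entropies:
$R_1 = H_1 - K = F_1 - T_{21}$
$R_2 = H_2 - K = F_2 - T_{12}$
and developed the conservation law $F_1 + F_2 = R_1 + R_2 + K = H_1 + H_2 - K$ and several bounds.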
Relation to transfer entropy
Directed information is related to transfer entropy, which is a truncated version of Marko's directed transinformation $T_{21}$.
The transfer entropy at time $i$ and with memory $d$ is
$$T_{X \to Y} = I(X_{i-d}, \dots, X_{i-1}; Y_i \mid Y_{i-d}, \dots, Y_{i-1}),$$
where one does not include the present symbol $X_i$ or the past symbols $X^{i-d-1}, Y^{i-d-1}$ before time $i-d$.
Transfer entropy usually assumes stationarity, i.e., $T_{X \to Y}$ does not depend on the time $i$.
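Under the stationarity assumption, transfer entropy can be estimated from a single pair of time series by replacing probabilities with relative frequencies. The following is a minimal plug-in sketch, assuming the form $T_{X \to Y} = I(X_{i-d}, \dots, X_{i-1}; Y_i \mid Y_{i-d}, \dots, Y_{i-1})$ and finite alphabets. The function name and the synthetic data are illustrative assumptions, and plug-in estimates are biased for short series or large $d$.

```python
import math
import random
from collections import Counter

def transfer_entropy(x, y, d=1):
    """Plug-in estimate of I(X_{i-d..i-1}; Y_i | Y_{i-d..i-1}) in bits."""
    triples = Counter()
    for i in range(d, len(y)):
        triples[(tuple(x[i - d:i]), y[i], tuple(y[i - d:i]))] += 1

    # Marginal counts needed for the conditional mutual information.
    c_xpast_ypast, c_y_ypast, c_ypast = Counter(), Counter(), Counter()
    for (xp, yi, yp), c in triples.items():
        c_xpast_ypast[(xp, yp)] += c
        c_y_ypast[(yi, yp)] += c
        c_ypast[yp] += c

    total = sum(triples.values())
    return sum(
        (c / total) * math.log2(c * c_ypast[yp] / (c_xpast_ypast[(xp, yp)] * c_y_ypast[(yi, yp)]))
        for (xp, yi, yp), c in triples.items()
    )

# Synthetic data: x is a sequence of fair coin flips and y is x delayed by one step.
rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(20000)]
y = [0] + x[:-1]

te_x_to_y = transfer_entropy(x, y, d=1)  # close to 1 bit: y copies the past of x
te_y_to_x = transfer_entropy(y, x, d=1)  # close to 0 bits: y carries nothing new about x
```

The asymmetry between the two estimates reflects the direction of influence in the synthetic data, which a symmetric measure such as mutual information would not show.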
References
Massey, James (1990). "Causality, Feedback And Directed Information". Proceedings 1990 International Symposium on Information Theory and its Applications, Waikiki, Hawaii, Nov. 27-30, 1990.
Permuter, Haim Henry; Weissman, Tsachy; Goldsmith, Andrea J. (February 2009). "Finite State Channels With Time-Invariant Deterministic Feedback". IEEE Transactions on Information Theory. 55 (2): 644–662. arXiv:cs/0608070. doi:10.1109/TIT.2008.2009849. S2CID 13178.
Kramer, G. (January 2003). "Capacity results for the discrete memoryless network". IEEE Transactions on Information Theory. 49 (1): 4–21. doi:10.1109/TIT.2002.806135.
Permuter, Haim H.; Kim, Young-Han; Weissman, Tsachy (June 2011). "Interpretations of Directed Information in Portfolio Theory, Data Compression, and Hypothesis Testing". IEEE Transactions on Information Theory. 57 (6): 3248–3259. arXiv:0912.4872. doi:10.1109/TIT.2011.2136270. S2CID 11722596.
Simeone, Osvaldo; Permuter, Haim Henri (June 2013). "Source Coding When the Side Information May Be Delayed". IEEE Transactions on Information Theory. 59 (6): 3607–3618. arXiv:1109.1293. doi:10.1109/TIT.2013.2248192. S2CID 3211485.
Charalambous, Charalambos D.; Stavrou, Photios A. (August 2016). "Directed Information on Abstract Spaces: Properties and Variational Equalities". IEEE Transactions on Information Theory. 62 (11): 6019–6052. arXiv:1302.3971. doi:10.1109/TIT.2016.2604846. S2CID 8107565.
Massey, J.L.; Massey, P.C. (September 2005). "Conservation of mutual and directed information". Proceedings. International Symposium on Information Theory, 2005. ISIT 2005. pp. 157–158. doi:10.1109/ISIT.2005.1523313. ISBN 0-7803-9151-9. S2CID 38053218.
Jiao, Jiantao; Permuter, Haim H.; Zhao, Lei; Kim, Young-Han; Weissman, Tsachy (October 2013). "Universal Estimation of Directed Information". IEEE Transactions on Information Theory. 59 (10): 6220–6242. arXiv:1201.2334. doi:10.1109/TIT.2013.2267934. S2CID 10855063.
Quinn, Christopher J.; Kiyavash, Negar; Coleman, Todd P. (December 2015). "Directed Information Graphs". IEEE Transactions on Information Theory. 61 (12): 6887–6909. arXiv:1204.2003. doi:10.1109/TIT.2015.2478440. S2CID 3121664.
Naiss, Iddo; Permuter, Haim H. (January 2013). "Extension of the Blahut–Arimoto Algorithm for Maximizing Directed Information". IEEE Transactions on Information Theory. 59 (1): 204–222. arXiv:1012.5071. doi:10.1109/TIT.2012.2214202. S2CID 3115749.
Permuter, Haim; Cuff, Paul; Van Roy, Benjamin; Weissman, Tsachy (July 2008). "Capacity of the Trapdoor Channel With Feedback". IEEE Transactions on Information Theory. 54 (7): 3150–3165. arXiv:cs/0610047. doi:10.1109/TIT.2008.924681. S2CID 1265.
Elishco, Ohad; Permuter, Haim (September 2014). "Capacity and Coding for the Ising Channel With Feedback". IEEE Transactions on Information Theory. 60 (9): 5138–5149. arXiv:1205.4674. doi:10.1109/TIT.2014.2331951. S2CID 9761759.
Sabag, Oron; Permuter, Haim H.; Kashyap, Navin (January 2016). "The Feedback Capacity of the Binary Erasure Channel With a No-Consecutive-Ones Input Constraint". IEEE Transactions on Information Theory. 62 (1): 8–22. doi:10.1109/TIT.2015.2495239. S2CID 476381.
Peled, Ori; Sabag, Oron; Permuter, Haim H. (July 2019). "Feedback Capacity and Coding for the $(0,k)$-RLL Input-Constrained BEC". IEEE Transactions on Information Theory. 65 (7): 4097–4114. arXiv:1712.02690. doi:10.1109/TIT.2019.2903252. S2CID 86582654.
Shemuel, Eli; Sabag, Oron; Permuter, Haim H. (March 2024). "Finite-State Channels With Feedback and State Known at the Encoder". IEEE Transactions on Information Theory. 70 (3): 1610–1628. arXiv:2212.12886. doi:10.1109/TIT.2023.3336939.
Aharoni, Ziv; Sabag, Oron; Permuter, Haim Henri (18 August 2020). "Reinforcement Learning Evaluation and Solution for the Feedback Capacity of the Ising Channel with Large Alphabet". arXiv:2008.07983 [cs.IT].
Sabag, Oron; Permuter, Haim Henry; Pfister, Henry (March 2017). "A Single-Letter Upper Bound on the Feedback Capacity of Unifilar Finite-State Channels". IEEE Transactions on Information Theory. 63 (3): 1392–1409. arXiv:1604.01878. doi:10.1109/TIT.2016.2636851. S2CID 3259603.
Sabag, Oron; Huleihel, Bashar; Permuter, Haim Henry (2020). "Graph-Based Encoders and their Performance for Finite-State Channels with Feedback". IEEE Transactions on Communications. 68 (4): 2106–2117. arXiv:1907.08063. doi:10.1109/TCOMM.2020.2965454. S2CID 197544824.
Marko, H. (December 1973). "The Bidirectional Communication Theory--A Generalization of Information Theory". IEEE Transactions on Communications. 21 (12): 1345–1351. doi:10.1109/TCOM.1973.1091610. S2CID 51664185.