Towards More Objective Translations for a Non-Synesthesiac: Light to sound

     The goal of this experiment is to create a basis and model for an objective, scientific translation from light to sound. To clarify, while the concept of synesthesia is an inspiration for the project, it is essentially of no consequence, because the experiment focuses on measurements of the physical world and on the function of universal organs, i.e. eyes and ears. Synesthesia, on the other hand, has been classified as a phenomenon that occurs as a result of the brain’s construction, most likely stemming from the formation of neural pathways of perception at a young age. Synesthesia is unique to the individual, so a common input may result in many different outputs among a group of synesthesiacs. Additionally, there is sufficient evidence that synesthetic colors do not interact with an individual’s determination of color appearance (Robertson, Sagiv).

     While many ‘conversions’ of light to sound exist, each version contains a considerable number of ‘arbitrary decisions’. These range from graphing the color spectrum onto a curve, where a sound is triggered based on a color’s assigned location within the graph, to mapping the frequencies of colors onto the 12-note equal temperament system. These conversions can be called ‘arbitrary’ because of the personal preference interjected by the creator, e.g. utilizing only 12 notes, or choosing the spectrum of a sound to represent a color. In this project, light’s physical properties are determined alongside the way in which these properties are perceived by the human eye; then, using only a scientific understanding of the physics and phenomena of waves, light is ‘translated’ and reproduced as its closest replication in audible sound. By understanding the differences and similarities in the physical nature of light and sound, as well as in the perception of these stimuli, it is possible to make a more accurate translation of the information. As in any translation, however, not all information is preserved or properly represented, but that which is preserved attempts to be free of all subjective interference. Such a translation can then be built upon as new scientific data emerges, to create a more accurate version.

     The fundamental assumption on which this translation is based is the ‘rule of octaves’ in sound, wherein a frequency exactly twice another frequency is perceived as the same pitch, only registrally displaced. When the frequency of a light wave is calculated and transposed by octaves into the audible spectrum, the result is theoretically the same, only registrally displaced. This theory, however, is impossible to prove fully, because human perception of light does not extend to a full octave. The assumption, therefore, is that if we could see an octave of light, we would perceive a frequency and its double as the same, but in a different register. The first crucial step in translation is reconciling the fundamental physical differences between light and sound. Sound propagates as a mechanical, longitudinal wave and requires a medium to exist, whereas light propagates as a transverse electromagnetic wave and does not require a medium to travel. However, when travelling through solids sound can also propagate transversely, so transverse propagation is common to both. Although light can exist in a vacuum, it is still affected by the atmosphere in which we normally perceive it, similar to sound waves. Therefore, the one major difference between a light wave and a sound wave is that of mechanical versus electromagnetic; both follow the wave equation and exhibit identifiable wave behaviors. In terms of wavelength, audible sound occupies roughly the same range of wavelengths as radio waves do in the electromagnetic spectrum: at a speed of sound of about 344 m/s, 20,000 Hz corresponds to 0.0172 m and 20 Hz to 17.2 m. The wavelength of light is radically shorter than that of sound. To determine a frequency from a wavelength using fλ = v, we take the speed of light, 299,792,458 m/s in a vacuum (the value in air is lower by only about 0.03%), and divide it by one of the wavelengths present in the light. The resulting frequencies can be anywhere from roughly 400 to 800 trillion Hz.
To transpose these frequencies into the audible range, each frequency is halved 36 times, i.e. divided by 2^36, which places it between roughly 5.8 and 11.6 kHz.
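The arithmetic of this transposition can be sketched in a few lines of Python. This is only an illustration of the calculation described above, not the tooling used in the project: the function names are hypothetical, and the vacuum speed of light stands in for the speed in air.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s; vacuum value, used as an approximation for air
OCTAVES_DOWN = 36               # number of halvings described in the text


def light_frequency_hz(wavelength_nm):
    """Frequency of light from its wavelength, using f = v / wavelength."""
    return SPEED_OF_LIGHT / (wavelength_nm * 1e-9)


def transpose_to_audio(wavelength_nm, octaves=OCTAVES_DOWN):
    """Halve the light frequency `octaves` times (divide by 2**octaves)."""
    return light_frequency_hz(wavelength_nm) / 2 ** octaves


for nm in (380, 550, 720):
    thz = light_frequency_hz(nm) / 1e12
    print(f"{nm} nm -> {thz:.1f} THz -> {transpose_to_audio(nm):.1f} Hz")
```

Because every wavelength is divided by the same power of two, the ratios between frequencies are preserved: a wavelength twice as long still lands exactly one octave lower after transposition.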

    The wavelengths present in a light source and/or reflective object can be determined in two ways. The first makes use of the digital or LCD representation of the light source and/or reflective object. Here the image can be processed and broken into color proportions. A tool designed by L. Jégou exports the palette of colors that make up an image. Thereafter, each individual color is processed by an ‘image color summarizer’, designed by Martin Krzywinski. The program outputs the color’s HSV values. From the ‘Hue’ value the wavelength can be determined by the formula: Hue = (650 - wavelength)*240/(650-475) (Dukes). “Visible light is about 400 to 700 nm, with 650 nm being the frequency of red. 475 is blue. Both indigo and violet fall below this limit, but the CIE model that defines the theoretical map of wavelength to hue has to treat colors in that range separately, and it could be that 475 nm is the effective limit for standard Hue calculation, perhaps requiring negative Hue.” (Roberson, Dukes) In the example case of this project, an analysis of a close-up image of an incandescent bulb, 58 different Hue values are present. While Saturation (S) and Value (V) are equally important to the make-up of the LCD-represented color, it is difficult to translate these values into properties of sound without making arbitrary decisions, e.g. representing saturation or richness as timbre. Thus, the resulting frequencies are all given uniform amplitude and timbre.
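Solving the hue formula above for wavelength gives wavelength = 650 - Hue*(650-475)/240. A minimal sketch of both directions follows, assuming Hue is expressed in degrees (so that 240 corresponds to blue); the function names are hypothetical. The mapping is a linear approximation, and it only makes sense for hues between 0 and 240, since purples and magentas do not correspond to any single wavelength.

```python
RED_NM = 650.0         # wavelength assigned to hue 0 (red) in the formula
BLUE_NM = 475.0        # wavelength assigned to hue 240 (blue) in the formula
BLUE_HUE_DEG = 240.0   # assumption: hue is measured in degrees


def wavelength_to_hue_deg(wavelength_nm):
    """Hue = (650 - wavelength) * 240 / (650 - 475), as quoted in the text."""
    return (RED_NM - wavelength_nm) * BLUE_HUE_DEG / (RED_NM - BLUE_NM)


def hue_to_wavelength_nm(hue_deg):
    """The same formula solved for wavelength."""
    return RED_NM - hue_deg * (RED_NM - BLUE_NM) / BLUE_HUE_DEG


for hue in (0, 60, 120, 240):
    print(f"hue {hue:3d} deg -> {hue_to_wavelength_nm(hue):.1f} nm")
```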

Incandescent bulb: close up photo taken from the program 'Theremino' with a light-grating camera
     To represent these frequencies aurally, using sine waves is the most reasonably objective method at this point in the research. When electromagnetic waves are graphed, each frequency consists of an electric field oscillating at a right angle to a complementary magnetic field. Accordingly, a Fourier transform decomposes the wave into sinusoidal functions. However, true sinusoids are only present in ideal monochromatic plane waves, which a laser approximates. Since this project lacks the tools to analyze the true shape of the variation in the electric and magnetic fields across space, graphical descriptions of these changing field vectors yield the closest to objective, least arbitrary representation of how light waves exist in air. This line of logic is also present in the second method of translation.

     This method makes use of the program ‘Theremino Spectrometer’ and a light-grating camera from Microsoft. The ‘Theremino Spectrometer’ analyzes the image from the light-grating camera and outputs a graph with wavelength on the x-axis and intensity on the y-axis. The translation process can be implemented for every whole-numbered wavelength in the visible spectrum, approximately 380 nm to 720 nm, which yields 340 different frequencies. Because amplitude can be translated and applied more objectively through this process, it yields a more accurate translation than the first method. However, this translation carries a whole-number bias, i.e. it uses only the data gathered at whole-numbered wavelengths and ignores any waves present in between; a good deal of research is still necessary to determine how to deal with this bias appropriately. A quantum understanding of light as both particle and wave, including the role of photons in perception, will have to be thoroughly explored to come to a more suitable conclusion. Studying ‘Planck time’, the reciprocal of which is the ‘Planck frequency’, offers some insight into the smallest intervals of photons present in a light source, but it is not yet understood well enough to yield an application.
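The second method amounts to turning each (wavelength, intensity) reading into a (frequency, amplitude) pair. The sketch below shows one way this could look. The readings are hypothetical placeholders rather than measured data, the function names are illustrative, and scaling amplitudes linearly so that the strongest reading equals 1.0 is an assumption, not a step described in the text.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s (vacuum value, approximating air)
OCTAVES_DOWN = 36               # number of halvings used in the text


def audio_hz(wavelength_nm):
    """Light frequency for a wavelength, transposed down by OCTAVES_DOWN octaves."""
    return SPEED_OF_LIGHT / (wavelength_nm * 1e-9) / 2 ** OCTAVES_DOWN


def spectrum_to_partials(readings):
    """Map {wavelength_nm: intensity} to (audio_hz, amplitude) pairs, lowest first.

    Assumption: amplitude is proportional to intensity, scaled so the
    strongest reading becomes 1.0.
    """
    peak = max(readings.values())
    if peak <= 0:
        raise ValueError("spectrum has no positive intensity")
    partials = [(audio_hz(nm), value / peak) for nm, value in readings.items()]
    return sorted(partials)


# Hypothetical readings in arbitrary units, NOT data from the Theremino graph.
example = {450: 12.0, 550: 40.0, 650: 80.0}
for hz, amp in spectrum_to_partials(example):
    print(f"{hz:8.1f} Hz  amplitude {amp:.2f}")
```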

Theremino Spectrometer graph of an incandescent light bulb
     Our eyes, however, are similarly unable to resolve such finely spaced wavelength information. The smallest perceptible wavelength difference is not known, but according to ‘The Handbook of Perception’, “…normal human, on the other hand, can, with ease, discriminate, say, a 5-nm wavelength difference between 590 and 595 nm…” (Carterette, Friedman). Just as the smallest perceivable frequency intervals, or ‘Just Noticeable Differences’ (JNDs), vary across the audible spectrum, so does wavelength discrimination vary within the visible spectrum; however, data on these intervals is presently lacking. David L. MacAdam’s diagram (below) in the ‘CIE color space’ offers a potential insight into these limits of color difference, but no exact figures have been procured as of yet.

    These variations in JNDs are due to the color receptors in our eyes, the cones. Humans have three types of cones, referred to as Short (S), Medium (M), and Long (L) after the wavelengths to which they are most sensitive. “These three curves indicate how sensitive the corresponding cone is to each wavelength. The highest point on each curve is called the “peak wavelength”, indicating the wavelength of radiation that the cone is most sensitive to.” (Wong).

    “Receptor curves have overlap in their spectral sensitivity” and “critical information is not extent to which receptors are activated, but the relative activation of the two receptor types.” (Carterette, Friedman). Due to this phenomenon, it is the total excited area underneath a cone’s curve in the graph above that determines the intensity of stimulation for that cone. So, while the measured wavelength intensities of an LCD-represented image may differ from those of the physical object, as long as the excited area within each cone’s sensitivity is equivalent, the images are perceived as the same. In order to create a more accurate translation, it will be crucial to develop a transform based on the sensitivity of these cones, as well as a sufficient graph of the JNDs of light, expressed in terms of wavelength, which still needs to be measured. Additionally, just as ‘Equal-Loudness Curves’ are present in the perception of sound, it will be important to determine an accurate optical representation of this phenomenon (wave amplitude) and to apply the necessary transformation in translation by taking into account the inevitable differences between the graphs. The luminosity function (pictured below) shows humans’ relative brightness sensitivity as a function of wavelength; while this graph lacks exactness, a transformation is currently being worked on to reconcile the ‘Fletcher-Munson Curves’ with this photopic relative brightness.

    Representing the translated frequencies using additive synthesis most closely mirrors how our eyes perceive light. White light is all wavelengths of the visible spectrum at equal intensity; color, then, depends on the varying intensities across wavelengths in reflected light. Therefore, the optical information our eyes perceive contains some degree of amplitude at all wavelengths (unless the light is monochromatic, e.g. a laser); the activation of the cones in our eyes is a result of the additive synthesis of all frequencies in the light. When the translated frequencies are presented additively in the audible spectrum, however, beats are always present. All waves create beats when interacting with one another; the reason these beats are not perceivable optically is that most visible light is incoherent, and non-linear processes cause the beat to become a component of the signal. Additionally, due to the rapid oscillation of light waves, sources with a relative frequency difference small enough for the beat to be observed by the eye are nearly impossible to obtain. This intrinsic fault of translation seems to be the only quality that lacks a potential solution at this stage. However, to reiterate, no translation, even one between languages, is truly able to preserve and portray the exactness of the initial message or data.
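The beating described above can be quantified: two sine waves at frequencies f1 and f2 sum to a tone whose loudness pulses |f1 - f2| times per second, which follows from the identity sin(a) + sin(b) = 2 sin((a+b)/2) cos((a-b)/2). The sketch below estimates the beat rate between neighbouring whole-numbered wavelengths after a 36-octave transposition. The function names are hypothetical, and the vacuum speed of light is used as an approximation.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # m/s (vacuum value, approximating air)
OCTAVES_DOWN = 36               # number of halvings used in the text


def audio_hz(wavelength_nm):
    """Light frequency for a wavelength, transposed down by OCTAVES_DOWN octaves."""
    return SPEED_OF_LIGHT / (wavelength_nm * 1e-9) / 2 ** OCTAVES_DOWN


def beat_hz(f1, f2):
    """Rate at which the summed amplitude of two sine tones pulses."""
    return abs(f1 - f2)


def sum_of_sines(f1, f2, t):
    """The two tones added directly."""
    return math.sin(2 * math.pi * f1 * t) + math.sin(2 * math.pi * f2 * t)


def carrier_times_envelope(f1, f2, t):
    """The same signal written as a fast carrier times a slow beat envelope."""
    return 2 * math.sin(math.pi * (f1 + f2) * t) * math.cos(math.pi * (f1 - f2) * t)


for a, b in ((380, 381), (589, 590), (719, 720)):
    print(f"{a} nm vs {b} nm: {beat_hz(audio_hz(a), audio_hz(b)):.1f} beats per second")
```

Because frequency is inversely proportional to wavelength, evenly spaced wavelengths are not evenly spaced in frequency, so the beat rate between neighbours is higher at the violet end than at the red end.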

     Human perception inherently disregards a vast quantity of physically objective stimuli. The information that our minds process as sight is transmitted via the narrow window of the electromagnetic spectrum to which our eyes have become sensitive. Sound only becomes audible once our basilar membranes can wrangle the size of a wave, but audibility then disappears if the wave becomes too short. The waveforms that fall outside the limits of perception at either end of these spectra still interact with our bodies, and more often than not we are unaware of their presence. We have seen how dangerous ultraviolet radiation can be to our bodies, but we still do not grasp the full effect that omnipresent radio waves, for instance, may have on us. Perhaps the sensory bombardment of imperceptible waveforms unknowingly disrupts our brain’s full potential. At the turn of the 20th century, artists at the forefront of modernism attempted to ‘re-induce’ an ancient harmony of mind and nature that they felt had been lost in the evolution of consciousness. It could be inferred that their efforts anticipated the 20th-century technological explosion, which introduced an unquantifiable amount of non-perceptual information into our surroundings. Spiritual seekers have always retreated from society into the ‘silence’ of nature to awaken their spiritual bodies and explore the psychological depths of the mind, like the psychonauts of Tibet. Does this omnipresent non-perceptual information then stunt our ability to evolve the capacities of our consciousness? And if we develop methods of translation between the non-perceptual and the perceptual, is it possible to infer the effect the former may have on us?

This piece, compiled in MAX/MSP, includes the discrete frequencies and amplitudes for wavelengths 380 nm to 720 nm, resulting in 420 additively synthesized sine waves.
The amplitudes are derived from the Theremino graph of the incandescent bulb detailed above. 
The image below is a glimpse of the patch’s framework. All 420 sines are initiated simultaneously. 
Each discrete wavelength includes a subpatch, also shown below. 
Herein, the amplitude of the given wavelength is set. 
The patch also allows for amplitude change over time, and includes a variable time domain. 
This design is intended for future projects that aim to translate video to sound in real time.
The exported .wav file from the additive patch is then re-processed via sampling, EQ automation, harmonic distortion, and reverb for the composition.
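For readers without MAX/MSP, the additive stage can be approximated with Python's standard library. This is a minimal sketch and not a port of the patch: the partial list is a hypothetical placeholder, the function names are illustrative, all sines start at zero phase, and the output is normalised to full scale, which is an assumption about gain staging.

```python
import math
import struct
import wave

SAMPLE_RATE = 44_100  # samples per second (assumed; not specified in the text)


def render_additive(partials, seconds, sample_rate=SAMPLE_RATE):
    """Sum sine partials [(hz, amplitude), ...] and scale the peak to 1.0."""
    n = int(seconds * sample_rate)
    samples = [0.0] * n
    for hz, amp in partials:
        step = 2.0 * math.pi * hz / sample_rate  # phase increment per sample
        for i in range(n):
            samples[i] += amp * math.sin(step * i)
    peak = max((abs(s) for s in samples), default=0.0)
    if peak > 0.0:
        samples = [s / peak for s in samples]
    return samples


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Write mono 16-bit PCM, clipping each sample to the range [-1, 1]."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(frames)


# Hypothetical partials (Hz, amplitude); placeholders, not values from the patch.
demo_partials = [(6059.1, 1.0), (7394.2, 0.6), (11480.4, 0.1)]
demo = render_additive(demo_partials, seconds=0.5)
print(len(demo), "samples rendered")
```

Calling `write_wav("bulb.wav", demo)` would save the result to disk; the file name is only an example. Pure-Python loops are slow for hundreds of partials over long durations, so a real-time version would need a different approach.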