Patent for “IPD correction, reprojection for accurate mixed-reality object placement”

Metaverse – Youding Zhu, Michael Bleyer, Denis Claude Pierre Demandolx, Raymond Kirk Price, Microsoft Technology Licensing LLC

Search Patent for “IPD correction, reprojection for accurate mixed-reality object placement”

Abstract for “IPD correction, reprojection for accurate mixed-reality object placement”

“Optimizations can be used to generate passthrough visualizations of Head Mounted Displays. A stereo camera pair is used to capture raw images. The user is measured at the interpupil distance. The images taken by the left camera’s center-line perspective are not parallel to the center-line perspectives captured by the right camera. To correct images, camera distortion corrections can be applied to raw images. To create images with parallel center-line perspectives, epipolar transforms are applied to the corrected images. The transformed images are then processed to create a depth map. The left and right passthrough visualizations can be generated by reprojecting the left and right images.

Background for “IPD correction, reprojection for accurate mixed-reality object placement”

Because they can create immersive experiences and environments, virtual-reality computers systems have been gaining a lot of attention recently. Virtual-reality computer systems often use one or more body devices such as a head-mounted (?HMD?) device. To render a virtual environment for a user. Virtual-reality systems that are not conventional completely block the real world and only display the virtual environment to the user via the HMD, however, can be used for this purpose. The user may lose touch with reality and become completely immersed in the virtual environment.

“Continued advancements in hardware capabilities, rendering technologies have greatly improved the realism displayed virtual objects within a virtual reality environment. Virtual objects can be placed in virtual-reality environments in order to create the illusion that the user is in a completely new environment. The virtual-reality environment updates automatically as the user moves about in the real world. This allows the user to see the virtual objects from a different perspective. This virtual-reality environment can also be called a computer-generated scene or simply a?scene. This is the definition of a?virtual reality environment. ?computer-generated scene,? Or simply “scene”? These terms can be interchanged to refer to an experience where virtual content is projected in virtual environments.

As we have discussed, a virtual reality head-mounted device blocks a user from seeing his/her real-world environment. However, in some cases, it may be possible for the user to see his/her real world environment even though the head-mounted device is still being used. Some VR systems can generate a “passthrough” effect. The user’s environment is visualized on the HMD of the VR system. The HMD displays this passthrough visualization so that the user can see his/her real-world environment. Cameras mounted on the user?s head-mounted devices capture visuals of their real-world surroundings. These visualizations are projected onto the HMD, allowing the user to view the real world without removing the head-mounted gadget.

While there are some technologies that can generate passthrough visualizations, they are severely lacking in the current technology. The current technology is expensive because it requires new hardware (i.e. cameras) to be mounted on the head-mounted device in order to capture the user’s actual-world environment. The current technology is unable to optimize its passthrough visualizations in order to maximize the user?s comfort while perceiving these visualizations. Each user perceives the environment differently due to differences in body composition (e.g. distance between eyes). These differences in body composition are not taken into account by the current passthrough technology when it generates its passthrough visualizations. While there are passthrough technologies that have been created, they fail to meet the needs of many users.

“The subject matter claimed herein does not limit to embodiments that solve disadvantages or operate in specific environments like those described above. This background is not intended to be a complete list of possible technology areas where the embodiments described herein could be used.

“Disclosed embodiments address computer systems, hardware storage devices and methods that improve passthrough visualizations for VR devices. Some of the disclosed embodiments involve capturing and reconstructing images using a head-mounted devices (HMD), and then processing those images so that they match the perspective of the user wearing the HMD.

“Some of the disclosed embodiments concern head-mounted devices (HMDs), which are designed to improve passthrough optics. Others embodiments concern methods of using HMDs.

The disclosed HMDs consist of a stereo camera pair that includes a left and right camera. Stereo camera pairs are used to capture images of the surrounding environment. A center-line perspective on an image taken by the left camera is not parallel to one captured by the right camera. The cameras are angled so that their fields are not parallel.

“Some disclosed embodiments allow for the determination of an interpupil distance (IPD). The left camera captures a raw left image and the right captures a raw right image. Camera distortion corrections are then applied to the raw right/left images to create a corrected left and corrected right image. The corrected left image is the same perspective as the original raw left image. The corrected right image has the exact same perspective as that of the raw right. The embodiments then apply epipolar transforms on the corrected left- and right-images to create a transformed left and right image. These transforms result in a center line perspective for the transformed left image being parallel to the center-line perspective for the transformed right. The depth map is then generated. The depth map is created by combining the left and right transformed images. The left and right passesthrough visualizations are then generated by reprojecting these transformed images. The depth map results are used to perform this reprojection. This reprojection also causes the left image’s centre-line perspective to align with the user’s left pupil, and the right image’s central-line perspective alignment to align with the user?s right pupil.

“This Summary presents a few concepts in simplified form. They are described in detail below in the Detailed Description. This Summary does not identify the key features or essential features in the claimed subject matter. It is also not meant to be used to determine the scope of the claimed matter.

“Additional features or advantages will be described in the following description. In part, these will be apparent from the description. Or, you may learn them by practicing the teachings. The appended claims provide examples of how to realize the features and benefits of the embodiments. The following description and the appended claims will make the features of the present embodiments more apparent. You can also learn them by practicing the embodiments described hereinafter.

“Some of the disclosed embodiments concern computer systems, hardware storage devices, as well as methods and means for improving passthrough visualizations for virtual realities (VR) devices by using images captured with head-tracking camera mounted to VR devices. Some embodiments provide methods and devices to reconstruct a perspective from an image rendered on a VR headset-mounted device (HMD), so that the captured perspective matches the perspective of the user wearing the HMD.

“The present embodiments can be used to overcome many technical problems and computational costs associated with creating a passthrough visualization. A passthrough visualization, as mentioned earlier, is a captured visualization that shows the user’s actual-world environment. The HMD displays the captured visualization so the user can see his/her real-world environment without having to take off the head-mounted device. The HMD displays a visual representation of the user’s actual-world surroundings. The HMD displays the final results of the camera and the HMD displays them.”

The present embodiments greatly improve passthrough technology in many ways. Some of the disclosed embodiments, for example, use existing hardware infrastructure (e.g. head tracking cameras) to generate passthrough visualizations, rather than installing new hardware. Because less hardware is needed, current embodiments are significantly lower in both production cost as well as user expense. The head-mounted device is lighter and will also offer greater comfort. The user will feel less fatigue and strain when wearing the head-mounted device.

The present embodiments are also more advanced than current technologies, generating custom perspective passthrough visualizations. Different people have different visual anatomy, which allows each person to see the environment in their own way. Different people may have different interpupil distances which can impact how they view their environment. We will now go into more detail about this.

“Initially, it was noted that humans were able to perceive depth. Because humans have two eyes working in tandem, The brain receives signals from both eyes when they are focused on the same object. The brain then uses the differences between the eye images to calculate depth. A person’s depth perception ability depends, at least partially, on their interpupil distance (i.e. The distance between two pupils. Each person’s perception of the environment and interpupil distance is different.

“The disclosed embodiments incorporate the user’s interpupil Distance (IPD), into the calculation of creating a passthrough visualization. Passthrough technologies that do not consider or accommodate different IPD are lacking in many cases. In contrast, the disclosed embodiments not only correct camera distortions but also reconstruct/alter perspective captured by a cam image so that the captured perspective matches user’s unique perspective. Because the visualizations can be customized for each user, embodiments greatly improve the quality and effectiveness of passthrough visualizations.

The disclosed embodiments improve the functionality and operations of the underlying computers used for passthrough visualizations. Some traditional passthrough technologies use a lot of computing resources as they process every pixel in a passthrough picture. However, the disclosed embodiments significantly reduce the computing resources required because they work with highly optimized images (e.g. images that have been down-sampled and filtered). One example of such optimization is to make the image’s pixels smoother in relation to each other. The embodiments improve the computer’s efficiency in generating passthrough visualizations.

“In certain embodiments, the HMD may be configured with a stereo camera pair that includes a left- and right-facing camera. Stereo camera pairs are used to capture images of the surrounding environment. The left camera’s center-line perspective is non-parallel to the right camera’s center-line perspective. The cameras are set up so that their fields are not parallel. The left camera captures the environment in raw left and the right captures the environment in raw right. Camera distortion corrections can be applied to raw images to create corrected left images or corrected right images. The corrected left image has the same center line perspective as the original raw left image and the corrected right has the same center line perspective as that of the raw right. The corrected left and right images are then transformed using epipolar transforms. These transforms result in a center line perspective for the transformed left image being paralleled to the center-line perspective for the transformed right. The embodiments then generate a depthmap that is based upon a combination the left and right images. The embodiments then generate a left-passthrough visualization and a right-passthrough visualization by reprojecting both the left and right images based on the IPD of the user as well as the depth map. This reprojection also causes the transformed left image?s center-line perspective and the transformed right picture’s centre-line perspective of the user to align with each other.

“After having described the various benefits and high-level attributes, the disclosure will now concentrate on FIG. 1. This is an introduction to an exemplary computer system. The discussion will continue with FIGS. 2-15, which discuss various architectures and supporting images. 2-15. Finally, we will detail various flow diagrams as well as methods with respect to the remaining figures. 16-19).”

“Exemplary Computing System”

“As shown in FIG. 1. An exemplary computing system 100 can take many different forms. FIG. FIG. 1 shows the computer system 100 as having a HMD 100A. The HMD 100A may contain the computer system 100, but the computer 100 could also include one or more connected computing components/devices. The computing system 100 can be implemented in any form, and not just the one shown in FIG. 1. A computer system 100 could include, for example, a computer with a desktop, a laptop or a tablet and a server, data center, and/or other computing systems.

“In its simplest configuration, the computer system 100 contains many different components. FIG. FIG. 1 illustrates that computer system 100 contains at least one hardware processor unit 105, input/output interfaces 110, graphics rendering engines, 115, one sensor 120, and storage.

“The storage 125 could be physical system memory. It may be volatile, nonvolatile or a combination of both. “Memory” may also be used. The term?memory? may also be used to refer to nonvolatile mass storage, such as physical storage media. The computing system 100 may also be distributed in terms of processing, memory and/or storage capabilities. The term “executable module” is used herein. ?executable component,? or even ?component? Software objects, routines or methods that can be executed on the computing device 100 may also be called?component? These components, modules and engines can be implemented as objects, processors, or programs that run on the computing system 100 (e.g. ”

“Disclosed embodiments can include or use a special-purpose, general-purpose, or computer, including computer hardware such as one or more processors (such processor 105) or system memory (such storage 125), as described in greater detail below. Other computer-readable media can also be used to store or carry computer-executable instructions. Computer-readable media can include any media that can be accessed using a general-purpose, or special-purpose, computer system. Computer-readable media that contain computer-executable instructions as data are called physical computer storage media. Transmission media are computer-readable media that contain computer-executable instruction. The current embodiments may include at least two distinct types of computer-readable media, computer storage media or transmission media.

“Computer storage media” refers to hardware storage devices such as RAM, RAM, EEPROM and CD-ROM. They can also be called Flash memory, phase-change memory (PCM), RAM, ROM or other types of memory or magnetic disk storage. Or any other medium that can store desired program code in the form of computer executable instructions, data or data structures that can be accessed using a general-purpose, special-purpose, or personal-purpose computer.

The computer system 100 can also be connected to external sensors 130 via a wired connection or wireless connection (e.g. one or more remote cameras or gyroscopes), acoustic sensors or magnetometers. The computer system 100 can also be connected via one or more wired and wireless networks 135 or 140 to remote systems(s), 140 which are capable of performing any of the processing described in relation to computer system 100.

“A user of the computer system 100 can perceive information (e.g. a virtual reality scene) during use through a display screen that is part of the I/O interface(s). 110 and that is visible by the user. Sensors 120/130 and 110/110 also contain gesture detection devices and eye trackers. ), etc.) These devices are capable of detecting the movement and positioning of real-world objects such as the user’s hand or stylus and/or any other objects that the user might interact with while immersed in the scene.

“Some instances monitor the position and movement of both the virtual and real objects. This monitoring detects any change in the position or movement of the objects. For example, a change in velocity, orientation, acceleration, or position. These movements can be absolute movements and/or relative movements, such as compared to a relative positioning of the HMD, and such that movements/positioning of the HMD will be calculated into the relative movements/positioning of the objects as they are presented in the scene.”

“The graphics rendering engine (115) is configured with the processor(s), 105 to render one or several virtual objects within the scene. The virtual objects respond to the movement of the user, and/or user inputs as they interact within the virtual scene.

“A ?network,? “A?network,? such as the network 135 in FIG. 1 is a network of data links and/or switches that allow electronic data to be transferred between computers, modules and/or other electronic devices. The computer correctly views the connection as a transmission medium when information is transmitted or received over a network (either wireless or hardwired). One or more communication channels will be used by the computer system 100 to communicate with network 135. Transmission media are a network that can carry data or desired program codes means. They can either be in the form or computer-executable directions or data structures. These computer-executable instruction can also be accessed via a general-purpose and special-purpose computer. Computer-readable media should include combinations of all of these.

“Upon reaching different computer system components, program codes can be in the form computer-executable instruction or data structures and transferred automatically from transmission medium to computer storage media (or vice-versa). Computer-executable instructions and data structures can be buffered within RAM of a network module (e.g., network interface card, or?NIC) to store them. It can then be transferred to computer system RAM or to less volatile computer storage media at the computer system. It is important to understand that computer storage media may be used in components of computer systems that also use transmission media, or even primarily.

“Computer-executable (or computer-interpretable) instructions comprise, for example, instructions that cause a general-purpose computer, special-purpose computer, or special-purpose processing device to perform a certain function or group of functions. Computer-executable instructions can be binaries, intermediate format instructions, such as assembly language or source code. The subject matter has been described using language that is specific to structural features and/or methodsological acts. However, it should be understood that the subject matter described in the appended claims does not necessarily include the above-described features or acts. The claims are not intended to be implemented in any way other than the examples shown.

“Those skilled in art will recognize that the embodiments can be used in network computing environments with many different configurations of computer systems including personal computers and laptops as well as message processors, multi-processor system, microprocessor-based consumer electronics, network computers, minicomputers mainframe computers, mobile phones, PDAs pagers routers switches, and the rest. These embodiments can also be used in distributed systems environments, where remote and local computer systems are linked through a network (either hardwired or wireless data links or a combination of both hardwired and wireless) to perform tasks (e.g. Cloud computing, cloud services, and the like. Program modules can be found in both local and distant memory storage devices when operating in a distributed system environment.

Computer systems can perform many functions, as we have seen. One of these functions is the ability to view passthrough content with a head-mounted device. FIG. 2 will be the focus of your attention. 2 illustrates an example of a virtual environment.

“Passthrough Visualizations”

“FIG. 2. This is a view of a virtual scene 200 from the user’s point of view. The virtual scene 200 contains different virtual content. The virtual scene 200 contains a world content 215 and a news content 215 respectively. The virtual scene 200 can include additional content beyond what is shown in FIG. 2.”

The virtual-reality HMD allows the user to see a portion 200 of the virtual scene 200. This is possible based on their field of vision 205. The display area of the HMD may influence the field of view 205. This is because the HMD has a limited display area. Although the virtual scene 200 may seem large, only a portion of it 200 is available at any given time. The orientation of the user will determine which portions are viewable. FIG. 2. A portion of the world content210 is out of the user’s view 205. Additional content can be viewed if the user moves their head to the right (e.g. new sections of the world content 215). While additional content will be displayed, existing content, such as the news content 215, will not be visible.

“As we have discussed, it might be beneficial to allow the user see their real-world environment through HMD without having the HMD removed. FIGS. FIGS. 3A and 3B illustrate such a situation.

“In particular, FIG. 3A shows an example scenario 300 where a user is wearing the head-mounted device305. The head-mounted device (305) includes a HMD that is used for projecting virtual content. The head-mounted device305 can be used to immerse the user in a virtual-reality scene. The user might want to see the real-world environment while in this scene. The scenario 300 shows that the user’s real world environment includes one or more objects 310 (e.g. the vase and table). These real-world objects 310 cannot be seen by the user because the HMD 305 blocks a view of real-world. These real-world objects 310 can be visualized on the HMD using the passthrough visualization methods discussed herein. FIG. 3B illustrates such a passthrough visualization. 3B.”

“Specifically, FIG. 3B illustrates a scenario 315 in which visualizations 320 are rendered on the HMD of the user. These visualizations 320 are called “passthrough visualizations”. Because they depict the real-world environment and are rendered on the HMD, they can be called “passthrough visualizations”. The user can view the real-world environment, as shown in the image.

“Now, let us focus our attention on FIG. 4. This is an example of a computer system 400 that can generate a passthrough visualization. It is important to note that computer system 400 contains all the capabilities, functionalities and features that were described in connection with computer system 100. 1.”

Computer system 400, as illustrated, includes a distortion component 410 and a distortion component 415. A depth map component 420 is also included. The reprojection component 425 is included. These components work together to reconstruct the perspective captured by the camera image. The captured perspective is then matched with the perspective of the user. It is worth noting that the interpupil distance of the user determines the perspective of the user. We will provide more information on each of these components later. FIG. 5.”

“FIG. FIG. 5 illustrates a head-mounted device 500 which can generate a virtual reality scene. The head-mounted 500 device is an analogous to FIG. 3. However, in this case, the head-mounted 500 has at least two cameras 505. Two cameras 505 form a stereo (i.e. Two cameras 505 form a stereo (i.e. Head tracking system.”

An inside-out head tracker system monitors the device’s position relative to its environment. This feat is achieved by using tracking cameras mounted on the device and pointed away from it. An outside-in tracking scheme uses cameras that are mounted in the surrounding environment and pointed towards the device. This is how inside-out head trackers are distinguished from outside head trackers.

“Here are the two cameras 505 mounted on the object to be tracked (i.e. The cameras 505 are mounted on the object being tracked (i.e. the head-mounted devices) and are oriented away. The two cameras 505 form part of an inside-out system for head tracking.

“Using the two cameras 505, the head mounted device 500 can interpolate its position relative to the surrounding environment. FIG. 6 provides more information about these cameras. 6.”

“FIG. 6 shows an abstract view showing an inside-out head tracker system. A head-mounted device 600 has a stereo camera pair (i.e. The left camera 605 is shown and the right camera 615. The left camera 605 is shown with a?field-of-view? 610. You will appreciate the camera’s “field of view”. The camera’s field of view is the area that the lens can capture and is then used to create a camera image. In some cases, the left camera 605 may be a wide angle camera. For example, the field of vision 610 could be wide-angle.

The right camera 615 has the same field of view as the left camera 605. In some cases, the right camera 615 is also a wide-angle camera. FIG. FIG. 6. The field of view610 and field of view620 both include an overlap region (i.e. Overlapped FOV 630 The left camera 605 is able to record the environment while the right camera 615 can do the same. The embodiments can also perform depth measurements using the stereo camera pair, thanks to the FOV 630 overlap. The left camera 605 is used, and the right camera 615. These recordings allow the head-mounted device 600 to locate itself within the environment. Additionally, these head-tracking/passthrough cameras (i.e. Left camera 605 and Right camera 615 may be wide-angle view cameras with overlaps with the field of view of a head-mounted device.

The left camera 605 (left) and right camera 615 (right) are placed apart at a preselected distance of 625 and are angled away. This allows for maximum capture of the surrounding environment. The distance 625 can be any distance. Usually, distance 625 is at least 7 cm (cm). You will appreciate that distances 625 can be longer than 7 cm. (or less than 7 cm). The distance between stereo camera pairs (i.e. The camera baseline is the distance between stereo camera pair (i.e. left camera 605 and right cam 615) Distance 625) at least 7 cm

“Here it is worth noting that most humans have an interpupil spacing of between 55 and 69 millimeters (mm). This variation in range can cause a passthrough visualization to be inaccurately generated. The visualization’s depths may not be comparable to what the user would see if they were actually present in real life. It will be appreciated that these incorrect depth determinations/calculations will result not only in false/inaccurate images but also in jarring and/or unpleasant experiences for the user. These problems will be avoided if the user follows the disclosure’s principles.

“Now, your attention will be directed towards FIG. 7. It is important to note that FIG. FIG. 6 features are repeated in FIG. 7 are repeated in FIG.

“Here, both the left and right cameras have a field view. The cameras now have an additional feature, which is a ‘center-line perspective.’ The center-line perspective 705 is on the left and 710 on the right. It was previously mentioned that the ‘field of view? The area that the camera can capture with its lens, and which is then included in an image. The corresponding “center-line perspective”? The camera’s most central area of view. A ‘center-line perspective’ is a different way to put it. The direction in which the camera is aiming. The range of the center line perspective can vary in some cases. It could be between 0 to 35 degrees, such as 0, 5, 10, 15, 20, 25, 30, or 35 degrees, or any other value between. In some embodiments, head-tracking cameras, such as the left camera 605 or the right camera 615 in FIG. To better match the FOV of a human, 6 are tilted down from the horizon. This downward tilt can be as low as 0 degrees to??45 degrees in some cases (e.g.,?5,?10,?15,?20,?25,?30,?35,?40 or??45 degrees).

“FIG. “FIG. The two center-line perspectives 705 and 710 are therefore angled in relation to each other. The angle 715 is an example of this non-parallel alignment. You will see that the angles 715, 720 could be any angle sufficient for the center-line perspectives not to have a parallel alignment. The angles 715 or 720 could be either above or below 90 degrees, but not the same as 90 degrees. Other times, the angles 715 or 720 could be within the range mentioned earlier (i.e. “0 to 35 degrees.

“It is worth noting that this ‘non-parallel alignment? This is advantageous for inside out tracking systems as it allows the cameras more area to be captured and improves tracking capabilities. The inside-out tracking system will be able to determine the position of objects in greater detail if there is more area captured. Passthrough systems are affected by non-parallel alignments. These distortions can cause problems in the visualization of passthrough images.

“It is worth noting that the current embodiments repurpose existing within-out tracking system hardware in order to generate a passthrough visualization. Because some inside-out tracking system cameras are angled in the way shown in FIG. 7), the current embodiments are able to operate even when such configurations exist (i.e. Anangled cameras. It will be apparent that the current inventions enhance the technology through repurposing or rather, dual purposing. Existing hardware. The current implementations are significantly lighter in weight and cost than virtual-reality system.

“As mentioned earlier, when left and right cameras serve dual purposes so that they perform passthrough functionalities in addition to normal inside out tracking operations), certain optimizations are required. These optimizations are performed by the disclosed embodiments, which advance technology.

FIG. 8 shows the first type of optimization. 8 is used to correct the camera’s distortions.

“In particular, FIG. FIG. 8 illustrates that a camera (i.e. The camera distortion 800 is visible on the left camera. Camera distortion 800 can sometimes be caused by different characteristics of the lens. A lens can have a convex, concave, or other distorting shapes, such as a wide-angle lens. These distortions can cause camera images to have different distortions. The lens shape can cause curved lines where straight lines should be produced, or straight lines where straight lines should be produced. There are two main types of lens distortions: a barrel distortion or a pincushion distortion.

A barrel distortion is when straight lines bend outward from the center of an image. Wide-angle cameras lenses are often used to create barrel distortion. This distortion is magnified when objects are placed too close to the lens. The lens can cause straight lines of the object to appear bent when it is placed too close to the wide-angle lens. These distortions can severely affect the object’s depth when it is viewed through a passthrough visualization.

A pincushion distortion is a distortion that straight lines are pulled inwardly at the center of an image. This is different from a barrel distortion. Images may appear thinner due to pincushion distortions. To allow objects to be perceived accurately in passthrough visualizations, both barrel distortions and pincushion distortions must be corrected.

Flare (i.e. Ghosts are undesired light reflections captured in an image. A repeated number of undesired reflections in an image), ghosts (i.e. light differences between the center and edge lines) chromatic aberrations Color shifts that look like a prism light effect, chromatic aberrations (i.e. edge blurring, and astigmatisms Linear or elliptical shifts in an image. Depending on the lens used, there may be one or more of these distortions in an image.

Other types of camera distortions could relate to camera shutter speed, resolution brightness abilities, intensity capabilities, exposure properties, and/or shutter speed. Although the above description only covered a small number of possible distortions, you will see that there may be other distortions. At a high level, cameras may exhibit one or more of the 800 camera distortions.

“The embodiments can be used to correct for one or more camera distortions 800. FIG. 8 illustrates one example. 8 shows how the embodiments can correct camera distortions 805. 8. The camera distortion corrections 805 may include optimizations for barrel distortion, pincushion distorsion, flare, ghosts and coma aberrations. Although the above description only covered a few possible corrections, the present embodiments can correct any type 800 of camera distortion. The present embodiments can perform one or more camera distortion corrections 805 to create an image that accurately represents the real-world environment.

FIG. 9 will be the next focus after we have described one type optimization, namely camera distortion correction. 9 illustrates an alternative type of optimization, the epipolar transform.

“As we have discussed, the center-line perspectives between the stereo cameras might not be in parallel. Indeed, FIG. FIG. 9 illustrates such a scenario. This non-parallel alignment can cause distortions when the stereo cameras pair are used to create a passthrough visualization. For clarity, a passthrough visualization that is accurate (i.e. To clarify, a passthrough visualization (i.e. one that is free from distortions) should have a center-line perspective that parallels the user’s left perspective. To achieve accurate passthrough visualizations, the center line perspective of a right passthrough visualization must parallel the center-line perspective for the user’s right.

The embodiments apply epipolar transforms 905 on the images taken by the right and left cameras to correct these distortions. The epipolar transform 905 adjusts/re-aligns a left-side image so that its center-line perspective is parallel to that of an image taken by the right camera. Such an alteration/re-alignment is shown in FIG. 9.”

“The epipolar transforms 905 aligns/aligns the center line perspective 915 in an image taken by the left camera so it is parallel to the centerline perspective 920 in an image taken by the right camera. The right angle 925 illustrates this parallel nature. Two new images are generated by the epipolar transforms 905. Two new images are created by the epipolar transforms 905. These images are transformed to have parallel center-line perspectives.

“To perform the epipolar transformations 905, the embodiments perform one, two, or three rotational transforms and/or translation transforms. These are three types of two-dimensional transforms that alter the canvas bitmap to generate a new image. These transforms include translation and scaling transformations. They are all well-known in the art, so they will not be covered in detail in this disclosure.

“After the camera distortion corrections 805 in FIG. 8 and the epipolar transformations 905 (FIG. 8 and the epipolar transforms 905 of FIG. 10.”

“In particular, FIG. The resulting depth map 1000 is shown in FIG. 10. The depth map 1000 is composed of multiple three-dimensional coordinates 1005, each coordinate representing a single pixel within a number of pixels that make up a particular image. This particular image is created by combining (1) a left image with both a camera distortion and an epipolar transformation and (2) a right picture that has been subject to both a camera correction and an Epipolar transform. The depth map 1000 shows distances between stereo cameras and objects in the environment.

It will be appreciated that the depthmap 1000 can be used as a disparity estimation. Image depth can be calculated by analysing and estimating the difference between two images. This is similar to how a human perceives depth. The depth map can include pixel depth coordinates or depth transform data that are used to determine the relative depth of each individual pixel in passthrough visualizations. The depth map may be a partial depth map, which does not include all pixels but a limited number of pixel depth data.

“The embodiments calculate depth by computing the disparity, i.e. The observed displacement) for the corresponding pixels in the images. Two sources of observation (i.e. Two observing sources (i.e. cameras) are involved. This is similar to the scenario where a user uses both eyes in order to see the displacement. A disparity value can then be calculated for every pixel of an image. Let’s say that a person was to examine her finger with both eyes. The person can observe displacements in her finger’s location if she closes one eye and looks at the finger with the other. The observed displacement will be large if the finger is close to the eyes. However, if the finger is too far from the eyes of the person, the observed displacement will be small. The observed displacement is therefore proportional to the distance from an observing source (e.g. an eye). The embodiments can then calculate the depth of each pixel by using the difference between the offset cameras. A three-dimensional model, also known as the depth map 1000, can be generated. The depth map 1000 can be used to generate the three-dimensional coordinates of each pixel or subset thereof in an image.

“FIG. FIG. 11 is another example of FIG. 10 (because FIG. 10 (because FIG. 10 are similar to FIG. FIG. FIG. 11 illustrates various three-dimensional coordinates 1100 which are included in depth maps. The depth map’s three dimensional coordinates, as mentioned above, at most depict a depth value for an image’s pixels. Some of the individual three-dimensional coordinates 1100 include example coordinate values of 1, 2, 4, 8, 9, and 9. These values should be used only for illustration and not considered to be actual values. FIG. 11 is an illustration of this situation. FIG. 11 is a simple example, and should not be taken as limiting the possibilities.

Some embodiments reduce the size of the left and right images in order to decrease the computing resource required for creating the depth maps. Some embodiments filter the depth map to remove certain three-dimensional coordinate values. The embodiments create a smooth? by downscaling images and filtering depth maps. By performing the temporal smoothing actions filtering and downscaling, the embodiments create a depth map.

“Downscaling left and right images can be done at any time before the depth map is created. The downscaling can occur, for example, immediately after raw images of an environment are taken by the cameras. The downscaling could also occur after corrections for camera distortions have been made to the images. Further, downscaling can occur after epipolar transformations have been applied to the images. The downscaling results in images with a lower resolution, regardless of the time. This lower resolution means that less computing resources are needed to create or use the depth maps. The depth map is created by an image that is a combination of both the left and right images.

“In addition, downscaling may be used to filter the depth map to reduce noise. Indeed, FIG. FIG. 12 illustrates an example of a depth map filtering scenario.

“As shown at FIG. 12 have been removed some three-dimensional coordinates. Some three-dimensional coordinates 1200, for example, are those with a value of??8 or?9?. For example, some three-dimensional coordinates 1200 (e.g. those with a value of?8? oder?9?)) have been removed from the depth map. have been removed from depth map. A smooth? depth map is created by the filtering (and/or downscaling). Because it is less noisy, depth maps are created.

“Noise in depth maps can negatively impact user experience. Noise can be caused by a large difference in the coordinate values of neighboring pixels. For example, the three-dimensional coordinates 1110 include coordinates with a value of 9. That are located adjacent to coordinates with a value of 1. The large disparity between 9 and 1 will be visible as noise in the passthrough visualizations. You will see that the values 9 & 1 (as well as the difference/disparity 9 & 1) are only examples and should not be considered to be limiting.

It is therefore a good idea to remove any noise. Flickering is a phenomenon that occurs when noise is present in the depth maps. Flickering is when adjacent (i.e. Flickering occurs when pixels that are adjacent (i.e. This causes pixels (which represent objects in passthrough visualizations) to have very different depths. The objects then ‘flicker. These differences in depth can cause obvious distortions to the quality of passthrough visualizations. The embodiments remove neighboring coordinates with similar depth values to increase the quality and realisability of passthrough visualizations. ?Similar,? This connotation means that neighboring pixels have coordinate values within a certain threshold value of one another. You will also notice that the depth map may filter out these coordinates and insert new coordinates in their place. These new coordinates would meet the threshold requirement. Some embodiments substitute the filtered coordinates by selected coordinates that meet the threshold requirement.

“An additional example would be useful. Some of the three-dimensional coordinates 11100 in FIG. 11 have values of ‘1,? ?2,? ?3,? ?4,? ?8,? ?8,? Some of the ‘9?? coordinates are also available. Some of the?9? coordinates are also adjacent to?1?? coordinates, ?2? coordinates, ?3? coordinates, and/or ?4? coordinates. Some of the?8? coordinates are also available. Coordinates are placed in similar locations. Because of the large depth difference between a?1 and a?9?, coordinates are placed in similar situations. A?1? coordinate is different from a??9?? The resulting passthrough images will show depth flickering (i.e. An object will appear to have multiple clearly different depths. This is a problem. As shown in FIG. FIG. 12.12 shows the?8? coordinates and the?9????????????????????????????????? coordinates. 12, the?8? coordinates and?9? Coordinates and the?9? Lower value coordinates that do not meet the threshold for neighboring coordinates have been removed from the depth map. These coordinates were removed from the depth map because they were too close to coordinates with lower depth values. In certain cases, embodiments might insert new coordinates after filtering. These new coordinates are selected to satisfy the threshold requirement. A smooth depth map will be generated.

“Some embodiments smooth the depth map by analyzing it to identify a grouping of neighboring coordinates that are associated with the corresponding pixels. Once this group of neighboring coordinates has been identified, it is possible to determine if all the coordinates within that group fall within a certain ‘depth threshold value. or standard deviation. If any of the coordinates do not fall within this threshold, they are removed and replaced with new values. The embodiments eliminate noise and prevent flickering. The depth threshold is predetermined. It will be obvious. It can be adjusted or configured in some cases.

After generating the depth maps, the left and right images are reprojected to align the center-line perspectives of the left and right images with the user?s left pupils and the center-line perspective for the right image with the user?s right pupils. FIG. 13 illustrates this reprojection operation. 13.”

“In particular, FIG. FIG. 13 illustrates a reprojection operation 1300. Before this reprojection operation 1300, several processes were used: (1) determining the interpupil distance between the user and the surrounding environment; (2) taking left and/or right camera images of that environment to capture left and/or right camera images. (3) Applying camera distortion corrections on the raw camera images to correct left or right images. (4) Applying epipolar transforms (which may include smoothing) to the corrected left/right images. (5) Generating a depth map by combining left and/or right images. Before the depth map generation, downsampling can also be used.

“The camera reprojection is dependent on the measurement of the user’s IPD distance. This will be explained later. The following methods can be used to measure the user’s IPD distance: direct observation of their eyes using an eye-facing camera; observation of glints off an LED array off their eyes; mechanical estimation due to how the display is placed relative to the user’s head (i.e. A mechanical sensor that determines the position of the lens and display on the user’s head.

“Although there have been many optimizations and corrections, the left and right images that were transformed are not yet ready for display as passthrough images. It is worth reiterating the earlier discussion. It was noted that stereo cameras have a baseline distance of at least 7cm. This means that the left camera must be at least 7cm from the right camera. As mentioned, interpupil distances for most people are between 55 and 69 millimeters. The stereo cameras’ baseline (i.e. The distance between the left and right cameras is larger than that of human eyes (i.e. The interpupil distance means that the images will look blurred to the user even after performing the five operations mentioned above. It is therefore necessary to “reproject?” the images. The images must reflect the perspective of the user. This perspective is affected by interpupil distance. The embodiments project the left and right images.

A reprojection operation is basically altering an image to make it appear as though it was taken by a camera in a different location. FIG. 13 illustrates this action. 13.”

“FIG. 13 shows a left camera 1305A. To correct for camera distortion, optimizations were done to the images taken by this left camera 1305A. The images were also subject to epipolar transformations. The left camera 1305A is actually angled inward, as shown by FIG. 605, the left camera 605 and FIG. 6), the epipolar transformations applied to the images of the left cam 1305A caused the center-line perspective for the left camera’s left camera to be parallel to that of the right image’s.

In other words, the epipolar transformations made the images of the left camera 1305A appear as though they were taken by a parallel camera to the right. To reflect the fact that the left camera1305A has already applied epipolar transforms to images taken by it, the left camera1305A is shown as a dashed object. The left camera 1305A, in fact, is angled outward just like FIG. 605’s left camera 605. 6.”

The images from the left camera 1305A were altered by performing a reprojection operation 1300. The images are modified so that the user can see that the camera is actually at a different location. This is the difference between pupil location 1315A or 1315B. For clarity, pupil location 1315A refers to the location of the left pupil of the user. While pupil location 1315B refers to the location of the right pupil of the user. The user’s interpupil Distance (i.e. Distance between the pupil location1315A and 1315B. The images taken by the left camera 1305A have been altered to appear as if they were actually taken by a camera (i.e. The?simulated? the?simulated,? left camera 1310) was located close to (i.e. A predetermined distance from or rather in front (or both) of pupil location 1315A.

“In some cases, the reprojected right camera 1310 aligns with the center-line perspective for the user’s left pupils (similar processes are used for the right camera or right pupil). The reprojection operation 1300 modifies a camera’s images to appear as though they were taken by another camera. The reprojection operation 1300 corrects depth disparities caused by differences in distance between users’ pupils. Similar reprojection operations can be performed with the right camera 1305B. Each dimension point (i.e. Every dimension point (i.e. pixel) in the transformed left and transformed right images is on the same horizontal scale. This is because all pixels in the transform left image are on the same horizontal scale that all pixels in the transform right image. This allows for the creation of a depth map by searching both the transformed left and transformed right images to find the corresponding pixels. Because the pixels are all on the same horizontal scale, the search can only be performed horizontally (i.e. A one-dimensional search. Because it is a one-dimensional search, it takes very little computing resources and can be done quickly.

“As shown at FIG. 13 It is recommended to choose a location for the new?reprojected? left camera 1310 (and corresponding reprojected camera right, which is shown but has not been labeled). These locations are chosen based on the location of the pupil. The disclosure will present a set of equations that can be used to locate the?reprojected’ location. cameras. The cameras aren’t actually being moved. This will be obvious. New images are created instead in a way that makes the images look as though they were taken by the camera at the new location described above.

“The first equation below shows a relationship between left camera 1305A and reprojected right camera 1310, as well as the right camera 1305B and left pupil location 1315A. [0,1]”

“Where ??? “Where????” is the ratio between (1) left camera 1305A, the left pupil location 1315A, and (2) left camera 1305A or right camera 1305B. If????, If?????? is?0?,? Then the reprojected right camera 1310 will be placed at the same spot as the left cam 1305A. If???? is?1,? then the reprojected left camera 1310 will be positioned at the same location as the left camera 1305A. If?????? is?1, then Then the left camera 1310 will project at the same spot as the right camera 1305B. Accordingly, ??? Accordingly,??? Later, ??? Later,?????? will be used for the generation of the resulting location to the reprojected right camera 1310 (i.e. The resulting location for the reprojected left camera 1310 (i.e.,.) is used to generate an image.

The second equation (shown below) is used to calculate the location of the reprojected right camera 1310. The camera images undergo various operations. These operations change the image’s center-line perspective. For example, when the first image was captured, the cameras were oriented at different angles to each other. The result was that the center-line perspectives did not align with each other. To generate a passthrough visualization that is user-friendly, it is necessary to change the center-line perspective so that they match the user’s perspective. This is determined by the pupil locations and distances. The disclosure refers to the “reprojected location”. Cameras (or camera images), are not being moved. Instead, image data is transformed to appear as though it was captured at a different place than it actually is. The following equation is used for selecting the location of the resulting projected camera (or camera picture). *C L +? *C R”

“Where Csyn represents the resulting location of the reprojected right camera 1310, CL indicates the actual position for the left camera 1305A and CR indicates the actual position for the right camera 1305B. ? ???? was previously defined. This equation determines the location of the reprojected cameras.

Once the location has been determined, the embodiments then use the depth map for the left and right passthrough visualizations. The depth map is generated by identifying the pixels that correspond to the left and right transformations. After the corresponding pixels have been identified in both images, the embodiments then calculate the displacement between the coordinates in the left and right images. This is called a “disparity”. This displacement is also known as a?disparity? The?d? represents the?ul? in the equation below. The?ul? The?ul?? and?ur? variables are related. Variables represent the x coordinates (i.e. Horizontal coordinates) for each pixel of the left-transformed image and the corresponding pixel of the right-transformed image. Accordingly, ?d? ?d? is the difference between the corresponding pixels of the two images in x-coordinates (i.e. The transformed left image and transformed right images. In other words, the computed disparity at (ul) is d. ?d? ?d? represents individual portions of the depthmap).

The fourth equation is shown below. This allows you to derive the left passthrough visualization from the transformed left picture. The following equation is an example of a depth map that can be used to generate individual portions of the left-passthrough visualization. l +? *d”

“As shown above, ul is a pixel coordinate in the left transformed image and u?l the pixel coordinate in the left passthrough visualization. The reprojected picture.

“Now, we will be focusing our attention on FIG. 14 provides a high-level overview and description of the principles discussed herein.

Summary for “IPD correction, reprojection for accurate mixed-reality object placement”