Metadata in Your Files May Reveal Hidden Personal Information You Didn’t Know About


Did you know that many files contain hidden information that may include personal data? The hidden information is called “metadata” and it may reveal more about you than you realize. Here is what metadata is and how to edit or remove it.

In addition to the actual content of a file, there is information about the contents of the file – data about the data. This type of information is called metadata. Some of it, such as file size and date of creation, is familiar, but much else may be stored as metadata. For example, photographers are probably familiar with the fact that an image file can hold information such as the name of the photographer, when and where a picture was taken, and details of the camera used. Some kinds of metadata are hidden, and you have to look for them to see them. Here are some ways to find metadata.
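For readers comfortable with a little scripting, the familiar file-system metadata mentioned above (size and dates) can also be read programmatically. The following is only a minimal Python sketch using the standard library; the function name basic_file_metadata is made up for this illustration.

```python
import os
from datetime import datetime, timezone


def basic_file_metadata(path):
    """Return the size and timestamps the file system keeps for a file."""
    info = os.stat(path)

    def as_utc(timestamp):
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)

    return {
        "size_bytes": info.st_size,
        "modified": as_utc(info.st_mtime),
        "accessed": as_utc(info.st_atime),
        # On Windows st_ctime has traditionally reported the creation time;
        # on Unix it is the time of the last metadata change.
        "created_or_changed": as_utc(info.st_ctime),
    }
```

This only covers what the file system records. Metadata stored inside the file itself, such as author names, is not visible this way.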

Viewing and editing metadata from Windows (File) Explorer

Some, but not all, metadata can be edited using Explorer. Right-click a file, choose “Properties” in the context menu, and click the “Details” tab. A window with various kinds of metadata will open. The type of metadata shown depends on the file type and on your version of Windows. Some of the metadata may be editable and can be deleted if you choose. For Windows 7 and 8.x, the Properties-Details window has a link at the bottom, “Remove Properties and Personal Information”. Click this link and a dialog box will open where you can choose any personal information to delete. For many file types, of course, there may be nothing that is editable. The procedure is described at this Microsoft TechNet article.

In Windows XP, it is possible to add a tag or comments to any file’s metadata. See this article for some details. Starting with Vista, Microsoft limited the availability of editable metadata to certain types of files such as images, music, and Microsoft Office files.

Metadata in Microsoft Office documents and spreadsheets

Office files are of particular interest since they can contain considerable amounts of personal information. Your name, your company, name of your computer, your collaborators, revisions, and much else may be included. Much of this is buried in the file and you may not be aware of it.

Various versions of Office have a “Document Inspector” that allows removal of personal data from Office documents and spreadsheets. The path depends somewhat on which version you have but, in Office 2010/2013, Document Inspector is opened from: File->Info->Check for Issues->Inspect Document. When the Document Inspector dialog opens, follow the instructions on how to delete personal data. You can also check out the detailed discussion of removing personal data in Office documents that is given at this Microsoft page.
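To see some of this buried information for yourself, note that the newer Office formats (.docx, .xlsx, .pptx) are ZIP archives, and the author-related properties normally sit in a small XML part inside them. The Python sketch below is an illustration only: it assumes the part is at its usual location, docProps/core.xml, and the function name read_core_properties is hypothetical. It reads properties but does not remove them; Document Inspector remains the tool for that.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the core-properties part of an Office Open XML file.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Friendly name -> element to look up (an illustrative selection, not a full list).
FIELDS = {
    "creator": "dc:creator",
    "last_modified_by": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
    "revision": "cp:revision",
}


def read_core_properties(office_file):
    """Return selected core properties from a .docx/.xlsx/.pptx path or file object."""
    with zipfile.ZipFile(office_file) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")  # assumed usual location
        except KeyError:
            return {}  # the package has no core-properties part
    root = ET.fromstring(xml_bytes)
    found = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            found[name] = element.text
    return found
```

Calling read_core_properties("report.docx") on a document of your own (the file name is just an example) returns a dictionary of whatever properties are present.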

Metadata in PDF files

PDF files can contain metadata that is not discoverable by using Windows Explorer. Instead, a PDF reader is required. Adobe Reader allows you to view and edit certain metadata such as the author’s name. Other PDF readers will have a way to view metadata but may or may not provide for editing or deleting it. For example, Sumatra only allows you to view metadata. The general procedure in PDF readers for revealing metadata is to open the menu File->Properties. Editing PDF metadata with Adobe Reader is described at this link.

And there you have it - how to keep your files from revealing more than you want.

Ok, so mp3 etc has ID3v1 / v2... is this the same 'metadata structure' on wave, flac? what about aac, caf? ie is it a universal set of available 'records'?

I can't add a comment to certain filetypes (unless I use alternate data streams), but where's the info re which filetypes 'natively' support which 'tags'?

Presumably there's tagtypes, ie FS-tags (date created, modified....)

Does anyone know where the details (metadata available on ... types of filetypes?)

Let's say I want to make my own audio filetype... called .BOOM
is it a particular propertyhandler that needs to be registered ?
ie .mp3 has a persistent handler, {098f2470-bae0-11cd-b579-08002b30bfeb}, the 'null p.handler'

Where is the record&association between (for example) 'ID3-Friendly Audio FileTypes' & 'tagset' ?

.caf is core audio, ie apple audio, and has no tag options in windows... probs cause it's "not known", but apparently, it's a wrapper file-type...

CAF files serve as wrappers for a wide variety of audio data formats. The flexibility of the CAF file structure and the many types of metadata that can be recorded enable CAF files to be used with practically any type of audio data. Furthermore, CAF files can store any number of audio channels.

Support for many types of auxiliary data
In addition to audio data, CAF files can store text annotations, markers, channel layouts, and many other types of information that can help in the interpretation, analysis, or editing of the audio.

Support for data dependencies
Certain metadata in CAF files is linked to the audio data by an edit count value. You can use this value to determine when metadata has a dependency on the audio data and, furthermore, when the audio data has changed since the metadata was written.

CAF File Structure

CAF files begin with a file header, which identifies the file type and the CAF version, followed by a series of chunks. A chunk consists of a header, which defines the type of the chunk and indicates the size of its data section, followed by the chunk data. The nature and format of the data is specific to each type of chunk.

The only two chunk types required for every CAF file are the Audio Data chunk (which, as you might have guessed, contains the audio data) and the Audio Description chunk, which specifies the audio data format.

The Audio Description chunk must be the first chunk following the file header. The Audio Data chunk can appear anywhere else in the file, unless the size of its data section has not been determined. In that case, the size field in the Audio Data chunk header is set to -1 and the Audio Data chunk must come last in the file so that the end of the audio data chunk is the same as the end of the file. This placement allows you to determine the data section size when that information is not available in the size field.

Audio is stored in the Audio Data chunk as a sequential series of packets. An audio packet in a CAF file contains one or more frames of audio data.

CAF supports a wide range of other chunk types, which can be placed in any order in the file except first (reserved for the Audio Description chunk) or last (when the Audio Data chunk size field is set to -1). Some chunk types can be used more than once in a file. Some refer to—or are referred to by—chunks of other types.
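The header-then-chunks layout described above can be walked with a few lines of Python. This is a sketch under assumptions that are not in the quoted text: it takes the field widths from Apple's CAF specification as I understand it (an 8-byte file header of 4-byte type 'caff', 2-byte version and 2-byte flags, and a 12-byte chunk header of 4-byte type plus a signed 64-bit size, all big-endian). The function name list_caf_chunks is made up for the example.

```python
import io
import struct


def list_caf_chunks(stream):
    """Return (chunk_type, size) pairs for each chunk in a binary CAF stream."""
    header = stream.read(8)
    if len(header) < 8:
        raise ValueError("too short to be a CAF file")
    file_type, version, flags = struct.unpack(">4sHH", header)
    if file_type != b"caff":
        raise ValueError("not a CAF file")

    chunks = []
    while True:
        chunk_header = stream.read(12)
        if len(chunk_header) < 12:
            break  # end of file
        chunk_type, size = struct.unpack(">4sq", chunk_header)
        chunks.append((chunk_type.decode("ascii", "replace"), size))
        if size < 0:
            # A size of -1 means "unknown": the Audio Data chunk then runs
            # to the end of the file, so there is nothing left to walk.
            break
        stream.seek(size, io.SEEK_CUR)  # skip over the chunk's data section
    return chunks
```

Opened on a real file with open(path, "rb"), the first entry should be the Audio Description chunk ('desc'), and any text annotation or marker chunks show up by their four-character type codes.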

So it's another MS-iceberg thing, where the registry only gives you the headlines (.mp3's are associated with mediamonkey and the options yadayada... nada on the 'deets on the ID3 shiz')

Is there a good freeware tool to remove all metadata from any files?

Don't forget to mention the EXIF metadata in photographs. It can identify your camera make/model/serial number. With smartphone pictures, it can contain the geolocation where the photo was taken. Most photo editor software offers save options that will rewrite your jpeg files without this information.
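For the curious, here is roughly what "rewriting a jpeg without this information" involves. In a JPEG file the EXIF block normally lives in an APP1 segment whose payload begins with "Exif" followed by two zero bytes, so a tool can copy every segment except that one. The Python sketch below is a simplified illustration, not a replacement for a proper tool: the function name strip_exif is hypothetical, it ignores padding bytes and other rare marker layouts, and it leaves other kinds of embedded metadata (XMP, IPTC, comments) in place.

```python
import struct


def strip_exif(jpeg_bytes):
    """Return a copy of JPEG data with Exif APP1 segments removed."""
    if jpeg_bytes[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream")
    out = bytearray(jpeg_bytes[:2])  # keep the Start of Image marker
    pos = 2
    while pos + 4 <= len(jpeg_bytes):
        if jpeg_bytes[pos] != 0xFF:
            raise ValueError("unexpected byte where a marker was expected")
        marker = jpeg_bytes[pos + 1]
        if marker == 0xDA:
            # Start of Scan: compressed image data follows, copy the rest as-is.
            break
        # The 2-byte length counts itself but not the 2-byte marker.
        (length,) = struct.unpack(">H", jpeg_bytes[pos + 2:pos + 4])
        segment_end = pos + 2 + length
        payload = jpeg_bytes[pos + 4:segment_end]
        is_exif = marker == 0xE1 and payload.startswith(b"Exif\x00\x00")
        if not is_exif:
            out += jpeg_bytes[pos:segment_end]
        pos = segment_end
    out += jpeg_bytes[pos:]
    return bytes(out)
```

Work on a copy of the photo rather than the original, since a sketch like this has not been tested against every camera's output.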

A fugitive executive (of an anti-virus software company, no less!) was tracked down in Central America when a news agency forgot to clean the metadata from a photo.

Metadata in photographs is mentioned in the second paragraph of the article.

This may be slightly off topic but remember, if distributing straight Office or Libre/Open Office WP files, all your deletions may still be in the file, due to the unlimited levels of undo.

I believe these programs have options to cleanse the files before distribution, but another option is to "Save As" the file to a previous version. For example, if you're using Office 2013, save the file as Office 2010. You'll often see the file size diminish by kilobytes to tens of kilobytes.

A very useful article for the slightly paranoid amongst us! For all my efforts to mask my activities, identity, and details about my life, there is always one more rock to look under. I will also check out Jojo's utility. My 5-star appreciation to you both.

Nirsoft's AlternateStreamView is another useful utility.

A good article Vic. A handy and easy-to-use tool I often like is BeCyPDFMetaEdit for editing metadata of pdf files. See Set Viewer Preferences and Metadata

Thanks, Jojo. That is a useful utility to know about.