pydicom.dataset.Dataset
-
class pydicom.dataset.Dataset(*args, **kwargs)
Contains a collection (dictionary) of DICOM Data Elements. Behaves like a dict.
Note
Dataset is only derived from dict to make it work in a NumPy ndarray. The parent dict class is never called, as all dict methods are overridden.
Examples
Add an element to the Dataset (for elements in the DICOM dictionary):
>>> ds = Dataset()
>>> ds.PatientName = "CITIZEN^Joan"
>>> ds.add_new(0x00100020, 'LO', '12345')
>>> ds[0x0010, 0x0030] = DataElement(0x00100030, 'DA', '20010101')
Add a sequence element to the Dataset:
>>> ds.BeamSequence = [Dataset(), Dataset(), Dataset()]
>>> ds.BeamSequence[0].Manufacturer = "Linac, co."
>>> ds.BeamSequence[1].Manufacturer = "Linac and Sons, co."
>>> ds.BeamSequence[2].Manufacturer = "Linac and Daughters, co."
Add private elements to the Dataset:
>>> block = ds.private_block(0x0041, 'My Creator', create=True)
>>> block.add_new(0x01, 'LO', '12345')
Updating and retrieving element values:
>>> ds.PatientName = "CITIZEN^Joan"
>>> ds.PatientName
'CITIZEN^Joan'
>>> ds.PatientName = "CITIZEN^John"
>>> ds.PatientName
'CITIZEN^John'
Retrieving an element’s value from a Sequence:
>>> ds.BeamSequence[0].Manufacturer
'Linac, co.'
>>> ds.BeamSequence[1].Manufacturer
'Linac and Sons, co.'
Accessing the DataElement items:
>>> elem = ds['PatientName']
>>> elem
(0010, 0010) Patient's Name PN: 'CITIZEN^John'
>>> elem = ds[0x00100010]
>>> elem
(0010, 0010) Patient's Name PN: 'CITIZEN^John'
>>> elem = ds.data_element('PatientName')
>>> elem
(0010, 0010) Patient's Name PN: 'CITIZEN^John'
Accessing a private DataElement item:
>>> block = ds.private_block(0x0041, 'My Creator')
>>> elem = block[0x01]
>>> elem
(0041, 1001) Private tag data LO: '12345'
>>> elem.value
'12345'
Alternatively:
>>> ds.get_private_item(0x0041, 0x01, 'My Creator').value
'12345'
Deleting an element from the Dataset:
>>> del ds.PatientID
>>> del ds.BeamSequence[1].Manufacturer
>>> del ds.BeamSequence[2]
Deleting a private element from the Dataset:
>>> block = ds.private_block(0x0041, 'My Creator')
>>> if 0x01 in block:
...     del block[0x01]
Determining if an element is present in the Dataset:
>>> 'PatientName' in ds
True
>>> 'PatientID' in ds
False
>>> (0x0010, 0x0030) in ds
True
>>> 'Manufacturer' in ds.BeamSequence[0]
True
Iterating through the top level of a Dataset only (excluding Sequences):
>>> for elem in ds:
...     print(elem)
(0010, 0010) Patient's Name PN: 'CITIZEN^John'
Iterating through the entire Dataset (including Sequences):
>>> for elem in ds.iterall():
...     print(elem)
(0010, 0010) Patient's Name PN: 'CITIZEN^John'
Recursively iterate through a Dataset (including Sequences):
>>> def recurse(ds):
...     for elem in ds:
...         if elem.VR == 'SQ':
...             # Each item of a sequence is itself a Dataset
...             for item in elem.value:
...                 recurse(item)
...         else:
...             # Do something useful with each DataElement
...             pass
Converting the Dataset to and from JSON (from_json() is a classmethod, so it is called on the class and returns a new Dataset):
>>> ds = Dataset()
>>> ds.PatientName = "Some^Name"
>>> jsonmodel = ds.to_json()
>>> ds2 = Dataset.from_json(jsonmodel)
>>> ds2
(0010, 0010) Patient's Name PN: 'Some^Name'
-
default_element_format
The default formatting for string display.
Type: str
-
default_sequence_element_format
The default formatting for string display of sequences.
Type: str
-
indent_chars
For string display, the characters used to indent nested Sequences. Default is " ".
Type: str
-
is_little_endian
Shall be set before writing with write_like_original=False. The Dataset (excluding the pixel data) will be written using the given endianness.
Type: bool
-
is_implicit_VR
Shall be set before writing with write_like_original=False. The Dataset will be written using the transfer syntax with the given VR handling, e.g. Little Endian Implicit VR if True, and Little Endian Explicit VR or Big Endian Explicit VR (depending on Dataset.is_little_endian) if False.
Type: bool
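The relationship between the two flags and the uncompressed transfer syntaxes can be sketched as a small lookup table. This is an illustration only: the helper name uncompressed_syntax is hypothetical and is not part of pydicom. The UIDs are those defined by the DICOM Standard.

```python
# Hypothetical helper (not a pydicom function): map the two encoding flags
# to the uncompressed transfer syntax they describe.
UNCOMPRESSED_SYNTAXES = {
    # (is_implicit_VR, is_little_endian): (UID, name)
    (True, True): ("1.2.840.10008.1.2", "Implicit VR Little Endian"),
    (False, True): ("1.2.840.10008.1.2.1", "Explicit VR Little Endian"),
    (False, False): ("1.2.840.10008.1.2.2", "Explicit VR Big Endian"),
}


def uncompressed_syntax(is_implicit_vr, is_little_endian):
    """Return (UID, name) for the flag pair, or raise ValueError."""
    key = (bool(is_implicit_vr), bool(is_little_endian))
    if key not in UNCOMPRESSED_SYNTAXES:
        # DICOM defines no Implicit VR Big Endian transfer syntax.
        raise ValueError("Implicit VR Big Endian is not a DICOM transfer syntax")
    return UNCOMPRESSED_SYNTAXES[key]


implicit_le = uncompressed_syntax(True, True)
explicit_be = uncompressed_syntax(False, False)

try:
    uncompressed_syntax(True, False)
    rejected = False
except ValueError:
    rejected = True
```

Note that the (False, True) pair is also the dataset encoding used by compressed transfer syntaxes, so the flags alone do not always identify a single UID.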
Methods
__init__(*args, **kwargs) – Create a new Dataset instance.
add(data_element) – Add an element to the Dataset.
add_new(tag, VR, value) – Create a new element and add it to the Dataset.
clear() – Delete all the elements from the Dataset.
convert_pixel_data([handler_name]) – Convert pixel data to a numpy.ndarray internally.
copy()
data_element(name) – Return the element corresponding to the element keyword name.
decode() – Apply character set decoding to the elements in the Dataset.
decompress([handler_name]) – Decompresses Pixel Data and modifies the Dataset in-place.
dir(*filters) – Return an alphabetical list of element keywords in the Dataset.
elements() – Yield the top-level elements of the Dataset.
ensure_file_meta() – Create an empty Dataset.file_meta if none exists.
fix_meta_info([enforce_standard]) – Ensure the file meta info exists and has the correct values for transfer syntax and media storage UIDs.
formatted_lines([element_format, …]) – Iterate through the Dataset yielding formatted str for each element.
from_json(json_dataset[, bulk_data_uri_handler]) – Add elements to the Dataset from DICOM JSON format.
fromkeys – Create a new dictionary with keys from iterable and values set to value.
get(key[, default]) – Simulate dict.get() to handle element tags and keywords.
get_item(key) – Return the raw data element if possible.
get_private_item(group, element_offset, …) – Return the data element for the given private tag group.
group_dataset(group) – Return a Dataset containing only elements of a certain group.
items() – Return the Dataset items to simulate dict.items().
iterall() – Iterate through the Dataset, yielding all the elements.
keys() – Return the Dataset keys to simulate dict.keys().
overlay_array(group) – Return the Overlay Data in group as a numpy.ndarray.
pop(key, *args) – Emulate dict.pop() with support for tags and keywords.
popitem() – Remove and return a (key, value) pair as a 2-tuple.
private_block(group, private_creator[, create]) – Return the block for the given tag group and private_creator.
private_creators(group) – Return a list of private creator names in the given group.
remove_private_tags() – Remove all private elements from the Dataset.
save_as(filename[, write_like_original]) – Write the Dataset to filename.
set_original_encoding(is_implicit_vr, …) – Set the values for the original transfer syntax and encoding.
setdefault(key[, default]) – Emulate dict.setdefault() with support for tags and keywords.
to_json([bulk_data_threshold, …]) – Return a JSON representation of the Dataset.
to_json_dict([bulk_data_threshold, …]) – Return a dictionary representation of the Dataset conforming to the DICOM JSON Model as described in the DICOM Standard, Part 18, Annex F.
top() – Return a str representation of the top level elements.
trait_names() – Return a list of valid names for auto-completion code.
update(dictionary) – Extend dict.update() to handle DICOM tags and keywords.
values() – Return the Dataset values to simulate dict.values().
walk(callback[, recursive]) – Iterate through the Dataset's elements and run callback on each.
Attributes
default_element_format
default_sequence_element_format
indent_chars
is_original_encoding – Return True if the encoding to be used for writing is set and is the same as that used to originally encode the Dataset.
pixel_array – Return the pixel data as a numpy.ndarray.
-
add(data_element)
Add an element to the Dataset.
Equivalent to ds[data_element.tag] = data_element
Parameters: data_element (dataelem.DataElement) – The DataElement to add.
-
add_new(tag, VR, value)
Create a new element and add it to the Dataset.
Parameters:
- tag – The DICOM (group, element) tag in any form accepted by Tag() such as [0x0010, 0x0010], (0x10, 0x10), 0x00100010, etc.
- VR (str) – The 2 character DICOM value representation (see DICOM Standard, Part 5, Section 6.2).
- value – The value of the data element. One of the following:
  - a single string or number
  - a list or tuple with all strings or all numbers
  - a multi-value string with backslash separator
  - for a sequence element, an empty list or list of Dataset
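The tag forms listed for add_new() all name the same 32-bit number: the group occupies the upper 16 bits and the element the lower 16 bits. The following sketch shows that arithmetic in plain Python. The helper to_tag_int is hypothetical and only models the packing; it is not pydicom's Tag().

```python
def to_tag_int(tag):
    """Pack a tag given as an int or a (group, element) pair into one int.

    Hypothetical helper for illustration; it only models the bit layout.
    """
    if isinstance(tag, int):
        if not 0 <= tag <= 0xFFFFFFFF:
            raise ValueError("a combined tag must fit in 32 bits")
        return tag
    group, element = tag
    if not (0 <= group <= 0xFFFF and 0 <= element <= 0xFFFF):
        raise ValueError("group and element must each fit in 16 bits")
    # Group in the upper 16 bits, element in the lower 16 bits.
    return (group << 16) | element


# The three spellings used in the parameter description above.
forms = [[0x0010, 0x0010], (0x10, 0x10), 0x00100010]
packed = [to_tag_int(form) for form in forms]

# Split the number back into the familiar "(gggg, eeee)" display form.
label = "({:04X}, {:04X})".format(packed[0] >> 16, packed[0] & 0xFFFF)
```

All three entries of packed are equal, and label holds the conventional text form of the tag.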
-
convert_pixel_data(handler_name='')
Convert pixel data to a numpy.ndarray internally.
Parameters: handler_name (str, optional) – The name of the pixel handler that shall be used to decode the data. Supported names are: 'gdcm', 'pillow', 'jpeg_ls', 'rle' and 'numpy'. If not used (the default), a matching handler is used from the handlers configured in pixel_data_handlers.
Returns: Converted pixel data is stored internally in the dataset.
Return type: None
Raises:
- ValueError – If handler_name is not a valid handler name.
- NotImplementedError – If the given handler or any handler, if none given, is unable to decompress pixel data with the current transfer syntax.
- RuntimeError – If the given handler, or the handler that has been selected if none given, is not available.
Notes
If the pixel data is in a compressed image format, the data is decompressed and any related data elements are changed accordingly.
-
data_element(name)
Return the element corresponding to the element keyword name.
Parameters: name (str) – A DICOM element keyword.
Returns: For the given DICOM element keyword, return the corresponding DataElement if present, None otherwise.
Return type: dataelem.DataElement or None
-
decode()
Apply character set decoding to the elements in the Dataset.
See DICOM Standard, Part 5, Section 6.1.1.
-
decompress(handler_name='')
Decompresses Pixel Data and modifies the Dataset in-place.
Changed in version 1.4: The handler_name keyword argument was added.
If the transfer syntax is not compressed, the pixel data is converted to a numpy.ndarray internally, but not returned.
If the pixel data is compressed, it is decompressed using an image handler, and internal state is updated appropriately:
- Dataset.file_meta.TransferSyntaxUID is updated to the non-compressed form
- is_undefined_length is False for the (7FE0,0010) Pixel Data element.
Parameters: handler_name (str, optional) – The name of the pixel handler that shall be used to decode the data. Supported names are: 'gdcm', 'pillow', 'jpeg_ls', 'rle' and 'numpy'. If not used (the default), a matching handler is used from the handlers configured in pixel_data_handlers.
Return type: None
Raises: NotImplementedError – If the pixel data was originally compressed but the file is not Explicit VR Little Endian as required by the DICOM Standard.
-
dir(*filters)
Return an alphabetical list of element keywords in the Dataset.
Intended mainly for use in interactive Python sessions. Only lists the element keywords in the current level of the Dataset (i.e. the contents of any sequence elements are ignored).
Parameters: filters (str) – Zero or more string arguments to the function. Used for case-insensitive match to any part of the DICOM keyword.
Returns: The matching element keywords in the dataset. If no filters are used then all element keywords are returned.
Return type: list of str
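The matching rule described for dir() (case-insensitive, any part of the keyword, any of the filters) can be sketched over a plain list of keywords. The function filter_keywords below is a hypothetical stand-in written for illustration; it is not the library's implementation.

```python
def filter_keywords(keywords, *filters):
    """Return the sorted keywords that contain any filter, ignoring case.

    Hypothetical stand-in for the rule described above; with no filters,
    every keyword is returned.
    """
    names = sorted(keywords)
    if not filters:
        return names
    lowered = [f.lower() for f in filters]
    return [name for name in names if any(f in name.lower() for f in lowered)]


keywords = ["PatientName", "PatientID", "BeamSequence", "Manufacturer"]

patient_matches = filter_keywords(keywords, "pat")
mixed_matches = filter_keywords(keywords, "SEQ", "manu")
everything = filter_keywords(keywords)
```

Here patient_matches holds the two Patient keywords, mixed_matches holds one match per filter, and everything is the full alphabetical list.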
-
elements()
Yield the top-level elements of the Dataset.
New in version 1.1.
Examples
>>> ds = Dataset()
>>> for elem in ds.elements():
...     print(elem)
The elements are returned in the same way as in Dataset.__getitem__().
Yields: dataelem.DataElement or dataelem.RawDataElement – The unconverted elements sorted by increasing tag order.
-
ensure_file_meta()
Create an empty Dataset.file_meta if none exists.
New in version 1.2.
-
fix_meta_info(enforce_standard=True)
Ensure the file meta info exists and has the correct values for transfer syntax and media storage UIDs.
New in version 1.2.
Warning
The transfer syntax for is_implicit_VR = False and is_little_endian = True is ambiguous and will therefore not be set.
Parameters: enforce_standard (bool, optional) – If True, a check for incorrect and missing elements is performed (see validate_file_meta()).
-
formatted_lines(element_format='%(tag)s %(name)-35.35s %(VR)s: %(repval)s', sequence_element_format='%(tag)s %(name)-35.35s %(VR)s: %(repval)s', indent_format=None)
Iterate through the Dataset yielding formatted str for each element.
Parameters:
- element_format (str) – The string format to use for non-sequence elements. Formatting uses the attributes of DataElement. Default is "%(tag)s %(name)-35.35s %(VR)s: %(repval)s".
- sequence_element_format (str) – The string format to use for sequence elements. Formatting uses the attributes of DataElement. Default is "%(tag)s %(name)-35.35s %(VR)s: %(repval)s".
- indent_format (str or None) – Placeholder for future functionality.
Yields: str – A string representation of an element.
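The default format is an ordinary Python %-style template with named fields. The sketch below applies it to a hand-written dictionary standing in for the DataElement attributes (tag, name, VR, repval), to show what the -35.35s field does: the name is padded to, and truncated at, 35 characters. The field values are invented for illustration.

```python
ELEMENT_FORMAT = "%(tag)s %(name)-35.35s %(VR)s: %(repval)s"

# Stand-in values; in the library these come from a DataElement.
fields = {
    "tag": "(0010, 0010)",
    "name": "Patient's Name",
    "VR": "PN",
    "repval": "'CITIZEN^John'",
}

# Short names are left-justified and padded to 35 characters.
line = ELEMENT_FORMAT % fields

# Names longer than 35 characters are cut off at 35.
long_line = ELEMENT_FORMAT % dict(fields, name="N" * 50)

# A custom template may use any subset of the same named fields.
compact = "%(tag)s %(VR)s %(repval)s" % fields
```

Because the name column has a fixed width, line and long_line have the same length and the VR column lines up across elements.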
-
classmethod from_json(json_dataset, bulk_data_uri_handler=None)
Add elements to the Dataset from DICOM JSON format.
New in version 1.3.
See the DICOM Standard, Part 18, Annex F.
Parameters:
- json_dataset (dict or str) – dict or str representing a DICOM Data Set formatted based on the DICOM JSON Model.
- bulk_data_uri_handler (callable, optional) – Callable function that accepts the “BulkDataURI” of the JSON representation of a data element and returns the actual value of data element (retrieved via DICOMweb WADO-RS).
Returns: Return type:
-
get(key, default=None)
Simulate dict.get() to handle element tags and keywords.
Parameters:
- key (str or int or BaseTag) – The element keyword or tag or the class attribute name to get.
- default (obj or None, optional) – If the element or class attribute is not present, return default (default None).
Returns:
- value – If key is the keyword for an element in the Dataset then return the element’s value.
- dataelem.DataElement – If key is a tag for an element in the Dataset then return the DataElement instance.
- value – If key is a class attribute then return its value.
-
get_item(key)
Return the raw data element if possible.
It will be raw if the user has never accessed the value, or set their own value. Note if the data element is a deferred-read element, then it is read and converted before being returned.
Parameters: key – The DICOM (group, element) tag in any form accepted by Tag() such as [0x0010, 0x0010], (0x10, 0x10), 0x00100010, etc. May also be a slice made up of DICOM tags.
Returns: The corresponding element.
Return type: dataelem.DataElement
-
get_private_item(group, element_offset, private_creator)
Return the data element for the given private tag group.
New in version 1.3.
This is analogous to Dataset.__getitem__(), but only for private tags. It makes it possible to find the private tag for the correct private creator without first adding the tag to the private dictionary.
Parameters:
- group (int) – The private tag group where the item is located, as a 16-bit int.
- element_offset (int) – The lower 8 bits (i.e. 2 hex digits) of the element number.
- private_creator (str) – The private creator for the tag. Must match the private creator for the tag to be returned.
Returns: The corresponding element.
Return type:
Raises:
- ValueError – If group is not part of a private tag or private_creator is empty.
- KeyError – If the private creator tag is not found in the given group, or if the private tag is not found.
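The way a private element's tag is composed from the group, the block reserved by the private creator, and the element offset can be sketched with integer arithmetic. The helper private_element_tag is hypothetical (not a pydicom function), and the block number 0x10 is an assumption chosen to match the (0041, 1001) tag shown in the class examples.

```python
def private_element_tag(group, block, element_offset):
    """Combine group, reserved block and offset into one 32-bit tag.

    Hypothetical helper for illustration. The private creator element that
    reserves the block sits at (group, 00bb), and the elements it owns sit
    at (group, bb00) through (group, bbFF).
    """
    if group % 2 == 0:
        raise ValueError("private groups have odd group numbers")
    if not 0x10 <= block <= 0xFF:
        raise ValueError("private blocks range from 0x10 to 0xFF")
    if not 0x00 <= element_offset <= 0xFF:
        raise ValueError("the element offset is a single byte")
    return (group << 16) | (block << 8) | element_offset


# Assumed layout: the creator 'My Creator' reserved block 0x10 in group 0x0041.
creator_tag = (0x0041 << 16) | 0x10
element_tag = private_element_tag(0x0041, 0x10, 0x01)
label = "({:04X}, {:04X})".format(element_tag >> 16, element_tag & 0xFFFF)
```

With these values label reads (0041, 1001), which is the tag displayed for the private element in the class examples.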
-
group_dataset(group)
Return a Dataset containing only elements of a certain group.
Parameters: group (int) – The group part of a DICOM (group, element) tag.
Returns: A Dataset containing elements of the group specified.
Return type: Dataset
-
is_original_encoding
Return True if the encoding to be used for writing is set and is the same as that used to originally encode the Dataset.
New in version 1.1.
This includes properties related to endianness, VR handling and the (0008,0005) Specific Character Set.
-
items()
Return the Dataset items to simulate dict.items().
Returns: The top-level (BaseTag, DataElement) items for the Dataset.
Return type: dict_items
-
iterall()
Iterate through the Dataset, yielding all the elements.
Unlike Dataset.__iter__(), this does recurse into sequences, and so yields all elements as if the file were “flattened”.
Yields: dataelem.DataElement
-
keys()
Return the Dataset keys to simulate dict.keys().
Returns: The BaseTag of all the elements in the Dataset.
Return type: dict_keys
-
overlay_array(group)
Return the Overlay Data in group as a numpy.ndarray.
New in version 1.4.
Returns: The (group,3000) Overlay Data converted to a numpy.ndarray.
Return type: numpy.ndarray
-
pixel_array
Return the pixel data as a numpy.ndarray.
Changed in version 1.4: Added support for Float Pixel Data and Double Float Pixel Data.
Returns: The (7FE0,0008) Float Pixel Data, (7FE0,0009) Double Float Pixel Data or (7FE0,0010) Pixel Data converted to a numpy.ndarray.
Return type: numpy.ndarray
-
pop(key, *args)
Emulate dict.pop() with support for tags and keywords.
Removes the element for key if it exists and returns it, otherwise returns a default value if given or raises KeyError.
Parameters:
- key (int or str or 2-tuple) –
  - If tuple - the group and element number of the DICOM tag
  - If int - the combined group/element number
  - If str - the DICOM keyword of the tag
- *args (zero or one argument) – Defines the behavior if no tag exists for key: if given, it defines the return value; if not given, KeyError is raised.
Returns: The element for key if it exists, or the default value if given.
Raises: KeyError – If the key is not a valid tag or keyword. If the tag does not exist and no default is given.
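The contract being emulated is that of the built-in dict.pop(). As a reminder of that contract, the sketch below uses a plain dict keyed by combined tag numbers. It is a stand-in for illustration, not a Dataset, and its values are bare strings rather than DataElement instances.

```python
# Plain dict standing in for a dataset: combined tag number -> value.
elements = {0x00100010: "CITIZEN^John", 0x00100020: "12345"}

# Key present: the entry is removed and its value returned.
removed = elements.pop(0x00100020)

# Key absent, default given: the default is returned, nothing is raised.
fallback = elements.pop(0x00100020, None)

# Key absent, no default: KeyError is raised.
try:
    elements.pop(0x00100020)
    raised = False
except KeyError:
    raised = True
```

After these calls only the (0010,0010) entry remains in elements.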
-
popitem()¶ Remove and return a (key, value) pair as a 2-tuple.
Pairs are returned in LIFO (last-in, first-out) order. Raises KeyError if the dict is empty.
-
private_block(group, private_creator, create=False)
Return the block for the given tag group and private_creator.
New in version 1.3.
If create is True and the private_creator does not exist, the private creator tag is added.
Notes
We ignore the unrealistic case that no free block is available.
Parameters:
- group (int) – The group of the private tag to be found, as a 16-bit int. Must be an odd number (i.e. a private group).
- private_creator (str) – The private creator string associated with the tag.
- create (bool, optional) – If True and private_creator does not exist, a new private creator tag is added at the next free block. If False (the default) and private_creator does not exist, KeyError is raised instead.
Returns: The existing or newly created private block, which supports the block.add_new(), block[offset] and offset in block operations shown in the class examples.
Return type: PrivateBlock
Raises:
- ValueError – If group doesn’t belong to a private tag or private_creator is empty.
- KeyError – If the private creator tag is not found in the given group and the create parameter is False.
-
private_creators(group)
Return a list of private creator names in the given group.
New in version 1.3.
Examples
This can be used to check if a given private creator exists in the group of the dataset:
>>> ds = Dataset()
>>> if 'My Creator' in ds.private_creators(0x0041):
...     block = ds.private_block(0x0041, 'My Creator')
Parameters: group (int) – The private group as a 16-bit int. Must be an odd number.
Returns: All private creator names for private blocks in the group.
Return type: list of str
Raises: ValueError – If group is not a private group.
-
remove_private_tags()
Remove all private elements from the Dataset.
-
save_as(filename, write_like_original=True)
Write the Dataset to filename.
Saving requires that the Dataset.is_implicit_VR and Dataset.is_little_endian attributes exist and are set appropriately. If Dataset.file_meta.TransferSyntaxUID is present then it should be set to a consistent value to ensure conformance.
Conformance with DICOM File Format
If write_like_original is False, the Dataset will be stored in the DICOM File Format. To do so requires that the Dataset.file_meta attribute exists and contains a Dataset with the required (Type 1) File Meta Information Group elements (see dcmwrite() and write_file_meta_info() for more information).
If write_like_original is True then the Dataset will be written as is (after minimal validation checking) and may or may not contain all or parts of the File Meta Information (and hence may or may not be conformant with the DICOM File Format).
Parameters:
- filename (str or file-like) – Name of file or the file-like to write the new DICOM file to.
- write_like_original (bool, optional) – If True (default), preserves the following information from the Dataset (and may result in a non-conformant file):
  - preamble – if the original file has no preamble then none will be written.
  - file_meta – if the original file was missing any required File Meta Information Group elements then they will not be added or written. If (0002,0000) File Meta Information Group Length is present then it may have its value updated.
  - seq.is_undefined_length – if original had delimiters, write them now too, instead of the more sensible length characters
  - is_undefined_length_sequence_item – for datasets that belong to a sequence, write the undefined length delimiters if that is what the original had.
  If False, produces a file conformant with the DICOM File Format, with explicit lengths for all elements.
See also
pydicom.filewriter.write_dataset() - Write a Dataset to a file.
pydicom.filewriter.write_file_meta_info() - Write the File Meta Information Group elements to a file.
pydicom.filewriter.dcmwrite() - Write a DICOM file from a FileDataset instance.
-
set_original_encoding(is_implicit_vr, is_little_endian, character_encoding)
Set the values for the original transfer syntax and encoding.
New in version 1.2.
Can be used for a Dataset with raw data elements to enable optimized writing (e.g. without decoding the data elements).
-
setdefault(key, default=None)
Emulate dict.setdefault() with support for tags and keywords.
Examples
>>> ds = Dataset()
>>> elem = ds.setdefault((0x0010, 0x0010), "Test")
>>> elem
(0010, 0010) Patient's Name PN: 'Test'
>>> elem.value
'Test'
>>> elem = ds.setdefault('PatientSex',
...     DataElement(0x00100040, 'CS', 'F'))
>>> elem.value
'F'
Parameters:
- key (int or str or 2-tuple) –
  - If tuple - the group and element number of the DICOM tag
  - If int - the combined group/element number
  - If str - the DICOM keyword of the tag
- default (type, optional) – The default value that is inserted and returned if no data element exists for the given key. If it is not of type DataElement, one will be constructed instead for the given tag and default as value. This is only possible for known tags (e.g. tags found via the dictionary lookup).
Returns: The data element for key if it exists, or the default value if it is a DataElement or None, or a DataElement constructed with default as value.
Return type: type
Raises: KeyError – If the key is not a valid tag or keyword. If no tag exists for key, default is not a DataElement and not None, and key is not a known DICOM tag.
-
to_json(bulk_data_threshold=1024, bulk_data_element_handler=None, dump_handler=None)
Return a JSON representation of the Dataset.
New in version 1.3.
See the DICOM Standard, Part 18, Annex F.
Parameters:
- bulk_data_threshold (int, optional) – Threshold for the length of a base64-encoded binary data element above which the element should be considered bulk data and the value provided as a URI rather than included inline (default: 1024). Ignored if no bulk data handler is given.
- bulk_data_element_handler (callable, optional) – Callable function that accepts a bulk data element and returns a JSON representation of the data element (dictionary including the “vr” key and either the “InlineBinary” or the “BulkDataURI” key).
- dump_handler (callable, optional) – Callable function that accepts a dict and returns the serialized (dumped) JSON string (by default uses json.dumps()).
Returns: Dataset serialized into a string based on the DICOM JSON Model.
Return type: str
Examples
>>> import json
>>> def my_json_dumps(data):
...     return json.dumps(data, indent=4, sort_keys=True)
>>> ds.to_json(dump_handler=my_json_dumps)
-
to_json_dict(bulk_data_threshold=1024, bulk_data_element_handler=None)
Return a dictionary representation of the Dataset conforming to the DICOM JSON Model as described in the DICOM Standard, Part 18, Annex F.
New in version 1.4.
Parameters:
- bulk_data_threshold (int, optional) – Threshold for the length of a base64-encoded binary data element above which the element should be considered bulk data and the value provided as a URI rather than included inline (default: 1024). Ignored if no bulk data handler is given.
- bulk_data_element_handler (callable, optional) – Callable function that accepts a bulk data element and returns a JSON representation of the data element (dictionary including the “vr” key and either the “InlineBinary” or the “BulkDataURI” key).
Returns: Dataset representation based on the DICOM JSON Model.
Return type: dict
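For orientation, a dictionary in the DICOM JSON Model maps eight-digit hexadecimal tag strings to objects holding a "vr" and, usually, a "Value" list. The sketch below builds such a dictionary by hand and serializes it with a custom dump function of the kind that can be passed as dump_handler. The dictionary is typed out manually for illustration and is not produced by pydicom here.

```python
import json

# Hand-written example of the DICOM JSON Model (PS3.18 Annex F):
# tag string -> {"vr": ..., "Value": [...]}
model = {
    "00100020": {"vr": "LO", "Value": ["12345"]},
    "00100010": {"vr": "PN", "Value": [{"Alphabetic": "Some^Name"}]},
}


def my_json_dumps(data):
    """Serialize with stable key order and indentation for readability."""
    return json.dumps(data, indent=4, sort_keys=True)


text = my_json_dumps(model)

# The serialized text parses back to an equal dictionary.
restored = json.loads(text)
```

Sorting the keys puts the elements in increasing tag order in the output, because the tag strings are fixed-width hexadecimal.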
-
top()
Return a str representation of the top level elements.
-
trait_names()
Return a list of valid names for auto-completion code.
Used in IPython, so that data element names can be found and offered for autocompletion on the IPython command line.
-
update(dictionary)
Extend dict.update() to handle DICOM tags and keywords.
Parameters: dictionary (dict or Dataset) – The dict or Dataset to use when updating the current object.
-
values()
Return the Dataset values to simulate dict.values().
Returns: The DataElements that make up the values of the Dataset.
Return type: dict_values
-
walk(callback, recursive=True)
Iterate through the Dataset's elements and run callback on each.
Visit all elements in the Dataset, possibly recursing into sequences and their items. The callback function is called for each DataElement (including elements with a VR of ‘SQ’). Can be used to perform an operation on certain types of elements.
For example, remove_private_tags() finds all elements with private tags and deletes them.
The elements will be returned in order of increasing tag number within their current Dataset.
Parameters:
- callback – A callable function that takes two arguments:
  - a Dataset
  - a DataElement belonging to that Dataset
- recursive (bool, optional) – Flag to indicate whether to recurse into sequences (default True).
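The visiting order described for walk() can be modelled without the library. In the toy sketch below a dataset is a plain dict mapping a combined tag number to a (VR, value) pair, and the value of an 'SQ' entry is a list of such dicts. This representation and the walk function are assumptions made for illustration; they are not pydicom's classes or its implementation.

```python
def walk(dataset, callback, recursive=True):
    """Toy model of the traversal: visit entries in increasing tag order.

    The callback receives the containing dataset and the element, here a
    (tag, VR, value) tuple. Sequence elements are passed to the callback
    first and then, if recursive is true, each of their items is walked.
    """
    for tag in sorted(dataset):
        vr, value = dataset[tag]
        callback(dataset, (tag, vr, value))
        if recursive and vr == "SQ":
            for item in value:
                walk(item, callback, recursive)


# One sequence (300A,00B0) holding a single item with (0008,0070).
item = {0x00080070: ("LO", "Linac, co.")}
ds = {
    0x300A00B0: ("SQ", [item]),
    0x00100010: ("PN", "CITIZEN^John"),
}

visited = []
walk(ds, lambda dataset, elem: visited.append("{:08X}".format(elem[0])))

top_level = []
walk(ds, lambda dataset, elem: top_level.append("{:08X}".format(elem[0])),
     recursive=False)
```

The tag order is increasing only within each dataset: the nested (0008,0070) entry is visited last because it belongs to the sequence item, not to the top level.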
-