Determine the file encoding of a file – CSV file with French accents or other exotic characters that you’re trying to import in Powershell


Bernie and I had an issue today trying to import a CSV file using Import-CSV in Powershell V2.0, as the French accents and some dashes were not imported correctly, and then we couldn’t use some of the information in the CSV to query Exchange or AD objects with other Powershell commandlets.

So it appears that this depends on the encoding of the CSV file and how Powershell’s Import-CSV is handling it.

Using Powershell v3.0, I was able to Import-CSV the CSV file with accents and then Export filtered results in a file, keeping these special characters in Powershell and in the final filtered file, because “Import-CSV” in Powershell v3.0 has the “-Encoding” parameter, which is missing in Powershell v1 and v2. So in Powershell V3, I saved my CSV file using Notepad and specifying “UTF-8” encoding. Then I did an Import-CSV using the “-Encoding UTF8” parameter (remember Powershell 3.0 only).

image

That worked ! I kept my French accents for the names in the CSV file.

But Powershell V1 and V2 Import-CSV does not have this “-Encoding” parameter. So Bernie and I had to find in which encoding we had to save our CSV file so that the Import-CSV was keeping the initial characters, so that we can handle the objects to do some stuff with them.

So we tried many, and found that saving our CSV file as “Unicode” with Notepad worked fine, and saved us lot of time for the future.

Note that by default Excel saves CSV files in the “ANSI” encoding. In my case, Powershell then imported the French accent and some other exotic characters with a “?” sign.

So the steps if you are using Excel to build your CSV are:

1- Save your Excel CSV as “.CSV” file

2- Open the .CSV again using NOTEPAD

3- Save again with NOTEPAD using the “Unicode” encoding

image

… and you’re good to Import-CSV in Powershell v1 and v2, and even v3 without needing to specify the –Encoding parameter.

Below is a nice Powershell script to enable you to check which encoding is your CSV file (or any other file but I didn’t test it for other ones):

http://poshcode.org/2059

Get-FileEncoding by Chad Miller 3 years ago
View followups from
JasonMArcher, RyanFisher, Enter your zip code here and Billy | embed code: <script type="text/javascript" src="http://PoshCode.org/embed/2059"></script>download | copy to clipboard | new post

Pasting here for my convenience – please check for the original one and check for updates on the above original location.

<#

.SYNOPSIS

Gets file encoding.

.DESCRIPTION

The Get-FileEncoding function determines encoding by looking at Byte Order Mark (BOM).

.EXAMPLE

Get-ChildItem  *.ps1 | select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} | where {$_.Encoding -ne 'ASCII'}

This command gets ps1 files in current directory where encoding is not ASCII

.EXAMPLE

Get-ChildItem  *.ps1 | select FullName, @{n='Encoding';e={Get-FileEncoding $_.FullName}} | where {$_.Encoding -ne 'ASCII'} | foreach {(get-content $_.FullName) | set-content $_.FullName -Encoding ASCII}

Same as previous example but fixes encoding using set-content

#>

function Get-FileEncoding

{

    [CmdletBinding()] Param (

     [Parameter(Mandatory = $True, ValueFromPipelineByPropertyName = $True)] [string]$Path

    )

 

    [byte[]]$byte = get-content -Encoding byte -ReadCount 4 -TotalCount 4 -Path $Path

 

    if ( $byte[0] -eq 0xef -and $byte[1] -eq 0xbb -and $byte[2] -eq 0xbf )

    { Write-Output 'UTF8' }

    elseif ($byte[0] -eq 0xfe -and $byte[1] -eq 0xff)

    { Write-Output 'Unicode' }

    elseif ($byte[0] -eq 0 -and $byte[1] -eq 0 -and $byte[2] -eq 0xfe -and $byte[3] -eq 0xff)

    { Write-Output 'UTF32' }

    elseif ($byte[0] -eq 0x2b -and $byte[1] -eq 0x2f -and $byte[2] -eq 0x76)

    { Write-Output 'UTF7'}

    else

    { Write-Output 'ASCII' }

}

 

 


Comments (0)

Skip to main content